How to Save Scraping Data to MySQL Database in Python
Fatadev.com — Many people ask how to save data-scraping results to MySQL with Python. Here are 2 easy steps to scrape data and save the results in a MySQL table.
As is known, MySQL is one of the most widely used relational database systems today, and a very popular choice for small to medium databases.
Here are the steps.
The first step is to prepare a database and a table in MySQL with phpMyAdmin. I named this database “detik”, with a table called “popular” whose fields are id, title, clean_category, link, and link_image.
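The article only names the columns, not their types, so the following schema is an assumption of what the “popular” table might look like (you can run it in phpMyAdmin's SQL tab):

```sql
-- A possible schema for the "popular" table.
-- The column types here are assumptions; the article only names the columns.
CREATE TABLE popular (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    clean_category VARCHAR(100),
    link TEXT,
    link_image TEXT
);
```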
The second step, saving the scraping results, consists of the following 3 easy things.
– Connect to the database
– Prepare data in list/json form
– Add data and save it to a MySQL table
The first part to add to the scraping code handles connecting to MySQL. Here is the code.
# Step 1 : Connection
import mysql.connector

cnx = mysql.connector.connect(user='root', password='', host='127.0.0.1', database='detik')
cursor = cnx.cursor()
output_list = []
The meaning and function of the code above is as follows.
cnx = mysql.connector.connect(user='root', password='', host='127.0.0.1', database='detik')
This line connects to the MySQL database. It uses the following connection information:
user='root'
This is the MySQL database username used to log in.
password=''
This is the password for that user (in this example the password is blank, but in real use you'll want to replace it with a secure password).
host='127.0.0.1'
This is the IP address or hostname of the MySQL database server. In this example the server is localhost (127.0.0.1); in production you would change it to your server's address.
database='detik'
This is the name of the database to be used.
cursor = cnx.cursor()
This line creates a cursor object, which will be used to execute SQL statements on the database. With a cursor object, you can send SQL commands and retrieve the results from the database.
output_list = []
This line simply defines the output_list variable as an empty list. It will be used later in the code to collect the scraped items before they are written to the database.
Then the second part is preparing the data as a list of dictionaries (JSON-like structures) with the following code.
# Step 2 : change the data into list / JSON form
item_data = {
    'title': title,
    'clean_category': clean_category,
    'link': link,
    'link_image': link_image
}
output_list.append(item_data)
The meaning of the code above is
item_data
This is a variable holding a dictionary that contains several key-value pairs.
title, clean_category, link, and link_image
These are the variables used as the values for each key in the item_data dictionary. Their values come from earlier in the scraping code, outside this snippet.
'title', 'clean_category', 'link', and 'link_image'
These are the keys in the item_data dictionary, used to identify each value associated with a particular attribute. For example, 'title' stores the title, 'clean_category' stores the category after cleaning (perhaps of special characters or formatting), 'link' stores the article link, and 'link_image' stores the associated image link.
output_list
This is the list that was created empty in the first part. It is used to store each item_data dictionary that is created.
output_list.append(item_data)
This command adds the item_data dictionary to the output_list list. Thus, each time this code is executed, a new dictionary with the associated data is appended to output_list.
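To see the second step in isolation, here is a minimal, self-contained sketch in which made-up sample values stand in for real scraped data:

```python
# Minimal sketch of step 2: sample values stand in for scraped data
output_list = []

# In the real script these four values come from the scraping loop
title = "Sample headline"
clean_category = "news"
link = "https://example.com/article"
link_image = "https://example.com/article.jpg"

item_data = {
    'title': title,
    'clean_category': clean_category,
    'link': link,
    'link_image': link_image
}
output_list.append(item_data)

print(output_list)
```

After running this, output_list holds one dictionary ready to be inserted into the table; in the real scraper this append happens once per scraped article.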
The third part is adding the data and saving it to a MySQL table. This code goes inside the scraping loop.
# Step 3 : add the data and save it to the MySQL table
add_data = (
    "INSERT INTO popular (title, clean_category, link, link_image) "
    "VALUES (%(title)s, %(clean_category)s, %(link)s, %(link_image)s)"
)
cursor.execute(add_data, item_data)
cnx.commit()
The meaning of the code above is
add_data
This is a string containing SQL commands to add data to the ‘popular’ table. This SQL command uses placeholders in the format %(column_name)s, where column_name is the name of the column in the ‘popular’ table. These placeholders will be populated with the appropriate values when the SQL command is executed.
cursor.execute(add_data, item_data)
This is a statement that takes a SQL command from the add_data variable and executes it by filling the appropriate values from item_data into the appropriate placeholders. In other words, it will send the actual SQL command to the database with the given values.
cnx.commit()
This statement commits the changes made to the database. The commit() call is used after the SQL command is executed so that the inserted data is permanently saved; without it, the changes may be lost when the connection closes.
Putting it all together, the entire code is as follows.
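The article's full listing is assembled here from the three snippets above as a sketch. Since the scraping loop itself is not shown in the article, a list of sample items stands in for the real scraped values; in practice, title, clean_category, link, and link_image would come from your scraper:

```python
# Reconstruction assembled from the three parts above.
# Requires the mysql-connector-python package and a running MySQL
# server with the "detik" database and "popular" table prepared.
import mysql.connector

# Step 1 : Connection
cnx = mysql.connector.connect(user='root', password='', host='127.0.0.1', database='detik')
cursor = cnx.cursor()
output_list = []

# Placeholder for the scraping loop -- in the real script these values
# are extracted from the scraped page, not hard-coded.
scraped_items = [
    ("Sample title", "news", "https://example.com/a", "https://example.com/a.jpg"),
]

for title, clean_category, link, link_image in scraped_items:
    # Step 2 : change the data into list / JSON form
    item_data = {
        'title': title,
        'clean_category': clean_category,
        'link': link,
        'link_image': link_image
    }
    output_list.append(item_data)

    # Step 3 : add the data and save it to the MySQL table
    add_data = (
        "INSERT INTO popular (title, clean_category, link, link_image) "
        "VALUES (%(title)s, %(clean_category)s, %(link)s, %(link_image)s)"
    )
    cursor.execute(add_data, item_data)
    cnx.commit()

cursor.close()
cnx.close()
```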
If we run the code above, it automatically saves the scraping results to the MySQL database, which you can then view in phpMyAdmin.
Those are 2 easy steps for storing data-scraping results in a MySQL database. Hope it is useful.