Creating a simple database in Python
When applying to an intern position I was given mini project to complete. It comprised of two tasks (1) Analysis of an AirBnB dataset. (2) A Data engineering task that required me to create a SQLite database using the structure provided and then load the data into it.
Creating ‘The Wine Mart’
Python contains the sqlite3 module API that allows you to easily create and query SQLite databases.
Here are the methods that I used to create database:
Step 1: Create a database file and create a connection to it.
# connect to the database
conn = sqlite3.connect(“./the_wine_mart.sqlite”)
c = conn.cursor() # The database will be saved in the location where your ‘py’ file is saved
The Cursor of SQLite (and similar libraries) is used to execute statements to communicate with the SQLite database.
Using the methods of it you can execute SQL statements, fetch data from the result sets, call procedures.
You can create a Cursor object using the cursor() method of the Connection object/class.
Step 1: Execute SQL commands to create the table and the relationship between them using c.execute(..)
# Create table - DimWinery
c.execute('''CREATE TABLE DimWinery
([Winery_Id] INTEGER PRIMARY KEY,
[Winery_Name] TEXT)''')# Create table - DimGeography
c.execute('''CREATE TABLE DimGeography
([Geography_Id] INTEGER PRIMARY KEY,
[Country] TEXT,
[Province] TEXT,
[County] TEXT)''')# Create table - DimVariety
c.execute('''CREATE TABLE DimVariety
([Variety_Id] INTEGER PRIMARY KEY, [Variety] TEXT)''')# Create table - FactWine
c.execute('''CREATE TABLE FactWine
([Wine_Id] INTEGER PRIMARY KEY,
[Title] TEXT,
[Winery_Id] INTEGER,
[Geography_Id] INTEGER,
[Variety_Id] INTEGER,
[Points] INTEGER,
[Price] REAL,
[Vintage] TEXT,
FOREIGN KEY (Winery_Id) REFERENCES DimWinery (Winery_Id),
FOREIGN KEY (Geography_Id) REFERENCES DimGeography (Geography_Id),
FOREIGN KEY (Variety_Id) REFERENCES DimVariety (Variety_Id))''')
Step 3: Commit the changes to the database otherwise once you close the connection they will not persist and then close the connection using conn.commit() & conn.close()
conn.commit()
conn.close()
This creates the database, in part 2 of this blog post I’ll show how I loaded the data from a .csv file, clean the data and then load into the database that was just created.