Streamlined Data Ingestion With Pandas Chapter3
databases
STREAMLINED DATA INGESTION WITH PANDAS
Amany Mahfouz
Instructor
Relational Databases
Data about entities is organized into tables
Each row or record is an instance of an entity
Support more data, multiple simultaneous users, and data quality controls
2. Query database
Arguments
query : String containing SQL query to run or table to load
print(weather.head())
[5 rows x 13 columns]
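The argument described above matches the first parameter of pandas' `read_sql`, so the load step can be sketched roughly as follows. This is a minimal sketch under assumptions: the in-memory SQLite database, the three-column `weather` table, and its sample rows are hypothetical stand-ins for the course data, which has 13 columns.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the course's data file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (date TEXT, tavg REAL, tmax REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("2017-12-01", 48.0, 52.0), ("2017-12-02", 44.0, 48.0)],
)
conn.commit()

# First argument: the SQL query string. Second argument: the connection.
weather = pd.read_sql("SELECT * FROM weather", conn)
print(weather.head())

conn.close()
```

With a plain `sqlite3` connection, pandas expects a full query string; loading by bare table name generally requires an SQLAlchemy connectable instead.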
SELECTing Columns
SELECT [column names] FROM [table name];
Example:
SELECT date, tavg
FROM weather;
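To see what selecting named columns returns, the example query can be run against a small stand-in table. The in-memory database and sample rows here are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (date TEXT, tavg REAL, tmax REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("2017-12-01", 48.0, 52.0), ("2017-12-02", 44.0, 48.0)],
)

cursor = conn.execute("SELECT date, tavg FROM weather;")
# cursor.description holds one entry per returned column; item 0 is its name.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
conn.close()
```

Only the two requested columns come back, even though the table also stores `tmax`.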
SELECT [column_names]
FROM [table_name]
WHERE [condition];
Example:
SELECT *
FROM weather
WHERE tmax > 32;
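A small sketch of the numeric filter above, run on made-up rows, shows that `>` is a strict comparison: a day with `tmax` exactly 32 is left out. The database and its values are hypothetical.

```python
import sqlite3

# Hypothetical sample data chosen to sit below, on, and above the threshold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (date TEXT, tmax REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("2017-12-01", 30.0), ("2017-12-02", 32.0), ("2017-12-03", 40.0)],
)

# Keep only the rows where the condition is true.
above_freezing = conn.execute("SELECT * FROM weather WHERE tmax > 32;").fetchall()
print(above_freezing)
conn.close()
```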
Example:
/* Get records about incidents in Brooklyn */
SELECT *
FROM hpd311calls
WHERE borough = 'BROOKLYN';
['BROOKLYN']
(2016, 8)
(10684, 8)
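Filtering on text can be sketched the same way. The table below is a trimmed, hypothetical version of `hpd311calls` (three columns rather than eight, with made-up rows). In SQLite, `=` on text is case-sensitive by default, so the string in the query has to match the stored capitalization.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for the hpd311calls table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpd311calls (unique_key INTEGER, borough TEXT, complaint_type TEXT)"
)
conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?, ?)",
    [
        (1, "BROOKLYN", "HEAT/HOT WATER"),
        (2, "BRONX", "PLUMBING"),
        (3, "BROOKLYN", "PLUMBING"),
    ],
)

# String values in SQL go inside single quotes.
brooklyn = conn.execute(
    "SELECT * FROM hpd311calls WHERE borough = 'BROOKLYN';"
).fetchall()

# Different capitalization: no rows match under SQLite's default comparison.
lowercase = conn.execute(
    "SELECT * FROM hpd311calls WHERE borough = 'brooklyn';"
).fetchall()

print(len(brooklyn), len(lowercase))
conn.close()
```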
Getting DISTINCT Values
Get unique values for one or more columns with SELECT DISTINCT
Syntax:
SELECT DISTINCT [column names] FROM [table];
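As a rough illustration, `SELECT DISTINCT` collapses repeated values into one row each. The table contents are hypothetical, and the `ORDER BY` is added here only to make the printed order predictable.

```python
import sqlite3

# Hypothetical sample rows with a repeated borough.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hpd311calls (unique_key INTEGER, borough TEXT)")
conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?)",
    [(1, "BROOKLYN"), (2, "BRONX"), (3, "BROOKLYN")],
)

boroughs = conn.execute(
    "SELECT DISTINCT borough FROM hpd311calls ORDER BY borough;"
).fetchall()
print(boroughs)
conn.close()
```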
AVG
MAX
MIN
COUNT
COUNT
Get number of rows that meet query conditions
SELECT COUNT(*) FROM [table_name];
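A minimal sketch of `COUNT(*)`, with and without a condition, assuming a small made-up version of `hpd311calls`:

```python
import sqlite3

# Hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hpd311calls (unique_key INTEGER, borough TEXT)")
conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?)",
    [(1, "BRONX"), (2, "BROOKLYN"), (3, "BRONX")],
)

# COUNT(*) returns a single row with a single value.
total = conn.execute("SELECT COUNT(*) FROM hpd311calls;").fetchone()[0]

# Adding WHERE counts only the rows that meet the condition.
bronx = conn.execute(
    "SELECT COUNT(*) FROM hpd311calls WHERE borough = 'BRONX';"
).fetchone()[0]

print(total, bronx)
conn.close()
```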
         borough  COUNT(*)
0          BRONX      2016
1       BROOKLYN      2702
2      MANHATTAN      1413
3         QUEENS       808
4  STATEN ISLAND       178
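The per-borough counts above look like the result of grouping rows by borough before counting. The exact query is not shown, so the following is an assumed shape, run on hypothetical rows rather than the real data.

```python
import sqlite3

# Hypothetical sample rows; the real table is far larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hpd311calls (unique_key INTEGER, borough TEXT)")
conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?)",
    [(1, "BRONX"), (2, "BROOKLYN"), (3, "BRONX"), (4, "QUEENS")],
)

# GROUP BY makes COUNT(*) return one count per borough instead of one total.
counts = conn.execute(
    "SELECT borough, COUNT(*) FROM hpd311calls GROUP BY borough ORDER BY borough;"
).fetchall()
print(counts)
conn.close()
```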
Keys
Database records have unique identifiers, or keys
Default join only returns records whose key values appear in both tables
Make sure join keys are the same data type or nothing will match
FROM
JOIN
WHERE
GROUP BY
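The points about keys and the clause order above can be sketched together in one query. Everything specific here is an assumption for illustration: the column names `created_date` and `date` as join keys, the sample rows, and the added `ORDER BY` for predictable output. Both key columns are stored as text in the same format so that values can match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpd311calls (unique_key INTEGER, created_date TEXT, borough TEXT)"
)
conn.execute("CREATE TABLE weather (date TEXT, tmax REAL)")
conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?, ?)",
    [
        (1, "2018-01-01", "BRONX"),
        (2, "2018-01-01", "BROOKLYN"),
        (3, "2018-01-02", "BRONX"),
        (4, "2018-01-05", "BRONX"),  # no weather record for this date
    ],
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [
        ("2018-01-01", 19.0),
        ("2018-01-02", 26.0),
        ("2018-01-03", 30.0),  # no calls on this date
    ],
)

# Clauses appear in the order FROM, JOIN, WHERE, GROUP BY.
query = """
SELECT hpd311calls.borough, COUNT(*)
  FROM hpd311calls
  JOIN weather
    ON hpd311calls.created_date = weather.date
 WHERE weather.tmax < 32
 GROUP BY hpd311calls.borough
 ORDER BY hpd311calls.borough;
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The call dated 2018-01-05 and the weather record for 2018-01-03 both drop out, because the default join keeps only key values present in both tables. How mismatched key types are compared varies by database, so storing both keys in one consistent type is the safer choice.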