Query Optimization - 1
COUNT(*) vs COUNT(1)
SELECT
  COUNT(*) AS num_of_rows
FROM `bigquery-public-data.san_francisco.bikeshare_trips`;

SELECT
  COUNT(1) AS num_of_rows
FROM `bigquery-public-data.san_francisco.bikeshare_trips`;
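The two queries above count the same rows. A minimal sketch of where counting does differ, using a small inline array instead of the bikeshare table (the values and the alias x are made up for illustration): COUNT(column) skips NULLs, while COUNT(*) and COUNT(1) do not.

```sql
-- Illustration only: three inline values, one of them NULL.
SELECT
  COUNT(*) AS count_star,  -- counts every row
  COUNT(1) AS count_one,   -- the constant 1 is never NULL, so also every row
  COUNT(x) AS count_x      -- counts only rows where x IS NOT NULL
FROM UNNEST([1, NULL, 3]) AS x
```

With these three values, count_star and count_one should both be 3 and count_x should be 2.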
Tip 1: Only select columns that you really need
SELECT *
FROM `bigquery-public-data.san_francisco.bikeshare_trips`

SELECT
  trip_id,
  start_station_name,
  end_station_name
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
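When most columns are needed and only a few are not, BigQuery's SELECT * EXCEPT can express the same idea. A small sketch, assuming the table has bike_number and zip_code columns and that they are the ones not needed here:

```sql
-- Sketch: return every column except the listed ones.
-- Assumption: bike_number and zip_code exist in bikeshare_trips.
SELECT
  * EXCEPT (bike_number, zip_code)
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
```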
Tip 2: Always filter your data according to requirements
SELECT *
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE EXTRACT(YEAR FROM start_date) = 2015
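The filter can be combined with the column-selection tip. The sketch below also writes the 2015 filter as a range on the raw column instead of wrapping it in EXTRACT. It assumes start_date is a TIMESTAMP, so both forms should select the same rows. Whether the range form scans less data depends on how the table is partitioned, which is not stated here.

```sql
-- Sketch: only the needed columns, only trips that started in 2015.
-- Assumption: start_date is a TIMESTAMP column (UTC).
SELECT
  trip_id,
  start_station_name,
  end_station_name
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE start_date >= TIMESTAMP '2015-01-01'
  AND start_date < TIMESTAMP '2016-01-01'
```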
Tip 3: Read less data
How long are bike trips usually? Calculate the average duration of one-way bike trips in any one
of the cities in SF.
SELECT
  start_station_name,
  end_station_name,
  AVG(duration_sec) AS avg_time
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE start_station_name != end_station_name
GROUP BY start_station_name, end_station_name
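The query above averages over all stations. One possible way to restrict it to a single city is sketched below. It assumes that bikeshare_trips has a start_station_id column and that the landmark column of bikeshare_stations holds the city name; the aliases t and s are chosen for illustration.

```sql
-- Sketch: average one-way trip duration for trips starting in one city.
-- Assumptions: trips.start_station_id exists; stations.landmark is the city.
SELECT
  t.start_station_name,
  t.end_station_name,
  AVG(t.duration_sec) AS avg_time
FROM `bigquery-public-data.san_francisco.bikeshare_trips` AS t
JOIN `bigquery-public-data.san_francisco.bikeshare_stations` AS s
  ON s.station_id = t.start_station_id
WHERE t.start_station_name != t.end_station_name
  AND s.landmark = "San Francisco"
GROUP BY t.start_station_name, t.end_station_name
```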
Tip 4: Use GROUP BY instead of DISTINCT
Unique list of stations
SELECT DISTINCT
  start_station_name
FROM `bigquery-public-data.san_francisco.bikeshare_trips`

SELECT
  start_station_name
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
GROUP BY start_station_name
Tip 5: Order your JOINs from the largest table to the smallest
Find the number of bikes and docks currently available at all stations in SF so that proper
restocking can be done.
SELECT
  t2.station_id,
  t2.name,
  t1.bikes_available,
  t1.docks_available
FROM `bigquery-public-data.san_francisco.bikeshare_status` AS t1
JOIN `bigquery-public-data.san_francisco.bikeshare_stations` AS t2
  ON t2.station_id = t1.station_id
WHERE t2.landmark = "San Francisco"
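The join above returns every status row for each station. If only the most recent reading per station is wanted, one option is a QUALIFY filter on a window function. This is a sketch, assuming bikeshare_status has a timestamp column named time that records when each reading was taken.

```sql
-- Sketch: keep only the latest status row per station.
-- Assumption: bikeshare_status.time is the timestamp of each reading.
SELECT
  t2.station_id,
  t2.name,
  t1.bikes_available,
  t1.docks_available
FROM `bigquery-public-data.san_francisco.bikeshare_status` AS t1
JOIN `bigquery-public-data.san_francisco.bikeshare_stations` AS t2
  ON t2.station_id = t1.station_id
WHERE t2.landmark = "San Francisco"
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY t1.station_id
  ORDER BY t1.time DESC
) = 1
```

The larger status table still comes first in the join, in line with the tip.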