REYES WS 5 Cleaning Data in Python Nos. 1 6 PDF

This document contains a worksheet on cleaning data in Python. It provides 6 questions and code snippets to work with various CSV datasets. The first question loads a CSV called "dob_job_application_filings.csv" and determines it contains 3959 records. Subsequent questions analyze borough names and counts, check statements about the data, visualize existing zoning square footage with log scales, plot horsepower by weight using another CSV, and compare cylinders by origin.

Uploaded by

Eliezer Nitro
Name APPLIED DATA SCIENCE

REYES, ALLEAH F. B2

WORKSHEET #5: CLEANING DATA IN PYTHON

1 Date:
Import dob_job_application_filings.csv into a DataFrame named data. Write code to determine the first five
columns of the dataset. How many records are in the dataset?
Code
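import pandas as pd
# Load the CSV; index_col=0 uses the first column of the file as the index
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
print(data.head())
# Display the first five columns
print(data[["Doc #", "Borough", "House #", "Street Name", "Block"]])
# shape is (rows, columns); the first value is the number of records
print(data.shape)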
Answer to Question
The dataset contains 3959 records (rows).
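
The first five column names and the record count can also be read from the DataFrame itself instead of typing the names by hand. A minimal sketch, assuming a small hypothetical in-memory frame stands in for the CSV (the values are invented; only the column names are taken from the code above):

```python
import pandas as pd

# Hypothetical stand-in for dob_job_application_filings.csv (values invented).
data = pd.DataFrame(
    {
        "Doc #": [1, 1, 2],
        "Borough": ["MANHATTAN", "BROOKLYN", "QUEENS"],
        "House #": ["10", "22", "7"],
        "Street Name": ["MAIN ST", "OAK AVE", "ELM RD"],
        "Block": [100, 200, 300],
        "Lot": [1, 2, 3],
    }
)

first_five = list(data.columns[:5])  # names of the first five columns
n_records = data.shape[0]            # shape is (rows, columns)

print(first_five)
print(n_records)
```

Slicing data.columns avoids hard-coding names, so the same two lines would work on any DataFrame.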

2 Date:
Using the same DataFrame, write code to determine the names and frequency counts of the boroughs.
Code
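import pandas as pd
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
# Summarize column names, dtypes, and non-null counts
data.info()
# Count each borough; dropna=False also counts missing values
print(data.Borough.value_counts(dropna=False))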

Answer to Question
NAMES: FREQUENCY OF BOROUGHS:
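
To illustrate what value_counts(dropna=False) reports, here is a small sketch on a hypothetical Borough column (the values are invented and are not the worksheet's actual counts). It shows that missing entries get their own row only when dropna=False is passed:

```python
import pandas as pd

# Hypothetical borough values, including one missing entry (invented data).
boroughs = pd.Series(
    ["MANHATTAN", "BROOKLYN", "MANHATTAN", None, "QUEENS", "MANHATTAN"],
    name="Borough",
)

counts_all = boroughs.value_counts(dropna=False)  # missing values counted
counts_known = boroughs.value_counts()            # missing values dropped

print(counts_all)
print(counts_known)
```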


3 Date:
Refer to the same DataFrame. Which of the following statements is False?
A. The mean of Existing Height is 94.022809.
B. There are 12846 entries in the DataFrame.
C. The standard deviation of Street Frontage is 11.874080.
D. The maximum of Proposed Height is 4200.
Answer: B
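
Statements like these can be checked against summary statistics. The sketch below assumes a tiny hypothetical frame (invented values, not the worksheet's data) and uses describe() to read off the mean, standard deviation, and maximum, with len() for the number of entries:

```python
import pandas as pd

# Hypothetical numeric columns (values invented for illustration).
data = pd.DataFrame(
    {
        "Existing Height": [10, 20, 30, 40],
        "Proposed Height": [12, 20, 35, 50],
    }
)

summary = data.describe()  # rows: count, mean, std, min, quartiles, max
mean_existing = summary.loc["mean", "Existing Height"]
max_proposed = summary.loc["max", "Proposed Height"]
n_entries = len(data)      # number of rows in the DataFrame

print(summary)
print(mean_existing, max_proposed, n_entries)
```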

4 Date:
Visualize the Existing Zoning Sqft column. Use the log scale for both x and y axes.
Code
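import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
# Pass kind as a keyword (recent pandas rejects positional arguments to Series.plot)
# and let pandas set the log scale on both axes
data["Existing Zoning Sqft"].plot(kind="hist", logx=True, logy=True)
plt.show()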
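
A numeric companion to a log-scale histogram is to count values per power-of-ten bin. This is only a sketch under stated assumptions: the square-footage values are invented, and the decade bin edges are a choice made here for illustration, not something the worksheet specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical square-footage values (invented for illustration).
sqft = pd.Series(
    [0, 0, 50, 120, 900, 1500, 20000, 350000], name="Existing Zoning Sqft"
)

positive = sqft[sqft > 0]  # zeros cannot be placed on a log axis
edges = [10, 100, 1_000, 10_000, 100_000, 1_000_000]  # one bin per decade
counts, _ = np.histogram(positive, bins=edges)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:>9,} to {high:>9,}: {n}")
```

Each bin spans one decade, so the bins have equal width on a logarithmic x axis.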

Output


5 Date:
Import auto.csv. Visualize how hp varies with weight.
Code
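import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("auto.csv", index_col=0)
# hp and weight are both numeric, so a scatter plot shows how one varies
# with the other; a box plot grouped by weight draws one box per distinct weight
data.plot(kind="scatter", x="weight", y="hp")
plt.show()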
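
The relationship between two numeric columns can also be summarized with a correlation coefficient. A minimal sketch, assuming a hypothetical four-row frame with hp and weight columns (values invented, not taken from auto.csv):

```python
import pandas as pd

# Hypothetical hp and weight values (invented for illustration).
autos = pd.DataFrame(
    {
        "weight": [2000, 2500, 3000, 3500],
        "hp": [70, 95, 105, 130],
    }
)

# Pearson correlation: values near 1 mean hp tends to rise with weight.
r = autos["hp"].corr(autos["weight"])
print(round(r, 3))
```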

Output


6 Date:
Using the auto dataset, compare the number of cylinders (cyl) across the origin. Write the code here and submit a copy of
the output through Cardinal Edge Worksheet Submission.
Code

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("auto.csv", index_col=0)
# One box per origin, showing the distribution of cylinder counts
data.boxplot(column="cyl", by="origin")
plt.show()
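
Because cyl takes only a few distinct values, a frequency table can complement the box plot. The sketch below assumes a hypothetical six-row frame (the origin labels and cylinder counts are invented for illustration) and tabulates cars per origin and cylinder count with pd.crosstab:

```python
import pandas as pd

# Hypothetical origin and cyl values (invented for illustration).
autos = pd.DataFrame(
    {
        "origin": ["US", "US", "Asia", "Europe", "Asia", "US"],
        "cyl": [8, 6, 4, 4, 4, 8],
    }
)

# Rows are origins, columns are cylinder counts, cells are numbers of cars.
table = pd.crosstab(autos["origin"], autos["cyl"])
print(table)
```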

Output
