REYES WS 5 Cleaning Data in Python Nos. 1 6 PDF
REYES, ALLEAH F. B2
1 Date:
Import dob_job_application_filings.csv into a DataFrame named data. Write code to display the first five
columns of the dataset. How many records are in the dataset?
Code
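import pandas as pd
# index_col=0 uses the first column of the CSV as the row index
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
# Preview the first five rows to see the column names
data.head()
# Select the first five columns by name
data[["Doc #","Borough","House #","Street Name","Block"]]
# shape is (rows, columns); the first value is the number of records
data.shape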
Answer to Question
The dataset contains 3959 records (rows).
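The same checks can be sketched on a small made-up DataFrame. The name toy and all values below are hypothetical, not taken from the CSV. Here columns[:5] lists the first five column names, iloc[:, :5] selects those columns by position, and shape[0] gives the number of records.

```python
import pandas as pd

# Hypothetical stand-in for the real file; every value here is made up.
toy = pd.DataFrame({
    "Doc #": [1, 1, 2],
    "Borough": ["MANHATTAN", "BROOKLYN", "QUEENS"],
    "House #": ["10", "22", "5"],
    "Street Name": ["MAIN ST", "OAK AVE", "PINE RD"],
    "Block": [100, 200, 300],
    "Lot": [1, 2, 3],
})

first_five_names = list(toy.columns[:5])  # names of the first five columns
first_five_cols = toy.iloc[:, :5]         # the columns themselves, selected by position
n_records = toy.shape[0]                  # shape is (rows, columns)

print(first_five_names)
print(n_records)
```

Selecting by position avoids retyping column names, which helps when the names contain spaces or symbols such as "#".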
2 Date:
Using the same DataFrame, write code to determine the names and frequency counts of the boroughs.
Code
import pandas as pd
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
data.info()
data.Borough.value_counts(dropna=False)
Answer to Question
NAMES: FREQUENCY OF BOROUGHS:
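A minimal sketch of how value_counts(dropna=False) reports names and frequencies, using a made-up Borough column (the toy data is hypothetical and does not reproduce the real counts). The index of the result holds the names and the values hold the frequencies. Missing entries are counted as well because dropna=False.

```python
import pandas as pd

# Hypothetical Borough column with one missing value; not the real data.
toy = pd.DataFrame(
    {"Borough": ["MANHATTAN", "BROOKLYN", "MANHATTAN", None, "QUEENS", "MANHATTAN"]}
)

# Index = borough names, values = how often each name occurs.
counts = toy["Borough"].value_counts(dropna=False)

print(counts)
```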
Name APPLIED DATA SCIENCE
3 Date:
Refer to the same DataFrame. Which of the following statements is False?
A. The mean of Existing Height is 94.022809.
B. There are 12846 entries in the DataFrame.
C. The standard deviation of Street Frontage is 11.874080.
D. The maximum of Proposed Height is 4200.
Answer to Question
B
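Statements like these can be checked with describe(), which reports count, mean, std, min, quartiles, and max for each numeric column. The sketch below uses a made-up DataFrame (toy and its values are hypothetical), so the numbers do not match the worksheet.

```python
import pandas as pd

# Hypothetical numeric columns; values are made up for illustration.
toy = pd.DataFrame({
    "Existing Height": [10, 20, 30, 40],
    "Proposed Height": [12, 20, 35, 400],
})

summary = toy.describe()  # one column of statistics per numeric column

mean_existing = summary.loc["mean", "Existing Height"]
max_proposed = summary.loc["max", "Proposed Height"]
n_entries = len(toy)  # number of rows in the DataFrame

print(summary)
```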
4 Date:
Visualize the Existing Zoning Sqft column. Use the log scale for both the x and y axes.
Code
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("dob_job_application_filings.csv", index_col=0)
# Pass kind as a keyword; logx and logy put both axes on a log scale
data["Existing Zoning Sqft"].plot(kind="hist", logx=True, logy=True)
plt.show()
Output
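As an illustration only, the sketch below draws a log-log histogram from a made-up Series (the values are hypothetical and the Agg backend is an assumption so that it runs without a display). It is not the output of the worksheet code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values spanning several orders of magnitude.
toy = pd.Series([1, 10, 10, 100, 1000, 10000], name="Existing Zoning Sqft")

# logx/logy switch the x and y axes to a log scale.
ax = toy.plot(kind="hist", logx=True, logy=True)

x_scale = ax.get_xscale()
y_scale = ax.get_yscale()
print(x_scale, y_scale)

plt.close("all")
```

A log scale is useful here because a few very large values would otherwise squeeze most of the data into a single bar.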
5 Date:
Import auto.csv. Visualize how hp varies with weight.
Code
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("auto.csv", index_col=0)
# hp and weight are both continuous, so a scatter plot shows how one varies with the other
data.plot(kind="scatter", x="weight", y="hp")
plt.show()
Output
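As an illustration only, a scatter plot of two continuous columns can be sketched on made-up data (toy and its values are hypothetical, and the Agg backend is an assumption so that it runs without a display). The correlation coefficient gives a rough numeric check of the trend seen in the plot.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cars; heavier ones are given more horsepower on purpose.
toy = pd.DataFrame({
    "weight": [2000, 2500, 3000, 3500, 4000],
    "hp": [70, 90, 110, 150, 190],
})

ax = toy.plot(kind="scatter", x="weight", y="hp")  # one point per row
corr = toy["weight"].corr(toy["hp"])               # Pearson correlation

print(ax.get_xlabel(), ax.get_ylabel())

plt.close("all")
```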
6 Date:
Using the auto dataset, compare the number of cylinders (cyl) across the origin. Write the code here and submit a copy of
the output through Cardinal Edge Worksheet Submission.
Code
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("auto.csv", index_col=0)
data.boxplot(column="cyl", by="origin")
plt.show()
Output
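As an illustration only, the grouped box plot can be sketched on made-up data (toy and its values are hypothetical, and the Agg backend is an assumption so that it runs without a display). A groupby median is added as a numeric companion to the plot.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cars; origin is categorical, cyl is a small integer.
toy = pd.DataFrame({
    "origin": ["US", "US", "US", "US", "Europe", "Europe", "Europe",
               "Asia", "Asia", "Asia", "Asia"],
    "cyl": [8, 8, 6, 4, 4, 4, 5, 4, 4, 3, 6],
})

# One box per origin, showing the spread of cyl within that group.
toy.boxplot(column="cyl", by="origin")

# Median cylinders per origin, matching the line inside each box.
medians = toy.groupby("origin")["cyl"].median()
print(medians)

plt.close("all")
```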