
AI in HC 4

The document outlines an experiment aimed at analyzing birth rates in the United States using Python and data from the CDC. It details the process of data cleaning, visualization of birth trends by decade and day of the week, and highlights that male births consistently outnumber female births. Additionally, it explores average births by date of the year, revealing interesting trends in birth rates throughout the year.

Experiment - 4

Aim:- Birth Rate Analysis using Python.

Theory:- Let's take a look at the freely available data on births in the United States, provided by the Centers for Disease Control (CDC). This data can be found in the file births.csv.

import pandas as pd

# Load the CDC births data and preview the first rows.
births = pd.read_csv("births.csv")
print(births.head())

# Fill missing day values with 0 so the column can be cast to int.
births['day'] = births['day'].fillna(0).astype(int)

Output:-
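The fill-then-cast step can be tried on a small in-memory file. This is only a sketch: the three rows below are invented for illustration (they are not CDC figures), and the column names are assumed to match those used in the experiment.

```python
import io

import pandas as pd

# Invented sample with the same columns as births.csv; the last row has no day.
csv_text = """year,month,day,gender,births
1969,1,1,F,4000
1969,1,1,M,4400
1989,1,,F,150000
"""
toy = pd.read_csv(io.StringIO(csv_text))

# A column containing NaN is read as float, so it cannot be used as an int yet.
print(toy["day"].dtype)

# Replace the missing day with 0, then cast the whole column to int.
toy["day"] = toy["day"].fillna(0).astype(int)
print(toy["day"].tolist())
```

Assigning the result back to the column avoids calling fillna with inplace=True on a column selection, which recent pandas versions warn about.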

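# Label each row with its decade, e.g. 1969 -> 1960.
births['decade'] = 10 * (births['year'] // 10)
print(births.head())

# Total births per decade, split by gender.
print(births.pivot_table('births', index='decade', columns='gender', aggfunc='sum'))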

We immediately see that male births outnumber female births in every decade. To see this trend more clearly, we can use the built-in plotting tools in Pandas to visualize the total number of births by decade:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # use seaborn's default plot styling
birth_decade = births.pivot_table('births', index='decade', columns='gender', aggfunc='sum')
birth_decade.plot()
plt.ylabel("Total births per decade")
plt.show()

Output:-
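To put a number on the gap between male and female births, the pivoted table can be turned into a ratio per decade. The sketch below uses a tiny made-up frame; the counts are invented, and the gender labels "F" and "M" are an assumption about how births.csv encodes gender.

```python
import pandas as pd

# Toy data with the same columns as births.csv; the counts are invented.
toy = pd.DataFrame({
    "year":   [1969, 1969, 1975, 1975, 1981, 1981],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "births": [100, 105, 200, 210, 300, 318],
})

# Same decade construction as in the experiment: 1969 -> 1960, 1975 -> 1970.
toy["decade"] = 10 * (toy["year"] // 10)

# Total births per decade, one column per gender.
by_decade = toy.pivot_table("births", index="decade", columns="gender", aggfunc="sum")

# Ratio of male to female births; values above 1 mean more male births.
ratio = by_decade["M"] / by_decade["F"]
print(ratio)
```

Running the same two lines on the real pivot table would show whether the ratio stays above 1 in every decade, which is the claim made from the table.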

Further data exploration:

There are a few interesting features we can pull out of this dataset using the Pandas tools. We must start by cleaning the data a bit, removing outliers caused by mistyped dates or missing values. One easy way to remove these all at once is to cut outliers; we'll do this via a robust sigma-clipping operation:

import numpy as np

# 25th, 50th and 75th percentiles of the births column.
quartiles = np.percentile(births['births'], [25, 50, 75])
mu = quartiles[1]
sig = 0.74 * (quartiles[2] - quartiles[0])

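This final line is a robust estimate of the sample standard deviation, where the 0.74 comes from the interquartile range of a Gaussian distribution (the interquartile range of a Gaussian is about 1.35 standard deviations, and 1/1.35 is about 0.74). With this we can use the query() method to filter out rows with births outside these values: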

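# Keep only rows within 5 robust standard deviations of the median.
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)').copy()

# Make sure 'day' is an integer column.
births['day'] = births['day'].astype(int)

# Build a datetime index from year, month and day, then derive the weekday.
births.index = pd.to_datetime(10000 * births.year + 100 * births.month + births.day, format='%Y%m%d')
births['dayofweek'] = births.index.dayofweek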
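The sigma-clipping filter can be checked on a small example. This is a sketch under stated assumptions: the ten counts are invented, with two deliberately bad rows, and the 0.74 factor is recomputed from the standard normal quantile rather than hard-coded.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd

# For a Gaussian, the interquartile range spans 2 * z_0.75 standard deviations,
# so sigma is IQR / (2 * z_0.75).
z75 = NormalDist().inv_cdf(0.75)
factor = 1 / (2 * z75)
print(round(factor, 2))  # 0.74

# Invented daily counts: eight plausible values plus two bad rows.
toy = pd.DataFrame(
    {"births": [4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 1, 150000]}
)

quartiles = np.percentile(toy["births"], [25, 50, 75])
mu = quartiles[1]
sig = factor * (quartiles[2] - quartiles[0])

# Keep only rows within 5 robust standard deviations of the median.
clipped = toy.query("(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)")
print(len(toy), "->", len(clipped))
```

Because the median and interquartile range barely move when a few extreme values are present, the two bad rows are dropped while the eight plausible ones survive. In the experiment, the same filter is followed by building the date index and the dayofweek column.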

Using this we can plot births by weekday for several decades:

births.pivot_table('births', index='dayofweek', columns='decade', aggfunc='mean').plot()
# Fix the tick positions at 0-6 (Monday-Sunday) before labelling them.
plt.gca().set_xticks(range(7))
plt.gca().set_xticklabels(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun'])
plt.ylabel('mean births by day')
plt.show()

Output:-

Apparently, births are slightly less common on weekends than on weekdays! Note that the 1990s and 2000s are missing because the CDC data contains only the month of birth starting in 1989.

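# Average births for each (month, day) pair across all years.
births_month = births.pivot_table('births', [births.index.month, births.index.day])
print(births_month.head())

# Turn the (month, day) pairs into dates in a dummy leap year so they can be plotted.
births_month.index = [pd.Timestamp(2012, month, day) for (month, day) in births_month.index]
print(births_month.head())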

Output:-

Focusing on the month and day only, we now have a time series
reflecting the average number of births by date of the year. From
this, we can use the plot method to plot the data. It reveals some
interesting trends:

fig, ax = plt.subplots(figsize=(12, 4))
births_month.plot(ax=ax)
plt.show()
Output:-
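The month/day index is rebuilt with a dummy year, and the choice of 2012 is not arbitrary: the grouped data may contain 29 February, which only exists in a leap year. A minimal sketch of that check, using only pd.Timestamp:

```python
import pandas as pd

# 2012 is a leap year, so every (month, day) pair including Feb 29 is valid.
leap_day = pd.Timestamp(2012, 2, 29)
print(leap_day.date())

# A non-leap dummy year fails on the same (month, day) pair.
try:
    pd.Timestamp(2011, 2, 29)
    rejected = False
except ValueError:
    rejected = True
print("2011-02-29 rejected:", rejected)
```

Any leap year would work as the dummy; the year itself carries no meaning in the plot.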
