0% found this document useful (0 votes)
81 views18 pages

Pandas Practice

The document analyzes a dataset from a Chipotle restaurant containing 4622 orders. It finds that the most ordered item was the Chicken Bowl with 761 orders. The most ordered choice_description was [Diet Coke] with 159 orders. It calculates the total revenue from the period as $39,237.02. It filters the dataset to show only unique item and quantity pairs and sorts the item prices from highest to lowest for single items.

Uploaded by

A.K.A KNT
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
81 views18 pages

Pandas Practice

The document analyzes a dataset from a Chipotle restaurant containing 4622 orders. It finds that the most ordered item was the Chicken Bowl with 761 orders. The most ordered choice_description was [Diet Coke] with 159 orders. It calculates the total revenue from the period as $39,237.02. It filters the dataset to show only unique item and quantity pairs and sorts the item prices from highest to lowest for single items.

Uploaded by

A.K.A KNT
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

2/28/2019 Let's Do Together_pandas

In [1]: import numpy as np


import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

In [6]: url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.ts

In [7]: # read the above file

In [14]: df = pd.read_csv(url, sep = '\t')

In [9]: #print first 5 and last 7 records

In [15]: df.head(5)
Out[15]: order_id quantity item_name choice_description item_price

0 1 1 Chips and Fresh Tomato Salsa NaN $2.39

1 1 1 Izze [Clementine] $3.39

2 1 1 Nantucket Nectar [Apple] $3.39

Chips and Tomatillo-Green


3 1 1 NaN $2.39
Chili Salsa

[Tomatillo-Red Chili Salsa (Hot), [Black


4 2 2 Chicken Bowl $16.98
Beans...

In [16]: df.tail(7)
Out[16]: order_id quantity item_name choice_description item_price

[Fresh Tomato Salsa, [Rice, Cheese, Sour


4615 1832 1 Chicken Soft Tacos $8.75
Cream]]

Chips and
4616 1832 1 NaN $4.45
Guacamole

[Fresh Tomato Salsa, [Rice, Black Beans, Sour


4617 1833 1 Steak Burrito $11.75
...

[Fresh Tomato Salsa, [Rice, Sour Cream,


4618 1833 1 Steak Burrito $11.75
Cheese...

[Fresh Tomato Salsa, [Fajita Vegetables,


4619 1834 1 Chicken Salad Bowl $11.25
Pinto...

[Fresh Tomato Salsa, [Fajita Vegetables,


4620 1834 1 Chicken Salad Bowl $8.75
Lettu...

[Fresh Tomato Salsa, [Fajita Vegetables,


4621 1834 1 Chicken Salad Bowl $8.75
Pinto...

In [12]: # print total records and type of variables

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 1/18
2/28/2019 Let's Do Together_pandas

In [17]: df.info()#

# OR

df.shape[0]
# 4622 observations

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id 4622 non-null int64
quantity 4622 non-null int64
item_name 4622 non-null object
choice_description 3376 non-null object
item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB

Out[17]: 4622

In [18]: #Print the name of all the columns.

In [19]: df.columns

Out[19]: Index(['order_id', 'quantity', 'item_name', 'choice_description',


'item_price'],
dtype='object')

In [20]: #How is the dataset indexed?

In [21]: df.index
Out[21]: RangeIndex(start=0, stop=4622, step=1)

In [22]: #Which was the most ordered item? and How many items were ordered?

In [23]: c = df.groupby('item_name')
c = c.sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)

Out[23]: order_id quantity

item_name

Chicken Bowl 713926 761

In [24]: #What was the most ordered item in the choice_description column?

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 2/18
2/28/2019 Let's Do Together_pandas

In [25]: c = df.groupby('choice_description').sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)
Out[25]: order_id quantity

choice_description

[Diet Coke] 123455 159

In [26]: #Turn the item price into a float

In [27]: dollar = lambda x: float(x[1:-1])


df.item_price = df.item_price.apply(dollar)

In [28]: #How much was the revenue for the period in the dataset?

In [30]: revenue = (df['quantity']* df['item_price']).sum()

print('Revenue was: $' + str(np.round(revenue,2)))


Revenue was: $39237.02

In [31]: #print a data frame with only two columns item_name and item_price

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 3/18
2/28/2019 Let's Do Together_pandas

In [32]: # delete the duplicates in item_name and quantity


filtered = df.drop_duplicates(['item_name','quantity'])

# select only the products with quantity equals to 1


one_prod = filtered[filtered.quantity == 1]

# select only the item_name and item_price columns


price_per_item = one_prod[['item_name', 'item_price']]

# sort the values from the most to less expensive


price_per_item.sort_values(by = "item_price", ascending = False)
Out[32]: item_name item_price

606 Steak Salad Bowl 11.89

1229 Barbacoa Salad Bowl 11.89

1132 Carnitas Salad Bowl 11.89

7 Steak Burrito 11.75

168 Barbacoa Crispy Tacos 11.75

39 Barbacoa Bowl 11.75

738 Veggie Soft Tacos 11.25

186 Veggie Salad Bowl 11.25

62 Veggie Bowl 11.25

57 Veggie Burrito 11.25

250 Chicken Salad 10.98

5 Chicken Bowl 10.98

8 Steak Soft Tacos 9.25

554 Carnitas Crispy Tacos 9.25

237 Carnitas Soft Tacos 9.25

56 Barbacoa Soft Tacos 9.25

92 Steak Crispy Tacos 9.25

664 Steak Salad 8.99

54 Steak Bowl 8.99

3750 Carnitas Salad 8.99

21 Barbacoa Burrito 8.99

27 Carnitas Burrito 8.99

33 Carnitas Bowl 8.99

11 Chicken Crispy Tacos 8.75

12 Chicken Soft Tacos 8.75

44 Chicken Salad Bowl 8.75

1653 Veggie Crispy Tacos 8.49

16 Chicken Burrito 8.49

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 4/18
2/28/2019 Let's Do Together_pandas

item_name item_price

1694 Veggie Salad 8.49

1414 Salad 7.40

510 Burrito 7.40

520 Crispy Tacos 7.40

673 Bowl 7.40

298 6 Pack Soft Drink 6.49

10 Chips and Guacamole 4.45

1 Izze 3.39

2 Nantucket Nectar 3.39

674 Chips and Mild Fresh Tomato Salsa 3.00

111 Chips and Tomatillo Red Chili Salsa 2.95

233 Chips and Roasted Chili Corn Salsa 2.95

38 Chips and Tomatillo Green Chili Salsa 2.95

3 Chips and Tomatillo-Green Chili Salsa 2.39

300 Chips and Tomatillo-Red Chili Salsa 2.39

191 Chips and Roasted Chili-Corn Salsa 2.39

0 Chips and Fresh Tomato Salsa 2.39

40 Chips 2.15

6 Side of Chips 1.69

263 Canned Soft Drink 1.25

28 Canned Soda 1.09

34 Bottled Water 1.09

In [33]: #What was the quantity of the most expensive item ordered?

In [34]: df.sort_values(by = "item_price", ascending = False).head(1)

Out[34]: order_id quantity item_name choice_description item_price

3598 1443 15 Chips and Fresh Tomato Salsa NaN 44.25

In [35]: #How many times were a Veggie Salad Bowl ordered?

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 5/18
2/28/2019 Let's Do Together_pandas

In [36]: df[df.item_name == "Veggie Salad Bowl"]

Out[36]: order_id quantity item_name choice_description item_price

Veggie Salad
186 83 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
Bowl

Veggie Salad
295 128 1 [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... 11.25
Bowl

Veggie Salad
455 195 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
Bowl

Veggie Salad [Fresh Tomato Salsa, [Rice, Lettuce,


496 207 1 11.25
Bowl Guacamole...

Veggie Salad
960 394 1 [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... 8.75
Bowl

Veggie Salad
1316 536 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
Bowl

Veggie Salad
1884 760 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
Bowl

Veggie Salad
2156 869 1 [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
Bowl

Veggie Salad
2223 896 1 [Roasted Chili Corn Salsa, Fajita Vegetables] 8.75
Bowl

Veggie Salad
2269 913 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
Bowl

Veggie Salad
2683 1066 1 [Roasted Chili Corn Salsa, [Fajita Vegetables,... 8.75
Bowl

Veggie Salad
3223 1289 1 [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
Bowl

Veggie Salad [Fresh Tomato Salsa, [Rice, Black Beans,


3293 1321 1 8.75
Bowl Chees...

Veggie Salad
4109 1646 1 [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
Bowl

Veggie Salad
4201 1677 1 [Fresh Tomato Salsa, [Fajita Vegetables, Black... 11.25
Bowl

Veggie Salad
4261 1700 1 [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
Bowl

Veggie Salad
4541 1805 1 [Tomatillo Green Chili Salsa, [Fajita Vegetabl... 8.75
Bowl

Veggie Salad
4573 1818 1 [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... 8.75
Bowl

In [15]: import numpy as np


import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 6/18
2/28/2019 Let's Do Together_pandas

In [16]: df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/
df.head()
Out[16]: country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent

0 Afghanistan 0 0 0 0.0 AS

1 Albania 89 132 54 4.9 EU

2 Algeria 25 0 14 0.7 AF

3 Andorra 245 138 312 12.4 EU

4 Angola 217 57 45 5.9 AF

In [17]: #Which continent drinks more beer on average?

In [18]: df.groupby('continent').beer_servings.mean()
Out[18]: continent
AF 61.471698
AS 37.045455
EU 193.777778
OC 89.687500
SA 175.083333
Name: beer_servings, dtype: float64

In [19]: #For each continent print the statistics for wine consumption.

In [20]: df.groupby('continent').wine_servings.describe()
Out[20]: count mean std min 25% 50% 75% max

continent

AF 53.0 16.264151 38.846419 0.0 1.0 2.0 13.00 233.0

AS 44.0 9.068182 21.667034 0.0 0.0 1.0 8.00 123.0

EU 45.0 142.222222 97.421738 0.0 59.0 128.0 195.00 370.0

OC 16.0 35.625000 64.555790 0.0 1.0 8.5 23.25 212.0

SA 12.0 62.416667 88.620189 1.0 3.0 12.0 98.50 221.0

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 7/18
2/28/2019 Let's Do Together_pandas

In [21]: url = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Ap


crime = pd.read_csv(url)
crime.head()

Out[21]: Year Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

0 1960 179323175 3384200 288460 3095700 9110 17190 107840 1543

1 1961 182992000 3488000 289390 3198600 8740 17220 106670 1567

2 1962 185771000 3752200 301510 3450700 8530 17550 110860 1645

3 1963 188483000 4109500 316970 3792500 8640 17650 116470 1742

4 1964 191141000 4564600 364220 4200400 9360 21420 130390 2030

In [22]: #Convert the type of the column Year to datetime64

In [23]: crime.Year = pd.to_datetime(crime.Year, format='%Y')


crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
Year 55 non-null datetime64[ns]
Population 55 non-null int64
Total 55 non-null int64
Violent 55 non-null int64
Property 55 non-null int64
Murder 55 non-null int64
Forcible_Rape 55 non-null int64
Robbery 55 non-null int64
Aggravated_assault 55 non-null int64
Burglary 55 non-null int64
Larceny_Theft 55 non-null int64
Vehicle_Theft 55 non-null int64
dtypes: datetime64[ns](1), int64(11)
memory usage: 5.2 KB

In [24]: #Set the Year column as the index of the dataframe¶

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 8/18
2/28/2019 Let's Do Together_pandas

In [25]: crime = crime.set_index('Year', drop = True)


crime.head()
Out[25]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assault

Year

1960-
179323175 3384200 288460 3095700 9110 17190 107840 154320
01-01

1961-
182992000 3488000 289390 3198600 8740 17220 106670 156760
01-01

1962-
185771000 3752200 301510 3450700 8530 17550 110860 164570
01-01

1963-
188483000 4109500 316970 3792500 8640 17650 116470 174210
01-01

1964-
191141000 4564600 364220 4200400 9360 21420 130390 203050
01-01

In [26]: #Group the year by decades and sum the values¶

In [27]: # Uses resample to sum each decade


crimes = crime.resample('10AS').sum()

# Uses resample to get the max value only for the "Population" column
population = crime['Population'].resample('10AS').max()

# Updating the "Population" column


crimes['Population'] = population

crimes
Out[27]: Population Total Violent Property Murder Forcible_Rape Robbery Agg

Year

1960-
201385000.0 49295900.0 4134930.0 45160900.0 106180.0 236720.0 1633510.0
01-01

1970-
220099000.0 100991600.0 9607930.0 91383800.0 192230.0 554570.0 4159020.0
01-01

1980-
248239000.0 131123369.0 14074328.0 117048900.0 206439.0 865639.0 5383109.0
01-01

1990-
272690813.0 136582146.0 17527048.0 119053499.0 211664.0 998827.0 5748930.0
01-01

2000-
307006550.0 115012044.0 13968056.0 100944369.0 163068.0 922499.0 4230366.0
01-01

2010-
318857056.0 50167967.0 6072017.0 44095950.0 72867.0 421059.0 1749809.0
01-01

2020-
NaN NaN NaN NaN NaN NaN NaN
01-01

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 9/18
2/28/2019 Let's Do Together_pandas

In [28]: crime.head()

Out[28]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assault

Year

1960-
179323175 3384200 288460 3095700 9110 17190 107840 154320
01-01

1961-
182992000 3488000 289390 3198600 8740 17220 106670 156760
01-01

1962-
185771000 3752200 301510 3450700 8530 17550 110860 164570
01-01

1963-
188483000 4109500 316970 3792500 8640 17650 116470 174210
01-01

1964-
191141000 4564600 364220 4200400 9360 21420 130390 203050
01-01

In [31]: #Return the first 3 rows of the DataFrame df.

df = crime

In [32]: df.iloc[:3]
Out[32]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assault

Year

1960-
179323175 3384200 288460 3095700 9110 17190 107840 154320
01-01

1961-
182992000 3488000 289390 3198600 8740 17220 106670 156760
01-01

1962-
185771000 3752200 301510 3450700 8530 17550 110860 164570
01-01

In [33]: #Select just the 'Murder' and 'Robbery' columns from the DataFrame df and print f

In [35]: df.loc[:, ['Murder', 'Robbery']].head()

Out[35]: Murder Robbery

Year

1960-01-01 9110 107840

1961-01-01 8740 106670

1962-01-01 8530 110860

1963-01-01 8640 116470

1964-01-01 9360 130390

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 10/18
2/28/2019 Let's Do Together_pandas

In [37]: df[['Murder', 'Robbery']].head()


Out[37]: Murder Robbery

Year

1960-01-01 9110 107840

1961-01-01 8740 106670

1962-01-01 8530 110860

1963-01-01 8640 116470

1964-01-01 9360 130390

In [38]: #Select the data in rows [3, 4, 8] and in columns ['Murder', 'Robbery']

In [39]: df.loc[df.index[[3, 4, 8]], ['Murder', 'Robbery']]


Out[39]: Murder Robbery

Year

1963-01-01 8640 116470

1964-01-01 9360 130390

1968-01-01 13800 262840

In [45]: #Select only the rows where the number of murder is greater than 24,000

In [46]: df[df['Murder'] > 24000]


Out[46]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

Year

1991-
252177000 14872900 1911770 12961100 24700 106590 687730 10927
01-01

1993-
257908000 14144800 1926020 12218800 24530 106010 659870 11356
01-01

In [47]: #Select the rows the murder is between 20k and 24k (inclusive)

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 11/18
2/28/2019 Let's Do Together_pandas

In [51]: df[df['Murder'].between(20000, 24000)]

Out[51]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

Year

1974-
211392000 10253400 974720 9278700 20710 55400 442400 4562
01-01

1975-
213124000 11292400 1039710 10252700 20510 56090 470500 4926
01-01

1979-
220099000 12249500 1208030 11041500 21460 76390 480700 6294
01-01

1980-
225349264 13408300 1344520 12063700 23040 82990 565840 6726
01-01

1981-
229146000 13423800 1361820 12061900 22520 82500 592910 6639
01-01

1982-
231534000 12974400 1322390 11652000 21010 78770 553130 6694
01-01

1986-
240132887 13211869 1489169 11722700 20613 91459 542775 8343
01-01

1987-
242282918 13508700 1483999 12024700 20096 91110 517704 8550
01-01

1988-
245807000 13923100 1566220 12356900 20680 92490 542970 9100
01-01

1989-
248239000 14251400 1646040 12605400 21500 94500 578330 9517
01-01

1990-
248709873 14475600 1820130 12655500 23440 102560 639270 10548
01-01

1992-
255082000 14438200 1932270 12505900 23760 109060 672480 11269
01-01

1994-
260341000 13989500 1857670 12131900 23330 102220 618950 11131
01-01

1995-
262755000 13862700 1798790 12063900 21610 97470 580510 10992
01-01

In [52]: #Calculate the mean murder for each different year in df.

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 12/18
2/28/2019 Let's Do Together_pandas

In [53]: df.groupby('Year')['Murder'].mean()

Out[53]: Year
1960-01-01 9110
1961-01-01 8740
1962-01-01 8530
1963-01-01 8640
1964-01-01 9360
1965-01-01 9960
1966-01-01 11040
1967-01-01 12240
1968-01-01 13800
1969-01-01 14760
1970-01-01 16000
1971-01-01 17780
1972-01-01 18670
1973-01-01 19640
1974-01-01 20710
1975-01-01 20510
1976-01-01 18780
1977-01-01 19120
1978-01-01 19560
1979-01-01 21460
1980-01-01 23040
1981-01-01 22520
1982-01-01 21010
1983-01-01 19310
1984-01-01 18690
1985-01-01 18980
1986-01-01 20613
1987-01-01 20096
1988-01-01 20680
1989-01-01 21500
1990-01-01 23440
1991-01-01 24700
1992-01-01 23760
1993-01-01 24530
1994-01-01 23330
1995-01-01 21610
1996-01-01 19650
1997-01-01 18208
1998-01-01 16914
1999-01-01 15522
2000-01-01 15586
2001-01-01 16037
2002-01-01 16229
2003-01-01 16528
2004-01-01 16148
2005-01-01 16740
2006-01-01 17030
2007-01-01 16929
2008-01-01 16442
2009-01-01 15399
2010-01-01 14772
2011-01-01 14661
2012-01-01 14866
2013-01-01 14319
http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 13/18
2/28/2019 Let's Do Together_pandas

2014-01-01 14249
Name: Murder, dtype: int64

In [55]: #Sort df first by the values in the 'Murder' in decending order,


#then by the value in the 'Violent' column in ascending order.

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 14/18
2/28/2019 Let's Do Together_pandas

In [58]: df.sort_values(by=['Murder', 'Violent'], ascending=[False, True])


Out[58]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

Year

1991-
252177000 14872900 1911770 12961100 24700 106590 687730 10927
01-01

1993-
257908000 14144800 1926020 12218800 24530 106010 659870 11356
01-01

1992-
255082000 14438200 1932270 12505900 23760 109060 672480 11269
01-01

1990-
248709873 14475600 1820130 12655500 23440 102560 639270 10548
01-01

1994-
260341000 13989500 1857670 12131900 23330 102220 618950 11131
01-01

1980-
225349264 13408300 1344520 12063700 23040 82990 565840 6726
01-01

1981-
229146000 13423800 1361820 12061900 22520 82500 592910 6639
01-01

1995-
262755000 13862700 1798790 12063900 21610 97470 580510 10992
01-01

1989-
248239000 14251400 1646040 12605400 21500 94500 578330 9517
01-01

1979-
220099000 12249500 1208030 11041500 21460 76390 480700 6294
01-01

1982-
231534000 12974400 1322390 11652000 21010 78770 553130 6694
01-01

1974-
211392000 10253400 974720 9278700 20710 55400 442400 4562
01-01

1988-
245807000 13923100 1566220 12356900 20680 92490 542970 9100
01-01

1986-
240132887 13211869 1489169 11722700 20613 91459 542775 8343
01-01

1975-
213124000 11292400 1039710 10252700 20510 56090 470500 4926
01-01

1987-
242282918 13508700 1483999 12024700 20096 91110 517704 8550
01-01

1996-
265228572 13493863 1688540 11805300 19650 96250 535590 10370
01-01

1973-
209851000 8718100 875910 7842200 19640 51400 384220 4206
01-01

1978-
218059000 11209000 1085550 10123400 19560 67610 426930 5714
01-01

1983-
233981000 12108600 1258090 10850500 19310 78920 506570 6532
01-01

1977-
216332000 10984500 1029580 9955000 19120 63500 412610 5343
01-01

1985-
238740000 12431400 1328800 11102600 18980 88670 497870 7232
01-01

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 15/18
2/28/2019 Let's Do Together_pandas

Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

Year

1976-
214659000 11349700 1004210 10345500 18780 57080 427810 5005
01-01

1984-
236158000 11881800 1273280 10608500 18690 84230 485010 6853
01-01

1972-
208230000 8248800 834900 7413900 18670 46850 376290 3930
01-01

1997-
267637000 13194571 1634770 11558175 18208 96153 498534 10232
01-01

1971-
206212000 8588200 816500 7771700 17780 42260 387700 3687
01-01

2006-
299398484 11401511 1418043 9983568 17030 92757 447403 8608
01-01

2007-
301621157 11251828 1408337 9843481 16929 90427 445125 8558
01-01

1998-
270296000 12475634 1531044 10944590 16914 93103 446625 9744
01-01

2005-
296507061 11565499 1390745 10174754 16740 94347 417438 8622
01-01

2003-
290690788 11826538 1383676 10442862 16528 93883 414235 8590
01-01

2008-
304374846 11160543 1392628 9767915 16442 90479 443574 8421
01-01

2002-
287973924 11878954 1423677 10455277 16229 95235 420806 8914
01-01

2004-
293656842 11679474 1360088 10319386 16148 95089 401470 8473
01-01

2001-
285317559 11876669 1439480 10437480 16037 90863 423557 9090
01-01

1970-
203235298 8098000 738820 7359200 16000 37990 349860 3349
01-01

2000-
281421906 11608072 1425486 10182586 15586 90178 408016 9117
01-01

1999-
272690813 11634378 1426044 10208334 15522 89411 409371 9117
01-01

2009-
307006550 10762956 1325896 9337060 15399 89241 408742 8125
01-01

2012-
313873685 10219059 1217067 9001992 14866 85141 355051 7620
01-01

2010-
309330219 10363873 1251248 9112625 14772 85593 369089 7818
01-01

1969-
201385000 7410900 661870 6749000 14760 37170 298850 3110
01-01

2011-
311587816 10258774 1206031 9052743 14661 84175 354772 7524
01-01

2013-
316497531 9850445 1199684 8650761 14319 82109 345095 7265
01-01

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 16/18
2/28/2019 Let's Do Together_pandas

Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa

Year

2014-
318857056 9475816 1197987 8277829 14249 84041 325802 7412
01-01

1968-
199399000 6720200 595010 6125200 13800 31670 262840 2867
01-01

1967-
197457000 5903400 499930 5403500 12240 27620 202910 2571
01-01

1966-
195576000 5223500 430180 4793300 11040 25820 157990 2353
01-01

1965-
193526000 4739400 387390 4352000 9960 23410 138690 2153
01-01

1964-
191141000 4564600 364220 4200400 9360 21420 130390 2030
01-01

1960-
179323175 3384200 288460 3095700 9110 17190 107840 1543
01-01

1961-
182992000 3488000 289390 3198600 8740 17220 106670 1567
01-01

1963-
188483000 4109500 316970 3792500 8640 17650 116470 1742
01-01

1962-
185771000 3752200 301510 3450700 8530 17550 110860 1645
01-01

In [59]: #read the following data set


#https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datase

In [60]: df = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/

In [61]: df.head()

Out[61]: Unnamed: 0 mpg cyl disp hp drat wt qsec vs am gear carb

0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4

1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4

2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1

3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1

4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2

In [62]: #For each cyl type and each number of gears, find the mean mileage.

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 17/18
2/28/2019 Let's Do Together_pandas

In [65]: df.pivot_table(index='cyl', columns='gear', values='mpg', aggfunc='mean')

Out[65]: gear 3 4 5

cyl

4 21.50 26.925 28.2

6 19.75 19.750 19.7

8 15.05 NaN 15.4

In [ ]:

http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 18/18

You might also like