Pandas Practice
In [15]: df.head(5)
Out[15]: order_id quantity item_name choice_description item_price
In [16]: df.tail(7)
Out[16]: order_id quantity item_name choice_description item_price
4616 1832 1 Chips and Guacamole NaN $4.45
http://localhost:8888/notebooks/Let's%20Do%20Together_pandas.ipynb 1/18
2/28/2019 Let's Do Together_pandas
In [17]: df.info()
# OR
df.shape[0]
# 4622 observations
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id 4622 non-null int64
quantity 4622 non-null int64
item_name 4622 non-null object
choice_description 3376 non-null object
item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB
Out[17]: 4622
In [19]: df.columns
In [21]: df.index
Out[21]: RangeIndex(start=0, stop=4622, step=1)
In [22]: #Which was the most ordered item, and how many of it were ordered?
In [23]: c = df.groupby('item_name')
c = c.sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)
item_name
In [24]: #What was the most ordered item in the choice_description column?
In [25]: c = df.groupby('choice_description').sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)
Out[25]: order_id quantity
choice_description
In [28]: #How much was the revenue for the period in the dataset?
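The cell that answers this question did not survive the export. A minimal sketch of one way to compute it, using made-up rows rather than the real orders data, and assuming item_price is stored as text such as "$4.45" (as df.info() reports an object column) and is a per-unit price:

```python
import pandas as pd

# Hypothetical toy rows in the same shape as the orders table (not the real data).
orders = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price": ["$2.39 ", "$3.39 ", "$4.45 "],
})

# Drop surrounding whitespace and the leading dollar sign, then convert to float.
price = orders["item_price"].str.strip().str.lstrip("$").astype(float)

# Assumption: item_price is a per-unit price, so revenue is quantity * price.
# If item_price already holds the line total, price.sum() alone is the revenue.
revenue = (orders["quantity"] * price).sum()
print(round(revenue, 2))
```

Whether to multiply by quantity depends on how the prices were recorded, so it is worth checking a few multi-quantity rows before trusting either figure.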
In [31]: #Print a DataFrame with only two columns: item_name and item_price
item_name item_price
1 Izze 3.39
40 Chips 2.15
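The code for this output is missing from the export. One possible way to get a two-column frame with one row per item is sketched below on a hypothetical toy frame; the use of drop_duplicates is an assumption, not necessarily what the notebook did:

```python
import pandas as pd

# Hypothetical toy menu data; prices are already numeric here.
menu = pd.DataFrame({
    "item_name": ["Izze", "Chips", "Izze"],
    "quantity": [1, 1, 2],
    "item_price": [3.39, 2.15, 3.39],
})

# Passing a list of column names returns a DataFrame with just those columns.
two_cols = menu[["item_name", "item_price"]]

# Assumption: keep only the first row seen for each item name.
unique_items = two_cols.drop_duplicates(subset="item_name")
print(unique_items)
```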
In [33]: #What was the quantity of the most expensive item ordered?
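No answer cell is visible for this question. A small sketch of one approach, on hypothetical rows with numeric prices: sort by price in descending order and read the quantity of the first row.

```python
import pandas as pd

# Hypothetical toy rows (item names and numbers are made up for illustration).
orders = pd.DataFrame({
    "item_name": ["Item A", "Item B", "Item C"],
    "quantity": [1, 15, 2],
    "item_price": [8.75, 44.25, 11.25],
})

# Highest price first; head(1) keeps only the most expensive row.
top = orders.sort_values("item_price", ascending=False).head(1)
top_quantity = top["quantity"].iloc[0]
print(top_quantity)
```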
186 83 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
295 128 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... 11.25
455 195 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
960 394 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... 8.75
1316 536 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
1884 760 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
2156 869 1 Veggie Salad Bowl [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
2223 896 1 Veggie Salad Bowl [Roasted Chili Corn Salsa, Fajita Vegetables] 8.75
2269 913 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
2683 1066 1 Veggie Salad Bowl [Roasted Chili Corn Salsa, [Fajita Vegetables,... 8.75
3223 1289 1 Veggie Salad Bowl [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
4109 1646 1 Veggie Salad Bowl [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.25
4201 1677 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Black... 11.25
4261 1700 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.25
4541 1805 1 Veggie Salad Bowl [Tomatillo Green Chili Salsa, [Fajita Vegetabl... 8.75
4573 1818 1 Veggie Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... 8.75
In [16]: df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/
df.head()
Out[16]: country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent
0 Afghanistan 0 0 0 0.0 AS
2 Algeria 25 0 14 0.7 AF
In [18]: df.groupby('continent').beer_servings.mean()
Out[18]: continent
AF 61.471698
AS 37.045455
EU 193.777778
OC 89.687500
SA 175.083333
Name: beer_servings, dtype: float64
In [19]: #For each continent print the statistics for wine consumption.
In [20]: df.groupby('continent').wine_servings.describe()
Out[20]: count mean std min 25% 50% 75% max
continent
Out[21]: Year Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
Year 55 non-null datetime64[ns]
Population 55 non-null int64
Total 55 non-null int64
Violent 55 non-null int64
Property 55 non-null int64
Murder 55 non-null int64
Forcible_Rape 55 non-null int64
Robbery 55 non-null int64
Aggravated_assault 55 non-null int64
Burglary 55 non-null int64
Larceny_Theft 55 non-null int64
Vehicle_Theft 55 non-null int64
dtypes: datetime64[ns](1), int64(11)
memory usage: 5.2 KB
Year
1960-01-01 179323175 3384200 288460 3095700 9110 17190 107840 154320
1961-01-01 182992000 3488000 289390 3198600 8740 17220 106670 156760
1962-01-01 185771000 3752200 301510 3450700 8530 17550 110860 164570
1963-01-01 188483000 4109500 316970 3792500 8640 17650 116470 174210
1964-01-01 191141000 4564600 364220 4200400 9360 21420 130390 203050
# Uses resample to get the max value only for the "Population" column
population = crime['Population'].resample('10AS').max()
crimes
Out[27]: Population Total Violent Property Murder Forcible_Rape Robbery Agg
Year
1960-01-01 201385000.0 49295900.0 4134930.0 45160900.0 106180.0 236720.0 1633510.0
1970-01-01 220099000.0 100991600.0 9607930.0 91383800.0 192230.0 554570.0 4159020.0
1980-01-01 248239000.0 131123369.0 14074328.0 117048900.0 206439.0 865639.0 5383109.0
1990-01-01 272690813.0 136582146.0 17527048.0 119053499.0 211664.0 998827.0 5748930.0
2000-01-01 307006550.0 115012044.0 13968056.0 100944369.0 163068.0 922499.0 4230366.0
2010-01-01 318857056.0 50167967.0 6072017.0 44095950.0 72867.0 421059.0 1749809.0
2020-01-01 NaN NaN NaN NaN NaN NaN NaN
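Only part of the decade-grouping cell is visible above. The sketch below shows the general idea of resampling yearly data into ten-year bins with a different aggregation per column, on a hypothetical toy frame with made-up numbers. It uses "10YS", which is the newer spelling of the "10AS" alias seen in the notebook.

```python
import pandas as pd

# Hypothetical yearly data (made-up numbers), indexed by dates like `crime`.
years = pd.date_range("1960-01-01", periods=20, freq="YS")
toy = pd.DataFrame(
    {"Population": list(range(100, 120)), "Murder": [1] * 20},
    index=years,
)

# One aggregation per column: counts are summed over the decade,
# while population takes the decade maximum (summing it would be meaningless).
decades = toy.resample("10YS").agg({"Population": "max", "Murder": "sum"})
print(decades)
```

Taking the maximum for Population and the sum for the crime counts matches the pattern in the output above, where the population figures stay in the hundreds of millions while the other columns grow roughly tenfold.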
In [28]: crime.head()
Year
1960-01-01 179323175 3384200 288460 3095700 9110 17190 107840 154320
1961-01-01 182992000 3488000 289390 3198600 8740 17220 106670 156760
1962-01-01 185771000 3752200 301510 3450700 8530 17550 110860 164570
1963-01-01 188483000 4109500 316970 3792500 8640 17650 116470 174210
1964-01-01 191141000 4564600 364220 4200400 9360 21420 130390 203050
df = crime
In [32]: df.iloc[:3]
Out[32]: Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assault
Year
1960-01-01 179323175 3384200 288460 3095700 9110 17190 107840 154320
1961-01-01 182992000 3488000 289390 3198600 8740 17220 106670 156760
1962-01-01 185771000 3752200 301510 3450700 8530 17550 110860 164570
In [33]: #Select just the 'Murder' and 'Robbery' columns from the DataFrame df and print f
Year
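The selection cell itself is not shown in the export. As a minimal sketch, selecting several columns by name can be done by indexing with a list; the frame below is a stand-in built from the first three rows printed earlier, and the variable names are assumptions.

```python
import pandas as pd

# Stand-in for df, using the Murder and Robbery values of 1960-1962 shown above.
crime_toy = pd.DataFrame(
    {
        "Population": [179323175, 182992000, 185771000],
        "Murder": [9110, 8740, 8530],
        "Robbery": [107840, 106670, 110860],
    },
    index=pd.to_datetime(["1960-01-01", "1961-01-01", "1962-01-01"]),
)

# Double brackets: the inner list names the columns, the result is a DataFrame.
subset = crime_toy[["Murder", "Robbery"]]
print(subset.head())
```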
Year
In [38]: #Select the data in rows [3, 4, 8] and in columns ['Murder', 'Robbery']
Year
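The answer cell is missing here as well. One way to combine positional row selection with column names is sketched below on a hypothetical ten-row frame (values are made up so the selected positions are easy to spot):

```python
import pandas as pd

# Hypothetical frame with ten rows; each value encodes its row position.
df_toy = pd.DataFrame({
    "Murder": list(range(10)),
    "Robbery": list(range(10, 20)),
    "Burglary": list(range(20, 30)),
})

# iloc picks rows by integer position, then the list of names picks the columns.
rows = df_toy.iloc[[3, 4, 8]][["Murder", "Robbery"]]
print(rows)
```

Chaining the two steps keeps positional and label-based indexing separate, which avoids mixing them inside a single iloc or loc call.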
In [45]: #Select only the rows where the number of murders is greater than 24,000
Year
1991-01-01 252177000 14872900 1911770 12961100 24700 106590 687730 10927
1993-01-01 257908000 14144800 1926020 12218800 24530 106010 659870 11356
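The filtering code is not visible in the export. A boolean mask is the usual way to express this kind of condition; the sketch below uses the Murder values for 1990 to 1993 from the tables in this notebook, in a stand-in frame whose name is an assumption.

```python
import pandas as pd

# Stand-in frame with the Murder counts for 1990-1993 taken from the output above.
murders = pd.DataFrame(
    {"Murder": [23440, 24700, 23760, 24530]},
    index=pd.to_datetime(["1990-01-01", "1991-01-01", "1992-01-01", "1993-01-01"]),
)

# The comparison yields a True/False Series; indexing with it keeps the True rows.
high = murders[murders["Murder"] > 24000]
print(high)
```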
In [47]: #Select the rows where the number of murders is between 20k and 24k (inclusive)
Year
1974-01-01 211392000 10253400 974720 9278700 20710 55400 442400 4562
1975-01-01 213124000 11292400 1039710 10252700 20510 56090 470500 4926
1979-01-01 220099000 12249500 1208030 11041500 21460 76390 480700 6294
1980-01-01 225349264 13408300 1344520 12063700 23040 82990 565840 6726
1981-01-01 229146000 13423800 1361820 12061900 22520 82500 592910 6639
1982-01-01 231534000 12974400 1322390 11652000 21010 78770 553130 6694
1986-01-01 240132887 13211869 1489169 11722700 20613 91459 542775 8343
1987-01-01 242282918 13508700 1483999 12024700 20096 91110 517704 8550
1988-01-01 245807000 13923100 1566220 12356900 20680 92490 542970 9100
1989-01-01 248239000 14251400 1646040 12605400 21500 94500 578330 9517
1990-01-01 248709873 14475600 1820130 12655500 23440 102560 639270 10548
1992-01-01 255082000 14438200 1932270 12505900 23760 109060 672480 11269
1994-01-01 260341000 13989500 1857670 12131900 23330 102220 618950 11131
1995-01-01 262755000 13862700 1798790 12063900 21610 97470 580510 10992
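The cell that produced this range selection is missing. A short sketch of one way to write it uses Series.between, which includes both endpoints by default; the stand-in frame below reuses four Murder values from the tables in this notebook.

```python
import pandas as pd

# Stand-in frame with Murder counts for four years taken from the output above.
murders = pd.DataFrame(
    {"Murder": [19640, 20710, 24700, 23760]},
    index=pd.to_datetime(["1973-01-01", "1974-01-01", "1991-01-01", "1992-01-01"]),
)

# between(low, high) is True where low <= value <= high.
mid_range = murders[murders["Murder"].between(20000, 24000)]
print(mid_range)
```

The same filter can be written as two comparisons joined with `&`, each wrapped in parentheses.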
In [52]: #Calculate the mean number of murders for each year in df.
In [53]: df.groupby('Year')['Murder'].mean()
Out[53]: Year
1960-01-01 9110
1961-01-01 8740
1962-01-01 8530
1963-01-01 8640
1964-01-01 9360
1965-01-01 9960
1966-01-01 11040
1967-01-01 12240
1968-01-01 13800
1969-01-01 14760
1970-01-01 16000
1971-01-01 17780
1972-01-01 18670
1973-01-01 19640
1974-01-01 20710
1975-01-01 20510
1976-01-01 18780
1977-01-01 19120
1978-01-01 19560
1979-01-01 21460
1980-01-01 23040
1981-01-01 22520
1982-01-01 21010
1983-01-01 19310
1984-01-01 18690
1985-01-01 18980
1986-01-01 20613
1987-01-01 20096
1988-01-01 20680
1989-01-01 21500
1990-01-01 23440
1991-01-01 24700
1992-01-01 23760
1993-01-01 24530
1994-01-01 23330
1995-01-01 21610
1996-01-01 19650
1997-01-01 18208
1998-01-01 16914
1999-01-01 15522
2000-01-01 15586
2001-01-01 16037
2002-01-01 16229
2003-01-01 16528
2004-01-01 16148
2005-01-01 16740
2006-01-01 17030
2007-01-01 16929
2008-01-01 16442
2009-01-01 15399
2010-01-01 14772
2011-01-01 14661
2012-01-01 14866
2013-01-01 14319
2014-01-01 14249
Name: Murder, dtype: int64
Year
1991-01-01 252177000 14872900 1911770 12961100 24700 106590 687730 10927
1993-01-01 257908000 14144800 1926020 12218800 24530 106010 659870 11356
1992-01-01 255082000 14438200 1932270 12505900 23760 109060 672480 11269
1990-01-01 248709873 14475600 1820130 12655500 23440 102560 639270 10548
1994-01-01 260341000 13989500 1857670 12131900 23330 102220 618950 11131
1980-01-01 225349264 13408300 1344520 12063700 23040 82990 565840 6726
1981-01-01 229146000 13423800 1361820 12061900 22520 82500 592910 6639
1995-01-01 262755000 13862700 1798790 12063900 21610 97470 580510 10992
1989-01-01 248239000 14251400 1646040 12605400 21500 94500 578330 9517
1979-01-01 220099000 12249500 1208030 11041500 21460 76390 480700 6294
1982-01-01 231534000 12974400 1322390 11652000 21010 78770 553130 6694
1974-01-01 211392000 10253400 974720 9278700 20710 55400 442400 4562
1988-01-01 245807000 13923100 1566220 12356900 20680 92490 542970 9100
1986-01-01 240132887 13211869 1489169 11722700 20613 91459 542775 8343
1975-01-01 213124000 11292400 1039710 10252700 20510 56090 470500 4926
1987-01-01 242282918 13508700 1483999 12024700 20096 91110 517704 8550
1996-01-01 265228572 13493863 1688540 11805300 19650 96250 535590 10370
1973-01-01 209851000 8718100 875910 7842200 19640 51400 384220 4206
1978-01-01 218059000 11209000 1085550 10123400 19560 67610 426930 5714
1983-01-01 233981000 12108600 1258090 10850500 19310 78920 506570 6532
1977-01-01 216332000 10984500 1029580 9955000 19120 63500 412610 5343
1985-01-01 238740000 12431400 1328800 11102600 18980 88670 497870 7232
1976-01-01 214659000 11349700 1004210 10345500 18780 57080 427810 5005
1984-01-01 236158000 11881800 1273280 10608500 18690 84230 485010 6853
1972-01-01 208230000 8248800 834900 7413900 18670 46850 376290 3930
1997-01-01 267637000 13194571 1634770 11558175 18208 96153 498534 10232
1971-01-01 206212000 8588200 816500 7771700 17780 42260 387700 3687
2006-01-01 299398484 11401511 1418043 9983568 17030 92757 447403 8608
2007-01-01 301621157 11251828 1408337 9843481 16929 90427 445125 8558
1998-01-01 270296000 12475634 1531044 10944590 16914 93103 446625 9744
2005-01-01 296507061 11565499 1390745 10174754 16740 94347 417438 8622
2003-01-01 290690788 11826538 1383676 10442862 16528 93883 414235 8590
2008-01-01 304374846 11160543 1392628 9767915 16442 90479 443574 8421
2002-01-01 287973924 11878954 1423677 10455277 16229 95235 420806 8914
2004-01-01 293656842 11679474 1360088 10319386 16148 95089 401470 8473
2001-01-01 285317559 11876669 1439480 10437480 16037 90863 423557 9090
1970-01-01 203235298 8098000 738820 7359200 16000 37990 349860 3349
2000-01-01 281421906 11608072 1425486 10182586 15586 90178 408016 9117
1999-01-01 272690813 11634378 1426044 10208334 15522 89411 409371 9117
2009-01-01 307006550 10762956 1325896 9337060 15399 89241 408742 8125
2012-01-01 313873685 10219059 1217067 9001992 14866 85141 355051 7620
2010-01-01 309330219 10363873 1251248 9112625 14772 85593 369089 7818
1969-01-01 201385000 7410900 661870 6749000 14760 37170 298850 3110
2011-01-01 311587816 10258774 1206031 9052743 14661 84175 354772 7524
2013-01-01 316497531 9850445 1199684 8650761 14319 82109 345095 7265
2014-01-01 318857056 9475816 1197987 8277829 14249 84041 325802 7412
1968-01-01 199399000 6720200 595010 6125200 13800 31670 262840 2867
1967-01-01 197457000 5903400 499930 5403500 12240 27620 202910 2571
1966-01-01 195576000 5223500 430180 4793300 11040 25820 157990 2353
1965-01-01 193526000 4739400 387390 4352000 9960 23410 138690 2153
1964-01-01 191141000 4564600 364220 4200400 9360 21420 130390 2030
1960-01-01 179323175 3384200 288460 3095700 9110 17190 107840 1543
1961-01-01 182992000 3488000 289390 3198600 8740 17220 106670 1567
1963-01-01 188483000 4109500 316970 3792500 8640 17650 116470 1742
1962-01-01 185771000 3752200 301510 3450700 8530 17550 110860 1645
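The rows above run from the highest Murder count down to the lowest, but the cell that produced them is not in the export. Assuming the ordering came from a sort on that column, a sketch with four of the values shown above would be:

```python
import pandas as pd

# Stand-in frame with Murder counts for four years taken from the output above.
murders = pd.DataFrame(
    {"Murder": [9110, 8740, 24700, 24530]},
    index=pd.to_datetime(["1960-01-01", "1961-01-01", "1991-01-01", "1993-01-01"]),
)

# Assumption: the table was produced by sorting on Murder, largest first.
ranked = murders.sort_values("Murder", ascending=False)
print(ranked)
```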
In [60]: df = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/
In [61]: df.head()
In [62]: #For each cyl type and each number of gears, find the mean mileage.
Out[65]: gear 3 4 5
cyl
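The cell behind this output is missing, but its shape (cyl down the rows, gear across the columns) suggests a pivot. A minimal sketch on a hypothetical toy frame, assuming the mileage column is called mpg:

```python
import pandas as pd

# Hypothetical toy cars data (made-up values); "mpg" is an assumed column name.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "gear": [4, 4, 3, 4, 3],
    "mpg": [20.0, 30.0, 18.0, 22.0, 15.0],
})

# Rows are cylinder counts, columns are gear counts, cells hold the mean mpg.
# Combinations that never occur in the data show up as NaN.
mean_mpg = cars.pivot_table(values="mpg", index="cyl", columns="gear", aggfunc="mean")
print(mean_mpg)
```

An equivalent spelling is `cars.groupby(["cyl", "gear"])["mpg"].mean().unstack()`, which builds the same table from a grouped Series.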
In [ ]: