Hands-On Lab - Importing Data in R

Uploaded by

vlad vlad

download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/movies-db.xls",
              destfile = "movies-db.xls")
download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/movies-db.csv",
              destfile = "movies-db.csv")

In [6]: my_data <- read.csv("movies-db.csv")


my_data

A data.frame: 30 × 8
name                                       year   length_min  genre      average_rating  cost_millions  foreign  age_restriction
<fct>                                      <int>  <int>       <fct>      <dbl>           <dbl>          <int>    <int>
Toy Story                                  1995   81          Animation  8.3             30.0           0        0
Akira                                      1998   125         Animation  8.1             10.4           1        14
The Breakfast Club                         1985   97          Drama      7.9             1.0            0        14
The Artist                                 2011   100         Romance    8.0             15.0           1        12
Modern Times                               1936   87          Comedy     8.6             1.5            0        10
Fight Club                                 1999   139         Drama      8.9             63.0           0        18
City of God                                2002   130         Crime      8.7             3.3            1        18
The Untouchables                           1987   119         Drama      7.9             25.0           0        14
Star Wars Episode IV                       1977   121         Action     8.7             11.0           0        10
American Beauty                            1999   122         Drama      8.4             15.0           0        14
Room                                       2015   118         Drama      8.3             13.0           1        14
Dr. Strangelove                            1964   94          Comedy     8.5             1.8            1        10
The Ring                                   1998   95          Horror     7.3             1.2            1        18
Monty Python and the Holy Grail            1975   91          Comedy     8.3             0.4            1        18
High School Musical                        2006   98          Comedy     5.2             4.2            0        0
Shaun of the Dead                          2004   99          Horror     8.0             6.1            1        18
Taxi Driver                                1976   113         Crime      8.3             1.3            1        14
The Shawshank Redemption                   1994   142         Crime      9.3             25.0           0        16
Interstellar                               2014   169         Adventure  8.6             165.0          0        10
Casino                                     1995   178         Biography  8.2             50.0           0        18
The Goodfellas                             1990   145         Biography  8.7             25.0           0        14
Blue is the Warmest Colour                 2013   179         Romance    7.8             4.5            1        18
Black Swan                                 2010   108         Thriller   8.0             13.0           0        16
Back to the Future                         1985   116         Sci-fi     8.5             19.0           0        0
The Wave                                   2008   107         Thriller   7.6             5.5            1        16
Whiplash                                   2014   106         Drama      8.5             3.3            1        12
The Grand Hotel Budapest                   2014   100         Crime      8.1             25.5           0        14
Jumanji                                    1995   104         Fantasy    6.9             65.0           0        12
The Eternal Sunshine of the Spotless Mind  2004   108         Drama      8.3             20.0           0        14
Chicago                                    2002   113         Comedy     7.2             45.0           0        12

In [3]: read.excel("movies-db.xls")

Error in read.excel("movies-db.xls"): could not find function "read.excel"


Traceback:
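
The error above arises because base R has no function named read.excel; the function that the later cells load from the readxl package is read_excel. A minimal sketch of a check that could be run before importing, assuming a fresh R session (the variable names here are illustrative and not part of the lab):

```r
# Base R defines no read.excel function, so this is FALSE in a fresh session.
has_read_excel <- exists("read.excel", mode = "function")

# read_excel() is provided by the readxl package.
# requireNamespace() returns TRUE or FALSE instead of stopping with an error.
readxl_installed <- requireNamespace("readxl", quietly = TRUE)

if (readxl_installed) {
  message("readxl is installed: library(readxl) makes read_excel() available")
} else {
  message('readxl is not installed: install.packages("readxl") would fetch it from CRAN')
}
```

If the package is missing, installing it once and then calling library(readxl) should be enough for the read_excel call to work.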

In [7]: head(my_data)

A data.frame: 6 × 8
   name                                       year   length_min  genre      average_rating  cost_millions  foreign  age_restriction
   <fct>                                      <int>  <int>       <fct>      <dbl>           <dbl>          <int>    <int>
1  Toy Story                                  1995   81          Animation  8.3             30.0           0        0
2  Akira                                      1998   125         Animation  8.1             10.4           1        14
3  The Breakfast Club                         1985   97          Drama      7.9             1.0            0        14
4  The Artist                                 2011   100         Romance    8.0             15.0           1        12
5  Modern Times                               1936   87          Comedy     8.6             1.5            0        10
6  Fight Club                                 1999   139         Drama      8.9             63.0           0        18

In [8]: str(my_data)

'data.frame': 30 obs. of 8 variables:
 $ name           : Factor w/ 30 levels "Akira","American Beauty",..: 29 1 21 20 14 10 8 27 18 2 ...
 $ year           : int 1995 1998 1985 2011 1936 1999 2002 1987 1977 1999 ...
 $ length_min     : int 81 125 97 100 87 139 130 119 121 122 ...
 $ genre          : Factor w/ 12 levels "Action","Adventure",..: 3 3 7 10 5 7 6 7 1 7 ...
 $ average_rating : num 8.3 8.1 7.9 8 8.6 8.9 8.7 7.9 8.7 8.4 ...
 $ cost_millions  : num 30 10.4 1 15 1.5 63 3.3 25 11 15 ...
 $ foreign        : int 0 1 0 1 0 0 1 0 0 0 ...
 $ age_restriction: int 0 14 14 12 10 18 18 14 10 14 ...

In [9]: library(readxl)

In [10]: my_excel_data <- read_excel("movies-db.xls")

In [11]: str(my_excel_data)

tibble [30 × 8] (S3: tbl_df/tbl/data.frame)
 $ name           : chr [1:30] "Toy Story" "Akira" "The Breakfast Club" "The Artist" ...
 $ year           : num [1:30] 1995 1998 1985 2011 1936 ...
 $ length_min     : num [1:30] 81 125 97 100 87 139 130 119 121 122 ...
 $ genre          : chr [1:30] "Animation" "Animation" "Drama" "Romance" ...
 $ average_rating : num [1:30] 8.3 8.1 7.9 8 8.6 8.9 8.7 7.9 8.7 8.4 ...
 $ cost_millions  : num [1:30] 30 10.4 1 15 1.5 63 3.3 25 11 15 ...
 $ foreign        : num [1:30] 0 1 0 1 0 0 1 0 0 0 ...
 $ age_restriction: num [1:30] 0 14 14 12 10 18 18 14 10 14 ...
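
The two str() outputs describe the same 30 rows with different column types: read.csv produced factors and integers, while read_excel produced character and double (num) columns. The factor columns reflect the R version used in this notebook; from R 4.0.0 onward read.csv keeps strings as character unless stringsAsFactors = TRUE is passed. A small sketch of converting one representation into the other, using a hand-typed two-row stand-in for the movies table (the names excel_like and csv_like are hypothetical):

```r
# Two rows copied from the movies table, typed the way read_excel() returned them:
# character for text, double for every number.
excel_like <- data.frame(
  name  = c("Toy Story", "Akira"),
  year  = c(1995, 1998),
  genre = c("Animation", "Animation"),
  stringsAsFactors = FALSE
)

# Convert to the types that read.csv() produced in the notebook.
csv_like <- excel_like
csv_like$name  <- factor(csv_like$name)      # chr -> factor
csv_like$genre <- factor(csv_like$genre)     # chr -> factor
csv_like$year  <- as.integer(csv_like$year)  # double -> integer

str(excel_like)
str(csv_like)
```

Which representation is preferable depends on the analysis; factors suit grouping and modelling, while character columns are usually easier for text manipulation.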

In [12]: my_data['name']

A data.frame: 30 × 1
name

<fct>

Toy Story
Akira

The Breakfast Club

The Artist

Modern Times

Fight Club

City of God

The Untouchables

Star Wars Episode IV

American Beauty

Room

Dr. Strangelove

The Ring

Monty Python and the Holy Grail

High School Musical

Shaun of the Dead

Taxi Driver

The Shawshank Redemption

Interstellar

Casino

The Goodfellas

Blue is the Warmest Colour

Black Swan

Back to the Future

The Wave

Whiplash

The Grand Hotel Budapest

Jumanji

The Eternal Sunshine of the Spotless Mind

Chicago

In [13]: my_data$name

Toy Story · Akira · The Breakfast Club · The Artist · Modern Times · Fight Club · City of God ·
The Untouchables · Star Wars Episode IV · American Beauty · Room · Dr. Strangelove · The Ring ·
Monty Python and the Holy Grail · High School Musical · Shaun of the Dead · Taxi Driver ·
The Shawshank Redemption · Interstellar · Casino · The Goodfellas · Blue is the Warmest Colour ·
Black Swan · Back to the Future · The Wave · Whiplash · The Grand Hotel Budapest · Jumanji ·
The Eternal Sunshine of the Spotless Mind · Chicago

Levels:

In [14]: my_data[["name"]]

Toy Story · Akira · The Breakfast Club · The Artist · Modern Times · Fight Club · City of God ·
The Untouchables · Star Wars Episode IV · American Beauty · Room · Dr. Strangelove · The Ring ·
Monty Python and the Holy Grail · High School Musical · Shaun of the Dead · Taxi Driver ·
The Shawshank Redemption · Interstellar · Casino · The Goodfellas · Blue is the Warmest Colour ·
Black Swan · Back to the Future · The Wave · Whiplash · The Grand Hotel Budapest · Jumanji ·
The Eternal Sunshine of the Spotless Mind · Chicago

Levels:

In [15]: my_data[1, c("name","length_min")]

A data.frame: 1 × 2
name length_min

<fct> <int>

1 Toy Story 81
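
Cells 12 to 15 show three ways of selecting the name column: single brackets return a one-column data frame, while $ and double brackets return the column itself (here a factor). A minimal sketch on a two-row stand-in for my_data (the data frame movies and the other variable names are illustrative only):

```r
# Hand-typed stand-in for the first two rows of my_data.
movies <- data.frame(
  name       = c("Toy Story", "Akira"),
  length_min = c(81L, 125L),
  stringsAsFactors = TRUE
)

as_frame  <- movies["name"]    # single brackets keep the data frame structure
as_dollar <- movies$name       # $ extracts the column vector
as_double <- movies[["name"]]  # double brackets also extract the column vector

class(as_frame)                  # "data.frame"
class(as_dollar)                 # "factor"
identical(as_dollar, as_double)  # TRUE

# Row and column selection together: first row, two named columns.
first_row <- movies[1, c("name", "length_min")]
dim(first_row)                   # 1 2
```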

In [16]: data()

Data sets
A data.frame: 104 × 3
Package Item Title

<chr> <chr> <chr>

datasets AirPassengers Monthly Airline Passenger Numbers 1949-1960

datasets BJsales Sales Data with Leading Indicator

datasets BJsales.lead (BJsales) Sales Data with Leading Indicator

datasets BOD Biochemical Oxygen Demand

datasets CO2 Carbon Dioxide Uptake in Grass Plants

datasets ChickWeight Weight versus age of chicks on different diets

datasets DNase Elisa assay of DNase

datasets EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998

datasets Formaldehyde Determination of Formaldehyde

datasets HairEyeColor Hair and Eye Color of Statistics Students

datasets Harman23.cor Harman Example 2.3

datasets Harman74.cor Harman Example 7.4

datasets Indometh Pharmacokinetics of Indomethacin

datasets InsectSprays Effectiveness of Insect Sprays

datasets JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share

datasets LakeHuron Level of Lake Huron 1875-1972

datasets LifeCycleSavings Intercountry Life-Cycle Savings Data

datasets Loblolly Growth of Loblolly pine trees

datasets Nile Flow of the River Nile

datasets Orange Growth of Orange Trees

datasets OrchardSprays Potency of Orchard Sprays

datasets PlantGrowth Results from an Experiment on Plant Growth


datasets Puromycin Reaction Velocity of an Enzymatic Reaction

datasets Seatbelts Road Casualties in Great Britain 1969-84

datasets Theoph Pharmacokinetics of Theophylline

datasets Titanic Survival of passengers on the Titanic

datasets ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs

datasets UCBAdmissions Student Admissions at UC Berkeley

datasets UKDriverDeaths Road Casualties in Great Britain 1969-84

datasets UKgas UK Quarterly Gas Consumption

datasets USAccDeaths Accidental Deaths in the US 1973-1978

datasets USArrests Violent Crime Rates by US State

datasets USJudgeRatings Lawyers' Ratings of State Judges in the US Superior Court

datasets USPersonalExpenditure Personal Expenditure Data

datasets UScitiesD Distances Between European Cities and Between US Cities

datasets VADeaths Death Rates in Virginia (1940)

datasets WWWusage Internet Usage per Minute

datasets WorldPhones The World's Telephones

datasets ability.cov Ability and Intelligence Tests

datasets airmiles Passenger Miles on Commercial US Airlines, 1937-1960

datasets airquality New York Air Quality Measurements

datasets anscombe Anscombe's Quartet of 'Identical' Simple Linear Regressions

datasets attenu The Joyner-Boore Attenuation Data

datasets attitude The Chatterjee-Price Attitude Data

datasets austres Quarterly Time Series of the Number of Australian Residents

datasets beaver1 (beavers) Body Temperature Series of Two Beavers

datasets beaver2 (beavers) Body Temperature Series of Two Beavers

datasets cars Speed and Stopping Distances of Cars

datasets chickwts Chicken Weights by Feed Type

datasets co2 Mauna Loa Atmospheric CO2 Concentration

datasets crimtab Student's 3000 Criminals Data

datasets discoveries Yearly Numbers of Important Discoveries

datasets esoph Smoking, Alcohol and (O)esophageal Cancer

datasets euro Conversion Rates of Euro Currencies

datasets euro.cross (euro) Conversion Rates of Euro Currencies

datasets eurodist Distances Between European Cities and Between US Cities

datasets faithful Old Faithful Geyser Data

datasets fdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK

datasets freeny Freeny's Revenue Data

datasets freeny.x (freeny) Freeny's Revenue Data

datasets freeny.y (freeny) Freeny's Revenue Data

datasets infert Infertility after Spontaneous and Induced Abortion

datasets iris Edgar Anderson's Iris Data


datasets iris3 Edgar Anderson's Iris Data

datasets islands Areas of the World's Major Landmasses

datasets ldeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK

datasets lh Luteinizing Hormone in Blood Samples

datasets longley Longley's Economic Regression Data

datasets lynx Annual Canadian Lynx trappings 1821-1934

datasets mdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK

datasets morley Michelson Speed of Light Data

datasets mtcars Motor Trend Car Road Tests

datasets nhtemp Average Yearly Temperatures in New Haven

datasets nottem Average Monthly Temperatures at Nottingham, 1920-1939

datasets npk Classical N, P, K Factorial Experiment

datasets occupationalStatus Occupational Status of Fathers and their Sons

datasets precip Annual Precipitation in US Cities

datasets presidents Quarterly Approval Ratings of US Presidents

datasets pressure Vapor Pressure of Mercury as a Function of Temperature

datasets quakes Locations of Earthquakes off Fiji

datasets randu Random Numbers from Congruential Generator RANDU

datasets rivers Lengths of Major North American Rivers

datasets rock Measurements on Petroleum Rock Samples

datasets sleep Student's Sleep Data

datasets stack.loss (stackloss) Brownlee's Stack Loss Plant Data

datasets stack.x (stackloss) Brownlee's Stack Loss Plant Data

datasets stackloss Brownlee's Stack Loss Plant Data

datasets state.abb (state) US State Facts and Figures

datasets state.area (state) US State Facts and Figures

datasets state.center (state) US State Facts and Figures

datasets state.division (state) US State Facts and Figures

datasets state.name (state) US State Facts and Figures

datasets state.region (state) US State Facts and Figures

datasets state.x77 (state) US State Facts and Figures

datasets sunspot.month Monthly Sunspot Data, from 1749 to "Present"

datasets sunspot.year Yearly Sunspot Data, 1700-1988

datasets sunspots Monthly Sunspot Numbers, 1749-1983

datasets swiss Swiss Fertility and Socioeconomic Indicators (1888) Data

datasets treering Yearly Treering Data, -6000-1979

datasets trees Girth, Height and Volume for Black Cherry Trees

datasets uspop Populations Recorded by the US Census

datasets volcano Topographic Information on Auckland's Maunga Whau Volcano

datasets warpbreaks The Number of Breaks in Yarn during Weaving


datasets women Average Heights and Weights for American Women

Use ‘data(package = .packages(all.available = TRUE))’ to list the data sets in all *available* packages.

In [17]: help(women)

women {datasets} R Documentation

Average Heights and Weights for American Women


Description
This data set gives the average heights and weights for American women aged 30–39.

Usage
women

Format
A data frame with 15 observations on 2 variables.

[,1] height numeric Height (in)

[,2] weight numeric Weight (lbs)

Details
The data set appears to have been taken from the American Society of Actuaries Build and Blood Pressure
Study for some (unknown to us) earlier year.

The World Almanac notes: “The figures represent weights in ordinary indoor clothing and shoes, and heights
with shoes”.

Source
The World Almanac and Book of Facts, 1975.

References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.

Examples
require(graphics)
plot(women, xlab = "Height (in)", ylab = "Weight (lb)",
main = "women data: American women aged 30-39")

[Package datasets version 3.5.1 ]

In [18]: women
A data.frame: 15 × 2
height weight

<dbl> <dbl>

58 115

59 117

60 120

61 123

62 126

63 129

64 132

65 135

66 139

67 142

68 146

69 150

70 154

71 159

72 164
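
The help page describes women as a data frame with 15 observations on 2 numeric variables. A short sketch of checking that description against the object itself, assuming the datasets package is attached as it is by default (the variable names are illustrative):

```r
# women ships with the datasets package, which R attaches at startup.
dims       <- dim(women)            # number of rows, number of columns
col_types  <- sapply(women, class)  # class of each column
avg_height <- mean(women$height)    # average of the height column, in inches

print(dims)
print(col_types)
print(avg_height)
```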

In [20]: summary(my_data)

                        name         year        length_min          genre
Akira                     : 1   Min.   :1936   Min.   : 81.00   Drama    :7
American Beauty           : 1   1st Qu.:1988   1st Qu.: 99.25   Comedy   :5
Back to the Future        : 1   Median :1998   Median :110.50   Crime    :4
Black Swan                : 1   Mean   :1996   Mean   :116.80   Animation:2
Blue is the Warmest Colour: 1   3rd Qu.:2008   3rd Qu.:124.25   Biography:2
Casino                    : 1   Max.   :2015   Max.   :179.00   Horror   :2
(Other)                   :24                                   (Other)  :8

average_rating  cost_millions        foreign    age_restriction
Min.   :5.200   Min.   :  0.400   Min.   :0.0   Min.   : 0.00
1st Qu.:7.925   1st Qu.:  3.525   1st Qu.:0.0   1st Qu.:12.00
Median :8.300   Median : 13.000   Median :0.0   Median :14.00
Mean   :8.103   Mean   : 22.300   Mean   :0.4   Mean   :12.93
3rd Qu.:8.500   3rd Qu.: 25.000   3rd Qu.:1.0   3rd Qu.:16.00
Max.   :9.300   Max.   :165.000   Max.   :1.0   Max.   :18.00

In [ ]:
