BHCS 20B Introduction To R Programming Update Awaited
Assessment Methods
Written tests, assignments, quizzes, presentations as announced by the instructor in the class.
Keywords
Android App Development, Activities, Fragments, User interfaces, Intents, Broadcast
sender/receivers, Services, Notifications, SQLite Database
Course Objective
This course introduces R, a popular statistical programming language that is widely used for data analysis
internationally. The course covers reading data and manipulating it using R, the different control structures,
and the design of user-defined functions. Installing, loading and building packages are also covered.
Detailed Syllabus
Unit 1
Introduction: the R interpreter; the major R data structures (vectors, matrices, arrays, lists and
data frames); control structures, including the vectorized if and multiple selection; functions.
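A minimal R sketch of the Unit 1 topics, for orientation only: the major data structures, the vectorized `ifelse()`, multiple selection with `switch()`, and user-defined functions. All object names (`marks`, `describe`, `sum_squares` and so on) are illustrative assumptions, not part of the syllabus.

```r
# Major data structures
v  <- c(3, 8, 1)                          # numeric vector
m  <- matrix(1:6, nrow = 2)               # 2 x 3 matrix, filled column-wise
a  <- array(1:24, dim = c(2, 3, 4))       # 3-dimensional array
l  <- list(name = "x", values = v)        # list holding mixed types
df <- data.frame(id = 1:3, score = v)     # data frame: columns of equal length

# Vectorized if: ifelse() evaluates the condition element by element
marks  <- c(35, 72, 50)
result <- ifelse(marks >= 40, "pass", "fail")

# Multiple selection with switch(): runs the branch whose name matches 'type'
describe <- function(x, type) {
  switch(type,
         mean   = mean(x),
         median = median(x),
         stop("unknown type: ", type))    # fallback when nothing matches
}

# A user-defined function with a default argument and a for loop
sum_squares <- function(x, power = 2) {
  total <- 0
  for (xi in x) total <- total + xi^power
  total
}

print(result)                      # "fail" "pass" "pass"
print(describe(marks, "median"))   # 50
print(sum_squares(v))              # 74
```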
Unit 2
Installing, loading and using packages: reading data from and writing data to files, extracting data from
websites, cleaning data, transforming data (sorting, adding new columns or removing existing ones,
centring, scaling and normalizing data values, converting value types, using built-in string
functions), statistical analysis for summarizing and understanding data, and visualizing data
using scatter plots, line plots, bar charts, histograms and box plots.
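A base-R sketch of several Unit 2 steps on a tiny made-up data set. The names, heights and `notes` column are hypothetical. It covers writing and reading a CSV file, cleaning with built-in string functions, converting types, sorting, removing and adding columns, scaling and normalizing, and then summarizing and plotting. Extracting data from websites is not shown.

```r
# Write a small hypothetical data set to a temporary CSV file and read it back
raw <- data.frame(
  name   = c(" alice", "Bob ", "carol", "dave"),
  height = c("160cm", "175cm", NA, "182cm"),
  notes  = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(raw, path, row.names = FALSE)
dat <- read.csv(path, stringsAsFactors = FALSE)

# Clean: trim spaces, unify case, strip units, convert type, drop missing rows
dat$name   <- toupper(trimws(dat$name))
dat$height <- as.numeric(sub("cm", "", dat$height))   # character -> numeric
dat        <- dat[!is.na(dat$height), ]

# Transform: remove a column, sort, add scaled and normalized columns
dat$notes     <- NULL
dat           <- dat[order(dat$height), ]
dat$height_z  <- as.numeric(scale(dat$height))         # centred, unit variance
rng           <- range(dat$height)
dat$height_01 <- (dat$height - rng[1]) / (rng[2] - rng[1])  # min-max to [0, 1]

# Summarize and visualize
print(summary(dat$height))
print(dat)
boxplot(dat$height, main = "Height (cm)")
```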
Unit 3
Designing GUI: building an interactive application and connecting it to a database.
Unit 4
Building Packages.
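One possible starting point for Unit 4 uses base R's `utils::package.skeleton()`. The package name `mypkg` and the function `coef_var` are hypothetical. The generated skeleton typically still needs its DESCRIPTION and help (.Rd) files edited before it passes `R CMD check`.

```r
# A hypothetical function to be packaged: coefficient of variation
coef_var <- function(x) sd(x) / mean(x)

# Create the package directory structure (DESCRIPTION, NAMESPACE, R/, man/)
pkg_dir <- tempdir()
utils::package.skeleton(name = "mypkg", list = "coef_var",
                        environment = environment(),
                        path = pkg_dir, force = TRUE)

print(list.files(file.path(pkg_dir, "mypkg"), recursive = TRUE))

# Next steps, from a terminal in pkg_dir (after editing the generated files):
#   R CMD build mypkg
#   R CMD check <the .tar.gz file produced by the build step>
```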
Practical
Q1.
a) simulate a sample of 100 random data points from a normal distribution with mean 100
and standard deviation 5, and store the result in a vector.
b) visualize the vector created above using different plots.
c) test the hypothesis that the mean equals 100.
d) use the Wilcoxon signed-rank test (wilcox.test) to test the hypothesis that the mean equals 90.
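A minimal sketch of the first exercise using base R only. The seed value passed to `set.seed()` is an arbitrary choice, added so that the results are reproducible.

```r
set.seed(1)                              # arbitrary seed for reproducibility
x <- rnorm(100, mean = 100, sd = 5)      # (a) 100 draws from N(100, 5^2)

# (b) several views of the same vector in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
hist(x, main = "Histogram")
boxplot(x, main = "Box plot")
plot(x, main = "Index plot")
qqnorm(x); qqline(x)                     # normal Q-Q plot with reference line
par(op)

# (c) one-sample t-test of H0: mean = 100
t_res <- t.test(x, mu = 100)
print(t_res)

# (d) one-sample Wilcoxon signed-rank test of H0: location = 90
w_res <- wilcox.test(x, mu = 90)
print(w_res)
```

For normal data the distribution is symmetric, so the location tested by the Wilcoxon test coincides with the mean.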
Q2. Use the algae data set from the DMwR package to complete the following tasks.
a) create a graph that you find adequate to show the distribution of the values of algae a6.
b) show the distribution of the values of size 3.
c) check visually if oPO4 follows a normal distribution.
d) produce a graph that shows how the values of NO3 are distributed
across the different river sizes.
e) using a graph check if the distribution of algae a1 varies with the speed of the river.
f) visualize the relationship between the frequencies of algae a1 and a6. Give the graph an
appropriate title and x-axis and y-axis labels.
Q3. Read the file Coweeta.CSV and write an R script to do the following:
a) count the number of observations per species.
b) take a subset of the data including only those species with at least 10 observations.
c) make a scatter plot of biomass versus height, with the symbol colour varying by species,
and use filled squares for the symbols. Also add a title to the plot, in italics.
d) log-transform biomass, and redraw the plot.
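A sketch of Q3 in base R. It assumes that Coweeta.CSV has columns named `species`, `biomass` and `height`. These names are taken from the exercise text, and the real header may differ. The file is not included here, so the sketch runs on simulated stand-in data with made-up species codes `sp1` to `sp3`.

```r
# Q3 sketch; cw is assumed to have columns species, biomass and height.
coweeta_plots <- function(cw, min_n = 10) {
  # (a) number of observations per species
  counts <- table(cw$species)
  print(counts)

  # (b) keep species with at least min_n observations;
  #     factor() also drops the levels of the removed species
  keep <- names(counts)[counts >= min_n]
  sub  <- cw[cw$species %in% keep, ]
  sub$species <- factor(sub$species)
  cols <- as.integer(sub$species)          # one palette colour per species

  # (c) filled squares (pch = 15) coloured by species, italic title (font.main = 3)
  plot(sub$height, sub$biomass, pch = 15, col = cols,
       xlab = "Height", ylab = "Biomass",
       main = "Biomass versus height", font.main = 3)
  legend("topleft", legend = levels(sub$species), pch = 15,
         col = seq_along(levels(sub$species)))

  # (d) the same plot with log-transformed biomass
  plot(sub$height, log(sub$biomass), pch = 15, col = cols,
       xlab = "Height", ylab = "log(Biomass)",
       main = "log(biomass) versus height", font.main = 3)
  invisible(sub)
}

# With the real file (path assumed): res <- coweeta_plots(read.csv("Coweeta.CSV"))

# Simulated stand-in data so the sketch runs without the file
set.seed(2)
n  <- c(sp1 = 12, sp2 = 5, sp3 = 15)
cw <- data.frame(species = rep(names(n), times = n),
                 height  = runif(sum(n), 5, 30))
cw$biomass <- exp(0.1 * cw$height + rnorm(sum(n), sd = 0.3))
res <- coweeta_plots(cw)
```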
Q4. The data set mammals (in the MASS package, which ships with R) contains data on body weight
versus brain weight. Write R commands to:
a) Find the Pearson and Spearman correlation coefficients. Are they similar?
b) Plot the data using the plot() command.
c) Plot the logarithm (log) of each variable and see if that makes a difference.
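A short sketch of Q4. The `mammals` data set is provided by the MASS package (columns `body` and `brain`), so `library(MASS)` is needed. Spearman's coefficient depends only on ranks, so the log transform in part (c) leaves it unchanged. Pearson's coefficient, by contrast, can change considerably.

```r
library(MASS)          # provides the mammals data frame (body, brain)

# (a) Pearson and Spearman correlation coefficients
r_pearson  <- cor(mammals$body, mammals$brain, method = "pearson")
r_spearman <- cor(mammals$body, mammals$brain, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))

# (b) raw scale: a few very heavy species dominate the picture
plot(brain ~ body, data = mammals, main = "Brain weight vs body weight")

# (c) log scale on both axes
plot(log(brain) ~ log(body), data = mammals,
     main = "log(brain weight) vs log(body weight)")
r_log <- cor(log(mammals$body), log(mammals$brain))   # Pearson on the logs
print(r_log)
```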
Q5. The MASS library contains a data set, UScereal, with information about popular breakfast
cereals. Attach the data set and use different kinds of plots to investigate the following
relationships:
a) relationship between manufacturer and shelf
b) relationship between fat and vitamins
c) relationship between fat and shelf
d) relationship between carbohydrates and sugars
e) relationship between fibre and manufacturer
f) relationship between sodium and sugars
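One way to approach Q5 is to match each pair of variables to a common plot type. Two categorical variables suit a mosaic plot, a numeric variable across groups suits box plots, and two numeric variables suit a scatter plot. These plot choices are a suggestion, not the only acceptable answer. Carbohydrates are in the `carbo` column.

```r
library(MASS)
attach(UScereal)   # columns become visible by name, as the exercise asks

op <- par(mfrow = c(2, 3))
# (a) two categorical variables: mosaic plot of the contingency table
mosaicplot(table(mfr, shelf), main = "Manufacturer vs shelf")
# (b) numeric vs categorical: box plot of fat for each vitamins level
boxplot(fat ~ vitamins, main = "Fat by vitamins")
# (c) fat for each shelf (shelf is coded 1, 2, 3)
boxplot(fat ~ shelf, main = "Fat by shelf")
# (d) two numeric variables: scatter plot
plot(carbo, sugars, xlab = "Carbohydrates", ylab = "Sugars",
     main = "Sugars vs carbohydrates")
# (e) fibre for each manufacturer
boxplot(fibre ~ mfr, main = "Fibre by manufacturer")
# (f) sodium vs sugars
plot(sugars, sodium, main = "Sodium vs sugars")
par(op)

detach(UScereal)   # remove the data set from the search path again
```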
Q6. Write R script to:
a) Do two simulations of a binomial number with n = 100 and p = .5. Do you get the same
results each time? What is different? What is similar?
b) Simulate from the normal distribution twice, once with n = 10, µ = 10 and σ = 10, and once
with n = 10, µ = 100 and σ = 100. How are the samples different? How are they similar? Are
both approximately normal?
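A sketch of Q6 in base R. The seed value 42 is an arbitrary assumption, used only to show that fixing the seed reproduces a draw exactly.

```r
# (a) two binomial numbers: successes in n = 100 trials with p = 0.5
b1 <- rbinom(1, size = 100, prob = 0.5)
b2 <- rbinom(1, size = 100, prob = 0.5)
print(c(b1, b2))          # typically different values, both close to 50

# Setting the same seed before each draw reproduces the result exactly
set.seed(42); s1 <- rbinom(1, size = 100, prob = 0.5)
set.seed(42); s2 <- rbinom(1, size = 100, prob = 0.5)
print(identical(s1, s2))  # TRUE

# (b) two normal samples of size 10
z1 <- rnorm(10, mean = 10,  sd = 10)
z2 <- rnorm(10, mean = 100, sd = 100)
print(rbind(z1 = summary(z1), z2 = summary(z2)))

# Normal Q-Q plots to judge approximate normality
op <- par(mfrow = c(1, 2))
qqnorm(z1, main = "mu = 10, sigma = 10");   qqline(z1)
qqnorm(z2, main = "mu = 100, sigma = 100"); qqline(z2)
par(op)
```

If X ~ N(10, 10²), then 10X ~ N(100, 100²). The two distributions therefore have the same shape and differ only in location and scale. With only 10 points per sample, a Q-Q plot can stray visibly from a straight line even when the data are normal.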
Q7. Create a database, medicines, that stores details of medicines such as {manufacturer,
composition, price}. Create an interactive application with which the user can find an
alternative to a given medicine that has the same composition.
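A sketch of only the lookup logic behind Q7. It assumes that an in-memory data frame stands in for the `medicines` database. It also adds a hypothetical `name` column, because the listed fields do not include one, and every row is made up. The database connection and the interactive front end are not shown.

```r
# Hypothetical stand-in for the 'medicines' table; all rows are made up.
medicines <- data.frame(
  name         = c("MedA", "MedB", "MedC", "MedD"),
  manufacturer = c("Maker1", "Maker2", "Maker3", "Maker1"),
  composition  = c("paracetamol 500mg", "paracetamol 500mg",
                   "ibuprofen 400mg", "paracetamol 500mg"),
  price        = c(30, 25, 40, 28),
  stringsAsFactors = FALSE
)

# Return the other medicines with the same composition, cheapest first.
find_alternatives <- function(db, medicine) {
  comp <- db$composition[db$name == medicine]
  if (length(comp) == 0) stop("medicine not found: ", medicine)
  alt <- db[db$composition == comp[1] & db$name != medicine, ]
  alt[order(alt$price), ]
}

print(find_alternatives(medicines, "MedA"))   # MedB (25), then MedD (28)
```

In a full solution, the data frame might instead come from a database query, for example through the DBI and RSQLite packages. `find_alternatives()` could then be called from a GUI. Both are assumptions about tooling rather than requirements of the exercise.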
Q8. Create a database, songs, that contains the fields {song_name, mood,
online_link_play_song}. Create an application that takes the user's mood as input
and outputs the list of songs corresponding to that mood. The user can listen to
any song from the list via the given online link.
Mini project using a data set of your choice from the Open Data Portal (https://data.gov.in/) for the
following exercises
References
1. Cotton, R. Learning R: A step-by-step function guide to data analysis (1st ed.). O'Reilly
Media Inc.
Additional Resources:
2. Gardener, M. (2017). Beginning R: The statistical programming language. Wiley.
3. Lawrence, M., & Verzani, J. (2016). Programming graphical user interfaces in R. CRC
Press. (ebook)
Web Resources
https://jrnold.github.io/r4ds-exercise-solutions/index.html
https://www.r-project.org/
https://cran.r-project.org/
Tentative weekly teaching plan is as follows:
Week Content
1 R interpreter, introduction to the major R data structures: vectors, matrices, arrays, lists and data frames
3 User-defined functions
8 Exploring and summarizing data using statistical methods: mean, median, mode
10 Data visualization using scatter plots, line graphs, histograms, bar charts and box plots
11 Designing GUI
Assessment Methods
Written tests, assignments, quizzes, presentations as announced by the instructor in the class
Keywords
R data structures, flow control, packages, functions