Chapter No. 1 "Starting Bioinformatics with R"
In this package, you will find:
A biography of the author of the book
A preview chapter from the book, Chapter No. 1 "Starting Bioinformatics with R"
A synopsis of the book's content
Information on where to buy this book
About the Author
Paurush Praveen Sinha has been working with R for the past seven years. An engineer by training, he got into the world of bioinformatics and R when he started working as a research assistant at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Germany. Later, during his doctorate, he developed and applied various machine learning approaches, making extensive use of R to analyze biological data and draw inferences from it. Besides R, he has experience in various other programming languages, including Java, C, and MATLAB. While working with R, he contributed to several existing R packages and is working on the release of some new packages that focus on machine learning and bioinformatics. In late 2013, he joined the Microsoft Research-University of Trento COSBI in Italy as a researcher. He uses R as the backend engine for developing various utilities and machine learning methods to address problems in bioinformatics.

Successful work is a fruitful culmination of efforts by many people. I would like to hereby express my sincere gratitude to everyone who has played a role in making this effort a successful one. First and foremost, I wish to thank David Chiu and Chris Beeley for reviewing the book. Their feedback, in terms of criticism and comments, was significant in bringing improvements to the book and its content. I sincerely thank Kevin Colaco and Ruchita Bhansali at Packt Publishing for their effort as editors. Their cooperation was instrumental in bringing out the book. I appreciate and acknowledge Binny K. Babu and the rest of the team at Packt Publishing, who have been very professional, understanding, and helpful throughout the project. Finally, I would like to thank my parents, brother, and sister for their encouragement and appreciation and the pride they take in my work, despite not being sure of what I'm doing. I thank them all.

I dedicate the work to Yashi, Jayita, and Ahaan.
For More Information: www.packtpub.com/bioinformatics-with-r-cookbook/book
Bioinformatics with R Cookbook
In recent years, there have been significant advances in genomics and molecular biology techniques, giving rise to a data boom in the field. Interpreting this huge volume of data in a systematic manner is a challenging task and requires the development of new computational tools, thus bringing an exciting, new perspective to areas such as statistical data analysis, data mining, and machine learning. R, which has long been a favorite tool of statisticians, has become a widely used software tool in the bioinformatics community. This is mainly due to its flexibility, its data handling and modeling capabilities, and most importantly, it being free of cost.

R is a free and robust statistical programming environment. It is a powerful tool for statistics, statistical programming, and visualization, and it is prominently used for statistical analysis. It evolved from S, developed by John Chambers at Bell Labs, which is the birthplace of many programming languages, including C. Ross Ihaka and Robert Gentleman developed R in the early 1990s.

Around the same time, bioinformatics was emerging as a scientific discipline because of the advent of technological innovations such as sequencing, high-throughput screening, and microarrays that revolutionized biology. These techniques could generate the entire genomic sequence of organisms, microarrays could measure thousands of mRNAs, and so on. All this brought a paradigm shift in biology from a small data discipline to a big data discipline, which continues to date. The challenges posed by this surge in data initially compelled researchers to adopt whatever tools were at their disposal. At that time, R was in its early days and was popular among statisticians. However, driven by this need and by the competence of R, during the late 90s (and the following decades) it started gaining popularity in the field of computational biology and bioinformatics.
The R environment consists of a base program that provides basic programming functionalities. These functionalities can be extended with smaller specialized program modules called packages or libraries. This modular structure empowers R to unify most data analysis tasks in one program. Furthermore, as it is a command-line environment, the prerequisite programming skill is minimal; nevertheless, it requires some programming experience.

This book presents various data analysis operations for bioinformatics and computational biology using R. With this book in hand, we will solve many interesting problems related to the analysis of biological data coming from different experiments. In almost every chapter, we have interesting visualizations that can be used to present the results. Now, let's look at how the book is organized.
What This Book Covers
Chapter 1, Starting Bioinformatics with R, marks the beginning of the book with some groundwork in R. The major topics include package installation, data handling, and manipulations. The chapter is further extended with some recipes for a literature search, which is usually the first step in any (especially biomedical) research.

Chapter 2, Introduction to Bioconductor, presents some recipes to solve basic bioinformatics problems, especially the ones related to metadata in biology, with the packages available in Bioconductor. The chapter solves the issues related to ID conversions and functional enrichment of genes and proteins.

Chapter 3, Sequence Analysis with R, mainly deals with the sequence data in terms of characters. The recipes cover the retrieval of sequence data, sequence alignment, and pattern search in the sequences.

Chapter 4, Protein Structure Analysis with R, illustrates how to work with proteins at sequential and structural levels. Here, we cover important aspects and methods of protein bioinformatics, such as sequence and structure analysis. The recipes include protein sequence analysis, domain annotations, protein structural property analysis, and so on.

Chapter 5, Analyzing Microarray Data with R, starts with recipes to read and load the microarray data, followed by its preprocessing, filtering, mining, and functional enrichment. Finally, we introduce a co-expression network as a way to map relations among genes in this chapter.

Chapter 6, Analyzing GWAS Data, talks about analyzing the GWAS data in order to make biological inferences. The chapter also covers multiple association analyses as well as CNV data.

Chapter 7, Analyzing Mass Spectrometry Data, deals with various aspects of analyzing the mass spectrometry data. Issues related to reading different data formats, followed by analysis and quantifications, have been included in this chapter.
Chapter 8, Analyzing NGS Data, illustrates the analysis of various types of next-generation sequencing data. The recipes in this chapter deal with NGS data processing, RNAseq, ChipSeq, and methylation data.

Chapter 9, Machine Learning in Bioinformatics, discusses recipes related to machine learning in bioinformatics. We address clustering, classification, and Bayesian learning in this chapter to draw inferences from biological data.

Appendix A, Useful Operators and Functions in R, contains some useful general functions in R to perform various generic and non-generic operations.

Appendix B, Useful R Packages, contains a list and description of some interesting libraries that contain utilities for different types of analysis and visualizations.
1
Starting Bioinformatics with R
In this chapter, we will cover the following recipes:
Getting started and installing libraries
Reading and writing data
Filtering and subsetting data
Basic statistical operations on data
Generating probability distributions
Performing statistical tests on data
Visualizing data
Working with PubMed in R
Retrieving data from BioMart

Introduction
Recent developments in molecular biology, such as high throughput array technology or sequencing technology, are leading to an exponential increase in the volume of data that is being generated. Bioinformatics aims to get an insight into biological functioning and the organization of a living system riding on this data. The enormous data generated needs robust statistical handling, which in turn requires a sound computational statistics tool and environment. R provides just that kind of environment. It is a free tool with a large community and leverages the analysis of data via its huge package libraries that support various analysis operations.
Before we start dealing with bioinformatics, this chapter lays the groundwork for upcoming chapters. We first make sure that you know how to install R, followed by a few sections on the basics of R that will refresh the knowledge of R programming that we assume you already have. This part of the book will mostly introduce you to certain functions in R that will be useful in the upcoming chapters, without getting into the technical details. The latter part of the chapter (the last two recipes) will introduce bioinformatics with respect to literature searching and data retrieval in the biomedical arena. Here, we will also discuss the technical details of the R programs used.

Getting started and installing libraries
Libraries in R are packages that have functions written to serve specific purposes; these include reading specific file formats, as in the case of a microarray data file, or fetching data from certain databases, for example, GenBank (a sequence database). You must have these libraries installed in the system as well as loaded in the R session in order to be able to use them. They can be downloaded and installed from a specific repository or directly from a local path. Two of the most popular repositories of R packages are the Comprehensive R Archive Network (CRAN) and Bioconductor. CRAN maintains and hosts identical, up-to-date versions of code and documentation for R on its mirror sites. We can use the install.packages function to install a package from CRAN, which has many mirror locations. Bioconductor is another repository of R packages and associated tools, with a focus on the analysis of high-throughput data. A detailed description on how to work with Bioconductor (http://www.bioconductor.org) is covered in the next chapter.

This recipe aims to explain the steps involved in installing packages/libraries from these repositories as well as from local files.
Getting ready
To get started, the prerequisites are as follows:
You need an R application installed on your computer. For more details on the R program and its installation, visit http://cran.r-project.org.
You need an Internet connection to install packages/libraries from web repositories such as CRAN and Bioconductor.
How to do it...
The initialization of R depends on the operating system you are using. On Windows and Mac OS platforms, just clicking on the program starts an R session, like any other application for these systems. However, for Linux, R can be started by typing R into the terminal (for all Linux distributions, namely, Ubuntu, SUSE, Debian, and Red Hat). Note that calling R via its terminal or command line is also possible in Windows and Mac systems.

This book will mostly use Linux as the operating system; nevertheless, the differences will be explained whenever required. The same commands can be used for all the platforms, but the Linux-based R lacks the default graphical user interface (GUI) of R. At this point, it is worth mentioning some of the code editors and integrated development environments (IDEs) that can be used to work with R. Some popular IDEs for R include RStudio (http://www.rstudio.com) and the Eclipse IDE (http://www.eclipse.org) with the StatET package. To learn more about the StatET package, visit http://www.walware.de/goto/statet. Some commonly used code editors are Emacs, Kate, Notepad++, and so on. The R GUI in Windows and Mac has its own code editor that meets all the requirements.

Windows and Mac OS GUIs make installing packages pretty straightforward. Just follow these steps:
1. From the Packages menu in the toolbar, select Install package(s)....
2. If this is the first time that you are installing a package during this session, R will ask you to pick a mirror. Choosing the geographically nearest mirror usually gives a faster download.
3. Click on the name of the package that you want to install and then on the OK button. R downloads and installs the selected packages.
By default, R fetches packages from CRAN. However, you can change this if necessary just by choosing Select repositories... from the Packages menu.
You need to change or switch the default repository when the desired package is available in a different repository. Remember that a change in the repository is different from a change in the mirror; a mirror is the same repository at a different location.
The following screenshot shows how to set up a repository for a package installation in the R GUI for Windows:
4. Install an R package in one of the following ways:
From the R console, install it with the following simple command:
> install.packages("package_name")
From a local directory, install it by setting the repository to NULL as follows:
> install.packages("path/to/mypackage.tar.gz", repos = NULL, type="source")
Another way to install a package from source in Unix (Linux) is from the shell, without entering R itself. This can be achieved by entering the following command in the shell terminal:
R CMD INSTALL path/to/mypackage.tar.gz
5. To check the installed libraries/packages in R, type the following command:
> library()
6. To quit an R session, type q() at the R prompt. The session will ask whether you want to save the session as a workspace image, not save it, or cancel the quit command. Accordingly, you need to type y, n, or c. On Windows or Mac OS, you can directly close the R program like any other application.
> q()
Save workspace image [y/n/c]: n

Downloading the example code
You can download the example code files for all Packt books that you have purchased from your account at http://www.packtpub.com. If you purchased this book from elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works...
An R session can run as a GUI on a Windows or Mac OS platform (as shown in the following screenshot). In the case of Linux, the R session starts in the same terminal. Nevertheless, you can run R within the terminal in Windows as well as Mac OS:
The R GUI in Mac OS showing the command window (right), editor (top left), and plot window (bottom left)
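As a minimal sketch of the installation and loading steps above, the following helper installs a package from CRAN only when it is missing and then attaches it. The name ensure_package is hypothetical (it is not part of R or of this book), and the mirror URL is an assumption; any CRAN mirror can be substituted.

```r
# Hypothetical helper (our own name): install a CRAN package only if it
# is not yet available, then attach it to the session.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE instead of raising an error
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos, dependencies = TRUE)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  # TRUE if the package is now attached to the session
  invisible(pkg %in% .packages())
}

# "stats" ships with every R installation, so nothing is downloaded here
ensure_package("stats")
```

Checking with requireNamespace first avoids reinstalling a package on every run, which is convenient at the top of analysis scripts.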
The install.packages command asks the user to choose a mirror (usually the nearest) for the repository. It also checks for the dependencies required for the package being installed, provided we set the dependencies argument to TRUE. Then, it downloads the binaries (Windows and Mac OS) for the package (and the dependencies, if required). This is followed by its installation. The function also checks the compatibility of the package with R, as on occasions, the library cannot be loaded or installed due to an incorrect version or missing dependencies. In such cases, the installed packages are removed again. Installing from the source is required in cases where you have to compile binaries for your own machine, for example, to match your R version. The availability of binaries for the package makes installation easier for novice users. The filenames of the package binaries have a .tgz/.zip extension. The value of repos can be set to any remote source address for a specific remote source. On Windows, however, the function is also available through a GUI that graphically and interactively shows the list of binary versions of the packages available for your R version. Nevertheless, the command-line installation is also functional on the Windows version of R.

There's more...
A few libraries are loaded by default when an R session starts. To load a library in R, run the following command:
> library(package_name)
Loading a package makes all the exported functions of this specific package available in the R session. The default packages in the session can be viewed using the following getOption command:
> getOption("defaultPackages")
The currently loaded libraries in a session can be seen with the following command:
> print(.packages())
An alternative for this is sessionInfo(), which provides version details as well.
All the installed packages can be displayed by running the library function as follows:
> library()
Besides all this, R has a comprehensive built-in help system.
You can get help from R in a number of ways. The Windows and Mac OS platforms offer help as a separate HTML page (as shown in the following screenshot) and Linux offers similar help text in the running terminal. The following is a list of options that can be used to seek help in R:
> help.start()
> help(sum) # Accesses the help file for the function sum
> ?sum # Shortcut for help(sum)
> example(sum) # Demonstrates the function with an example
> help.search("sum") # Searches the help files for the string "sum"
Each of the help functions listed above provides help in its own way. The help.start command is the general command used to start the hypertext version of the R documentation. All the help files related to a package can be checked with the following command:
> help(package="package_name")
The following screenshot shows an HTML help page for the sum function in R:

Reading and writing data
Before we start with analyzing any data, we must load it into our R workspace. This can be done by loading an external R object (typical file extensions are .rda or .RData, but it is not limited to these extensions), an internal R object that ships with a package, or a TXT, CSV, or Excel file. This recipe explains the methods that can be used to read data from a table or .csv file into an R session and to write similar files from it.
Getting ready
We will use the iris dataset for this recipe, which is available with the R base packages. The dataset contains quantified features of the morphological variation of three related species of Iris flowers.

How to do it...
Perform the following steps to read and write data in R:
1. Load internal R data (already available with a package or base R) using the following data function:
> data(iris)
2. To learn more about the iris data, check the help page in R as follows:
> ?iris
3. Load external R data (conventionally saved as .rda or .RData, but not limited to this) with the following load function:
> load(file="mydata.RData")
4. To save a data object, say, D, you can use the save function as follows:
> save(D, file="myData.RData")
5. To read tabular data from a delimited text file or a .csv file, use read.table or read.csv, respectively, as follows:
> mydata <- read.table("file.dat", header = TRUE, sep="\t", row.names = 1)
> mydata <- read.csv("mydata.csv")
6. It is also possible to read an Excel file in R. You can achieve this with various packages such as xlsx and gdata. The xlsx package requires Java to be installed, while gdata is relatively simple. However, the xlsx package offers more functionalities, such as reading specific sheets in a workbook and support for the newer versions of Excel files. For this example, we will use the xlsx package. Use the read.xlsx function to read an Excel file as follows:
> install.packages("xlsx", dependencies=TRUE)
> library(xlsx)
> mydata <- read.xlsx("mydata.xls", sheetIndex = 1)
7. To write these data frames or table objects into a CSV or table file, use the write.csv or write.table function as follows:
> write.table(x, file = "myexcel.xls", append = FALSE, quote = TRUE, sep = " ")
> write.csv(x, file = "mydata.csv")

How it works...
The read.csv and write.csv commands look for the filename in the current working directory if a complete path has not been specified and, based on the separator (the sep argument), import the data frames (or export them in the case of write commands). Note that write.csv always uses a comma as the separator, so sep cannot be changed there. To find out the current working directory, use the getwd() command. In order to change it to your desired directory, use the setwd function as follows:
> setwd("path/to desired/directory")
The second argument, header, indicates whether or not the first row is a set of labels by taking the Boolean values TRUE or FALSE. The read.table function may fail on incomplete tables (rows with an unequal number of fields) because its fill argument is FALSE by default. To overcome such issues, set the fill argument to TRUE (read.csv already does this by default). To learn more about optional arguments, take a look at the help page of the read.table function. Both functions (read.table and read.csv) can use the headers (usually the first row) as column names and specify certain column numbers as row names.

There's more...
To get further information about the loaded dataset, use the class function to get the type of the dataset (object class). Objects in R can be of numerous types; covering them is beyond the scope of the book, and it is expected that the reader is acquainted with these terms. Here, in the case of the iris data, the type is a data frame with 150 rows and five columns (type the dim command with iris as the argument). A data frame is like a matrix but can accommodate columns of different types, such as character, numeric, and factor.
You can take a look at the first or last few rows using the head or tail functions, respectively (six rows by default), as follows:
> class(iris)
> dim(iris)
> head(iris)
> tail(iris)
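The reading, writing, and inspection commands of this recipe can be tried together in a short round trip. The following is a minimal sketch that uses only base R; the temporary file and the object name iris_copy are our own choices for illustration.

```r
# Write the built-in iris data to a temporary CSV file and read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(iris, file = csv_file, row.names = FALSE)

# stringsAsFactors = TRUE turns the Species column back into a factor
# (since R 4.0, character columns are no longer converted by default)
iris_copy <- read.csv(csv_file, header = TRUE, stringsAsFactors = TRUE)

class(iris_copy)    # object class of the re-imported data
dim(iris_copy)      # number of rows and columns
head(iris_copy, 3)  # first three rows

unlink(csv_file)    # remove the temporary file
```

Setting row.names = FALSE keeps write.csv from adding an extra column of row numbers, so the file has the same five columns as the original data frame.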
The WriteXLS package allows us to write an object into an Excel file. For the x data object, this works as follows:
> install.packages("WriteXLS")
> library(WriteXLS)
> WriteXLS(x, ExcelFileName = "R.xls")
The package also allows us to write a list of data frames into the different sheets of an Excel file. The WriteXLS function uses Perl in the background to carry out tasks. The SheetNames argument can be set within the function to name the worksheets that the data is written to.

The save function in R is a standard way to save an object. However, the saveRDS function offers an advantage as it doesn't save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into an object whose name differs from the one it had when it was originally serialized. Let's take a look at the following example:
> saveRDS(myObj, "myObj.rds")
> myObj2 <- readRDS("myObj.rds")
> ls()
[1] "myObj" "myObj2"
Another package, data.table, can be used to read data at a faster speed, which makes it especially suited for larger data. To know more about the package, visit its CRAN page at http://cran.r-project.org/web/packages/data.table/index.html.
The foreign package (http://cran.r-project.org/web/packages/foreign/index.html) is available to read/write data for other programs such as SPSS and SAS.

Filtering and subsetting data
The data that we read in our previous recipes exists in R as data frames. Data frames are the primary structures of tabular data in R. By a tabular structure, we mean the row-column format. The data we store in the columns of a data frame can be of various types, such as numeric or factor. In this recipe, we will talk about some simple operations on data to extract parts of these data frames, add a new chunk, or filter a part that satisfies certain conditions.
Getting ready
The following items are needed for this recipe:
A data frame loaded in the R session to be modified or filtered (in our case, the iris data)
Another set of data to be added to item 1 or a set of filters to be applied to item 1

How to do it...
Perform the following steps to filter and create a subset from a data frame:
1. Load the iris data as explained in the earlier recipe.
2. To extract the names of the species and the corresponding sepal dimensions (length and width), first take a look at the structure of the data as follows:
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
3. To extract the relevant data to the myiris object, use the data.frame function that creates a data frame with the defined columns as follows:
> myiris <- data.frame(Sepal.Length = iris$Sepal.Length, Sepal.Width = iris$Sepal.Width, Species = iris$Species)
4. Alternatively, extract the relevant columns by their index numbers (however, this style of subsetting should be avoided):
> myiris <- iris[,c(1,2,5)]
5. Instead of the two previous methods, you can also use the removal approach to extract the data as follows:
> myiris <- iris[,-c(3,4)]
6. You can add to the data by adding a new column with cbind or a new row through rbind (the rnorm function generates a random sample from a normal distribution and will be discussed in detail in a later recipe):
> Stalk.Length <- c(rnorm(30,1,0.1), rnorm(30,1.3,0.1), rnorm(30,1.5,0.1), rnorm(30,1.8,0.1), rnorm(30,2,0.1))
> myiris <- cbind(iris, Stalk.Length)
7. Alternatively, you can do it in one step as follows:
> myiris$Stalk.Length <- c(rnorm(30,1,0.1), rnorm(30,1.3,0.1), rnorm(30,1.5,0.1), rnorm(30,1.8,0.1), rnorm(30,2,0.1))
8. Check the new data frame using the following commands:
> dim(myiris)
[1] 150 6
> colnames(myiris) # get column names for the data frame myiris
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" "Stalk.Length"
9. Use rbind as depicted:
> newdat <- data.frame(Sepal.Length=10.1, Sepal.Width=0.5, Petal.Length=2.5, Petal.Width=0.9, Species="myspecies")
> myiris <- rbind(iris, newdat)
> dim(myiris)
[1] 151 5
> myiris[151,]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
151         10.1         0.5          2.5         0.9 myspecies
10. Extract the part of the data frame that meets certain conditions in one of the following ways:
With the subset function:
> mynew.iris <- subset(myiris, Sepal.Length == 10.1)
Alternatively, with a logical condition on the row index:
> mynew.iris <- myiris[myiris$Sepal.Length == 10.1, ]
> mynew.iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
151         10.1         0.5          2.5         0.9 myspecies
> mynew.iris <- subset(iris, Species == "setosa")
11. Check the first row of the extracted data as follows:
> mynew.iris[1,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
You can use any comparison operator and even combine more than one condition with logical operators such as & (AND), | (OR), and ! (NOT), if required.
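Combining conditions with these logical operators could look like the following minimal sketch. It uses the built-in iris data rather than myiris so that it runs on its own; the threshold values and object names are arbitrary choices for illustration.

```r
# AND: setosa flowers whose sepals are longer than 5.5
setosa_long <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5, ]

# OR: very short or very long sepals; subset() lets us use the
# column names directly inside the condition
extreme_sepal <- subset(iris, Sepal.Length < 4.5 | Sepal.Length > 7.5)

# NOT: every row except the setosa flowers
not_setosa <- iris[!(iris$Species == "setosa"), ]

nrow(setosa_long)
nrow(extreme_sepal)
nrow(not_setosa)
```

Note that & and | work element by element on whole columns, which is what row filtering needs; && and || are meant for single logical values, for example, inside if statements.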
How it works...
These functions use R indexing with named columns (the $ sign) or index numbers. The $ sign placed after the data followed by the column name specifies the data in that column. The R indexing system for data frames is very simple, just like other scripting languages, and is represented as [rows, columns]. You can specify several indices for rows and columns using the c function, as done in the preceding steps. A minus sign on the indices for rows/columns removes these parts of the data. The rbind function used earlier combines the data along the rows (row-wise), whereas cbind does the same along the columns (column-wise).

There's more...
Another way to select part of the data is using the %in% operator with the data frame, as follows:
> mylength <- c(4,5,6,7,7.2)
> mynew.iris <- myiris[myiris[,1] %in% mylength,]
This selects all the rows from the data that meet the defined condition. The condition here means that the value in column 1 of myiris is the same as (matching) any value in the mylength vector. The extracted rows are then assigned to a new object, mynew.iris.

Basic statistical operations on data
R, being a statistical programming environment, has a number of built-in functionalities to perform statistics on data. Nevertheless, some specific functionalities are either available in packages or can easily be written. This section will introduce some basic built-in and useful in-package options.

Getting ready
The only prerequisite for this recipe is the dataset that you want to work with. We use our iris data in most of the recipes in this chapter.

How to do it...
The steps to perform basic statistical operations on the data are as follows:
1. R facilitates the computing of various kinds of statistical parameters, such as mean and standard deviation, with simple functions.
This can be applied on individual vectors or on an entire data frame as follows:
> summary(iris) # Shows a summary for each column for table data
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
> mean(iris[,1])
[1] 5.843333
> sd(iris[,1])
[1] 0.8280661
2. The cor function allows for the computing of the correlation between two vectors as follows:
> cor(iris[,1], iris[,2])
[1] -0.1175698
> cor(iris[,1], iris[,3])
[1] 0.8717538
3. To get the covariance for the data matrix, simply use the cov function as follows:
> Cov.mat <- cov(iris[,1:4])
> Cov.mat
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
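The correlation and covariance values in the preceding steps are closely related: the Pearson correlation is the covariance divided by the product of the two standard deviations. The following minimal sketch checks this relation on the iris data; the object names num, cov_mat, cor_mat, and manual are our own.

```r
num <- iris[, 1:4]       # numeric columns only
cov_mat <- cov(num)      # covariance matrix, as in step 3
cor_mat <- cor(num)      # correlation matrix (Pearson by default)

# Pearson correlation = covariance / (sd_x * sd_y)
manual <- cov_mat["Sepal.Length", "Petal.Length"] /
  (sd(num$Sepal.Length) * sd(num$Petal.Length))

# Both comparisons should report TRUE
all.equal(manual, cor_mat["Sepal.Length", "Petal.Length"])

# cov2cor() applies the same scaling to the whole covariance matrix
all.equal(cov2cor(cov_mat), cor_mat)
```

The comparison uses all.equal rather than ==, because floating-point results computed along different paths can differ in the last digits.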
How it works...
Most of the functions we saw in this recipe are part of basic R or are generic functions. The summary function in R provides summaries of the input depending on the class of the input. The function invokes various functions depending on the class of the input object. The returned value also depends on the input object. For instance, if the input is a vector that consists of numeric data, it will present the mean, median, minimum, maximum, and quartiles for the data, whereas if the input is tabular (numeric) data, it will give similar computations for each column. We will use the summary function in upcoming chapters for different types of input objects.

The functions accept the data as input and simply compute all these statistical scores on them, displaying them as a vector, list, or data frame depending on the input and the function. For most of these functions, we have the possibility of using the na.rm argument. This empowers the user to work with missing data. If we have missing values (called NA in R) in our data, we can set the na.rm argument to TRUE, and the computation will be done only based on non-NA values. Take a look at the following chunk for an example:
> a <- c(1:4, NA, 6)
> mean(a) # returns NA
[1] NA
> mean(a, na.rm=TRUE)
[1] 3.2
We see here that in the case of missing values, the mean function returns NA by default as it does not know how to handle the missing value. Setting na.rm to TRUE computes the mean of the five non-missing numbers (1, 2, 3, 4, and 6) instead of all six values (1, 2, 3, 4, NA, and 6), returning 3.2.

To compute the correlation between the sepal length and sepal width in our iris data, we simply use the cor function with the two columns (sepal length and sepal width) as the arguments for the function. We can compute different types of correlation coefficients, namely Pearson, Spearman, and Kendall, by specifying the appropriate value for the method argument of the function.
For more details, refer to the help (?cor) function.
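The behaviour described above can be checked with a short sketch on the built-in iris data. The variable names (s, a, r_pearson, r_spearman) are ours and chosen only for illustration.

```r
# summary() dispatches on the class of its input
s <- summary(iris$Sepal.Length)   # six statistics for a numeric vector
summary(iris[, 1:4])              # the same statistics, column by column

# na.rm drops missing values before the computation
a <- c(1:4, NA, 6)
m_default <- mean(a)              # NA, because one value is unknown
m_narm    <- mean(a, na.rm = TRUE)

# correlation between sepal length and sepal width, two methods
r_pearson  <- cor(iris$Sepal.Length, iris$Sepal.Width, method = "pearson")
r_spearman <- cor(iris$Sepal.Length, iris$Sepal.Width, method = "spearman")
```

Comparing r_pearson and r_spearman is a quick way to see how much the choice of method matters for a given pair of columns.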
Generating probability distributions

Before we talk about anything in this section, try ?Distributions in your R terminal (console). You will see that a help page listing different probability distributions opens up. These are part of the standard R installation, so you can generate all of them without the aid of additional packages. Some interesting distributions are listed in the table in this recipe. Other distributions, for example, the multivariate normal distribution (MVN), can be generated with external packages (the MASS package for MVN). Most of these functions follow the same syntax, so once you get used to one, the others can be used in a similar fashion. In addition, you can access different aspects of a distribution just by changing the prefix of the function name.

How to do it

The following are the steps to generate probability distributions:

1. To generate 100 instances of normally distributed data with a mean equal to 1 and standard deviation equal to 0.1, use the following command:
> n.data <- rnorm(n=100, mean=1, sd=0.1)
2. Plot the histogram to observe the distribution as follows:
> hist(n.data)
3. Check the density of the distribution and observe the shape by typing the following command:
> plot(density(n.data))
Do you see the bell shape in this plot?
4. To identify the corresponding parameters for the other prefixes, use the help file, for example:
> ?pnorm

The following table lists the functions that deal with various statistical distributions in R (R base packages only):

Distribution        Probability  Quantile  Density   Random
Beta                pbeta        qbeta     dbeta     rbeta
Binomial            pbinom       qbinom    dbinom    rbinom
Cauchy              pcauchy      qcauchy   dcauchy   rcauchy
Chi-Square          pchisq       qchisq    dchisq    rchisq
Exponential         pexp         qexp      dexp      rexp
F                   pf           qf        df        rf
Gamma               pgamma       qgamma    dgamma    rgamma
Geometric           pgeom        qgeom     dgeom     rgeom
Hypergeometric      phyper       qhyper    dhyper    rhyper
Logistic            plogis       qlogis    dlogis    rlogis
Log Normal          plnorm       qlnorm    dlnorm    rlnorm
Negative Binomial   pnbinom      qnbinom   dnbinom   rnbinom
Normal              pnorm        qnorm     dnorm     rnorm
Poisson             ppois        qpois     dpois     rpois
Student t           pt           qt        dt        rt
Studentized Range   ptukey       qtukey    -         -
Uniform             punif        qunif     dunif     runif

How it works

The rnorm function has three arguments: n (the number of instances you want to generate), the desired mean of the distribution, and the desired standard deviation (sd) of the distribution. The command thus generates a vector of length n drawn from a normal distribution with the mean and standard deviation you specified; the sample mean and standard deviation will be close to, but not exactly equal to, these values. If you look closely at the functions described in the table, you can figure out a pattern. The prefixes p, q, d, and r are added to the distribution name to give the cumulative probability, quantiles, density, and random samples, respectively (for the Studentized range distribution, base R provides only ptukey and qtukey).

There's more

To learn more about statistical distributions, visit the Wolfram page at http://www.wolframalpha.com/examples/StatisticalDistributions.html.

Performing statistical tests on data

Statistical tests are performed to assess the significance of results in research or application and assist in making quantitative decisions. The idea is to determine whether there is enough evidence to reject a conjecture about the results. In-built functions in R allow several such tests on data. The choice of test depends on the data and the question being asked. To illustrate, when we need to compare a group against a hypothetical value and our measurements follow the Gaussian distribution, we can use a one-sample t-test. However, if we are comparing two paired groups (with measurements in both following the Gaussian distribution), we can use a paired t-test.
R has built-in functions to carry out such tests, and in this recipe, we will try out some of these.
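As a minimal sketch of the two cases just described, the following uses the built-in sleep dataset (the same data used in the steps of this recipe). The variable names g1, g2, one, paired, t_stat, and p_hand are ours. The sketch also recomputes the one-sample p-value by hand with pt, the distribution function of Student's t from the previous recipe, to show what the test does internally.

```r
data(sleep)
g1 <- sleep$extra[sleep$group == "1"]   # extra hours of sleep under drug 1
g2 <- sleep$extra[sleep$group == "2"]   # extra hours of sleep under drug 2

# One-sample t-test: does the mean increase under drug 1 differ from 0?
one <- t.test(g1, mu = 0)

# The same two-sided p-value by hand, using the t distribution function pt()
t_stat <- (mean(g1) - 0) / (sd(g1) / sqrt(length(g1)))
p_hand <- 2 * pt(-abs(t_stat), df = length(g1) - 1)

# Paired t-test: the same 10 patients received both drugs
paired <- t.test(g1, g2, paired = TRUE)
paired$p.value
```

Passing the two groups as separate vectors, rather than as a formula, keeps the paired call valid across R versions.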
How to do it

Use the following steps to perform a statistical test on your data:

1. To do a t-test, load your data (in our case, it is the sleep data) as follows:
> data(sleep)
2. To perform the two-sided, unpaired t-test on the first column (the increase in sleep) grouped by the second column (the drug group), type the following commands:
> test <- t.test(sleep[,1]~sleep[,2])
> test

	Welch Two Sample t-test

data:  sleep[, 1] by sleep[, 2]
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33
3. Create a contingency table as follows:
> cont <- matrix(c(14, 33, 7, 3), ncol = 2)
> cont
     [,1] [,2]
[1,]   14    7
[2,]   33    3
4. Label the table so that it represents two types of cars, namely, sedan and convertible (columns), and two genders, male and female (rows), with each cell holding the number of people who own that type of car. Thus, you have the following output:
> colnames(cont) <- c("Sedan", "Convertible")
> rownames(cont) <- c("Male", "Female")
> cont
       Sedan Convertible
Male      14           7
Female    33           3
5. To test whether car type and gender are associated, carry out a Chi-square test based on this contingency table as follows:
> test <- chisq.test(as.table(cont))
> test

	Pearson's Chi-squared test with Yates' continuity correction

data:  as.table(cont)
X-squared = 4.1324, df = 1, p-value = 0.04207
6. For a Wilcoxon signed-rank test, first create the vectors containing the observations to be tested, x and y, as shown in the following commands:
> x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
> y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
7. Then run the Wilcoxon signed-rank test as follows:
> test <- wilcox.test(x, y, paired = TRUE, alternative = "greater")
8. To look at the contents of the test object, check its structure and look at the specific values of its components as follows:
> str(test)
> test$p.value

How it works

The t-test (in our case, a two-sample t-test) assesses whether the observed difference between the group means could have arisen by chance. Here, we use the sleep data that already exists in R. The sleep data shows the effect of two drugs on 10 patients, measured as the increase in the hours of sleep compared to control. The result is a list whose elements include the p-value, confidence interval, method, and mean estimates.

The Chi-square test investigates whether the distributions of categorical variables differ from one another. It is commonly used to compare observed data with the data that we would expect to obtain according to a specific hypothesis. In this recipe, we considered the scenario that one gender has a different preference for a car type, which is supported at a p-value cutoff of 0.05 (p = 0.04207). We can also check the expected values for the Chi-square test with chisq.test(as.table(cont))$expected.

The Wilcoxon test is used to compare two related samples, or repeated measurements on a single sample, to assess whether their population mean ranks differ. It can be used to compare the results of two methods. Let x and y be the performance results of two methods; our alternative hypothesis is that x is shifted to the right of y (greater). The p-value returned by the test helps us decide whether to reject the null hypothesis.
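To make the two explanations above concrete, here is a minimal sketch that recomputes by hand the expected counts of the Chi-square test and the signed-rank statistic V of the Wilcoxon test, using the data from this recipe. The names expected_hand, chi, d, v_hand, and wil are ours.

```r
cont <- matrix(c(14, 33, 7, 3), ncol = 2,
               dimnames = list(c("Male", "Female"), c("Sedan", "Convertible")))

# Expected counts under independence: row total * column total / grand total
expected_hand <- outer(rowSums(cont), colSums(cont)) / sum(cont)

# chisq.test() warns here because one expected count is below 5
chi <- suppressWarnings(chisq.test(cont))

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

# V is the sum of the ranks of |x - y| over the pairs where x - y is positive
d <- x - y
v_hand <- sum(rank(abs(d))[d > 0])

wil <- wilcox.test(x, y, paired = TRUE, alternative = "greater")
```

Here chi$expected should agree with expected_hand, and wil$statistic with v_hand. The small expected count for male convertible owners is also a reminder that the Chi-square approximation is rough for this table; fisher.test(cont) is the usual exact alternative.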
There's more

There are certain other tests, such as the permutation test and the Kolmogorov-Smirnov test, that can be done with R using different functions for appropriate datasets. A few more tests will be discussed in later chapters. To learn more about statistical tests, you can refer to a brief tutorial at http://udel.edu/~mcdonald/statbigchart.html.

Visualizing data

Data is more intuitive to comprehend if visualized in a graphical format rather than in the form of a table, matrix, text, or numbers. For example, if we want to visualize how the sepal length in the Iris flower varies with the petal length, we can plot them along the y and x axes, respectively, and visualize the trend or even the correlation (scatter plot). In this recipe, we look at some common ways of visualizing data in R with the basic plotting functions of R base graphics. These plotting functions can be customized in many ways, but discussing all of them is beyond the scope of this book. To get to know more about all the possible arguments, refer to the corresponding help files.

Getting ready

The only item we need ready here is the dataset (in this recipe, we use the iris dataset).

How to do it

The following are the steps for some basic graph visualizations in R:

1. To create a scatter plot, start with your iris dataset. What you want to see is the variation of the sepal length with the petal length. You need a plot of the sepal length (column 1) along the y axis and the petal length (column 3) along the x axis, as shown in the following commands:
> sl <- iris[,1]
> pl <- iris[,3]
> plot(x=pl, y=sl, xlab="Petal length", ylab="Sepal length", col="black", main="Variation of sepal length with petal length")
Or alternatively, we can use the following command:
> with(iris, plot(x=Petal.Length, y=Sepal.Length))
2. To create a boxplot for the data, use the boxplot function in the following way:
> boxplot(Sepal.Length~Species, data=iris, ylab="sepal length", xlab="Species", main="Sepal length for different species")
3. Plotting a line diagram is the same as plotting a scatter plot; just add the type argument and set it to 'l'. Here, however, we use a different, self-created dataset to depict this as follows:
> genex <- c(rnorm(100, 1, 0.1), rnorm(100, 2, 0.1), rnorm(50, 3, 0.1))
> plot(x=genex, type='l', main="line diagram")

Plotting in R: (A) Scatter plot, (B) Boxplot, (C) Line diagram, and (D) Histogram
4. Histograms can be used to visualize the density of the data and the frequency of every bin/category. Plotting histograms in R is pretty simple; use the following commands:
> x <- rnorm(1000, 3, 0.02)
> hist(x)

How it works

We first extract the relevant columns from the original dataset by their column numbers (sl and pl, respectively, for the sepal length and petal length). The plot function then draws a scatter plot with the sepal length along the y axis and the petal length along the x axis. The axis labels can be assigned with the xlab and ylab arguments, respectively, and the plot can be given a title with the main argument. The plot (in the A section of the previous screenshot) shows that the two variables follow a more or less positive correlation.

Scatter plots are less useful when we want to look for a trend, that is, how a value evolves along its index, which may represent time in a dynamic process; for example, the expression of a gene over time or over the concentration of a drug. A line diagram is a better way to show this. Here, we first generate a set of 250 artificial values; their indices are the values on the x scale. For these values, we assume a normal distribution, as we saw in the previous section. This is then plotted (as shown in the C section of the previous screenshot). It is possible to add more lines to an existing plot using the lines function as follows:
> lines(density(x), col="red")

A boxplot can be an interesting visualization if we want to compare two or more categories or groups in terms of a numerical attribute. Boxplots depict groups of numerical data through their quartiles. To illustrate, let us consider the iris data again. We have the name of the species in this data (column 5). Now, we want to compare the sepal lengths of these species with each other, such as which one has the longest sepal and how the sepal length varies within and between species. The data table has all this information, but it is not readily observable. The first argument of the boxplot function is a formula that specifies what to plot and what to plot it against. It is given in terms of the column names of the data frame, which is the second argument. The other arguments are the same as for the other plot functions. The resulting plot (as shown in the B section of the previous screenshot) shows three boxes along the x axis for the three species in our data. Each of these boxes depicts the range, quartiles, and median of the corresponding sepal lengths.

The histogram (the D section of the previous screenshot) describes the distribution of data. As we see, the data is normally distributed with a mean of 3; therefore, the plot displays a bell shape with a peak at around 3. To see the bell shape as a smooth curve, try plot(density(x)).
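The four panels discussed above can be drawn in a single figure. The following is a minimal sketch under our own assumptions: the output goes to a temporary PDF file (the name out is ours), the seed is fixed only to make the random data reproducible, and the histogram is drawn on the density scale (freq = FALSE) so that the curve added by lines(density(x)) is on the same scale.

```r
set.seed(1)  # reproducible random data (our choice of seed)
genex <- c(rnorm(100, 1, 0.1), rnorm(100, 2, 0.1), rnorm(50, 3, 0.1))
x <- rnorm(1000, 3, 0.02)

out <- tempfile(fileext = ".pdf")   # temporary output file
pdf(out, width = 8, height = 8)
par(mfrow = c(2, 2))                # 2 x 2 grid of panels

plot(x = iris$Petal.Length, y = iris$Sepal.Length,
     xlab = "Petal length", ylab = "Sepal length", main = "(A) Scatter plot")
boxplot(Sepal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal length", main = "(B) Boxplot")
plot(genex, type = "l", xlab = "Index", main = "(C) Line diagram")
hist(x, freq = FALSE, main = "(D) Histogram")
lines(density(x), col = "red")      # density curve over the histogram

invisible(dev.off())                # close the device and write the file
```

In an interactive session, the pdf and dev.off calls can be dropped to draw directly on the screen device.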
There's more

You can use the plot function for an entire data frame (try doing this for the iris dataset, plot(iris)). You will observe a matrix of pairwise plots. Besides this, there are many other packages available in R for high-quality plots, such as ggplot2 and plotrix. They will be discussed in the next chapters when needed. This section was just an attempt to introduce the simple plot functions in R.

Working with PubMed in R

Research begins with a survey of the related works in the field. This can be achieved by looking into the available literature. PubMed is a service for searching the literature. The service is provided by the NCBI-Entrez databases (shown in the following screenshot) and is available at https://www.ncbi.nlm.nih.gov. R provides an interface to look into the various aspects of the literature via PubMed. This section provides a protocol to handle this sort of interface. This recipe covers the searching, storing, mining, and quantitative meta-analysis of the literature within the R program itself, without the need to visit the PubMed page every time, thus aiding in analysis automation. The following screenshot shows the PubMed web page for queries and retrieval:
Getting ready

It's time to get practical with what we learned so far. For all the sessions throughout this book, we will use the Linux terminal. Let's start at the point where research begins, by getting into the bibliographic data. The RISmed package, written and maintained by Kovalchik, facilitates the analysis of NCBI database content. The following are the requirements to work with PubMed in R:

- An Internet connection to access the PubMed system
- The RISmed package installed and loaded in the R session; this can be done easily with the following chunk of code:
> install.packages("RISmed")
> library(RISmed)

To look into the various functionalities, you can use the following help function of R:
> help(package="RISmed")

How to do it

The following steps illustrate how to search and retrieve literature from PubMed using R:

1. Load the default data in RISmed. The default data available with the package is for myeloma. Start by loading this data as follows:
> data(myeloma)
2. Now, find the myeloma object that was loaded in your R workspace with the ls() command as follows (note that you might see some other objects as well besides myeloma):
> ls()
[1] "myeloma"
3. To see the contents of the myeloma object, use the following command:
> str(myeloma)
4. Take a look at each element of the data using the following specific functions of RISmed:
> AbstractText(myeloma)
> Author(myeloma)
> ArticleTitle(myeloma)
> Title(myeloma)
> PMID(myeloma)
5. Create your customized query. What we had till now for RISmed was based on a precompiled data object of the package. It will be interesting to see how we can create a similar object (for instance, cancer) with a query of our choice. The function that facilitates the data retrieval and creation of a RISmed class is as follows:
> cancer <- EUtilsSummary("cancer[ti]", type="esearch", db="pubmed")
> class(cancer)

How it works

Before we go deep into the functioning of the package, it's important to know about E-utilities. RISmed uses E-utilities to retrieve data from the Entrez system. E-utilities provide an interface to the Entrez query and database system, which covers a range of data managed by the NCBI, including bibliographic, sequence, and structural data; in this chapter, our focus is on bibliographic data. Its functioning is very simple: a query is sent through a URL to the Entrez system, which returns the results for the query. This enables the use of any programming language, such as Perl, Python, or C++, to fetch the XML response and interpret it. There are several tools that act as a part of E-utilities (for details, visit http://www.ncbi.nlm.nih.gov/books/NBK25497/). For us, however, Efetch is the interesting one. It is a data retrieval utility that takes unique IDs (for example, PMIDs) for a database (in our case, PubMed) as input and returns the corresponding data records from the Entrez databases in the requested format. The RISmed package uses it to retrieve literature data.

The first argument of the EUtilsSummary function is the query term, with the field of the query term in square brackets (in our case, [ti] is for title; [au] is for author, and so on). The second argument is the E-utility type and the third one refers to the database (in our case, PubMed).
The myeloma object of the RISmed class has information about the query that was made to fetch the object from PubMed, the PMIDs of the search hits, and the details of the articles, such as the year of publication, authors, titles, abstract and journal details, and associated MeSH terms. All the commands of the package used with myeloma return a list whose length is equal to the number of hits in the data object (in our case, the myeloma object). Note that the Title function returns the title of the journal or publisher and not the title of the article, which can be seen with ArticleTitle. Now, let's take a look at the structure of the cancer object that we created:

> str(cancer) # As in August 2013
Formal class 'EUtilsSummary' [package "RISmed"] with 5 slots
  ..@ count           : num 575447
  ..@ retmax          : num 1000
  ..@ retstart        : num 0
  ..@ id              : chr [1:1000] "23901442" "23901427" "23901357" "23901352" ...
  ..@ querytranslation: chr "cancer[ti]"
> cancer@id[1:10]
 [1] "23905205" "23905156" "23905117" "23905066" "23905042" "23905012"
 [7] "23904955" "23904921" "23904880" "23904859"

The cancer object consists of five slots: count, retmax, retstart, id, and querytranslation. These are slots of the object and are accessed with the @ operator. Therefore, in case we need to get the PMIDs of the retrieval, we can do so by getting the values of the id slot of the cancer object with the following code:
> cancer@id

One important point that should be noted is that this query retrieved only the first 1000 hits out of 575,447 (1000 is the default value for retmax). Furthermore, one should follow the usage policies to avoid overloading the E-utilities server. For further details, read the policy at http://www.ncbi.nlm.nih.gov/books/NBK25497/.

Now, we are left with the creation of a RISmed object (the same as the myeloma object). To get this, we use the EUtilsGet function as follows:
> cancer.ris <- EUtilsGet(cancer, type="efetch", db="pubmed")
> class(cancer.ris)

This new object, cancer.ris, can be used to acquire further details as explained earlier. For more operations on PubMed, refer to the help file of the RISmed package as described earlier. A drawback of the RISmed package is that, in some cases, due to the incorrect parsing of text, the values returned could be inaccurate.

To get to know more about the RISmed package, refer to the CRAN package home page at http://cran.r-project.org/web/packages/RISmed/RISmed.pdf.

Some interesting applications of the RISmed package are available on the R Psychologist page at http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/.
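The URL mechanism behind E-utilities, and the way retstart and retmax page through a large result set, can be sketched without any network access. This is an illustration under stated assumptions, not how RISmed is implemented: the helper esearch_url is hypothetical, and the base URL comes from the NCBI E-utilities documentation rather than from this recipe. The sketch only builds the query strings; it does not send them.

```r
# Base URL of the NCBI E-utilities (assumption: taken from the NCBI docs)
base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hypothetical helper: build an ESearch URL for one page of results
esearch_url <- function(term, db = "pubmed", retstart = 0, retmax = 1000) {
  paste0(base, "esearch.fcgi?db=", db,
         "&term=", utils::URLencode(term, reserved = TRUE),
         "&retstart=", sprintf("%d", as.integer(retstart)),
         "&retmax=", sprintf("%d", as.integer(retmax)))
}

# URLs for the first three pages of 1000 hits each for the query "cancer[ti]"
starts <- seq(0, by = 1000, length.out = 3)
urls <- vapply(starts,
               function(s) esearch_url("cancer[ti]", retstart = s),
               character(1))
urls[1]
```

Each URL asks for a different window of the hit list, which is how more than the default 1000 records can be collected in several requests while respecting the usage policy mentioned above.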
Retrieving data from BioMart

So far, we discussed bibliographic data retrieval from PubMed. R also allows the handling of other kinds of data. Here, we introduce another R package called biomaRt, developed by Durinck and co-workers. It provides an interface to a collection of databases implementing the BioMart suite (http://www.biomart.org). It enables the retrieval of data from a range of BioMart databases, such as Ensembl (genes and genomes), Uniprot (information on proteins), HGNC (gene nomenclature), Gramene (plant functional genomics), and Wormbase (information on C. elegans and other nematodes).

Getting ready

The following are the prerequisites:

- Install and load the biomaRt library
- Create the data ID or names you want to retrieve, such as the ID or the chromosomal location (usually gene names; we use BRCA1 for a demo in this recipe)

How to do it

Retrieving the gene ID from HGNC involves the following steps, where we first set the mart (data source), followed by the retrieval of genes from this mart:

1. Before you start using biomaRt, install the package and load it into the R session. The package can directly be installed from Bioconductor with the following script. We discuss more about Bioconductor in the next chapter; for the time being, take a look at the following installation:
> source("http://bioconductor.org/biocLite.R")
> biocLite("biomaRt")
> library(biomaRt)
On current Bioconductor releases, biocLite has been replaced by the BiocManager package; the equivalent installation is install.packages("BiocManager") followed by BiocManager::install("biomaRt").
2. Select the appropriate mart for retrieval by defining the right database for your query. Here, you will look for human ensembl genes; hence, run the useMart function as follows:
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
3. Now, you will get the list of genes from the ensembl data, which you opted for earlier, as follows:
> my_results <- getBM(attributes = c("hgnc_symbol"), mart = mart)
4. You can then sample a few genes, say 50, from your retrieved genes as follows:
> N <- 50
> mysample <- sample(my_results$hgnc_symbol,N)
> head(mysample)
[1] "AHSG" "PLXNA4" "SYT12" "COX6CP8" "RFK" "POLR2LP"
5. The biomaRt package can also be used to retrieve sequences from the databases for a gene, for example, "BRCA1", as shown in the following commands:
> seq <- getSequence(id="BRCA1", type="hgnc_symbol", seqType="peptide", mart = mart)
> show(seq)
6. To retrieve a sequence relative to a chromosomal position, a range upstream or downstream of a site can be specified as well, as follows:
> seq2 <- getSequence(id="ENST00000520540", type='ensembl_transcript_id', seqType='gene_flank', upstream = 30, mart = mart)
7. To see the sequence, use the show function as follows:
> show(seq2)
                      gene_flank ensembl_transcript_id
1 AATGAAAAGAGGTCTGCCCGAGCGTGCGAC       ENST00000520540

How it works

The source function in R reads a file of R code (here, the biocLite.R script) and evaluates it in the current session, which makes the functions defined in that file available; do not confuse it with loading a package. Furthermore, during installation, R might ask you to update the Bioconductor libraries that were already installed. You can choose which of these libraries to update as per your requirements.

The biomaRt package works with the BioMart databases described earlier. We first select the mart of interest (that is why we have to select our mart for a specific query). Then, this mart is used to run the query on the BioMart database. The results are then returned and formatted as the return value. The package thus provides an interface to the BioMart system, through which it can search the databases and fetch a variety of biological data. Although the data can be downloaded in a conventional way from its respective database, biomaRt can be used to bring the element of automation into your workflow for bulk and batch processing.
There's more

The biomaRt package can also be used to convert one type of gene ID to other types of IDs. Here, we illustrate the conversion of RefSeq IDs to gene symbols with the following chunk of code:
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> geneList <- read.csv("mylist.csv")
> results <- getBM(attributes = c("refseq_mrna", "hgnc_symbol"), filters = "refseq_mrna", values = geneList[,2], mart = mart)
> results
   refseq_mrna hgnc_symbol
1    NM_000546        TP53
2 NM_001271003       TFPI2
3    NM_004402        DFFB
4    NM_005359       SMAD4
5    NM_018198     DNAJC11
6    NM_023018        NADK
7    NM_033467       MMEL1
8    NM_178545      TMEM52

Although biomaRt can convert the IDs of biological entities for most of our work, in this book we also use some other packages that are handier; these will be illustrated in the next chapter.

See also

- The BioMart home page at http://www.biomart.org to know more about BioMart
- The BioMart: driving a paradigm change in biological data management article by Arek Kasprzyk (http://database.oxfordjournals.org/content/2011/bar049.full)
- The BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis article by Durinck and others (http://bioinformatics.oxfordjournals.org/content/21/16/3439.long), which discusses the details of the biomaRt package
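Once a mapping table such as results has been retrieved, annotating your own list of IDs is a plain lookup in base R. The following is a minimal offline sketch that rebuilds three rows of the table shown in the There's more section by hand, so no BioMart connection is needed; the query vector my_ids and the ID NM_999999 are hypothetical and serve only to show what happens to an ID that has no match.

```r
# Three rows copied from the results table above (built by hand, no network)
results <- data.frame(
  refseq_mrna = c("NM_000546", "NM_004402", "NM_005359"),
  hgnc_symbol = c("TP53", "DFFB", "SMAD4"),
  stringsAsFactors = FALSE
)

# Hypothetical query list; NM_999999 is a made-up ID with no match
my_ids <- c("NM_005359", "NM_999999", "NM_000546")

# match() gives the row of each query ID in the table, or NA if absent
symbols <- results$hgnc_symbol[match(my_ids, results$refseq_mrna)]
symbols
```

Using match keeps the output in the order of the query list and marks unmapped IDs with NA, which is usually what is wanted when annotating a result table.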
Where to buy this book

You can buy Bioinformatics with R Cookbook from the Packt Publishing website: http://www.packtpub.com/bioinformatics-with-r-cookbook/book. Free shipping to the US, UK, Europe and selected Asian countries. For more information, please read our shipping policy. Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and most internet book retailers.
www.PacktPub.com