Advanced R

Uploaded by

22at1a3228
Copyright
© All Rights Reserved

ADVANCED R (III AI)

UNIT-1

1. Which of the following is the correct command to install a package in R?

a) install.package()
b) install.packages()
c) install_lib()
d) library_install()

2. How can you set the working directory in R?

a) set_working_dir()
b) setwd()
c) setwd("path/to/directory")
d) change_dir()

3. Which operator is the conventional assignment operator in R?

a) ==
b) =
c) <-
d) ===

4. What does the comparison operator != in R signify?

a) Assignment
b) Not equal
c) Greater than
d) Equal

5. Which operator performs element-wise logical AND in R?

a) &
b) &&
c) ||
d) |
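
For reference, a minimal sketch of how the two AND operators behave. The vectors `x` and `y` are illustrative assumptions, not part of the question set.

```r
# Two logical vectors of equal length (illustrative data)
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# `&` compares position by position and returns a vector
elementwise <- x & y

# `&&` expects single values and returns a single TRUE/FALSE,
# which is why it is the usual choice inside if() conditions
scalar <- x[1] && y[1]

print(elementwise)
print(scalar)
```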

6. How can you create a vector in R?

a) vector()
b) c()
c) list()
d) data.frame()

7. Which function is used to sort a vector in R?

a) arrange()
b) sort()
c) order()
d) rank()

8. To generate a sequence from 1 to 10, which function would you use?

a) repeat(1:10)
b) seq(1, 10)
c) generate_seq(1, 10)
d) list(1:10)

9. What is the correct command to create a factor variable?

a) new.factor()
b) levels()
c) factor()
d) categorize()

10. Which function generates random numbers in R?

a) rand()
b) random()
c) runif()
d) get_random()

11. How do you check the class of an object in R?

a) check_class()
b) class()
c) type()
d) typeof()

12. Which of the following is not a basic data type in R?

a) Numeric
b) Array
c) Character
d) Logical

13. What does the lapply() function do in R?

a) Applies a function to a single element
b) Applies a function to each element of a list
c) Applies a function to a vector
d) Combines elements of a list

14. What does the sapply() function return?

a) Always a list
b) Always a vector
c) A vector or matrix
d) A data frame

15. What is the primary difference between lapply() and sapply()?

a) sapply() attempts to simplify the result
b) lapply() simplifies the result
c) They both return the same output
d) lapply() is faster than sapply()

16. Which function can be used to apply a function over subsets of a vector in R?

a) lapply()
b) tapply()
c) apply()
d) sapply()
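
The apply-family functions in questions 13 to 16 can be compared side by side. This is a small sketch on made-up data; `scores`, `values`, and `groups` are assumed names.

```r
# A named list of numeric vectors (illustrative data)
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# lapply() always returns a list, one element per input element
as_list <- lapply(scores, mean)

# sapply() runs the same computation but tries to simplify the
# result, here to a named numeric vector
as_vec <- sapply(scores, mean)

# tapply() applies a function to subsets of a vector defined by a
# grouping vector
values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
by_group <- tapply(values, groups, sum)

print(as_list)
print(as_vec)
print(by_group)
```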

17. What does coercion mean in R?

a) Automatic conversion of one data type to another
b) Manual conversion of variables
c) Creation of complex numbers
d) Variable renaming

18. Which of the following is NOT a valid arithmetic operator in R?

a) +
b) %%
c) :=
d) ^

19. What does the function is.na() check in R?

a) Whether a variable is numeric
b) Whether a variable has missing (NA) values
c) Whether a variable is assigned
d) Whether a variable is a factor

20. Which command would you use to repeat the number 5, ten times?

a) repeat(5, times = 10)
b) replicate(5, 10)
c) rep(5, times = 10)
d) seq(5, 10)
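
The options in question 20 that are real base R calls give quite different results, which a quick sketch makes visible. The variable names are assumptions for illustration.

```r
# rep() repeats its first argument `times` times
repeated <- rep(5, times = 10)

# replicate(n, expr) evaluates expr n times, so this yields
# five copies of 10, not ten copies of 5
swapped <- replicate(5, 10)

# seq(from, to) counts from 5 up to 10
ranged <- seq(5, 10)

print(repeated)
print(swapped)
print(ranged)
```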

21. How do you create a user-defined function in R?

a) function() = {...}
b) function_name <- function() {...}
c) def function() {...}
d) create_function() = {...}

22. What is the result of 2^3 in R?

a) 5
b) 6
c) 8
d) 9

23. How do you access the second element in a vector x?

a) x[2, ]
b) x[[2]]
c) x[2]
d) x{2}

24. What does typeof() return in R?

a) The internal type or storage mode of an object
b) The length of a vector
c) The dimensions of a matrix
d) The class of an object
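
The distinction between class() and typeof(), and the coercion described in question 17, can be seen in a short sketch. The objects `f` and `mixed` are illustrative assumptions.

```r
# A factor has class "factor" but is stored internally as integers
f <- factor(c("low", "high", "low"))
f_class <- class(f)
f_type <- typeof(f)

# Combining different types in c() coerces every element to the
# most flexible type present, here character
mixed <- c(1, "a", TRUE)

print(f_class)
print(f_type)
print(mixed)
```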

25. How do you concatenate two vectors v1 and v2 in R?

a) merge(v1, v2)
b) sum(v1, v2)
c) c(v1, v2)
d) combine(v1, v2)

UNIT-2

1. Which function is used to read a CSV file into R?

a) import_csv()
b) readData()
c) read.csv()
d) load.csv()

2. How do you write a data frame to a CSV file in R?

a) write.csv()
b) export.csv()
c) save.csv()
d) write.csv(data_frame, "file.csv")

3. What is the function to check the structure of a data frame?

a) info()
b) str()
c) summary()
d) head()

4. Which function is used to view the first few rows of a data frame?

a) tail()
b) preview()
c) head()
d) glimpse()

5. How can you export a data frame to an Excel file in R?

a) save_xlsx()
b) write_xl()
c) write.xlsx()
d) save.excel()

6. Which function can be used to view the summary statistics of a data frame?

a) stats()
b) summary()
c) describe()
d) info()

7. What does the na.omit() function do in R?

a) Removes rows with missing (NA) values
b) Replaces NA values with zero
c) Fills NA values with the mean
d) Lists rows with NA values

8. How can you replace NA values with a specific value in R?

a) replace_na()
b) x[is.na(x)] <- value
c) replace()
d) fill_na()
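
A minimal base R sketch of the two ways of handling missing values covered in questions 7 and 8. The data frame `df` and its columns are made up for illustration.

```r
# A small data frame with missing scores (illustrative data)
df <- data.frame(id = 1:4, score = c(10, NA, 30, NA))

# na.omit() drops every row that contains at least one NA
complete <- na.omit(df)

# Logical indexing with is.na() replaces the NA values in place;
# the fill value 0 is an arbitrary choice for this sketch
filled <- df
filled$score[is.na(filled$score)] <- 0

print(complete)
print(filled)
```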

9. Which function helps to remove duplicates from a data frame?

a) unique()
b) remove_dups()
c) distinct()
d) deduplicate()

10. How can you add a new column to an existing data frame in R?

a) add_column()
b) data_frame$new_column <- values
c) insert_column()
d) append_column()

11. To add a new row to a data frame, which function can be used?

a) rowAdd()
b) rbind(data_frame, new_row)
c) append()
d) add_row()

12. How can you merge two data frames by a common column?

a) join()
b) stack()
c) merge(df1, df2, by = "column_name")
d) combine()
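
As a sketch of merging on a common column, the example below joins two made-up data frames, `left` and `right`, on an assumed key column `id`.

```r
# Two data frames sharing the key column "id" (illustrative data)
left <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), score = c(20, 30, 40))

# By default merge() keeps only keys present in both inputs
joined <- merge(left, right, by = "id")

# all = TRUE keeps unmatched rows from both sides, filling with NA
joined_all <- merge(left, right, by = "id", all = TRUE)

print(joined)
print(joined_all)
```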

13. Which function reshapes data from wide format to long format?

a) spread()
b) transpose()
c) gather()
d) pivot_longer()

14. Which function reshapes data from long format to wide format?

a) gather()
b) spread()
c) wide()
d) stack()

15. What is the function used to remove a column from a data frame?

a) del_column()
b) remove()
c) df$column_name <- NULL
d) clear_column()

16. Which function can be used to reorder the levels of a factor variable?

a) rearrange_levels()
b) change_factor()
c) factor() with levels argument
d) level_order()

17. How can you combine two data frames vertically in R?

a) combine()
b) rbind(df1, df2)
c) cbind(df1, df2)
d) merge()

18. How can you combine two data frames horizontally in R?

a) rbind()
b) merge()
c) cbind(df1, df2)
d) append()

19. Which function is used to filter rows in a data frame based on conditions?

a) filter_rows()
b) subset(data_frame, condition)
c) filter_data()
d) cond_subset()

20. How do you detect missing values in a data frame?

a) find_na()
b) is.na()
c) na.detect()
d) check_na()

21. To change the names of the columns in a data frame, which function can be used?

a) colnames(data_frame) <- c("new_name1", "new_name2")
b) rename_columns()
c) name_change()
d) change_colnames()

22. Which function converts a data frame into a tibble (a modern data frame)?

a) convert()
b) as_tibble()
c) to_df()
d) to_tibble()

23. What does duplicated() do in R?

a) Removes duplicates
b) Returns a logical vector indicating duplicate rows
c) Finds unique values
d) Orders a data frame
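
A short sketch of how duplicated() relates to unique() (question 9), using a made-up data frame `df`.

```r
# A data frame whose third row repeats the second (illustrative data)
df <- data.frame(id = c(1, 2, 2, 3), value = c("a", "b", "b", "c"))

# duplicated() flags each row that repeats an earlier row
dup_flags <- duplicated(df)

# Dropping the flagged rows gives the same rows as unique()
deduped <- df[!dup_flags, ]
same <- unique(df)

print(dup_flags)
print(deduped)
```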

24. What is the function to concatenate strings in R?

a) str_join()
b) string_concat()
c) paste()
d) glue()

25. How can you rename a specific column in a data frame?

a) rename_column()
b) names(data_frame)[names(data_frame) == "old_name"] <- "new_name"
c) change_name()
d) col_rename()

UNIT-3

1. Which function is used to get the current date and time in R?

a) Sys.time()
b) current_time()
c) now()
d) get_time()

2. What is the class of an object created by Sys.Date()?

a) POSIXlt
b) POSIXct
c) Date
d) time

3. Which of the following represents the POSIXct class in R?

a) A number representing seconds since 1970-01-01
b) A list of date-time components
c) Date stored as string
d) A factor of date-time values

4. What is the difference between POSIXct and POSIXlt?

a) POSIXct is a string, POSIXlt is numeric
b) POSIXct stores dates as the number of seconds since 1970-01-01, POSIXlt stores dates as a list of components
c) POSIXlt is faster than POSIXct
d) POSIXct stores only date, POSIXlt stores both date and time
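
The two date-time classes in questions 3 and 4 can be inspected directly. This is a minimal sketch; the timestamp and the UTC time zone are arbitrary assumptions.

```r
# One instant, stored two ways (illustrative timestamp, UTC assumed)
ct <- as.POSIXct("2024-09-26 12:30:00", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

# POSIXct is a single number: seconds since 1970-01-01 00:00:00 UTC
seconds_since_epoch <- as.numeric(ct)

# POSIXlt is a list of components; year counts from 1900 and
# month counts from 0, hence the offsets below
components <- c(
  year = lt$year + 1900,
  month = lt$mon + 1,
  day = lt$mday,
  hour = lt$hour,
  minute = lt$min
)

print(seconds_since_epoch)
print(components)
```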

5. Which function is used to parse a date string into POSIXct in R?

a) parseDate()
b) as.POSIXct()
c) strptime()
d) dateParse()

6. How would you convert a string "2024-09-26" to a date in R?

a) as.Date("2024-09-26")
b) date("2024-09-26")
c) dateParse("2024-09-26")
d) parseDate("2024-09-26")

7. What is the purpose of the format argument in date parsing functions like as.POSIXct()?

a) To convert date to string
b) To specify the input format of the date string
c) To add time zones
d) To remove seconds from the date

8. Which function extracts the year from a date object in R?

a) extract_year()
b) format(date, "%Y")
c) year()
d) get_year()

9. How do you extract the weekday from a date in R?

a) weekdays(date)
b) format(date, "%W")
c) get_weekday()
d) day_of_week(date)

10. Which function is used to calculate the difference between two dates in R?

a) subtract_dates()
b) time_diff()
c) difftime()
d) date_diff()

11. What is the class of the object returned by the difftime() function?

a) difftime
b) numeric
c) POSIXct
d) Date

12. Which of the following is used to add days to a date in R?

a) add_days()
b) date + days
c) date.plus()
d) append_date()

13. How would you create a sequence of dates starting from "2024-01-01" with an increment of 1 day for 10 days?

a) date_seq("2024-01-01", days = 10)
b) seq(as.Date("2024-01-01"), by = "days", length.out = 10)
c) seq_days("2024-01-01", 10)
d) date_sequence("2024-01-01", "day", 10)
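
The base R date operations from questions 6 to 13 fit together as in the sketch below. The start date and the 30-day offset are illustrative assumptions.

```r
# Parse an ISO-formatted string into a Date
start <- as.Date("2024-01-01")

# Extract parts of the date as text
year_text <- format(start, "%Y")
weekday_name <- weekdays(start)   # name depends on the session locale

# Adding a number to a Date adds that many days
later <- start + 30

# difftime() returns an object of class "difftime"
gap <- difftime(later, start, units = "days")

# Ten consecutive days starting at `start`
days <- seq(start, by = "day", length.out = 10)

print(year_text)
print(gap)
print(days)
```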

14. Which function is used to truncate a date-time object down to the hour?

a) trunc()
b) cut_time()
c) round_time()
d) truncate_hour()

15. How do you get the current time zone in R?

a) get_tz()
b) timezone()
c) Sys.timezone()
d) timeZone()

16. Which of the following is NOT a valid time zone argument in R?

a) "UTC"
b) "America/New_York"
c) "PST/EST"
d) "Europe/London"

17. How would you set the time zone of a date-time object in R?

a) time_zone(object, "timezone")
b) attr(date, "tzone") <- "timezone"
c) set_tz(date, "timezone")
d) tz.set(date, "timezone")

18. How do you calculate the time interval between two dates in R?

a) difftime(end_date, start_date)
b) interval(start_date, end_date)
c) time_difference(start_date, end_date)
d) diff_time(start_date, end_date)

19. With the default units = "auto", which unit does difftime() use for two dates that are several days apart?

a) days
b) hours
c) seconds
d) weeks

20. How can you generate a sequence of times every hour for 5 hours starting from a specific date-time in R?

a) time_sequence(start, "hour", 5)
b) seq.POSIXt(from = start_time, by = "hour", length.out = 5)
c) seq(from = start_time, by = "hour", length.out = 5)
d) seq_time(start_time, hours = 5)

21. How would you check if two time intervals overlap in R?

a) time_overlap(interval1, interval2)
b) lubridate::int_overlaps(interval1, interval2)
c) overlaps(interval1, interval2)
d) check_overlap(interval1, interval2)

22. Which package is commonly used in R for working with time intervals?

a) lubridate
b) chron
c) zoo
d) timeSeries

23. How do you create a time interval between two dates in R?

a) create_interval()
b) lubridate::interval(start_date, end_date)
c) set_interval(start_date, end_date)
d) time_interval(start_date, end_date)

24. What does Sys.Date() return in R?

a) The current date without time
b) The current time without the date
c) The current date and time
d) The current year

25. Which function would you use to round a date-time object to the nearest minute in R?

a) cut_time()
b) round.POSIXt()
c) trunc.POSIXt()
d) round_time()
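
Truncating (question 14) and rounding (question 25) a date-time give different results, as this sketch suggests. The timestamp and the UTC time zone are assumptions chosen so the two results differ.

```r
# An illustrative timestamp, 40 seconds past 12:45 (UTC assumed)
t <- as.POSIXct("2024-09-26 12:45:40", tz = "UTC")

# trunc() drops everything below the requested unit
to_hour <- trunc(t, "hours")

# round() moves to the nearest unit; the unit is passed
# positionally because the method's argument is named `digits`
to_minute <- round(t, "mins")

print(format(to_hour, "%Y-%m-%d %H:%M:%S"))
print(format(to_minute, "%Y-%m-%d %H:%M:%S"))
```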

UNIT-4

1. What is the purpose of a one-sample t-test?

A) To compare means of two independent samples
B) To test if a sample mean is significantly different from a known value
C) To compare proportions
D) To test if the sample variance is significantly different from a known value

2. In a two-sample t-test, the null hypothesis typically states that:

A) The means of the two groups are not equal
B) The means of the two groups are equal
C) The variances of the two groups are equal
D) The sample sizes of the two groups are equal
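
Both t-tests in questions 1 and 2 are available through base R's t.test(). The sketch below uses simulated data; the sample sizes, means, and seed are arbitrary assumptions.

```r
# Simulated samples (illustrative; the seed makes the run repeatable)
set.seed(1)
a <- rnorm(30, mean = 5)
b <- rnorm(30, mean = 5.5)

# One-sample test: is the mean of `a` different from the value 5?
one <- t.test(a, mu = 5)

# Two-sample test: null hypothesis is that the two means are equal
# (by default this is the Welch test, which does not assume equal
# variances)
two <- t.test(a, b)

print(one$p.value)
print(two$p.value)
```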

3. What is the primary difference between the t-statistic and the z-statistic?

A) The z-statistic is used for small samples only
B) The t-statistic is used when the population variance is known
C) The t-statistic is used for small samples or when the population variance is unknown
D) There is no difference; they are interchangeable

4. A test of proportion is typically used to:

A) Compare means of two groups
B) Compare the observed proportion to a theoretical proportion
C) Test for variance differences
D) None of the above

5. ANOVA is used to:

A) Compare means across multiple groups
B) Determine if at least one group mean is different from others
C) Compare proportions
D) Assess variance in a single group

6. In one-way ANOVA, the null hypothesis states that:

A) All group means are different
B) All group means are equal
C) At least one group mean is different
D) Variances are equal across groups

7. The F-statistic in ANOVA is calculated as:

A) Mean of squares between groups divided by mean of squares within groups
B) Variance between groups divided by variance within groups
C) Sum of squares between groups divided by degrees of freedom
D) All of the above
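
The F-statistic from question 7 can be recomputed by hand from an ANOVA table. This is a sketch on made-up data; `scores` and `group` are assumed names.

```r
# Three groups of three observations each (illustrative data)
scores <- c(5.1, 4.9, 5.3, 6.0, 6.2, 5.8, 7.1, 6.9, 7.3)
group <- factor(rep(c("g1", "g2", "g3"), each = 3))

# Fit a one-way ANOVA and pull out its summary table
fit <- aov(scores ~ group)
tab <- summary(fit)[[1]]

# Row 1 is the between-group term, row 2 the residual (within) term
ms_between <- tab[["Mean Sq"]][1]
ms_within <- tab[["Mean Sq"]][2]
f_manual <- ms_between / ms_within

print(tab)
print(f_manual)
```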

8. In a multiple linear regression model, which of the following is an assumption?

A) All predictors are categorical
B) Errors are not normally distributed
C) There is no multicollinearity
D) There is a linear relationship between predictors and response

9. The purpose of residual analysis in regression is to:

A) Determine the goodness of fit
B) Check the assumptions of linear regression
C) Identify outliers
D) All of the above

10. Heteroskedasticity in regression refers to:

A) Constant variance of residuals
B) Non-normal distribution of residuals
C) Relationship between predictors
D) Non-constant variance of residuals

11. Which of the following tests can be used to detect autocorrelation?

A) Shapiro-Wilk test
B) Durbin-Watson test
C) t-test
D) Chi-squared test

12. What do robust standard errors adjust for?

A) Non-normality of the response variable
B) Multicollinearity among predictors
C) Heteroskedasticity of residuals
D) Sample size issues

13. A significant p-value in regression indicates:

A) Strong correlation between predictors
B) The predictor variable has a significant effect on the response
C) No relationship between variables
D) The model is well fitted

14. To visualize regression results, one might use:

A) Boxplots
B) Histograms
C) Scatter plots with regression lines
D) Heatmaps

15. In simple linear regression, what does R-squared represent?

A) The slope of the regression line
B) The intercept of the regression line
C) The proportion of variance explained by the model
D) The total sum of squares
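
R-squared can be checked against its definition as the proportion of variance explained. The sketch below fits a simple linear regression with lm() on made-up data; `x` and `y` are assumed names.

```r
# A nearly linear relationship (illustrative data)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

fit <- lm(y ~ x)

# Coefficients: intercept and slope
coefs <- coef(fit)

# R-squared as reported by summary()
r2 <- summary(fit)$r.squared

# The same quantity from its definition:
# 1 - residual sum of squares / total sum of squares
res <- residuals(fit)
r2_manual <- 1 - sum(res^2) / sum((y - mean(y))^2)

print(coefs)
print(r2)
```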

16. What is the main purpose of using multiple linear regression?

A) To predict the value of a response variable using multiple predictors
B) To visualize relationships
C) To compare means
D) To test for independence

17. Which of the following is NOT an assumption of linear regression?

A) Linearity
B) Independence of errors
C) Homoscedasticity
D) The response variable is normally distributed

18. The intercept in a regression equation represents:

A) The expected value of the response variable when all predictors are zero
B) The change in response variable per unit change in predictor
C) The average of the response variable
D) None of the above

19. In a two-way ANOVA, the main effects are:

A) The effect of one factor only
B) The combined effect of all factors
C) The effect of each factor considered individually
D) The interaction effect

20. What does the term “interaction” refer to in the context of ANOVA?

A) The main effect of one factor
B) The effect of two factors considered independently
C) The combined effect of two factors on the response variable
D) The total effect of all factors

21. The term "p-value" in hypothesis testing represents:

A) The probability of making a Type II error
B) The probability of making a Type I error
C) The probability of observing data at least as extreme as the observed data, assuming the null hypothesis is true
D) None of the above

22. When is it appropriate to use a paired t-test?

A) When comparing means of independent groups
B) When the sample sizes are unequal
C) When comparing means from the same group at different times
D) When variances are unequal

23. In regression analysis, multicollinearity refers to:

A) Linear relationship between response and predictor
B) Relationship between residuals
C) High correlation among predictor variables
D) None of the above

24. Which of the following best describes the assumption of normality in regression?

A) The residuals should be normally distributed
B) The response variable should be normally distributed
C) The predictors should be normally distributed
D) There is no assumption regarding normality

25. What is the primary output of fitting a linear regression model?

A) Mean values of the predictors
B) Coefficients for each predictor
C) P-values for each predictor
D) Variance of the response variable

UNIT-5

1. What is the main purpose of quantile regression?

A) Minimize the squared differences
B) Predict the mean of the dependent variable
C) Estimate the conditional quantiles
D) Maximize the likelihood of predictions

2. In quantile regression, the loss function minimizes:

A) Mean squared error
B) Absolute deviation
C) Asymmetric absolute loss
D) Log-likelihood
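
The asymmetric absolute loss in question 2 (often called the check or pinball loss) can be written in a few lines of base R. This is a sketch under stated assumptions: `pinball_loss` is a hypothetical helper, not a library function, and the search is restricted to the observed values for simplicity.

```r
# Hypothetical helper: positive residuals are weighted by tau and
# negative residuals by (1 - tau)
pinball_loss <- function(residual, tau) {
  sum(residual * (tau - (residual < 0)))
}

# Illustrative data with one large outlier
y <- c(1, 2, 3, 4, 100)

# Evaluate the loss at tau = 0.5 for each observed value
losses <- sapply(y, function(q) pinball_loss(y - q, tau = 0.5))

# The minimiser at tau = 0.5 is the median, not the mean
best <- y[which.min(losses)]

print(losses)
print(best)
```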

3. Which method is most commonly used to handle outliers in quantile data?

A) Remove outliers
B) Normalize data
C) Robust estimation
D) Standardize data

4. What is the key difference between quantile regression and ordinary least squares regression (OLS)?

A) OLS focuses on the median
B) Quantile regression is parametric
C) Quantile regression estimates conditional quantiles
D) OLS models non-linear relationships

5. Quantile regression can be particularly useful for:

A) Data with linear relationships
B) Skewed data distributions
C) Normally distributed data
D) Time series data

6. What type of diagnostic test is used to assess the fit of a quantile regression model?

A) Shapiro-Wilk test
B) Quantile residual analysis
C) Durbin-Watson test
D) AIC/BIC selection

7. In quantile regression, which of the following quantiles represents the median?

A) 0.25
B) 0.10
C) 0.50
D) 0.75

8. When manipulating quantile data, which method is most effective for visualization?

A) Line charts
B) Histograms
C) Quantile-Quantile (Q-Q) plots
D) Box-and-whisker plots

9. Which of the following is a common technique for treating outliers in quantile regression?

A) Replace with mean
B) Log transformation
C) Winsorization
D) Standardization

10. A key advantage of quantile regression over traditional linear regression is:

A) It’s faster to compute
B) It requires fewer assumptions
C) It is robust to heteroscedasticity
D) It provides only a single prediction

11. In residual analysis for quantile regression, residuals are:

A) Normally distributed
B) Skewed
C) Non-symmetric and centered around zero
D) Uniformly distributed

12. Logit and probit regression models are primarily used for:

A) Continuous data
B) Time series data
C) Binary classification
D) Clustering analysis

13. What is the key difference between Logit and Probit models?

A) Logit models use normal distribution
B) Probit models use logistic distribution
C) Logit models are non-parametric
D) Logit models use a logistic function, while Probit models use a normal CDF

14. Which diagnostic test is commonly applied to check the goodness of fit in a Logit model?

A) Durbin-Watson
B) Hosmer-Lemeshow test
C) ANOVA
D) Akaike Information Criterion

15. What is the purpose of robust estimation in quantile regression?

A) Improve precision in small datasets
B) Handle missing data
C) Reduce the influence of outliers
D) Improve interpretability

16. In a logistic regression model, the odds ratio can be interpreted as:

A) The difference in probabilities
B) The change in odds for a unit change in the predictor
C) The sum of squared differences
D) The likelihood of prediction error

17. In the context of outlier detection in quantile data, what is leverage?

A) A technique for scaling variables
B) A method for improving prediction accuracy
C) A measure of a data point's influence on model estimation
D) A tool for diagnosing model fit

18. In Probit regression, the link function is:

A) Exponential
B) Logistic
C) Linear
D) Cumulative normal distribution
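
Logit and probit models (questions 12 to 18) differ only in the link passed to glm(). The sketch below uses a small made-up binary data set; `x` and `y` are assumed names.

```r
# Illustrative binary outcome that tends to rise with x
x <- c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5)
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)

# Same model formula, two different link functions
logit_fit <- glm(y ~ x, family = binomial(link = "logit"))
probit_fit <- glm(y ~ x, family = binomial(link = "probit"))

# In the logit model, exp(coefficient) is the multiplicative change
# in the odds for a one-unit increase in x
odds_ratio <- exp(coef(logit_fit)[["x"]])

# The inverse links: logistic CDF and standard normal CDF
logistic_at_zero <- plogis(0)
normal_at_zero <- pnorm(0)

print(coef(logit_fit))
print(coef(probit_fit))
print(odds_ratio)
```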

19. When writing quantile data to a file, the most appropriate format is:

A) CSV
B) JSON
C) XML
D) YAML

20. What is the primary benefit of using quantile data visualization techniques like box plots?

A) Shows linear relationships
B) Highlights mean and standard deviation
C) Displays data spread and outliers
D) Estimates conditional probabilities

21. In residual analysis for a Logit model, residuals are typically:

A) Normally distributed
B) Not normally distributed
C) Uniformly distributed
D) Exponentially distributed

22. A key assumption in both Logit and Probit regression models is:

A) Homoscedasticity
B) Linearity between the log-odds and the predictors
C) Normality of residuals
D) Constant variance of error terms

23. In quantile regression, the primary measure of accuracy for outlier treatment is:

A) Root mean square error
B) Sum of squared errors
C) Quantile loss
D) Akaike information criterion

24. Which of the following tools is best suited for manipulating quantile data?

A) PCA
B) Linear regression
C) Quantile transformation
D) Lasso regression

25. When plotting a Q-Q plot for quantile data, what does a deviation from the line suggest?

A) Linear relationship
B) Homoscedasticity
C) Presence of outliers or non-normality
D) Strong correlation
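
The numbers behind a box plot and a Q-Q plot (questions 20 and 25) can be computed without drawing anything. This is a sketch on made-up data; `values` is an assumed name.

```r
# Illustrative data with one obvious outlier
values <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)

# Quartiles of the data
quartiles <- quantile(values, probs = c(0.25, 0.5, 0.75))

# boxplot.stats() returns the statistics a box plot would draw;
# $out holds the points beyond 1.5 times the box length from the box
stats <- boxplot.stats(values)
outliers <- stats$out

# qqnorm() with plot.it = FALSE returns the theoretical (x) and
# sample (y) quantiles that a normal Q-Q plot would display
qq <- qqnorm(values, plot.it = FALSE)

print(quartiles)
print(outliers)
```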
