ASM Using R: 2-Marks Answer Keys
Explanation of terms:
• True Positive (TP): Correctly predicted positive cases.
• True Negative (TN): Correctly predicted negative cases.
• False Positive (FP): Incorrectly predicted positive cases (Type I error).
• False Negative (FN): Incorrectly predicted negative cases (Type II error).
From the confusion matrix, we can calculate various performance metrics:
• Accuracy: Overall correctness of the model.
o Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision: Proportion of positive predictions that are correct.
o Precision = TP / (TP + FP)
• Recall (Sensitivity): Proportion of actual positive cases correctly identified.
o Recall = TP / (TP + FN)
• Specificity: Proportion of actual negative cases correctly identified.
o Specificity = TN / (TN + FP)
• F1-score: Harmonic mean of precision and recall.
o F1-score = 2 * (Precision * Recall) / (Precision + Recall)
By analyzing these metrics, we can assess the model's performance and make informed
decisions about its suitability for a particular application.
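The metrics above can be computed directly in R. The counts below are made up purely for illustration; they are not from any question paper.

```r
# Hypothetical confusion-matrix counts for illustration
TP <- 40; TN <- 45; FP <- 5; FN <- 10

accuracy    <- (TP + TN) / (TP + TN + FP + FN)   # 0.85
precision   <- TP / (TP + FP)                    # ~0.889
recall      <- TP / (TP + FN)                    # 0.8
specificity <- TN / (TN + FP)                    # 0.9
f1          <- 2 * precision * recall / (precision + recall)  # ~0.842
```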
a) Enlist basic statistical functions in R.
Here are some fundamental statistical functions in R:
• mean(x): Calculates the mean (average) of the values in the vector x.
• median(x): Finds the median value of the vector x.
• sd(x): Computes the standard deviation of the values in x.
• var(x): Calculates the variance of the values in x.
• summary(x): Provides a summary of the data in x, including quartiles, mean, median,
min, and max.
• cor(x, y): Computes the correlation between two vectors x and y.
• table(x): Creates a frequency table for categorical data in x.
• hist(x): Plots a histogram of the data in x.
• boxplot(x): Creates a boxplot to visualize the distribution of x.
• t.test(x, y): Performs a t-test to compare means of two groups.
• anova(model): Conducts an analysis of variance (ANOVA) to compare means of
multiple groups.
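A few of these functions applied to a small made-up sample, to show typical usage:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # small illustrative sample

mean(x)     # 5
median(x)   # 4.5
sd(x)       # sample standard deviation
var(x)      # sample variance (sd squared)
summary(x)  # min, quartiles, median, mean, max
table(c("a", "b", "a"))  # frequency counts for categorical data
```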
b) What is the difference between parametric and non-parametric tests?
Parametric tests assume that the data comes from a specific probability distribution (like the
normal distribution) and that certain parameters (like the mean and standard deviation) are
known or can be estimated. Examples include t-tests and ANOVA.
Non-parametric tests make fewer assumptions about the data distribution. They are often
used when the data is not normally distributed or when the sample size is small. Examples
include the Wilcoxon rank-sum test and the Kruskal-Wallis test.
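The contrast can be seen side by side in R. The simulated data below is only a sketch: a parametric test (t.test) and its non-parametric counterpart (wilcox.test) applied to the same two samples.

```r
set.seed(42)
a <- rnorm(20, mean = 5)  # simulated group 1
b <- rnorm(20, mean = 6)  # simulated group 2

t.test(a, b)$p.value       # parametric: compares means, assumes normality
wilcox.test(a, b)$p.value  # non-parametric: based on ranks only
```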
c) Define predictive analytics.
Predictive analytics is a field of data mining that uses statistical models, machine learning
algorithms, and other techniques to predict future outcomes based on historical data. It
helps organizations make informed decisions and anticipate future trends.
d) Explain pbinom() function in R.
The pbinom() function in R calculates the cumulative distribution function (CDF) of the
binomial distribution. It gives the probability of getting at most a certain number of
successes in a given number of trials with a specified probability of success.
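For example, the probability of at most 3 successes in 10 trials with success probability 0.5:

```r
# P(X <= 3) for X ~ Binomial(size = 10, prob = 0.5)
pbinom(3, size = 10, prob = 0.5)  # 0.171875

# equivalent to summing the individual probabilities:
sum(dbinom(0:3, size = 10, prob = 0.5))
```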
e) How do you interpret the p-value in hypothesis testing?
The p-value is the probability of observing a test statistic as extreme or more extreme than
the one calculated from the sample data, assuming the null hypothesis is true.
• If the p-value is less than the significance level (usually 0.05), we reject the null
hypothesis.
• If the p-value is greater than the significance level, we fail to reject the null
hypothesis.
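The decision rule can be demonstrated with a one-sample t-test. The heights below are made up for illustration:

```r
x <- c(168, 172, 169, 171, 170, 173, 167, 174)  # made-up heights (cm)
res <- t.test(x, mu = 170)  # H0: population mean is 170
res$p.value  # reject H0 only if this is below the significance level (0.05)
```

Here the sample mean (170.5) is close to 170, so the p-value is large and we fail to reject the null hypothesis.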
f) Write a function to get a list of all the packages installed in R.
Code snippet
# Returns a character vector of the names of all installed packages
get_installed_packages <- function() {
  installed_packages <- rownames(installed.packages())
  return(installed_packages)
}
g) Write a function to obtain the transpose of a matrix in R.
Code snippet
# t() returns the transpose of a matrix (rows become columns)
transpose_matrix <- function(x) {
  t(x)
}
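A quick check of the function (repeated here so the example is self-contained): transposing a 2 x 3 matrix gives a 3 x 2 matrix.

```r
transpose_matrix <- function(x) t(x)  # as defined above

m <- matrix(1:6, nrow = 2)  # 2 x 3 matrix
dim(transpose_matrix(m))    # 3 2
```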
h) What is the purpose of regression analysis in R?
Regression analysis is a statistical method used to model the relationship between a
dependent variable and one or more independent variables. In R, it helps understand how
changes in the independent variables affect the dependent variable, and it can be used for
prediction and inference.
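A minimal sketch using lm() with the built-in mtcars dataset, regressing fuel economy on car weight:

```r
# Simple linear regression: mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

summary(fit)$coefficients  # estimates, std. errors, t values, p-values
predict(fit, newdata = data.frame(wt = 3))  # predicted mpg at wt = 3 (3000 lb)
```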
a) Define NULL and Alternate hypothesis.
In hypothesis testing, we make claims about a population parameter.
• Null Hypothesis (H₀): This is the default assumption, often a statement of no effect or
no difference.
• Alternative Hypothesis (H₁): This is the claim we want to test, often the opposite of
the null hypothesis.
Example:
• Null Hypothesis (H₀): The mean height of a population is 170 cm.
• Alternative Hypothesis (H₁): The mean height of the population is not 170 cm.
b) Define statistical modeling.
Statistical modeling involves using mathematical and statistical techniques to represent real-
world phenomena. It helps us understand, predict, and make decisions based on data.
Statistical models can be simple or complex, depending on the nature of the data and the
research question.
c) What is adjusted R² in regression analysis?
Adjusted R² is a modified version of the R² statistic that adjusts for the number of predictors
in a regression model. It penalizes the addition of unnecessary predictors that do not meaningfully improve the model's fit: unlike R², adjusted R² increases only when a new predictor improves the model more than would be expected by chance. This makes it a fairer basis for comparing models with different numbers of predictors.
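Both values are reported by summary() on a fitted model; a sketch using the built-in mtcars data with two predictors:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # two predictors
s <- summary(fit)

s$r.squared      # plain R-squared
s$adj.r.squared  # slightly lower: penalized for the number of predictors
```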
d) Explain Unlist() function.
The unlist() function in R is used to convert a list into a vector. It flattens the list by
combining all its elements into a single vector. This is useful when you want to perform
operations on the individual elements of a list as if they were a single vector.
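A short example of the flattening behavior:

```r
lst <- list(a = 1:2, b = 3)
unlist(lst)       # named vector: a1 = 1, a2 = 2, b = 3
sum(unlist(lst))  # 6 -- vector arithmetic now works on the elements
```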
e) Explain aov() function.
The aov() function in R is used to perform analysis of variance (ANOVA), which is a statistical
technique to compare means of multiple groups. It helps determine if there are significant
differences between the means of the groups.
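A sketch of a one-way ANOVA on the built-in mtcars data, testing whether mean mpg differs across cylinder counts:

```r
# One-way ANOVA: mpg grouped by number of cylinders
fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit)  # F statistic and Pr(>F) for the group effect
```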
f) What is logistic regression?
Logistic regression is a statistical method used to model the probability of a binary outcome
(e.g., success or failure, yes or no) based on one or more predictor variables. It is widely
used in fields like healthcare, finance, and marketing.
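In R, logistic regression is fitted with glm() and family = binomial. A sketch using the built-in mtcars data, where am is a binary outcome (0 = automatic, 1 = manual):

```r
# Model P(manual transmission) as a function of car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of a manual transmission at wt = 2.5 (2500 lb)
predict(fit, newdata = data.frame(wt = 2.5), type = "response")
```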
g) Define Predictive analytics.
Predictive analytics is a field of data mining that uses statistical models, machine learning
algorithms, and other techniques to predict future outcomes based on historical data. It
helps organizations make informed decisions and anticipate future trends.
h) How many predictor variables must be used in multiple regression?
By definition, multiple regression uses two or more predictor variables. Beyond that minimum there is no fixed rule: the number depends on the complexity of the model and the research question, and you can include as many predictors as are needed to explain the variation in the dependent variable. However, adding too many predictors can lead to overfitting, so the model's complexity should be balanced against its predictive power.