
AI Regression Questions

The document consists of a series of regression-related questions covering topics such as the distinction between regression and classification, variance of random variables, goodness of fit, correlation coefficients, and the impact of multicollinearity. It also includes practical problems involving ANOVA tables, regression equations, and statistical significance. Overall, it serves as a comprehensive guide for understanding and applying regression analysis concepts.

Uploaded by

Shreeya Rao

1. Distinguish between regression and classification.

2. If f(x) is a valid density function, write an expression for the variance of X.

3. ‘Logistic regression is a type of regression’ – Is this statement true or false? Justify your answer.

4. Define a Random Variable.

5. A straight line is fit to bivariate sample data of size 1000. If the mean of the residuals (e) is found to be 36 and the variance of the residuals is found to be 324, write your observation on the goodness of fit of the straight line.

6. If the correlation coefficient between two variables in a sample of 10 is 0.58, what is your observation on the relationship between the two variables?

7. Define Regression of Y on X.

8. If the covariance of two variables X and Y, from a sample of size 100, is found to be 36, and their variances are 25 and 81 respectively, what is their correlation coefficient?
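
One way to check an answer to question 8 is sketched below. It assumes the usual definition r = cov(X, Y) / (sd(X) * sd(Y)); the function name is illustrative and not part of the original question set.

```python
import math


def correlation_from_cov(cov_xy: float, var_x: float, var_y: float) -> float:
    """Pearson correlation r = cov(X, Y) / (sd(X) * sd(Y))."""
    return cov_xy / math.sqrt(var_x * var_y)


# Values quoted in question 8: covariance 36, variances 25 and 81.
r = correlation_from_cov(36.0, 25.0, 81.0)
print(r)  # 36 / (5 * 9) = 0.8
```

The same function applies when standard deviations are given instead of variances, provided they are squared first.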

9. If a random variable X has a mean of X̅, then what is the mean of the Random Variable Y = X - X̅?

10. A regression equation y=m*x+c is fit to a sample data set. A statistic MSR/MSE is calculated. What is the distribution of this statistic?

11. A model was fit to data with two independent variables X1 and X2 and one dependent variable Y. The coefficients w0, w1 and w2 were found to be 67.67, 5.56 and -0.60 respectively.

ANOVA was performed on the model and the following results were obtained.

Source       df   SS         MS         F          p
Regression    2   1350.757   675.3784   23.45816   1.29E-05
Residual     17   489.4431   28.79077
Total        19   1840.2

Explain each value in the ANOVA table and write your conclusion on the goodness of fit of the model.
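
The arithmetic behind the ANOVA table in question 11 can be checked with a short sketch. The variable names are illustrative. The p-value line uses a closed form that holds only when the regression degrees of freedom equal 2, as they do here; for other cases a statistics library would be the safer choice.

```python
# Values as read from the ANOVA table in question 11.
df_reg, df_res = 2, 17
ss_reg, ss_res = 1350.757, 489.4431

ms_reg = ss_reg / df_reg                 # mean square for regression (MSR)
ms_res = ss_res / df_res                 # mean square for residual (MSE)
f_stat = ms_reg / ms_res                 # F = MSR / MSE
r_squared = ss_reg / (ss_reg + ss_res)   # share of total variation explained

# Upper tail of the F distribution, valid ONLY for numerator df = 2:
# P(F > f) = (1 + 2 * f / d2) ** (-d2 / 2)
p_value = (1 + 2 * f_stat / df_res) ** (-df_res / 2)

print(round(ms_reg, 4), round(ms_res, 4), round(f_stat, 3))
print(round(r_squared, 3), p_value)
```

The computed F of about 23.46 and p of about 1.29E-05 agree with the tabulated values, and the ratio SS(regression) / SS(total) comes out near 0.73.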

12. List two issues you would face because of multicollinearity.

13. In a linear regression of bivariate data y=ax+b, which of the following statements are true?

a. Both a and b are Constants

b. Both a and b are Random variables

c. b is a constant while a is a random variable


14. The following table gives a sample data set where the variable y is expected to be linearly dependent on x1 and x2. Determine the coefficients of the regression equation. Assume no multicollinearity exists.

y    x1   x2
7    2    1
12   3    2
5    1    1
16   2    4
23   4    5
20   7    2
13   2    3
6    3    0
5    1    1
12   0    4
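
A minimal sketch for question 14, assuming the model y = w0 + w1*x1 + w2*x2 and ordinary least squares via the normal equations (X'X) w = X'y. The helper `solve_linear_system` is a hypothetical name written here for illustration; a linear algebra library would normally do this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


y = [7, 12, 5, 16, 23, 20, 13, 6, 5, 12]
x1 = [2, 3, 1, 2, 4, 7, 2, 3, 1, 0]
x2 = [1, 2, 1, 4, 5, 2, 3, 0, 1, 4]

# Design matrix rows: [1, x1, x2]; the leading 1 carries the intercept w0.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]

w0, w1, w2 = solve_linear_system(xtx, xty)
residuals = [t - (w0 + w1 * a + w2 * b) for t, a, b in zip(y, x1, x2)]
print(w0, w1, w2)
```

Every row of this particular table satisfies y = 2*x1 + 3*x2, so the fitted coefficients should come out as approximately 0, 2 and 3, up to floating-point rounding, with residuals near zero.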

15. A regression equation y=m*x+c is fit to a sample data set. A statistic c/SE_c is calculated. What is the distribution of this statistic?

16. Define a Random Variable.

17. If the correlation coefficient between two variables in a sample of 10 is 0.58, what is your observation on the relationship between the two variables?

18. Define Regression of Y on X.

19. Under what conditions do you say two variables are independent?

20. If X and Y are continuous Random Variables, prove that E[X+Y] = E[X]+E[Y].

21. If the covariance of two variables X and Y, from a sample of size 100, is found to be 36, and their variances are 25 and 81 respectively, what is their correlation coefficient?

22. If a random variable X has a mean of X̅, then what is the mean of the Random Variable Y = X - X̅?

23. The following table has the data of the total costs and the number of units produced by a company.

Total Cost Y        25   11   34   23   32
Units Produced X     5    2    8    4    6

Determine the parameters of the linear regression equation of Y on X.

24. List two methods of estimating the regression parameters

25. What is the statistic used to find the significant independent variables in case of multiple regression?

26. What does the P statistic indicate in the Analysis of Variance?

27. In a regression equation Y = βX + ε, the error term ε ~ N(0, σ²). True or False?

28. If a random variable X has a mean of X̅, then what is the mean of the Random Variable Y = X - X̅?

29. What is the impact of multicollinearity in a regression problem?

30. The following table gives a sample representing the total costs and the number of units produced by a company.

Total Cost (Y)        25   11   34   23   32
Units Produced (X)     5    2    8    4    6

a. Find the correlation coefficient between X and Y

b. Determine the linear regression line relating Y to X
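
Both parts of question 30 can be checked with the sketch below. It assumes the Pearson correlation coefficient and the least squares line of Y on X, with slope S_xy / S_xx and intercept mean(Y) - slope * mean(X); the variable names are illustrative.

```python
import math

x = [5, 2, 8, 4, 6]        # units produced
y = [25, 11, 34, 23, 32]   # total cost

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross products about the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / math.sqrt(s_xx * s_yy)     # correlation coefficient
slope = s_xy / s_xx                   # regression coefficient of Y on X
intercept = mean_y - slope * mean_x

print(round(r, 4), round(slope, 4), round(intercept, 4))  # approximately 0.9601 3.9 5.5
```

With S_xy = 78, S_xx = 20 and S_yy = 330, the line works out to Y = 3.9 X + 5.5 and r is about 0.96.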


31. The following table gives the sample values of two variables, X and Y.

X    Y
2    14
5    55
8    130
9    175
13   350

A quadratic model is fit for this sample data and the regression equation was found to be y = 2x² + 5. Find the F value for this model and also estimate the p value at the 95% confidence level.

32. If X and Y are continuous Random Variables, prove that E[X+Y] = E[X]+E[Y]

33. If the covariance of two variables X and Y, from a sample of size 100, is found to be 36, and their variances are 25 and 81 respectively, what is their correlation coefficient? Is it significant?

34. A straight line is fit to bivariate sample data of size 1000. If the mean of the residuals (ε) is found to be 36 and the variance of the residuals is found to be 324, write your observation on the goodness of fit of the straight line.


35. A straight line, y=3*x+4, was fit to the following data using the least squares method for parameter estimation.

x    y
1    8
5    17
7    25
13   41
15   51
16   52
18   57
20   65
23   76

Set up an ANOVA table for the above model.
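
A possible layout for the ANOVA table in question 35 is sketched below. It takes the fitted line y = 3x + 4 as given and assumes the decomposition SS(total) = SS(regression) + SS(residual); that identity is exact only for the exact least squares line, so with rounded coefficients the regression row is an approximation.

```python
x = [1, 5, 7, 13, 15, 16, 18, 20, 23]
y = [8, 17, 25, 41, 51, 52, 57, 65, 76]

n = len(x)
fitted = [3 * a + 4 for a in x]   # line stated in the question
mean_y = sum(y) / n

ss_total = sum((b - mean_y) ** 2 for b in y)
ss_resid = sum((b - f) ** 2 for b, f in zip(y, fitted))
ss_reg = ss_total - ss_resid      # assumption: SST = SSR + SSE

df_reg, df_resid = 1, n - 2       # one predictor, n - 2 residual df
ms_reg = ss_reg / df_reg
ms_resid = ss_resid / df_resid
f_stat = ms_reg / ms_resid

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>12}")
print(f"{'Regression':<12}{df_reg:>4}{ss_reg:>12.3f}{ms_reg:>12.3f}{f_stat:>12.3f}")
print(f"{'Residual':<12}{df_resid:>4}{ss_resid:>12.3f}{ms_resid:>12.3f}")
print(f"{'Total':<12}{n - 1:>4}{ss_total:>12.3f}")
```

The residual sum of squares is 24 on 7 degrees of freedom, which is small next to a total sum of squares of roughly 4140.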

36. If r is the mean square error, express r in terms of bias and variance.

37. 36 observations were made of two variables. Based on this sample, the correlation coefficient was found to be 0.275. Is this a significant value?
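
One common way to approach question 37 is the t-test for a correlation coefficient, sketched below. It assumes the null hypothesis of zero population correlation and a two-sided test at the 5% level; the function name is illustrative.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_36 = correlation_t_statistic(0.275, 36)     # sample of 36 observations
t_100 = correlation_t_statistic(0.275, 100)   # same r with 100 observations
print(round(t_36, 3), round(t_100, 3))
```

The statistic is then compared with a critical value from standard t-tables: roughly 2.03 for 34 degrees of freedom and roughly 1.98 for 98 degrees of freedom. The same r of 0.275 therefore gives different conclusions for samples of 36 and 100.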

38. Find the mean and standard deviation of the price of bananas if the price at various locations in Bengaluru is found to be Rs. 8, 5, 4, 12, 18, 17, 11, 9, 1, 3, 4, 22, and 4.
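
For question 38, a short sketch using Python's standard `statistics` module. The question does not say whether the sample or the population standard deviation is intended, so both are computed here.

```python
import statistics

# Banana prices in Rs. as listed in question 38.
prices = [8, 5, 4, 12, 18, 17, 11, 9, 1, 3, 4, 22, 4]

mean_price = statistics.mean(prices)
sample_sd = statistics.stdev(prices)        # divides by n - 1
population_sd = statistics.pstdev(prices)   # divides by n

print(round(mean_price, 3), round(sample_sd, 3), round(population_sd, 3))
```

The mean is 118 / 13, about 9.08, and the two standard deviations are about 6.58 and 6.32 respectively.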

39. The standard deviations of two variables, based on 19 observations, are 5.34 and 5.4. The covariance between these two variables, based on the same observations, is -21.00. Estimate the correlation coefficient between these two variables.

40. Three models were proposed describing the relation between two variables. During the analysis of variance, the F statistic was found to be 0.58, 0.83 and 0.60 for these three models respectively. Which model should be chosen?

41. In a regression model, the errors follow which distribution?


42. If the correlation coefficient between variables X and Y is -0.73, then which of the following statements are true?

a. X causes Y to decrease as it increases

b. Y causes X to decrease as it increases

c. There is a good linear relationship between X and Y

d. X decreases as Y increases

43. What is the statistic used to find the significant independent variables in case of multiple regression?

44. In a linear regression of bivariate data y=ax+b, which of the following statements are true?

a. Both a and b are Constants

b. Both a and b are Random variables

c. b is a constant while a is a random variable


45. If $\bar{x}$ is defined as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, then $\bar{x}$ is an unbiased estimate of the population mean. True or False?

46. Define Random Variable

47. In a bivariate data, write the equations for determining the value of the regression coefficients which will minimize the sum of the squared errors.

48. 100 observations were made of two variables. Based on this sample, the correlation coefficient was found to be 0.275. Is this a significant value?

49. Write the condition to be satisfied for two random variables to be

a. mutually exclusive,

b. independent.

50. The scatter plot of a set of observed values is shown below. Write your observation on the relation between the two variables.

[Scatter plot not reproduced; horizontal axis from 0 to 25, vertical axis from 0 to 9000.]

51. A set of observed values of pulp production in metric tons and world pulp price in rupees was analyzed and the following statistics were calculated.

a. Mean of pulp production

b. Mean of world pulp price

c. Standard deviation of pulp production

d. Correlation coefficient between pulp production and world pulp price

What are the units of these statistics?

52. If a random variable X has a mean of X̅, then what is the mean of the Random Variable Y = X - X̅?

53. Given a sample of bivariate data, list the steps to be followed to build a prediction model.

54. The following table has the data of the total costs and the number of units produced by a company.

Total Cost Y        25   11   34   23   32
Units Produced X     5    2    8    4    6

a. Calculate the Correlation Coefficient rxy. Is it significant?

b. Determine the parameters of the linear regression equation of Y on X
