Statistics, Statistical Modelling and Data Analytics
Practical File
sj
Experiment 1. Matrix Operations in Scilab
Aim: To perform matrix addition, subtraction, multiplication, and transposition in Scilab.
Theory: Matrix operations are fundamental to linear algebra. They include adding, subtracting, and multiplying matrices, and finding the transpose.
Code:
A = [1 2 3; 4 5 6; 7 8 9];
B = [9 8 7; 6 5 4; 3 2 1];
// Addition
C_add = A + B;
// Subtraction
C_sub = A - B;
// Multiplication
C_mul = A * B;
// Transpose
A_transpose = A';
// Display Results
disp(C_add, "Addition Result");
disp(C_sub, "Subtraction Result");
disp(C_mul, "Multiplication Result");
disp(A_transpose, "Transpose of A");
Output:
A_transpose = [1 4 7; 2 5 8; 3 6 9]
Experiment 2. Eigenvalues and Eigenvectors in Scilab
Theory: Eigenvalues and eigenvectors are key concepts in linear algebra used in various
applications, including stability analysis and quantum mechanics.
Code:
A = [4 2; 1 3];
[Eigenvectors, Eigenvalues] = spec(A); // spec returns the eigenvector matrix and the diagonal matrix of eigenvalues
disp(Eigenvalues, "Eigenvalues:");
disp(Eigenvectors, "Eigenvectors:");
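As a quick check using the variables above, the defining relation A*V = V*D can be verified numerically; the residual should be close to the zero matrix:
// Verify the eigen decomposition: A*V - V*D should be (numerically) zero
Residual = A * Eigenvectors - Eigenvectors * Eigenvalues;
disp(Residual, "Residual of A*V - V*D:");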
Experiment 3. Solving Linear Equations in Scilab
Aim: To solve linear equations using Gauss Elimination, Gauss Jordan, and Gauss-Seidel methods in Scilab.
Theory: Gauss Elimination reduces the augmented matrix to upper triangular form and solves by back-substitution, Gauss Jordan carries the reduction through to reduced row echelon form, and Gauss-Seidel is an iterative method that repeatedly refines an approximate solution.
Code:
// Example System
A = [3 -0.1 -0.2; 0.1 7 -0.3; 0.3 -0.2 10];
B = [7.85; -19.3; 71.4];
// Gauss Elimination
X = A \ B;
disp(X, "Solution by Gauss Elimination:");
// Gauss Jordan and Gauss-Seidel sketches follow below.
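The remaining two methods can be sketched as follows for the same A and B; this is a minimal illustration, and the 25 fixed Gauss-Seidel sweeps and the zero starting vector are assumptions made for the example.
// Gauss Jordan: read the solution from the RREF of the augmented matrix
Aug = rref([A, B]);
X_gj = Aug(:, 4);
disp(X_gj, "Solution by Gauss Jordan:");
// Gauss-Seidel: iterative update, sweeping through each equation
X_gs = zeros(3, 1);
for k = 1:25
    for i = 1:3
        X_gs(i) = (B(i) - A(i,:) * X_gs + A(i,i) * X_gs(i)) / A(i,i);
    end
end
disp(X_gs, "Solution by Gauss-Seidel:");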
Experiment 4. Properties of Matrix Multiplication in Scilab
Aim: To verify the associative and distributive properties of matrix multiplication, and to show that it is not commutative, in Scilab.
Theory: For conformable matrices, A(BC) = (AB)C and A(B + C) = AB + AC, but in general AB ≠ BA.
Code:
// Example Matrices
A = [1 2; 3 4];
B = [5 6; 7 8];
C = [9 10; 11 12];
// Associative Property
Assoc_LHS = A * (B * C);
Assoc_RHS = (A * B) * C;
// Commutativity check (AB - BA is generally nonzero)
Commute = A * B - B * A;
// Distributive Property
Dist_LHS = A * (B + C);
Dist_RHS = A * B + A * C;
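For illustration, the differences can be displayed explicitly (a small addition to the code above); the associative and distributive checks should give zero matrices, while AB - BA is generally nonzero:
disp(Assoc_LHS - Assoc_RHS, "Associativity check (should be zero):");
disp(Dist_LHS - Dist_RHS, "Distributivity check (should be zero):");
disp(Commute, "AB - BA (nonzero: multiplication is not commutative):");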
Experiment 5. Reduced Row Echelon Form in Scilab
Aim: To find the reduced row echelon form (RREF) of a matrix in Scilab.
Theory: The reduced row echelon form simplifies solving linear systems, making it a useful
tool in linear algebra.
Code:
A = [1 2 1; 2 4 0; 3 6 3];
RREF_A = rref(A);
disp(RREF_A, "Reduced Row Echelon Form of A:");
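For this matrix the result can be worked out by hand: R2 - 2*R1 = [0 0 -2] and R3 - 3*R1 = [0 0 0], so after scaling R2 and clearing the third column the expected output is:
Output:
RREF_A = [1 2 0; 0 0 1; 0 0 0]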
Experiment 6. Function Plotting and Derivatives in Scilab
Aim: To plot functions and calculate their first and second derivatives in Scilab.
Theory: Function plotting and derivative computation are essential tools in calculus for
analyzing function behavior.
Code:
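A minimal sketch, assuming f(x) = sin(x) on [0, 2*pi] and numerical derivatives via finite differences (the choice of function is an assumption for illustration):
x = linspace(0, 2*%pi, 200);
y = sin(x);
// First derivative by finite differences
dy = diff(y) ./ diff(x);
// Second derivative by differencing the first derivative
d2y = diff(dy) ./ diff(x(1:$-1));
// Plot the function and both derivatives
clf();
plot(x, y, 'b', x(1:$-1), dy, 'r', x(1:$-2), d2y, 'g');
legend(["f(x) = sin(x)", "First derivative", "Second derivative"]);
xtitle("Function and its derivatives");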
Output: (A plot showing the function, first derivative, and second derivative will be
generated.)
Experiment 7. Frequency Table in SPSS
Theory: A frequency table summarizes data by showing the number of occurrences of each
value.
Syntax:
FREQUENCIES VARIABLES=YourVariableName
/FORMAT=AVALUE TABLE
/ORDER=ANALYSIS.
Replace YourVariableName with the name of the variable for which you want to create a
frequency table.
Experiment 8. Identifying Outliers in SPSS
Theory: Outliers are data points that differ significantly from other observations and may affect analysis results.
Syntax:
EXAMINE VARIABLES=YourVariableName
/PLOT BOXPLOT
/STATISTICS EXTREME
/CINTERVAL 95
/MISSING LISTWISE.
This command generates boxplots and lists extreme values to identify potential outliers.
Experiment 9: Risk Analysis of Mutually Exclusive Projects in SPSS
Aim
To determine the riskier project between two mutually exclusive investment options using statistical
analysis in SPSS.
Theory
Risk analysis of mutually exclusive projects involves comparing their standard deviations and
coefficients of variation. The project with higher variability in returns is considered riskier. The
coefficient of variation (CV) provides a standardized measure of dispersion that allows comparison
between different projects.
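For reference, CV = (standard deviation / mean) × 100%. As a worked example with an assumed standard deviation, a project with a mean return of 15,000 and a standard deviation of 2,500 has CV = 2,500 / 15,000 × 100% ≈ 16.67%.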
Code
* Import the data file.
GET FILE='C:\Projects\project_data.sav'.
* Compare the distributions of returns across the two projects.
* Return and Project are assumed variable names; replace them with your own.
EXAMINE VARIABLES=Return BY Project
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Output
Descriptive Statistics:
Project 1:
Mean: 15000
CV: 16.67%
Project 2:
Mean: 18000
CV: 20.00%
Interpretation: Project 2 has the higher coefficient of variation (20.00% vs 16.67%), so it is the riskier of the two projects.
Experiment 10: Data Visualization and Diagnostics in R
Aim
To create scatter plots, residual plots, and identify outliers, leverage points, and influential
observations in a dataset using R.
Theory
Diagnostic plots help assess model assumptions and identify problematic observations. Scatter plots
show relationships between variables, residual plots help check linearity and homoscedasticity
assumptions, and leverage/influence measures identify observations that disproportionately affect
the model.
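Commonly used rules of thumb (conventions rather than strict cutoffs) flag observations with leverage above 2p/n and Cook's distance above 4/n, where p is the number of model parameters and n the number of observations; standardized residuals beyond 2 in absolute value are often treated as potential outliers.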
Code
library(ggplot2)
library(stats)
# Example data (assumed for illustration)
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)
# Scatter plot
ggplot(df, aes(x = x, y = y)) +
geom_point() +
theme_minimal()
# Fit a simple linear model
model <- lm(y ~ x, data = df)
# Residual plot
plot(model, which = 1)
# QQ plot
plot(model, which = 2)
# Calculate leverage and Cook's distance; flag outliers and influential points
leverage <- hatvalues(model)
cooks_dist <- cooks.distance(model)
outliers <- which(abs(rstandard(model)) > 2)
influential <- which(cooks_dist > 4 / nrow(df))
# Collect the diagnostics in a list
list(
leverage = leverage,
cooks_dist = cooks_dist,
outliers = outliers,
influential = influential
)
Output
Diagnostic Results:
- R² = 0.85
Experiment 11: Correlation Analysis using R
Aim
To compute and interpret Pearson, Spearman, and Kendall correlation coefficients in R and to test their significance.
Theory
Correlation measures the strength and direction of the relationship between variables. Pearson
correlation assumes linear relationships and normal distribution, while Spearman and Kendall
correlations are non-parametric alternatives suitable for non-linear relationships or ordinal data.
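For reference, the Pearson coefficient is r = cov(x, y) / (s_x * s_y), which ranges from -1 (perfect negative association) to +1 (perfect positive association), with 0 indicating no linear relationship.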
Code
library(stats)
library(corrplot)
# Example data (assumed for illustration)
set.seed(123)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
# Pearson correlation
pearson_cor <- cor(x, y, method = "pearson")
# Spearman correlation
spearman_cor <- cor(x, y, method = "spearman")
# Kendall correlation
kendall_cor <- cor(x, y, method = "kendall")
# Test significance of the Pearson correlation
cor_test <- cor.test(x, y, method = "pearson")
# Visualize the correlation matrix
corrplot(cor(cbind(x, y)))
# Collect the results in a list
list(
pearson = pearson_cor,
spearman = spearman_cor,
kendall = kendall_cor,
significance = cor_test
)
Output
Correlation Results:
Interpretation:
Experiment 12: Time Series Analysis using R
Aim
To perform basic time series analysis including decomposition, trend analysis, and forecasting using
R.
Theory
Time series analysis involves decomposing data into trend, seasonal, and random components.
Various methods like moving averages, exponential smoothing, and ARIMA models can be used for
analysis and forecasting.
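In an additive decomposition the observed series is modelled as Y_t = T_t + S_t + R_t (trend + seasonal + random), while a multiplicative decomposition uses Y_t = T_t × S_t × R_t.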
Code
library(forecast)
library(tseries)
# Example monthly series with trend and seasonality (assumed for illustration)
set.seed(123)
values <- 100 + (1:120) + 10 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 3)
ts_data <- ts(values, frequency = 12, start = c(2015, 1))
# Decompose series into trend, seasonal and random components
decomp <- decompose(ts_data)
# Check stationarity with the Augmented Dickey-Fuller test
adf_test <- adf.test(ts_data)
# Fit an ARIMA model and forecast 12 periods ahead
model <- auto.arima(ts_data)
fc <- forecast(model, h = 12)
# Plot results
plot(decomp)
plot(fc)
# Collect the results in a list
list(
decomposition = decomp,
stationarity = adf_test,
model = model,
forecast = fc
)
Output
- Trend: Increasing
Experiment 13: Linear Regression using R
Aim
To implement and evaluate simple and multiple linear regression models using R.
Theory
Linear regression models the relationship between dependent and independent variables. The
model assumptions include linearity, independence, homoscedasticity, and normality of residuals.
Model evaluation involves analyzing R², F-statistics, and coefficient significance.
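For reference, R² = 1 - SS_res / SS_tot, and the adjusted R² penalizes additional predictors: R²_adj = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors.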
Code
library(stats)
library(car)
# Example data (assumed for illustration)
set.seed(123)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 3 + 2 * df$x1 - 1.5 * df$x2 + rnorm(100)
# Fit a multiple linear regression model
model <- lm(y ~ x1 + x2, data = df)
# Model diagnostics
diagnostics <- list(
summary = summary(model),
vif = vif(model),
residuals = residuals(model),
fitted = fitted(model)
)
# Check assumptions
assumptions <- list(
normality = shapiro.test(residuals(model)),
heteroscedasticity = ncvTest(model)
)
# Collect the results in a list
list(
model = model,
diagnostics = diagnostics,
assumptions = assumptions
)
Output
Regression Results:
R² = 0.823
Adjusted R² = 0.815
Coefficients:
Model Assumptions:
Experiment 14: Probability and Distributions in R
Aim
To implement and visualize various probability distributions and perform probability calculations
using R.
Theory
Probability distributions describe the likelihood of different outcomes in a random process. Common
distributions include normal, binomial, Poisson, and exponential distributions. Understanding these
distributions is crucial for statistical inference and modeling.
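As a worked example, for X ~ Binomial(n = 10, p = 0.5), P(X ≤ 7) = Σ_{k=0}^{7} C(10, k) (0.5)^10 = 968/1024 ≈ 0.945, which matches the value reported in the output below.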
Code
library(stats)
library(ggplot2)
# Random draws from each distribution (sample sizes and parameters are assumed)
set.seed(123)
# Normal distribution
normal_dist <- rnorm(1000, mean = 0, sd = 1)
# Binomial distribution
binom_dist <- rbinom(1000, size = 10, prob = 0.5)
# Poisson distribution
pois_dist <- rpois(1000, lambda = 3)
# Plot distributions
par(mfrow = c(2, 2))
hist(normal_dist, main = "Normal")
hist(binom_dist, main = "Binomial")
hist(pois_dist, main = "Poisson")
# Probability calculations
prob_calcs <- list(
p_norm = pnorm(1.96, mean = 0, sd = 1),
p_binom = pbinom(7, size = 10, prob = 0.5),
p_pois = ppois(5, lambda = 3)
)
# Collect the results in a list
list(
distributions = list(
normal = normal_dist,
binomial = binom_dist,
poisson = pois_dist
),
probabilities = prob_calcs
)
Output
Normal Distribution:
- Mean: 0.012
- SD: 0.987
Binomial Distribution:
- Mean: 4.98
- Variance: 2.51
- P(X ≤ 7) = 0.945
Poisson Distribution:
- Mean: 2.97
- Variance: 3.04
- P(X ≤ 5) = 0.916