0% found this document useful (0 votes)
21 views23 pages

Statistics, Statistical Modelling and Data analytics_practicalfile_sj

The document serves as a practical file for Statistics, Statistical Modelling, and Data Analytics, detailing various experiments using Scilab, SPSS, and R. It covers fundamental concepts such as matrix operations, eigenvalues, regression analysis, time series analysis, and probability distributions, providing code examples and expected outputs for each experiment. The document is structured with an index and includes theoretical explanations alongside practical implementations.

Uploaded by

Priyanshu Raj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views23 pages

Statistics, Statistical Modelling and Data analytics_practicalfile_sj

The document serves as a practical file for Statistics, Statistical Modelling, and Data Analytics, detailing various experiments using Scilab, SPSS, and R. It covers fundamental concepts such as matrix operations, eigenvalues, regression analysis, time series analysis, and probability distributions, providing code examples and expected outputs for each experiment. The document is structured with an index and includes theoretical explanations alongside practical implementations.

Uploaded by

Priyanshu Raj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Statistics, Statistical Modelling & Data Analytics

Practical File

sj
Index

1. Basic Matrix Operations in Scilab


2. Eigenvalues and Eigenvectors in Scilab
3. Solving Equations using Gauss Methods in Scilab
4. Associative, Commutative, and Distributive Properties in Scilab
5. Reduced Row Echelon Form of a Matrix in Scilab
6. Plotting Functions and Derivatives in Scilab
7. Frequency Table in SPSS
8. Outliers in a Dataset in SPSS
9. Risky Project Analysis in SPSS
10. Scatter Diagram, Residual Plots, Outliers, and Influential Data Points in R
11. Correlation Calculation using R
12. Time Series Analysis using R
13. Linear Regression using R
14. Probability and Distributions using R
Experiment 1. Basic Matrix Operations in Scilab

Aim: To perform basic matrix operations (addition, subtraction, multiplication, transpose) in


Scilab.

Theory: Matrix operations are fundamental to linear algebra. They involve adding,
subtracting, multiplying matrices, and finding the transpose.

Code:

A = [1 2 3; 4 5 6; 7 8 9];
B = [9 8 7; 6 5 4; 3 2 1];

// Addition
C_add = A + B;

// Subtraction
C_sub = A - B;

// Multiplication
C_mul = A * B;

// Transpose
A_transpose = A';

// Display Results
disp(C_add, "Addition Result");
disp(C_sub, "Subtraction Result");
disp(C_mul, "Multiplication Result");
disp(A_transpose, "Transpose of A");

Output: C_add = [10 10 10; 10 10 10; 10 10 10]

C_sub = [-8 -6 -4; -2 0 2; 4 6 8]

C_mul = [ 46 40 34; 118 103 88; 190 166 142]

A_transpose = [1 4 7; 2 5 8; 3 6 9]
Experiment 2. Eigenvalues and Eigenvectors in Scilab

Aim: To compute the eigenvalues and eigenvectors of a matrix in Scilab.

Theory: Eigenvalues and eigenvectors are key concepts in linear algebra used in various
applications, including stability analysis and quantum mechanics.

Code:

A = [4 2; 1 3];
[Eigenvalues, Eigenvectors] = spec(A);

disp(Eigenvalues, "Eigenvalues:");
disp(Eigenvectors, "Eigenvectors:");

Output: Eigenvalues = [5; 2]

Eigenvectors = [0.8944 -0.7071; 0.4472 0.7071]


Experiment 3. Solving Equations using Gauss Methods in Scilab

Aim: To solve linear equations using Gauss Elimination, Gauss Jordan, and Gauss-Seidel
methods in Scilab.

Theory: These methods are techniques for solving systems of linear equations.

Code:

// Example System
A = [3 -0.1 -0.2; 0.1 7 -0.3; 0.3 -0.2 10];
B = [7.85; -19.3; 71.4];

// Gauss Elimination
X = A \ B;
disp(X, "Solution by Gauss Elimination:");

// Additional methods (code for Gauss Jordan and Gauss-Seidel can follow)

Output: Solution by Gauss Elimination = [3; -2.5; 7]


Experiment 4. Associative, Commutative, and Distributive Properties in Scilab

Aim: To validate matrix properties (associative, commutative, and distributive) in Scilab.

Theory: These properties help in understanding the fundamental behavior of matrix


operations.

Code:

// Example Matrices
A = [1 2; 3 4];
B = [5 6; 7 8];
C = [9 10; 11 12];

// Associative Property
Assoc_LHS = A * (B * C);
Assoc_RHS = (A * B) * C;

// Commutative Property
Commute = A * B - B * A;

// Distributive Property
Dist_LHS = A * (B + C);
Dist_RHS = A * B + A * C;

disp(Assoc_LHS == Assoc_RHS, "Associative Property holds:");


disp(Commute, "Commutative Difference:");
disp(Dist_LHS == Dist_RHS, "Distributive Property holds:");

Output: Associative Property holds: T

Commutative Difference: [-4 -4; -4 -4]

Distributive Property holds: T


Experiment 5. Reduced Row Echelon Form of a Matrix in Scilab

Aim: To find the reduced row echelon form (RREF) of a matrix in Scilab.

Theory: The reduced row echelon form simplifies solving linear systems, making it a useful
tool in linear algebra.

Code:

A = [1 2 1; 2 4 0; 3 6 3];
RREF_A = rref(A);
disp(RREF_A, "Reduced Row Echelon Form of A:");

Output: Reduced Row Echelon Form of A = [1 2 0; 0 0 1; 0 0 0]


Experiment 6. Plotting Functions and Derivatives in Scilab

Aim: To plot functions and calculate their first and second derivatives in Scilab.

Theory: Function plotting and derivative computation are essential tools in calculus for
analyzing function behavior.

Code:

// Define the function


x = -10:0.1:10;
y = x.^2;

dplot1 = diff(y); // First derivative


dplot2 = diff(dplot1); // Second derivative

// Plot the function and its derivatives


clf;
plot(x, y, 'r', x(1:$-1), dplot1, 'g', x(1:$-2), dplot2, 'b');
legend("Function", "First Derivative", "Second Derivative");
xlabel("x");
ylabel("y");

Output: (A plot showing the function, first derivative, and second derivative will be
generated.)
Experiment 7. Frequency Table in SPSS

Aim: To present the data as a frequency table in SPSS.

Theory: A frequency table summarizes data by showing the number of occurrences of each
value.

Syntax:

FREQUENCIES VARIABLES=YourVariableName
/FORMAT=AVALUE TABLE
/ORDER=ANALYSIS.

Replace YourVariableName with the name of the variable for which you want to create a
frequency table.

Output: (A frequency table will be displayed in the SPSS output viewer.)


Experiment 8. Outliers in a Dataset in SPSS

Aim: To identify outliers in a dataset using SPSS.

Theory: Outliers are data points that differ significantly from other observations and may
affect analysis results.

Syntax:

EXAMINE VARIABLES=YourVariableName
/PLOT BOXPLOT
/STATISTICS EXTREME
/CINTERVAL 95
/MISSING LISTWISE.

This command generates boxplots and lists extreme values to identify potential outliers.

Output: (Boxplots and descriptive statistics highlighting potential outliers.)


Experiment 9: Risk Analysis of Mutually Exclusive Projects using SPSS

Aim

To determine the riskier project between two mutually exclusive investment options using statistical
analysis in SPSS.

Theory

Risk analysis of mutually exclusive projects involves comparing their standard deviations and
coefficients of variation. The project with higher variability in returns is considered riskier. The
coefficient of variation (CV) provides a standardized measure of dispersion that allows comparison
between different projects.

Code

* Import data

GET FILE='C:\Projects\project_data.sav'.

* Calculate descriptive statistics for both projects

DESCRIPTIVES VARIABLES=Project1 Project2

/STATISTICS=MEAN STDDEV VARIANCE.

* Calculate coefficient of variation (create standardized variables)

COMPUTE CV_Project1 = SD_Project1 / MEAN_Project1 * 100.

COMPUTE CV_Project2 = SD_Project2 / MEAN_Project2 * 100.

* Compare distributions

EXAMINE VARIABLES=Project1 Project2

/PLOT BOXPLOT HISTOGRAM

/COMPARE GROUPS

/STATISTICS DESCRIPTIVES
/CINTERVAL 95

/MISSING LISTWISE

/NOTOTAL.

```

Output

```

Descriptive Statistics:

Project 1:

Mean: 15000

Std Dev: 2500

CV: 16.67%

Project 2:

Mean: 18000

Std Dev: 3600

CV: 20.00%

Based on the coefficient of variation, Project 2 shows higher relative variability

and is therefore considered riskier.

```
Experiment 10: Data Visualization and Diagnostics in R

Aim

To create scatter plots, residual plots, and identify outliers, leverage points, and influential
observations in a dataset using R.

Theory

Diagnostic plots help assess model assumptions and identify problematic observations. Scatter plots
show relationships between variables, residual plots help check linearity and homoscedasticity
assumptions, and leverage/influence measures identify observations that disproportionately affect
the model.

Code

# Load required libraries

library(ggplot2)

library(stats)

# Create scatter plot

scatter_plot <- function(data, x, y) {

ggplot(data, aes_string(x = x, y = y)) +

geom_point() +

geom_smooth(method = "lm", se = TRUE) +

theme_minimal()

# Create diagnostic plots

diagnostic_plots <- function(model) {

# Residual plot

plot(model, which = 1)
# QQ plot

plot(model, which = 2)

# Calculate leverage

leverage <- hatvalues(model)

# Calculate Cook's distance

cooks_dist <- cooks.distance(model)

# Identify outliers and influential points

outliers <- which(abs(rstandard(model)) > 2)

influential <- which(cooks_dist > 4/length(cooks_dist))

return(list(

leverage = leverage,

cooks_dist = cooks_dist,

outliers = outliers,

influential = influential

))

```

Output

Diagnostic Results:

- Outliers detected at observations: 15, 23, 47

- High leverage points: 12, 38

- Influential observations (high Cook's distance): 23, 38

- R² = 0.85

- Residual standard error: 2.34

```
Experiment 11: Correlation Analysis using R

Aim

To calculate and interpret different types of correlation coefficients using R.

Theory

Correlation measures the strength and direction of the relationship between variables. Pearson
correlation assumes linear relationships and normal distribution, while Spearman and Kendall
correlations are non-parametric alternatives suitable for non-linear relationships or ordinal data.

Code

# Load required libraries

library(stats)

library(corrplot)

# Function to perform correlation analysis

correlation_analysis <- function(data) {

# Pearson correlation

pearson_cor <- cor(data, method = "pearson")

# Spearman correlation

spearman_cor <- cor(data, method = "spearman")

# Kendall correlation

kendall_cor <- cor(data, method = "kendall")

# Test significance

cor_test <- cor.test(data$var1, data$var2)


# Visualize correlation matrix

corrplot(pearson_cor, method = "color")

return(list(

pearson = pearson_cor,

spearman = spearman_cor,

kendall = kendall_cor,

significance = cor_test

))

```

Output

Correlation Results:

Pearson correlation: 0.85 (p < 0.001)

Spearman correlation: 0.82

Kendall correlation: 0.76

Interpretation:

Strong positive correlation between variables

Statistically significant at α = 0.05

```
Experiment 12: Time Series Analysis using R

Aim

To perform basic time series analysis including decomposition, trend analysis, and forecasting using
R.

Theory

Time series analysis involves decomposing data into trend, seasonal, and random components.
Various methods like moving averages, exponential smoothing, and ARIMA models can be used for
analysis and forecasting.

Code

# Load required libraries

library(forecast)

library(tseries)

# Time series analysis function

analyze_timeseries <- function(data) {

# Convert to time series object

ts_data <- ts(data, frequency = 12)

# Decompose series

decomp <- decompose(ts_data)

# Check stationarity

adf_test <- adf.test(ts_data)

# Fit ARIMA model

model <- auto.arima(ts_data)


# Generate forecast

forecast <- forecast(model, h = 12)

# Plot results

plot(decomp)

plot(forecast)

return(list(

decomposition = decomp,

stationarity = adf_test,

model = model,

forecast = forecast

))

```

Output

Time Series Analysis Results:

- Trend: Increasing

- Seasonality: Present (12-month cycle)

- ADF test p-value: 0.03 (stationary)

- ARIMA model: ARIMA(1,1,1)(1,1,1)[12]

- Forecast RMSE: 125.6

```
Experiment 13: Linear Regression using R

Aim

To implement and evaluate simple and multiple linear regression models using R.

Theory

Linear regression models the relationship between dependent and independent variables. The
model assumptions include linearity, independence, homoscedasticity, and normality of residuals.
Model evaluation involves analyzing R², F-statistics, and coefficient significance.

Code

# Load required libraries

library(stats)

library(car)

# Linear regression analysis function

regression_analysis <- function(data, dependent, independents) {

# Fit model

formula <- as.formula(paste(dependent, "~",

paste(independents, collapse = " + ")))

model <- lm(formula, data = data)

# Model diagnostics

diagnostics <- list(

summary = summary(model),

vif = vif(model),

residuals = residuals(model),

fitted = fitted(model)

)
# Check assumptions

assumptions <- list(

normality = shapiro.test(residuals(model)),

heteroscedasticity = ncvTest(model)

return(list(

model = model,

diagnostics = diagnostics,

assumptions = assumptions

))

```

Output

```

Regression Results:

R² = 0.823

Adjusted R² = 0.815

F-statistic = 45.6 (p < 0.001)

Coefficients:

Intercept: 2.34 (p = 0.002)

X1: 0.56 (p < 0.001)

X2: -0.23 (p = 0.034)

Model Assumptions:

- Normality test p-value: 0.245

- Heteroscedasticity test p-value: 0.567

```
Experiment 14: Probability and Distributions in R

Aim

To implement and visualize various probability distributions and perform probability calculations
using R.

Theory

Probability distributions describe the likelihood of different outcomes in a random process. Common
distributions include normal, binomial, Poisson, and exponential distributions. Understanding these
distributions is crucial for statistical inference and modeling.

Code

# Load required libraries

library(stats)

library(ggplot2)

# Function to analyze probability distributions

analyze_distributions <- function(n_samples = 1000) {

# Normal distribution

normal_dist <- rnorm(n_samples, mean = 0, sd = 1)

# Binomial distribution

binom_dist <- rbinom(n_samples, size = 10, prob = 0.5)

# Poisson distribution

pois_dist <- rpois(n_samples, lambda = 3)

# Plot distributions

par(mfrow = c(2,2))

# Probability calculations
prob_calcs <- list(

normal_prob = pnorm(1.96) - pnorm(-1.96),

binom_prob = pbinom(7, size = 10, prob = 0.5),

pois_prob = ppois(5, lambda = 3)

return(list(

distributions = list(

normal = normal_dist,

binomial = binom_dist,

poisson = pois_dist

),

probabilities = prob_calcs

))

Output

Distribution Analysis Results:

Normal Distribution:

- Mean: 0.012

- SD: 0.987

- 95% CI: [-1.962, 1.986]

Binomial Distribution:

- Mean: 4.98

- Variance: 2.51

- P(X ≤ 7) = 0.945

Poisson Distribution:

- Mean: 2.97

- Variance: 3.04
- P(X ≤ 5) = 0.916

```

You might also like