Group_2_Practical
Objectives:
● Intuition
○ Understanding the basics of probability distributions
○ Conceptualizing the Normal, Poisson, and Bernoulli distributions
● Dataset Specification
○ Describing the datasets used in the practical
○ Identifying the distribution parameters and their significance
● Program
○ Implementing Python programs that generate and visualize data for each distribution
● Output
○ Interpreting the resulting histograms and theoretical probability functions
● Applications
○ Identifying real-world uses of probability distributions
Intuition
A probability distribution is a mathematical function or model that describes how the values of a random variable are distributed. In other words, it tells you the likelihood of the possible outcomes of a random event or experiment. Probability distributions are fundamental in statistics and probability theory and are used to understand, analyse, and model uncertainty in many real-world scenarios.
Normal Distribution:
• The normal distribution, often referred to as the Gaussian distribution, is one of the most
important and widely used probability distributions.
• It is characterized by a bell-shaped curve with two parameters: the mean (μ) and the standard
deviation (σ).
• Many natural phenomena, such as heights, weights, and measurement errors, closely follow a
normal distribution.
• The distribution is symmetric and unimodal, with a mean of μ and a variance of σ²; a short numerical check of these properties is sketched below.
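As a quick, illustrative check of these properties (not part of the practical's program; the mean of 5 and standard deviation of 2 are arbitrary example values), the following Python sketch draws a large sample, compares its sample mean and variance with μ and σ², and confirms the symmetry of the PDF:
import numpy as np
from scipy.stats import norm
# Arbitrary example parameters (not taken from the practical's dataset)
mu, sigma = 5.0, 2.0
samples = np.random.default_rng(0).normal(mu, sigma, 100_000)
# Sample statistics should be close to the theoretical mean and variance
print("sample mean:", samples.mean(), " expected:", mu)
print("sample variance:", samples.var(), " expected:", sigma ** 2)
# Symmetry: the PDF takes equal values at points equidistant from the mean
print(norm.pdf(mu - 1, mu, sigma), norm.pdf(mu + 1, mu, sigma))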
Poisson Distribution:
• The Poisson distribution models the number of events occurring in a fixed interval of time or
space.
• It is characterized by a single parameter λ (lambda), which represents the average rate of
occurrence.
• The distribution is discrete and often used for rare events or phenomena that occur randomly but with a known average rate; a small worked example of its probability mass function is sketched below.
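As a small worked example, assuming the same average rate λ = 3 that is used later in the Output section, the probabilities P(X = k) = e^(−λ) λ^k / k! for the first few values of k can be computed with scipy.stats.poisson:
from scipy.stats import poisson
lam = 3  # average event rate, matching the lambda used in the Output section
for k in range(6):
    # P(X = k) = exp(-lam) * lam**k / k!
    print(f"P(X = {k}) = {poisson.pmf(k, lam):.4f}")
# The probabilities over all possible counts sum to 1 (checked here over 0..20)
print(sum(poisson.pmf(k, lam) for k in range(21)))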
Bernoulli Distribution:
• The Bernoulli distribution models a binary outcome with two possible results: success (1) and
failure (0).
• It is characterized by a single parameter p, representing the probability of success.
• The Bernoulli distribution is a fundamental building block for modelling binary events and serves as the basis for the binomial distribution, which models the number of successes in a fixed number of independent Bernoulli trials; this relationship is illustrated in the sketch below.
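The following sketch is an illustrative aside (the values p = 0.5, n = 10, the 100,000 simulated groups, and the seed are arbitrary choices): it simulates groups of independent Bernoulli trials and checks that the number of successes per group matches the binomial PMF:
import numpy as np
from scipy.stats import bernoulli, binom
p, n = 0.5, 10   # arbitrary success probability and number of trials per group
# Each row is one group of n independent Bernoulli trials
trials = bernoulli.rvs(p, size=(100_000, n), random_state=42)
successes = trials.sum(axis=1)
# Empirical frequency of exactly 4 successes vs. the binomial PMF
print("simulated :", np.mean(successes == 4))
print("binom PMF :", binom.pmf(4, n, p))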
Dataset Specification
In the context of our practical project, we utilized carefully generated datasets that align with specific
probability distributions. These datasets were designed to capture the characteristics of three key
probability distributions, namely the Normal, Poisson, and Bernoulli distributions.
• We crafted a dataset to closely mimic the properties of the normal distribution. This dataset
was created by generating random data points that follow a bell-shaped curve. The data is
characterized by a defined mean (average) and standard deviation (spread). By leveraging
this dataset, we aimed to demonstrate the application of the normal distribution in modelling
real-world phenomena, such as measurements, where data often exhibits a symmetrical
distribution around a central value.
• To simulate the Poisson distribution, we generated a dataset that emulates the occurrence of
events within a fixed interval. The data in this dataset is characterized by a known average
event rate (λ). The dataset was crafted to illustrate the properties of the Poisson distribution,
which is commonly used to model rare and unpredictable events, such as customer arrivals at
a store, machine failures, or accidents.
• For the Bernoulli distribution, we created a dataset representing binary outcomes with two
possible results, typically labelled as "success" and "failure." The probability of success (p)
was predefined in the dataset, allowing us to illustrate the concept of a Bernoulli trial and
how it serves as the foundation for more complex models like the binomial distribution. This
dataset highlights the application of the Bernoulli distribution in modelling events with only
two possible outcomes, such as coin flips or product defect detection; a sketch of how all three datasets can be generated in Python follows below.
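The snippet below is a minimal sketch of how datasets like these can be generated with NumPy. The sample size of 1,000, the fixed seed, and the normal parameters (mean 0, standard deviation 1) are assumptions made here for illustration, while λ = 3 and p = 0.3 match the values reported in the Output section:
import numpy as np
rng = np.random.default_rng(42)   # fixed seed for reproducibility
n = 1000                          # assumed sample size
# Normal dataset: assumed mean 0 and standard deviation 1
normal_data = rng.normal(loc=0.0, scale=1.0, size=n)
# Poisson dataset: average event rate lambda = 3 (as in the Output section)
poisson_data = rng.poisson(lam=3, size=n)
# Bernoulli dataset: probability of success p = 0.3 (as in the Output section)
bernoulli_data = rng.binomial(n=1, p=0.3, size=n)
print(normal_data.mean(), poisson_data.mean(), bernoulli_data.mean())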
Program:
Normal_distribution.py
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# Simulated samples (assumed mean 0, std 1) with the theoretical PDF overlaid
data = np.random.normal(0, 1, 1000)
x = np.linspace(-4, 4, 200)
plt.hist(data, bins=30, density=True, color='blue', alpha=0.6)
plt.plot(x, norm.pdf(x), 'k-')   # black curve: theoretical PDF
plt.title("Normal Distribution")
plt.show()
Poisson_distribution.py
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
# Simulated counts with lambda = 3 (blue histogram) and theoretical PMF (black bars)
data = np.random.poisson(3, 1000)
plt.hist(data, bins=np.arange(12) - 0.5, density=True, color='blue', alpha=0.6)
plt.vlines(np.arange(11), 0, poisson.pmf(np.arange(11), 3), colors='black')
plt.title("Poisson Distribution")
plt.show()
Bernoulli_distribution.py
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
# Parameter for the Bernoulli distribution (probability of success)
p = 0.3
# Simulated trials (red histogram) and the theoretical PMF (black lines)
data = bernoulli.rvs(p, size=1000)
plt.hist(data, bins=[-0.5, 0.5, 1.5], density=True, color='red', alpha=0.6)
plt.vlines([0, 1], 0, bernoulli.pmf([0, 1], p), colors='black')
plt.title("Bernoulli Distribution")
plt.xticks([0, 1], ['Failure', 'Success'])
plt.show()
Output:
Normal Distribution:
• This figure represents the normal distribution, characterized by its symmetric, bell-shaped curve centred on the mean μ, with spread determined by the standard deviation σ.
Poisson Distribution:
• This figure represents the Poisson distribution, which models the number of events occurring
in a fixed interval of time or space.
• The histogram in blue displays a simulated dataset generated from a Poisson distribution with
a parameter λ (lambda) set to 3.
• The black vertical bars represent the probability mass function (PMF) of the Poisson
distribution. They show the theoretical probabilities of observing a specific number of events
in the given range (0 to 10); these theoretical values are compared with the simulated frequencies in the sketch below.
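To make the comparison between the blue histogram and the black PMF bars concrete, the sketch below (illustrative only; the sample size of 1,000 and the seed are assumptions) tabulates the empirical frequency of each count in a simulated Poisson(λ = 3) sample next to the corresponding theoretical probability:
import numpy as np
from scipy.stats import poisson
data = np.random.default_rng(0).poisson(3, 1000)   # simulated counts, lambda = 3
for k in range(11):
    empirical = np.mean(data == k)                 # relative frequency in the sample
    theoretical = poisson.pmf(k, 3)                # height of the black PMF bar
    print(f"k={k:2d}  empirical={empirical:.3f}  theoretical={theoretical:.3f}")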
Bernoulli Distribution:
• This figure represents the Bernoulli distribution, which models a binary outcome (e.g.,
success or failure) with a single probability parameter (p).
• The histogram in red displays a simulated dataset generated from a Bernoulli distribution
with a probability of success (p) set to 0.3.
• The black vertical lines represent the probability mass function (PMF) of the Bernoulli
distribution. They show the theoretical probabilities of observing a failure (0) or a success
(1).
Applications
Probability distributions have a wide range of applications across various fields, including statistics,
science, engineering, finance, and more. Here are some common applications of probability
distributions:
1. Statistics and Data Analysis:
• Describing and modelling data distributions.
• Hypothesis testing and confidence interval estimation.
• Regression analysis and curve fitting.
2. Quality Control:
• Monitoring and controlling the quality of manufactured products and processes.
• Detecting defects or deviations from expected standards.
3. Finance:
• Modelling financial market returns and asset prices.
• Risk management and portfolio optimization.
• Option pricing using the Black-Scholes model (see the sketch after this list).
4. Natural Phenomena:
• Modelling various natural phenomena with continuous or discrete data, including:
○ Temperature distributions.
○ IQ score distributions.
○ Errors in measurements.
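As a concrete illustration of the option-pricing application mentioned above, the Black-Scholes price of a European call option is built directly on the normal distribution's cumulative distribution function. The sketch below is a minimal example: the helper function black_scholes_call is defined here for illustration, and the spot price, strike, interest rate, volatility, and maturity are arbitrary values.
import numpy as np
from scipy.stats import norm
def black_scholes_call(S, K, r, sigma, T):
    # d1 and d2 terms of the Black-Scholes formula
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    # Call price uses the normal CDF, norm.cdf
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
# Arbitrary example: spot 100, strike 105, 5% rate, 20% volatility, 1-year maturity
print(black_scholes_call(100, 105, 0.05, 0.2, 1.0))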