
Group_2_Practical

The document outlines an experiment on probability distributions, focusing on linear regression and its applications. It details objectives such as understanding probability distributions, dataset specifications, data pre-processing, model selection, training, and evaluation. Additionally, it explains key probability distributions (Normal, Poisson, and Bernoulli) and their applications in various fields.

Experiment No: 2 Probability Distributions

Objectives:

● Intuition
○ Understanding the basics of linear regression
○ Conceptualizing how linear regression works

● Dataset Specification
○ Describing the dataset used in the practical
○ Identifying the variables and their significance

● Data Pre-processing
○ Cleaning and handling missing data
○ Feature scaling and normalization
○ Encoding categorical variables

● Data Splitting
○ Dividing the dataset into training and testing sets
○ Determining the split ratio

● Model Selection
○ Choosing linear regression as the predictive model
○ Justification for selecting linear regression

● Model Training
○ Implementing a simple linear regression model
○ Training the model using the training dataset

● Model Evaluation
○ Calculating the mean squared error (MSE)
○ Computing the R-squared value to assess model fit
○ Interpreting the results

● Generalization and Application
○ Discussing the practical applications of linear regression
○ Generalization of the model for real-world scenarios
○ Sharing insights and takeaways

Intuition

Understanding what a probability distribution is:

A probability distribution is a mathematical function or a model that describes how the values of a
random variable are distributed. In other words, it tells you the likelihood of various possible
outcomes for a random event or experiment. Probability distributions are fundamental in statistics
and probability theory and are used to understand, analyse, and model uncertainty in various real-
world scenarios.

There are two main types of probability distributions:

• Discrete Probability Distribution:
o In a discrete distribution, the random variable can only take on specific, separate
values.
o Each value has an associated probability, given by the probability mass function (PMF),
and these probabilities sum to 1.
o Common examples of discrete distributions include the Bernoulli distribution (binary
outcomes), the Poisson distribution (counting events), and the binomial distribution
(number of successes in fixed trials).
• Continuous Probability Distribution:
o In a continuous distribution, the random variable can take on any value within a range
or interval.
o The distribution is described by a probability density function (PDF). The probability of
any single exact value is zero; probabilities are assigned to intervals and equal the area
under the PDF over that interval.
o The normal distribution, also known as the Gaussian distribution, is a well-known
example of a continuous distribution and is used to model many natural phenomena.
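
The difference between a PMF and a PDF can be checked numerically. The following is a minimal sketch, assuming SciPy is installed; the binomial parameters (n = 10, p = 0.5) and the narrow normal distribution (σ = 0.1) are illustrative choices, not values from the practical.

```python
import numpy as np
from scipy.stats import binom, norm

# Discrete case: binomial with illustrative parameters n = 10, p = 0.5.
n, p = 10, 0.5
k = np.arange(0, n + 1)
pmf_total = binom.pmf(k, n, p).sum()   # PMF values over all outcomes sum to 1
p_exactly_5 = binom.pmf(5, n, p)       # a genuine probability: P(X = 5)

# Continuous case: normal with mean 0 and an illustrative sigma of 0.1.
density_at_zero = norm.pdf(0.0, loc=0.0, scale=0.1)   # a density; it can exceed 1
prob_interval = norm.cdf(0.1, 0.0, 0.1) - norm.cdf(-0.1, 0.0, 0.1)  # P(-0.1 <= X <= 0.1)

print(f"Sum of PMF values:      {pmf_total:.6f}")
print(f"P(X = 5):               {p_exactly_5:.6f}")
print(f"PDF value at x = 0:     {density_at_zero:.4f}")
print(f"P(-0.1 <= X <= 0.1):    {prob_interval:.4f}")
```

The PDF value at zero is larger than 1, which is only possible because it is a density; the interval probability computed from the CDF stays below 1.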

Normal (Gaussian) Distribution:

• The normal distribution, often referred to as the Gaussian distribution, is one of the most
important and widely used probability distributions.
• It is characterized by a bell-shaped curve with two parameters: the mean (μ) and the standard
deviation (σ).
• Many natural phenomena, such as heights and measurement errors, are approximately
normally distributed.
• The distribution is symmetric about the mean and unimodal, with a mean of μ and a variance of σ^2.
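
One consequence of the bell shape is that most of the probability lies within a few standard deviations of the mean. The sketch below computes this with `scipy.stats.norm.cdf`; the helper name `prob_within` is hypothetical and introduced only for illustration.

```python
from scipy.stats import norm

mu, sigma = 0.0, 1.0  # same parameters as the normal distribution program


def prob_within(k, mu=mu, sigma=sigma):
    """Return P(mu - k*sigma <= X <= mu + k*sigma) for X ~ Normal(mu, sigma)."""
    upper = norm.cdf(mu + k * sigma, loc=mu, scale=sigma)
    lower = norm.cdf(mu - k * sigma, loc=mu, scale=sigma)
    return upper - lower


for k in (1, 2, 3):
    print(f"P(within {k} standard deviation(s) of the mean) = {prob_within(k):.4f}")
```

The three values come out at roughly 68%, 95%, and 99.7%, and they do not depend on the particular μ and σ chosen.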

Poisson Distribution:

• The Poisson distribution models the number of events occurring in a fixed interval of time or
space.
• It is characterized by a single parameter λ (lambda), which represents the average number of
events per interval. Both the mean and the variance of the distribution equal λ.
• The distribution is discrete and is often used for events that occur independently of one
another, at random, but with a known average rate.
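
The Poisson probability of observing exactly k events is e^(-λ) λ^k / k!. As a small sketch, the code below evaluates this formula by hand for k = 3 and compares it with `scipy.stats.poisson.pmf`; λ = 5 is taken from the Poisson program in this practical, while k = 3 is an arbitrary choice.

```python
import math
from scipy.stats import poisson

lam = 5   # average number of events per interval
k = 3     # arbitrary event count chosen for illustration

# PMF from the closed-form formula e^(-lam) * lam^k / k!
manual_pmf = math.exp(-lam) * lam**k / math.factorial(k)

# The same value from SciPy
scipy_pmf = poisson.pmf(k, lam)

# Theoretical mean and variance of the distribution
mean, var = poisson.stats(lam, moments="mv")

print(f"Manual PMF  P(X = {k}): {manual_pmf:.6f}")
print(f"SciPy PMF   P(X = {k}): {scipy_pmf:.6f}")
print(f"Mean = {float(mean)}, Variance = {float(var)}")
```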

Bernoulli Distribution:
• The Bernoulli distribution models a binary outcome with two possible results: success (1) and
failure (0).
• It is characterized by a single parameter p, representing the probability of success.
• The Bernoulli distribution is a fundamental building block for modelling binary events and
serves as the basis for the binomial distribution, which models the number of successes in a
fixed number of independent Bernoulli trials.
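
The link between Bernoulli trials and the binomial distribution can be illustrated by simulation. This is a sketch under stated assumptions: p = 0.3 matches the Bernoulli program, while the number of trials per experiment (10), the number of experiments (5000), and the seed are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import bernoulli, binom

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
p = 0.3                          # probability of success
n_trials = 10                    # Bernoulli trials per experiment (illustrative)
n_experiments = 5000             # number of repeated experiments (illustrative)

# Each row is one experiment made of n_trials independent Bernoulli trials.
trials = bernoulli.rvs(p, size=(n_experiments, n_trials), random_state=rng)

# The number of successes per experiment follows a binomial(n_trials, p) distribution.
successes = trials.sum(axis=1)

empirical_mean = successes.mean()
theoretical_mean = binom.mean(n_trials, p)   # n * p
empirical_var = successes.var()
theoretical_var = binom.var(n_trials, p)     # n * p * (1 - p)

print(f"Mean of successes:     empirical {empirical_mean:.3f}, theoretical {theoretical_mean:.3f}")
print(f"Variance of successes: empirical {empirical_var:.3f}, theoretical {theoretical_var:.3f}")
```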

Dataset Specification

In the context of our practical project, we utilized carefully generated datasets that align with specific
probability distributions. These datasets were designed to capture the characteristics of three key
probability distributions, namely the Normal, Poisson, and Bernoulli distributions.

1. Normal Distribution Dataset:

• We crafted a dataset to closely mimic the properties of the normal distribution. This dataset
was created by generating random data points that follow a bell-shaped curve. The data is
characterized by a defined mean (average) and standard deviation (spread). By leveraging
this dataset, we aimed to demonstrate the application of the normal distribution in modelling
real-world phenomena, such as measurements, where data often exhibits a symmetrical
distribution around a central value.

2. Poisson Distribution Dataset:

• To simulate the Poisson distribution, we generated a dataset that emulates the occurrence of
events within a fixed interval. The data in this dataset is characterized by a known average
event rate (λ). The dataset was crafted to illustrate the properties of the Poisson distribution,
which is commonly used to model counts of events that occur independently and unpredictably,
such as customer arrivals at a store, machine failures, or accidents.

3. Bernoulli Distribution Dataset:

• For the Bernoulli distribution, we created a dataset representing binary outcomes with two
possible results, typically labelled as "success" and "failure." The probability of success (p)
was predefined in the dataset, allowing us to illustrate the concept of a Bernoulli trial and
how it serves as the foundation for more complex models like the binomial distribution. This
dataset highlights the application of the Bernoulli distribution in modelling events with only
two possible outcomes, such as coin flips or product defect detection.
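
The three datasets can be generated reproducibly from a single seeded random generator. The following is a minimal sketch using NumPy's `Generator` API rather than the exact calls in the programs of this practical; the parameters and sample sizes mirror those programs, and the seed value 42 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the datasets can be regenerated

# Normal dataset: mean 0, standard deviation 1, 1000 points
normal_data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Poisson dataset: average rate lambda = 5, 1000 points
poisson_data = rng.poisson(lam=5, size=1000)

# Bernoulli dataset: a binomial with a single trial (n = 1), p = 0.3, 100 points
bernoulli_data = rng.binomial(n=1, p=0.3, size=100)

# Summary statistics to compare against the theoretical parameters
for name, data in [("Normal", normal_data),
                   ("Poisson", poisson_data),
                   ("Bernoulli", bernoulli_data)]:
    print(f"{name:9s} n={data.size:4d}  mean={data.mean():.3f}  variance={data.var():.3f}")
```

With these sample sizes the sample means should land close to 0, 5, and 0.3 respectively, although the Bernoulli estimate is noisier because it uses only 100 points.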
Program:
Normal_distribution.py
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters for the normal distribution
mu = 0     # Mean
sigma = 1  # Standard deviation

# Generate random data from a normal distribution
data = np.random.normal(mu, sigma, 1000)

# Plot the histogram of the data (density=True scales the bars to a total area of 1)
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Create a PDF (Probability Density Function) for the distribution
x = np.linspace(-3, 3, 100)
pdf = norm.pdf(x, mu, sigma)
plt.plot(x, pdf, 'k-', lw=2)

plt.title("Normal Distribution")
plt.show()

Poisson_distribution.py
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Parameter for the Poisson distribution (lambda)
lambda_ = 5  # Average number of events in a fixed interval

# Generate random data from a Poisson distribution
data = poisson.rvs(lambda_, size=1000)

# Plot the histogram of the data.
# Bin edges at half-integers centre each bar on its integer count,
# so the bars line up with the PMF lines drawn below.
plt.hist(data, bins=np.arange(-0.5, 15.5, 1), density=True, alpha=0.6, color='b')

# Create a PMF (Probability Mass Function) for the distribution
x = np.arange(0, 15)
pmf = poisson.pmf(x, lambda_)
plt.vlines(x, 0, pmf, colors='k', lw=2)

plt.title("Poisson Distribution Example (λ = 5)")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()

Bernoulli_distribution.py
import matplotlib.pyplot as plt
from scipy.stats import bernoulli

# Parameter for the Bernoulli distribution (probability of success)
p = 0.3

# Generate random data from a Bernoulli distribution
data = bernoulli.rvs(p, size=100)

# Plot the histogram of the data.
# One unit-width bin is centred on each outcome (0 and 1), so with density=True
# the bar heights are the observed proportions and are comparable to the PMF.
plt.hist(data, bins=[-0.5, 0.5, 1.5], density=True, alpha=0.6, color='r')

# Create a PMF for the distribution
x = [0, 1]
pmf = bernoulli.pmf(x, p)
plt.vlines(x, 0, pmf, colors='k', lw=2)

plt.title("Bernoulli Distribution")
plt.xticks([0, 1], ['Failure', 'Success'])
plt.show()
Output:
Normal Distribution:

• This figure represents a continuous probability distribution, specifically the normal
distribution (Gaussian distribution).
• The histogram in green shows a simulated dataset generated from a normal distribution with a
mean (μ) of 0 and a standard deviation (σ) of 1.
• The black line represents the probability density function (PDF) of the normal distribution. It
shows the theoretical probability density at each value in the range; the probability of falling
within an interval is the area under this curve over that interval.
Poisson Distribution:

• This figure represents the Poisson distribution, which models the number of events occurring
in a fixed interval of time or space.
• The histogram in blue displays a simulated dataset generated from a Poisson distribution with
a parameter λ (lambda) set to 5.
• The black vertical bars represent the probability mass function (PMF) of the Poisson
distribution. They show the theoretical probabilities of observing a specific number of events
in the given range (0 to 14).
Bernoulli Distribution:

• This figure represents the Bernoulli distribution, which models a binary outcome (e.g.,
success or failure) with a single probability parameter (p).
• The histogram in red displays a simulated dataset generated from a Bernoulli distribution
with a probability of success (p) set to 0.3.
• The black vertical lines represent the probability mass function (PMF) of the Bernoulli
distribution. They show the theoretical probabilities of observing a failure (0) or a success
(1).

Applications
Probability distributions have a wide range of applications across various fields, including statistics,
science, engineering, finance, and more. Here are some common applications of probability
distributions:
1. Statistics and Data Analysis:
• Describing and modelling data distributions.
• Hypothesis testing and confidence interval estimation.
• Regression analysis and curve fitting.
2. Quality Control:
• Monitoring and controlling the quality of manufactured products and processes.
• Detecting defects or deviations from expected standards.
3. Finance:
• Modeling financial market returns and asset prices.
• Risk management and portfolio optimization.
• Option pricing using the Black-Scholes model.
4. Natural Phenomena:
• Modeling various natural phenomena with continuous or discrete data, including:
o Temperature distributions.
o IQ score distributions.
o Errors in measurements.
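
As one concrete instance of the statistical applications listed above, the sketch below estimates a 95% confidence interval for a mean using the normal distribution. The measurement data are simulated, and the true mean (50), standard deviation (5), sample size (200), and seed are all hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical measurements: 200 simulated readings around a true value of 50
sample = rng.normal(loc=50.0, scale=5.0, size=200)

sample_mean = sample.mean()
# Standard error of the mean, using the sample standard deviation (ddof=1)
standard_error = sample.std(ddof=1) / np.sqrt(sample.size)

# 95% interval from the normal distribution (large-sample approximation)
low, high = norm.interval(0.95, loc=sample_mean, scale=standard_error)

print(f"Sample mean: {sample_mean:.3f}")
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

For small samples a Student's t interval would usually be preferred; the normal approximation is used here because the sample is fairly large.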

References used by students:

• https://www.scribbr.com/statistics/probability-distributions/
• https://www.investopedia.com/terms/p/probabilitydistribution.asp
