
1/27/24, 11:12 PM 📈📊Statistical concepts that every Data Scientist should know👨🏻‍💻👨🏻‍🎓!!

| by Dhilip Maharish | AI Mind


📈📊Statistical concepts that every Data Scientist should know👨🏻‍💻👨🏻‍🎓!!
Dhilip Maharish · Follow
Published in AI Mind
7 min read · Aug 26, 2023


Essential statistics concepts for building a solid foundation as a modern data scientist 📊

https://pub.aimind.so/statistical-concepts-that-every-data-scientist-should-know-478b90a997ad 1/22

Source: Pixels images


In the world of data science, a few important ideas make the workflow more
efficient and act as a super tool. These ideas help data scientists make sense
of all the information they work with.

Yes, it is none other than statistics: the foundational concepts on which the
data science process is built.

In this article, we are going to explore how statistical concepts contribute to data
science. Whether you’re new to data science or have been doing it for a while, these
ideas are like a guidebook. They help you understand numbers better and use them
to make smart decisions.

So, let’s dive deep into these essential statistical ideas that make data science so
powerful.

First, let’s get clear on what data science is.

The name itself explains it: taking data and applying scientific concepts such as
statistics, probability, and calculus to derive meaningful insights from it.

Data science is about understanding past information and predicting future information.

Source: Pixels Images


Example:

Data science helps us predict the future, like a weather forecast telling us whether it
will rain tomorrow. It is not magic; it uses numbers and machine learning. It’s about
finding the truth in data. It helps us answer questions and solve problems.

Now let’s look at why statistics is needed in data science and how it contributes
to it.

Statistics is the backbone of data science.

It provides the necessary tools, methods, and principles for data scientists to
explore, analyze, and extract valuable insights from data. Without statistics, data
science would lack the rigor and reliability needed to make data-driven decisions
and solve complex problems.

It contributes to every stage of the data science process, such as:

✅Data Exploration and Summarization

✅Data Cleaning and Preprocessing

✅Inferential Analysis

✅Predictive Modeling

✅Feature Selection

✅Model Evaluation

✅Time Series Analysis


Source: Pixels Images

The main areas of statistics that apply in data science are listed below.

1. Descriptive Statistics

2. Inferential Statistics

3. Regression Analysis

4. Data Sampling

5. Feature Selection

6. Statistical Evaluation on Model

1. Descriptive Statistics
Descriptive statistics is a branch of statistics that deals with the presentation and
summary of data. Its primary goal is to provide a clear and concise overview of
data, allowing for easier interpretation and understanding.

It involves various concepts that make data easier to understand:

✅Mean (Average)- The sum of the values divided by their count, giving the average
value of numerical data.

✅Median- The middle value of the sorted data. Unlike the mean, it is barely affected
by outliers, so it often describes the typical value better when the data is skewed.

✅Variance- Measures the spread of the data as the average squared deviation from
the mean.

✅Standard Deviation- The square root of the variance, providing a more
interpretable measure of data variability because it is in the same units as the data.

✅Percentile- The value at or below which a given percentage of the data points in a
dataset fall.

✅IQR (Interquartile Range)- The range between the first quartile and the third
quartile, which covers the middle 50% of the data.

✅Histogram- A chart showing the frequency or count of data points falling into
specific intervals (bins) along the horizontal axis.

✅PDF (Probability Density Function)- A function that describes the relative
likelihood of a continuous random variable taking values near a given point. The
probability of falling within a given range is the area under the PDF over that range.

✅CDF (Cumulative Distribution Function)- A function that gives the cumulative
probability that a random variable is less than or equal to a specific value.

✅Skewness- It describes the asymmetry in the distribution of data.

✅Kurtosis- It measures the tailedness of the data distribution.
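As a rough illustration of the measures listed above, here is a minimal sketch that uses only Python's standard `statistics` module. The `data` list is a made-up sample (it is not from this article) with one deliberate outlier, and the 1.5 x IQR cut-off is a common rule of thumb assumed here for the example.

```python
import statistics

# Hypothetical sample: nine similar values plus one deliberate outlier (95).
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 95]

mean = statistics.mean(data)          # pulled upward by the outlier
median = statistics.median(data)      # middle of the sorted values
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # square root of the sample variance

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                         # spread of the middle 50% of the data

# Rule of thumb (an assumption here): flag points more than 1.5 * IQR
# outside the quartiles as potential outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]

print(f"mean={mean:.2f} median={median} std_dev={std_dev:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
```

On this sample the mean sits well above the median, which shows how a single outlier can distort the mean while leaving the median almost unchanged.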

Source: Pixels Images

2. Inferential Statistics
Inferential statistics is the branch of statistics that uses sample data to make
inferences, predictions, or generalizations about populations. It helps us
to draw conclusions or make statements about a larger group (population) by
analyzing a smaller, representative subset of that group (sample).

✅Hypothesis Testing- Formulates hypotheses about population parameters (e.g.,
population mean) and uses sample data to test whether these hypotheses are
supported or should be rejected.

✅Estimation- Estimates population parameters based on sample data.

✅Confidence Interval- Provides a range of values within which a population
parameter is likely to fall, at a stated confidence level (e.g., 95%).

✅Statistical Tests- A wide range of statistical tests, such as t-tests, chi-squared tests,
ANOVA, and regression analysis, are used in inferential statistics to compare groups,
assess relationships, and make predictions.

✅Level of Significance- Often denoted by α, it represents the probability of
making a Type I error, i.e., incorrectly rejecting a true null hypothesis.
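To make estimation, confidence intervals, hypothesis testing, and α a little more concrete, here is a minimal sketch using only the Python standard library. The `sample` values, the null value `mu0`, and `alpha = 0.05` are all made up for illustration. It uses a normal (z) approximation, which is a simplifying assumption; for small samples a t-test would usually be the more appropriate choice.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 30 measurements (illustrative values only).
sample = [
    52.1, 49.8, 53.4, 51.0, 50.6, 54.2, 48.9, 52.7, 51.5, 50.2,
    53.0, 52.4, 49.5, 51.8, 53.6, 50.9, 52.2, 51.1, 54.0, 50.4,
    52.9, 51.6, 49.9, 53.1, 52.0, 50.7, 51.3, 53.8, 52.5, 51.9,
]
mu0 = 50.0    # null hypothesis: the population mean equals 50
alpha = 0.05  # level of significance (probability of a Type I error)

n = len(sample)
sample_mean = statistics.mean(sample)                # point estimate
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Two-sided z-test (normal approximation, assumed for simplicity).
z = (sample_mean - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

# Matching (1 - alpha) confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

print(f"estimate={sample_mean:.2f}, CI=({ci_low:.2f}, {ci_high:.2f})")
print(f"z={z:.2f}, p={p_value:.4f}, reject null: {reject_null}")
```

The test and the interval agree by construction: the null hypothesis is rejected at level α exactly when `mu0` falls outside the (1 - α) confidence interval.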

Source: Pixels Images

3. Regression Analysis
Regression analysis is a statistical technique used in data science to quantify
the relationship between one or more independent variables (predictors) and a
dependent variable (outcome) in order to make predictions or understand the
impact of the predictors on the outcome.

✅Linear Regression- Models the relationship between a dependent variable and one
or more independent variables by fitting a linear equation to the data.

✅Multiple Regression- Incorporates two or more independent variables to predict
a single dependent variable.

✅Polynomial Regression- When the relationship between variables appears to be
nonlinear, this model fits a polynomial (e.g., quadratic or cubic) equation to the
data.

✅Ridge Regression and Lasso Regression- Variations of linear regression that
incorporate regularization techniques to handle multicollinearity and prevent
overfitting.
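The idea of fitting a linear equation, and of shrinking it with regularization, can be sketched in a few lines of plain Python. The `x` and `y` values and the penalty strength `lam` are made up for this example. The ridge formula shown is the closed form for a single feature with an unpenalized intercept; it is a simplified sketch, not a general implementation.

```python
# Hypothetical data: y is roughly 2 * x with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.8]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Ordinary least squares: the slope and intercept that minimise squared error.
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Ridge regression with penalty strength lam (an assumed value): the penalty
# is added to the denominator, which shrinks the slope towards zero.
lam = 5.0
ridge_slope = s_xy / (s_xx + lam)
ridge_intercept = y_mean - ridge_slope * x_mean


def predict(x_new, b0=intercept, b1=slope):
    """Predict y for a new x using the fitted least-squares line."""
    return b0 + b1 * x_new


print(f"OLS:   y = {intercept:.2f} + {slope:.2f} * x")
print(f"Ridge: y = {ridge_intercept:.2f} + {ridge_slope:.2f} * x")
```

A larger `lam` shrinks the ridge slope further, trading a little bias for lower variance. Lasso uses an absolute-value penalty instead and has no equally simple closed form in general.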

Photo by Enayet Raheem on Unsplash

4. Data Sampling
Data sampling is a statistical technique used in data science to select a subset of data
points from a larger dataset. The purpose of sampling is to make data analysis more
manageable, cost-effective, and practical, especially when working with large or
extensive datasets.

✅Random Sampling- In this method, every item or member in the population has
an equal chance of being selected for the sample. It reduces selection bias and makes
it more likely that the sample is representative of the population.

✅Stratified Sampling- The population is divided into subgroups or strata based on
certain characteristics (e.g., age, gender, location). Then, random sampling is
performed within each stratum to ensure representation of all groups.

✅Systematic Sampling- The starting point is randomly chosen, and then every
“kth” item is included in the sample. It’s simple and often more efficient than simple
random sampling.
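The three sampling methods above can be sketched with Python's standard `random` module. The `population` below is a made-up list of 100 records split 80/20 into two hypothetical groups, and the sample size of 10 and the seed are arbitrary choices for the illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 records, 80 in group "A" and 20 in group "B".
population = [{"id": i, "group": "A" if i < 80 else "B"} for i in range(100)]
sample_size = 10

# Random sampling: every record has the same chance of being picked.
simple = random.sample(population, sample_size)

# Stratified sampling: split into strata, then sample within each stratum
# in proportion to its share of the population.
strata = {}
for row in population:
    strata.setdefault(row["group"], []).append(row)

stratified = []
for group, rows in strata.items():
    k = round(sample_size * len(rows) / len(population))
    stratified.extend(random.sample(rows, k))

# Systematic sampling: random starting point, then every k-th record.
step = len(population) // sample_size
start = random.randrange(step)
systematic = population[start::step]

print([r["id"] for r in simple])
print([(r["id"], r["group"]) for r in stratified])
print([r["id"] for r in systematic])
```

With proportional allocation the stratified sample always contains 8 records from group A and 2 from group B, whereas the simple random sample can over- or under-represent the smaller group by chance.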

Source: Pixels Images

5. Feature Selection
These are the statistical techniques that guide the selection of relevant features
(variables) for predictive modeling. Techniques like feature importance and correlation
analysis help data scientists choose the most influential factors.

✅Correlation-Based Feature Selection- Selects features based on their correlation
with the target variable, removing features that are redundant because they are
highly correlated with one another.

✅Tree-Based Feature Importance- Decision trees and ensemble models (e.g.,
Random Forest, Gradient Boosting) can provide feature importance scores, which
can be used to select the most important features.

✅Mutual Information- Measures the dependency between features and the target
variable, selecting features with high mutual information.

✅L1 Regularization (Lasso)- Encourages sparsity in the model by penalizing the
absolute values of feature coefficients, effectively selecting a subset of features.
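Correlation-based selection is the easiest of these to sketch without extra libraries. The example below is one simple greedy variant, not a standard algorithm from any particular library: the feature names, the data, and both thresholds (`MIN_RELEVANCE`, `MAX_REDUNDANCY`) are assumptions made up for the illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_mean, b_mean = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical dataset: "size" drives the target, "size_copy" is a near
# duplicate of "size", and "noise" is essentially unrelated to the target.
features = {
    "size": [2, 4, 6, 8, 10, 12, 14, 16],
    "size_copy": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.9, 16.1],
    "noise": [5, 3, 6, 2, 7, 1, 8, 4],
}
target = [1, 2, 3, 4, 5, 6, 7, 8]

MIN_RELEVANCE = 0.3   # assumed threshold on |corr(feature, target)|
MAX_REDUNDANCY = 0.9  # assumed threshold on |corr(feature, kept feature)|

relevance = {name: abs(pearson(col, target)) for name, col in features.items()}

# Greedy pass: visit features from most to least relevant and keep one only
# if it is relevant enough and not redundant with a feature already kept.
selected = []
for name in sorted(relevance, key=relevance.get, reverse=True):
    if relevance[name] < MIN_RELEVANCE:
        continue  # too weakly related to the target
    if any(abs(pearson(features[name], features[kept])) > MAX_REDUNDANCY
           for kept in selected):
        continue  # redundant with a feature we already kept
    selected.append(name)

print(relevance)
print(selected)
```

Pearson correlation only captures linear relationships, which is one reason mutual information or tree-based importance scores are often used alongside it.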

Source: Pixels Images

6. Statistical Evaluation on Model
It involves various statistical metrics and tests to quantitatively measure how well
the model performs.

✅Accuracy- Accuracy measures the proportion of correctly classified instances in
a classification model.

✅Mean Absolute Error (MAE)- MAE measures the average absolute difference
between the predicted values and the actual values.

✅Mean Squared Error (MSE)- MSE calculates the average of the squared
differences between predicted and actual values.

✅Root Mean Squared Error (RMSE)- RMSE is the square root of MSE, providing an
interpretable metric in the same units as the target variable.

✅R-squared (R²) or Coefficient of Determination- R² measures the proportion of
the variance in the dependent variable that is explained by the independent
variables in the model.

✅Area Under the ROC Curve (ROC AUC)- It measures the area under the receiver
operating characteristic curve, which plots the true positive rate (recall) against
the false positive rate at various thresholds.

✅Confusion Matrix- A table that shows the number of true positives, true
negatives, false positives, and false negatives, providing detailed insights into the
performance of a classification model.

✅Precision- Measures the ratio of true positive predictions to the total positive
predictions, emphasizing the model’s ability to avoid false positives.

✅Recall- Measures the ratio of true positives to the total actual positives,
emphasizing the model’s ability to find all relevant instances.

✅F1-Score- The harmonic mean of precision and recall, offering a balance
between the two metrics.
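Most of the metrics above reduce to a few lines of arithmetic. The sketch below computes them in plain Python from made-up labels and predictions; `y_true`, `y_pred`, `actual`, and `predicted` are hypothetical values chosen only for illustration.

```python
import math

# --- Classification metrics (hypothetical labels, 1 = positive class) ---
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Confusion matrix counts.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# --- Regression metrics (hypothetical actual vs predicted values) ---
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [a - p for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)

mean_actual = sum(actual) / len(actual)
ss_res = sum(e ** 2 for e in errors)                  # unexplained variation
ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
r2 = 1 - ss_res / ss_tot

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

The sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall would divide by zero and would need an explicit guard.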


Photo by ThisisEngineering RAEng on Unsplash

If you liked this article, please give it a clap.

Follow me on Medium: Dhilip Maharish

Follow me on LinkedIn: Dhilip Maharish
