Statistical Concepts That Every Data Scientist Should Know | by Dhilip Maharish | AI Mind
Essential statistics concepts that build a basic foundation for modern data scientists
https://pub.aimind.so/statistical-concepts-that-every-data-scientist-should-know-478b90a997ad 1/22
1/27/24, 11:12 PM 📈📊Statistical concepts that every Data Scientist should know👨🏻💻👨🏻🎓!! | by Dhilip Maharish | AI Mind
In the world of data science, a few important ideas make the workflow more
efficient and serve as a powerful tool. These ideas help data scientists
make sense of all the information they work with.
Yes, it is none other than statistics: the foundational concepts on which the
data science process is built.
In this article, we are going to explore how statistical concepts contribute to data
science. Whether you’re new to data science or have been doing it for a while, these
ideas are like a guidebook. They help you understand numbers better and use them
to make smart decisions.
So, let’s dive deep into these essential statistical ideas that make data science so
powerful.
The name itself explains it: data science takes data and applies scientific concepts such as
statistics, probability, and calculus to derive meaningful insights from it.
Data science helps us predict the future, like a weather forecast telling us whether it will
rain tomorrow. It is not magic; it uses numbers and machine learning. It’s about
finding the truth in data. It helps us answer questions and solve problems.
Now let’s look at why statistics is needed in data science and how it contributes
to it.
Statistics provides the necessary tools, methods, and principles for data scientists to
explore, analyze, and extract valuable insights from data. Without statistics, data
science would lack the rigor and reliability needed to make data-driven decisions
and solve complex problems.
✅Inferential Analysis
✅Predictive Modeling
✅Feature Selection
✅Model Evaluation
Statistics is broadly classified into several areas; those that apply to data science
are listed below.
1. Descriptive Statistics
2. Inferential Statistics
3. Regression Analysis
4. Data Sampling
5. Feature Selection
1. Descriptive Statistics
Descriptive statistics is a branch of statistics that deals with the presentation and
summary of data. Its primary goal is to provide a clear and concise overview of
data, allowing for easier interpretation and understanding.
✅Mean (Average)- The sum of all values divided by the number of values; it measures
the center of a numerical distribution.
✅Median- The middle value of the sorted data. It is a more robust measure of the
center than the mean because it is far less affected by outliers.
✅Percentile- A measure that indicates the percentage of data points that are
equal to or below a specific value in a dataset.
✅IQR (Interquartile range)- The range between the first quartile and the
third quartile, which covers the middle 50% of the data.
✅Skewness- It describes the asymmetry in the distribution of data.
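The measures above can be sketched with Python's standard library. This is a minimal illustration, not a reference implementation: the sample values are made up, and `percentile_rank` and `skewness` are hypothetical helper names introduced here. The skewness helper assumes the Fisher-Pearson moment coefficient, which is one common definition among several.

```python
import statistics

# Made-up sample; 40 is a deliberate outlier.
data = [2, 4, 4, 5, 7, 9, 40]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # middle value of the sorted data

# Quartiles; "inclusive" interpolates between the observed min and max.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data


def percentile_rank(values, x):
    """Percentage of values that are equal to or below x."""
    return 100 * sum(v <= x for v in values) / len(values)


def skewness(values):
    """Fisher-Pearson coefficient: third central moment / (variance ** 1.5)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print("mean:", round(mean, 2), "median:", median)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("percentile rank of 7:", round(percentile_rank(data, 7), 1))
print("skewness:", round(skewness(data), 2))
```

With this sample the mean lands well above the median and the skewness is positive, which is the usual signature of a long right tail caused by a large outlier.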
2. Inferential Statistics
Inferential statistics is the branch of statistics that uses sample data to make inferences,
predictions, or generalizations about populations. It helps us
draw conclusions or make statements about a larger group (population) by
analyzing a smaller, representative subset of that group (sample).
✅Statistical Tests- A wide range of statistical tests, such as t-tests, chi-squared tests,
ANOVA, and regression analysis, are used in inferential statistics to compare groups,
assess relationships, and make predictions.
3. Regression Analysis
Regression analysis is a statistical technique used in data science to quantify
the relationship between one or more independent variables (predictors) and a
dependent variable (outcome) in order to make predictions or understand the
impact of the predictors on the outcome.
✅Polynomial Regression- When the relationship between variables appears to be
nonlinear, this model fits a polynomial (e.g., quadratic or cubic) equation to the
data.
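To make the idea of fitting a polynomial concrete, the sketch below fits a quadratic by ordinary least squares, solving the normal equations with a small Gaussian elimination. This is only an illustration under stated assumptions: `fit_polynomial` is a hypothetical helper, the data points are made up (generated from y = 1 + 2x + 3x²), and real projects would more likely rely on a numerical library, since normal equations become numerically fragile for high degrees.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - known) / a[r][r]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Made-up points that follow y = 1 + 2x + 3x^2 exactly.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]

coeffs = fit_polynomial(xs, ys, degree=2)
print("coefficients:", [round(c, 6) for c in coeffs])
print("prediction at x=5:", round(predict(coeffs, 5), 6))
```

Passing `degree=1` to the same function gives a straight-line fit, so comparing the two fits on the same data is a quick way to see when a linear model is not enough.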
4. Data Sampling
Data sampling is a statistical technique used in data science to select a subset of data
points from a larger dataset. The purpose of sampling is to make data analysis more
manageable, cost-effective, and practical, especially when working with large or
extensive datasets.
✅Random Sampling- In this method, every item or member in the population has
an equal chance of being selected for the sample. It reduces selection bias and makes
the sample more likely to be representative of the population.
✅Stratified Sampling- The population is divided into subgroups or strata based on
certain characteristics (e.g., age, gender, location). Then, random sampling is
performed within each stratum to ensure representation of all groups.
✅Systematic Sampling- The starting point is randomly chosen, and then every
“kth” item is included in the sample. It’s simple and often more efficient than simple
random sampling.
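The three sampling methods above can be sketched as follows. The population, the group labels, and the function names are all made up for illustration, and the stratified version assumes proportional allocation (each stratum contributes the same fraction of its members), which is one common choice rather than the only one.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, k)


def stratified_sample(population, key, fraction, rng):
    """Split into strata by key, then sample randomly within each stratum."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]


# Hypothetical population: 100 (id, group) records, 70 in group A and 30 in B.
population = [(i, "A" if i < 70 else "B") for i in range(100)]
rng = random.Random(42)  # fixed seed so the sketch is repeatable

random_sample = simple_random_sample(population, 10, rng)
strat_sample = stratified_sample(population, key=lambda rec: rec[1],
                                 fraction=0.1, rng=rng)
system_sample = systematic_sample(population, 10, rng)

print("random:    ", sorted(rec[0] for rec in random_sample))
print("stratified:", sorted(rec[0] for rec in strat_sample))
print("systematic:", [rec[0] for rec in system_sample])
```

The stratified sample always keeps the 70/30 split of the groups, whereas the simple random sample only matches it on average.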
5. Feature Selection
Feature selection covers the statistical techniques that guide the selection of relevant
features (variables) for predictive modeling. Techniques like feature importance and
correlation analysis help data scientists choose the most influential factors.
✅Mutual Information- Measures the dependency between features and the target
variable, selecting features with high mutual information.
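For discrete features, mutual information can be estimated directly from counts. The sketch below assumes categorical (or already binned) values and uses natural logarithms, so the result is in nats; `mutual_information` is a hypothetical helper and the tiny dataset is made up. Continuous features need a different estimator, which is not shown here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from paired discrete observations."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), joint in count_xy.items():
        p_xy = joint / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


# Made-up binary target and two candidate features.
target = [0, 0, 1, 1, 0, 0, 1, 1]
informative = [0, 0, 1, 1, 0, 0, 1, 1]  # identical to the target
noise = [0, 1, 0, 1, 0, 1, 0, 1]        # independent of the target

scores = {
    "informative": mutual_information(informative, target),
    "noise": mutual_information(noise, target),
}
# Rank features from highest to lowest mutual information.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print("ranking:", ranking)
```

A feature that determines the target completely scores the entropy of the target (ln 2 for a balanced binary target), while an independent feature scores zero, so ranking by this score puts the informative feature first.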
Statistics also supports model evaluation, through metrics such as the following.
✅Mean Absolute Error (MAE)- MAE measures the average absolute difference
between the predicted values and the actual values.
✅Mean Squared Error (MSE)- MSE calculates the average of the squared
differences between predicted and actual values.
✅Root Mean Squared Error (RMSE)- RMSE is the square root of MSE, providing an
interpretable metric in the same units as the target variable.
✅Area Under the Receiver Operating Characteristic (ROC AUC)- It measures the
area under the receiver operating characteristic curve, which plots the trade-off
between true positive rate (recall) and false positive rate at various thresholds.
✅Confusion Matrix- A table that shows the number of true positives, true
negatives, false positives, and false negatives, providing detailed insights into the
performance of a classification model.
✅Precision- Measures the ratio of true positive predictions to the total positive
predictions, emphasizing the model’s ability to avoid false positives.
✅Recall- Measures the ratio of true positives to the total actual positives,
emphasizing the model’s ability to find all relevant instances.
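The metrics above can be computed by hand in a few lines. The sketch below uses made-up predictions, hypothetical function names, and an assumed decision threshold of 0.5 for turning scores into class labels. ROC AUC is computed through its pairwise interpretation (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half), which gives the same value as the area under the ROC curve.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def confusion_counts(labels, preds):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    tp = sum(l == 1 and p == 1 for l, p in zip(labels, preds))
    tn = sum(l == 0 and p == 0 for l, p in zip(labels, preds))
    fp = sum(l == 0 and p == 1 for l, p in zip(labels, preds))
    fn = sum(l == 1 and p == 0 for l, p in zip(labels, preds))
    return tp, tn, fp, fn


def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Regression example (made-up values).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print("MAE:", mae(y_true, y_pred), "MSE:", mse(y_true, y_pred),
      "RMSE:", round(rmse(y_true, y_pred), 4))

# Classification example (made-up labels and predicted scores).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(labels, preds)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
auc = roc_auc(labels, scores)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("precision:", precision, "recall:", recall, "ROC AUC:", auc)
```

Precision and recall depend on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.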
75K+ views | Technical Content Engineer at GeeksforGeeks | Python | SQL | Power BI | Data science and
analysis https://www.linkedin.com/in/dhilip-kumar-ds/