Session1-DataCharacteristics
➢Handling Complexity
➢Decision-Making and Optimization
➢Adaptability to Changing Environments
➢Automation and Efficiency
What is Machine Learning?
• Machine learning is a set of methods that can automatically
detect patterns in data.
• These uncovered patterns are then used to predict future
data, or to perform other kinds of decision-making under
uncertainty.
• The key premise is learning from data.
• Addresses the problem of analyzing huge bodies of data so
that they can be understood.
• Providing techniques to automate the analysis and
exploration of large, complex data sets.
• Tools, methodologies, and theories for revealing patterns in
data – critical step in knowledge discovery.
Machine Learning Process
Examples
• Machine learning plays a key role in many areas of science, finance and industry:
• Predict whether a patient, hospitalized due to a heart attack, will have a second
heart attack.
• The prediction is to be based on demographic, diet and clinical measurements
for that patient.
• Predict the price of a stock 6 months from now, on the basis of company
performance measures and economic data.
• Identify the numbers in a handwritten ZIP code, from a digitized image.
• Estimate the amount of glucose in the blood of a diabetic person, from the
infrared absorption spectrum of that person’s blood.
• Identify the risk factors for prostate cancer, based on clinical and demographic
variables.
The Modelling Process
1. Define Business Problem
2. Define Hypotheses
3. Collect Data
4. Analyze Data
5. Develop Predictive Model
6. Optimize Model
7. Determine Best Fit
8. Utilize Model/Score New Data
9. Monitor Model
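Steps 3 to 8 of the process above can be sketched in plain Python. This is only an illustration under stated assumptions, not a prescribed method: the data is synthetic, the two candidate models (a constant predictor and a least-squares line) are chosen for brevity, and all function and variable names are hypothetical.

```python
import random
from statistics import mean

def fit_constant(xs, ys):
    """Candidate A: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Candidate B: ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mean_squared_error(model, xs, ys):
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

# Step 3: collect data (synthetic here: y = 2x + 1 plus Gaussian noise).
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Step 4: analyze data (here just a shuffle and a train/validation split).
pairs = list(zip(xs, ys))
rng.shuffle(pairs)
train, valid = pairs[:70], pairs[70:]
train_x, train_y = zip(*train)
valid_x, valid_y = zip(*valid)

# Steps 5-6: develop the candidate models and fit them to the training data.
candidates = {
    "constant": fit_constant(train_x, train_y),
    "line": fit_line(train_x, train_y),
}

# Step 7: determine the best fit by comparing error on held-out data.
errors = {name: mean_squared_error(m, valid_x, valid_y)
          for name, m in candidates.items()}
best_name = min(errors, key=errors.get)
best_model = candidates[best_name]

# Step 8: score new, unseen inputs with the selected model.
new_x = [10.5, 12.0]
scores = [best_model(x) for x in new_x]
print(best_name, errors, scores)
```

Step 9 (monitoring) would amount to recomputing the validation error on fresh data over time and refitting when it degrades.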
Background of Learning Process
bigml@iiitkottayam
Contents
1. Introduction to Data
2. Data Analytics
2. Diagnostic Analytics
Diagnostic analytics is used to
identify the root causes or reasons
behind a trend or anomaly
observed in the data.
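A common first step in diagnosing an anomaly is to break the overall change down by segment. The sketch below assumes made-up weekly sales figures per region; the names `last_week`, `this_week`, and `change_by_segment` are hypothetical. It localizes where the drop occurred, which narrows the search for a cause but does not by itself prove one.

```python
# Hypothetical weekly sales by region (invented numbers for illustration).
last_week = {"north": 120, "south": 115, "east": 130, "west": 125}
this_week = {"north": 118, "south": 70, "east": 131, "west": 124}

def change_by_segment(before, after):
    """Return the change in value for every segment present in `before`."""
    return {segment: after[segment] - before[segment] for segment in before}

changes = change_by_segment(last_week, this_week)
total_change = sum(changes.values())

# The segment with the most negative change is the main driver of the drop.
main_driver = min(changes, key=changes.get)
share = changes[main_driver] / total_change

print(f"total change: {total_change}")
print(f"largest drop: {main_driver} ({share:.0%} of the total change)")
```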
Data Analytics using AI
3. Predictive Analytics
Predictive analytics helps forecast future events based on historical
data and trends.
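As a minimal sketch of forecasting from historical data, the example below fits a straight-line trend by least squares and extrapolates it two periods ahead. The history values and the helper name `fit_trend` are assumptions made for illustration; real forecasting would usually also account for seasonality and uncertainty.

```python
from statistics import mean

def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    xs = list(range(len(values)))
    mx, my = mean(xs), mean(values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical monthly demand for the last six months.
history = [100, 104, 109, 113, 118, 122]

intercept, slope = fit_trend(history)

# Forecast the next two months by extending the fitted line.
horizon = 2
forecast = [intercept + slope * (len(history) + step) for step in range(horizon)]
print([round(value, 1) for value in forecast])
```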
4. Prescriptive Analytics
Prescriptive analytics suggests actions to take based on the data
analysis and predictions, providing recommendations on the best
course of action.
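One simple way to turn predictions into a recommendation is to evaluate each candidate action against the predicted scenarios and pick the one with the best expected outcome. The sketch below assumes a hypothetical stocking decision: the demand scenarios, probabilities, prices, and the function name `expected_profit` are all invented for illustration.

```python
# Hypothetical output of a predictive model: demand level -> probability.
demand_scenarios = {80: 0.3, 100: 0.5, 120: 0.2}

UNIT_COST = 6    # assumed cost to stock one unit
UNIT_PRICE = 10  # assumed revenue per unit sold

def expected_profit(stock):
    """Expected profit of stocking `stock` units across all demand scenarios."""
    expected_sold = sum(prob * min(stock, demand)
                        for demand, prob in demand_scenarios.items())
    return UNIT_PRICE * expected_sold - UNIT_COST * stock

# Candidate actions: how many units to stock.
candidate_stock_levels = [80, 100, 120]

# Recommend the action with the highest expected profit.
recommended = max(candidate_stock_levels, key=expected_profit)
print(recommended, expected_profit(recommended))
```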
Exploratory Data Analysis:
• Exploratory Data Analysis (EDA) is the process of analyzing and
visualizing data sets to understand their key characteristics, uncover
patterns, locate outliers, and identify relationships between
variables.
• EDA is normally carried out as a preliminary step before
undertaking more formal statistical analyses or modeling.
Key aspects of EDA:
• Distribution of Data: Examining the distribution of data points to understand their
range, central tendencies (mean, median), and dispersion (variance, standard
deviation).
• Graphical Representations: Utilizing charts such as histograms, box plots, scatter
plots, and bar charts to visualize relationships within the data and distributions of
variables.
• Outlier Detection: Identifying unusual values that deviate from other data points.
Outliers can influence statistical analyses and might indicate data entry errors or
unique cases.
• Correlation Analysis: Checking the relationships between variables to understand
how they might affect each other. This includes computing correlation coefficients
and creating correlation matrices.
• Handling Missing Values: Detecting and deciding how to address missing data
points, whether by imputation or removal, depending on their impact and the
amount of missing data.
• Summary Statistics: Calculating key statistics that provide insight into data trends.
• Testing Assumptions: Many statistical tests and models assume the data meet
certain conditions. EDA helps verify these assumptions.
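Several of the aspects above (summary statistics, missing values, outlier detection, correlation) can be sketched with Python's standard `statistics` module alone. The two columns below are invented, `None` is assumed to mark a missing value, and the helper names are hypothetical. The outlier rule used is the common 1.5 × IQR convention, which is one of several possible choices.

```python
from statistics import mean, median, pstdev, quantiles

# Hypothetical records; None marks a missing value.
ages = [23, 25, 31, 35, 41, None, 29, 33, 38, 95]
incomes = [28, 30, 39, 45, 52, 40, None, 42, 49, 50]

def count_missing(values):
    """Handling missing values: first, count how many there are."""
    return sum(v is None for v in values)

def summarize(values):
    """Distribution and summary statistics over the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "mean": mean(present),
        "median": median(present),
        "std": pstdev(present),
        "min": min(present),
        "max": max(present),
    }

def iqr_outliers(values, k=1.5):
    """Outlier detection: flag values beyond k * IQR from the quartiles."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

def pearson(xs, ys):
    """Correlation analysis: Pearson r over rows where both values exist."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

print("missing ages:", count_missing(ages))
print("age summary:", summarize(ages))
print("age outliers:", iqr_outliers(ages))
print("age/income correlation:", round(pearson(ages, incomes), 3))
```

On larger tabular data the same checks are usually done with a data-frame library rather than hand-written helpers; the logic stays the same.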
Python libraries for EDA