ML Unit 1 CSE
Unit: 1
Dr. Hitesh Singh & Dr. Vivek Kumar Machine Learning Unit 1
1
9/3/2024
Profile
❑ Working on different projects with Technical University, Sofia, Bulgaria, and Aarhus University, Denmark.
Course Objectives
➢This course will serve as a comprehensive introduction to various
topics in machine learning
➢To introduce students to the basic concepts and techniques of
Machine Learning.
➢To become familiar with regression methods, classification
methods, clustering methods.
➢To become familiar with Artificial Neural Networks and Deep
Learning
➢To introduce the concept of Reinforcement Learning and Genetic
Algorithms
➢To focus on the implementation of machine learning for solving
practical problems.
Objectives of Unit
CO-PO and PSO Mapping
CO MAPPING WITH PO
1 1 2
2 1 2 1 1
3 2 1 1 2
4 1 1 1 2
5 1 1 1
6 1 1 1
7 2 1 1 1
Syllabus
Unit-I: Introduction
What is Machine Learning?, Fundamentals of Machine Learning, Key Concepts
and an Example of ML, Basics of Python for Machine Learning, Machine Learning
Libraries, Data Pre-processing, Handling Missing Values, Handling Outliers,
One-Hot Encoding & Feature Scaling
UNIT-WISE OBJECTIVES
PREREQUISITE AND RECAP
Top Artificial Intelligence Stats You
Should Know About in 2024
Elon Musk is one of the most famous personalities associated with AI. He
co-founded OpenAI in 2015 with the vision of developing friendly AI that
benefits all of humanity. OpenAI conducts research in AI and has developed
open-source tools such as OpenAI Universe.
Demis Hassabis is another famous personality in AI, and the co-founder and
CEO of DeepMind. DeepMind is an AI research firm which mostly focuses on
deep learning, AI robotics, neuroscience, unsupervised learning and
generative models, and reinforcement learning. The company is known for
developing AlphaGo, the first AI system to defeat a professional human Go
player.
9/3/2024 Dr. Hitesh Singh KCS 055 ML Unit 1 25
Top 10 People in Artificial
Intelligence
7. Fei-Fei Li
According to ZipRecruiter, these are the top 5 skills required for AI jobs:
• Communication skills
• Knowledge of and experience with Python specifically (more generally,
proficiency in a programming language)
• Digital marketing goals and strategies
• Collaborating effectively with others
• Analytical skills
•Classification
•Regression
• Classification algorithms predict the category (class label) that each data
point belongs to. Real-world examples include spam detection and email
filtering.
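The following is a minimal sketch of classification, not a method taken from these slides. It assumes a k-nearest-neighbours classifier and a hypothetical toy dataset in which each email is described by two made-up features (number of links, number of exclamation marks). The function name `knn_predict` and all data values are illustrative assumptions.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical labelled data: (number of links, number of exclamation marks) -> label
emails = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((5, 4), "spam"),
    ((6, 3), "spam"),
    ((4, 6), "spam"),
]

print(knn_predict(emails, (5, 5)))  # query lies near the "spam" examples
print(knn_predict(emails, (1, 1)))  # query lies near the "ham" examples
```

The labels in `emails` are what makes this supervised learning: the model only predicts categories it has seen in the training data.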
Disadvantages:
•Clustering
•Association
The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by
their purchasing behaviour.
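As an illustration of grouping customers by purchasing behaviour, here is a minimal sketch of k-means clustering on one-dimensional data. The choice of k-means, the function name `kmeans_1d`, the starting centroids, and the monthly spend figures are all assumptions made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(clusters)
print(centroids)
```

No labels are used anywhere: the two groups (low spenders and high spenders) emerge from the similarity of the values alone.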
2) Association
• Association rule learning is an unsupervised learning
technique, which finds interesting relations among variables
within a large dataset.
• The main aim of this learning algorithm is to find the
dependency of one data item on another data item and map
those variables accordingly so that it can generate maximum
profit.
• This algorithm is mainly applied in Market Basket analysis,
Web usage mining, continuous production, etc.
• Some popular algorithms of Association rule learning are
Apriori Algorithm, Eclat, FP-growth algorithm.
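Association rules are usually scored with support and confidence. The sketch below computes both for a hypothetical market-basket dataset. It does not implement Apriori, Eclat, or FP-growth; it only shows the two measures those algorithms rely on. The transactions and function names are assumptions for illustration.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here the rule bread -> milk has support 2/4 and confidence (2/4) / (3/4) = 2/3, meaning two thirds of the baskets containing bread also contain milk.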
Advantages:
•These algorithms can be used for more complicated tasks than supervised
ones because they work on unlabeled datasets.
Disadvantages:
•The output of an unsupervised algorithm can be less accurate because the
dataset is not labelled and the algorithm is not trained on known outputs.
•Working with unsupervised learning is more difficult because the unlabeled
dataset provides no mapping from inputs to outputs.
Advantages:
•It is simple and easy to understand the algorithm.
•It is highly efficient.
•It is used to solve drawbacks of Supervised and Unsupervised
Learning algorithms.
Disadvantages:
•Iteration results may not be stable.
•We cannot apply these algorithms to network-level data.
•Accuracy is low.
Introduction (CO1)
4. Reinforcement Learning
•Resource Management:
The paper "Resource Management with Deep Reinforcement Learning" showed how RL can be
used to automatically learn to allocate and schedule computer resources among waiting
jobs in order to minimize average job slowdown.
•Robotics:
RL is widely used in robotics applications. Robots are used in industry and
manufacturing, and reinforcement learning makes these robots more capable.
Several industries aim to build intelligent robots using AI and machine learning
technology.
•Text Mining
Text mining, one of the major applications of NLP, is now being implemented with the
help of reinforcement learning by Salesforce.
Advantages
•It helps in solving complex real-world problems that are difficult to solve with
conventional techniques.
•The learning model of RL is similar to the way human beings learn; hence highly
accurate results can be obtained.
•Helps in achieving long term results.
Disadvantage
Data Preprocessing
• Data preprocessing is the process of transforming
raw data into an understandable format.
• It is also an important step in data analytics as we
cannot work with raw data.
• The quality of the data should be checked before
applying machine learning or data mining
algorithms.
1. Data Cleaning:
2. Data Transformation:
3. Data Integration:
• Combining data from multiple sources into a unified dataset, resolving any
inconsistencies or conflicts in data formats, naming conventions, or units.
4. Data Reduction:
5. Data Discretization:
6. Data Normalization:
7. Data Augmentation:
8. Data Balancing:
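Step 3 above (data integration) can be illustrated with a small sketch. It assumes two hypothetical sources that describe the same quantity with different field names and units. The record layout, the field names, and the helper `from_source_b` are invented for this example.

```python
# Hypothetical records from two sources with different naming conventions and units
source_a = [{"id": 1, "height_cm": 170.0}, {"id": 2, "height_cm": 182.5}]
source_b = [{"ID": 3, "height_in": 65.0}]


def from_source_b(record):
    """Rename fields and convert inches to centimetres (1 in = 2.54 cm)."""
    return {"id": record["ID"], "height_cm": round(record["height_in"] * 2.54, 1)}


# Unified dataset: every record now uses the same field names and units.
unified = source_a + [from_source_b(r) for r in source_b]
for row in unified:
    print(row)
```

Resolving the naming and unit conflicts before merging means later steps can treat `unified` as a single consistent dataset.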
1. Domain Knowledge:
• Start by gaining a deep understanding of the domain you're working in. This includes understanding
the business context, the problem you're trying to solve, and the relevant factors that might
influence the outcomes.
2. Data Exploration:
• Perform exploratory data analysis (EDA) to get a comprehensive view of the dataset. This involves
techniques like summary statistics, data visualization (histograms, scatter plots, etc.), and
correlation analysis to understand relationships between variables.
• Based on domain knowledge and EDA results, identify variables that are likely
to be relevant to the problem at hand. Look for variables that have a strong
impact on the target variable or exhibit interesting patterns and relationships.
4. Handling Redundancy:
• Identify and handle redundant variables, i.e., variables that provide similar
information. Redundant variables can increase model complexity without
adding meaningful insights. Techniques like correlation analysis or variance
inflation factor (VIF) can help identify and address redundancy.
5. Feature Engineering:
6. Dimensionality Reduction:
7. Feature Importance:
8. Iterative Process:
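The correlation analysis mentioned in steps 2 and 4 can be sketched as follows. This is a minimal example that flags pairs of variables whose absolute Pearson correlation exceeds a threshold. The feature values, the 0.95 threshold, and the function names are assumptions, and VIF is not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.95):
    """Return pairs of feature names whose |correlation| is at least threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical features: the two height columns carry the same information.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [40, 22, 35, 28],
}
print(redundant_pairs(features))
```

One variable from each flagged pair can usually be dropped, although domain knowledge should guide which one to keep.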
• Data Selection:
• The process starts with selecting the relevant data
from one or more databases or data sources.
• This involves identifying the data sources,
understanding their structure, and determining
which data subsets are necessary for the analysis.
• Data Preprocessing:
• Data Transformation:
• Data Mining:
• Data mining is the core step in KDD, where algorithms and techniques are applied to the
transformed data to extract patterns, relationships, and insights. Common data mining
techniques include:
• Classification: Predicting categorical outcomes or classes based on input variables.
• Regression: Predicting continuous numerical values based on input variables.
• Clustering: Grouping similar data points together based on their attributes.
• Association Rule Mining: Discovering relationships and associations between variables in
large datasets (e.g., market basket analysis).
• Anomaly Detection: Identifying outliers or unusual patterns that deviate from the norm.
• Sequential Pattern Mining: Identifying sequences or patterns in time-series or sequential
data.
• Pattern Evaluation:
• Once patterns and insights are extracted, they are evaluated based on their
significance, reliability, and relevance to the problem domain. This involves
statistical analysis, validation techniques, and domain expert feedback to
assess the quality of discovered patterns.
• Knowledge Representation:
• The discovered patterns and insights are represented in a meaningful and
interpretable format that can be used for decision-making. This may involve
visualizations, rules, graphs, or other forms of representation that facilitate
understanding and utilization of the knowledge.
• Knowledge Utilization:
• Data Smoothing:
• Moving Average: Smooth time-series data by calculating moving
averages to reduce noise and highlight trends.
• Filtering Techniques: Apply filters (e.g., median filter, Gaussian filter)
to remove noise from signals or images.
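The two smoothing techniques named above can be sketched in a few lines of plain Python. The window size of 3, the sample series, and the function names are assumptions chosen for illustration. Both functions return only the values for which a full window is available.

```python
from statistics import median


def moving_average(series, window=3):
    """Mean of each full sliding window over the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def median_filter(series, window=3):
    """Median of each full sliding window over the series."""
    return [
        median(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


# Hypothetical noisy series with a single spike (50)
noisy = [10, 12, 50, 13, 14, 15]
print(moving_average(noisy))
print(median_filter(noisy))
```

The spike still pulls the moving average upwards in every window that contains it, while the median filter suppresses it entirely, which is why median filters are often preferred for isolated outliers.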
• Feature Engineering:
• Feature Selection: Choose relevant features and exclude noisy or
irrelevant features from the analysis.
• Feature Creation: Create new features or combine existing features
to capture important information and reduce noise.
• Modeling Techniques:
3. Custom Binning:
1. Categorical Binning:
• Explanation: Similar to numerical binning, categorical binning
involves grouping categorical values into broader categories or
bins. This can be based on similarity, frequency, or domain
knowledge.
• Example:
Category A: {Apple, Banana, Cherry}
Category B: {Orange, Mango
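Categorical binning of this kind can be sketched as a lookup table. The bin contents below reuse only the values visible in the example above. The fallback bin "Other", the test value "Kiwi", and the function name `bin_category` are assumptions added for illustration.

```python
# Bins built from the values shown in the example above
bins = {
    "Category A": {"Apple", "Banana", "Cherry"},
    "Category B": {"Orange", "Mango"},
}

# Invert the mapping so each raw value points to its bin.
value_to_bin = {
    value: name for name, members in bins.items() for value in members
}


def bin_category(value, default="Other"):
    """Return the bin for a raw category, or a default bin if it is unknown."""
    return value_to_bin.get(value, default)


fruits = ["Apple", "Mango", "Kiwi"]
binned = [bin_category(f) for f in fruits]
print(binned)
```

A default bin is one common way to handle rare or unseen categories so that downstream encoders always receive a known value.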
➢https://nptel.ac.in/courses/106106093/
➢https://www.youtube.com/watch?v=m-aKj5ovDfg
➢https://www.youtube.com/watch?v=G4NYQox4n2g
➢https://nptel.ac.in/courses/106/105/106105174/
DAILY QUIZ
o Graphical.
o Geometric.
o Icon-based.
o Pixel-based.
o Preprocessed.
o Cleaned.
o Real-world
o Transformed.
7. The term that is not associated with data cleaning process is ______.
o domain consistency
o deduplication.
o disambiguation.
o segmentation.
WEEKLY ASSIGNMENT
Q5: Explain the steps of knowledge discovery in databases.
WEEKLY ASSIGNMENT(CONT’d)
Q6: Among the various data reduction techniques, which one has the
minimum loss of information content? Explain it briefly.
[CO1]
MCQs
o selection.
o preprocessing.
o transformation.
o interpretation.
o Data.
o Information.
o Query.
o Useful information.
o Data.
o Information.
o Query.
o Process.
o Graphical.
o Geometric.
o Icon-based.
o Pixel-based.
o Induction.
o Compression.
o Approximation.
o Substitution.
o dimensionality curse.
o dimensionality reduction.
o cleaning.
o Overfitting.
10. The term that is not associated with data cleaning process is
______.
o domain consistency.
o deduplication.
o disambiguation.
o segmentation.
14. The data set {brown, black, blue, green, red} is an example of a
(select one):
o Continuous attribute
o Ordinal attribute
o Numeric attribute
o Nominal attribute
OLD QUESTION PAPERS
B.Tech
(SEM VI) THEORY EXAMINATION 2017-18
DATAWAREHOUSING AND DATA MINING
Time: 3 Hours Total Marks: 100
Note: 1. Attempt all Sections.
If any required data is missing, choose it suitably.
SECTION A
1. Attempt all questions in brief. 2 x 10 = 20
a. Draw the diagram for key steps of data mining.
b. Define the term Support and Confidence.
c. What are attribute selection measures? What is the
drawback of information gain?
d. Differentiate between classification and clustering
e. Write the statement for Apriori Algorithm.
f. What are the drawbacks of the k-means algorithm?
g. What is Chi Square test?
h. Compare Roll up, Drill down operation.
i. What are hierarchical methods for clustering?
j. Name main features of Genetic Algorithm.
SECTION B
Attempt any three of the following: 10 x 3 = 30
EXPECTED QUESTIONS FOR UNIVERSITY EXAM
1. Discuss the steps involved in KDD process.
SUMMARY
➢ The major preprocessing tasks in data warehousing are data cleaning,
integration, reduction, and transformation.
Thank You