5/30/2024
Fundamentals of Data Mining 1
Course Title: Fundamentals of Data Mining
Course Code: CSI-508
Credit Hours: 3(3-0)
Instructor: Naveed Abbas
Lecture#
Reference Book: fundamentals of data mining 4th edition by the morgan pdf
Download Link: https://user.engineering.uiowa.edu/~comp/Public/Kantardzic.pdf
❑ What is data preparation?
• Data preparation is the process of gathering, combining, structuring
and organizing data so it can be used in business intelligence (BI), analytics
and data visualization applications. The components of data
preparation include data preprocessing, profiling, cleansing, validation
and transformation; it often also involves pulling together data from
different internal systems and external sources.
❑ What is data preparation? (continued)
• Data preparation work is done by information technology (IT), BI and
data management teams as they integrate data sets to load into a data
warehouse, NoSQL database or data lake repository, and again when
new analytics applications are developed with those data sets. In
addition, data scientists, data engineers, other data analysts and
business users increasingly use self-service data preparation tools to
collect and prepare data themselves.
❑ Purposes of data preparation
One of the primary purposes of data preparation is to ensure that raw
data being readied for processing and analysis is accurate and
consistent so the results of BI and analytics applications will be valid.
Data is commonly created with missing values, inaccuracies or other
errors, and separate data sets often have different formats that need
to be reconciled when they're combined. Correcting data errors,
validating data quality and consolidating data sets are big parts of
data preparation projects.
❑ What are the benefits of data preparation?
Data scientists often complain that they spend most of their time
gathering, cleansing and structuring data instead of analyzing it. A big
benefit of an effective data preparation process is that they and other
end users can focus more on data mining and data analysis, the parts of
their job that generate business value.
❑ What are the benefits of data preparation? (continued)
For example, data preparation can be done more quickly, and
prepared data can automatically be fed to users for recurring
analytics applications.
❖ Benefits of data preparation
• Ensure the data used in analytics applications produces reliable results.
• Identify and fix data issues that otherwise might not be detected.
• Enable more informed decision-making by business executives and
operational workers.
❑ Steps in the data preparation process
1. Data collection
2. Data discovery and profiling
3. Data cleansing
4. Data structuring
5. Data transformation and enrichment
6. Data validation and publishing
❑ Data collection
Relevant data is gathered from operational systems, data
warehouses, data lakes and other data sources. During this step,
data scientists, members of the BI team, other data professionals
and end users who collect data should confirm that it's a good fit
for the objectives of the planned analytics applications.
❑ Data discovery and profiling
The next step is to explore the collected data to better
understand what it contains and what needs to be done to
prepare it for the intended uses. To help with that, data profiling
identifies patterns, relationships and other attributes in the data,
as well as inconsistencies, anomalies, missing values and other
issues so they can be addressed.
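As a rough illustration of profiling, the sketch below counts missing values, distinct values and the most common value for each column. The sample records, the column names and the `profile` helper are assumptions made up for this example; they are not part of the lecture material.

```python
from collections import Counter

# Hypothetical sample records; None or "" marks a missing value.
records = [
    {"customer": "Ali", "city": "Lahore", "age": 34},
    {"customer": "Sara", "city": "", "age": 29},
    {"customer": "Omar", "city": "lahore", "age": None},
    {"customer": "Ali", "city": "Lahore", "age": 340},
]


def is_missing(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def profile(rows):
    """Return, per column, the missing count, distinct count and most common value."""
    report = {}
    for col in rows[0].keys():
        present = [r[col] for r in rows if not is_missing(r[col])]
        counts = Counter(present)
        report[col] = {
            "missing": len(rows) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


summary = profile(records)
for col, stats in summary.items():
    print(col, stats)
```

Even this small report surfaces issues to address later: one missing city, one missing age, and two spellings of the same city ("Lahore" and "lahore") counted as distinct values.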
❑ Data cleansing
Next, the identified data errors and issues are corrected to
create complete and accurate data sets. For example, as part of
cleansing data sets, faulty data is removed or fixed, missing
values are filled in and inconsistent entries are harmonized.
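The three cleansing actions named above can be sketched in a few lines. The sample rows, the plausible age range of 0 to 120 and the choice of the median as the fill value are assumptions for illustration only; a real project would pick rules that fit its own data.

```python
from statistics import median

# Hypothetical rows with a missing age, an implausible age and inconsistent city spellings.
rows = [
    {"city": "Lahore", "age": 34},
    {"city": " lahore ", "age": None},
    {"city": "LAHORE", "age": 29},
    {"city": "Karachi", "age": 340},
    {"city": "karachi", "age": 41},
]


def cleanse(rows, min_age=0, max_age=120):
    # 1. Remove faulty data: drop rows whose age lies outside the plausible range.
    kept = [r for r in rows if r["age"] is None or min_age <= r["age"] <= max_age]
    # 2. Fill in missing values: use the median of the remaining known ages.
    fill = median(r["age"] for r in kept if r["age"] is not None)
    cleaned = []
    for r in kept:
        cleaned.append({
            # 3. Harmonize inconsistent entries: trim spaces and use one capitalization.
            "city": r["city"].strip().title(),
            "age": fill if r["age"] is None else r["age"],
        })
    return cleaned


cleaned = cleanse(rows)
for row in cleaned:
    print(row)
```

Whether a faulty value should be removed or fixed depends on the case; this sketch simply removes it.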
❑ Data structuring
At this point, the data needs to be modeled and organized to
meet the analytics requirements. For example, data stored in
comma-separated values (CSV) files or other file formats has to
be converted into tables to make it accessible to BI and analytics
tools.
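One possible way to turn CSV data into a table is to load it into a relational database. The sketch below uses Python's built-in `csv` and `sqlite3` modules with an in-memory database; the `orders` table, its columns and the sample content are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be read from a file.
csv_text = """order_id,customer,amount
1,Ali,250.50
2,Sara,99.00
3,Ali,120.25
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Parse each CSV row and convert the text fields to the column types of the table.
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany(
    "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
    ((int(r["order_id"]), r["customer"], float(r["amount"])) for r in reader),
)
conn.commit()

# Once the data is in a table, it can be queried like any other relational data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```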
❑ Data transformation and enrichment
In addition to being structured, the data typically must be transformed
into a unified and usable format. For example, data transformation may
involve creating new fields or columns that aggregate values from
existing ones. Data enrichment further enhances and optimizes data
sets as needed, through measures such as augmenting and adding
data.
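A minimal sketch of both ideas, under assumed data: the transformation creates a new `line_total` column from two existing columns, and the enrichment adds a `region_name` taken from a separate lookup table. All field names and the `region_names` reference data are hypothetical.

```python
# Hypothetical order lines and reference data.
orders = [
    {"order_id": 1, "region_code": "PB", "unit_price": 50.0, "quantity": 3},
    {"order_id": 2, "region_code": "SD", "unit_price": 20.0, "quantity": 5},
    {"order_id": 3, "region_code": "XX", "unit_price": 10.0, "quantity": 1},
]
region_names = {"PB": "Punjab", "SD": "Sindh"}


def transform_and_enrich(rows, lookup):
    result = []
    for r in rows:
        new = dict(r)  # copy so the source rows stay unchanged
        # Transformation: a new column computed from existing columns.
        new["line_total"] = r["unit_price"] * r["quantity"]
        # Enrichment: data added from another source; unknown codes are flagged.
        new["region_name"] = lookup.get(r["region_code"], "Unknown")
        result.append(new)
    return result


enriched = transform_and_enrich(orders, region_names)
for row in enriched:
    print(row["order_id"], row["line_total"], row["region_name"])
```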
❑ Data validation and publishing
In this last step, automated routines are run against the data to
validate its consistency, completeness and accuracy. The
prepared data is then stored in a data warehouse, a data lake or
another repository and either used directly by whoever prepared
it or made available for other users to access.
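An automated validation routine might look like the sketch below, which checks completeness (required fields are filled), consistency (a key is unique) and accuracy (amounts are not negative). The rules, field names and sample rows are assumptions chosen for illustration; real routines would encode the rules of the actual data set.

```python
def validate(rows, required, unique_key):
    """Return a list of (row_index, message) pairs, one per problem found."""
    problems = []
    seen = set()
    for i, r in enumerate(rows):
        # Completeness: every required field must have a value.
        for field in required:
            if r.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        # Consistency: the key must not repeat.
        key = r.get(unique_key)
        if key in seen:
            problems.append((i, f"duplicate {unique_key}={key}"))
        seen.add(key)
        # Accuracy: a simple plausibility rule for the amount.
        amount = r.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            problems.append((i, "negative amount"))
    return problems


# Hypothetical prepared data with three deliberate problems.
prepared = [
    {"order_id": 1, "customer": "Ali", "amount": 250.5},
    {"order_id": 2, "customer": "", "amount": 99.0},
    {"order_id": 2, "customer": "Sara", "amount": -5.0},
]
issues = validate(prepared, required=["order_id", "customer", "amount"], unique_key="order_id")
ready_to_publish = not issues  # publish only when no problems remain
print(issues, ready_to_publish)
```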
❑ Problems in Data Preparations
• Inadequate or nonexistent data profiling.
• Missing or Incomplete Data.
• Invalid data values.
• Name and Address Standardization.
• Inconsistent data across enterprise systems.
• Data enrichment.
• Maintaining and expanding data prep processes.
❑ Problems in Data Preparations
Inadequate or nonexistent data profiling.
If data isn't properly profiled, errors, anomalies and other
problems might not be identified, which can result in flawed
analytics.
❑ Problems in Data Preparations
Missing or Incomplete Data.
Data sets often have missing values and other forms of
incomplete data; such issues need to be assessed as possible
errors and addressed if so.
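Not every missing value is an error. One simple way to assess them, sketched below, is to separate fields that must always be filled from fields that may legitimately be empty. The `REQUIRED` and `OPTIONAL` field sets and the sample rows are hypothetical.

```python
# Hypothetical classification of fields.
REQUIRED = {"customer_id", "order_date"}
OPTIONAL = {"middle_name", "fax"}


def assess_missing(rows):
    """Split missing values into errors (required fields) and acceptable gaps (optional fields)."""
    errors, acceptable = [], []
    for i, row in enumerate(rows):
        for field in sorted(REQUIRED | OPTIONAL):
            if row.get(field) in (None, ""):
                target = errors if field in REQUIRED else acceptable
                target.append((i, field))
    return errors, acceptable


rows = [
    {"customer_id": 1, "order_date": "2024-05-30", "middle_name": "", "fax": None},
    {"customer_id": None, "order_date": "2024-05-30", "middle_name": "K", "fax": "123"},
]
errors, acceptable = assess_missing(rows)
print("errors:", errors)
print("acceptable:", acceptable)
```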
❑ Problems in Data Preparations
Invalid data values.
Misspellings, other typos and wrong numbers are examples of
invalid entries that frequently occur in data and must be fixed to
ensure analytics accuracy.
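For misspellings, one possible approach is to compare each entry against a list of known valid values and accept the closest match. The sketch below uses `difflib.get_close_matches` from the Python standard library; the list of valid cities and the 0.8 similarity cutoff are assumptions for this example.

```python
import difflib

# Hypothetical list of valid values for a "city" field.
VALID_CITIES = ["Lahore", "Karachi", "Islamabad"]


def fix_city(value, valid=VALID_CITIES, cutoff=0.8):
    """Return the closest valid spelling, or None if nothing is similar enough."""
    candidate = value.strip().title()
    matches = difflib.get_close_matches(candidate, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["Lahor", "karachii", "Peshawar"]:
    print(raw, "->", fix_city(raw))
```

Entries that return None cannot be repaired automatically and would need manual review or a larger reference list.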
❑ Problems in Data Preparations
Name and Address Standardization.
Names and addresses may be inconsistent in data from different
systems, with variations that can affect views of customers and
other entities.
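A very small standardization sketch is shown below. It collapses extra spaces, applies one capitalization style and expands a few street abbreviations so that the same address written in different ways compares as equal. The abbreviation table and sample values are assumptions; real name and address standardization usually relies on much richer reference data.

```python
# Hypothetical abbreviation table.
ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def standardize_name(name):
    """Collapse repeated spaces and apply title case."""
    return " ".join(name.split()).title()


def standardize_address(address):
    """Normalize each comma-separated part of an address."""
    parts = []
    for part in address.split(","):
        words = [
            # Expand known abbreviations (with or without a trailing period).
            ABBREVIATIONS.get(w.lower().rstrip("."), w.title())
            for w in part.split()
        ]
        parts.append(" ".join(words))
    return ", ".join(p for p in parts if p)


print(standardize_name("  naveed   ABBAS "))
print(standardize_address("12  main st.,  lahore"))
print(standardize_address("12 Main Street, LAHORE"))
```

Note that plain title case mishandles some names (for example, it produces "Mcdonald"), so this is only a starting point.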
❑ Problems in Data Preparations
Inconsistent data across enterprise systems.
Other inconsistencies in data sets drawn from multiple source
systems, such as different terminology and unique identifiers, are
also a pervasive issue in data preparation efforts.
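To make this concrete, suppose two hypothetical systems, a CRM and a billing system, store the same customer under different field names, identifier formats and terms. The sketch below maps both to one canonical form. The field maps, the term table and the assumption that identifiers are numeric are all invented for illustration.

```python
# Hypothetical mappings from each system's field names to canonical names.
CRM_FIELDS = {"cust_id": "customer_id", "sex": "gender"}
BILLING_FIELDS = {"customer_no": "customer_id", "gender": "gender"}
# Hypothetical mapping from each system's terminology to canonical terms.
GENDER_TERMS = {"m": "male", "male": "male", "f": "female", "female": "female"}


def to_canonical(record, field_map):
    out = {}
    for source_field, value in record.items():
        target = field_map.get(source_field)
        if target is not None:  # skip fields with no canonical counterpart
            out[target] = value
    # Identifiers: assumes numeric IDs, so "00042" and 42 both become "42".
    out["customer_id"] = str(int(out["customer_id"]))
    # Terminology: map each system's codes to one shared vocabulary.
    out["gender"] = GENDER_TERMS[out["gender"].strip().lower()]
    return out


crm_record = {"cust_id": "00042", "sex": "M"}
billing_record = {"customer_no": 42, "gender": "Male"}

canonical_crm = to_canonical(crm_record, CRM_FIELDS)
canonical_billing = to_canonical(billing_record, BILLING_FIELDS)
print(canonical_crm, canonical_billing)
```

After mapping, the two records are identical and can be matched or merged.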
❑ Problems in Data Preparations
Data enrichment.
Deciding how to enrich a data set (for example, what to add to it)
is a complex task that requires a strong understanding of
business needs and analytics goals.
❑ Problems in Data Preparations
Maintaining and expanding data prep processes.
Data preparation work often becomes a recurring process that
needs to be sustained and enhanced on an ongoing basis.

