SlideShare a Scribd company logo
Introduction to Data Science
Prepared by
Dr.G.DEENA
AP/CSE
SRMIST
Definition
Data Science is a combination of multiple disciplines that uses
statistics, data analysis, and machine learning to analyze data and
to extract knowledge and insights from it.
Key points
• data gathering, analysis and decision-making.
• finding patterns in data, through analysis, and make future
predictions.
Example:
Better decisions made by companies on particular products.
Predictive analysis- What next?
Hidden Pattern
• Let's consider an example, to find hidden patterns in a online retail purchase dataset to understand
customer behavior.
Scenario:
dataset from an e-commerce website with following information about customer transactions:
• Customer ID: Unique identifier for the customer
• Product ID: Unique identifier for the product purchased
• Product Category: The category of the product (e.g., electronics, clothing, groceries)
• Purchase Date: The date when the purchase was made
• Price: The price of the product purchased
• Quantity: The number of items purchased
Example Objective: We want to find hidden patterns, such as:
• What products are commonly bought together?
• At What time the customers more likely to make a purchase?
• Which product categories are popular in different seasons?
• What is the highest sale of the product in the weekend?
Application
• Data Science is used in many industries in the world today –
e.g. Banking, consultancy, Healthcare, and Manufacturing.
• Route planning: To discover the best routes to ship the goods.
• To foresee delays for flight/ship/train etc. (through predictive
analysis)
• To find the best suited time to deliver goods
• To forecast the next years revenue for a company
• To analyze health benefit of training
• To predict who will win elections
Facets of Data
Very large amount of data will generate in data science and they are
of different types,
• Structured
• Unstructured
• Natural Language
• Graph based
• Machine Generated
• Audio, video and images
Structured Data
• Structured data is arranged in rows and column format.
• structured data refers to data that is identifiable because it is
organized in a structure. The most common form of structured
data or records is a database where specific information is stored
based on a methodology of columns and rows.
• Database management system is used for storing structured data.
(retrieve and process data easily.)
• Structured data is also searchable by data type within content.
• An Excel table is an example of structured data.
Data Science Introduction to Data Science
Unstructured Data
• Unstructured data is data that does not follow a specified format. Row
and columns are not used for unstructured data. Therefore, it is
difficult to retrieve information. It has no identifiable structure.
• It does not follow any template/rules. Hence it is unpredictable in
nature.
• Most of the companies use unstructured data format.
• Eg- word documents, email messages, customer feedbacks, audio,
video, images, email.
Natural Language
• It is a special type of unstructured data.
• Natural language processing(NLP) enables machines to recognize
characters, words and sentences, then apply meaning and
understanding to that information.
• The natural language processing is better in entity recognition, topic
recognition, summarization, text classification and sentiment analysis.
Data Science Process
• The data science process is a powerful toolkit that helps us
unlock hidden knowledge from the available data.
• The data science process is a systematic approach to extracting
knowledge and insights from data.
• It’s a structured framework that guides data scientists through
a series of steps, from defining a problem to communicating
actionable results.
Data Science Process Life Cycle
Framing the Problem
• The process begins with a clear understanding of the problem or
question.
• This process define the project’s objectives and goals.
• A well-defined problem statement acts as a compass, guiding the
entire data science process and ensure the desired outcomes.
Data Collection
• Once the problem clearly defined, its important to collect
the data in data science.
• This involves identifying relevant data sources, whether
internal databases, external APIs, or publicly available
datasets.
• Data scientists must carefully consider the types of data
needed.
Data Cleaning
• Raw data is often messy, with errors, missing values, and
inconsistencies
• This involves removing duplicates, filling in missing values, and
transforming data into a format suitable for further exploration.
• The data cleaning phase is all about removing unwanted and fill
the values as well ensure the data is accurate, complete, and
ready for analysis
Exploratory Data Analysis (EDA)
• EDA is the detective work of data science.
• It’s about to uncover hidden patterns, trends, and anomalies.
• Data scientists use a variety of techniques, including summary
statistics, visualizations, and interactive tools, to gain a deeper
understanding of the data’s characteristics and their
relationships.
• This stage is crucial for identifying potential way for further
investigation.
Model Building
• In this phase, data scientists build models that can predict
future outcomes or classify data into different categories.
• These models are often based on machine learning
algorithms or statistical techniques.
• The choice of model depends on the problem at hand and the
nature of the data.
• Once the model is chosen, it’s trained on the prepared data
to learn patterns and relationships.
Model Deployment
• Once a model is trained and validated, it’s time to put it to work.
• Model deployment involves integrating the model into a
production environment, where it can be used to make
predictions or inform decision-making.
Communicating Results
• The final stage of the data science process involves
communicating the findings and insights to stakeholders.
• This includes creating clear and concise reports, presentations,
and visualizations that effectively convey the results and their
implications.
• The goal is to ensure that stakeholders understand the analysis,
trust the conclusions, and can use the insights to make decisions.
Introduction to NumPy
• NumPy is a Python library used for working with array.
• NumPy stands for Numerical Python.
• It is the fundamental package for mathematical and logical
operations on array.
• Array is homogeneous collection of data.
• The values can be number, characters, Booleans.
• It can be install in Jupyter notebook as pip install numpy
Ad

More Related Content

Similar to Data Science Introduction to Data Science (20)

Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
wekineheshete
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Spartan60
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
asmeerana605
 
Introduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptxIntroduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptx
ssuser5cdaa93
 
intoduction of probabliity and statistics
intoduction of probabliity and statisticsintoduction of probabliity and statistics
intoduction of probabliity and statistics
Taranpreet Singh
 
DataScienceandVisualization_Mod_1_ppt.pptx
DataScienceandVisualization_Mod_1_ppt.pptxDataScienceandVisualization_Mod_1_ppt.pptx
DataScienceandVisualization_Mod_1_ppt.pptx
AnithaCL1
 
Data mining
Data miningData mining
Data mining
GILM Project
 
CS3352-Foundations of Data Science Notes.pdf
CS3352-Foundations of Data Science Notes.pdfCS3352-Foundations of Data Science Notes.pdf
CS3352-Foundations of Data Science Notes.pdf
Builders Engineering College
 
Module 2 Data Collection and Management.pdf
Module 2 Data Collection and Management.pdfModule 2 Data Collection and Management.pdf
Module 2 Data Collection and Management.pdf
VinayVitekari
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptx
Wollo UNiversity
 
How to Analyze Data (1).pptx
How to Analyze Data (1).pptxHow to Analyze Data (1).pptx
How to Analyze Data (1).pptx
Infosectrain3
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Ch~2.pdf
Ch~2.pdfCh~2.pdf
Ch~2.pdf
andualemtemesgen3
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
Hadi Fadlallah
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
AbdulrahimShaibuIssa
 
Data mining
Data miningData mining
Data mining
jadhav_priti
 
data science, prior knowledge ,modeling, scatter plot
data science, prior knowledge ,modeling, scatter plotdata science, prior knowledge ,modeling, scatter plot
data science, prior knowledge ,modeling, scatter plot
SteffinAlex
 
This is abouts are you doing the same time who is the best person to be safe and
This is abouts are you doing the same time who is the best person to be safe andThis is abouts are you doing the same time who is the best person to be safe and
This is abouts are you doing the same time who is the best person to be safe and
codekeliyehai
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
wekineheshete
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Spartan60
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
asmeerana605
 
Introduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptxIntroduction to Data Analytics - PPM.pptx
Introduction to Data Analytics - PPM.pptx
ssuser5cdaa93
 
intoduction of probabliity and statistics
intoduction of probabliity and statisticsintoduction of probabliity and statistics
intoduction of probabliity and statistics
Taranpreet Singh
 
DataScienceandVisualization_Mod_1_ppt.pptx
DataScienceandVisualization_Mod_1_ppt.pptxDataScienceandVisualization_Mod_1_ppt.pptx
DataScienceandVisualization_Mod_1_ppt.pptx
AnithaCL1
 
Module 2 Data Collection and Management.pdf
Module 2 Data Collection and Management.pdfModule 2 Data Collection and Management.pdf
Module 2 Data Collection and Management.pdf
VinayVitekari
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptx
Wollo UNiversity
 
How to Analyze Data (1).pptx
How to Analyze Data (1).pptxHow to Analyze Data (1).pptx
How to Analyze Data (1).pptx
Infosectrain3
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
Hadi Fadlallah
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
AbdulrahimShaibuIssa
 
data science, prior knowledge ,modeling, scatter plot
data science, prior knowledge ,modeling, scatter plotdata science, prior knowledge ,modeling, scatter plot
data science, prior knowledge ,modeling, scatter plot
SteffinAlex
 
This is abouts are you doing the same time who is the best person to be safe and
This is abouts are you doing the same time who is the best person to be safe andThis is abouts are you doing the same time who is the best person to be safe and
This is abouts are you doing the same time who is the best person to be safe and
codekeliyehai
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 

Recently uploaded (20)

Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Journal of Soft Computing in Civil Engineering
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
How to use nRF24L01 module with Arduino
How to use nRF24L01 module with ArduinoHow to use nRF24L01 module with Arduino
How to use nRF24L01 module with Arduino
CircuitDigest
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
New Microsoft PowerPoint Presentation.pdf
New Microsoft PowerPoint Presentation.pdfNew Microsoft PowerPoint Presentation.pdf
New Microsoft PowerPoint Presentation.pdf
mohamedezzat18803
 
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxbMain cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
SunilSingh610661
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Journal of Soft Computing in Civil Engineering
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Resistance measurement and cfd test on darpa subboff model
Resistance measurement and cfd test on darpa subboff modelResistance measurement and cfd test on darpa subboff model
Resistance measurement and cfd test on darpa subboff model
INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
 
Artificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptxArtificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptx
DrMarwaElsherif
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
How to use nRF24L01 module with Arduino
How to use nRF24L01 module with ArduinoHow to use nRF24L01 module with Arduino
How to use nRF24L01 module with Arduino
CircuitDigest
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
New Microsoft PowerPoint Presentation.pdf
New Microsoft PowerPoint Presentation.pdfNew Microsoft PowerPoint Presentation.pdf
New Microsoft PowerPoint Presentation.pdf
mohamedezzat18803
 
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxbMain cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
SunilSingh610661
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Artificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptxArtificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptx
DrMarwaElsherif
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Ad

Data Science Introduction to Data Science

  • 1. Introduction to Data Science Prepared by Dr.G.DEENA AP/CSE SRMIST
  • 2. Definition Data Science is a combination of multiple disciplines that uses statistics, data analysis, and machine learning to analyze data and to extract knowledge and insights from it. Key points • data gathering, analysis and decision-making. • finding patterns in data, through analysis, and make future predictions. Example: Better decisions made by companies on particular products. Predictive analysis- What next?
  • 3. Hidden Pattern • Let's consider an example, to find hidden patterns in a online retail purchase dataset to understand customer behavior. Scenario: dataset from an e-commerce website with following information about customer transactions: • Customer ID: Unique identifier for the customer • Product ID: Unique identifier for the product purchased • Product Category: The category of the product (e.g., electronics, clothing, groceries) • Purchase Date: The date when the purchase was made • Price: The price of the product purchased • Quantity: The number of items purchased Example Objective: We want to find hidden patterns, such as: • What products are commonly bought together? • At What time the customers more likely to make a purchase? • Which product categories are popular in different seasons? • What is the highest sale of the product in the weekend?
  • 4. Application • Data Science is used in many industries in the world today – e.g. Banking, consultancy, Healthcare, and Manufacturing. • Route planning: To discover the best routes to ship the goods. • To foresee delays for flight/ship/train etc. (through predictive analysis) • To find the best suited time to deliver goods • To forecast the next years revenue for a company • To analyze health benefit of training • To predict who will win elections
  • 5. Facets of Data Very large amount of data will generate in data science and they are of different types, • Structured • Unstructured • Natural Language • Graph based • Machine Generated • Audio, video and images
  • 6. Structured Data • Structured data is arranged in rows and column format. • structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data or records is a database where specific information is stored based on a methodology of columns and rows. • Database management system is used for storing structured data. (retrieve and process data easily.) • Structured data is also searchable by data type within content. • An Excel table is an example of structured data.
  • 8. Unstructured Data • Unstructured data is data that does not follow a specified format. Row and columns are not used for unstructured data. Therefore, it is difficult to retrieve information. It has no identifiable structure. • It does not follow any template/rules. Hence it is unpredictable in nature. • Most of the companies use unstructured data format. • Eg- word documents, email messages, customer feedbacks, audio, video, images, email.
  • 9. Natural Language • It is a special type of unstructured data. • Natural language processing(NLP) enables machines to recognize characters, words and sentences, then apply meaning and understanding to that information. • The natural language processing is better in entity recognition, topic recognition, summarization, text classification and sentiment analysis.
  • 10. Data Science Process • The data science process is a powerful toolkit that helps us unlock hidden knowledge from the available data. • The data science process is a systematic approach to extracting knowledge and insights from data. • It’s a structured framework that guides data scientists through a series of steps, from defining a problem to communicating actionable results.
  • 11. Data Science Process Life Cycle
  • 12. Framing the Problem • The process begins with a clear understanding of the problem or question. • This process define the project’s objectives and goals. • A well-defined problem statement acts as a compass, guiding the entire data science process and ensure the desired outcomes.
  • 13. Data Collection • Once the problem clearly defined, its important to collect the data in data science. • This involves identifying relevant data sources, whether internal databases, external APIs, or publicly available datasets. • Data scientists must carefully consider the types of data needed.
  • 14. Data Cleaning • Raw data is often messy, with errors, missing values, and inconsistencies • This involves removing duplicates, filling in missing values, and transforming data into a format suitable for further exploration. • The data cleaning phase is all about removing unwanted and fill the values as well ensure the data is accurate, complete, and ready for analysis
  • 15. Exploratory Data Analysis (EDA) • EDA is the detective work of data science. • It’s about to uncover hidden patterns, trends, and anomalies. • Data scientists use a variety of techniques, including summary statistics, visualizations, and interactive tools, to gain a deeper understanding of the data’s characteristics and their relationships. • This stage is crucial for identifying potential way for further investigation.
  • 16. Model Building • In this phase, data scientists build models that can predict future outcomes or classify data into different categories. • These models are often based on machine learning algorithms or statistical techniques. • The choice of model depends on the problem at hand and the nature of the data. • Once the model is chosen, it’s trained on the prepared data to learn patterns and relationships.
  • 17. Model Deployment • Once a model is trained and validated, it’s time to put it to work. • Model deployment involves integrating the model into a production environment, where it can be used to make predictions or inform decision-making.
  • 18. Communicating Results • The final stage of the data science process involves communicating the findings and insights to stakeholders. • This includes creating clear and concise reports, presentations, and visualizations that effectively convey the results and their implications. • The goal is to ensure that stakeholders understand the analysis, trust the conclusions, and can use the insights to make decisions.
  • 19. Introduction to NumPy • NumPy is a Python library used for working with array. • NumPy stands for Numerical Python. • It is the fundamental package for mathematical and logical operations on array. • Array is homogeneous collection of data. • The values can be number, characters, Booleans. • It can be install in Jupyter notebook as pip install numpy