Data Science

The document outlines an introductory course on Data Science, covering key components such as data collection, cleaning, analysis, and visualization, along with machine learning fundamentals. It highlights the importance of data science across various industries and the skills required, including programming and statistical analysis. The course structure includes hands-on projects, assessments, and a final capstone project to provide practical experience.

Uploaded by eczhyena
Copyright: © All Rights Reserved

Title: Introduction to Data Science

Subtitle: Day 1
Instructor Name: Ei Cho Zin
Date: 29-01-2025
AGENDA
• Data Science
• Data Science Workflow
• Data Collection & Sources
• Data Cleaning & Preprocessing
• Exploratory Data Analysis (EDA)
• Machine Learning Overview
• Data Visualization Tools
• Data Science in Python
• Real-World Applications
Welcome to Data Science

Data scientists:
• derive useful insights for business decision-making;
• use scientific methods, processes, algorithms, and systems;
• use tools, techniques, and creativity to uncover insights hidden within data;
• combine math, programming, and domain expertise to tackle real-world challenges in a variety of fields.
WHY IS DATA SCIENCE IMPORTANT?
• Explosion of data in the digital age.
• Supports decision-making and predictive analysis.
• Used across industries: healthcare, finance, retail, technology, and more.
• High demand for data scientists in the job market.
• Helps Businesses Make Decisions: By analyzing data, businesses can understand trends and make informed choices that reduce risk and maximize profit.
• Improves Efficiency: Organizations can use data science to identify where they can save time and resources.
• Supports Forecasting: Businesses can use data to forecast trends, demand, and other important factors.
• Drives Innovation: New ideas and products often come from insights discovered through data science.
KEY COMPONENTS: DATA, ALGORITHMS, INSIGHTS
Data science involves these key steps:
• Data Collection: Gathering raw data from various sources, such as databases, sensors, or user interactions.
• Data Cleaning: Ensuring the data is accurate, complete, and ready for analysis.
• Data Analysis: Applying statistical and computational methods to identify patterns, trends, or relationships.
• Data Visualization: Creating charts, graphs, and dashboards to present findings clearly.
• Decision-Making: Using insights to inform strategies, create solutions, or predict outcomes.
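The five steps above can be sketched end to end in Pandas. This is a minimal illustration, not a production pipeline: the `region` and `units` columns and all the numbers are hypothetical, and a printed summary stands in for a chart so the example runs without Matplotlib.

```python
import pandas as pd

# 1. Data collection: a small hypothetical dataset built in memory.
#    In practice this might come from pd.read_csv(...) or a database query.
raw = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South", "South"],
        "units": [120, 95, 120, None, 130, 110],
    }
)

# 2. Data cleaning: drop exact duplicate rows, then fill the missing
#    value with the column median.
clean = raw.drop_duplicates().copy()
clean["units"] = clean["units"].fillna(clean["units"].median())

# 3. Data analysis: average units sold per region, highest first.
summary = clean.groupby("region")["units"].mean().sort_values(ascending=False)

# 4. Data visualization: a text summary here; with Matplotlib installed,
#    summary.plot(kind="bar") would draw the same numbers as a bar chart.
print(summary)

# 5. Decision-making: pick the best-performing region.
best_region = summary.idxmax()
print(f"Focus next campaign on: {best_region}")
```

With this data the duplicate North row is dropped, the missing East value becomes the median (115.0), and North comes out on top with an average of 120.0.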

KEY SKILLS IN DATA SCIENCE
• Programming: Python, R, SQL.
• Statistics and Mathematics: Probability, Linear Algebra.
• Data Wrangling: Cleaning and preprocessing data.
• Machine Learning: Supervised and unsupervised learning.
• Data Visualization: Tools like Tableau, Matplotlib, Seaborn.
APPLICATIONS OF DATA SCIENCE


• Healthcare: Disease prediction, drug discovery.
• Finance: Fraud detection, risk management.
• E-commerce: Personalized recommendations.
• Social Media: Sentiment analysis, trend prediction.
• Transportation: Route optimization, autonomous vehicles.
TOOLS AND TECHNOLOGIES

• Programming Languages: Python, R.
• Libraries: Pandas, NumPy, Scikit-learn, TensorFlow.
• Data Visualization: Tableau, Power BI, Matplotlib.
• Big Data Tools: Hadoop, Spark.
• Visual: Logos of tools and technologies.
COURSE STRUCTURE AND EXPECTATIONS
• Modules:
  • Introduction to Data Science
  • Data Wrangling and Cleaning
  • Exploratory Data Analysis (EDA)
  • Machine Learning Basics
  • SQL
  • Python
• Expectations:
  • Hands-on projects and assignments.
  • Regular quizzes and assessments.
  • Final Capstone Project.
• Visual: Timeline or roadmap of the course.
DIFFERENCE BETWEEN DATA SCIENCE AND DATA ANALYTICS
Aspect | Data Science | Data Analytics
Scope | Broad (includes analytics, ML, AI) | Narrow (focuses on analysis)
Objective | Discover insights, build models | Provide actionable insights
Techniques | Machine learning, AI, deep learning | Statistical analysis, visualization
Tools | Python, R, TensorFlow, Spark | Excel, Tableau, Power BI, SQL
Data Types | Structured and unstructured | Primarily structured
Output | Predictive models, algorithms | Reports, dashboards, visualizations
Career Roles | Data Scientist, ML Engineer | Data Analyst, Business Analyst
Example | Building a recommendation system | Analyzing sales trends


MACHINE LEARNING FUNDAMENTALS
• Supervised Learning: Regression and Classification
• Unsupervised Learning: Clustering and Dimensionality Reduction
• Model Evaluation: Accuracy, Precision, Recall, F1 Score
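The evaluation metrics listed above can be computed by hand from a binary confusion matrix, which makes their definitions concrete: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. The sketch below assumes hypothetical fraud-detection labels (1 = fraud); the helper names `confusion_counts` and `evaluate` are illustrative, not from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, really positive
        elif predicted == positive:
            fp += 1  # predicted positive, really negative
        elif actual == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 score as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fraud, 0 = not fraud.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

With these labels the sketch gives accuracy 0.625, precision 0.6, recall 0.75, and F1 ≈ 0.667, showing how the metrics can disagree on the same predictions. In practice, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same calculations.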
EXPLORATORY DATA ANALYSIS (EDA)
• Understanding Data Distributions
• Descriptive Statistics: Mean, Median, Mode, Variance
• Data Visualization: Matplotlib, Seaborn, and Plotly
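The descriptive statistics above can be computed with Python's built-in `statistics` module. A minimal sketch on a hypothetical sample of daily order counts:

```python
import statistics

# Hypothetical sample: daily order counts for six days.
orders = [2, 3, 3, 5, 7, 10]

order_stats = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    # statistics.variance is the *sample* variance (divides by n - 1);
    # statistics.pvariance is the population variance (divides by n).
    "variance": statistics.variance(orders),
    "pvariance": statistics.pvariance(orders),
}
print(order_stats)
```

Here the mean is 5, the median 4.0, the mode 3, and the sample variance 9.2. For a whole table, Pandas' `DataFrame.describe()` reports similar summaries per column, and a histogram (for example with Matplotlib's `plt.hist`) shows the shape of the distribution.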


DATA WRANGLING AND CLEANING
• Data Collection: APIs, Web Scraping, and Databases
• Data Cleaning: Handling Missing Values, Outliers, and Duplicates
• Data Transformation: Normalization and Encoding
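The cleaning and transformation steps above map onto a few Pandas operations. The sketch below uses a hypothetical customer table; filling with the median, the 1.5 × IQR outlier rule, min-max scaling, and one-hot encoding are common choices shown for illustration, not the only correct ones.

```python
import pandas as pd

# Hypothetical customer records with typical problems:
# a missing age, a duplicated row, and an implausible outlier (240).
df = pd.DataFrame(
    {
        "customer": ["A", "B", "B", "C", "D", "E"],
        "age": [25, 32, 32, None, 41, 240],
        "plan": ["basic", "pro", "pro", "basic", "premium", "pro"],
    }
)

# Duplicates: remove rows that exactly repeat an earlier row.
df = df.drop_duplicates().copy()

# Missing values: fill with the median age (one common choice).
df["age"] = df["age"].fillna(df["age"].median())

# Outliers: keep ages inside the 1.5 * IQR fences (Tukey's rule).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

# Normalization: min-max scale age to the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age_scaled"] = (df["age"] - age_min) / (age_max - age_min)

# Encoding: one-hot encode the categorical "plan" column.
df = pd.get_dummies(df, columns=["plan"])
print(df)
```

After these steps the duplicate row and the age of 240 are gone, the missing age is filled with 36.5, and `plan` is replaced by indicator columns (`plan_basic`, `plan_premium`, `plan_pro`).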


STRUCTURED QUERY LANGUAGE (SQL)
• A domain-specific language used to query, manage, and manipulate relational databases.
• Essential for querying, updating, and analyzing structured data.

Why SQL for Data Science?
• Many organizations store their business data in relational databases.
• SQL is a must-have skill for data extraction and preprocessing.
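A minimal SQL sketch using Python's built-in `sqlite3` module and an in-memory database; the `sales` table and its rows are hypothetical. The query shows typical extraction and preprocessing work: filtering rows, aggregating by group, and sorting.

```python
import sqlite3

# An in-memory SQLite database stands in for an organization's relational
# database; the "sales" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("North", "Laptop", 1200.0),
        ("North", "Phone", 800.0),
        ("South", "Laptop", 1100.0),
        ("South", "Tablet", 450.0),
        ("East", "Phone", 700.0),
    ],
)
conn.commit()

# Extraction + preprocessing in one query: filter, aggregate, and sort.
query = """
    SELECT region, COUNT(*) AS n_sales, SUM(amount) AS revenue
    FROM sales
    WHERE amount > ?
    GROUP BY region
    ORDER BY revenue DESC
"""
rows = cur.execute(query, (500,)).fetchall()
for region, n_sales, revenue in rows:
    print(region, n_sales, revenue)
conn.close()
```

The query returns North (2 sales, 2000.0), South (1, 1100.0), and East (1, 700.0); the 450.0 tablet sale is filtered out by the `WHERE` clause. Pandas can run the same kind of query and return a DataFrame with `pd.read_sql_query(query, conn, params=(500,))`.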
DATA SCIENCE WORKFLOW WITH PYTHON
• Why Python?
  • Popular, easy to use, rich ecosystem.
• Key Libraries:
  • NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow.
• Python Basics:
  • Data types, control structures, functions, file handling.
• Data Science Workflow:
  • Collection, cleaning, EDA, modeling, evaluation, deployment.
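The Python basics listed above appear together in the short sketch below: built-in data types (lists, dicts, strings), a loop and a condition, a function, and file handling with the standard `csv` module. The gradebook data and the `average_score` helper are hypothetical.

```python
import csv
import os
import tempfile


def average_score(rows):
    """Return the mean of the 'score' column, skipping blank entries."""
    scores = []
    for row in rows:                  # control structure: a for loop
        if row["score"].strip():      # control structure: a condition
            scores.append(float(row["score"]))
    return sum(scores) / len(scores) if scores else None


# Data types: a list of dicts holding strings (a hypothetical gradebook).
records = [
    {"name": "Ana", "score": "88"},
    {"name": "Ben", "score": ""},
    {"name": "Chi", "score": "92"},
]

# File handling: write the records to a temporary CSV file, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "score"])
        writer.writeheader()
        writer.writerows(records)
    with open(path, newline="") as f:
        loaded = list(csv.DictReader(f))

avg = average_score(loaded)
print(f"Average score: {avg}")
```

The blank score is skipped, so the script prints `Average score: 90.0`. In a real project, `pd.read_csv` would usually replace the manual `csv` handling.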


THANK YOU
