0% found this document useful (0 votes)
18 views

DS3 Data Science Introduction

This document provides an overview of practical data science using Python. It discusses what data science is, common use cases, the data science process, sources of data, tools and technologies used, and the scope of study in the course. Data science is an interdisciplinary field that uses scientific methods and algorithms to extract knowledge and insights from structured and unstructured data. It has various use cases across industries like banking, e-commerce, healthcare, and manufacturing. The data science process involves problem definition, data acquisition, analysis, modeling, and deploying models. Programming languages like Python and R are commonly used along with machine learning libraries and visualization tools.

Uploaded by

vikas
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views

DS3 Data Science Introduction

This document provides an overview of practical data science using Python. It discusses what data science is, common use cases, the data science process, sources of data, tools and technologies used, and the scope of study in the course. Data science is an interdisciplinary field that uses scientific methods and algorithms to extract knowledge and insights from structured and unstructured data. It has various use cases across industries like banking, e-commerce, healthcare, and manufacturing. The data science process involves problem definition, data acquisition, analysis, modeling, and deploying models. Programming languages like Python and R are commonly used along with machine learning libraries and visualization tools.

Uploaded by

vikas
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Practical Data Science using Python

Introduction
Contents

What is Data Science

Use Cases of Data Science

Data Science Process Steps

Data Science, Machine Learning and Analytics

Sources of Data

The CRISP-DM Methodology


What is Data Science
Statistical Analysis of Data

Analytical Methods

Visualization Techniques
Data Science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights Machine Learning
from noisy, structured and unstructured data, and apply knowledge and
actionable insights from data across a broad range of application
domains. Deep Neural Networks

INSIGHTS

PREDICTIONS
Data Science Use Cases

Banks are creating solutions for making on-the-spot decisions to loan applicants using machine
learning-powered credit risk models.
Data Science is positively impacting all
Industries
E-Commerce companies are using Machine Learning to build powerful product recommendation
engines that helps suggest relevant products to its website visitors.

Digital media companies have started using Data Science based solutions to use Media Analytics
and predictive models to smartly embed media to the users’ taste and preferences in the right
channel at the right timing for maximising viewership and revenue.

Healthcare companies are developing solutions to scan medical imaging to automatically detect
diseases without doctor’s intervention.

Manufacturing companies are using Analytics and Prediction on machine-sensor data to prevent
faults in the machinery y detecting them before they break down.
Who oversees Data Science in an Organization

Business Leaders: They work with the Data Science teams to define business problems and develop strategies to solve
them. They may be the head of a line of business, such as marketing, finance, or sales, and have a data science team
reporting directly or indirectly to them.

IT Leaders: Senior IT Leaders are responsible for the infrastructure and architecture that will support data science
operations. They are continually monitoring operations and resource usage to ensure that data science teams operate
efficiently and securely. They may also be responsible for building and updating IT environments for data science teams.

Data Science and Analytics Leaders: They oversee the Data Science teams and their day-to-day work. They are team
builders who can balance team development with project planning and monitoring.

But the most important player in this process is the Data Scientist.
Who is a Data Scientist

👉 A Data Scientist’s duties can include developing strategies for analysing data, preparing data for
analysis, exploring, analysing, and visualizing data, building models with data using
programming languages, such as Python and R, and deploying models into applications.

👉 The Data Scientist works in teams. In addition to a data scientist, this team might include a
business analyst who defines the problem, a data engineer who prepares the data and how it is
accessed, an IT architect who oversees the underlying processes and infrastructure, and an
production engineer who deploys the models or outputs of the analysis into applications and
products.
Data Science / Machine Learning Project Lifecycle

Problem Statement

Data Acquisition

Every Machine Learning/Data Science initiative or project follow Data Analysis


certain steps. There would be variations depending on the business
goal and the type of Machine Learning algorithm used. However,
there are some broad steps that will almost always be applicable. In Pre-processing of Data
the following few sections, we will look at these steps forming a
complete Machine Learning project.
Model Selection

Model Training

Model Optimization

Deploy, Monitor, Maintain


EDA / Wrangling / Munging

Feature
Deploy
Extraction

ML / Deep Model Training


Data Data Collection Data Structuring Learning and
Sources Algorithms Optimization
Monitoring
Data Cleaning

Enriching
Model Update
Transformations

Visualisations

Data Analysis / Pre-processing Model and Training Production

Machine Learning, Predictive Algorithms,


Python, Statistical Techniques, Exploratory Data Analysis,
TOOLS

Regression, Classification and Clustering,


Visualisations, RDBMS, Various Data Formats, Excel ML-Ops
Hyperparameter Tuning and Model
Analytics
Optimization
Data Science Tools and Technologies
Type of Data Science Tools Tool Name Functions
Programming Languages Python Base Programming Language for Data Science and ML Tasks. Massive amount of
Libraries and community support available.

R Language Base Programming Language. Very mature and supported well. More specialized for
Statistical problem solving.

Julia Based Programming Language, more suited for scientific and mathematical
problems.

ML Libraries TensorFlow Most popular Tensor Management Library from Google. This is the backend for
most Deep Learning tasks.

Microsoft CNTK ML Backend Library from Microsoft.

Theano ML Backend Library, more suited for Scientific applications.

ML APIs Keras User friendly Machine Learning API that can tap TensorFlow, CNTK, Theano.

PyTorch ML User API and can work with all the ML backend Libraries.

Data Libraries Numpy Array management library

Pandas Data Structure Library (DataFrame and Series Objects)

Visualization Libraries Matplotlib Most used Visualization Library and backend for may modern Libraries

Seaborn Visualization Library built on top of MarplotLib

Bokeh A more modern and elegant Visualization Library

Analytics Tools Tableau Analytics Application for Enterprises

SAS Analytics Application for Enterprises

Qlik Sense Analytics Application for Enterprises


Scope of Study in this Course

Python Language

Fundamentals of Statistics

Data Analytics Techniques

Data Visualization Techniques

Machine Learning • Types of Machine Learning Algorithms


• Model Building
• Model Evaluation
• Model Optimization
Data to Wisdom – Purpose of Data Science

Data Raw, in primary format, collected at source

Information Finding relationships within data

Knowledge When it is useful for a purpose

Wisdom When it is helpful in making decisions

Data Science aims to derive wisdom out of raw Data through a series of steps
Analytics vs. Data Science
Data Analyst Data Scientist

👉 Analytics is performed manually on Data by setting up rules of


information extraction. Data Science (Predictive) models
perform automatic pattern recognition on the data. • Advanced Statistical
• BI Tools
Methods
👉 The above also means that the Predictive or Machine Learning • Intermediate Statistics
• Machine Learning
Models are dynamic based on changing data while Analytics • RDBMS
• Data Analysis
patterns are not. • Excel
• Visualizations
• Data Visualisations
• Predictive Analytics
👉 In analytics, testing is performed to check that the defined • Data Analysis
• Data Mining
outcome is achieved as expected, while in machine learning, • Data Mining
• ML-Ops
model is tested against prior labels/outcomes to optimize its
• Algorithm Development
design depending on the nature of the data.

👉 Techniques and tools used to develop analytics models and


machine learning models differ. Machine learning modelling
techniques are much more advanced and are built on
statistical and mathematical principles related to how the
machine will learn to optimize the model performance.
Data Science vs. Machine Learning

👉 Data Science encompasses a number of tools, techniques


and technologies in dealing with data, especially in bringing
out wisdom from raw data.

👉 Machine Learning, Neural Networks, Statistical Analysis are


some of the areas that comprise Data Science.

👉 Some of the Techniques that are part of Data Science are


§ Data Collection
§ Data Mining
§ Data Extraction
§ Data Cleaning/refining
§ Data Analysis
§ Data Visualization
§ Predictive Modelling/Machine Learning
Sources of Data in Data Science

Gathered from the Likes, Tweets & Retweets, Comments, Video Uploads, and general media that
Social Data are uploaded and shared via social media platforms. This kind of data provides invaluable
insights into consumer behaviour and sentiment and can be enormously influential in marketing
analytics.

These are data generated by industrial equipment, sensors that are installed in machinery, and
Machine Data even web logs which track user behaviour. This type of data is expected to grow exponentially as
the internet of things grows ever more pervasive and expands around the world. Sensors such as
medical devices, smart meters, road cameras, satellites, games and the rapidly growing Internet
Of Things will deliver high velocity, value, volume and variety of data in the very near future.

These are generated from all the daily transactions that take place both online and offline.
Transactional Data Invoices, payment orders, storage records, delivery receipts – all are characterized as
transactional data.
Data Science Project Life Cycle: CRISP-DM Model

The CRISP-DM model is flexible and can be customized.

For example, if your organization aims to detect money laundering, it is


likely that you will sift through large amounts of data without a specific
modelling goal. Instead of modelling, your work will focus on data
exploration and visualization to uncover suspicious patterns in financial
data.

In such a situation, the modelling, evaluation, and deployment phases


might be less relevant than the data understanding and preparation
phases.

Cross-Industry Standard Process for Data Mining


(CRISP-DM)
Fully Editable Icon Sets: A
You can Resize without
losing quality
You can Change Fill
Color &
Line Color

FREE
PPT TEMPLATES
www.allppt.com
Fully Editable Icon Sets: B
You can Resize without
losing quality
You can Change Fill
Color &
Line Color

FREE
PPT TEMPLATES
www.allppt.com
Fully Editable Icon Sets: C
You can Resize without
losing quality
You can Change Fill
Color &
Line Color

FREE
PPT TEMPLATES
www.allppt.com

You might also like