DS3 Data Science Introduction
Introduction
Contents
Sources of Data
Analytical Methods
Visualization Techniques
Data Science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights
from noisy, structured and unstructured data, and apply knowledge and
actionable insights from data across a broad range of application
domains.
Figure labels: Machine Learning, Deep Neural Networks, Insights, Predictions
Data Science Use Cases
Data Science is positively impacting all industries.
Banks are creating solutions for making on-the-spot decisions for loan applicants using machine
learning-powered credit risk models.
E-Commerce companies are using Machine Learning to build powerful product recommendation
engines that suggest relevant products to their website visitors.
Digital media companies have started using Data Science-based solutions, such as Media Analytics
and predictive models, to match media to users' tastes and preferences, in the right
channel and at the right time, to maximise viewership and revenue.
Healthcare companies are developing solutions that scan medical images to detect
diseases automatically, without a doctor's intervention.
Manufacturing companies are using Analytics and Prediction on machine-sensor data to prevent
machinery failures by detecting faults before the machines break down.
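As a rough illustration of the manufacturing use case, the sketch below flags sensor readings that deviate strongly from a baseline of normal operation. The `flag_anomalies` helper, the sample readings, the baseline window of 5 readings and the threshold of 3 standard deviations are all assumptions made for this example; real predictive-maintenance systems typically use richer models.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings that deviate strongly from a baseline.

    Assumption: the first `baseline_size` readings represent normal
    operation. Later readings are scored against the baseline mean and
    standard deviation (a simple z-score rule).
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline readings are constant; cannot compute z-scores")

    flagged = []
    for i in range(baseline_size, len(readings)):
        z = abs(readings[i] - mu) / sigma
        if z > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings from one machine sensor (made-up values).
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51, 0.95, 0.50]
print(flag_anomalies(vibration))
```

A flagged index would then be the trigger for inspecting the machine before it breaks down.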
Who oversees Data Science in an Organization
Business Leaders: They work with the Data Science teams to define business problems and develop strategies to solve
them. They may be the head of a line of business, such as marketing, finance, or sales, and have a data science team
reporting directly or indirectly to them.
IT Leaders: Senior IT Leaders are responsible for the infrastructure and architecture that will support data science
operations. They are continually monitoring operations and resource usage to ensure that data science teams operate
efficiently and securely. They may also be responsible for building and updating IT environments for data science teams.
Data Science and Analytics Leaders: They oversee the Data Science teams and their day-to-day work. They are team
builders who can balance team development with project planning and monitoring.
But the most important player in this process is the Data Scientist.
Who is a Data Scientist
👉 A Data Scientist’s duties can include developing strategies for analysing data, preparing data for
analysis, exploring, analysing, and visualizing data, building models with data using
programming languages, such as Python and R, and deploying models into applications.
👉 The Data Scientist works in teams. In addition to a data scientist, this team might include a
business analyst who defines the problem, a data engineer who prepares the data and manages how it is
accessed, an IT architect who oversees the underlying processes and infrastructure, and a
production engineer who deploys the models or outputs of the analysis into applications and
products.
Data Science / Machine Learning Project Lifecycle
Problem Statement
Data Acquisition
Model Training
Model Optimization
Feature Extraction
Deploy
Enriching
Model Update
Transformations
Visualisations
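The lifecycle stages can be sketched end to end on a toy problem. This is a minimal sketch, not a prescribed workflow: the stage order used here, the made-up records and every function name are assumptions for illustration, and the Model Optimization, Enriching, Transformations and Visualisations stages are left out to keep it short.

```python
def acquire_data():
    """Data Acquisition: return hypothetical (machine_hours, cost) records."""
    return [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8), (5.0, 11.1), (6.0, 13.0)]


def extract_features(records):
    """Feature Extraction: split records into inputs xs and targets ys."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    return xs, ys


def train(xs, ys):
    """Model Training: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Deploy: the fitted line exposed as a prediction function."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Score the model on records it was not trained on."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


records = acquire_data()
xs, ys = extract_features(records)

# Train on the first four records and hold out the last two for evaluation.
model = train(xs[:4], ys[:4])
mae = mean_absolute_error(model, xs[4:], ys[4:])
print("held-out mean absolute error:", round(mae, 3))

# Model Update: refit once newer records are available.
updated_model = train(xs, ys)
```

In practice the held-out error would guide Model Optimization before anything is deployed.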
R: Mature, well-supported programming language, more specialized for
statistical problem solving.
Julia: Programming language more suited to scientific and mathematical
problems.
ML Libraries
TensorFlow: Widely used tensor computation library from Google and a common backend for
Deep Learning tasks.
ML APIs
Keras: User-friendly Machine Learning API that can run on top of backends such as TensorFlow, CNTK and Theano.
PyTorch: Machine Learning framework with its own tensor library and a user-friendly API.
Visualization Libraries
Matplotlib: Widely used Visualization Library and the backend for many modern plotting libraries.
Python Language
Fundamentals of Statistics
Data Science aims to derive wisdom out of raw Data through a series of steps
Analytics vs. Data Science
Data Analyst Data Scientist
Social Data: Gathered from the Likes, Tweets & Retweets, Comments, Video Uploads, and general media that
are uploaded and shared via social media platforms. This kind of data provides invaluable
insights into consumer behaviour and sentiment and can be enormously influential in marketing
analytics.
Machine Data: These are data generated by industrial equipment, sensors that are installed in machinery, and
even web logs which track user behaviour. This type of data is expected to grow exponentially as
the Internet of Things grows ever more pervasive and expands around the world. Sensors such as
medical devices, smart meters, road cameras, satellites, games and the rapidly growing Internet
of Things will deliver high velocity, value, volume and variety of data in the very near future.
Transactional Data: These are generated from all the daily transactions that take place both online and offline.
Invoices, payment orders, storage records, delivery receipts – all are characterized as
transactional data.
Data Science Project Life Cycle: CRISP-DM Model