
Data Science Intro Session-18 & 19

This document provides an introduction to data science, including defining data science, explaining its components and life cycle, and describing different types of data. It aims to familiarize students with basic concepts in data science such as classifying data, understanding analytics, and learning outcomes related to defining data science and summarizing its classification. Key topics covered include the definition of data science, its multidisciplinary nature, components like statistics, machine learning, and tools, the data science life cycle of discovery, preparation, modeling, and communicating results, and different data types including structured, unstructured, machine-generated, and streaming data.

Uploaded by

s6652565
Copyright © All Rights Reserved

Introduction to Data Science

CO – 4

Session 18 & 19

AIM

To familiarize students with the basic concepts of data science: types of data, classification, and analytics.

INSTRUCTIONAL OBJECTIVES

This topic is designed to:

1. Explain the types of data.

2. Explain the classification of data science and data analytics.

LEARNING OUTCOMES

At the end of this unit, you should be able to:

1. Define data science.
2. Summarize the classification of data science.
3. Describe data analytics.

WHAT IS DATA SCIENCE?

• Data science is the in-depth study of massive amounts of data. It involves extracting meaningful insights from raw structured and unstructured data, which is processed using the scientific method, various technologies, and algorithms.
• It is a multidisciplinary field that uses tools and techniques to manipulate the data so that we can
find something new and meaningful.
• Data science uses powerful hardware, programming systems, and efficient algorithms to solve data-related problems. It is also closely tied to the future of artificial intelligence.
• In short, we can say that data science is all about:
• Asking the correct questions and analyzing the raw data.
• Modeling the data using various complex and efficient algorithms.
• Visualizing the data to get a better perspective.
• Understanding the data to make better decisions and arrive at the final result.

WHAT IS DATA SCIENCE?...

• Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by organizations.
• Data science enables businesses to process huge amounts of structured and unstructured big data to detect patterns.

WHY DATA SCIENCE?

• With the right tools, technologies, and algorithms, we can convert data into a distinct business advantage.
• Data science helps detect fraud using advanced machine learning algorithms.
• It helps prevent significant monetary losses.
• It makes it possible to build intelligent capabilities into machines.
• It enables better and faster decision-making.
• It helps recommend the right product to the right customer, which can grow the business.

DATA SCIENCE COMPONENTS

1. Statistics: Statistics is one of the most important components of data science. Statistics is a way to collect and analyze large amounts of numerical data and find meaningful insights in it.
2. Domain Expertise: In data science, domain expertise binds data science together. Domain expertise means specialized knowledge or skills in a particular area. In data science, there are various areas for which we need domain experts.
3. Data engineering: Data engineering is a part of data science that involves acquiring, storing, retrieving, and transforming data. Data engineering also includes adding metadata (data about data) to the data.
4. Visualization: Data visualization means representing data in a visual context so that people can easily understand its significance. Data visualization makes it easy to grasp huge amounts of data visually.
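To make the statistics component concrete, here is a minimal sketch that summarizes a small, made-up sample of daily order counts with Python's standard-library statistics module. The variable names and values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical daily order counts for one week
orders = [120, 135, 128, 160, 142, 150, 139]

# Basic descriptive statistics that summarize the sample
summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "stdev": round(statistics.stdev(orders), 2),  # sample standard deviation
    "min": min(orders),
    "max": max(orders),
}
print(summary)
```

statistics.stdev computes the sample standard deviation; statistics.pstdev would be the choice if the list were the whole population rather than a sample.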

DATA SCIENCE COMPONENTS…

5. Advanced computing: Advanced computing involves designing, writing, debugging, and maintaining the source code of computer programs.
6. Mathematics: Mathematics is a critical part of data science. Mathematics involves the study of quantity, structure, space, and change. For a data scientist, a good knowledge of mathematics is essential.
7. Machine learning: Machine learning is the backbone of data science. Machine learning is about training a machine on data so that it can learn from experience, somewhat as a human brain does. In data science, we use various machine learning algorithms to solve problems.

TOOLS FOR DATA SCIENCE

Following are some tools used in data science:

• Data analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner.
• Data warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift.
• Data visualization tools: R, Jupyter, Tableau, Cognos.
• Machine learning tools: Spark, Mahout, Azure ML Studio.

DATA SCIENCE LIFE CYCLE

1. Discovery: The first phase is discovery, which involves asking the right questions. When we start any data science project, we need to determine the basic requirements, priorities, and project budget.
In this phase, we determine all the requirements of the project, such as the number of people, technology, time, data, and the end goal. Then we can frame the business problem and formulate initial hypotheses.
2. Data preparation: In this phase, we perform the following tasks:
• Data cleaning
• Data reduction
• Data integration
• Data transformation
After performing all of these tasks, the data is ready for the later phases.
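The four preparation tasks listed above can be sketched in plain Python. This is an illustrative example under simple assumptions: the records, field names, and the prepare helper are hypothetical, and rows with a missing value are simply dropped, which is only one of several possible cleaning strategies (imputing a value is another).

```python
# Hypothetical raw records from a first source
raw_customers = [
    {"id": 1, "name": " Alice ", "age": "34", "notes": "vip"},
    {"id": 2, "name": "Bob", "age": None, "notes": ""},
    {"id": 1, "name": " Alice ", "age": "34", "notes": "vip"},  # duplicate record
    {"id": 3, "name": "Chandra", "age": "29", "notes": "new"},
]
# Hypothetical second source: total spend keyed by customer id
raw_orders = {1: 250.0, 3: 90.0}


def prepare(customers, orders):
    seen = set()
    prepared = []
    for rec in customers:
        # Data cleaning: skip duplicates and records with a missing age
        if rec["id"] in seen or rec["age"] is None:
            continue
        seen.add(rec["id"])
        prepared.append({
            "id": rec["id"],
            "name": rec["name"].strip(),          # transformation: trim whitespace
            "age": int(rec["age"]),               # transformation: text -> integer
            "spend": orders.get(rec["id"], 0.0),  # integration: join the second source
        })                                        # reduction: the "notes" field is dropped
    return prepared


clean = prepare(raw_customers, raw_orders)
print(clean)
```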
DATA SCIENCE LIFE CYCLE…

3. Model Planning: In this phase, we determine the methods and techniques for establishing the relationships between the input variables.
We apply exploratory data analysis (EDA), using various statistical formulas and visualization tools to understand the relationships between variables and to see what the data can tell us.
Common tools used for model planning are:
• SQL Analysis Services
• R
• SAS
• Python
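One common EDA step is measuring how strongly two variables move together. The sketch below computes the Pearson correlation coefficient by hand; the pearson function and the advertising and sales figures are hypothetical. On Python 3.10 and later, statistics.correlation computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical variables: advertising spend vs. sales
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]
r = pearson(ad_spend, sales)
print(f"r = {r:.3f}")
```

A value near +1 or -1 suggests a strong linear relationship worth carrying into model building; a value near 0 suggests little linear relationship.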

DATA SCIENCE LIFE CYCLE…

4. Model building: In this phase, the process of model building starts.
We create datasets for training and testing purposes.
We apply different techniques, such as association, classification, and clustering, to build the model.
Following are some common model building tools:
• SAS Enterprise Miner
• WEKA
• SPSS Modeler
• MATLAB
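As a rough illustration of the training/testing split and of one classification technique, the sketch below uses a nearest-centroid classifier written from scratch. This is one simple choice among many, not the method used by any of the tools listed above; the dataset, labels, and helper names (split_train_test, fit_centroids, predict) are all hypothetical.

```python
import random

# Hypothetical labelled data: (hours_studied, hours_slept) -> "pass" / "fail"
data = [
    ((8.0, 7.0), "pass"), ((7.5, 8.0), "pass"), ((9.0, 6.5), "pass"), ((6.5, 7.5), "pass"),
    ((2.0, 5.0), "fail"), ((3.0, 4.5), "fail"), ((1.5, 6.0), "fail"), ((2.5, 5.5), "fail"),
]


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_centroids(rows):
    """Training: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


train, test = split_train_test(data)
model = fit_centroids(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Holding out a test set gives an estimate of how the model performs on data it did not see during training. Association and clustering would need different algorithms.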

DATA SCIENCE LIFE CYCLE…

5. Operationalize: In this phase, we will deliver the final reports of the project, along with
briefings, code, and technical documents.
This phase provides a clear overview of complete project performance and other
components on a small scale before the full deployment.

6. Communicate results: In this phase, we check whether we have reached the goal set in the initial phase.
We communicate the findings and final results to the business team.

TYPES OF DATA

• Data science and big data use many different types of data, and each type tends to require different tools and techniques.
• The main categories of data are:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming

Types of data…
Structured data is data that depends on a data model and resides in a fixed field within a record.

Unstructured data is data that isn’t easy to fit into a data model because the
content is context-specific or varying.
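The difference between the two categories shows up in how the data is read. In this minimal sketch with hypothetical data, the structured CSV records can be parsed field by field, while the prices in a free-text review have to be extracted with a pattern that is itself an assumption about how prices are written.

```python
import csv
import io
import re

# Structured: every record has the same fixed fields, so it can be parsed directly.
structured = io.StringIO("id,product,price\n1,keyboard,49.99\n2,mouse,19.50\n")
rows = list(csv.DictReader(structured))
total = sum(float(r["price"]) for r in rows)

# Unstructured: free text has no fixed fields, so information must be extracted,
# here with a simple (hypothetical) regular expression for dollar prices.
review = "Bought the keyboard for $49.99 and a mouse for $19.50 - great value!"
prices = [float(p) for p in re.findall(r"\$(\d+(?:\.\d{2})?)", review)]

print(rows[0]["product"], total, prices)
```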

Machine-generated data is information that’s automatically created by a
computer, process, application, or other machine without human intervention.
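Server logs are a typical kind of machine-generated data. The sketch below parses a few log lines in a made-up format (timestamp, method, path, status, bytes) and counts the status codes. Real log formats vary, so the regular expression is an assumption tailored to this example.

```python
import re
from collections import Counter

# Hypothetical web-server log lines, written automatically by software
log_lines = [
    "2024-03-01T10:00:01 GET /index.html 200 512",
    "2024-03-01T10:00:02 GET /missing.png 404 0",
    "2024-03-01T10:00:05 POST /login 200 128",
    "2024-03-01T10:00:07 GET /index.html 500 0",
]

# Pattern for the assumed format: timestamp method path status bytes
LOG_PATTERN = re.compile(
    r"(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<bytes>\d+)"
)

# Keep only lines that match, converting each to a dict of named fields
records = [m.groupdict() for line in log_lines if (m := LOG_PATTERN.match(line))]
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```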

Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics, such as the influence of a person and the shortest path between two people.
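A minimal sketch of both metrics mentioned above, assuming a small hypothetical friendship network stored as an adjacency list: breadth-first search finds the shortest path (fewest hops) between two people, and degree centrality (the number of direct connections) serves as a very simple stand-in for influence.

```python
from collections import deque

# Hypothetical friendship network (undirected), stored as an adjacency list
friends = {
    "Ana": ["Ben", "Cara"],
    "Ben": ["Ana", "Dev"],
    "Cara": ["Ana", "Dev", "Eli"],
    "Dev": ["Ben", "Cara", "Eli"],
    "Eli": ["Cara", "Dev"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns the path with the fewest hops, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Degree centrality: a simple proxy for a person's influence (number of direct ties)
degree = {person: len(ties) for person, ties in friends.items()}
print(shortest_path(friends, "Ana", "Eli"), max(degree, key=degree.get))
```

Breadth-first search finds the fewest-hop path only because every tie counts equally here; weighted relationships would call for an algorithm such as Dijkstra's.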

AUDIO, IMAGE AND VIDEO

• Audio, image, and video are data types that pose specific challenges to a data scientist.

• Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.

• MLBAM (Major League Baseball Advanced Media) announced in 2014 that they’ll increase video capture to
approximately 7 TB per game for the purpose of live, in-game analytics.
• High-speed cameras at stadiums will capture ball and athlete movements to calculate in real time, for example,
the path taken by a defender relative to two baselines.
• A company called DeepMind succeeded in creating an algorithm that is capable of learning how to play video games. This algorithm takes the video screen as input and learns to interpret everything via a complex process of deep learning.
• It's a remarkable feat that prompted Google to buy the company for its own artificial intelligence (AI) development plans.
• The learning algorithm takes in data as it’s produced by the computer game; it’s streaming data.

STREAMING DATA

• While streaming data can take almost any of the previous forms, it has an
extra property.
• The data flows into the system when an event happens instead of being
loaded into a data store in a batch.
• Although this isn't really a different type of data, we treat it here as such because you need to adapt your process to deal with this type of information.
• Examples are the “What’s trending” on Twitter, live sporting or music events,
and the stock market.
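Because streaming data arrives one event at a time, processing code typically updates its results incrementally instead of waiting for a batch. The sketch below keeps hashtag counts over a sliding window of the most recent events, a simplified, hypothetical take on a "what's trending" feature; the class name, window size, and stream contents are illustrative assumptions.

```python
from collections import Counter, deque


class TrendingCounter:
    """Counts items seen in the last `window` events of a stream (a sliding window)."""

    def __init__(self, window):
        self.window = window
        self.recent = deque()
        self.counts = Counter()

    def add(self, item):
        # Process each event as it arrives instead of loading a batch first
        self.recent.append(item)
        self.counts[item] += 1
        if len(self.recent) > self.window:
            # Evict the oldest event so only the latest `window` events are counted
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=1):
        return self.counts.most_common(n)


# Hypothetical stream of hashtags arriving one at a time
stream = ["#ai", "#sports", "#ai", "#music", "#sports", "#sports", "#sports"]
trending = TrendingCounter(window=4)
for tag in stream:
    trending.add(tag)
print(trending.top(1))
```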

DATA ANALYTICS

• At its core, data analytics is about answering questions and making decisions.

• Just as there are different types of questions, there are also different types of data
analytics depending on what we are going to accomplish.

• Data analytics is the science of analyzing raw data to draw conclusions about that information.

• Many data analytics techniques and processes have been automated into mechanical processes and algorithms that work over raw data for human consumption.

• Data analytics helps a business optimize its performance.

FOUR PRIMARY TYPES OF DATA ANALYTICS

1. Descriptive analytics, which tells us what happened in the past.
2. Diagnostic analytics, which helps us understand why something happened in the past.
3. Predictive analytics, which predicts what is most likely to happen in the future.
4. Prescriptive analytics, which recommends actions we can take to affect those likely outcomes.

1. DESCRIPTIVE ANALYTICS

• Descriptive analytics is typically the starting point in business intelligence. It uses data aggregation and data mining to collect and organize historical data, producing visualizations such as line graphs, bar charts, and pie charts.
• Descriptive analytics presents a clear picture of what has happened in the past and stops there: it doesn't make interpretations or advise on future actions.
• Descriptive analytics combines raw data from multiple sources to give valuable insights into the past. However, these findings only signal that something is wrong or right, without explaining why.
• For example, when data analysts work with an e-commerce marketing team to review sales data and identify trends and patterns, descriptive analytics shows whether sales increased or decreased compared with last year, in which region, and by what percentage.
• Descriptive analytics can benefit decision-makers in every department of a company, from finance to operations.
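The e-commerce example above boils down to a percentage change per region. Here is a minimal sketch with hypothetical sales figures; it only describes what happened and makes no attempt to explain or forecast it.

```python
# Hypothetical sales totals by region for two consecutive years
sales = {
    "North": {"last_year": 120_000, "this_year": 138_000},
    "South": {"last_year": 95_000, "this_year": 85_500},
    "West": {"last_year": 150_000, "this_year": 150_000},
}


def year_over_year(table):
    """Describe what happened: percentage change per region (no explanation, no forecast)."""
    return {
        region: round(100 * (v["this_year"] - v["last_year"]) / v["last_year"], 1)
        for region, v in table.items()
    }


changes = year_over_year(sales)
for region, pct in changes.items():
    print(f"{region}: {pct:+.1f}%")
```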
2. DIAGNOSTIC ANALYTICS

• Once we know what happened, we'll want to know why it happened. That's where diagnostic analytics comes in. Understanding why a trend is developing or why a problem occurred makes our business intelligence actionable.
• It keeps our team from making inaccurate guesses, particularly guesses that confuse correlation with causation.
• Because diagnostic analytics is used to identify the origin of business issues and find appropriate solutions to prevent them from happening in the future, it is also called root cause analysis.
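One simple diagnostic technique is drilling down: breaking a total into segments to see where a change came from. The sketch below, using hypothetical figures for a single region, ranks sales channels by how much each contributed to a decline. Locating where a drop occurred is a first step, not proof of its cause.

```python
# Hypothetical regional sales, broken down by sales channel
south_by_channel = {
    "online": {"last_year": 40_000, "this_year": 41_000},
    "retail": {"last_year": 45_000, "this_year": 34_500},
    "partners": {"last_year": 10_000, "this_year": 10_000},
}


def contributions(breakdown):
    """Absolute change per segment, largest decline first."""
    deltas = {seg: v["this_year"] - v["last_year"] for seg, v in breakdown.items()}
    return sorted(deltas.items(), key=lambda item: item[1])


ranked = contributions(south_by_channel)
print(ranked)
```

Here the retail channel accounts for more than the whole net decline, which would point the investigation toward retail rather than online sales.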

3. PREDICTIVE ANALYTICS

• When you know what happened in the past and understand why it happened, you can then
begin to predict what is likely to occur in the future based on that information.
• Predictive analytics takes the investigation a step further, using statistics, computational
modeling, and machine learning to determine the probability of various outcomes.
• Predictive analytics is especially powerful for teams because it allows decision-makers to
be more confident about the future.
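As a minimal sketch of the idea, the code below fits a straight-line trend to six months of hypothetical sales using ordinary least squares and extrapolates one month ahead. This yields a single point forecast only; real predictive models usually also estimate the uncertainty or probability of different outcomes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


# Hypothetical monthly sales for months 1 to 6
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 118, 131, 140, 149]
a, b = fit_line(months, sales)
forecast = a + b * 7  # extrapolate the trend to month 7
print(f"trend: {b:.1f} per month, month-7 forecast: {forecast:.0f}")
```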

4. PRESCRIPTIVE ANALYTICS

• Prescriptive analytics is where the action is. This type of analytics tells teams what they need to
do based on the predictions made.
• It's the most complex type, which is why, by some estimates, fewer than 3% of companies use it in their business.
• Prescriptive analytics anticipates what, when, and why an event or trend might happen. It tells us which actions have the highest potential for the best outcome.
• It helps teams fix problems, improve performance, and seize valuable opportunities.
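Prescriptive analytics builds on predictions to choose an action. In this hypothetical sketch, predicted demand at each candidate price (assumed to come from a predictive model) is combined with an assumed unit cost, and the price with the highest expected profit is recommended.

```python
# Hypothetical predicted weekly demand (units) at each candidate price,
# assumed to come from a predictive model
predicted_demand = {9.99: 1_000, 12.99: 820, 14.99: 700, 17.99: 520}
unit_cost = 6.00  # assumed cost per unit


def recommend_price(demand, cost):
    """Prescriptive step: choose the action (price) with the highest expected profit."""
    profit = {price: round((price - cost) * units, 2) for price, units in demand.items()}
    best = max(profit, key=profit.get)
    return best, profit


best_price, profit_by_price = recommend_price(predicted_demand, unit_cost)
print(best_price, profit_by_price)
```

The recommendation is only as good as the demand predictions behind it, which is why prescriptive analytics follows predictive analytics in the sequence above.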

DATA ANALYTICS LIFE CYCLE
PHASE 1— DISCOVERY

• In Phase 1, the team learns the business domain, including relevant history such as
whether the organization or business unit has attempted similar projects in the past from
which they can learn.
• The team assesses the resources available to support the project in terms of people,
technology, time, and data.
• Important activities in this phase include framing the business problem as an analytics
challenge that can be addressed in subsequent phases and formulating initial hypotheses
(IHs) to test and begin learning the data.

DATA ANALYTICS LIFE CYCLE…
PHASE 2 — DATA PREPARATION

• Phase 2, data preparation, requires an analytic sandbox in which the team can work with data and perform analytics for the duration of the project.
• The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox.
• ELT and ETL together are sometimes abbreviated as ETLT.
• Data should be transformed in the ETLT process so the team can work with it and analyze it.
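A minimal ETL sketch using only Python's standard library, with an in-memory SQLite database standing in for the analytic sandbox. The CSV content, table name, and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw data from a source (a hypothetical CSV export held in memory)
source = io.StringIO("order_id,amount,currency\n1,10.50,usd\n2,,usd\n3,7.25,USD\n")
raw = list(csv.DictReader(source))

# Transform: drop rows with a missing amount and standardize currency codes
rows = [
    (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
    for r in raw
    if r["amount"]
]

# Load: write the cleaned rows into the sandbox (an in-memory SQLite database here)
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)")
sandbox.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
count, total = sandbox.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(count, total)
```

In an ELT flow the order of the last two steps would flip: the raw rows would be loaded into the sandbox first and then transformed there, for example with SQL.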

DATA ANALYTICS LIFE CYCLE…
PHASE 3—MODEL PLANNING

• In Phase 3, model planning, the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase.
• The team explores the data to learn about the relationships between variables and
subsequently selects key variables and the most suitable models.

DATA ANALYTICS LIFE CYCLE…
PHASE 4 — MODEL BUILDING

• In Phase 4, model building, the team develops datasets for testing, training, and production purposes.
• In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
• The team also considers whether its existing tools will suffice for running the models, or whether it will need a more robust environment for executing models and workflows.

DATA ANALYTICS LIFE CYCLE…
PHASE 5 — COMMUNICATE RESULTS AND FINDINGS

• In Phase 5, communicate results, the team, in collaboration with major stakeholders, determines whether the results of the project are a success or a failure based on the criteria developed in Phase 1.
• The team should identify key findings, quantify the business value, and develop a
narrative to summarize and convey findings to stakeholders.

DATA ANALYTICS LIFE CYCLE…
PHASE 6 — OPERATIONALIZE

• In this phase, the team delivers final reports, briefings, code, and technical
documents.
• In addition, the team may run a pilot project to implement the models in a
production environment.
• In the final phase, the team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or ecosystem of users.
• This differs from the model building phase, in which the team scored the model only in the analytics sandbox.

APPLICATIONS OF DATA SCIENCE

• Image recognition and speech recognition:
Data science is currently being used for image and speech recognition. When you upload an image to Facebook and start getting suggestions to tag your friends, that automatic tagging suggestion uses an image recognition algorithm, which is part of data science.
When you speak to assistants such as "Ok Google", Siri, or Cortana and these devices respond to your voice commands, this is made possible by speech recognition algorithms.
• Gaming world:
In the gaming world, the use of machine learning algorithms is increasing day by day. EA Sports, Sony, and Nintendo are widely using data science to enhance the user experience.
• Internet search:
When we want to search for something on the internet, we use search engines such as Google, Yahoo, Bing, Ask, etc. All these search engines use data science technology to improve the search experience, so you can get search results in a fraction of a second.

APPLICATIONS OF DATA SCIENCE…

• Transport:
The transport industry is also using data science technology to create self-driving cars, which could help reduce the number of road accidents.
• Healthcare:
In the healthcare sector, data science is providing many benefits. It is being used for tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.
• Recommendation systems:
Many companies, such as Amazon, Netflix, and Google Play, use data science technology to improve the user experience with personalized recommendations. For example, when you search for something on Amazon and start getting suggestions for similar products, that is data science technology at work.
• Risk detection:
Finance industries have always faced fraud and the risk of losses, but data science can help reduce these risks. Many finance companies hire data scientists to avoid risk and losses while increasing customer satisfaction.
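A very simple way to produce "customers who bought this also bought" suggestions is to count how often products appear together in purchase histories. The sketch below does this with hypothetical baskets; it is only an illustration of the idea, not how Amazon, Netflix, or Google Play actually build their recommendation systems, which are far more sophisticated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (one set of products per customer)
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"phone", "phone case"},
    {"laptop", "laptop bag"},
    {"mouse", "mouse pad"},
]

# Count how often each ordered pair of products is bought together
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def recommend(product, k=2):
    """Return the top-k products most often co-purchased with `product`."""
    scores = Counter({other: n for (p, other), n in co_counts.items() if p == product})
    return [item for item, _ in scores.most_common(k)]


print(recommend("laptop"))
```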

ROLE OF AI IN LAW (CASE STUDY)

The current Artificial Intelligence applications in the legal industry can be categorized into six main parts:
1. DUE DILIGENCE: Lawyers use Artificial Intelligence tools to perform due diligence and uncover background information. In response to current needs, developers have integrated a range of new features for this part of the industry, including contract review, legal research, and electronic media.
2. PROGNOSTICATION TECHNOLOGY: Artificial Intelligence (AI) helps predict outcomes for legal investigations and contract evaluations. This capability of AI software appears to be extremely beneficial to law firms and the wider industry.

ROLE OF AI IN LAW (CASE STUDY)…

3. LEGAL MECHANISM: Lawyers can use Artificial Intelligence technologies to obtain data points from prior cases. They can also use this data to keep track of judges' instructions and make forecasts. This technology is likely to become increasingly important on a global scale in the near future.

4. DOCUMENTING MECHANISM: Various types of software are used in the legal industry to generate documents from collected data and information. Because law firms work with large numbers of useful documents, this is highly beneficial.

5. INTELLECTUAL PROPERTY: Artificial intelligence algorithms help lawyers examine massive IP files and extract meaning from a wide variety of texts.

6. ELECTRONIC BILLING: For a long time, lawyers prepared their own bills. Once AI-based software was applied in these businesses, lawyers' billing became electronic.

ROLE OF AI IN PHARMACY (CASE STUDY)

• As AI becomes more prevalent in pharmacy practice, it is important to consider the legal implications of its use.
• There are several key laws and regulations governing pharmacy practice, and AI usage must comply with these
laws to ensure patient safety and privacy.
1. Health Insurance Portability and Accountability Act (HIPAA)
Pharmacies that utilize AI systems to collect, store, or transmit patient health information must comply with federal
HIPAA regulations by ensuring that their systems are secure and that patient information is protected from
unauthorized access. Patient consent also may be required before using AI to collect or analyze health information,
depending on the circumstances.
2. State Pharmacy Practice Laws
Pharmacies should check with their state board of pharmacy to ensure that their use of AI complies with all
applicable state laws. Although state regulators may ultimately hold the pharmacist and pharmacy responsible for
any errors or omissions, AI may be perceived as novel in many states, necessitating communication with a state
board of pharmacy.

ROLE OF AI IN PHARMACY (CASE STUDY)…

3. Liability
Pharmacies that use AI to manage medication inventory or provide medication management
services may be liable for any errors or omissions that occur as a result of their use of AI.
Pharmacies must ensure that their AI systems are accurate and reliable, and that they are used in
accordance with all applicable laws and regulations.
• Pharmacies should also ensure that their pharmacists are trained in the use of AI and that they
understand how to comply with all applicable laws and regulations and to identify and correct
errors that may occur as a result of AI use. In addition, pharmacies should consult with legal
counsel to ensure that their use of AI complies with all applicable state laws and regulations.

Self-Assessment Questions
1. Describe the roles of Data Science.

2. Draw the data science life cycle diagram and explain. Write down the
steps involved in data science life cycle.

3. List any FOUR applications of Data Science and explain any ONE
application in detail.

4. Illustrate the structured and unstructured data types.

5. Describe Data Analytics Life Cycle with neat diagram.


THANK YOU
