Data Science Intro Session-18 & 19
Data science
CO – 4
Session 18 &19
AIM
To familiarize students with the basic concept of Data Science: Types of Data, Classification and Analytics
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
WHAT IS DATA SCIENCE?
• Data science is the in-depth study of large amounts of data. It involves extracting meaningful
insights from raw, structured, and unstructured data using the scientific method,
a range of technologies, and algorithms.
• It is a multidisciplinary field that uses tools and techniques to manipulate data so that we can
find something new and meaningful.
• Data science draws on powerful hardware, programming systems, and efficient
algorithms to solve data-related problems. It is closely tied to the development of artificial intelligence.
• In short, data science is about:
• Asking the right questions and analyzing the raw data.
• Modeling the data using a variety of complex and efficient algorithms.
• Visualizing the data to get a better perspective.
• Understanding the data to make better decisions and reach a final result.
WHY DATA SCIENCE?
• With the right tools, technologies, and algorithms, we can turn data into a
distinct business advantage.
• Data science helps detect fraud using advanced machine learning algorithms.
• It helps prevent significant monetary losses.
• It allows us to build intelligent capabilities into machines.
• It enables better and faster decisions.
• It helps recommend the right product to the right customer to grow the business.
DATA SCIENCE COMPONENTS
1. Statistics: Statistics is one of the most important components of data science. It is a way
to collect and analyze large amounts of numerical data and find meaningful insights in it.
2. Domain Expertise: Domain expertise binds data science together. It means
specialized knowledge or skills in a particular area. Data science is applied in
many areas, and each of them needs domain experts.
3. Data engineering: Data engineering is the part of data science that involves acquiring, storing,
retrieving, and transforming data. It also includes adding metadata (data about data) to
the data.
4. Visualization: Data visualization means representing data in a visual context so that people
can easily understand its significance. Visuals make large amounts of data easier to
take in.
TOOLS FOR DATA SCIENCE
DATA SCIENCE LIFE CYCLE
1. Discovery: The first phase is discovery, which involves asking the right questions. When we
start a data science project, we need to determine the basic requirements, priorities,
and project budget.
In this phase, we determine all the requirements of the project, such as the number of
people, technology, time, data, and the end goal. We can then frame the business problem as a
first set of hypotheses.
2. Data preparation: In this phase, we need to perform the following tasks:
• Data cleaning
• Data reduction
• Data integration
• Data transformation
After performing these tasks, the data is ready to use in the later phases.
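The four preparation tasks can be sketched in a few lines of plain Python. This is only an illustrative outline: the records, the field names (`order_id`, `customer_id`, `amount`, `region`) and the helper functions are assumptions made up for the example, not part of the course material.

```python
# Hypothetical raw records from two sources; all names are illustrative.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": "250.0"},
    {"order_id": 2, "customer_id": 11, "amount": None},   # missing value
    {"order_id": 2, "customer_id": 11, "amount": None},   # duplicate row
    {"order_id": 3, "customer_id": 10, "amount": "100.0"},
]
customers = {10: "North", 11: "South"}  # customer_id -> region


def clean(rows):
    """Data cleaning: drop rows with a missing amount and repeated order ids."""
    seen, result = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        result.append(row)
    return result


def integrate(rows, regions):
    """Data integration: attach the region held in the second source."""
    return [{**row, "region": regions[row["customer_id"]]} for row in rows]


def transform(rows):
    """Data transformation: convert amount strings to floats."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def reduce_columns(rows, keep):
    """Data reduction: keep only the columns needed for analysis."""
    return [{key: row[key] for key in keep} for row in rows]


prepared = reduce_columns(
    transform(integrate(clean(orders), customers)), ["region", "amount"]
)
print(prepared)
```

In practice a library such as pandas is often used for these steps; the plain-Python version is chosen here only to keep each task visible.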
3. Model Planning: In this phase, we determine the methods and techniques
used to establish the relationships between input variables.
We apply exploratory data analysis (EDA), using statistical formulas and
visualization tools, to understand the relationships between variables and to see what the data can
tell us.
Common tools used for model planning are:
• SQL Analysis Services
• R
• SAS
• Python
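One common EDA step is measuring how strongly two variables move together. The sketch below computes the Pearson correlation coefficient by hand; the variable names and the numbers are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]

r = pearson(ad_spend, sales)
print(round(r, 3))  # close to +1: the two variables rise together
```

A value near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests little linear relationship. Correlation alone does not show that one variable causes the other.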
5. Operationalize: In this phase, we deliver the final reports of the project, along with
briefings, code, and technical documents.
This phase gives a clear overview of the complete project's performance and other
components on a small scale before full deployment.
6. Communicate results: In this phase, we check whether we reached the goal that we
set in the initial phase.
We communicate the findings and the final result to the business team.
TYPES OF DATA
• In data science and big data, different types of data are used, and each
tends to require different tools and techniques.
• The main categories of data are:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured data is data that depends on a data model and resides in a fixed field within a record.
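As a small illustration of "a fixed field within a record", the sketch below reads comma-separated rows into a typed record. The `Employee` record, its fields, and the sample rows are assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Employee:
    """The data model: every record has the same three typed fields."""
    emp_id: int
    name: str
    salary: float


# Hypothetical CSV content; each value sits in a fixed, named field.
raw = "emp_id,name,salary\n1,Asha,52000\n2,Ravi,61000\n"

employees = [
    Employee(int(row["emp_id"]), row["name"], float(row["salary"]))
    for row in csv.DictReader(io.StringIO(raw))
]

total_salary = sum(e.salary for e in employees)
print(total_salary)
```

Because every record follows the same model, questions such as "what is the total salary?" can be answered with a simple query over one field.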
Unstructured data is data that isn’t easy to fit into a data model because the
content is context-specific or varying.
Machine-generated data is information that’s automatically created by a
computer, process, application, or other machine without human intervention.
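Application logs are a typical example of machine-generated data. The sketch below parses a few log lines and counts them by severity. The "timestamp LEVEL message" layout and the sample lines are assumptions chosen for illustration; real log formats vary.

```python
import re
from collections import Counter

# Hypothetical log lines in an assumed "timestamp LEVEL message" layout.
log_lines = [
    "2024-01-01T10:00:00 INFO service started",
    "2024-01-01T10:00:05 ERROR disk full",
    "2024-01-01T10:00:09 INFO request served",
]

LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.+)$")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


events = [parse_line(line) for line in log_lines]
level_counts = Counter(event["level"] for event in events if event)
print(level_counts)
```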
Graph-based data is a natural way to represent social networks. Its structure
allows you to calculate specific metrics, such as the influence of a person and the shortest
path between two people.
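Both metrics can be sketched on a tiny friendship graph. The names and connections below are invented, breadth-first search is used for the shortest path, and the number of direct connections stands in as a very rough proxy for influence.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "Ana": ["Ben", "Cara"],
    "Ben": ["Ana", "Dev"],
    "Cara": ["Ana", "Dev"],
    "Dev": ["Ben", "Cara", "Eli"],
    "Eli": ["Dev"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns one shortest path, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(friends, "Ana", "Eli")
most_connected = max(friends, key=lambda person: len(friends[person]))
print(path, most_connected)
```

Real influence measures (for example, centrality scores) are more refined than a simple connection count; the count is used here only to keep the idea visible.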
AUDIO, IMAGE AND VIDEO
• Audio, image, and video are data types that pose specific challenges to a data scientist.
• Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.
• MLBAM (Major League Baseball Advanced Media) announced in 2014 that it would increase video capture to
approximately 7 TB per game for the purpose of live, in-game analytics.
• High-speed cameras at stadiums capture ball and athlete movements to calculate in real time, for example,
the path taken by a defender relative to two baselines.
• A company called DeepMind succeeded at creating an algorithm that is capable of learning how to play
video games. This algorithm takes the video screen as input and learns to interpret everything via a complex
process of deep learning.
• It is a remarkable feat that prompted Google to buy the company for its own artificial intelligence (AI)
development plans.
• The learning algorithm takes in data as it is produced by the computer game; it is streaming data.
STREAMING DATA
• While streaming data can take almost any of the previous forms, it has an
extra property.
• The data flows into the system when an event happens instead of being
loaded into a data store in a batch.
• Although this isn’t really a different type of data, we treat it here as such
because you need to adapt your process to deal with this type of information.
• Examples are “What’s trending” on Twitter, live sporting or music events,
and the stock market.
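The difference from batch processing can be sketched with Python generators: each value is handled as it arrives, and the result is updated without waiting for the full data set. The price values and function names are illustrative assumptions.

```python
def price_stream():
    """Stand-in for a live feed: yields one hypothetical stock price per event."""
    for price in [100.0, 102.0, 101.0, 105.0]:
        yield price


def running_average(events):
    """Update the average after every event instead of after a full batch."""
    total = 0.0
    count = 0
    for value in events:
        total += value
        count += 1
        yield total / count


averages = list(running_average(price_stream()))
print(averages)
```

Only the running total and count are kept in memory, which is why this style of processing suits data that never stops arriving.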
DATA ANALYTICS
• At its core, data analytics is about answering questions and making decisions.
• Just as there are different types of questions, there are also different types of data
analytics, depending on what we want to accomplish.
• Data analytics is the science of analyzing raw data to draw conclusions about that
information.
• Many of the techniques and processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human consumption.
FOUR PRIMARY TYPES OF DATA ANALYTICS
1. DESCRIPTIVE ANALYTICS
• Descriptive analytics is typically the starting point in business intelligence. It uses data aggregation and
data mining to collect and organize historical data, producing visualizations such as line graphs, bar
charts, and pie charts.
• Descriptive analytics presents a clear picture of what has happened in the past and stops there.
It does not make interpretations or advise on future actions.
• Descriptive analytics combines raw data from multiple data sources to give valuable insights into the past.
However, these findings simply signal that something is wrong or right, without explaining why.
• For example, when data analysts work with an e-commerce marketing team to review sales data and
identify trends and patterns, they will see whether sales increased or decreased from last year,
in which region, and by what percentage.
• Descriptive analytics can benefit decision-makers from every department in a company, from finance to
operations.
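The sales example above can be sketched as a simple aggregation. The regions, years, and amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, year, amount).
sales = [
    ("North", 2022, 600.0),
    ("North", 2022, 400.0),
    ("North", 2023, 1200.0),
    ("South", 2022, 800.0),
    ("South", 2023, 600.0),
]

# Aggregate historical data by region and year.
totals = defaultdict(float)
for region, year, amount in sales:
    totals[(region, year)] += amount


def percent_change(region, old_year, new_year):
    """Year-over-year change in total sales for one region, in percent."""
    old = totals[(region, old_year)]
    new = totals[(region, new_year)]
    return (new - old) * 100 / old


for region in ("North", "South"):
    print(region, percent_change(region, 2022, 2023))
```

The output says what happened (one region grew, the other shrank) but not why, which is where diagnostic analytics takes over.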
2. DIAGNOSTIC ANALYTICS
• Once we know what happened, we’ll want to know why it happened. That’s where
diagnostic analytics comes in. Understanding why a trend is developing or why a
problem occurred will make our business intelligence actionable.
• It helps prevent our team from making inaccurate guesses, particularly the mistake of confusing
correlation with causation.
• Because diagnostic analytics is used to identify the origin of business issues and find
appropriate solutions to prevent them from happening in the future, it is also called root
cause analysis.
3. PREDICTIVE ANALYTICS
• When you know what happened in the past and understand why it happened, you can then
begin to predict what is likely to occur in the future based on that information.
• Predictive analytics takes the investigation a step further, using statistics, computational
modeling, and machine learning to determine the probability of various outcomes.
• Predictive analytics is especially powerful for teams because it allows decision-makers to
be more confident about the future.
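A very simple form of prediction is fitting a straight line to past values and extending it one step. The sketch below uses ordinary least squares on invented monthly sales figures; real predictive models are usually richer and also report uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predicted sales for month 5
print(forecast)
```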
4. PRESCRIPTIVE ANALYTICS
• Prescriptive analytics is where the action is. This type of analytics tells teams what they need to
do based on the predictions made.
• It is the most complex type; by the figure cited here, fewer than 3% of companies use it in their
business.
• Prescriptive analytics anticipates what, when, and why an event or trend might happen. It tells us
which actions have the highest potential for the best outcome.
• It allows teams to fix problems, improve performance, and act on valuable opportunities.
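One way to picture "which action has the highest potential" is to compare the expected payoff of each option. The actions, probabilities, and payoffs below are invented assumptions; in a real setting they would come from the predictive models.

```python
# Hypothetical actions, each with (probability, payoff) outcome pairs.
actions = {
    "discount": [(0.6, 500.0), (0.4, -100.0)],
    "ad_campaign": [(0.3, 1200.0), (0.7, -200.0)],
    "do_nothing": [(1.0, 0.0)],
}


def expected_value(outcomes):
    """Probability-weighted average payoff of one action."""
    return sum(probability * payoff for probability, payoff in outcomes)


scores = {name: expected_value(outcomes) for name, outcomes in actions.items()}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Expected value is only one possible decision rule; a team that cannot afford a loss might weigh the downside more heavily.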
DATA ANALYTICS LIFE CYCLE
PHASE 1— DISCOVERY
• In Phase 1, the team learns the business domain, including relevant history such as
whether the organization or business unit has attempted similar projects in the past from
which they can learn.
• The team assesses the resources available to support the project in terms of people,
technology, time, and data.
• Important activities in this phase include framing the business problem as an analytics
challenge that can be addressed in subsequent phases and formulating initial hypotheses
(IHs) to test and begin learning the data.
PHASE 2 — DATA PREPARATION
• Data preparation requires an analytic sandbox, in which the
team can work with data and perform analytics for the duration of the project.
• The team needs to execute extract, load, and transform (ELT) or extract, transform, and
load (ETL) to get data into the sandbox.
• ELT and ETL are sometimes jointly abbreviated as ETLT.
• Data should be transformed in the ETLT process so the team can work with it and analyze
it.
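The difference between ETL and ELT is only the order of the last two steps. The sketch below uses an in-memory SQLite database as a stand-in for the analytic sandbox; the table names and sample rows are assumptions made for illustration.

```python
import sqlite3

# Extract: hypothetical raw rows pulled from a source system.
raw_rows = [("  alice ", "42"), ("BOB", "17")]

sandbox = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox

# ETL: transform in application code first, then load the clean rows.
sandbox.execute("CREATE TABLE users_etl (name TEXT, age INTEGER)")
transformed = [(name.strip().lower(), int(age)) for name, age in raw_rows]
sandbox.executemany("INSERT INTO users_etl VALUES (?, ?)", transformed)

# ELT: load the raw rows as they are, then transform inside the sandbox.
sandbox.execute("CREATE TABLE users_raw (name TEXT, age TEXT)")
sandbox.executemany("INSERT INTO users_raw VALUES (?, ?)", raw_rows)
sandbox.execute(
    "CREATE TABLE users_elt AS "
    "SELECT lower(trim(name)) AS name, CAST(age AS INTEGER) AS age "
    "FROM users_raw"
)

etl_rows = sandbox.execute("SELECT name, age FROM users_etl ORDER BY name").fetchall()
elt_rows = sandbox.execute("SELECT name, age FROM users_elt ORDER BY name").fetchall()
print(etl_rows, elt_rows)
```

Both routes end with the same clean table. With ELT the raw data also stays available in the sandbox, which can help when the team later wants to look at the untransformed values.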
PHASE 3—MODEL PLANNING
• In model planning, the team determines the methods, techniques, and
workflow it intends to follow for the subsequent model building phase.
• The team explores the data to learn about the relationships between variables and
subsequently selects key variables and the most suitable models.
PHASE 4 — MODEL BUILDING
• In model building, the team develops datasets for testing, training, and
production purposes.
• In addition, the team builds and executes models based on the work done
in the model planning phase.
• The team also considers whether its existing tools will suffice for running the models,
or whether it will need a more robust environment for executing models and workflows.
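Developing separate training and testing datasets can be sketched as a shuffled split. The 75/25 ratio, the fixed seed, and the placeholder data are assumptions chosen for illustration.

```python
import random


def split_dataset(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


data = list(range(20))  # placeholder for 20 hypothetical records
train_rows, test_rows = split_dataset(data)
print(len(train_rows), len(test_rows))
```

The model is fitted on the training rows only, and the held-out testing rows are used to check how well it performs on data it has not seen.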
PHASE 5 — COMMUNICATE RESULTS AND FINDINGS
PHASE 6 — OPERATIONALIZE
• In this phase, the team delivers final reports, briefings, code, and technical
documents.
• In addition, the team may run a pilot project to implement the models in a
production environment.
• In this final phase, the team communicates the benefits of the project more broadly
and sets up a pilot project to deploy the work in a controlled way before
broadening it to a full enterprise or ecosystem of users.
• In the model building phase, the team scored the model in the analytics sandbox.
APPLICATIONS OF DATA SCIENCE
• Transport:
The transport industry is also using data science to create self-driving cars, which are expected to help
reduce the number of road accidents.
• Healthcare:
In the healthcare sector, data science provides many benefits. It is being used for tumor detection, drug
discovery, medical image analysis, virtual medical bots, and more.
• Recommendation systems:
Many companies, such as Amazon, Netflix, and Google Play, use data science to create a better
user experience with personalized recommendations. For example, when you search for something on Amazon and start
getting suggestions for similar products, that is the result of data science.
• Risk detection:
The finance industry has always faced fraud and the risk of losses, but with the help of data science these can be reduced.
Most finance companies are looking for data scientists to help avoid risk and losses while increasing
customer satisfaction.
ROLE OF AI IN LAW (CASE STUDY)
Current Artificial Intelligence applications in the legal industry can be categorized into six main
parts:
1. DUE DILIGENCE: Lawyers use Artificial Intelligence tools to perform due diligence and
uncover background information. Developers have integrated a range of new features for this
part of the industry, including agreement review, legal inquiry, and electronic
media.
2. PROGNOSTICATION TECHNOLOGY: Artificial Intelligence (AI) helps generate predicted
outcomes for legal investigations and agreement evaluations. This capability of AI
software appears to be very useful to law firms and the legal industry.
3. LEGAL MECHANISM: Lawyers can obtain data points from past cases using
Artificial Intelligence technologies. They can also use this data to keep track of judges’ instructions
and forecasts. This technology is likely to become increasingly important on a global scale in the near
future.
4. DOCUMENTING MECHANISM: Different types of software are used in the legal
industry to draft documents that aid in the collection of data and information. Law firms work with
a large number of documents, so this kind of software is very useful to them.
6. ELECTRONIC RECEIPT: For a long time, lawyers prepared their own receipts. Lawyers’ billing
became electronic after AI-based software was adopted in these businesses.
ROLE OF AI IN PHARMACY (CASE STUDY)
• As AI becomes more prevalent in pharmacy practice, it is important to consider the legal implications of its use.
• There are several key laws and regulations governing pharmacy practice, and AI usage must comply with these
laws to ensure patient safety and privacy.
1. Health Insurance Portability and Accountability Act (HIPAA)
Pharmacies that use AI systems to collect, store, or transmit patient health information must comply with federal
HIPAA regulations by ensuring that their systems are secure and that patient information is protected from
unauthorized access. Patient consent may also be required before using AI to collect or analyze health information,
depending on the circumstances.
2. State Pharmacy Practice Laws
Pharmacies should check with their state board of pharmacy to ensure that their use of AI complies with all
applicable state laws. Although state regulators may ultimately hold the pharmacist and pharmacy responsible for
any errors or omissions, AI may be perceived as novel in many states, necessitating communication with a state
board of pharmacy.
3. Liability
Pharmacies that use AI to manage medication inventory or provide medication management
services may be liable for any errors or omissions that occur as a result of their use of AI.
Pharmacies must ensure that their AI systems are accurate and reliable, and that they are used in
accordance with all applicable laws and regulations.
• Pharmacies should also ensure that their pharmacists are trained in the use of AI, understand
how to comply with all applicable laws and regulations, and can identify and correct
errors that may occur as a result of AI use. In addition, pharmacies should consult legal
counsel to ensure that their use of AI complies with all applicable state laws and regulations.
Self-Assessment Questions
1. Describe the roles of Data Science.
2. Draw the data science life cycle diagram and explain. Write down the
steps involved in data science life cycle.
3. List any FOUR applications of Data Science and explain any ONE
application in detail.