DA(Unit 1)
UNIT-1
JAYATI BHARDWAJ
Assistant Professor, CSE
Course Outcomes
Syllabus as per University
Data Analytics
• Data analytics (DA) is the process of examining data sets in order to find trends and draw
conclusions about the information they contain.
• Data analytics is done with the aid of specialized systems and software.
• Data analytics predominantly refers to an assortment of applications, from basic business
intelligence (BI), reporting and online analytical processing (OLAP) to various forms of advanced
analytics.
• It's similar in nature to business analytics.
• Data analytics initiatives can help businesses increase revenue, improve operational efficiency,
optimize marketing campaigns and bolster customer service efforts. Analytics also enable
organizations to respond quickly to emerging market trends
Why Data Analytics?
Data Analytics Tools
Data Collection
• In the process of big data analysis, “Data collection” is the initial step
before starting to analyze the patterns or useful information in data.
• The data which is to be analyzed must be collected from different
valid sources.
• The data which is collected is known as raw data, which is not useful as it is; cleaning and utilizing that data for analysis forms information, and the information obtained is known as “knowledge”.
• The main goal of data collection is to collect information-rich data.
Data could be…
1. RDBMS: A relational database is a collection of tables, each of which is assigned a unique
name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of
tuples (records or rows). Each tuple in a relational table represents an object identified by a unique
key and described by a set of attribute values.
Data could be…
2. Data Warehouses: A data warehouse is a repository of information collected from
multiple sources, stored under a unified schema, and that usually resides at a single site. Data
warehouses are constructed via a process of data cleaning, data integration, data transformation, data
loading, and periodic data refreshing.
Data could be…
3. Transactional Databases: In general, a transactional database consists of a file where each
record represents a transaction. A transaction typically includes a unique transaction identity number (trans
ID) and a list of the items making up the transaction (such as items purchased in a store).
4. Sequence Databases: A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences.
5. Time-Series Databases: A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (such as temperature and wind).
Data could be…
6. Spatial Databases and Spatiotemporal Databases
Spatial databases contain spatial-related information. Examples include geographic(map)
databases, very large-scale integration (VLSI) or computer-aided design databases, and
medical and satellite image databases.
A spatial database that stores spatial objects that change with time is called a
Spatiotemporal database, from which interesting information can be mined. For example,
we may be able to group the trends of moving objects and identify some strangely moving
vehicles, or distinguish a bioterrorist attack from a normal outbreak of the flu based on the
geographic spread of a disease with time.
Data could be…
7. Text Databases and Multimedia Databases
Text databases are databases that contain word descriptions for objects. These word descriptions are
usually not simple keywords but rather long sentences or paragraphs, such as product specifications,
error or bug reports, warning messages, summary reports, notes, or other documents.
Multimedia databases store image, audio, and video data. They are used in applications such as
picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web,
and speech-based user interfaces that recognize spoken commands. Multimedia databases must
support large objects, because data objects such as video can require gigabytes of storage.
Data could be…
8. Heterogeneous Databases and Legacy Databases
A heterogeneous database consists of a set of interconnected, autonomous component databases. The
components communicate in order to exchange information and answer queries. Objects in one component
database may differ greatly from objects in other component databases, making it difficult to assimilate their
semantics into the overall heterogeneous database.
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as
relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia
databases, or file systems
Data could be…
9. Data Streams
Many applications involve the generation and analysis of a new kind of data, called stream data, where
data flow in and out of an observation platform (or window) dynamically.
Such data streams have the following unique features: huge or possibly infinite volume, dynamically
changing, flowing in and out in a fixed order, allowing only one or a small number of scans, and
demanding fast (often real-time) response time.
Typical examples of data streams include various kinds of scientific and engineering data, time-series data,
and data produced in other dynamic environments, such as power supply, network traffic, stock exchange,
telecommunications, Web click streams, video surveillance, and weather or environment monitoring.
Data could be…
10. The World Wide Web
The World Wide Web and its associated distributed information services, where data objects are linked together
to facilitate interactive access. Users seeking information of interest traverse from one object via links to
another. Such systems provide ample opportunities and challenges for data mining. For example, understanding
user access patterns will not only help improve system design (by providing efficient access between highly
correlated objects), but also leads to better marketing decisions (e.g., by placing advertisements in frequently
visited documents, or by providing better customer/user classification and behavior analysis).
Data Collection
• Most of the data collected is of two types:
• Qualitative data: Data that is represented in a verbal or narrative format is qualitative data. A simple way to look at qualitative data is to think of it in the form of words. These types of data are collected through focus groups, surveys, interviews, open-ended questionnaires, and observations.
• Quantitative data: Quantitative data is data that is expressed in numerical terms, in which the
numeric values could be large or small. Numerical values may correspond to a specific category or
label. These types of data are collected through surveys and questionnaires, analytics tools, environmental sensors, and the manipulation of pre-existing quantitative data.
Nominal Data
• These are the set of values that don’t possess a natural ordering.
• Ex.-The color of a smart phone can be considered as a nominal data type as we can’t compare one
color with others. It is not possible to state that ‘Red’ is greater than ‘Blue’.
• The gender of a person is another one where we can’t differentiate between male, female, or others.
• Mobile phone categories whether it is midrange, budget segment, or premium smart phone is also
nominal data type.
• Nominal data types in statistics are not quantifiable and cannot be measured through numerical units. Nominal types of statistical data are valuable while conducting qualitative research, as they extend freedom of opinion to subjects.
Ordinal Data
• These types of values have a natural ordering while maintaining their class of values.
• If we consider the size of a clothing brand then we can easily sort them according to their name tag
in the order of small < medium < large.
• The grading system while marking candidates in a test can also be considered as an ordinal data
type where A+ is definitely better than B grade.
• These categories help us decide which encoding strategy can be applied to which type of data.
• Data encoding for qualitative data is important because machine learning models can’t handle these values directly; they need to be converted to numerical types, as the models are mathematical in nature.
• For the nominal data type, where there is no comparison among the categories, one-hot encoding can be applied (similar to binary encoding, and practical when the categories are few); for the ordinal data type, label encoding, a form of integer encoding, can be applied. A small sketch of both encodings follows.
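The sketch below is illustrative only and not from the slides: it uses a hypothetical toy DataFrame with a nominal "color" column and an ordinal "size" column to show one-hot and label encoding in Python with pandas.

# One-hot encoding for nominal data, label (integer) encoding for ordinal data.
import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Blue", "Red", "Green"],       # nominal: no natural order
    "size":  ["small", "large", "medium", "small"]  # ordinal: small < medium < large
})

# One-hot encoding: each nominal category becomes its own 0/1 column.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: map ordinal categories to integers that preserve the natural order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

print(pd.concat([df, one_hot], axis=1))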
Discrete Data
• Numerical values that are integers or whole numbers fall under this category.
• The number of speakers in the phone, the number of cameras, the number of cores in the processor, and the number of SIMs supported are some examples of the discrete data type.
• Discrete data types in statistics cannot be measured – they can only be counted, as the objects included in discrete data have a fixed value.
• The value can be represented in decimal, but it has to be whole.
• Discrete data is often identified through charts, including bar charts, pie charts, and tally charts.
Continuous Data
• Fractional numbers are considered continuous values.
• These can take the form of the operating frequency of the processors, the Android version of the phone, the Wi-Fi frequency, the temperature of the cores, and so on.
• Unlike discrete data, which has whole, fixed values, continuous data can be broken down into smaller pieces and can take any value.
• For example, volatile values such as temperature and the weight of a human can be included in the
continuous value.
• Continuous types of statistical data are represented using a graph that easily reflects value
fluctuation by the highs and lows of the line through a certain period of time.
Data Collection
Primary data:
The data which is Raw, original, and extracted directly from the official sources is known as primary
data. This type of data is collected directly by performing techniques such as questionnaires,
interviews, and surveys. The data collected must be according to the demand and requirements of the
target audience on which analysis is performed.
A few methods of collecting primary data:
1.Interview method
2.Survey method
3.Observation method
4.Experimental method: CRD- Completely Randomized design
RBD- Randomized Block Design
LSD – Latin Square Design
FD- Factorial design
Secondary data:
Secondary data is data which has already been collected and is reused for some valid purpose. This type of data is derived from previously recorded primary data, and it has two types of sources: internal and external.
1. Internal source: These types of data can easily be found within the organization, such as market records, sales records, transactions, customer data, accounting resources, etc. The cost and time consumed in obtaining data from internal sources is less.
2. External source: Data which can’t be found within the organization and is obtained through external third-party resources is external source data. The cost and time consumption is greater because such sources contain a huge amount of data. Examples of external sources are government publications, news publications, the Registrar General of India, the Planning Commission, the International Labour Bureau, syndicate services, and other non-governmental publications.
Secondary data:
3. Other sources:
• Sensor data: With the advancement of IoT devices, the sensors of these devices collect data which can be used for sensor data analytics to track the performance and usage of products.
• Satellite data: Satellites collect a large volume of images and data, in terabytes on a daily basis, through surveillance cameras; this can be used to extract useful information.
• Web traffic: Due to fast and cheap internet access, data in many formats uploaded by users on different platforms can be collected, with their permission, for data analysis. Search engines also provide data about the most frequently searched keywords and queries.
Types of Data
Characteristics of Data
• Data quality is crucial – it assesses whether information can serve its purpose in a
particular context (such as data analysis).
• So, to determine the quality of a given set of information, there are data quality
characteristics of which one should be aware.
• There are five traits namely:
• Accuracy
• Completeness
• Reliability
• Relevance
• Timeliness
Characteristics of Data
• Accuracy: This data quality characteristic means that information is correct. Accuracy is a crucial data
quality characteristic because inaccurate information can cause significant problems with severe consequences.
• Completeness: “Completeness” refers to how comprehensive the information is. When looking at data
completeness, think about whether all of the data you need is available. Ex- You might need a customer’s first
and last name, but the middle initial may be optional.
• Reliability: Reliability means that a piece of information doesn’t contradict another piece of information
in a different source or system. Ex.- if a patient’s birthday is January 1, 1970 in one system, yet it’s June 13,
1973 in another, the information is unreliable. Reliability is a vital data quality characteristic. When pieces of
information contradict themselves, you can’t trust the data and this could result in damages.
Characteristics of Data
• Relevance: When you’re looking at data quality characteristics, relevance comes into play because
there has to be a good reason as to why you’re collecting this information in the first place. You
must consider whether you really need this information, or whether you’re collecting it just for the
sake of it. If you’re gathering irrelevant information, you’re wasting time as well as money. Your
analyses won’t be as valuable.
• Timeliness: Timeliness, as the name implies, refers to how up to date information is. If it was gathered in the past hour, then it’s timely – unless new information has come in that renders the previous information useless. Timeliness is an important data quality characteristic: out-of-date information costs companies time and money. A small sketch of completeness and timeliness checks follows.
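The sketch below is not part of the original slides: it checks completeness as the fraction of non-missing values and flags stale records; the DataFrame, column names, and the one-year freshness threshold are hypothetical.

# Simple data-quality checks for completeness and timeliness using pandas.
import pandas as pd

customers = pd.DataFrame({
    "first_name": ["Asha", "Ravi", None],
    "last_name":  ["Mehta", None, "Khan"],
    "updated_at": pd.to_datetime(["2024-01-10", "2023-06-01", "2024-02-20"]),
})

# Completeness: fraction of non-missing values per column.
completeness = 1 - customers.isna().mean()
print("Completeness per column:")
print(completeness)

# Timeliness: flag records not refreshed within the last 365 days.
stale = customers["updated_at"] < (pd.Timestamp.now() - pd.Timedelta(days=365))
print("Stale records:")
print(customers[stale])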
Introduction to Big Data
• Big data is a term that describes large, hard-to-manage volumes of data – both
structured and unstructured – that inundate businesses on a day-to-day basis.
• It is the data that contains greater variety, arriving in increasing volumes and with
more velocity. This is also known as the three Vs.
• Big data is larger, more complex data sets, especially from new data sources.
These data sets are so voluminous that traditional data processing software just
can’t manage them.
• But it’s not just the type or amount of data that’s important, it’s what organizations
do with the data that matters. Big data can be analyzed for insights that improve
decisions and give confidence for making strategic business moves.
Some Facts & Figures
Insights
Sources: people, machines, and organizations. Ubiquitous computing means more people are carrying data-generating devices (mobile phones with Facebook, GPS, cameras, etc.).
Introduction to Big Data
• 3 ‘V’s of Big Data – Variety, Velocity, and Volume.
a) Variety: Variety of Big Data refers to structured, unstructured, and semi-structured data
that is gathered from multiple sources. While in the past, data could only be collected
from spreadsheets and databases, today data comes in an array of forms such as emails,
PDFs, photos, videos, audio, and much more. Variety is one of the important
characteristics of big data.
b) Velocity: Velocity essentially refers to the speed at which data is being created in real-
time. In a broader perspective, it comprises the rate of change, the linking of incoming data
sets at varying speeds, and activity bursts.
Introduction to Big Data
c) Volume: It indicates the huge ‘volumes’ of data being generated on a daily basis from various sources like social media platforms, business processes, machines, networks, human interactions, etc. Such large amounts of data are stored in data warehouses. This concludes the three core characteristics of big data.
d) Veracity: It refers to inconsistencies and uncertainty in data: the available data can sometimes get messy, and its quality and accuracy are difficult to control. Big data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources. Example: data in bulk could create confusion, whereas too little data could convey only half or incomplete information.
e) Value: The bulk of Data having no Value is of no good to the company, unless you turn it into something useful.
Data in itself is of no use or importance but it needs to be converted into something valuable to extract Information.
Benefits of Big Data
• Cost Savings: Some tools of Big Data like Hadoop and Cloud-Based Analytics can bring cost advantages to
business when large amounts of data are to be stored and these tools also help in identifying more efficient ways of
doing business.
• Time Reductions: The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn.
• Understand the market conditions: By analyzing big data you can get a better understanding of current market
conditions. For example, by analyzing customers’ purchasing behaviors, a company can find out the products that
are sold the most and produce products according to this trend. By this, it can get ahead of its competitors.
• Control online reputation: Big data tools can do sentiment analysis, so you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this.
Benefits of Big Data
• Using Big Data Analytics to Boost Customer Acquisition and Retention: The customer is the most
important asset any business depends on. If a business is slow to learn what customers are looking for, then it
is very easy to begin offering poor quality products. In the end, loss of clientele will result, and this creates an
adverse overall effect on business success. The use of big data allows businesses to observe various customer
related patterns and trends. Observing customer behavior is important to trigger loyalty.
• Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights: Big data analytics can help change all business operations. This includes the ability to match customer expectations, change the company’s product line and, of course, ensure that the marketing campaigns are powerful.
• Using Big Data Analytics as a Driver of Innovation and Product Development: Another huge advantage of big data analytics is its role in driving innovation and product development.
• This information is available quickly and efficiently so that companies can be agile in crafting
plans to maintain their competitive advantage.
• Technologies such as business intelligence (BI) tools and systems help organizations take the
unstructured and structured data from multiple sources.
• Users (typically employees) input queries into these tools to understand business operations and
performance.
Big Data Analytics
• Big data analytics is important because it helps companies leverage their data to identify
opportunities for improvement and optimization.
• Across different business segments, increasing efficiency leads to overall more intelligent
operations, higher profits, and satisfied customers.
• Big data analytics helps companies reduce costs and develop better, customer-centric products and
services.
• Data analytics helps provide insights that improve the way our society functions. In health care, big
data analytics not only keeps track of and analyzes individual records, but plays a critical role in
measuring COVID-19 outcomes on a global scale. It informs health ministries within each nation’s
government on how to proceed with vaccinations and devises solutions for mitigating pandemic
outbreaks in the future.
Big Data Analytics
Why?
To make the right decisions for your business to succeed, you need the right data. So, it’s
important to have a data analytics strategy in place.
Such plans can help organizations:
• boost revenue
• cut costs
• improve efficiencies
• enhance marketing efforts
• strengthen customer focus and customer service
• respond quickly and effectively to market events and industry trends
• reduce risk
• gain a competitive edge
Harnessing Big Data
• OLTP (Online Transaction Processing) - DBMS
• OLAP (Online Analytical Processing) - Data warehouse
• RTAP (Real Time Analytical Processing) - Big Data Architecture & Technology
Traditional Model
Traditional Data Model
• To integrate data across mixed application environments, you need to get data from one data environment
(source) to another data environment (destination). Extract, Transform and Load (ETL) technologies have
been used to accomplish this in traditional data warehouse environments.
• ETL tools combine three important functions required to get data from one data environment and put it into
another data environment.
• Extract: Read data from the source database.
• Transform: Convert the format of the extracted data so that it conforms to the requirements of the target
database. (Transformation is done by using rules or merging data with other data.)
• Load: Write data to the target database
• Data warehouses provide business users with a way to consolidate information across disparate sources to
analyze and report on data relevant to their specific business focus. ETL tools are used to transform the data
into the format required by the data warehouse. The transformation is actually done in an intermediate
location before the data is loaded into the data warehouse.
• Many software vendors, including Oracle, Microsoft, IBM, Informatica, Talend, and Pentaho, provide traditional ETL software tools. A minimal sketch of the ETL pattern follows.
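The sketch below uses only the Python standard library; the CSV file name, its columns, and the target table are hypothetical, and real ETL tools add scheduling, validation, and error handling on top of this basic pattern.

# Extract-Transform-Load in miniature: CSV source -> SQLite target.
import csv
import sqlite3

# Extract: read rows from the source system (a CSV export in this sketch).
with open("sales_source.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: convert the extracted data to the format the target requires
# (normalize text, cast amounts to numbers).
transformed = [
    (row["order_id"], row["region"].strip().upper(), float(row["amount"]))
    for row in rows
]

# Load: write the transformed rows into the target database.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
conn.commit()
conn.close()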
Big Data Model
• Data Storage: Data is stored in distributed file stores that can hold large files in a variety of formats. Large numbers of different format-based big files can also be stored in a data lake. This consists of the data that is managed for batch operations and is saved in file stores.
• Batch Processing: Each chunk of data is split into different categories using long-running jobs, which filter, aggregate, and otherwise prepare the data for analysis. These jobs typically read from sources, process the data, and deliver the output to new files. Multiple approaches to batch processing are employed, including Hive jobs, U-SQL jobs, Sqoop or Pig jobs, and custom MapReduce jobs written in Java, Scala, or other languages such as Python.
• Real-Time Message Ingestion: Unlike batch processing, real-time message ingestion caters to data at the moment it is generated, in a sequential and uniform fashion. In the simplest case, a data store receives all incoming messages and drops them into a folder for later processing. If message-based processing is required, however, message ingestion stores such as Apache Kafka, Apache Flume, or Event Hubs from Azure must be used; these make the delivery process more reliable and provide other message-queuing semantics. A minimal Kafka sketch follows below.
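The sketch below uses the kafka-python client as one possible choice; the broker address, topic name, and message payload are hypothetical, and a running Kafka broker is assumed.

# Producer side: an application pushes events into a topic as they are generated.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("clickstream", b'{"user": "u1", "page": "/home"}')
producer.flush()

# Consumer side: the ingestion layer reads messages in arrival order and hands
# them to downstream batch or stream processing.
consumer = KafkaConsumer("clickstream",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)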
Big Data Model
• Stream Processing: Real-time message ingestion and stream processing are different: ingestion captures the incoming data, often through a publish-subscribe store, whereas stream processing consumes the ingested data as windows or streams, transforms it, and writes it to a sink. Tools include Apache Spark, Flink, Storm, etc. (see the streaming sketch after this list).
• Analytics-Based Datastore: To analyze already processed data, analytical tools use a data store based on HBase or another NoSQL data warehouse technology. A Hive database can provide metadata abstraction over the data in the store and supports interactive use. NoSQL databases such as HBase, or query engines such as Spark SQL, are also available.
• Reporting and Analysis: The generated insights must then be presented, which is accomplished by reporting and analysis tools that use embedded technology to produce useful graphs, analyses, and insights that are beneficial to the business. Examples include Cognos, Hyperion, and others.
• Orchestration: Big-data solutions involve repetitive data-processing tasks that are organized into workflow chains, which transform the source data, move data between sources and sinks, and load it into stores. Sqoop, Oozie, Azure Data Factory, and others are just a few examples.
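The streaming sketch referenced above uses Spark Structured Streaming and the classic word-count example; the socket host and port are assumptions, PySpark must be installed, and a text source such as nc -lk 9999 must be running for output to appear.

# Read a stream of text lines, split into words, keep running counts, write to a sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Source: a stream of text lines from a socket.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Transformation over the stream: split lines into words and count them.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Sink: write the running counts to the console.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()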
Big Data Process
Big Data Layers
• Big data sources layer: The data available for analysis will vary in origin and format: the format may be structured, unstructured, or semi-structured; the speed of data arrival and delivery will vary according to the source; the data collection mode may be direct or through data providers, in batch mode or in real time; and the data source may be external or within the organization.
• Data Storage layer: This layer acquires data from the data sources, converts it, and stores it in a format that
is compatible with data analytics tools. Governance policies and compliance regulations primarily decide the
suitable storage format for different types of data.
• Data Query Layer: This is the layer of the data architecture where active analytic processing takes place. It is a field where interactive queries are necessary, traditionally dominated by SQL-expert developers. Before Hadoop, storage was insufficient, which made analytics a long process: a new data source first went through a lengthy ETL process to become ready for storage, and only then was the data put into a database or data warehouse. Data ingestion and data analytics became the two essential steps that, together with a data ingestion framework, solved the problem of computing over such large amounts of data.
Big Data Layers
• Processing Layer: In the previous layer, we gathered the data from different sources and made it available to the rest of the pipeline. In this layer, the data is ready and only needs to be routed to different destinations; the focus is on the specialized data-pipeline processing system.
• Analysis layer: It extracts the data from the data storage layer (or directly from the data source) to
derive insights from the data.
• Visualization layer: This layer receives the output provided by the analysis layer and presents it to the relevant consumers. The consumers of the output may be business processes, humans, visualization applications, or services.
Reporting Vs Analysis
• Reports and analytics help businesses improve operational efficiency and
productivity, but in different ways.
• Reporting explains what is happening, while analytics helps identify why it is happening.
• Reporting summarizes and organizes data in easily digestible ways while analytics
enables questioning and exploring that data further. It provides invaluable insights
into trends and helps create strategies to help improve operations, customer
satisfaction, growth, and other business metrics.
• Analytics enables business users to cull out insights from data, spot trends, and
help make better decisions. Next-generation analytics takes advantage of emerging
technologies like AI, NLP, and machine learning to offer predictive insights based
on historical and real-time data.
Reporting Vs Analysis
Reporting
Examples
• Take the population census, for example. This is a technical document that transmits basic
information on how many and what kind of people live in a certain country. It can be
displayed in the text, or in a visual format, such as a graph or chart. But it is static
information that can be used to assess current conditions.
• Examples:
• Marketing teams gather data on customer behavior and habits to form business strategies around them. A company like
Starbucks keeps track of its customer base through its mobile app. The mobile app provides insight into consumer spending
and buying behaviors, and the data is used in predictive analysis to orient future decisions.
• Another aspect that companies improve by using data analytics is customer experience. CX is the engagement and
interaction of customers with businesses. For example, McDonald’s stores customer data through their mobile app. These
analytical efforts help them automatically send out promotions, discounts, and other updates.
Different Types of Data Analytics
Descriptive (business intelligence and data mining): This surface-level analysis is aimed at analyzing past data
through data aggregation and data mining.
• Descriptive analytics looks at data and analyzes past events for insight into how to approach future events. It looks at past performance and understands it by mining historical data to understand the causes of success or failure in the past.
• Almost all management reporting such as sales, marketing, operations, and finance uses this type of analysis.
• The descriptive model quantifies relationships in data in a way that is often used to classify customers or prospects into
groups.
• Unlike a predictive model that focuses on predicting the behavior of a single customer, descriptive analytics identifies many different relationships between customers and products.
• Common examples of descriptive analytics are company reports that provide historical reviews (a small pandas sketch follows this list), such as:
• Data Queries
• Reports
• Descriptive Statistics
• Data dashboard
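The sketch referenced above is illustrative only; the table and column names are hypothetical. It computes summary statistics and a grouped historical review in pandas, in the spirit of a simple management report.

# Descriptive statistics and a simple report-style aggregation.
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "revenue": [1200.0, 950.0, 1430.0, 780.0, 1010.0],
})

# Count, mean, standard deviation, min/max, and quartiles of revenue.
print(sales["revenue"].describe())

# Historical review grouped by region, as in a typical sales report.
print(sales.groupby("region")["revenue"].agg(["sum", "mean"]))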
Different Types of Data Analytics
Diagnostic: This kind of analysis explores the “why”. For instance, diagnostic analysis can help in understanding the
reason behind a sudden drop in customers for a company.
• In this analysis, we generally use historical data over other data to answer any question or for the solution of any problem.
We try to find any dependency and pattern in the historical data of the particular problem.
• For example, companies go for this analysis because it gives great insight into a problem, and they also keep detailed information at their disposal; otherwise, data collection may have to be repeated for every individual problem, which would be very time-consuming.
Prescriptive: This kind of analysis goes beyond predicting future outcomes by also suggesting actions that benefit from the predictions and showing the decision maker the implication of each decision option.
• Prescriptive analytics not only anticipates what will happen and when it will happen but also why it will happen. Further, it can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk, and illustrate the implication of each decision option.
• For example, prescriptive analytics can benefit healthcare strategic planning by using analytics to leverage operational and usage data combined with data on external factors such as economic data, population demographics, etc.
Different Types of Data Analytics
Cognitive analytics: It is analytics with human-like intelligence. This can include understanding the context and
meaning of a sentence, or recognizing certain objects in an image given large amounts of information. Cognitive analytics
often uses artificial intelligence algorithms and machine learning, allowing a cognitive application to improve over time.
Cognitive analytics reveals certain patterns and connections that simple analytics cannot.
Key Roles Of Successful Analytics Projects
Each of these key roles is crucial in developing a successful analytics project:
• Business User
• Project Sponsor
• Project Manager
• Business Intelligence Analyst
• Database Administrator (DBA)
• Data Engineer
• Data Scientist
Key Roles Of Successful Analytics Projects
Business User :
• The business user is the one who understands the main domain area of the project and also ultimately benefits from the results.
• This user advises and consults the team working on the project about the value of the results obtained and how the outputs will be used in operations.
• A business manager, line manager, or deep subject matter expert in the project domain usually fulfills this role.
Project Sponsor :
• The project sponsor is the one responsible for initiating the project. The project sponsor provides the actual requirements for the project and presents the basic business issue.
• He or she generally provides the funds and measures the degree of value from the final output of the team working on the project.
• This person introduces the prime concern and shapes the desired output.
Key Roles Of Successful Analytics Projects
Project Manager :
• This person ensures that the key milestones and the purpose of the project are met on time and to the expected quality.
Data Engineer :
• The data engineer brings deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data intake into the analytic sandbox.
• The data engineer works jointly with the data scientist to help shape the data in the correct form for analysis.
Key Roles Of Successful Analytics Projects
Data Scientist :
• The data scientist provides subject matter expertise for analytical techniques and data modelling, and applies the correct analytical techniques to the given business issues.
• He or she ensures the overall analytical objectives are met.
• Data scientists outline and apply analytical methods to the data available for the concerned project.
Data Analytics Lifecycle
Data Analytics Lifecycle
Phase 1: Discovery -
• The data science team learns the business domain and researches the issue.
• Create context and gain understanding.
• Learn about the data sources that are needed and accessible to the project.
• The team comes up with an initial hypothesis, which can be later confirmed with evidence.
Phase 6: Operationalize -
• The team distributes the benefits of the project to a wider audience. It sets up a pilot project that will
deploy the work in a controlled manner prior to expanding the project to the entire enterprise of users.
• This technique allows the team to gain insight into the performance and constraints related to the model
within a production setting at a small scale and then make necessary adjustments before full
deployment.
• The team produces the last reports, presentations, and codes.
• Open-source or free tools such as WEKA, SQL, and Octave can be used.