1 U Data-Analytics-Unit-I-1
(BE-2015 PATTERN)
UNIT-I: INTRODUCTION AND LIFE CYCLE
Big Data Overview
• Industries that gather and exploit data
• Credit card companies monitor purchases
• Good at identifying fraudulent purchases
• Mobile phone companies analyze calling
patterns – e.g., even on rival networks
• Look for customers who might switch providers
• For social networks, data is the primary product
• Intrinsic value increases as data grows
Attributes Defining Big Data Characteristics
• Huge volume of data
• Not just thousands/millions, but billions of
items
• Complexity of data types and structures
• Variety of sources, formats, structures
• Speed of new data creation and growth
• High velocity, rapid ingestion, fast analysis
Attributes Defining Big Data Characteristics
• Volume
• Big Data records what happens across many sources, including
business transactions, social media, and machine-to-machine or
sensor data. Together these sources create large volumes of
data.
• Variety
• Data comes in all formats: structured, numeric data in
traditional databases, and unstructured text documents,
video, audio, email, and stock ticker data.
• Velocity
• Data streams in at high speed and must be dealt with in a
timely manner. Processing, that is, analysis of the streamed
data, must also be fast enough to produce real-time or
near-real-time results.
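One simple way to picture analysis of streamed data is a running statistic that is updated as each reading arrives, rather than recomputed over the full history. The sketch below is only an illustration under stated assumptions: the class name `RollingMean`, the window size, and the sample readings are all hypothetical and not taken from the slides.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `window` readings of a stream (illustrative)."""

    def __init__(self, window):
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, x):
        # Add the new reading, then evict the oldest one if the window is over capacity.
        self.values.append(x)
        self.total += x
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


# Hypothetical sensor readings arriving one at a time.
readings = [10, 20, 30, 40, 50]
rm = RollingMean(window=3)
means = [rm.update(x) for x in readings]
print(means)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Each update does a constant amount of work regardless of how much data has already arrived, which is the property that matters when velocity is high.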
Big Data Analytics Importance
• Cost Savings: helps identify more efficient
ways of doing business.
• Time Reductions: helps businesses analyze data
immediately and make quick decisions based on the
learnings.
• New Product Development: by knowing the trends
in customer needs and satisfaction through analytics,
you can create products that match what customers
want.
• Understand market conditions: by analyzing
big data you can get a better understanding of
current market conditions.
• Control online reputation: Big data tools can do
Sources of Big Data Deluge
Business Intelligence (BI) vs Data Science
Current Analytical Architecture
Typical Analytic Architecture
Data devices
• Gather data from multiple locations and continuously generate new data about
this data. For each gigabyte of new data created, an additional petabyte of data
is created about that data.
• Examples: playing an online video game, smartphone data, retail shopping
loyalty card data
Data collectors
• Include sample entities that collect data from the device and users.
• For example, Retail stores tracking the path a customer
Phase 1: Discovery
Phase 6: Operationalize
• The Data Analytics Lifecycle is designed for Big Data problems and
data science projects
• It has six phases, and project work can occur in several phases
simultaneously
• Questions to consider
• Does the model appear valid and accurate on the test data?
• Does the model output/behavior make sense to the domain experts?
• Do the parameter values make sense in the context of the domain?
• Is the model sufficiently accurate to meet the goal?
• Does the model avoid intolerable mistakes? (see Chapters 3 and 7)
• Are more data or inputs needed?
• Will the kind of model chosen support the runtime environment?
• Is a different form of the model required to address the business problem?
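Several of the questions above can be checked numerically on held-out test data. The sketch below is a minimal illustration for a binary classifier: the labels, predictions, and the thresholds `MIN_ACCURACY` and `MAX_FALSE_NEGATIVES` are hypothetical values chosen for the example, and false negatives stand in for the "intolerable mistakes" only as an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Hypothetical held-out test labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical project goals (assumed for illustration).
MIN_ACCURACY = 0.70
MAX_FALSE_NEGATIVES = 1

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
meets_goal = accuracy >= MIN_ACCURACY and fn <= MAX_FALSE_NEGATIVES
print(f"accuracy={accuracy:.2f}, false negatives={fn}, meets goal={meets_goal}")
```

Numbers like these answer only the accuracy questions; whether the output makes sense to domain experts still needs a review with them.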
Common Tools for the Model Building Phase
• Commercial Tools
• SAS Enterprise Miner – built for enterprise-level computing and analytics
• SPSS Modeler (IBM) – provides enterprise-level computing and analytics
• MATLAB – high-level language for data analytics, algorithms, data exploration
• Alpine Miner – provides GUI frontend for backend analytics tools
• STATISTICA and MATHEMATICA – popular data mining and analytics
tools
• Free or Open Source Tools
• R and PL/R – PL/R is a procedural language for PostgreSQL with R
• Octave – language for computational modeling
• WEKA – data mining software package with analytic workbench
• Python – language providing toolkits for machine learning and analysis
• SQL – in-database implementations provide an alternative tool
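To illustrate the idea of in-database analysis from the list above, the sketch below computes per-customer summaries inside the database with SQL and returns only the aggregated rows to Python. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `purchases` and its contents are hypothetical.

```python
import sqlite3

# In-memory database with a small, hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 5.0)],
)

# The aggregation runs inside the database engine; only summary rows come back.
rows = conn.execute(
    "SELECT customer, COUNT(*), AVG(amount) "
    "FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('a', 2, 20.0), ('b', 1, 5.0)]
```

With large tables, pushing the aggregation into the database avoids moving raw rows to the analysis tool.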
Phase 5: Communicate Results
• Determine if the team succeeded or failed in its objectives
• Communicate and document the key findings and major insights derived
from the analysis