Big Data and Data Science: Case Studies

Priyanka Srivatsa

Abstract- Big data is a collection of large and complex data sets difficult to process using on-hand database management tools or traditional data processing applications. The three V's of Big Data (volume, variety, velocity) constitute a more comprehensive definition, busting the myth that big data is only about data volume. Data Volume is the primary attribute of big data. It can be quantified by counting records, transactions, tables or files. It can also be quantified in terms of time & in terms of terabytes or petabytes. Data Variety, the next significant attribute of big data, is quantified in terms of sources like logs, clickstream or social media. Data Velocity, another important attribute of big data, describes the frequency of data delivery & data generation. Analysis of big data is very complex & time consuming. An important tool that helps understand big data & its analysis is Data Science. Data Science is the study of the generalizable extraction of knowledge from data sets. The study of data science includes studying data processing architectures, data components & processes, data stores & data kind and the challenges of big data.

Keywords: big data, volume, variety, velocity, data science, data processing pipelines, data processing architectures, data components, data processes, data stores, data kind

...error: bugs in the computer system, assumptions of the models & results based on erroneous data.

Data Components provide access to data hosted within the boundaries of the system.

Data Processes are those processes that help in the collection & manipulation of meaningful data.

Data Store is a data repository of a set of integrated objects. These objects are modeled using classes defined in database schemas.

Data Kind refers to the variety of data available for analysis. It includes structured data, unstructured data & semi-structured data. Structured Data exists when the information is clearly broken down into fields that have an explicit meaning & are highly categorical, ordinal or numeric. Unstructured Data exists in the form of natural language text, images, audio & video. It requires pre-processing to identify & extract relevant features. Semi-Structured Data describes structured data that does not conform to the formal structure of data models associated with a relational database or other forms of data tables.
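As a minimal illustration of these three data kinds, the Python sketch below parses a structured CSV record, a semi-structured JSON document, and a fragment of unstructured text. The sample records and field names are hypothetical and are not drawn from the case studies.

    import csv
    import io
    import json
    import re

    # Structured: fields have explicit meaning and fixed positions.
    structured = "1001,2014-07-01,SmartphoneX,249.99"
    row = next(csv.reader(io.StringIO(structured)))
    record = dict(zip(["order_id", "date", "product", "price"], row))

    # Semi-structured: self-describing, but not bound to a fixed relational schema.
    semi_structured = '{"order_id": 1001, "tags": ["mobile", "promo"], "notes": null}'
    doc = json.loads(semi_structured)

    # Unstructured: natural-language text that needs pre-processing to extract features.
    unstructured = "Customer reported that the phone overheats after the latest update."
    tokens = re.findall(r"[a-z]+", unstructured.lower())

    print(record["product"], doc["tags"], tokens[:4])

Structured data can be queried directly, the JSON document must first be navigated by key, and the free text only becomes usable after feature extraction such as the tokenization shown here.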
...easier to navigate the physical world. To achieve this goal, Nokia needed to find a technology solution that would support the collection, storage and analysis of virtually unlimited data types and volumes. Effective collection and use of data has become central to Nokia's ability to understand and improve users' experience with their phones. The company leverages data processing and complex analyses in order to build maps with predictive traffic and layered elevation models, to source information about points of interest around the world & to understand the quality of phones. Cloudera helped Nokia in this endeavor when the company decided to employ APACHE HADOOP to manage & process huge volumes of data.

A. Data Processing Architecture

Data required for analysis is acquired from various resources like phones in use, services, log files, market research, discussion in forums, feedback etc. All this data is sent into a DATA COLLECTOR which collects & stores the various kinds of data required for analysis. After initial data collection, a cleaning process is conducted with sampling & conversion of data.
Then the data is aggregated & sent into a DATA PROCESSOR. This complete process is supervised by a DATA SUPERVISOR that appropriately pre- & post-processes the live data. The aggregated data is sent into a COMPUTE CLOUD component that consists of three parts, namely the Data Broker, the Data Analyzer & the Data Manager.
1) Data Broker collects & repackages information available in the public domain in a format readable & useful to the company.
2) Data Analyzers are tools that specialize in predictive modeling & text mining, thus analyzing the information available.
3) Data Manager is a tool that manages the processing of huge volumes of data by realizing the entities of applications & efficiently creating graphs & information snapshots that deliver the analysis in a presentable format.
There is a DATA REST unit that consists of:
a) QUERY unit used to query the database
b) REPORTER unit that reports the results of the related queries
c) CACHE unit that stores all the temporary information retrieved from the database
d) VISUALIZER unit that helps process the digital data & interpret results
e) AUDIT unit that keeps an account of the amount of data that is being processed & the effective time required to process this data
f) MONITOR unit which monitors the entire functioning of the DATA REST unit
The COMPUTE CLOUD component is supported by the DATA REST unit.
Finally the processed data is fed into the DATA SAAS wherein the various dimensions & prospects of the data are discussed & interpreted.

B. Components

The technology ecosystem consists of:
1) Teradata Enterprise Data Warehouse: It stores & manages data.
2) Oracle & MySQL Data Marts: These are simpler forms of data warehouses.
3) HBase: It is an extensible record store with a basic scalability model of splitting rows & columns into multiple nodes.
4) Scribe: It is used to log data directly into HBase.
5) Sqoop: It is a command-line interface application for transferring data between relational databases and Hadoop (HBase).

C. Process

1) Nokia has over 100 terabytes (TB) of structured data on Teradata and petabytes (PB) of multi-structured data on the Hadoop Distributed File System (HDFS).
2) The centralized Hadoop cluster which lies at the heart of Nokia's infrastructure contains 0.5 PB of data.
3) Nokia's data warehouses and marts continuously stream multi-structured data into a multi-tenant Hadoop environment, allowing the company's 60,000+ employees to access the data.
4) Nokia runs hundreds of thousands of Scribe processes each day to efficiently move data from, for example, servers in Singapore to a Hadoop cluster in the UK data center.
5) The company uses Sqoop to move data from HDFS to Oracle and/or Teradata.
6) Nokia serves data out of Hadoop through HBase.

D. Data Stores

1) Teradata Enterprise Data Warehouse: This data warehouse uses a "shared nothing" architecture, which means that each server node has its own memory and processing power. Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them.
2) Oracle & MySQL Data Marts: These are focused on a single subject (or functional area), such as Sales, Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse or external data.
3) HBase: HBase is an Apache project written in Java. It is patterned directly after BigTable:
• HBase uses the Hadoop Distributed File System; it puts updates into memory and periodically writes them out to files on disk (a simplified sketch of this write path is given below).
• The updates go to the end of a data file, to avoid seeks. The files are periodically compacted. Updates also go to the end of a write-ahead log, to enable recovery if a server crashes.
• Row operations are atomic, with row-level locking and transactions. There is optional support for transactions with wider scope. These use optimistic concurrency control, aborting the process if there is a conflict with other updates.
• Partitioning and distribution are transparent; there is no client-side hashing or fixed key space. There is multiple-master support, to avoid a single point of failure. MapReduce support allows operations to be distributed efficiently.
• HBase's B-trees allow fast range queries and sorting.
• There is a Java API, a Thrift API and a REST API; JDBC/ODBC support has recently been added.

E. Data Kind

...effectively as data marts concentrate on concrete, single subjects within one functional area.
d) Query Processing, Data Modelling & Analysis is a phase where general statistical patterns are drawn out of the hidden patterns in the data. HBase effectively derives these statistical patterns.
e) Interpretation is a phase wherein all the assumptions made need to be examined & the possible errors have to be removed. The Data SaaS available in the Hadoop framework is utilized to interpret the processed data results efficiently.
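The HBase write path summarised in the bullets above (updates buffered in memory, appended to a write-ahead log, and periodically flushed to files that are later compacted) can be illustrated with a short, self-contained Python sketch. This is a conceptual simplification, not HBase's actual implementation; the class, method and file names are hypothetical.

    import json
    import os

    class TinyRecordStore:
        """Toy log-structured store: write-ahead log + in-memory buffer + periodic flush."""

        def __init__(self, data_dir, flush_threshold=3):
            self.data_dir = data_dir
            self.flush_threshold = flush_threshold
            self.memstore = {}  # in-memory buffer of recent updates
            os.makedirs(data_dir, exist_ok=True)
            self.wal = open(os.path.join(data_dir, "wal.log"), "a")

        def put(self, row_key, column, value):
            # Append to the write-ahead log first, so the update survives a crash.
            self.wal.write(json.dumps([row_key, column, value]) + "\n")
            self.wal.flush()
            # Apply the update in memory.
            self.memstore.setdefault(row_key, {})[column] = value
            # Periodically write the buffered updates out to a file on disk.
            if len(self.memstore) >= self.flush_threshold:
                self.flush()

        def flush(self):
            file_name = "flush_%d.json" % len(os.listdir(self.data_dir))
            with open(os.path.join(self.data_dir, file_name), "w") as out:
                json.dump(self.memstore, out)
            self.memstore = {}  # the in-memory buffer is emptied after each flush

    if __name__ == "__main__":
        store = TinyRecordStore("tiny_store")
        for i in range(4):
            store.put("row%d" % i, "cf:qualifier", "value%d" % i)
        store.wal.close()

Because every update is only appended, either to the log or to a flushed file, the store avoids random seeks on writes, which is the design point the bullets above describe.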
...from frequent patterns & correlation analysis usually overpower individual fluctuations & often disclose more reliable hidden patterns & knowledge. Interconnected Big Data forms large heterogeneous information networks, with which information redundancy can be explored to compensate for missing data, to crosscheck conflicting cases, to validate trustworthy relationships, to disclose inherent clusters and to uncover hidden relationships and models. Mining requires integrated, cleaned, trustworthy and efficiently accessible data, declarative query and mining interfaces, scalable mining algorithms and big-data computing environments. Data mining itself is being used to help improve the quality and trustworthiness of the data, understand its semantics and provide intelligent querying functions.

e) Interpretation
Ultimately the results of analysis need to be interpreted by a decision maker. The process basically involves examining all the assumptions made & retracing the analysis. The errors have to be debugged & the assumptions at various levels need to be critically examined. Supplementary information that explains the derivation of each result & the inputs involved in the process needs to be mentioned & explained wherever necessary.

B. Components

Oracle deployed its MDM suite, which consists of the following components:
a) Oracle Metadata Manager: It acquires & records the continuous inflow of data into the database.
b) Data Relationship Manager: It consolidates, rationalizes, governs & shares the master reference data.
c) Data Warehouse Manager: It divides the acquired data into specific functional areas & stores them in data marts.
d) BI Publisher: It queries, monitors & reports on the master data.
e) Data Steward Component: It facilitates the UI component & also helps to set up the workbench.

1) Profile the master data. Understand all possible sources and the current state of data quality in each source.
2) Consolidate the master data into a central repository and link it to all participating applications.
3) Govern the master data. Clean it up, de-duplicate it, and enrich it with information from 3rd-party systems. Manage it according to business rules.
4) Share it. Synchronize the central master data with enterprise business processes and the connected applications. Ensure that data stays in sync across the IT landscape.
5) Leverage the fact that a single version of the truth exists for all master data objects by supporting business intelligence systems and reporting.

D. Data Stores

1) Siebel: It is used exclusively to store CRM data.
2) EBS: It provides persistent block-level storage volumes for use with Amazon EC2.
3) SAP: It serves as a storage location for consolidated & cleansed transaction data on an individual level.
4) JDE: It is used to provide periodic updates of the operational data changes required.
5) PSFT: It is used as a data store to manage entire business process relationships.

E. Data Kind

1) Transaction data are business transactions that are captured during business operations and processes, such as purchase records, inquiries, and payments.
2) Metadata, defined as "data about the data", is the description of the data.
3) Master data refers to the enterprise-level data entities that are of strategic value to an organization. They are typically non-volatile and non-transactional in nature.
4) Reference data are internally managed or externally sourced facts that support an organization's ability to effectively process transactions, manage master data, and provide decision support capabilities. Geo data and market data are among the most commonly used reference data.
5) Unstructured data make up over 70% of an organization's data and information assets. They include documents, digital images, geo-spatial data, and multi-media files.
6) Analytical data are derivations of the business operation and transaction data used to satisfy reporting and analytical needs. They reside in data warehouses, data marts, and other decision support applications.
7) Big data refer to large datasets that are challenging to store, search, share, visualize, and analyze. The growth of such data is mainly a result of the increasing channels of data in today's world. Examples include, but are not limited to, user-generated content through social media, web and software logs, cameras, information-sensing mobile devices, aerial sensory technologies, genomics and medical records.

1) SUPPLY CHAIN MANAGEMENT was crucial to this retail supplier, as he was facing losses because of unstructured data management.
2) The time he took to bring his products to market was very long, which led to other companies marketing similar products first.
3) His revenue decreased as his sales fell significantly due to his inability to manage the data.
4) SHIPPING & INVOICING ERRORS were huge, which led to economic & customer losses.
5) Distribution slowed down owing to inadequate management of required data.
6) Errors in acquiring orders resulted in dissatisfaction of the customers.

G. Comparative Study of the Use Case & the BIG DATA PIPELINE
a) Data Acquisition & Recording is a phase where data is acquired from the various sources. The Oracle Metadata Manager tool acquires the required data.
b) Information Extraction & Cleaning is a phase where the data is extracted according to the requirement & made analysis-ready. The Data Relationship Manager extracts the data as per the requirement & makes it analysis-ready.
c) Data Aggregation, Integration & Representation is a phase where relevant data for analysis is grouped considering the heterogeneity of the data acquired. The Data Warehouse Manager separates the relevant data effectively, as data marts concentrate on concrete, single subjects within one functional area.
d) Query Processing, Data Modelling & Analysis is a phase where general statistical patterns are drawn out of the hidden patterns in the data. BI Publisher effectively derives these statistical patterns.
e) Interpretation is a phase wherein all the assumptions made need to be examined & the possible errors have to be removed. The Data Steward Component is utilized to interpret the processed data results efficiently and present them on the workbench.
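The five phases compared above form a linear pipeline. The following Python sketch strings hypothetical phase functions together in that order; the function names and the sample records are illustrative placeholders, not the Oracle MDM or Cloudera components discussed in the case studies.

    import json

    def acquire():
        # a) Data Acquisition & Recording: pull raw records from the sources.
        return ['{"id": 1, "amount": "100"}', '{"id": 2, "amount": "n/a"}']

    def extract_and_clean(raw_records):
        # b) Information Extraction & Cleaning: parse the records and drop malformed values.
        parsed = [json.loads(r) for r in raw_records]
        return [r for r in parsed if r["amount"].isdigit()]

    def aggregate(records):
        # c) Data Aggregation, Integration & Representation: combine into a single view.
        return {"total": sum(int(r["amount"]) for r in records), "count": len(records)}

    def model(summary):
        # d) Query Processing, Data Modelling & Analysis: derive a simple statistic.
        return summary["total"] / summary["count"] if summary["count"] else 0.0

    def interpret(average):
        # e) Interpretation: state the result together with the assumptions made.
        return "average amount = %.2f (records with malformed amounts were discarded)" % average

    print(interpret(model(aggregate(extract_and_clean(acquire())))))

Each phase consumes the previous phase's output, which is why an error or an unexamined assumption introduced early in the pipeline propagates all the way to the interpreted result.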