
Analyzing Limitations and Solutions of Existing Data Analytics
Dr. Vandana Bhatia
Objectives:
 Understanding Big Data analytics
 Difference between data analytics and Big Data analytics
 Limitations
 Solutions
Big Data Challenges
Why put Big Data and analytics together?

➢ Big data provides gigantic statistical samples, which enhance the results of analytic tools.
➢ Analytic tools and databases can now handle big data.
➢ The economics of analytics is now more embraceable than ever.
➢ There's a lot to learn from messy data, as long as it's big.
➢ Big data is a special asset that merits leverage.
➢ Analytics based on large data samples reveals and leverages business change.
Drivers and Enablers
[Diagram: Big Data is driven and enabled by Business Need, Technology Advances, and Analytical Platforms.]
Technologies for Big Data (and Analytics)

 Data warehouses
 Appliances
 Analytical sandboxes
 In-memory analytics
 In-database analytics
 Columnar databases
Technologies for Big Data (and Analytics)

 Streaming and Complex Event Processing (CEP) engines
 Cloud-based services
 Non-relational databases
 Hadoop/MapReduce
Hadoop/MapReduce

• Grew out of the efforts of Google, Yahoo, and others to handle massive volumes of data
• Handles multi-structured data
• Processes data across parallel commodity servers
• Open-source software from the Apache Software Foundation
1. Hadoop
• Apache Hadoop is the most prominent and widely used tool in the big data industry, with enormous capability for large-scale data processing.
• It is a 100% open-source framework that runs on commodity hardware in an existing data center. Furthermore, it can run on a cloud infrastructure.
• Hadoop consists of four parts:
• Hadoop Distributed File System: Commonly known as HDFS, it is a distributed file system designed to scale to very high aggregate bandwidth.
• MapReduce: A programming model for processing big data (see the sketch after this list).
• YARN: A platform for managing resources and scheduling jobs in the Hadoop infrastructure.
• Libraries: Common utilities that help the other modules work with Hadoop.
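The slides describe MapReduce only at a high level, so here is a minimal word-count sketch (an illustration, not from the slides) using Hadoop Streaming, which lets any program that reads stdin and writes stdout act as the mapper or reducer. The file names and the word-count task are assumptions for the example.

# mapper.py -- minimal Hadoop Streaming mapper for word count
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit tab-separated (word, 1) pairs; Hadoop sorts them by key
        # before they reach the reducer.
        print(f"{word}\t1")

# reducer.py -- sums the counts for each word (input arrives sorted by key)
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

The job would be submitted through the Hadoop Streaming jar, passing these scripts as the -mapper and -reducer; the exact jar path varies by distribution.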
5. RapidMiner
• RapidMiner is a software platform for data science activities that provides an integrated environment for:
• Preparing data
• Machine learning
• Text mining
• Predictive analytics
• Deep learning
• Application development
• Prototyping
• RapidMiner follows a client/server model, where the server can be located on premises or in a cloud infrastructure.
6. MongoDB
• MongoDB is an open-source, cross-platform NoSQL database with many built-in features.
• It works with the MEAN software stack, .NET applications, and the Java platform.
• It can store data of many types, such as integers, strings, arrays, objects, booleans, and dates.
• It provides flexibility in cloud-based infrastructure.
• It is flexible and easily partitions data across the servers in a cloud structure.
• MongoDB uses dynamic schemas, so you can prepare data on the fly and quickly; this is another way of saving cost (see the sketch below).
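A minimal sketch of what dynamic schemas look like through the official pymongo driver. It assumes a local mongod instance on the default port; the database and collection names are made up for illustration.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
events = client["demo"]["events"]  # database and collection are created lazily

# No schema is declared up front: documents in the same collection may
# have different fields and value types (string, int, array, boolean, ...).
events.insert_one({"user": "alice", "clicks": 3, "tags": ["a", "b"]})
events.insert_one({"user": "bob", "active": True})

for doc in events.find({"user": "alice"}):
    print(doc)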
7. R Programming Tool
• Although R is used for statistical analysis, you do not have to be a statistics expert to use it.
• R has its own public library, CRAN (Comprehensive R Archive Network), which consists of more than 9,000 modules and algorithms for statistical analysis of data.
• R can run on Windows and Linux servers, as well as inside SQL Server. It also supports Hadoop and Spark.
• Using R, one can work on discrete data and try out new analytical algorithms for analysis.
• An R model built and tested on a local data source can easily be implemented on other servers or even against a Hadoop data lake.
8. Neo4j
• Neo4j is a widely used graph database in the big data industry. It follows the fundamental structure of a graph database: data stored as interconnected nodes and relationships.
• It supports ACID (Atomicity, Consistency, Isolation, Durability) transactions
• High availability
• Scalable and reliable
• Flexible, as it does not need a schema or data type to store data
• It can integrate with other databases
• Supports a query language for graphs commonly known as Cypher (see the sketch below)
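A minimal Cypher sketch through the official neo4j Python driver. The connection URI, credentials, and the example nodes are illustrative assumptions, not part of the slides.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # MERGE creates each node and the relationship only if it does not
    # already exist, expressing the node-relationship structure directly.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)",
        a="Alice", b="Bob",
    )
    for record in session.run(
        "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b"
    ):
        print(record["a"], "knows", record["b"])

driver.close()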
Data Scientists
[Diagram: the skills of a data scientist, who applies the 4 A's]
• Design and implementation skills in advanced programming and development
• Applying advanced techniques in mathematics and statistics to model data for deep analysis
• Analytical and algorithm development skills
• Communication skills, ethical reasoning, and business skills
4 A's of Data Science
• Data Architecture
• Data Acquisition
• Data Analysis
• Data Archival
For further reading: https://www.educba.com/data-scientist-vs-big-data/
