0% found this document useful (0 votes)

67 views14 pages

Big Data Module 1

Big data presents several challenges for conventional data systems. These include: 1) The uncertainty around new data management tools and frameworks for big data. 2) A lack of skills and talent for working with big data technologies. 3) The complexity of transmitting, accessing, and loading large amounts of data from various sources into big data platforms. 4) Keeping data synchronized across different sources that update at different rates and schedules.

Uploaded by

OK BYE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

67 views14 pages

Big Data Module 1

Uploaded by

OK BYE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

CS6CRT19 Big Data Analytics Module 1

Big Data Definitions

Big Data is high-volume, high-velocity and/or high-variety information
asset that requires new forms of processing for enhanced decision making, insight
discovery and process optimization.
Other definitions found in the existing literature includes the following:
A collection of data sets so large or complex that traditional data processing
applications are inadequate. - Wikipedia
Data of a very large size, typically to the extent that its manipulation and
management present significant logistical challenges. - Oxford English Dictionary

The Characteristics (5Vs) of Big Data

For a dataset to be considered Big Data, it must possess one or more of the
following characteristics that require accommodation in the solution design and
architecture of the analytic environment:
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
Volume
The anticipated volume of data that is processed by Big Data solutions is
substantial and ever-growing. High data volumes impose distinct data storage and
processing demands, as well as additional data preparation, curation, and
management processes.
Velocity
In Big Data environments, data can arrive at fast speeds, and enormous
datasets can accumulate within very short periods of time. From an enterprise;s point

Swamy Saswathikananda College, Poothotta 1

CS6CRT19 Big Data Analytics Module 1

of view, the velocity of data translates into the amount of time it takes for the data
to be processed once it enters the enterprise’s perimeter. Coping with the fast in flow
of data requires the enterprise to design highly elastic and available data processing
solutions and corresponding data storage capabilities.
Variety
Data variety refers to the multiple formats and types of data that need to be
supported by Big Data solutions. Data variety brings challenges for enterprises in
terms of data integration, transformation, processing, and storage.
Veracity
Veracity refers to the quality or fidelity of data. Data that enters Big Data
environments needs to be assessed for quality, which can lead to data processing
activities to resolve invalid data and remove noise. In relation to veracity, data can
be part of the signal or noise of a dataset.
Noise is data that cannot be converted into information and thus has no value,
whereas signals have value and lead to meaningful information. Data with a high
signal-to-noise ratio has more veracity than data with a lower ratio. Data that is
acquired in a controlled manner (for example, via online customer registration)
usually contains less noise. Data acquired via uncontrolled sources (such as blog
postings) contains more noise.
Value
Value is defined as the usefulness of data for an enterprise. The value
characteristic is intuitively related to the veracity characteristic in that the higher the
data fidelity, the more value it holds for the business. Value is also dependent on
how long it takes to process the data because analytics results have a shelf-life. For
instance, a 20 minute delayed stock quote has no value for making a stock trade.

Swamy Saswathikananda College, Poothotta 2

CS6CRT19 Big Data Analytics Module 1

Types/Sources of Big Data

The following are the types (sources) of big data, as suggested by IBM and
the Big Data task team:
● Social networks and web data, such as Facebook, Twitter, e-mails, blogs,
and YouTube.
● Transactions data and Business Processes data, such as credit card
transactions, flight bookings, etc. and public agencies data such as medical
records, insurance business data, etc.
● Customer master data, such as data for facial recognition and for the name,
date of birth, marriage anniversary, gender, location and income category.
● Machine-generated data, such as machine-to-machine or Internet of Things
(IOT) data, and the data from sensors, trackers, web logs and computer
systems log. Computer generated data is also considered as machine generated
data from data stores. Usage of programs for processing of data using data
repositories, such as database or file, generates data and also machine
generated data.
● Human-generated data such as biometrics data, human-machine interaction
data, e-mail records with a mail server and MySql database of student grades.

Classification/Nature of Data
Data can be classified based on its nature, as structured, semi-structured, and
unstructured data.
Structured Data
Structured data conform and associate with data schemas and data models.
Structured data are found in tables.Structured data enables the following:
● Data insert, delete, update, and append
● Indexing to enable faster data retrieval

Swamy Saswathikananda College, Poothotta 3

CS6CRT19 Big Data Analytics Module 1

● Scalability which enables increasing or decreasing capacities and data

processing operations such as storing, processing, and analytics.
● Transactions processing which follows ACID (Atomicity, Consistency,
Isolation, and Durability) rules.
● Encryption and decryption for data security
Semi-structured Data
Examples of semi-structured data are XML(Extended Markup Language) and
JSON (JavaScript Object Notation) documents. Semi-structured data contains tags
or other markers, which separate semantic elements and enforce hierarchies of
records and fields within the data. Semi-structured data does not conform and
associate with formal data model structures. Semi-structured data do not associate
data models, such as the relational database and table models.
Unstructured Data
Unstructured data does not possess data features such as tables or a database.
Unstructured data are found in file types such as .TXT, .CSV. Data may be as key-
value pairs. Data may have internal structures, such as in emails. The data do not
reveal relationships, hierarchy relationships, or object oriented features, such as
extensibility. The relationships, schema, and features need to be separately
established. Growth in data today is mostly in the form of unstructured data.

Swamy Saswathikananda College, Poothotta 4

CS6CRT19 Big Data Analytics Module 1

Challenges of Conventional Systems

1. The Uncertainty of Data Management:

One disruptive facet of big data management is the use of a wide range of
innovative data management tools and frameworks whose designs are dedicated to
supporting operational and analytical processing. The NoSQL (not only SQL)
frameworks are used that differentiate it from traditional relational database
management systems and are also largely designed to fulfill performance demands
of big data applications such as managing a large amount of data and quick response
times. There are a variety of NoSQL approaches such as hierarchical object
representation (such as JSON, XML and BSON) and the concept of a key-value
storage. The wide range of NoSQL tools, developers and the status of the market are
creating uncertainty with the data management.

2. Talent Gap in Big Data:

It is difficult to win the respect of media and analysts in tech without being
bombarded with content touting the value of the analysis of big data and
corresponding reliance on a wide range of disruptive technologies. The new tools
evolved in this sector can range from traditional relational database tools with some
alternative data layouts designed to maximize access speed while reducing the
storage footprints, NoSQL data management frameworks, in-memory analytics, and
as well as the broad Hadoop ecosystem. The reality is that there is a lack of skills
available in the market for big data technologies. The typical expert has also gained
experience through tool implementation and its use as a programming model, apart
from the big data management aspects.

Swamy Saswathikananda College, Poothotta 1

CS6CRT19 Big Data Analytics Module 1

3. Getting Data into Big Data Structure:

It might be obvious that the intent of big data management involves analyzing
and processing a large amount of data. There are many people who have raised
expectations considering analyzing huge data sets for a big data platform. They also
may not be aware of the complexity behind the transmission, access, and delivery of
data and information from a wide range of resources and then loading these data into
a big data platform. The intricate aspects of data transmission, access and loading
are only part of the challenge. The requirement to navigate transformation and
extraction is not limited to conventional relational data sets.

4. Syncing Across Data Sources:

Once you import data into big data platforms you may also realize that data
copies migrated from a wide range of sources on different rates and schedules can
rapidly get out of the synchronization with the originating system. This implies that
the data coming from one source is not out of date as compared to the data coming
from another source. It also means the commonality of data definitions, concepts,
metadata and the like. The traditional data management and data warehouses, the
sequence of data transformation, extraction and migrations all arise the situation in
which there are risks for data to become unsynchronized.

5. Extracting Information from the Data in Big Data Integration:

The most practical use cases for big data involve the availability of data,
augmenting existing storage of data as well as allowing access to end-user
employing business intelligence tools for the purpose of the discovery of data. This
business intelligence must be able to connect different big data platforms and also
provide transparency of the data consumers to eliminate the requirement of custom
coding. At the same time, if the number of data consumers grows, then one can

Swamy Saswathikananda College, Poothotta 2

CS6CRT19 Big Data Analytics Module 1

provide a need to support an increasing collection of many simultaneous user

accesses. This increment of demand may also spike at any time in reaction to
different aspects of business process cycles. It also becomes a challenge in big data
integration to ensure the right-time data availability to the data consumers.

6. Miscellaneous Challenges:

Other challenges may occur while integrating big data. Some of the challenges
include integration of data, skill availability, solution cost, the volume of data, the
rate of transformation of data, veracity and validity of data. The ability to merge data
that is not similar in source or structure and to do so at a reasonable cost and in time.
It is also a challenge to process a large amount of data at a reasonable speed so that
information is available for data consumers when they need it. The validation of data
sets is also fulfilled while transferring data from one source to another or to
consumers as well.

Intelligent Data Analysis (IDA)

Intelligent Data Analysis (IDA) discloses hidden facts that are not known previously
and provides potentially important information or facts from large quantities of data. It also
helps in making a decision. IDA helps to obtain useful information, necessary data and
interesting models from a lot of data available online in order to make the right choices.

Steps Involved in IDA

IDA, in general, includes three stages: (1) Preparation of data; (2) data mining;
(3) data validation and explanation. The preparation of data involves opting for the
required data from the related data source and incorporating it into a data set that can be
used for data mining.The main goal of intelligent data analysis is to obtain knowledge.

Swamy Saswathikananda College, Poothotta 3

CS6CRT19 Big Data Analytics Module 1

Processes in Big Data Analytics

Big data analytics refers to collecting, processing, cleaning, and analyzing
large datasets to help organizations operationalize their big data.

1. Collect Data

Data collection looks different for every organization. With today’s

technology, organizations can gather both structured and unstructured data from a
variety of sources — from cloud storage to mobile applications to in-store IoT
sensors and beyond. Some data will be stored in data warehouses where business
intelligence tools and solutions can access it easily.

Raw or unstructured data that is too diverse or complex for a warehouse may
be assigned metadata and stored in a data lake.

2. Process Data

Once data is collected and stored, it must be organized properly to get accurate
results on analytical queries, especially when it’s large and unstructured. Available
data is growing exponentially, making data processing a challenge for organizations.

One processing option is batch processing, which looks at large data blocks
over time. Batch processing is useful when there is a longer turnaround time between
collecting and analyzing data.

Stream processing looks at small batches of data at once, shortening the

delay time between collection and analysis for quicker decision-making. Stream
processing is more complex and often more expensive.

3. Clean Data

Swamy Saswathikananda College, Poothotta 1

CS6CRT19 Big Data Analytics Module 1

Data requires scrubbing to improve data quality and get stronger results; all
data must be formatted correctly, and any duplicative or irrelevant data must be
eliminated or accounted for. Dirty data can obscure and mislead, creating flawed
insights.

4. Analyze Data

Getting big data into a usable state takes time. Once it’s ready, advanced
analytics processes can turn big data into big insights. Some of these big data
analysis methods include:

● Data mining sorts through large datasets to identify patterns and relationships

by identifying anomalies and creating data clusters.

● Predictive analytics uses an organization’s historical data to make

predictions about the future, identifying upcoming risks and opportunities.

● Deep learning imitates human learning patterns by using artificial

intelligence and machine learning to layer algorithms and find patterns in the
most complex and abstract data.

Analysis Vs Reporting

Following are the five major differences between Analysis and Reporting:

1. Purpose

Reporting helps companies monitor their data even before digital technology
booms. Various organizations have been dependent on the information it brings to
their business, as reporting extracts that and makes it easier to understand.

Swamy Saswathikananda College, Poothotta 2

CS6CRT19 Big Data Analytics Module 1

Analysis interprets data at a deeper level. While reporting can link between
cross-channels of data, provide comparison, and make understand information easier
(think of a dashboard, charts, and graphs, which are reporting tools and not analysis
reports), analysis interprets this information and provides recommendations on
actions.

2. Tasks

Reporting includes building, configuring, consolidating, organizing,

formatting, and summarizing. It’s very similar to the above mentioned like turning
data into charts, graphs, and linking data across multiple channels.

Analysis consists of questioning, examining, interpreting, comparing, and

confirming. With big data, predicting is possible as well.

3. Outputs

Reporting has a push approach, as it pushes information to users and outputs

come in the forms of canned reports, dashboards, and alerts.

Analysis has a pull approach, where a data analyst draws information to

further probe and to answer business questions. Analysis presentations are
comprised of insights, recommended actions, and a forecast of its impact on the
company—all in a language that’s easy to understand at the level of the user who’ll
be reading and deciding on it.

4. Delivery

Analysis requires a more custom approach, with human minds doing superior
reasoning and analytical thinking to extract insights, and technical skills to provide

Swamy Saswathikananda College, Poothotta 3

CS6CRT19 Big Data Analytics Module 1

efficient steps towards accomplishing a specific goal. This is why data analysts and
scientists are demanded these days, as organizations depend on them to come up
with recommendations for leaders or business executives to make decisions about
their businesses.

5. Value

Reporting itself is just numbers. Without drawing insights and getting reports
aligned with your organization’s big picture, you can’t make decisions based on
reports alone.

Data analysis is the most powerful tool to bring into your business. Employing
the powers of analysis can be comparable to finding gold in your reports, which
allows your business to increase profits and further develop.

Modern Big Data Analytics Tools and Technology

Big data analytics cannot be narrowed down to a single tool or technology.

Instead, several types of tools work together to help you collect, process, cleanse,
and analyze big data. Some of the major players in big data ecosystems are listed
below.

● Hadoop is an open-source framework that efficiently stores and processes big

datasets on clusters of commodity hardware. This framework is free and can

handle large amounts of structured and unstructured data, making it a valuable
mainstay for any big data operation.
● NoSQL databases are non-relational data management systems that do not

require a fixed scheme, making them a great option for big, raw, unstructured

Swamy Saswathikananda College, Poothotta 4

CS6CRT19 Big Data Analytics Module 1

data. NoSQL stands for “not only SQL,” and these databases can handle a
variety of data models.
● Spark is an open source cluster computing framework that uses implicit data

parallelism and fault tolerance to provide an interface for programming entire

clusters. Spark can handle both batch and stream processing for fast
computation.
● R-Programming R is a free open source software programming language
and a software environment for statistical computing and graphics. It is
used by data miners for developing statistical software and data analysis. It
has become a highly popular tool for big data in recent years.

Swamy Saswathikananda College, Poothotta 5

CS6CRT19 Big Data Analytics Module 1

Statistical Concepts: Sampling Distributions

In statistics, a population is the entire pool from which a statistical sample is

drawn. A population may refer to an entire group of people, objects, events, hospital
visits, or measurements. A population can thus be said to be an aggregate
observation of subjects grouped together by a common feature.

A lot of data drawn and used by academicians, statisticians, researchers,

marketers, analysts, etc. are actually samples, not populations. A sample is a subset
of a population.

A sampling distribution is a probability distribution of a statistic obtained

from a larger number of samples drawn from a specific population. The sampling
distribution of a given population is the distribution of frequencies of a range of
different outcomes that could possibly occur for a statistic of a population.

Re-Sampling

The problem with the sampling process is that we only have a single estimate
of the population parameter, with little idea of the variability or uncertainty in the
estimate. One way to address this is by estimating the population parameter multiple
times from our data sample. This is called resampling.

Re-sampling is the method that consists of creating or drawing repeated

samples from the original samples. Resampling involves the selection of randomized
cases with replacement from the original data sample in such a manner that each
number of a sample drawn has a number of cases that are similar to the original data
sample.

Swamy Saswathikananda College, Poothotta 1

CS6CRT19 Big Data Analytics Module 1

Statistical Inference

The primary objective of a sample study is to create inferences (conclusions)

about the population by examining only a part of the population. Inferences created
in such a way are called statistical inferences. Statistical inference is a process by
which we create conclusions about the population based on samples drawn from the
population.

Prediction Error

Predictive analytical processes use new and historical data to forecast activity,
behaviour, and trends. A prediction error is the failure of some expected event to
occur. When prediction fails, humans can use different methods, examining
predictions and failures and deciding some methods to overcome such errors in the
future. Applying that type of knowledge can inform decisions and improve the
quality of future prediction.

Swamy Saswathikananda College, Poothotta 2

Autocratic-Democratic Leadership Style Questionnaire PDF
100% (5)
Autocratic-Democratic Leadership Style Questionnaire PDF
6 pages
BSC (Hons) Business Management Bmp4005 Information Systems and Big Data Analysis Assessment Number 2 Written Report and Poster Accompanying Paper
No ratings yet
BSC (Hons) Business Management Bmp4005 Information Systems and Big Data Analysis Assessment Number 2 Written Report and Poster Accompanying Paper
8 pages
Understanding Bird Behaviour
100% (5)
Understanding Bird Behaviour
226 pages
Bioactivities of Glycoalkaloids and Their Aglycones From Solanum Species
No ratings yet
Bioactivities of Glycoalkaloids and Their Aglycones From Solanum Species
31 pages
Chapter 9 - Pas 1 Statement of Comprehensive Income
No ratings yet
Chapter 9 - Pas 1 Statement of Comprehensive Income
26 pages
OM - Hypermotard 796 - en - MY11
No ratings yet
OM - Hypermotard 796 - en - MY11
123 pages
Module_1_Session_1 Introduction to Big Data Platform_Characteristics_Sources_Nature
No ratings yet
Module_1_Session_1 Introduction to Big Data Platform_Characteristics_Sources_Nature
4 pages
Unit 5
No ratings yet
Unit 5
63 pages
Course Material
100% (1)
Course Material
57 pages
Unit I - BDA
No ratings yet
Unit I - BDA
12 pages
Unit 4
No ratings yet
Unit 4
29 pages
Presentation 1
No ratings yet
Presentation 1
27 pages
Seminar Report BIG DATA
No ratings yet
Seminar Report BIG DATA
28 pages
R19 BDA UNIT-1
No ratings yet
R19 BDA UNIT-1
22 pages
Big Data Lecture # 1
No ratings yet
Big Data Lecture # 1
15 pages
Data and Information Management
No ratings yet
Data and Information Management
18 pages
Bigdatanalyticsintro
No ratings yet
Bigdatanalyticsintro
60 pages
117769
No ratings yet
117769
20 pages
Assignment: Ce Marketing Research & Data Analytics
No ratings yet
Assignment: Ce Marketing Research & Data Analytics
7 pages
Big - Data Unit-1
100% (2)
Big - Data Unit-1
33 pages
AMR Assignment
No ratings yet
AMR Assignment
11 pages
NJ CSE4261-1
No ratings yet
NJ CSE4261-1
26 pages
Sns College of Engineering: Big Data Analytics
No ratings yet
Sns College of Engineering: Big Data Analytics
17 pages
Ds Unit-1
No ratings yet
Ds Unit-1
19 pages
Emerging Big Data and Cloud Computing
No ratings yet
Emerging Big Data and Cloud Computing
15 pages
Unit 3 Big Data Analytics
No ratings yet
Unit 3 Big Data Analytics
18 pages
BD 1
No ratings yet
BD 1
15 pages
Big Data
No ratings yet
Big Data
3 pages
What Is Data Mining?: Warehousing
No ratings yet
What Is Data Mining?: Warehousing
12 pages
Introduction To Big Data - Report 1
No ratings yet
Introduction To Big Data - Report 1
5 pages
BigData_UNIT-1.docx
No ratings yet
BigData_UNIT-1.docx
19 pages
Unit 1 (Chapter 1) - Introduction
No ratings yet
Unit 1 (Chapter 1) - Introduction
10 pages
Cloud computing
No ratings yet
Cloud computing
86 pages
Report of Big Data
No ratings yet
Report of Big Data
14 pages
module 2-3 fuba midterms
100% (1)
module 2-3 fuba midterms
5 pages
UNIT 2 BDA
No ratings yet
UNIT 2 BDA
5 pages
unit 1
No ratings yet
unit 1
20 pages
Module I Big Data
No ratings yet
Module I Big Data
7 pages
bigdata_Writing
No ratings yet
bigdata_Writing
11 pages
Module 2-4
No ratings yet
Module 2-4
16 pages
Big Data Analytics
No ratings yet
Big Data Analytics
32 pages
UNIT 1Big Data Introduction (1)
No ratings yet
UNIT 1Big Data Introduction (1)
56 pages
CC&BD Unit 3
No ratings yet
CC&BD Unit 3
16 pages
The Influence of Big Data Analytics in The Industry
No ratings yet
The Influence of Big Data Analytics in The Industry
15 pages
Data , Big
No ratings yet
Data , Big
90 pages
Big Data
No ratings yet
Big Data
11 pages
Bda M1
No ratings yet
Bda M1
111 pages
Big Data: Abstract
No ratings yet
Big Data: Abstract
15 pages
Unit 1
No ratings yet
Unit 1
26 pages
Big Data
No ratings yet
Big Data
7 pages
01 - Introduction To Big Data Analytics PDF
No ratings yet
01 - Introduction To Big Data Analytics PDF
38 pages
Big Data Analytics_Lecture Slides
No ratings yet
Big Data Analytics_Lecture Slides
72 pages
A_Review_of_Machine_Learning_Techniques
No ratings yet
A_Review_of_Machine_Learning_Techniques
6 pages
Big Data UNIT1
No ratings yet
Big Data UNIT1
23 pages
Project FInal Report
No ratings yet
Project FInal Report
67 pages
BDA Unit 1
No ratings yet
BDA Unit 1
28 pages
(IJCST-V9I6P1) :yew Kee Wong
No ratings yet
(IJCST-V9I6P1) :yew Kee Wong
7 pages
Advanced Analytics: What Is Big Data Analytics? Definition, Benefits, and More
No ratings yet
Advanced Analytics: What Is Big Data Analytics? Definition, Benefits, and More
13 pages
Unit 1
No ratings yet
Unit 1
54 pages
Chapter-4.2
No ratings yet
Chapter-4.2
30 pages
unit 1 big data
No ratings yet
unit 1 big data
34 pages
Big Data Analytics: Recent Achievements and New Challenges
No ratings yet
Big Data Analytics: Recent Achievements and New Challenges
5 pages
Big Data Intro
No ratings yet
Big Data Intro
12 pages
Big Data and Business Opportunities
100% (1)
Big Data and Business Opportunities
6 pages
The Power of Big Data: Transforming Industries and Shaping the Future
From Everand
The Power of Big Data: Transforming Industries and Shaping the Future
Tom Henricksen
No ratings yet
Appendix C Design Examples
No ratings yet
Appendix C Design Examples
2 pages
Terrestrial Natural and Man-Made Electromagnetic Noise
No ratings yet
Terrestrial Natural and Man-Made Electromagnetic Noise
0 pages
HC15 Manual PDF
No ratings yet
HC15 Manual PDF
24 pages
Technical Data CSDM Baumann - JWE-Baumann GmbH
No ratings yet
Technical Data CSDM Baumann - JWE-Baumann GmbH
8 pages
Polo
No ratings yet
Polo
9 pages
Haattat Tractors
No ratings yet
Haattat Tractors
31 pages
Accounting 203 Chapter 11 Test
No ratings yet
Accounting 203 Chapter 11 Test
5 pages
MDSReport 942254230
No ratings yet
MDSReport 942254230
183 pages
SR320 THRU SR3200: Schottky Barrier Rectifier
No ratings yet
SR320 THRU SR3200: Schottky Barrier Rectifier
2 pages
(PDF) PBA Manual - Shark Tank Project
No ratings yet
(PDF) PBA Manual - Shark Tank Project
29 pages
Wall Turbulence: It's The Little Details That Are Vital. Little Things Make Big Things Happen
No ratings yet
Wall Turbulence: It's The Little Details That Are Vital. Little Things Make Big Things Happen
20 pages
Are Ironic Acts Deliberate?
0% (1)
Are Ironic Acts Deliberate?
12 pages
Operations Management Reviewer
No ratings yet
Operations Management Reviewer
3 pages
Human Resources Management & Labour Relation
No ratings yet
Human Resources Management & Labour Relation
93 pages
3757 Peugeot 206how
100% (1)
3757 Peugeot 206how
10 pages
project wine and spirits
No ratings yet
project wine and spirits
16 pages
Adr Interview Question Bank
No ratings yet
Adr Interview Question Bank
5 pages
Case Hardrock
No ratings yet
Case Hardrock
2 pages
Witchy Academia Aesthetics School Center
No ratings yet
Witchy Academia Aesthetics School Center
53 pages
If You Cant Beat Them
No ratings yet
If You Cant Beat Them
628 pages
CSR and Ethical Issues of Kotak Mahindra Bank Limited
No ratings yet
CSR and Ethical Issues of Kotak Mahindra Bank Limited
12 pages
The Meaning of Power
No ratings yet
The Meaning of Power
4 pages
Aisin Atf-0t4 Technical Data
No ratings yet
Aisin Atf-0t4 Technical Data
1 page
Nanda Farah Febriana B.ingg Vredeburg (23) X-MIPA 8
No ratings yet
Nanda Farah Febriana B.ingg Vredeburg (23) X-MIPA 8
6 pages
Case 1-1 Starbucks - Going Global Fast
92% (13)
Case 1-1 Starbucks - Going Global Fast
2 pages

Big Data Module 1

Uploaded by

Big Data Module 1

Uploaded by

CS6CRT19 Big Data Analytics Module 1

Big Data Definitions

The Characteristics (5Vs) of Big Data

Swamy Saswathikananda College, Poothotta 1

Swamy Saswathikananda College, Poothotta 2

Types/Sources of Big Data

Swamy Saswathikananda College, Poothotta 3

● Scalability which enables increasing or decreasing capacities and data

Swamy Saswathikananda College, Poothotta 4

Challenges of Conventional Systems

1. The Uncertainty of Data Management:

2. Talent Gap in Big Data:

Swamy Saswathikananda College, Poothotta 1

3. Getting Data into Big Data Structure:

4. Syncing Across Data Sources:

5. Extracting Information from the Data in Big Data Integration:

Swamy Saswathikananda College, Poothotta 2

provide a need to support an increasing collection of many simultaneous user

Intelligent Data Analysis (IDA)

Steps Involved in IDA

Swamy Saswathikananda College, Poothotta 3

Processes in Big Data Analytics

Data collection looks different for every organization. With today’s

Stream processing looks at small batches of data at once, shortening the

Swamy Saswathikananda College, Poothotta 1

by identifying anomalies and creating data clusters.

predictions about the future, identifying upcoming risks and opportunities.

Swamy Saswathikananda College, Poothotta 2

Reporting includes building, configuring, consolidating, organizing,

Analysis consists of questioning, examining, interpreting, comparing, and

Reporting has a push approach, as it pushes information to users and outputs

Analysis has a pull approach, where a data analyst draws information to

Swamy Saswathikananda College, Poothotta 3

Modern Big Data Analytics Tools and Technology

Big data analytics cannot be narrowed down to a single tool or technology.

● Hadoop is an open-source framework that efficiently stores and processes big

datasets on clusters of commodity hardware. This framework is free and can

Swamy Saswathikananda College, Poothotta 4

parallelism and fault tolerance to provide an interface for programming entire

Swamy Saswathikananda College, Poothotta 5

Statistical Concepts: Sampling Distributions

In statistics, a population is the entire pool from which a statistical sample is

A lot of data drawn and used by academicians, statisticians, researchers,

A sampling distribution is a probability distribution of a statistic obtained

Re-sampling is the method that consists of creating or drawing repeated

Swamy Saswathikananda College, Poothotta 1

The primary objective of a sample study is to create inferences (conclusions)

Swamy Saswathikananda College, Poothotta 2

You might also like