
What is Big Data

Big Data refers to vast, complex datasets that cannot be managed by traditional databases, requiring advanced analytics techniques and technologies like Hadoop. It offers significant benefits across industries, including improved decision-making and innovation, while its five V's—volume, velocity, variety, veracity, and value—highlight the challenges and opportunities it presents. Operationalizing Big Data involves collecting, processing, cleaning, and analyzing data to extract actionable insights for strategic decisions.

Uploaded by

Victor Matsvaire
Copyright
© All Rights Reserved

What is Big Data?

Big Data is data that cannot be managed using traditional databases. It is typically generated in
multi-terabyte quantities, changes quickly, and comes in a variety of forms that are difficult to
manage and process using an RDBMS or other traditional technologies.

Big Data is the new Oil.

The main difference between big data analytics and traditional data analytics is the type of data
handled and the tools used to analyze it.

Traditional analytics deals with structured data, typically stored in relational databases. This type of
database helps ensure that data is well-organized and easy for a computer to understand, and it is queried using SQL.

Big data analytics involves massive amounts of data in various formats, including structured, semi-
structured and unstructured data. The complexity of this data requires more sophisticated analysis
techniques. Big data analytics employs advanced techniques like machine learning and data mining
to extract information from complex data sets. It often requires distributed processing systems like
Hadoop to manage the sheer volume of data.
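The paragraph above names Hadoop as a distributed processing system. Hadoop's processing layer is built around the MapReduce model: a map step emits key/value pairs, the framework groups ("shuffles") them by key, and a reduce step aggregates each group. The sketch below only simulates those three phases on a single machine in plain Python. It is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate each key's values (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases, as a MapReduce job would."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data is big", "data is the new oil"]
    print(word_count(docs))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data across the network; this single-process version only shows how data flows between the phases.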

Big data analytics helps uncover trends, patterns and correlations in large amounts of raw data, so
that analysts can make data-informed decisions.

An often-cited estimate is that about 80% of the data generated today is unstructured and cannot be
handled by traditional technologies.

Benefits of Big Data

Big Data provides significant benefits to all kinds of businesses across the globe. From the
education sector to the healthcare industry, almost every industry now relies on Big Data
analytics in one way or another. Benefits include better decision making, greater innovation,
product price optimization and more.

• Better decision making
• Greater innovation, such as the development of driverless cars
• Improvements in the education sector
• Product price optimization so that profit is maximized
• Recommendation engines
• Life-saving applications in the healthcare industry
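The list above mentions recommendation engines without saying how they work. One simple, common approach is item co-occurrence: recommend items that similar users bought but the target user has not. The sketch below assumes purchase histories are sets of item names; the `recommend` helper and the sample data are hypothetical, and production recommenders typically use much richer models.

```python
from collections import Counter


def recommend(histories: dict[str, set[str]], user: str, k: int = 3) -> list[str]:
    """Rank items the user does not own by how often they appear in the
    histories of other users, weighted by how many items those users share
    with the target user."""
    owned = histories[user]
    scores: Counter[str] = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(owned & items)  # crude similarity: number of shared items
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    histories = {
        "alice": {"laptop", "mouse"},
        "bob": {"laptop", "mouse", "keyboard"},
        "carol": {"laptop", "monitor"},
        "dave": {"phone", "case"},
    }
    print(recommend(histories, "alice"))
```

Here Bob shares two items with Alice and Carol shares one, so Bob's keyboard outranks Carol's monitor; Dave shares nothing and contributes no votes.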

Big Data Use-Cases

• Netflix uses Big Data to improve customer experience
• Promotion and campaign analysis by Sears Holdings
• Sentiment analysis
• Customer churn analysis
• Predictive analysis
• Real-time ad matching and serving

Big Data Technologies

Many technologies address the problems of Big Data storage and processing, such as Hadoop and
MongoDB.

Four main data analysis methods

Descriptive analytics (“what happened”) focuses on summarizing and describing past data to understand its basic characteristics.

Diagnostic analytics (“why it happened”) identifies the root causes of the patterns and trends observed in descriptive analytics.

Predictive analytics (“what will happen”) uses historical data, statistical modeling and machine learning to forecast trends.

Prescriptive analytics (“what to do”) goes beyond prediction to provide recommendations for optimizing future actions, based on insights derived from all the previous stages.

The five V's of big data analytics

1. Volume
The sheer volume of data generated today, from social media feeds, IoT devices, transaction
records and more, presents a significant challenge.
2. Velocity
Data is being produced at unprecedented speeds, from real-time social media updates to
high-frequency stock trading records. The velocity at which data flows into organizations
requires robust processing capabilities to capture, process and deliver accurate analysis in
near real-time. Stream processing frameworks and in-memory data processing are designed
to handle these rapid data streams and balance supply with demand.
3. Variety
Today's data comes in many formats, from structured, numeric data in traditional
databases to unstructured text, video and images from diverse sources like social media and
video surveillance.
4. Veracity
Data reliability and accuracy are critical, as decisions based on inaccurate or incomplete data
can lead to negative outcomes. Veracity refers to the data's trustworthiness, encompassing
data quality, noise and anomaly detection issues. Techniques and tools for data cleaning,
validation and verification are integral to ensuring the integrity of big data, enabling
organizations to make better decisions based on reliable information.
5. Value
Big data analytics aims to extract actionable insights that offer tangible value. This involves
turning vast data sets into meaningful information that can inform strategic decisions,
uncover new opportunities and drive innovation. Advanced analytics, machine learning and
AI are key to unlocking the value contained within big data, transforming raw data into
strategic assets.
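The velocity item mentions stream processing. A core idea in stream-processing frameworks is computing results incrementally over a window of recent events instead of storing everything first and analyzing it later. The sketch below keeps a fixed-size sliding window of readings and updates a running average in constant time per event. The function name and window size are hypothetical; real frameworks add distribution, fault tolerance and time-based windows.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the last `window` values after each new value arrives."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque[float] = deque()
    total = 0.0
    for value in stream:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()  # evict the oldest reading
        yield total / len(buf)


if __name__ == "__main__":
    readings = [1, 2, 3, 4, 5]  # stand-in for an unbounded event stream
    print(list(sliding_average(readings, window=3)))
```

Because it is a generator, results are produced as each reading arrives, which is what lets stream processing deliver analysis in near real time.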
Operationalizing big data analytics

Collect data: The first step involves gathering data, which can be a mix of structured and
unstructured forms from myriad sources like cloud, mobile applications and IoT sensors. This
step is where organizations adapt their data collection strategies and integrate data from
varied sources into central repositories like a data lake, which can automatically assign
metadata for better manageability and accessibility.
Process data: After being collected, data must be systematically organized, extracted,
transformed and then loaded into a storage system to ensure accurate analytical outcomes.
Processing involves converting raw data into a format that is usable for analysis, which might
involve aggregating data from different sources, converting data types or organizing data
into structured formats. Given the exponential growth of available data, this stage can be
challenging. Processing strategies may vary between batch processing, which handles large
data volumes over extended periods, and stream processing, which deals with smaller real-time data batches.
Clean data: Regardless of size, data must be cleaned to ensure quality and relevance.
Cleaning data involves formatting it correctly, removing duplicates and eliminating irrelevant
entries. Clean data prevents the corruption of output and safeguards reliability and
accuracy.
Analyze data: Advanced analytics, such as data mining, predictive analytics, machine learning
and deep learning, are employed to sift through the processed and cleaned data. These
methods allow users to discover patterns, relationships and trends within the data, providing
a solid foundation for informed decision-making.
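The four steps can be chained into a small pipeline. The sketch below assumes the "collected" data arrives as CSV text from two hypothetical sources. The process step extracts rows and tags each with its source as simple metadata. The clean step normalizes formatting, converts types, and drops malformed, incomplete and duplicate rows. The analyze step computes total sales per region. Field names, sources and cleaning rules are illustrative assumptions, and loading into a real storage system is left out.

```python
import csv
import io
from collections import defaultdict

# Collect: raw CSV text from two hypothetical sources
RAW_SOURCES = {
    "web": "order_id,region,amount\n1,north,20.5\n2,south,15\n2,south,15\n",
    "store": "order_id,region,amount\n3,North ,30\n4,east,not-a-number\n5,,12\n",
}


def process(sources: dict[str, str]) -> list[dict]:
    """Extract rows from each CSV source and tag them with metadata."""
    rows = []
    for source, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source"] = source  # where the record came from
            rows.append(row)
    return rows


def clean(rows: list[dict]) -> list[dict]:
    """Normalize formats, convert types, drop malformed/incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        region = row["region"].strip().lower()  # fix inconsistent formatting
        try:
            amount = float(row["amount"])  # convert data type
        except ValueError:
            continue  # malformed value
        if not region:
            continue  # incomplete entry
        if row["order_id"] in seen:
            continue  # duplicate order
        seen.add(row["order_id"])
        cleaned.append({"order_id": row["order_id"], "region": region,
                        "amount": amount, "source": row["source"]})
    return cleaned


def analyze(rows: list[dict]) -> dict[str, float]:
    """Aggregate: total sales per region."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(analyze(clean(process(RAW_SOURCES))))
```

Keeping each step as its own function mirrors the stages described above and makes it easy to swap one out, for example replacing the in-memory strings with reads from a data lake.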

Types of big data

Structured Data
Structured data refers to highly organized information that is easily searchable and typically
stored in relational databases or spreadsheets. It adheres to a rigid schema, meaning each
data element is clearly defined and accessible in a fixed field within a record or file. Examples
of structured data include:

• Customer names and addresses in a customer relationship management (CRM) system
• Transactional data in financial records, such as sales figures and account balances
• Employee data in human resources databases, including job titles and salaries

Structured data's main advantage is that it is simple to enter, search and analyze, often using
straightforward database queries such as SQL.
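Because structured data follows a fixed schema, questions about it map directly onto SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns and its rows are made up for illustration. It shows a schema, inserts, and a single aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT NOT NULL,
        balance REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
    [("Ann", "Boston", 120.0), ("Ben", "Denver", 80.0), ("Cat", "Boston", 50.0)],
)

# Every row has the same fields, so filtering and grouping take one query.
rows = conn.execute(
    "SELECT city, COUNT(*), SUM(balance) FROM customers GROUP BY city ORDER BY city"
).fetchall()
for city, n, total in rows:
    print(f"{city}: {n} customers, total balance {total:.2f}")
conn.close()
```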

Unstructured Data

Unstructured data lacks a pre-defined data model, making it more difficult to collect, process
and analyze. It comprises the majority of data generated today, and includes formats such
as:

• Textual content from documents, emails and social media posts
• Multimedia content, including images, audio files and videos
• Data from IoT devices, which can include a mix of sensor data, log files and time-series data
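Because unstructured data has no predefined model, a common first step is to impose one, for example by extracting fields from free-form log text with a regular expression. The log format, the `PATTERN` regex and the field names below are hypothetical; real log formats vary and often need more robust parsers. Lines that do not fit the pattern are simply skipped.

```python
import re

# Hypothetical IoT log: some lines carry sensor values, others are free text
LOG = """\
2024-05-01 10:00:03 sensor=temp-7 reading=21.4C status=ok
2024-05-01 10:00:09 gateway rebooted after power loss
2024-05-01 10:01:15 sensor=temp-7 reading=35.9C status=alert
"""

PATTERN = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) sensor=(?P<sensor>\S+) "
    r"reading=(?P<reading>-?\d+(?:\.\d+)?)C status=(?P<status>\w+)$"
)


def parse(log: str) -> list[dict]:
    """Turn matching lines into structured records; skip lines that don't fit."""
    records = []
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            rec = m.groupdict()
            rec["reading"] = float(rec["reading"])  # numeric field for analysis
            records.append(rec)
    return records


if __name__ == "__main__":
    for rec in parse(LOG):
        print(rec)
```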

Semi-structured Data

Semi-structured data occupies the middle ground between structured and unstructured data.
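A typical example of semi-structured data is JSON: each record carries its own field names, but records need not share one rigid schema. The sketch below uses Python's standard `json` module on made-up customer records in which some fields are optional or nested, and maps them onto a fixed set of columns. The record layout and the `flatten` helper are hypothetical.

```python
import json

RAW = """[
  {"id": 1, "name": "Ann", "email": "ann@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Cat", "address": {"city": "Denver"}}
]"""


def flatten(record: dict) -> dict:
    """Map a loosely structured record onto a fixed set of columns."""
    return {
        "id": record["id"],
        "name": record.get("name"),
        "email": record.get("email"),  # optional field: None when missing
        "phone_count": len(record.get("phones", [])),
        "city": record.get("address", {}).get("city"),  # nested field
    }


rows = [flatten(r) for r in json.loads(RAW)]
for row in rows:
    print(row)
```

The tags make each record self-describing, which is what separates semi-structured data from unstructured text, while the missing and nested fields show why it does not fit a relational table directly.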

https://classroom.google.com/c/NzU1Mjk2NjI1NTQ1?cjc=f45i4yc
