BDA Unit 1
1. Introduction
Big data refers to larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software cannot manage them, but these massive volumes of data can be used to address business problems that could not be tackled before.
Big Data is a collection of data that is huge in volume and growing exponentially with time. Its size and complexity are so great that no traditional data management tool can store or process it efficiently.
Following are some examples of Big Data:
The New York Stock Exchange generates about one terabyte of new trade data per day.
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
Types of Big Data
Big data can be classified into three types:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed, and processed in a fixed format is termed ‘structured’ data. Over time, computer science has developed increasingly successful techniques for working with this kind of data (where the format is well known in advance) and for deriving value from it. A table in a relational database is a typical example.
Unstructured
Any data whose form or structure is unknown is classified as unstructured data. Besides being huge in size, unstructured data poses challenges in processing and in deriving value from it. A typical example is a heterogeneous source containing a mix of plain text, images, and videos.
Semi-structured
Semi-structured data can contain both forms of data. It appears structured in form, but it is not defined with, for example, a table definition as in a relational DBMS. A typical example of semi-structured data is data represented in an XML file, as in the sketch below.
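The following minimal Python sketch (illustrative only; the employee records and field names are invented for this example) parses a small XML fragment with the standard-library xml.etree.ElementTree module. It shows the semi-structured character of XML: each record describes itself with tags, but no fixed schema forces every record to have the same fields.

# Illustrative sketch: XML as semi-structured data (invented example records).
import xml.etree.ElementTree as ET

xml_text = """
<employees>
    <employee>
        <name>Asha</name>
        <department>Analytics</department>
    </employee>
    <employee>
        <name>Ravi</name>
        <department>Engineering</department>
        <location>Pune</location>   <!-- extra field the first record lacks -->
    </employee>
</employees>
"""

root = ET.fromstring(xml_text)
for emp in root.findall("employee"):
    # Fields are looked up by tag name; a missing tag simply returns None.
    print(emp.findtext("name"), emp.findtext("location"))
# Prints "Asha None" and then "Ravi Pune".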
Characteristics of Big Data
Big data can be described by the following characteristics:
Volume
Variety
Velocity
Variability
(i) Volume – The name Big Data itself is related to size, which is enormous. The size of data plays a very crucial role in determining its value. Whether particular data can actually be considered Big Data or not also depends on its volume. Hence, ‘Volume’ is one characteristic that needs to be considered while dealing with Big Data solutions.
(ii) Variety – Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining, and analyzing the data.
(iii) Velocity – The term ‘velocity’ refers to the speed at which data is generated. How fast the data is generated and processed to meet demands determines the real potential of the data.
Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, and mobile devices. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency that data can show at times, which hampers the process of handling and managing the data effectively.
Importance of Big Data
Big data refers to the massive volume of structured and unstructured data that inundates businesses on a day-to-day basis. Its importance lies not in the amount of data itself but in what organisations do with it, as the following sections on analytics describe.
Big Data Analytics
Big data analytics involves using advanced analytics techniques such as predictive
analytics, data mining, machine learning, and statistical analysis to extract insights
from massive datasets. These datasets are often too large or complex for traditional
data processing applications to handle.
Big data analytics helps organisations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers. Businesses that use big data with advanced analytics gain value in many ways, such as reducing costs, making faster and better decisions, and developing new products and services that meet customer needs.
Data Science
Data science is a multidisciplinary field that uses scientific methods, algorithms, processes,
and systems to extract knowledge and insights from structured and unstructured data. It
combines aspects of statistics, computer science, and domain expertise to analyse complex
datasets and solve real-world problems. Data scientists employ various techniques such as
data mining, machine learning, predictive analytics, and data visualization to uncover
patterns, trends, and correlations in data that can inform decision-making and drive
innovation in various industries.
The responsibilities of a data scientist can vary depending on the organization and
the specific role, but here are some common tasks and responsibilities:
1. Data Collection and Cleaning: Gathering data from various sources such as
databases, APIs, or web scraping. Cleaning and preprocessing the data to
remove errors, missing values, and inconsistencies.
2. Data Analysis and Exploration: Exploring the data to understand patterns,
trends, and relationships. Using statistical methods and visualization
techniques to gain insights and identify potential areas for further
investigation.
3. Model Development: Developing machine learning models and algorithms to
solve specific business problems or make predictions. This involves selecting
appropriate models, feature engineering, hyperparameter tuning, and
evaluating model performance.
4. Model Deployment: Deploying models into production environments, which
may involve working with software engineers to integrate models into existing
systems or develop new applications.
5. Testing and Validation: Testing the performance of models using validation
techniques such as cross-validation or holdout validation. Ensuring that
models are robust and generalize well to new data.
6. Communication and Visualization: Communicating findings and insights to
stakeholders through reports, presentations, or interactive dashboards.
Visualizing data and model results in a clear and understandable way.
Overall, data scientists play a crucial role in extracting meaningful insights from data to
inform decision-making and drive business value.
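As an illustration only (not taken from the source), the short Python sketch below walks through a toy version of responsibilities 1, 3, and 5 above: cleaning a tiny invented dataset with pandas, fitting a simple scikit-learn classifier, and validating it with cross-validation. The column names and values are made up for the example.

# Hypothetical mini-workflow: cleaning, modelling, and validating a toy dataset.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 1. Data collection and cleaning: an invented dataset with one missing value.
df = pd.DataFrame({
    "age":    [25, 32, 47, None, 52, 29, 41, 38],
    "income": [30, 45, 80, 52, 95, 38, 70, 60],   # in thousands (made up)
    "bought": [0, 0, 1, 1, 1, 0, 1, 0],           # target label (made up)
})
df["age"] = df["age"].fillna(df["age"].median())  # simple imputation

# 3. Model development: fit a basic classifier on the cleaned features.
X, y = df[["age", "income"]], df["bought"]
model = LogisticRegression()

# 5. Testing and validation: k-fold cross-validation estimates generalization.
scores = cross_val_score(model, X, y, cv=4)
print("Cross-validated accuracy:", round(scores.mean(), 2))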
Key Terminology in Big Data
1. Big Data: Refers to large volumes of data, both structured and unstructured,
that inundates a business on a day-to-day basis.
2. Hadoop: An open-source framework used for distributed storage and
processing of large datasets across clusters of computers.
3. MapReduce: A programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster (a minimal word-count sketch appears after this list).
4. Data Warehouse: A central repository of integrated data from one or more
disparate sources, used for reporting and data analysis.
5. Data Lake: A storage repository that holds a vast amount of raw data in its
native format until it is needed.
6. ETL (Extract, Transform, Load): The process of extracting data from various
sources, transforming it to fit operational needs, and loading it into a data
warehouse or data lake.
7. NoSQL: A type of database that provides a mechanism for storage and
retrieval of data that is modeled in means other than the tabular relations
used in relational databases.
8. SQL (Structured Query Language): A domain-specific language used in programming and designed for managing data held in a relational database management system or for stream processing in a relational data stream management system (see the query example after this list).
9. Data Mining: The process of discovering patterns in large data sets involving
methods at the intersection of machine learning, statistics, and database
systems.
10. Machine Learning: A subset of artificial intelligence that uses statistical
techniques to enable computer systems to learn from and make predictions or
decisions based on data.
11. Data Visualization: The graphical representation of information and data to
communicate complex information clearly and efficiently.
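To make the MapReduce entry above more concrete, here is a minimal, single-process Python sketch of the word-count example usually used to explain the model. It is illustrative only: a real framework such as Hadoop distributes the map and reduce functions across a cluster, and the function names and sample documents here are invented.

# Illustrative word count in the MapReduce style (single-process simulation).
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in an input split.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    # Reduce: after grouping by key, sum the counts for each word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["big data needs big tools", "data tools process data"]
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))   # e.g. {'big': 2, 'data': 3, 'tools': 2, ...}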
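Similarly, the SQL entry can be illustrated with a small, self-contained Python example using the standard-library sqlite3 module. The table name, columns, and rows below are invented for the sketch; it simply shows tabular (relational) data being defined and queried declaratively with SQL.

# Illustrative relational query with SQL, via Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# A fixed schema: every row has the same predeclared columns (structured data).
cur.execute("CREATE TABLE trades (symbol TEXT, price REAL, volume INTEGER)")
cur.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ABC", 10.5, 300), ("XYZ", 22.0, 150), ("ABC", 11.0, 500)],
)

# A declarative query: total traded volume per symbol.
cur.execute("SELECT symbol, SUM(volume) FROM trades GROUP BY symbol")
print(cur.fetchall())                # e.g. [('ABC', 800), ('XYZ', 150)]
conn.close()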