Lecture 3a Big Data

Data Analytics in Software Engineering

(MSE669)

Dr. Assad Abbas

Associate Professor
Department of Computer Science
COMSATS University Islamabad, Islamabad Campus
[email protected]
Outline
• Big Data
• Big Data 5 Vs
• Big Data and Software Engineering
• Sources and Types of Big Data

June 23, 2024 2


What is Big Data?
• Big data refers to data sets that are large, complex, and diverse, characterized by their volume, velocity, variety, veracity, and value
• Big data can be structured, semi-structured, or unstructured, and can come from various domains and applications, such as social media, e-commerce, health care, science, and engineering
• Big data also offers opportunities to discover new insights, patterns, and knowledge that can benefit various stakeholders, such as individuals, organizations, and society



Big Data 5 Vs
• Volume
  ◦ The size and amount of big data that companies manage and analyze, which can range from terabytes to zettabytes
• Velocity
  ◦ The speed and frequency of big data generation and processing, which can be real-time, near-real-time, or batch
• Variety
  ◦ The diversity and complexity of big data types and sources, which can be structured, semi-structured, or unstructured
• Veracity
  ◦ The quality and accuracy of big data and its analytics, which can be affected by noise, inconsistency, incompleteness, and ambiguity
• Value
  ◦ The benefit and impact of big data and its analytics, which can be measured by the insights, knowledge, and decisions derived from the data
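Veracity problems such as incompleteness and inconsistency can be measured before any analysis runs. The sketch below is a minimal, hypothetical data-quality check: the bug-report records, their field names, and the allowed severity vocabulary are all assumptions made for illustration, not part of any real tracker's schema.

```python
from collections import Counter

# Hypothetical bug-report records; field names and values are illustrative only.
records = [
    {"id": 1, "severity": "high", "component": "ui"},
    {"id": 2, "severity": None, "component": "db"},
    {"id": 3, "severity": "HIGH", "component": "ui"},
    {"id": 3, "severity": "HIGH", "component": "ui"},
    {"id": 4, "severity": "urgent", "component": "api"},
]

ALLOWED_SEVERITIES = {"low", "medium", "high"}  # assumed controlled vocabulary

def veracity_report(rows):
    """Count incomplete, duplicated, and inconsistent records."""
    # Incomplete: at least one field is missing (None).
    incomplete = sum(1 for r in rows if any(v is None for v in r.values()))
    # Duplicates: extra occurrences of an id that should be unique.
    id_counts = Counter(r["id"] for r in rows)
    duplicates = sum(c - 1 for c in id_counts.values() if c > 1)
    # Inconsistent: severity outside the allowed vocabulary (compared case-insensitively).
    inconsistent = sum(
        1
        for r in rows
        if r["severity"] is not None and r["severity"].lower() not in ALLOWED_SEVERITIES
    )
    return {"incomplete": incomplete, "duplicates": duplicates, "inconsistent": inconsistent}

print(veracity_report(records))
```

In a real pipeline such counts would typically be tracked over time, so that a sudden rise in missing or out-of-vocabulary values flags a veracity problem at the source.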
Some Make It 4 Vs



Harnessing Big Data

• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)
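The OLTP/OLAP contrast can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `orders` table, its columns, and its values are hypothetical, and one in-memory SQLite database stands in for both the operational DBMS and the data warehouse, which in practice are separate systems. RTAP is not shown, since it requires a streaming architecture.

```python
import sqlite3

# One in-memory database stands in for both an operational DBMS and a warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP style: many small, short-lived transactions, each writing a single row.
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 60.0)]:
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

# OLAP style: one read-only query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```

The design point is the access pattern: OLTP optimizes for many concurrent single-row writes, whereas OLAP optimizes for large scans and aggregations over historical data.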



The Model Has Changed…
• The model of generating and consuming data has changed
  ◦ Old model: a few companies generated data, and everyone else consumed it
  ◦ New model: all of us generate data, and all of us consume data



Big Data Analytics vs Data Warehousing
• Related but different concepts:
  ◦ Big Data Analytics
    ▪ Focuses on massive and diverse datasets.
    ▪ Involves processing unstructured and streaming data.
    ▪ Utilizes distributed computing frameworks like Apache Hadoop or Apache Spark.
    ▪ Emphasizes real-time or near-real-time analysis.
    ▪ Aims to uncover patterns, trends, and correlations.
  ◦ Data Warehousing
    ▪ Involves the consolidation and storage of structured data.
    ▪ Creates a centralized repository known as a data warehouse.
    ▪ Data is optimized for querying and reporting.
    ▪ Follows a predefined schema.
    ▪ Used for historical analysis and business intelligence.
    ▪ Provides a unified view of an organization's data.
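Frameworks such as Apache Hadoop popularized the MapReduce programming model: a map phase emits key/value pairs, the framework groups them by key (shuffle), and a reduce phase combines each group. The sketch below simulates those three phases in a single Python process on hypothetical log lines; it is not the Hadoop or Spark API, only an illustration of the model those frameworks distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Input "splits": on a real cluster each split would be processed on a different node.
splits = [
    "error timeout in service a",
    "warning disk full",
    "error timeout in service b",
]

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

intermediate = chain.from_iterable(map(map_phase, splits))
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["error"], counts["timeout"], counts["disk"])
```

Because each map call and each reduce call is independent, a framework can run them in parallel on many machines; only the shuffle needs to move data between nodes.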



Big Data and Software Engineering
• Big data and software engineering are interrelated and interdependent fields that influence and benefit each other
• Big data can be seen as both a product and a resource of software engineering
• Software engineering can be seen as both a consumer and a provider of big data analytics



Big Data as a Product of Software Engineering
• Software engineering produces big data as a by-product of its activities and processes, such as:
  ◦ code, documentation, logs, bugs, reviews, tests, metrics, and traces
• Software data can be used to monitor, measure, and improve the quality, performance, and productivity of software systems and software engineering practices
• Software data can also be used to understand, predict, and recommend various aspects of software systems and software engineering tasks, such as requirements, design, defects, code clones, evolution, and developers
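As a small example of using software data to monitor quality, the sketch below parses log lines and computes an error rate per component. The log format (`<LEVEL> <component>: <message>`) and the sample lines are hypothetical; real systems use many different formats, so the regular expression would need adapting.

```python
import re
from collections import Counter

# Hypothetical log format: "<LEVEL> <component>: <message>".
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+) (?P<component>\w+): (?P<message>.*)$")

logs = [
    "INFO auth: user logged in",
    "ERROR auth: token expired",
    "INFO billing: invoice created",
    "ERROR billing: payment gateway timeout",
    "ERROR billing: payment gateway timeout",
    "INFO search: query served",
]

def error_rate_by_component(lines):
    """Return the fraction of ERROR entries per component, a simple quality metric."""
    totals, errors = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines instead of failing the whole analysis
        component = match["component"]
        totals[component] += 1
        if match["level"] == "ERROR":
            errors[component] += 1
    return {c: errors[c] / totals[c] for c in totals}

print(error_rate_by_component(logs))
```

At big data scale the same per-line logic would typically run as the map step of a distributed job rather than as a loop over an in-memory list.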



Big Data as a Resource of Software Engineering
• Software engineering consumes big data from other domains and applications, such as social media, e-commerce, health care, science, and engineering
• Software engineering needs to manage, process, analyze, and visualize big data to provide solutions and services that meet the needs and expectations of users and customers
• Software engineering also needs to ensure the quality, reliability, security, and privacy of big data and its analytics



Software Engineering as a Consumer of Big Data Analytics
• Software engineering uses big data analytics to support its activities and processes, such as requirements engineering, design, implementation, testing, deployment, and evolution
• Big data analytics can help software engineering extract, transform, and load (ETL) software data from various sources and formats
• Big data analytics can also help software engineering apply various methods and techniques to software data, such as data mining, machine learning, natural language processing, sentiment analysis, and network analysis
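The extract, transform, and load steps can be sketched with the standard library alone. In this minimal example the two sources (a CSV issue export and a JSON code-review export), their field names, and the `artifacts` table are hypothetical; the point is that records from different formats are mapped onto one normalized schema before analysis.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats; field names are illustrative only.
ISSUES_CSV = "id,status,component\n101,Open,UI\n102,closed,db\n"
REVIEWS_JSON = '[{"pr": 7, "state": "APPROVED", "module": "ui"}]'

def extract():
    """Read raw records from each source format."""
    issues = list(csv.DictReader(io.StringIO(ISSUES_CSV)))
    reviews = json.loads(REVIEWS_JSON)
    return issues, reviews

def transform(issues, reviews):
    """Map both sources onto one schema with normalized, lower-case values."""
    rows = [("issue", int(i["id"]), i["status"].lower(), i["component"].lower()) for i in issues]
    rows += [("review", r["pr"], r["state"].lower(), r["module"].lower()) for r in reviews]
    return rows

def load(rows, conn):
    """Write the unified rows into a single analysis table."""
    conn.execute("CREATE TABLE artifacts (kind TEXT, ref INTEGER, state TEXT, component TEXT)")
    conn.executemany("INSERT INTO artifacts VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
per_component = conn.execute(
    "SELECT component, COUNT(*) FROM artifacts GROUP BY component ORDER BY component"
).fetchall()
print(per_component)
```

Normalizing values in the transform step (here, lower-casing) is what lets `UI` from one tool and `ui` from another be counted as the same component.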



Software Engineering as a Provider of Big Data Analytics
• Software engineering develops big data analytics for other domains and applications, such as social media, e-commerce, health care, science, and engineering
• Software engineering needs to design and implement big data systems, such as distributed file systems (Hadoop Distributed File System, Google File System), parallel processing frameworks, stream processing engines, and NoSQL databases
• Software engineering also needs to evaluate and optimize big data systems in terms of scalability, performance, reliability, security, and privacy
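One core operation of a stream processing engine is windowed aggregation: maintaining counts over the most recent slice of an unbounded stream. The sketch below is a toy, single-process version; the `SlidingWindowCounter` class is hypothetical, and it assumes events arrive in timestamp order, which real engines cannot assume and handle with mechanisms such as watermarks.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Count events per key over the last `window` time units of a stream.

    Hypothetical toy class; assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = Counter()

    def add(self, timestamp, key):
        """Ingest one event, then evict events outside (timestamp - window, timestamp]."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        """Return the current per-key counts inside the window."""
        return dict(self.counts)

stream = [(1, "login"), (2, "error"), (4, "error"), (7, "login"), (8, "error")]
counter = SlidingWindowCounter(window=5)
for ts, event in stream:
    counter.add(ts, event)
print(counter.snapshot())
```

The deque keeps eviction cheap: each event is appended and removed exactly once, so memory is bounded by the window size rather than by the length of the stream.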



Sources of Big Data
• Software data can also be classified by source, depending on the origin, location, and availability of the data:
  ◦ Internal data
    ▪ Data that is generated and stored within the software organization, such as code, documentation, logs, bugs, reviews, tests, metrics, and traces
  ◦ External data
    ▪ Data that is collected and accessed from outside the software organization, such as data from social media, e-commerce, health care, science, and engineering



Types of Big Data Analytics
• Big data analytics can be categorized into different types, depending on the purpose, method, and output of the analysis:
  ◦ Descriptive analytics: Analytics that describe what has happened or is happening in the data, using methods such as statistics, aggregation, and visualization
  ◦ Diagnostic analytics: Analytics that diagnose why something has happened or is happening, using methods such as correlation analysis, causal analysis, and anomaly detection
  ◦ Predictive analytics: Analytics that predict what will happen or is likely to happen, using methods such as regression, classification, and forecasting
  ◦ Prescriptive analytics: Analytics that prescribe what should happen or what actions should be taken based on the data, using methods such as optimization, simulation, and recommendation
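The difference between descriptive and predictive analytics can be shown on one small series. In this sketch the weekly defect counts are invented for illustration; the descriptive part uses Python's `statistics` module, and the predictive part fits an ordinary least-squares line by hand and extrapolates one week ahead. (Python 3.10+ also offers `statistics.linear_regression`; the explicit formula is used here so the steps are visible.)

```python
import statistics

# Invented weekly defect counts for one project (illustrative numbers, not real data).
weekly_defects = [12, 15, 14, 18, 21, 20, 25]

# Descriptive analytics: summarize what has happened.
summary = {
    "mean": statistics.mean(weekly_defects),
    "median": statistics.median(weekly_defects),
    "max": max(weekly_defects),
}

# Predictive analytics: fit y = intercept + slope * x by least squares, then extrapolate.
def linear_forecast(values, steps_ahead=1):
    xs = range(len(values))
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (len(values) - 1 + steps_ahead)

print(summary, round(linear_forecast(weekly_defects), 2))
```

A linear trend is the simplest possible forecaster; it ignores seasonality and noise, so it serves only to illustrate the step from describing the past to predicting the future.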
Methods of Big Data Analytics
• Big data analytics can use various methods and techniques to process and analyze big data, such as:
  ◦ Data mining: The process of discovering patterns, associations, and rules from large and complex data sets, using techniques such as clustering, association rule mining, and frequent itemset mining
  ◦ Machine learning: The process of learning from data and making predictions or decisions, using techniques such as supervised learning, unsupervised learning, and reinforcement learning
  ◦ Natural language processing: The process of understanding and generating natural language from data, using techniques such as tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, and text summarization
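Frequent itemset mining can be illustrated on software data by treating each commit as a "transaction" of changed files: files that frequently change together may be coupled. The sketch below is an Apriori-style search limited to itemsets of size 1 and 2; the commits and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commits, each listing the files it changed.
commits = [
    {"parser.py", "lexer.py", "tests.py"},
    {"parser.py", "lexer.py"},
    {"ui.py", "tests.py"},
    {"parser.py", "lexer.py", "ui.py"},
]
MIN_SUPPORT = 2  # assumed threshold: an itemset must appear in at least 2 commits

def frequent_itemsets(transactions, min_support):
    """Apriori-style search for frequent single items and frequent pairs."""
    item_counts = Counter(item for t in transactions for item in t)
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # Apriori property: a pair can only be frequent if both of its items are frequent,
    # so candidate pairs are built only from frequent items.
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(t & frequent_items), 2)
    )
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent_items, frequent_pairs

items, pairs = frequent_itemsets(commits, MIN_SUPPORT)
print(sorted(items))
print(pairs)
```

Here only `lexer.py` and `parser.py` co-change often enough to pass the threshold; full Apriori would continue to larger itemsets and then derive association rules from them.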



Methods of Big Data Analytics
  ◦ Sentiment analysis: The process of detecting and measuring the emotions, opinions, and attitudes expressed in the data, using techniques such as lexicon-based, rule-based, and machine learning-based approaches
  ◦ Network analysis: The process of analyzing the structure, properties, and dynamics of data represented as a network (graph), using techniques such as graph theory, centrality measures, community detection, and link prediction
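Two of these methods fit in a few lines each. The sketch below shows a lexicon-based sentiment scorer and a degree-centrality computation; the lexicon, the review comments, and the developer graph are all hypothetical. Libraries such as NetworkX provide degree centrality as `networkx.degree_centrality`; it is written out here to show what the measure computes.

```python
import re
from collections import Counter

# Lexicon-based sentiment analysis.
# A tiny, hypothetical lexicon; practical lexicons score thousands of words.
LEXICON = {"great": 1, "clean": 1, "thanks": 1, "bug": -1, "broken": -1, "confusing": -1}

def sentiment_score(text):
    """Sum the lexicon scores of the words in a text (> 0 positive, < 0 negative)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

comments = [
    "Great refactoring, clean code. Thanks!",
    "This API is confusing and the build is broken.",
]
scores = [sentiment_score(c) for c in comments]

# Network analysis: degree centrality.
# Hypothetical undirected "reviewed each other's code" edges between developers.
edges = [("ana", "bo"), ("ana", "cy"), ("ana", "dee"), ("bo", "cy")]

def degree_centrality(edge_list):
    """Normalized degree: neighbours / (n - 1); assumes no self-loops or duplicate edges."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

centrality = degree_centrality(edges)
print(scores, max(centrality, key=centrality.get))
```

Plain word lookup ignores negation ("not great") and context, which is why rule-based and machine learning-based approaches exist alongside lexicons.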



Output of Big Data Analytics
• Big data analytics can produce various outputs and results from the data, such as:
  ◦ Information
    ▪ The processed and organized data that can be understood and interpreted by users and customers, such as tables, charts, and reports
  ◦ Insights
    ▪ The derived and inferred data that can provide value and knowledge to users and customers, such as trends, patterns, and correlations
  ◦ Knowledge
    ▪ The synthesized and generalized data that can support decision making and problem solving for users and customers, such as rules, models, and recommendations

