BDA Class1

Uploaded by

Celina Sawan
BIG DATA ANALYTICS

Lecture 3
Tools for Big Data Analysis
WHAT IS BIG DATA
 Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications.

 The challenges include capture, curation, storage, search, sharing,
transfer, analysis, and visualization.

 The trend to larger data sets is due to the additional information
derivable from analysis of a single large set of related data, as compared
to separate smaller sets with the same total amount of data, allowing
correlations to be found to "spot business trends, determine quality of
research, prevent diseases, link legal citations, combat crime, and
determine real-time roadway traffic conditions."

 Big data analytics deals with the tools and techniques used to handle
and analyze data and to uncover hidden trends, patterns, and correlations
that provide meaningful insight for improved decision making.
TERMINOLOGIES USED
DATA
 Raw facts and figures: facts held in mind, numbers, text written in books,
bits and bytes stored in computer memory, statistics.
 We get information by analyzing and interpreting data.
 Types: structured (relational tables, spreadsheets), semi-structured
(JSON, XML, log files), and unstructured (audio, video, social media posts).
 Types of Data: Data can be categorized into different types based on its
nature. These include:
 Numerical Data: Includes quantitative values such as numbers and
measurements.
 Categorical Data: Represents categories or labels, often used for grouping or
classification.
 Textual Data: Involves written or spoken language, often in the form of
sentences, paragraphs, or documents.
 Image Data: Consists of visual information or pictures.
 Audio Data: Represents sound and auditory information.
 Video Data: Combines visual and auditory information in motion.
 Data Sources: Data can come from various sources, including sensors,
devices, surveys, social media, websites, transactions, and more. It can
be generated by humans or collected automatically by machines.
 Data Formats: Data can be stored in different formats, such as
databases, spreadsheets, documents, images, audio files, and more.
Each format has its own structure and characteristics.
 Data Processing: Data is often processed to extract meaningful insights
and information. This can involve cleaning and organizing the data,
performing calculations, applying statistical analyses, and generating
visualizations.
 Big Data: Big data refers to extremely large and complex datasets that
are beyond the capabilities of traditional data processing methods. Big
data technologies enable the storage, processing, and analysis of such
datasets.
 Data Analytics: Data analytics involves the examination of data to
uncover patterns, trends, correlations, and other valuable insights. It
includes descriptive analytics (understanding what happened),
predictive analytics (predicting what might happen), and prescriptive
analytics (suggesting actions to take).
 Data Privacy and Security: Data privacy refers to the protection of
individuals' personal and sensitive information. Data security involves
safeguarding data from unauthorized access, breaches, and cyber
threats.
 Data Mining: Data mining is the process of discovering patterns and
relationships in large datasets. It involves using techniques from
statistics, machine learning, and artificial intelligence to extract
valuable information.
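
The three kinds of analytics named above (descriptive, predictive, prescriptive) can be illustrated with a minimal Python sketch. The monthly sales figures, the function names, and the reorder rule are all hypothetical, and a straight-line least-squares fit stands in here for whatever predictive model a real project would use.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from the lecture).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive analytics: summarize what happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Least-squares fit of the values against their index 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_next(values):
    """Predictive analytics: extrapolate the fitted line one step ahead."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


def recommend(forecast, stock_on_hand):
    """Prescriptive analytics: turn the forecast into a suggested action."""
    return "reorder" if forecast > stock_on_hand else "hold"


summary = describe(monthly_sales)
forecast = predict_next(monthly_sales)
action = recommend(forecast, stock_on_hand=155.0)
print(summary, forecast, action)
```

The split into three small functions mirrors the three questions: what happened, what might happen, and what to do about it.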
A BRIEF HISTORY OF BIG DATA
 The concept of big data has evolved over the years, driven by
advancements in technology, data collection methods, and the growing
need to process and make sense of vast amounts of information. Here's
a brief history of the evolution of big data:
 Early Days - Pre-2000s:
 Data processing and storage were primarily limited by the
capabilities of mainframe computers and early databases.
 The term "business intelligence" emerged, referring to the use of
data analysis to aid business decision-making.
 Emergence of the Internet - Late 1990s:
 The growth of the internet led to the rapid increase in digital data
creation, including web pages, emails, and online transactions.
 Search engines like Google introduced methods for indexing and
retrieving information from the web.
 The Term "Big Data" Gains Traction - Early 2000s:
 Doug Laney's 2001 research note introduced the "3V" model
(Volume, Velocity, Variety) to describe the challenges of
managing large datasets.
 The term "big data" started to gain traction as a way to describe
datasets that were too large to be managed using traditional
methods.
 Open Source Technologies - Mid-2000s:
 Open source projects like Apache Hadoop began to address the
challenges of processing and analyzing massive datasets.
 Hadoop introduced a distributed computing framework that could
process data across clusters of computers.
 Social Media and Web 2.0 - Late 2000s:
 The explosion of social media platforms like Facebook, Twitter, and
YouTube generated enormous amounts of user-generated content,
contributing to the growth of big data.
 The concept of "Web 2.0" emphasized user-generated content and
interactivity on the internet.
 Mainstream Recognition - Early 2010s:
 Big data gained significant attention in the business world, as
organizations realized the potential value of analyzing large datasets
to gain insights and make better decisions.
 Companies started investing in data analytics tools and platforms.
 Advanced Analytics and Machine Learning - Mid-2010s:
 Advances in machine learning and artificial intelligence led to more
sophisticated data analysis techniques.
 Organizations began using predictive and prescriptive analytics to
anticipate future trends and make proactive decisions.
 IoT and Real-time Data - Late 2010s:
 The proliferation of Internet of Things (IoT) devices led to the generation of
real-time data streams from sensors and connected devices.
 Organizations focused on processing and analyzing data in real-time to gain
immediate insights.

 Data Privacy and Ethics Concerns - 2010s:
 As data collection increased, concerns about data privacy, security, and ethical
use of data became more prominent.
 Regulations like GDPR (General Data Protection Regulation) were introduced to
protect individuals' data rights.

 Continued Growth and Specialized Tools - Present:
 The volume and complexity of data continue to grow exponentially, leading to
the development of specialized tools, platforms, and cloud services for big data
processing and analysis.
 Big data continues to play a crucial role in various sectors, including business,
healthcare, science, and more.

 Throughout its history, big data has transformed from a concept focused on
managing large volumes of data to a complex ecosystem of technologies,
methodologies, and practices that enable organizations to derive insights and drive
innovation from their data.
BUSINESS DRIVERS FOR BIG DATA INNOVATIONS
 Businesses are increasingly embracing big data innovations to gain a
competitive edge, make informed decisions, and unlock new
opportunities. Several key business drivers are behind the adoption of
big data innovations:

 Data-Driven Decision Making: Big data allows businesses to base
their decisions on data-driven insights rather than intuition or
assumptions. By analyzing large volumes of data, businesses can
identify patterns, trends, and correlations that inform strategic choices.

 Competitive Advantage: Companies that effectively harness big data
can gain a competitive advantage. Analyzing customer behavior, market
trends, and competitor activities enables businesses to identify unique
opportunities and differentiate themselves in the market.
 Customer Insights and Personalization: Big data enables businesses
to better understand their customers' preferences, behaviors, and
needs. This insight supports personalized marketing campaigns,
product recommendations, and tailored customer experiences.

 Innovation and Product Development: Analyzing big data can reveal
gaps in the market and provide insights into customer demands. This
information drives the development of innovative products, services,
and features that cater to specific customer needs.

 Operational Efficiency: Big data analytics helps optimize business
processes, streamline operations, and reduce inefficiencies. By
identifying bottlenecks and areas for improvement, companies can
enhance their operational efficiency and reduce costs.

 Risk Management and Fraud Detection: Businesses use big data
analytics to identify potential risks and fraud in real time. This is
particularly relevant in industries like finance, where detecting
anomalies and unusual patterns can prevent financial losses.
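
Detecting "anomalies and unusual patterns", as described for fraud detection, can be sketched with a simple outlier test. This is an illustrative sketch only: the transaction amounts and the function name flag_anomalies are hypothetical, and the modified z-score (based on the median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff) is just one common technique, not necessarily what a production fraud system would use.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline the way
    it would distort a mean and standard deviation.
    """
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # At least half the values equal the median; no usable scale.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical card transactions; one amount is far outside the usual range.
transactions = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 980.0, 28.0]
suspicious = flag_anomalies(transactions)
print(suspicious)
```

In a real-time setting the same test would typically be applied per customer over a recent window of transactions rather than over one fixed list.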
CHARACTERISTICS OF BIG DATA

 The five V’s are the characteristics of big data.

WHAT ARE THE 5 V’S OF BIG DATA
 Volume: Volume refers to the sheer size of data that is being generated and
collected. Traditional data management systems might struggle to handle the
enormous volumes of data that are characteristic of big data. This aspect
highlights the need for scalable and efficient storage solutions and processing
capabilities. With the advent of technologies like cloud computing and
distributed computing frameworks, handling massive volumes of data has
become more feasible.

 Velocity: Velocity represents the speed at which data is generated, collected, and
processed. In the era of big data, data is often generated and transmitted rapidly,
sometimes in real time or near-real time. Examples include data streams from
social media, sensors, financial transactions, and more. Organizations need to
have systems that can capture, process, and analyze this data at the required
speed to make timely decisions and gain insights.

 Variety: Variety refers to the diverse types and sources of data that exist. Data
comes in various formats such as structured data (like databases and
spreadsheets), unstructured data (like text documents and images), and semi-
structured data (like JSON and XML files). In addition to text, data can include
images, videos, audio recordings, and other forms of multimedia. Managing and
analyzing this diverse data requires flexible and adaptable technologies and
approaches.
 Veracity: Veracity refers to the quality and trustworthiness of data.
With the abundance of data sources, ensuring data accuracy and
reliability becomes crucial for making informed decisions.

 Value: While not one of the original three V's, "Value" is often
considered an important aspect. Ultimately, the goal of working with big
data is to extract value from it, whether through insights, improvements
in decision-making, innovation, or other benefits.
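
The Variety characteristic above can be made concrete with a short Python sketch that reads one structured, one semi-structured, and one unstructured sample. All sample data, field names, and variable names here are hypothetical; the point is only that each kind of data needs a different handling approach.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same fields.
csv_text = "order_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, but fields may be nested or missing.
json_text = '{"user": "a01", "tags": ["sale", "new"], "address": {"city": "Paris"}}'
record = json.loads(json_text)
city = record.get("address", {}).get("city")
phone = record.get("phone")  # key is absent, so this is None

# Unstructured: no schema at all; the analysis has to impose structure.
review = "Great phone, great battery. Delivery was slow."
words = re.findall(r"[a-z]+", review.lower())
top_word, top_count = Counter(words).most_common(1)[0]

print(total, city, phone, top_word, top_count)
```

The structured sample can be summed directly, the semi-structured one needs defensive lookups for optional fields, and the free text yields nothing until it is tokenized and counted.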
ADVANTAGES OF BIG DATA IN MANAGEMENT
 Big data offers numerous advantages in management across various
industries. Here are some key advantages:
 Informed Decision-Making: Big data provides access to vast amounts
of information that can help managers make more informed and data-
driven decisions. It enables them to analyze trends, patterns, and
correlations, leading to better strategic planning and resource
allocation.
 Improved Efficiency and Productivity: Analyzing big data can identify
inefficiencies and bottlenecks within processes. This allows managers
to optimize workflows, allocate resources more effectively, and
streamline operations, ultimately increasing overall efficiency and
productivity.
 Enhanced Customer Insights: Big data analytics can help managers
gain deeper insights into customer behavior, preferences, and needs.
This information is crucial for creating personalized marketing
strategies, improving customer service, and developing products that
align with customer demands.
 Risk Management and Fraud Detection: Big data analytics can be
used to identify potential risks and predict potential issues before they
escalate. In sectors such as finance and insurance, it's valuable for
detecting fraudulent activities and minimizing risks.
 Innovation and Product Development: By analyzing big data,
managers can identify gaps in the market and opportunities for
innovation. This helps in developing new products or services that
address customer needs more effectively and stand out from
competitors.
 Supply Chain Optimization: Big data can be used to track and
analyze supply chain processes in real-time. This helps in optimizing
inventory levels, reducing supply chain disruptions, and ensuring
timely delivery of goods.
 Employee Performance and Engagement: Managers can use big
data to evaluate employee performance, identify training needs, and
enhance engagement. This leads to a more motivated and productive
workforce.
 Market and Competitor Analysis: Big data allows managers to monitor
market trends, track competitor activities, and identify emerging market
opportunities. This knowledge is essential for staying competitive and
adapting to changing market dynamics.
 Predictive Analytics: With the help of big data, managers can employ
predictive analytics to forecast future trends and outcomes. This assists in
proactive decision-making and planning for various scenarios.
 Cost Reduction: Analyzing data can help identify areas where costs can be
reduced without compromising quality or efficiency. This could involve
optimizing resource allocation, reducing waste, and streamlining operations.
 Real-time Monitoring: Big data enables real-time monitoring of various
aspects of the business, such as website traffic, social media interactions, and
production processes. This real-time information allows managers to respond
quickly to emerging issues or opportunities.
 Data-Driven Culture: Embracing big data fosters a data-driven culture
within an organization, where decisions are based on evidence rather than
gut feelings. This cultural shift can lead to more transparent, accountable, and
effective management practices.
CHALLENGES AND LIMITATIONS OF BIG DATA
While big data offers immense potential for businesses and research, it
also comes with several challenges and limitations:

 Data Quality and Accuracy: Big data often includes data from various
sources, and ensuring its quality and accuracy can be challenging.
Inaccurate or incomplete data can lead to flawed analyses and decision-
making.

 Data Privacy and Security: With the increasing volume of data,
ensuring data privacy and security becomes paramount. Protecting
sensitive information from unauthorized access, breaches, and cyber
threats is a significant challenge.

 Data Governance: Managing the ownership, integrity, and access
control of large datasets across different departments and
organizations is complex. Establishing effective data governance
policies and practices is crucial.
 Data Storage and Retrieval: Storing massive volumes of data requires
significant infrastructure and resources. Efficiently retrieving specific
data when needed, especially in real-time applications, can be a challenge.

 Data Complexity and Variety: Big data comes in various formats,
including structured, unstructured, and semi-structured data. Managing
this diversity and extracting meaningful insights from different data types
can be complex.

 Data Processing Speed (Velocity): Processing vast amounts of data
quickly is necessary for real-time analytics. Traditional data processing
methods may not be able to handle the speed at which big data is
generated and needs to be processed.

 Scalability: Big data systems need to scale horizontally to handle
increasing volumes of data. Ensuring that the system can scale seamlessly
without compromising performance is a significant challenge.
 Ethical Concerns: Big data analytics can raise ethical questions,
especially concerning privacy, bias in algorithms, and the responsible
use of data. Ensuring that data-driven decisions are ethical and fair is a
growing concern.
 Legal and Regulatory Compliance: Data usage is subject to numerous
regulations and laws, such as GDPR. Ensuring compliance with these
regulations, which vary across regions and industries, is a significant
challenge.
 Interoperability: Integrating and making sense of data from various
sources and formats, including legacy systems, can be complicated.
Ensuring interoperability between different data sources and tools is a
challenge.
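
The Scalability point above (scaling horizontally by adding machines) can be sketched with hash partitioning, one simple way to spread records across nodes. The key format, node counts, and function names below are hypothetical, and real systems often use more refined schemes such as consistent hashing.

```python
import hashlib


def node_for(key, node_count):
    """Map a record key to a node number in the range 0..node_count-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def partition(keys, node_count):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in range(node_count)}
    for key in keys:
        shards[node_for(key, node_count)].append(key)
    return shards


# Hypothetical record keys spread over a 3-node cluster, then a 4-node one.
keys = [f"user-{i}" for i in range(1000)]
three_nodes = partition(keys, 3)
four_nodes = partition(keys, 4)

# Count the keys that land on a different node after the cluster grows.
moved = sum(1 for key in keys if node_for(key, 3) != node_for(key, 4))

print({node: len(items) for node, items in three_nodes.items()})
print({node: len(items) for node, items in four_nodes.items()})
print("keys relocated when going from 3 to 4 nodes:", moved)
```

The relocation count hints at why scaling "seamlessly" is hard: with plain modulo partitioning, adding a node reassigns many existing keys, which is the cost that consistent hashing is designed to reduce.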
CYBER ATTACKS
 A cyber attack refers to a malicious attempt by individuals, groups, or
organizations to compromise computer systems, networks, devices, or
digital information for various purposes, including unauthorized access,
data theft, disruption of services, or other malicious activities.

 Cyber attacks can target individuals, businesses, governments, and
other entities, and they come in various forms and levels of complexity.
Here are some common types of cyber attacks:
 Malware: Malicious software, such as viruses, worms, Trojans, and
ransomware, is designed to infiltrate and infect computer systems to
cause damage, steal data, or extort money from victims.

 Phishing: Phishing attacks involve sending deceptive emails or
messages that appear to be from legitimate sources to trick recipients
into revealing sensitive information, such as passwords, credit card
details, or login credentials.

 Denial of Service (DoS) and Distributed Denial of Service (DDoS):
These attacks overwhelm a target system or network with a flood of
traffic, causing it to become inaccessible to legitimate users, resulting in
disruption of services.
 Man-in-the-Middle (MitM): In a MitM attack, cybercriminals
intercept and potentially modify communications between two
parties without their knowledge. This could involve eavesdropping
on sensitive information or altering messages.
 SQL Injection: This attack targets poorly secured web applications
by injecting malicious SQL queries into input fields, potentially
gaining unauthorized access to databases and sensitive data.
 Zero-Day Exploits: Cyber attackers exploit vulnerabilities in
software or hardware that are not yet known to the vendor or the
public. These exploits can be used to gain unauthorized access or
control over systems.
 Ransomware: Ransomware encrypts a victim's data and demands a
ransom payment in exchange for the decryption key. If the payment
is not made, the victim's data remains inaccessible.
 Credential Stuffing: Attackers use stolen or leaked usernames and
passwords to gain unauthorized access to multiple accounts of
victims who reuse their credentials across different platforms.
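
The SQL injection attack described above can be demonstrated safely against an in-memory SQLite database using Python's standard sqlite3 module. The table, the user names, and the malicious input are hypothetical; the sketch only contrasts building a query by string concatenation with passing the input as a bound parameter.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Input typed into a form field by an attacker.
malicious = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so the quote characters
# change the query's logic and the WHERE clause becomes always true.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and treated purely as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

conn.close()
print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every user, while the parameterized query returns none, because no user is literally named with the attacker's string.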
