Introduction to information and big data security

Uploaded by

joabjoshuajr
Information and Big Data Security
Mr. Joab Mumbere MCA (Amity), BIS (Mak).
[email protected]
+256703729371 (WhatsApp).
Introduction to Big Data
The quantity of data created by humans increases rapidly every year
as new technologies, devices, and communication channels such as
social networking sites are introduced. Big data is a collection of
enormous datasets that cannot be handled with traditional computing
methods.
Big data is no longer a single technique or tool; rather, it has
evolved into a comprehensive subject covering a variety of tools,
techniques, and frameworks.
Data, in this context, means quantities, letters, or symbols on which
a computer performs operations and which can be stored and
transmitted as electrical signals and recorded on magnetic, optical,
or mechanical media.
What is Big Data

Big Data is a massive collection of data that continues to grow
dramatically over time. It is a data set so large and complex that
traditional data management technologies cannot store or process it
effectively. Big data is similar to regular data, except that it is
much larger. Big data analytics is the application of advanced
analytic techniques to very large, heterogeneous data sets, which can
contain structured, semi-structured, and unstructured data, as well as
data from many sources and in sizes ranging from terabytes to
zettabytes.
Big data is also used as a term for the massive amount of structured
and unstructured data that a company encounters on a daily basis.
Small Data vs. Big Data
Volume
Small Data: Small datasets that can be easily handled and processed by
traditional databases and tools.
Big Data: Massive volumes of data that exceed the processing
capabilities of conventional databases. Typically measured in
terabytes, petabytes, or even larger.
Velocity
Small Data: Data is generated at a manageable and steady pace.
Big Data: Data is generated rapidly and continuously, often in
real-time or near-real-time.
Variety
Small Data: Relatively simple and structured data, often stored in
traditional databases.
Big Data: Diverse data types, including structured, semi-structured,
and unstructured data. This can include text, images, videos, social
media posts, etc.
Veracity
Value
Small Data: The focus is often on extracting insights from a limited
dataset.
Big Data: Aims to extract meaningful insights from large and diverse
datasets, often leading to more comprehensive and valuable results.
Analytics Approach
Small Data: Traditional analytics tools and methods are often
sufficient.
Big Data: Requires advanced analytics techniques, including machine
learning and data mining, due to the scale and complexity.
Examples
Small Data: Customer records in a local database, sales transactions
for a small business.
Big Data: Social media posts, sensor data from IoT devices, financial
transactions in a global network.
Infrastructure
Small Data: Can be processed on a single machine or a small cluster.
Big Data: Requires distributed computing frameworks like Hadoop or
Spark to process data across multiple machines.
Evolution of Data/Big Data
The evolution of data and the emergence of big data have been
influenced by technological advancements, changes in business
practices, and the increasing digitization of various aspects of
our lives. Here is an overview of the evolution of data and the
advent of big data:
1. Early Data Processing (Pre-20th Century):
• Data processing was manual and paper-based.
• Limited data sources, primarily handwritten records.
2. Punch Card Era (Late 19th to Early 20th Century):
• Introduction of punch card systems for automated data
processing.
• Used in the 1890 U.S. Census and later adopted by
businesses.
3. Mainframe Computers (1950s-1970s):
• Mainframes enabled faster and more efficient data
processing.
• Rise of structured databases for storing and managing data.
4. Relational Databases (1970s-1990s):
• Development of the relational database model.
• SQL became a standard language for querying databases.
5. Data Warehousing (1980s-1990s):
• Centralized storage of large volumes of data for analytical purposes.
• Decision support systems emerged for extracting insights from data.
6. Internet and Web 1.0 (1990s):
• The internet became widely accessible.
• Websites generated data, but data volumes were still relatively small.
7. Digital Transformation (Late 20th Century):
• Organizations increasingly digitized processes, creating more electronic
data.
• Growth in databases, but not yet at the scale of big data.
8. Web 2.0 and Social Media (2000s):
• Rise of user-generated content on platforms like Facebook, Twitter, and
YouTube.
• Massive increase in data creation and sharing.
9. Mobile Devices and IoT (2010s):
• Proliferation of smartphones and connected devices.
10. Big Data Concept Emerges (Early 21st Century):
• The term "big data" came into use to describe datasets too large for
traditional databases.
• Shift from structured to semi-structured and unstructured data.
11. Advancements in Data Storage and Processing (2010s):
• Development of distributed computing frameworks like Hadoop and Spark.
• Cloud computing services enable scalable storage and processing.
12. Machine Learning and Advanced Analytics (2010s-Present):
• Integration of machine learning and data analytics into big data processing.
• Real-time analytics and predictive modeling become feasible.
13. Current Trends (2020s):
• Continued growth in data volumes, especially with the expansion of edge
computing.
• Increasing focus on data privacy, security, and ethical considerations.
• Adoption of advanced technologies like blockchain for secure data
transactions.
The evolution of data into the era of big data reflects a transformative journey
driven by technological innovation, societal changes, and the increasing
importance of data-driven insights in various domains. As technology continues
to advance, the landscape of data and big data will likely undergo further
changes and developments.
Big Data Characteristics
Volume
Volume refers to the sheer amount of data; the name 'Big Data' itself
points to size. The magnitude of data plays a critical role in
determining its worth, and it also determines whether or not a given
set of data can be classified as Big Data at all. Volume is therefore
a characteristic that must always be considered when dealing with Big
Data.
Example:
In 2016, worldwide mobile traffic was predicted to be 6.2 exabytes
(6.2 billion GB) per month. It was further predicted that by 2020
there would be about 40,000 exabytes of data.
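The unit conversion in the example above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 EB is 10^18 bytes; the function name is illustrative only.

```python
# Decimal (SI) storage units, assumed here for the conversion.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18


def exabytes_to_gigabytes(exabytes: float) -> float:
    """Convert a size in exabytes to gigabytes."""
    return exabytes * BYTES_PER_EB / BYTES_PER_GB


# Figure quoted in the example: 6.2 exabytes of mobile traffic per month.
monthly_gb = exabytes_to_gigabytes(6.2)
print(f"{monthly_gb:,.0f} GB per month")
```

One exabyte is a billion gigabytes, so 6.2 EB corresponds to 6.2 billion GB, which matches the figure in the text.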
Velocity
Velocity refers to the speed at which data is generated and collected.
In Big Data, data arrives at a high rate from machines, networks,
social media, mobile phones, and other sources, producing a large and
continuous influx. Velocity determines the data's potential: how
quickly data must be created and processed in order to satisfy demand.
Data sampling can help in dealing with high velocity. For instance,
Google receives more than 3.5 billion queries every day, and the
number of Facebook users has been reported to grow by around 22% every
year.
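One common way to sample a fast stream without storing all of it is reservoir sampling. The sketch below is an illustration of that general technique (the slides do not name a specific sampling method), and the stream of "query-N" strings is made-up example data.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical high-velocity stream: a generator, never held fully in memory.
events = (f"query-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5)
print(sample)
```

The memory used depends only on k, not on how many events arrive, which is why the approach suits data that comes in faster than it can be stored.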
Variety
Variety refers to the different forms that data can take: structured,
semi-structured, and unstructured.
Structured data is data that has been organized. It usually refers to
data whose length and format have been defined in advance.
Semi-structured data is data that is only partly organized. It does
not follow a traditional, rigid data structure. Log files are a
typical example.
Unstructured data is data that has not been organized. It usually
refers to data that does not fit cleanly into a relational database's
standard row and column structure. Texts, pictures, and videos are
examples of unstructured data, which cannot be stored in the form of
rows and columns.
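The difference between the three kinds of data shows up in how much work is needed before they can be analysed. The following is a minimal sketch using only the Python standard library; the CSV rows, the log line layout, and the review text are invented for illustration.

```python
import csv
import io
import re

# Structured: fixed columns, so a standard CSV reader is enough.
structured = io.StringIO("customer_id,amount\n101,25.50\n102,13.00\n")
rows = list(csv.DictReader(structured))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: a log line has recognisable fields but no fixed schema.
# (The layout "timestamp level event key=value ..." is an assumption.)
log_line = "2024-01-15T10:32:07Z INFO login user=alice ip=10.0.0.7"
timestamp, level, event, *pairs = log_line.split()
fields = dict(p.split("=", 1) for p in pairs)

# Unstructured: free text has no fields at all; it must be tokenised first.
text = "Great phone, but the battery drains far too quickly."
words = re.findall(r"[a-z']+", text.lower())

print(total, fields, words)
```

The structured input can be summed directly, the log line needs a small amount of parsing, and the free text only yields a list of words that still has to be interpreted.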
Benefits of Big Data Processing
The ability to process Big Data brings multiple benefits, such as:
1. Businesses can utilize outside intelligence when making decisions.
2. Access to social data from search engines and sites like Facebook
and Twitter enables organizations to fine-tune their business
strategies.
3. Improved customer service. Traditional customer feedback systems
are being replaced by new systems designed with Big Data technologies.
In these new systems, Big Data and natural language processing
technologies are used to read and evaluate consumer responses.
4. Early identification of risk to products or services, if any.
5. Better operational efficiency.
Big Data technologies can be used for creating a staging area
or landing zone for new data before identifying what data
should be moved to the data warehouse. In addition, such
integration of Big
Importance of Big Data
• Cost Savings
Big data helps provide business intelligence that can reduce costs and
improve the efficiency of operations. Processes like quality assurance
and testing can involve many complications, particularly in industries
like biopharmaceuticals and nanotechnologies.
• Time Reductions
Companies can collect data from a variety of sources using real-time
in-memory analytics. Tools like Hadoop enable businesses to evaluate
data quickly, allowing them to make swift decisions based on their
findings.
• Understand the market conditions
Businesses can benefit from big data analysis by gaining a better
grasp of market conditions. Analysing customer purchase behaviour, for
example, enables businesses to discover the most popular items and
develop them appropriately. This allows businesses to stay ahead of
the competition.
• Social Media Listening
Companies can perform sentiment analysis using Big Data tools. These
enable them to get feedback about their company, that is, who is
saying what about the company. Companies can also use Big Data tools
to improve their online presence.
• Using Big Data Analytics to Boost Customer Acquisition and
Retention
Customers are a crucial asset that every company relies on. Without a
strong customer base, no company can be successful. However, even with
a strong customer base, businesses cannot ignore market rivalry. It
will be difficult for businesses to succeed if they do not understand
what their customers want; the result is a loss of customers, which
has a negative impact on business growth. Businesses can use big data
analytics to detect customer-related trends and patterns. Customer
behaviour analysis is the key to a successful business.
• Using Big Data Analytics to Solve Advertisers' Problems and Offer
Marketing Insights
Big data analytics shapes all company activities. It allows businesses
to meet customer expectations, aids in modifying a company's product
range, and helps ensure that marketing initiatives are effective.
• Big Data Analytics as a Driver of Innovation and Product
Development
Companies can use big data to innovate and revamp their products.
Sources of Big Data
Big data originates from various sources, and its diversity in
terms of volume, variety, and velocity makes it a valuable
resource for gaining insights and making informed decisions.
Here are common sources of big data:
• Social Media:
Platforms like Facebook, Twitter, Instagram, and LinkedIn
generate vast amounts of data through user interactions, posts,
comments, and multimedia content.
• Internet of Things (IoT):
Connected devices and sensors in IoT applications, such as
smart home devices, wearables, industrial sensors, and smart
cities, generate real-time data.
• Transactional Data:
Data generated through financial transactions, e-commerce
activities, and online banking provides insights into customer
behavior, preferences, and purchasing patterns.
• Mobile Devices:
Data from mobile devices, including location data, app usage,
and user interactions, contributes to big data analytics.
• Websites and Web Logs:
Web servers, online platforms, and e-commerce websites
generate log files, clickstream data, and user interactions,
offering valuable information about user behavior.
• Machine and Sensor Data:
Machinery, industrial equipment, and scientific instruments
produce large volumes of sensor data, which is crucial for
monitoring and optimizing processes.
• Government and Public Data:
Public sector information, including census data, government
reports, and open data initiatives, contributes to big data
analytics for public policy and planning.
• Healthcare Data:
Electronic Health Records (EHRs), medical imaging data, patient
monitoring systems, and wearable health devices generate
substantial data for healthcare analytics.
• Genomic Data:
DNA sequencing and genomic research generate massive
datasets, contributing to advancements in personalized
medicine and genetic studies.
• Weather and Environmental Data:
Meteorological stations, satellites, and environmental sensors
provide large datasets for weather forecasting, climate research,
and environmental monitoring.
• Search Engines:
Search queries and user behavior on search engines like Google
generate extensive data, which is valuable for understanding user
intent and improving search algorithms.
• Log Files and IT Infrastructure Data:
Server logs, network logs, and IT infrastructure data provide
insights into system performance, security incidents, and user
activities.
• Video and Image Data:
Surveillance cameras, video-sharing platforms, and image
repositories contribute to big data, especially with the growth of
multimedia content.
• Text and Documents:
Emails, customer support chats, legal documents, and social
media text contribute to textual data for natural language
processing and sentiment analysis.
Formats of Data
Data comes in various formats, and understanding these
formats is crucial for effective data management and analysis.
The three primary formats of data are structured, semi-
structured, and unstructured.
Structured Data:
Description: Well-organized and highly formatted data.
Characteristics:
• Organized into rows and columns.
• Conforms to a predefined schema or data model.
• Easily queryable using traditional databases.
Examples:
• Relational databases (MySQL, PostgreSQL).
• Spreadsheets (Excel).
• CSV files.
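Because structured data follows a predefined schema, it can be queried directly with SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the sales table and its rows are hypothetical example data.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")

# The schema is declared up front: this is what makes the data "structured".
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, amount) VALUES (?, ?)",
    [("tea", 4.0), ("coffee", 6.5), ("tea", 3.5)],
)

# Rows and columns can be aggregated with a plain SQL query.
totals = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
conn.close()

print(totals)
```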
Unstructured Data:
Description: Data with no predefined data model or
structure.
Characteristics:
• Not organized in a predefined manner.
• Lacks a clear schema, making it challenging to query using
traditional methods.
• Varied and often human-generated content.
Examples:
• Text documents (Word, PDF).
• Images (JPEG, PNG).
• Audio and video files.
• Social media posts.
• Emails and other free-form text.
Textual Data:
Description: Data primarily consisting of text.
Characteristics:
• Unstructured nature with sentences, paragraphs, and characters.
• Suited for natural language processing (NLP) and text analytics.
Examples:
• Books, articles, and blog posts.
• Emails and chat transcripts.
Temporal Data:
Description: Data associated with time.
Characteristics:
• Timestamps or time-related information.
• Essential for time series analysis.
Examples:
• Stock prices over time.
• Sensor data with timestamps.
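A simple form of time series analysis on timestamped data is a moving average, which smooths out short-term fluctuations. This is a minimal sketch; the sensor readings and the window size are made-up values chosen for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_average(readings, window):
    """Return (timestamp, mean of the last `window` values) for each reading."""
    recent = deque(maxlen=window)  # old values drop out automatically
    smoothed = []
    for timestamp, value in readings:
        recent.append(value)
        smoothed.append((timestamp, sum(recent) / len(recent)))
    return smoothed


# Hypothetical sensor readings, one per minute.
start = datetime(2024, 1, 1, 9, 0)
values = [10.0, 12.0, 11.0, 15.0]
readings = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

smoothed = moving_average(readings, window=2)
for timestamp, average in smoothed:
    print(timestamp.isoformat(), average)
```

The timestamps are what make this possible: the same values without an ordering in time could not be smoothed or analysed for trends.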
Geospatial Data:
Description: Data related to geographical locations.
Characteristics:
• Contains coordinates or spatial information.
• Used in geographic information systems (GIS).
Examples:
• GPS data.
• Maps and satellite imagery.
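A typical operation on coordinate data such as GPS points is computing the distance between two locations. The sketch below uses the haversine formula, which treats the Earth as a sphere; the mean radius of 6371 km is an approximation, and the test points on the equator are chosen only to make the result easy to check.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(distance, 2), "km")
```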
Multimedia Data:
Description: Data consisting of a combination of text,
images, audio, and video.
Characteristics:
• Requires specialized processing for each data type.
• Used in content-rich applications.
Examples:
• YouTube videos, image galleries, podcasts.
Binary Data:
Description: Data represented in binary code.
Characteristics:
• Composed of 0s and 1s.
• Requires specific decoding methods for
interpretation.
Examples:
• Executable files.
• Binary-encoded images.
Understanding the different formats of data is
essential for designing appropriate storage and
processing solutions. In the context of big data,
the ability to handle and analyze diverse data
formats is crucial for extracting meaningful
insights.
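Binary data can only be interpreted when the reader knows how the bytes were laid out. The sketch below shows this with Python's struct module; the record layout (sensor id, reading, status flag) is a hypothetical format invented for the example.

```python
import struct

# Hypothetical record layout, little-endian with no padding:
#   I = unsigned 32-bit sensor id, f = 32-bit float reading, B = 8-bit status
RECORD_FORMAT = "<IfB"

# Encode: three values become nine raw bytes.
raw = struct.pack(RECORD_FORMAT, 42, 21.5, 1)

# Decode: without RECORD_FORMAT the bytes are meaningless.
sensor_id, reading, status = struct.unpack(RECORD_FORMAT, raw)

print(len(raw), raw.hex())
print(sensor_id, reading, status)
```

The same nine bytes decoded with a different format string would give different values, which is why binary data requires a specific decoding method.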
Applications of Big Data
All of the data must be recorded and processed, which takes a
lot of expertise, resources, and time. Data may be creatively
and meaningfully used to provide business benefits. There are
three sorts of business applications, each with varying
degrees of revolutionary potential.
Monitoring and tracking applications
These are the first and most fundamental Big Data applications. In
practically all industries, they help increase business efficiency.
The following are a few examples of specialised applications:
• Public health monitoring
The US government is encouraging all healthcare stakeholders to
establish a national platform for interoperability and data sharing
standards. This would enable secondary use of health data, which would
advance Big Data analytics and personalized, holistic precision
medicine. This would be a broad-based platform like Google Flu Trends.
• Consumer sentiment monitoring
Social media has become more powerful than advertising. Many companies
have moved the bulk of their advertising budgets from traditional
media into social media. They have set up Big Data listening
platforms, where social media data streams (including tweets, Facebook
posts, and blog posts) are filtered and analysed for certain keywords
or sentiments, by demographics and regions. Actionable information
from this analysis is delivered to marketing professionals for
appropriate action, especially when the product is new to the market.
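The filtering step of such a listening platform can be sketched with a simple keyword lexicon. This is a deliberately naive illustration, not how any particular platform works: the word lists, the posts, and the region labels are all invented, and real systems use far richer language models.

```python
from collections import Counter

# Hypothetical keyword lexicons; a real system would use a trained model.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "broken", "hate"}

# Made-up social media posts tagged with a region.
posts = [
    {"region": "EU", "text": "Love the new phone, great camera"},
    {"region": "EU", "text": "Battery is slow to charge"},
    {"region": "US", "text": "Screen arrived broken, hate it"},
    {"region": "US", "text": "Ordered lunch today"},
]


def score(text):
    """Count positive keywords minus negative keywords in a post."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)


def label(text):
    """Map a keyword score to a coarse sentiment label."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


# Aggregate sentiment by region, as a marketing team might want to see it.
by_region = Counter((p["region"], label(p["text"])) for p in posts)
print(dict(by_region))
```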
• Asset tracking
The US Department of Defense is encouraging the industry to devise a
tiny RFID chip that could prevent the counterfeiting of electronic
parts that end up in avionics or in circuit boards for other devices.
Airplanes are among the heaviest users of sensors, which track every
aspect of the performance of every part of the plane. The data can be
displayed on the dashboard as well as stored for later detailed
analysis. Working with communicating devices, these sensors can
produce a torrent of data.
Theft by shoppers and employees is a major source of lost revenue for
retailers. All valuable items in the store can be assigned RFID tags,
and the gates of the store can be equipped with RF readers. This can
help secure the products and reduce leakage (theft) from the store.
• Supply chain monitoring
All containers on ships communicate their status and location using
RFID tags. Retailers and their suppliers can thus gain real-time
visibility into inventory throughout the global supply chain.
Retailers can know exactly where the items are in the warehouse, and
so can bring them into the store at the right time. This is
particularly relevant for seasonal items that must be sold on time, or
else they will be sold at a discount. With item-level RFID tags,
retailers also gain full visibility of each item and can serve their
customers better.
Flexible Auto Insurance
An auto insurance company can use the GPS data from cars to
calculate the risk of accidents based on travel patterns. The
automobile companies can use the car sensor data to track the
performance of a car. Safer drivers can be rewarded and the
errant drivers can be penalized.
Tools used in BIG DATA
Apache Hadoop
The Apache Hadoop software library is a big data framework. It enables
massive data sets to be processed across clusters of computers in a
distributed manner. It is one of the most powerful big data
technologies, with the ability to scale from a single server to
thousands of computers.
Features
• Improved authentication when using an HTTP proxy server.
• A specification for the Hadoop Compatible Filesystem effort.
• Support for POSIX-style filesystem extended attributes.
• A robust ecosystem of big data technologies and tools that is well
suited to the analytical needs of developers.
• Flexibility in data processing.
• Faster data processing.
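Hadoop's best-known processing model is MapReduce, in which work is split into a map step, a grouping (shuffle) step, and a reduce step. The sketch below imitates that model in plain Python on a single machine; it does not use the Hadoop API, and the function names and input chunks are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Each chunk stands in for a block of data stored on a different node.
chunks = ["big data big insights", "data moves fast", "big clusters"]

mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, a framework can run these steps on many machines at once, which is the idea behind processing data across a cluster.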

HPCC
HPCC is a big data tool developed by LexisNexis Risk Solutions. It
delivers a single platform, a single architecture, and a single
programming language for data processing.
Features
• It is a highly efficient big data tool that accomplishes big data
tasks with far less code.
• It offers high redundancy and availability.
• It can be used for complex data processing on a Thor cluster.
• A graphical IDE simplifies development, testing, and debugging.
• It automatically optimizes code for parallel processing.
• It provides enhanced scalability and performance.
• ECL code compiles into optimized C++, and it can also be extended
using C++ libraries.
Apache Storm
Storm is a free, open-source big data computation system. It offers a
distributed, fault-tolerant processing system with real-time
computation capabilities.
Features
• It has been benchmarked as processing one million 100-byte messages
per second per node.
• It uses parallel calculations that run across a cluster of machines.
• It restarts automatically if a node dies; the worker is restarted on
another node.
• Storm guarantees that each unit of data will be processed at least
once or exactly once.
• Once deployed, Storm is among the easier tools to use for big data
analysis.
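The "at least once" guarantee can be illustrated with a small replay loop: a message that fails is put back in the queue and tried again. This is a conceptual sketch only and does not use Storm's API; the function names, the retry limit, and the simulated failure are assumptions made for the example.

```python
from collections import deque


def process_at_least_once(messages, handler, max_attempts=3):
    """Run handler on every message, replaying any message whose handler fails."""
    pending = deque((m, 1) for m in messages)
    done, failed = [], []
    while pending:
        message, attempt = pending.popleft()
        try:
            handler(message)
            done.append(message)  # success acts as the acknowledgement
        except Exception:
            if attempt < max_attempts:
                pending.append((message, attempt + 1))  # replay later
            else:
                failed.append(message)  # give up after max_attempts
    return done, failed


seen = []


def flaky_handler(message):
    """Hypothetical handler that fails the first time it sees message 'b'."""
    seen.append(message)
    if message == "b" and seen.count("b") == 1:
        raise RuntimeError("simulated worker failure")


done, failed = process_at_least_once(["a", "b", "c"], flaky_handler)
print(done, failed, seen)
```

Note that the handler ran twice for the replayed message. With at-least-once delivery, processing steps should therefore be safe to repeat; exactly-once behaviour needs additional bookkeeping that this sketch does not attempt.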

Qubole
Qubole Data is an autonomous big data management platform. It is a big
data tool that is self-managed and self-optimizing, and it allows the
data team to focus on business outcomes.
Features
• Single platform for every use case.
• It is open-source big data software with engines optimized for the
cloud.
• Comprehensive security, governance, and compliance.
• Provides actionable alerts, insights, and recommendations to
optimize reliability, performance, and costs.
• Automatically enacts policies to avoid performing repetiti
Questions!
Any questions?
