
Introduction To Big Data: Types of Digital Data, History of Big Data Innovation

The document provides an introduction to big data, including types of digital data, the history of big data innovation, big data architecture and characteristics, and the 5 Vs of big data. It discusses unstructured, semi-structured, and structured data. It describes how big data has evolved with advances in technology and the growing digitization of information. The architecture of big data systems includes data sources, storage, processing, analytics, and orchestration. Characteristics include volume, variety, velocity, veracity, and value.

Uploaded by

Uma Tomar
Copyright: © All Rights Reserved

Unit: I

Topic: Introduction to Big Data: Types of digital data, history of Big Data innovation,
introduction to Big Data platform, drivers for Big Data, Big Data architecture and
characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance
and applications, Big Data features – security, compliance, auditing and protection, Big
Data privacy and ethics, Big Data Analytics, challenges of conventional systems,
intelligent data analysis, nature of data, analytic processes and tools, analysis vs reporting,
modern data analytic tools.

Types of digital data

DIGITAL DATA
Digital data is information stored on a computer system as a sequence of 0s and 1s, that is, in
binary form. Unlike analog data, which varies continuously, digital data moves from one discrete value to the
next in steps.
Example: Whenever we send an email, read a social media post, or take pictures with our digital camera, we
are working with digital data.
Digital data can be classified into three forms:
a. Unstructured Data: Data that does not conform to a data model, or is not in a form that a computer
program can use easily, is categorized as unstructured data. About 80-90% of an organization's data is
in this format.
Example: Memos, chat room transcripts, PowerPoint presentations, images, videos, letters, research reports,
white papers, the body of an email, etc.
b. Semi-Structured Data: Data that does not conform to a formal data model but still has some structure is
categorized as semi-structured data. It is still not in a form that a computer program can use as easily as
structured data.
Example: Emails, XML, markup languages like HTML, etc. Metadata for this data is available but is not
sufficient.
c. Structured Data: Data that is in an organized form (i.e., in rows and columns) and can be used easily
by a computer program is categorized as structured data. Relationships exist between entities of data,
such as classes and their objects.
Example: Data stored in databases.
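
The three forms described above can be illustrated with a minimal Python sketch. The sample values (names, cities, the email address) are made up for illustration and do not come from any real dataset. The sketch only shows how differently a program has to approach each form.

```python
import csv
import io
import json

# Structured: fixed rows and columns, so every field can be addressed by name.
structured_csv = "id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but records need not share one schema.
semi_structured_json = (
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "ravi@example.com"}]'
)
records = json.loads(semi_structured_json)

# Unstructured: free text with no data model; a program must search or parse it.
unstructured_text = "Hi team, Ravi from Pune called about his order yesterday."

print(rows[1]["city"])              # direct lookup by column name
print(sorted(records[1].keys()))    # keys differ from record to record
print("Pune" in unstructured_text)  # only a text search is possible
```

The structured lookup needs no knowledge of the content, while the unstructured case can only be searched as text.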

HISTORY OF BIG DATA


The 21st century is characterized by rapid advancement in the field of information technology.
IT has become an integral part of daily life and of industries such as health, education,
entertainment, science and technology, genetics, and business operations. These industries generate large
amounts of data, which can be called Big Data.
Big Data consists of large datasets that cannot be managed efficiently by common database management
systems.
These datasets range from terabytes to exabytes.
Mobile phones, credit cards, Radio Frequency Identification (RFID) devices, and social networking platforms
create huge amounts of data that may reside unutilized on unknown servers for many years.
With the evolution of Big Data technologies, this data can be accessed and analyzed on a regular basis to
generate useful information.
“Big Data” is a relative term that depends on who is discussing it. For example, Big Data to Amazon or Google
is very different from Big Data to a medium-sized insurance organization.
Introduction to Big Data platform
A big data platform is an IT solution that combines the features and capabilities of several big data
applications and utilities in a single solution, which is then used to manage and analyze Big Data.
It focuses on providing its users with efficient analytics tools for massive datasets.
Users of such platforms can build custom applications for their own use cases, such as calculating
customer loyalty (an e-commerce use case).
Goal: The main goals of a Big Data platform are scalability, availability, performance, and security.
Example: Some commonly used Big Data platforms are :

• Hadoop Delta Lake Migration Platform
• Data Catalog Platform
• Data Ingestion Platform
• IoT Analytics Platform

Drivers for Big Data


Big Data has quickly risen to become one of the most sought-after topics in the industry.
The main business drivers behind the rising demand for Big Data Analytics are :
1. The digitization of society
2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of the Internet of Things (IoT)
Example: A number of companies that have Big Data at the core of their strategy, such as
Apple, Amazon, Facebook and Netflix, became very successful at the beginning of the 21st century.

Big Data Architecture :


Big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or
complex for traditional database systems.
Big data architectures include the following components:

Data sources: All big data solutions start with one or more data sources.
Examples:
o Application data stores, such as relational databases.
o Static files produced by applications, such as web server log files.
o Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is stored in a distributed file store that can hold high
volumes of large files in various formats (also called a data lake).
Examples: Azure Data Lake Store or blob containers in Azure Storage.
Batch processing: Because the data sets are so large, a big data solution must often process data files
using long-running batch jobs to filter, aggregate, and prepare the data for analysis.
Real-time message ingestion: If a solution includes real-time sources, the architecture must include a way to
capture and store real-time messages for stream processing.
Stream processing: After capturing real-time messages, the solution must process them by filtering,
aggregating, and preparing the data for analysis. The processed stream data is then written to an output sink.
We can use open-source Apache streaming technologies like Storm and Spark Streaming for this.
Analytical data store: Many big data solutions prepare data for analysis and then serve the processed data
in a structured format that can be queried using analytical tools. Example: Azure Synapse Analytics provides a
managed service for large-scale, cloud-based data warehousing.
Analysis and reporting: The goal of most big data solutions is to provide insights into the data through
analysis and reporting. To empower users to analyze the data, the architecture may include a data modelling
layer. Analysis and reporting can also take the form of interactive data exploration by data scientists or data
analysts.
Orchestration: Most big data solutions consist of repeated data processing operations that transform source
data, move data between multiple sources and sinks, load the processed data into an analytical data store, or
push the results straight to a report. To automate these workflows, we can use an orchestration technology
such as Azure Data Factory.
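
The components listed above can be sketched end to end in a few lines of plain Python. This is only a toy illustration: the log records, the function names (`batch_job`, `stream_job`, `orchestrate`) and the 60-second window are assumptions made for the example, and a real system would use services such as those named above instead of in-memory lists.

```python
from collections import defaultdict

# Hypothetical web-server log records standing in for a data source.
LOG_RECORDS = [
    {"ts": 0, "path": "/home", "status": 200},
    {"ts": 1, "path": "/cart", "status": 500},
    {"ts": 61, "path": "/home", "status": 200},
    {"ts": 62, "path": "/home", "status": 404},
]


def batch_job(records):
    """Batch processing: keep successful requests, then aggregate hits per path."""
    hits = defaultdict(int)
    for rec in records:
        if rec["status"] == 200:
            hits[rec["path"]] += 1
    return dict(hits)


def stream_job(messages, window_seconds=60):
    """Stream processing: count messages per fixed time window as they arrive."""
    counts = defaultdict(int)
    for msg in messages:
        counts[msg["ts"] // window_seconds] += 1
    return dict(counts)


def orchestrate(records):
    """Orchestration: run each step in order and load results into one store."""
    analytical_store = {}
    analytical_store["hits_per_path"] = batch_job(records)
    analytical_store["messages_per_window"] = stream_job(iter(records))
    return analytical_store


store = orchestrate(LOG_RECORDS)
print(store)
```

The dictionary returned by `orchestrate` plays the role of the analytical data store that reporting tools would query.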
Big Data Characteristics :
Big data can be described by the following characteristics:

• Volume
• Variety
• Velocity

5 Vs of Big Data, Big Data technology components


5 Vs of Big Data :

1. Volume :
Big Data involves vast volumes of data generated daily from many sources, such as business processes,
machines, social media platforms, networks, human interactions, and so on.
Example: Facebook generates approximately a billion messages, records the “Like” button being clicked
4.5 billion times, and receives more than 350 million new posts each day.
Big data technologies are designed to handle such large amounts of data.
2. Variety :
Big Data can be structured, unstructured, or semi-structured, and is collected from many different sources.
In the past, data was collected only from databases and spreadsheets, but today it arrives in an array
of forms, i.e., PDFs, emails, audio, social media posts, photos, videos, etc.
3. Velocity :
Velocity refers to the speed at which data is generated, often in real time.
Velocity plays an important role relative to the other characteristics.
It covers the speed of incoming data sets, the rate of change, and bursts of activity.
A primary aim of Big Data systems is to deliver data rapidly when it is demanded.
Example of data that is generated with high velocity: Twitter messages or Facebook posts.
4. Veracity :
Veracity refers to the quality and trustworthiness of the data that is being analyzed.
It also covers the ability to handle and manage such data efficiently.
Example: Facebook posts with hashtags.
5. Value :
Value is an essential characteristic of big data.
What matters is not simply the data that we process or store, but the valuable and reliable data that we
store, process and analyse.
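
As a rough illustration, four of the five Vs can be approximated on a toy event list. The events and the proxies below (record count for volume, distinct formats for variety, events per second for velocity, share of non-missing payloads for veracity) are simplifying assumptions for this sketch, not standard definitions. Value is left out because it depends on the business question being asked.

```python
# Hypothetical events: (timestamp in seconds, format, payload or None if missing).
events = [
    (0, "json", '{"user": 1}'),
    (1, "text", "great product"),
    (2, "json", None),
    (4, "image", "photo.jpg"),
]

volume = len(events)                           # how much data there is
variety = len({fmt for _, fmt, _ in events})   # how many distinct forms it takes
duration = events[-1][0] - events[0][0]        # seconds between first and last event
velocity = volume / duration                   # events per second
veracity = sum(p is not None for _, _, p in events) / volume  # share of usable records

print(volume, variety, velocity, veracity)
```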

Big Data Technology Components :

1. Ingestion :
The ingestion layer is the very first step of pulling in raw data.
The data comes from internal sources, relational databases, non-relational databases, social media, emails,
phone calls, etc.
There are two kinds of ingestion :
Batch, in which large groups of data are gathered and delivered together.
Streaming, which is a continuous flow of data. This is necessary for real-time data analytics.
2. Storage :
Storage is where the converted data is stored in a data lake or warehouse and eventually processed.
The data lake/warehouse is the most essential component of a big data ecosystem.
It needs to contain only thorough, relevant data to make insights as valuable as possible.
It must be efficient with as little redundancy as possible to allow for quicker processing.
3. Analysis :
In the analysis layer, data gets passed through several tools, shaping it into actionable insights.
There are four types of analytics on big data :

• Diagnostic: Explains why a problem is happening.
• Descriptive: Describes the current state of a business through historical data.
• Predictive: Projects future results based on historical data.
• Prescriptive: Takes predictive analytics a step further by projecting best future efforts.
4. Consumption :
The final big data component is presenting the information in a format digestible to the end user.
This can be in the form of tables, advanced visualizations and even single numbers if requested.
The most important thing in this layer is making sure the intent and meaning of the output is understandable.
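
The difference between the two kinds of ingestion described under the ingestion component can be sketched as follows. The function names `batch_ingest` and `stream_ingest` and the sensor readings are hypothetical and exist only for this example.

```python
from typing import Iterable, Iterator, List


def batch_ingest(source: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Gather records into groups and deliver each group together."""
    batch: List[dict] = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # deliver the final, partially filled group
        yield batch


def stream_ingest(source: Iterable[dict]) -> Iterator[dict]:
    """Deliver each record as soon as it arrives."""
    for record in source:
        yield record


readings = [{"sensor": "s1", "value": v} for v in (20, 21, 23, 22, 25)]

batches = list(batch_ingest(readings, batch_size=2))
stream = list(stream_ingest(readings))
print([len(b) for b in batches])  # group sizes
print(len(stream))                # one delivery per record
```

Batch ingestion trades latency for fewer, larger deliveries, while streaming hands over every record immediately, which is why it suits real-time analytics.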

Big Data importance and applications

Big Data Importance :


The importance of Big Data doesn’t revolve around the amount of data a company has, but lies in how
the company utilizes the gathered data.
Every company uses its collected data in its own way. The more effectively a company uses its data, the more
rapidly it grows.

By analysing big data pools effectively, companies can achieve the following :
Cost Savings :
o Some Big Data tools like Hadoop can bring cost advantages to a business when large amounts of data are
to be stored.
o These tools help in identifying more efficient ways of doing business.
Time Reductions :
o The high speed of tools like Hadoop and in-memory analytics makes it easy to identify new sources of data,
which helps businesses analyze data immediately.
o This helps us to make quick decisions based on the learnings.
Understand the market conditions :
o By analyzing big data we can get a better understanding of current market conditions.
o For example: By analyzing customers’ purchasing behaviours, a company can find out the products that are
sold the most and produce products according to this trend. By this, it can get ahead of its competitors.
Control online reputation :
o Big data tools can do sentiment analysis.
o Therefore, you can get feedback about who is saying what about your company.
o If you want to monitor and improve the online presence of your business, then big data tools can help in all
this.
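
To give a feel for what sentiment analysis does, here is a deliberately tiny word-counting sketch. The word lists and the feedback strings are invented for the example, and real big data tools rely on far larger lexicons or trained models rather than this simple count.

```python
# Hypothetical, tiny word lists; real tools use much larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "poor", "broken"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words - number of negative words) in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


feedback = [
    "Great service, fast delivery!",
    "Poor packaging and slow support.",
    "The order arrived.",
]
scores = [sentiment_score(t) for t in feedback]
print(scores)
```

A positive score suggests favourable feedback, a negative score suggests a complaint, and zero means the text contained no words from either list.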

Using Big Data Analytics to Boost Customer Acquisition(purchase) and Retention :


o The customer is the most important asset any business depends on. 
o No single business can claim success without first having to establish a solid customer base.
o If a business is slow to learn what customers are looking for, then it is very likely to deliver poor quality
products.
o The use of big data allows businesses to observe various customer-related patterns and trends.
Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing Insights :
o Big data analytics can help change all business operations.
o This includes the ability to match customer expectations and to change the
company’s product line.
o It also helps ensure that marketing campaigns are powerful.
Big Data Applications :
In today’s world, big data has several applications, some of which are listed below :
Tracking Customer Spending Habit, Shopping Behavior :
In big retail stores, the management team has to keep data on customers’ spending habits, shopping
behaviour, most liked products, and which products are searched for or sold the most. Based on that data, the
production/collection rate of each product is fixed.
Recommendation :
By tracking customer spending habits and shopping behaviour, big retail stores provide recommendations to
their customers.
Smart Traffic System :
Data about traffic conditions on different roads is collected through cameras and through GPS devices placed
in vehicles.
All such data is analyzed, and jam-free or less congested, less time-consuming routes are recommended.
One more benefit is that fuel consumption can be reduced.
Secure Air Traffic System :
Sensors are present at various places on an aircraft.
These sensors capture data like the speed of the flight, moisture, temperature, and other environmental
conditions.
Based on analysis of this data, environmental parameters within the aircraft are set up and varied.
By analyzing the aircraft’s machine-generated data, it can be estimated how long a machine can operate
flawlessly and when it should be replaced or repaired.
Auto Driving Car :
Cameras and sensors are placed at various spots on the car to gather data like the size of surrounding
cars, obstacles, the distance from them, etc.
This data is analyzed, and various calculations are carried out.
These calculations help the car take action automatically.
Virtual Personal Assistant Tool :
Big data analysis helps virtual personal assistant tools like Siri, Cortana and Google Assistant to answer
the various questions asked by users.
Such a tool tracks the location of the user, their local time, the season, other data related to the question asked, etc.
Analyzing all such data provides an answer.
Example: Suppose a user asks “Do I need to take an umbrella?” The tool collects data like the location of the
user and the season and weather conditions at that location, then analyzes this data to conclude whether there
is a chance of rain, and provides the answer.
IoT :
Manufacturing companies install IoT sensors into machines to collect operational data.
By analyzing such data, it can be predicted how long a machine will work without any problem and when it
will require repair.
Thus, the cost of replacing the whole machine can be saved.
Education Sector :
Organizations that conduct online educational courses utilize big data to find candidates interested in
those courses.
If someone searches for a YouTube tutorial video on a subject, then an online or offline course provider
for that subject sends that person an online ad about its course.
Media and Entertainment Sector :
Media and entertainment service providers like Netflix,
Amazon Prime and Spotify analyze data collected from their users.
Data such as what type of video or music users watch or listen to most
and how long users spend on the site is collected and analyzed to set
the next business strategy.
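
The recommendation application described above can be sketched with a simple "bought together" count. The baskets and the `recommend` helper are hypothetical, and production recommenders use much richer models. The sketch assumes the requested product appears in at least one basket with another item.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per customer visit.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

# Count how often each pair of products is bought together.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1


def recommend(product: str) -> str:
    """Return the product most often bought together with `product`."""
    partners = Counter()
    for (a, b), n in pair_counts.items():
        if a == product:
            partners[b] += n
        elif b == product:
            partners[a] += n
    return partners.most_common(1)[0][0]


print(recommend("bread"))
```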

Big Data features – security, compliance, auditing and protection

BIG DATA SECURITY :

Big data security is the collective term for all the measures and tools used to guard both the data and analytics
processes from attacks, theft, or other malicious activities that could harm or negatively affect them.

For companies that operate on the cloud, big data security challenges are multi-faceted.

When customers give their personal information to companies, they trust them with personal data which can
be used against them if it falls into the wrong hands.

BIG DATA COMPLIANCE :

Data compliance is the practice of ensuring that sensitive data is organized and managed in such a way as to
enable organizations to meet enterprise business rules along with legal and governmental regulations.

Organizations that don’t implement these regulations can be fined up to tens of millions of dollars and even
receive a 20-year penalty.

BIG DATA AUDITING :

Auditors can use big data to expand the scope of their projects and draw comparisons over larger populations
of data.

Big data also helps financial auditors to streamline the reporting process and detect fraud.

These professionals can identify business risks in time and conduct more relevant and accurate audits.

BIG DATA PROTECTION :

Like big data security, data protection relies on measures and tools that guard both the data and analytics
processes from attacks, theft, or other malicious activities that could harm or negatively affect them.

When customers give their personal information to companies, they trust them with personal
data which can be used against them if it falls into the wrong hands.

That’s why data privacy is there to protect not only those customers but also companies and their employees
from security breaches.

Data protection is also important because organizations that don’t implement these regulations can be
fined up to tens of millions of dollars and even receive a 20-year penalty.

Big Data privacy and ethics

Most data is collected through surveys, interviews, or observation.


When customers give their personal information to companies, they trust them with personal
data which can be used against them if it falls into the wrong hands.
That’s why data privacy is there to protect those customers but also companies and their employees
from security breaches.

One of the main reasons why companies comply with data privacy regulations is to avoid fines.

Organizations that don’t implement these regulations can be fined up to tens of millions of
dollars and even receive a 20-year penalty.

Reasons why we need to take data privacy seriously are :

Data breaches could hurt your business.

Protecting your customers’ privacy

Maintaining and improving brand value

It gives you a competitive advantage

It supports the code of ethics

Big Data Analytics:

Big data analytics is a complex process of examining big data to uncover information such as hidden
patterns, correlations, market trends and customer preferences.
This can help organizations make informed business decisions.
Data Analytics technologies and techniques give organizations a way to analyze data sets and gather new
information.
Big Data Analytics enables enterprises to analyze their data in full context quickly, and some analytics tools
also offer real-time analysis.
Importance of Big Data Analytics :
Organizations use big data analytics systems and software to make data-driven decisions that can improve
business-related outcomes.
The benefits include more effective marketing, new revenue opportunities, customer personalization and
improved operational efficiency.
With an effective strategy, these benefits can provide competitive advantages over
rivals.
Big Data Analytics tools also help businesses save time and money and aid in gaining insights to inform data-
driven decisions.
Big Data Analytics enables enterprises to narrow their Big Data to the most relevant information and analyze it
to inform critical business decisions. 

Challenges of conventional systems

• Big data is the storage and analysis of large data sets.
• These are complex data sets that can be either structured or unstructured.
• They are so large that it is not possible to work on them with traditional analytical tools.
• One of the major challenges of conventional systems was the uncertainty of the Data Management
Landscape.
• Big data is continuously expanding, and new companies and technologies are being
developed every day.
• A big challenge for companies is to find out which technology works best for them without
introducing new risks and problems.
• These days, organizations are realising the value they get out of big data analytics and hence they
are deploying big data tools and processes to bring more efficiency to their work environment.

Intelligent data analysis, nature of data


Intelligent Data Analysis (IDA) is one of the most important approaches in the field of data mining.
Based on the basic principles of IDA and the features of the datasets that IDA handles, the development of
IDA can be briefly summarized from three aspects :

• Algorithm principle
• The scale
• Type of the dataset

Intelligent Data Analysis (IDA) is also one of the major issues in artificial intelligence and information.
Intelligent data analysis discloses hidden facts that were not previously known and provides potentially
important information or facts from large quantities of data.
It also helps in making decisions.
Based on machine learning, artificial intelligence, pattern recognition, and records and visualization
technology, IDA helps to obtain useful information, necessary data and interesting models from the large
amount of data available online in order to make the right choices.
IDA includes three stages:
(1) Preparation of data
(2) Data mining
(3) Data validation and Explanation
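
The three stages listed above can be walked through on a toy dataset. Everything in this sketch is an assumption made for illustration: the customer records, the choice of "average spend per segment" as the mined pattern, and the nearest-average rule used to validate it. It shows the shape of the process rather than any particular IDA algorithm.

```python
from statistics import mean

# Hypothetical records: (customer segment, monthly spend); None marks a missing value.
raw = [
    ("student", 20), ("student", None), ("student", 30),
    ("professional", 80), ("professional", 100), ("professional", 90),
    ("student", 25), ("professional", 85),
]

# (1) Preparation of data: drop incomplete records.
clean = [(seg, spend) for seg, spend in raw if spend is not None]

# Hold back the last two clean records to validate against later.
train, holdout = clean[:-2], clean[-2:]

# (2) Data mining: discover the average spend per segment.
segments = {seg for seg, _ in train}
pattern = {seg: mean(s for g, s in train if g == seg) for seg in segments}


# (3) Data validation and explanation: does the pattern hold on unseen records?
def predict_segment(spend):
    """Assign the segment whose average spend is closest to the given spend."""
    return min(pattern, key=lambda seg: abs(pattern[seg] - spend))


accuracy = sum(predict_segment(s) == seg for seg, s in holdout) / len(holdout)
print(pattern, accuracy)
```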

Analytic processes and tools


Big Data Analytics is the process of collecting large chunks of structured/unstructured data, segregating and
analyzing it and discovering the patterns and other useful business insights from it.
These days, organizations are realising the value they get out of big data analytics and hence they are
deploying big data tools and processes to bring more efficiency in their work environment. 
Many big data tools and processes are being utilised by companies these days in the processes of discovering
insights and supporting decision making.
Big data processing is a set of techniques or programming models for accessing large-scale data to extract
useful information for supporting and providing decisions.
Below is a list of some of the data analytics tools most used in the industry :

• R Programming (Leading Analytics Tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME

Analysis vs reporting
Reporting :

• Once data is collected, it is organized using tools such as graphs and tables.
• The process of organizing this data is called reporting.
• Reporting translates raw data into information.
• Reporting helps companies to monitor their online business and be alerted when data falls outside of
expected ranges.
• Good reporting should raise questions about the business from its end users.
Analysis :

• Analysis is the process of taking the organized data and analyzing it.
• This helps users to gain valuable insights on how businesses can improve their performance.
• Analysis transforms data and information into insights.
• The goal of analysis is to answer questions by interpreting the data at a deeper level and
providing actionable recommendations.
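
The difference between the two activities can be sketched on a small table of daily orders. The figures, the `campaign` flag and the expected range of 80 to 120 orders are assumptions invented for this example.

```python
from statistics import mean

# Hypothetical daily order counts, with a flag for whether a campaign ran that day.
daily = [
    {"day": "Mon", "orders": 100, "campaign": True},
    {"day": "Tue", "orders": 104, "campaign": True},
    {"day": "Wed", "orders": 60, "campaign": False},
    {"day": "Thu", "orders": 96, "campaign": True},
]
EXPECTED_RANGE = (80, 120)  # assumed thresholds for the alert

# Reporting: organize the data and flag values outside the expected range.
report = [(d["day"], d["orders"]) for d in daily]
alerts = [day for day, orders in report
          if not EXPECTED_RANGE[0] <= orders <= EXPECTED_RANGE[1]]

# Analysis: look for an explanation by comparing days with and without a campaign.
with_campaign = mean(d["orders"] for d in daily if d["campaign"])
without_campaign = mean(d["orders"] for d in daily if not d["campaign"])
insight = with_campaign - without_campaign

print(alerts)   # reporting output: which days fell outside the expected range
print(insight)  # analysis output: order gap between campaign and non-campaign days
```

The alert only states that Wednesday was unusual. The comparison suggests a possible reason, which a recommendation can then be built on.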
Conclusion :

• Reporting shows us “what is happening”.
• Analysis focuses on explaining “why it is happening” and “what we can do about it”.

Modern data analytic tools

• These days, organizations are realising the value they get out of big data analytics and hence they
are deploying big data tools and processes to bring more efficiency to their work environment.
• Many big data tools and processes are being utilised by companies these days in the processes of
discovering insights and supporting decision making.
• Data Analytics tools are types of application software that retrieve data from one or more systems
and combine it in a repository, such as a data warehouse, to be reviewed and analysed.
• Most organizations use more than one analytics tool, including spreadsheets with statistical functions,
statistical software packages, data mining tools, and predictive modelling tools.
• Together, these Data Analytics tools give the organization a complete overview of the company,
providing key insights and an understanding of the market/business so that smarter decisions can be made.
• Data analytics tools not only report the results of the data but also explain why the results occurred,
to help identify weaknesses, fix potential problem areas, alert decision-makers to unforeseen events
and even forecast future results based on decisions the company might make.
• Below is a list of some data analytics tools :
  o R Programming (Leading Analytics Tool in the industry)
  o Python
  o Excel
  o SAS
  o Apache Spark
  o Splunk
  o RapidMiner
  o Tableau Public
  o KNIME
