BDA Unit 1 Notes
What is Big Data?
According to Gartner, the definition of Big Data is:
“Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision
making.”
This definition clearly answers the “What is Big Data?” question – Big Data refers to complex
and large data sets that have to be processed and analyzed to uncover valuable information that
can benefit businesses and organizations.
However, there are certain basic tenets of Big Data that make it even simpler to answer
what Big Data is:
It refers to a massive amount of data that keeps on growing exponentially with time.
It is so voluminous that it cannot be processed or analyzed using conventional
data processing techniques.
It includes data mining, data storage, data analysis, data sharing, and data
visualization.
The term is all-encompassing: it covers data, data frameworks, and the
tools and techniques used to process and analyze the data.
Although the concept of big data itself is relatively new, the origins of large data sets go back
to the 1960s and '70s when the world of data was just getting started with the first data centers
and the development of the relational database.
Around 2005, people began to realize just how much data users generated through Facebook,
YouTube, and other online services. Hadoop (an open-source framework created specifically to
store and analyze big data sets) was developed that same year. NoSQL also began to
gain popularity during this time.
The development of open-source frameworks, such as Hadoop (and more recently, Spark) was
essential for the growth of big data because they make big data easier to work with and cheaper
to store. In the years since then, the volume of big data has skyrocketed. Users are still
generating huge amounts of data—but it’s not just humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the
internet, gathering data on customer usage patterns and product performance. The emergence of
machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has
expanded big data possibilities even further. The cloud offers truly elastic scalability, where
developers can simply spin up ad hoc clusters to test a subset of data.
Big data makes it possible for you to gain more complete answers because you have
more information.
More complete answers mean more confidence in the data—which means a
completely different approach to tackling problems.
Types of Big Data
Now that we have covered what big data is, let’s have a look at the types of big data:
a) Structured
Structured data is one of the types of big data. By structured data, we mean data that can be
processed, stored, and retrieved in a fixed format. It refers to highly organized information
that can be readily and seamlessly stored and accessed from a database by simple search
algorithms. For instance, the employee table in a company database is structured, because
the employee details, their job positions, their salaries, etc., are present in an organized
manner.
b) Unstructured
Unstructured data refers to data that lacks any specific form or structure whatsoever.
This makes it very difficult and time-consuming to process and analyze unstructured data.
Email is an example of unstructured data. Structured and unstructured are two important types
of big data.
c) Semi-structured
Semi-structured data is the third type of big data. It pertains to data
containing both the formats mentioned above, that is, structured and unstructured data. To be
precise, it refers to data that, although not classified under a particular repository
(database), contains vital information or tags that segregate individual elements within
the data. This concludes the types of big data.
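As a rough illustration of the three types described above, the short Python sketch below
handles one tiny sample of each. The employee rows, the order record, and the email text are
made-up examples, not data from these notes. The point is only that structured data can be
queried by column, semi-structured data by its tags (keys), and unstructured data only by
searching the raw content.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields
# (hypothetical employee table, as in the example under "a) Structured").
employee_csv = "id,name,position,salary\n1,Asha,Analyst,52000\n2,Ravi,Engineer,61000\n"
employees = list(csv.DictReader(io.StringIO(employee_csv)))
engineers = [row["name"] for row in employees if row["position"] == "Engineer"]

# Semi-structured: no fixed table layout, but keys (tags) label each element
# (hypothetical order record in JSON).
order_json = '{"order_id": 17, "customer": {"name": "Asha"}, "items": ["pen", "ink"]}'
order = json.loads(order_json)
customer_name = order["customer"]["name"]

# Unstructured: free text with no fields, so we can only search the raw content
# (hypothetical email body).
email_body = "Hi team, the quarterly report is attached. Please review it by Friday."
mentions_report = "report" in email_body.lower()

print(engineers, customer_name, mentions_report)
```

In practice the same distinction decides the tooling: tables go into relational databases,
tagged records into document stores, and free text needs search or text-analysis tools.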
Characteristics of Big Data
Back in 2001, analyst Doug Laney (then at META Group, which Gartner later acquired) listed the
3 ‘V’s of Big Data: Variety, Velocity, and Volume. Let’s discuss the characteristics of big
data. Taken on their own, these characteristics are enough to understand what big data is.
Let’s look at them in depth:
a) Variety
Variety of Big Data refers to structured, unstructured, and semi-structured data that is
gathered
from multiple sources. While in the past, data could only be collected from spreadsheets and
databases, today data comes in an array of forms such as emails, PDFs, photos, videos, audio,
social media posts, and much more. Variety is one of the important characteristics of big data.
1. Cost Savings: Some Big Data tools, such as Hadoop and cloud-based analytics,
can bring cost advantages to a business when large amounts of data are to be stored.
These tools also help in identifying more efficient ways of doing business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics
makes it easy to identify new sources of data, which helps businesses analyze data
immediately and make quick decisions based on what they learn.
3. Understand the market conditions: By analyzing big data you can get a better
understanding of current market conditions. For example, by analyzing
customers’ purchasing behaviors, a company can find out the products that are sold
the most and produce products according to this trend. In this way, it can get ahead of its
competitors.
4. Control online reputation: Big data tools can do sentiment analysis. Therefore,
you can get feedback about who is saying what about your company. If you want to
monitor and improve the online presence of your business, big data tools can
help with this.
5. Using Big Data Analytics to Boost Customer Acquisition and Retention
The customer is the most important asset any business depends on. There is no single
business that can claim success without first having to establish a solid customer base.
However, even with a customer base, a business cannot afford to disregard the high
competition it faces. If a business is slow to learn what customers are looking for,
then it is very easy to begin offering poor quality products. In the end, loss of
clientele will result, and this creates an adverse overall effect on business success. The
use of big data allows businesses to observe various customer related patterns and
trends. Observing customer behavior is important to trigger loyalty.
6. Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights
Big data analytics can help change all business operations. This includes
the ability to match customer expectations, change the company’s product
line, and of course ensure that marketing campaigns are powerful.
7. Big Data Analytics As a Driver of Innovations and Product Development
Another huge advantage of big data is the ability to help companies innovate and