10 - (Module-6) Data Generation, Data Gathering-07-03-2023

The document discusses different aspects of data analytics including data generation, gathering, preprocessing, analysis, and application of analytics. It covers sources of big data, collection methods, types of data management, and existing analytics systems for real-time, offline, memory-level, BI, and massive analytics.

Uploaded by

Shubham Kodilkar

Module-6

✓ Data generation,
✓ Data gathering,
✓ Data Pre-processing,
✓ Data analysis,
✓ Application of analytics,
✓ Vertical-specific algorithms,
✓ Exploratory Data Analysis.
Data Generation
✓ Data generation is the beginning of big data.
✓ Current sources of big data include trading data, mobile data, user
behavior data, sensing data, Internet data, and other sources that are usually ignored.
✓ For example, Internet data has become a major source of big data: huge
volumes of search entries, chat records, and microblog messages are produced
every day.
✓ Such data are closely tied to people's daily lives and may reveal users'
behavior.
✓ A single user's records seem to have little value on their own; however,
useful information such as user habits and hobbies can be extracted once such
data are accumulated and analyzed.
✓ Big data even makes it possible to predict users' behavior and emotional
moods.
✓ Internet data is one of the most successful data sources utilized by
many Internet companies to generate user portraits and provide
personalized recommendation services.
✓ Other main sources of big data include the operation and trading
information in enterprises, logistic and sensing information in the
Internet of Things (IoT) networks, human interaction information,
position information in the Internet world, etc.
✓ Digital telescopes (astrophotography) are another rising source of
big data, generating data sets ranging from hundreds of GB to tens
of TB or even larger.
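To make the idea of a user portrait concrete, here is a minimal sketch that counts how often each user touches each topic and takes the most frequent topic as that user's main interest. The `events` list, the user names, and the topics are all hypothetical; a real system would derive topics from search entries, chat records, or microblog messages at far larger scale.

```python
from collections import Counter, defaultdict

# Hypothetical (user, topic) pairs, as might be derived from search entries.
events = [
    ("alice", "cycling"),
    ("alice", "cycling"),
    ("alice", "cooking"),
    ("bob", "chess"),
]


def build_portraits(pairs):
    """Accumulate per-user topic counts from individual records."""
    portraits = defaultdict(Counter)
    for user, topic in pairs:
        portraits[user][topic] += 1
    return portraits


portraits = build_portraits(events)

# The most frequent topic per user is a crude stand-in for a "hobby".
top_interest = {
    user: counts.most_common(1)[0][0] for user, counts in portraits.items()
}
print(top_interest)
```

No single record says much, but the accumulated counts already separate the two users' interests, which is the point made in the bullets above.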
Some data collection/gathering sources:
✓ Collecting new data from the Internet and other sources
✓ Using the previously collected and stored data
✓ Reusing someone else’s data
✓ Purchasing data

The data collection/gathering methods depend on the following:
• The research problem under study
• The research design
• The information gathered about the variable
Data collection methods:
Primary data collection methods:
✓ Primary data are original data
collected for the first time by the
researcher.
✓ They are collected for a specific
research problem on which no one
has carried out related studies.
✓ The results of such studies draw
directly on the primary data
collected by the researcher.
✓ However, collecting primary data
is time-consuming and costly.
Secondary Data Collection Methods:
✓ Secondary data are second-hand
data provided by someone else
for research work.
✓ They have usually already
passed through statistical
analysis.
✓ Compared with primary data,
results based on secondary
data are less authentic, but
secondary data are less
expensive and more readily
available.
Different types of data management as per its collection methods:
Quantitative information:

✓ Quantitative information is numerical
information obtained from research methods
such as population surveys and repeated
experimental procedures.
✓ When recording it, include details such as
the date and place of collection, the units
of measurement, and the methods used.
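One way to keep those details attached to each value is to store them together in a single record. The sketch below is only an illustration: the `Measurement` class, its field names, and the sample reading are assumptions, not part of the original material.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Measurement:
    """One quantitative observation plus the context needed to interpret it."""

    value: float
    unit: str           # unit of measurement, e.g. "degC"
    collected_on: date  # date of collection
    place: str          # place of collection
    method: str         # how the value was obtained


def validate(m: Measurement) -> None:
    """Reject records whose descriptive fields are blank."""
    for field_name in ("unit", "place", "method"):
        if not getattr(m, field_name).strip():
            raise ValueError(f"missing {field_name}")


# Hypothetical sample reading.
reading = Measurement(21.5, "degC", date(2023, 3, 7), "Lab A", "digital thermometer")
validate(reading)
print(reading)
```

Validating at recording time means a value can never be stored without its unit, place, and method.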
Qualitative Information:

✓ Non-numerical information, such as that
collected in the form of video and audio,
is qualitative information.
✓ It can be transcribed into written form
later.
Why IoT Data Analytics?
1. Establishes a variety of smarter environments (smarter homes, hotels, hospitals, etc.)
2. Uncovers timely and actionable insights for machines and people
3. Enables the realization of smart objects, devices, networks and environments
4. Leads to the production of pioneering and people-centric applications and services
5. Helps produce precise predictions and prescriptions
6. Facilitates process excellence and people productivity
7. Guarantees preventive maintenance of infrastructures
8. Ensures the optimized utilization of distributed assets through monitoring,
measurement, and management for perfect inventory replenishment
9. Safeguards the safety and security of people and properties
10. Monitors complex environments to guarantee business performance, productivity
and resilience
Relationship between IoT and big data analytics
Overview of big data analytics methods.
IoT architecture and big data analytics.
Comparison of IoT big data analytics use cases.
Comparison of different analytics types and their levels.
EXISTING ANALYTICS SYSTEMS

✓ Real-time analytics is typically performed on data collected from sensors.
✓ In this situation, the data change constantly, and rapid analytics techniques are
required to obtain a result within a short period.
✓ Two architectures have been proposed for real-time analysis: parallel processing
clusters using traditional relational databases, and memory-based computing
platforms.
✓ Greenplum and HANA are examples of real-time analytics architectures.
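As a small illustration of why in-memory, incremental computation suits constantly changing sensor data, the sketch below maintains a rolling mean over the most recent readings and updates it in constant time per reading. The `RollingMean` class and the sample values are hypothetical; this is not how Greenplum or HANA work internally.

```python
from collections import deque


class RollingMean:
    """Mean of the last `window` readings, updated incrementally in memory."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._values = deque(maxlen=window)
        self._total = 0.0

    def update(self, value: float) -> float:
        # When the window is full, the oldest reading is about to be evicted,
        # so remove it from the running total first.
        if len(self._values) == self._values.maxlen:
            self._total -= self._values[0]
        self._values.append(value)
        self._total += value
        return self._total / len(self._values)


# Hypothetical temperature readings arriving one at a time.
monitor = RollingMean(window=3)
averages = [monitor.update(v) for v in [20.0, 22.0, 24.0, 30.0]]
print(averages)
```

Each new reading yields an up-to-date result without rescanning older data, which is what a short response time requires.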
✓ Off-line analytics is used when a quick response is not required.
✓ For example, many Internet enterprises use Hadoop-based off-line analytics architectures
to reduce the cost of data format conversion.
✓ Such analytics improves data acquisition efficiency.
✓ SCRIBE, Kafka, TimeTunnel, and Chukwa are examples of architectures that conduct
off-line analytics and can satisfy the demands of data acquisition.
✓ Memory-level analytics is applied when the size of data is smaller than the memory of a
cluster.
✓ To date, the memory of clusters has reached terabyte (TB) level.
✓ Therefore, several internal database technologies are required to improve analytical
efficiency.
✓ Memory-level analytics is suitable for conducting real-time analysis. MongoDB is
an example of this architecture.
✓ BI analytics is adopted when the size of data is larger than the memory level, but in this
case, data may be imported to the BI analysis environment.
✓ BI analytics currently supports TB-level data.
✓ Moreover, BI can help discover strategic business opportunities from the flood of data.
✓ In addition, BI analytics allows easy interpretation of data volumes.
✓ Identifying new opportunities and implementing an effective strategy provide competitive
market advantage and long-term stability.
✓ Massive analytics is applied when the size of data is greater than the entire capacity of the
BI analysis product and traditional databases.
✓ Massive analytics uses the Hadoop Distributed File System (HDFS) for data storage and
MapReduce for data analysis.
✓ Massive analytics helps create the business foundation and increases market
competitiveness by extracting meaningful value from data.
✓ Moreover, massive analytics yields accurate data that help reduce the risks involved in
making business decisions.
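The size-based rules for memory-level, BI, and massive analytics can be summarized as a simple decision function. This is a sketch under stated assumptions: the function name, the parameters, and the TB figures in the usage lines are hypothetical, and real deployments weigh more than data size alone.

```python
from enum import Enum


class Tier(Enum):
    MEMORY_LEVEL = "memory-level analytics"
    BI = "BI analytics"
    MASSIVE = "massive analytics"


def choose_tier(data_tb: float, cluster_memory_tb: float, bi_capacity_tb: float) -> Tier:
    """Pick an analytics tier from the data size alone.

    Assumption: data exactly equal to the cluster memory is treated as BI,
    since the text only covers "smaller than" and "larger than".
    """
    if data_tb < cluster_memory_tb:
        return Tier.MEMORY_LEVEL  # data fits in the cluster's memory
    if data_tb <= bi_capacity_tb:
        return Tier.BI            # too large for memory, importable into BI
    return Tier.MASSIVE           # exceeds BI products and traditional databases


# Hypothetical sizes in TB: 1 TB of cluster memory, 10 TB of BI capacity.
print(choose_tier(0.5, 1.0, 10.0).value)
print(choose_tier(5.0, 1.0, 10.0).value)
print(choose_tier(50.0, 1.0, 10.0).value)
```

Real-time versus off-line is a separate axis (whether a quick response is required) and is deliberately left out of this function.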
