Unit 1 Big Data Tutorial
Topics to be covered...
Evolution of Technology
What is Big Data?
Types of Big Data
Big Data Examples & Use Cases
Big Data architecture
When to use this architecture
5Vs of Big Data
Big Data technology
Big Data importance
Big Data applications
Big Data Analytics
Need for Big Data Analytics
What is Big Data Analytics?
Types of Big Data Analytics
Happy Ending!
Engineering in One Video (EIOV) Watch video on
Structured
Any data that can be stored,
accessed and processed in
a fixed format is
termed 'structured' data.
[Figure: table]
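To make "fixed format" concrete, here is a minimal sketch, assuming the data arrives as CSV text in which every record has the same columns. The column names and values are made up for illustration. Because the schema is known up front, each field can be typed once and queried like a table.

```python
import csv
import io

# Hypothetical sample: every record has the same fixed set of columns.
RAW_CSV = """id,name,dept,salary
1,Asha,Sales,52000
2,Ravi,IT,61000
3,Meera,IT,58000
"""

def load_rows(text):
    """Parse fixed-format CSV text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # The schema is fixed, so each column can be converted to its type up front.
        rows.append({"id": int(row["id"]), "name": row["name"],
                     "dept": row["dept"], "salary": int(row["salary"])})
    return rows

rows = load_rows(RAW_CSV)
# Fixed columns make table-style queries straightforward.
it_total = sum(r["salary"] for r in rows if r["dept"] == "IT")
print(it_total)  # 119000
```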
Unstructured
Any data with an unknown form
or structure is classified as
unstructured data. Examples include
free-form text, images, audio and video.
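Unstructured data has no fields to query directly, so any structure has to be derived by processing it. A minimal sketch, assuming a hypothetical free-text product review: tokenizing and counting words turns raw prose into something countable. This is only one simple technique, not a general method for all unstructured data.

```python
import re
from collections import Counter

# Hypothetical free-text input: no fields, no schema, just prose.
REVIEW = "Great phone. Battery life is great, camera is okay. Great value!"

def top_words(text, n=3):
    """Derive a simple structure from raw text by tokenizing and counting words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)

print(top_words(REVIEW, 2))  # [('great', 3), ('is', 2)]
```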
Semi-structured
Semi-structured data is information that
does not reside in a relational database
or any other data table, but nonetheless
has some organizational properties that
make it easier to analyze, such as
semantic tags. JSON and XML documents
are common examples.
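A minimal sketch of semi-structured data, assuming JSON records with made-up users. The keys act as semantic tags, yet different records carry different fields, so there is no fixed relational schema. The tags still let you query by field name.

```python
import json

# Hypothetical JSON records: keys act as semantic tags, but the fields
# present vary from record to record (no fixed relational schema).
RECORDS = """[
  {"user": "asha", "email": "asha@example.com", "tags": ["admin"]},
  {"user": "ravi", "phone": "555-0101"},
  {"user": "meera", "email": "meera@example.com"}
]"""

docs = json.loads(RECORDS)

# Query by tag name even though no table exists; .get() tolerates
# records that lack the field.
emails = [d["email"] for d in docs if d.get("email")]
all_fields = sorted({key for d in docs for key in d})
print(emails)      # ['asha@example.com', 'meera@example.com']
print(all_fields)  # ['email', 'phone', 'tags', 'user']
```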
[Figure: Big Data architecture diagram with Data Sources, Data Storage, Batch Processing, Real-Time Message Ingestion, Stream Processing, Analytical Data Store, Analytics and Reporting, and Orchestration]
Data sources: All big data solutions start with one or more
data sources.
Examples include:
-> Application data stores, such as relational databases.
-> Static files produced by applications, such as web server log files.
-> Real-time data sources, such as IoT devices.
Batch processing: Because the data sets are so large, often a big data
solution must process data files using long-running batch jobs to
filter, aggregate, and otherwise prepare the data for analysis. Usually,
these jobs involve reading source files, processing them, and writing
the output to new files. Options include running U-SQL jobs in Azure
Data Lake Analytics, using Hive, Pig, or custom MapReduce jobs in
an HDInsight Hadoop cluster, or using Java, Scala, or Python
programs in an HDInsight Spark cluster.
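The read, process, write cycle of a batch job can be sketched in plain Python as a small map/reduce pass. This is only an illustration, assuming hypothetical input files of `region,amount` lines. It does not use Hive, Pig, Hadoop or Spark, which would distribute the same steps across a cluster. The map phase reads and filters each source file, the reduce phase aggregates per key, and the result is written to a new file.

```python
import csv
import os
import tempfile
from collections import defaultdict

def map_phase(path):
    """Read one source file and emit (key, value) pairs, filtering bad rows."""
    with open(path, newline="") as f:
        for region, amount in csv.reader(f):
            if amount.isdigit():  # filter: drop malformed amounts
                yield region, int(amount)

def reduce_phase(pairs):
    """Aggregate values per key (here: total amount per region)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run_batch_job(input_paths, output_path):
    pairs = (kv for p in input_paths for kv in map_phase(p))
    totals = reduce_phase(pairs)
    # Write the prepared result to a new file for the next stage.
    with open(output_path, "w", newline="") as f:
        csv.writer(f).writerows(sorted(totals.items()))
    return totals

# Demo with two small hypothetical input files in a temporary directory.
workdir = tempfile.mkdtemp()
inputs = []
for i, text in enumerate(["north,10\nsouth,5\n", "north,7\nsouth,oops\n"]):
    path = os.path.join(workdir, f"part-{i}.csv")
    with open(path, "w") as f:
        f.write(text)
    inputs.append(path)

result = run_batch_job(inputs, os.path.join(workdir, "totals.csv"))
print(result)  # {'north': 17, 'south': 5}
```

Splitting the work into independent map and reduce functions is what lets real frameworks run many map tasks in parallel over separate files before combining their output.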
Analytical data store: Many big data solutions prepare data for
analysis and then serve the processed data in a structured format
that can be queried using analytical tools. The analytical data store
used to serve these queries can be a Kimball-style relational data
warehouse, as seen in most traditional business intelligence (BI)
solutions.
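A Kimball-style warehouse typically organizes data as a star schema: a central fact table of measurements joined to descriptive dimension tables. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for a real warehouse. The table and column names are hypothetical. The query shows the join-then-aggregate pattern that BI tools commonly issue.

```python
import sqlite3

# In-memory SQLite stands in for a relational data warehouse (assumption
# for illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one fact table keyed to dimension tables (hypothetical names).
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                          date_id INTEGER REFERENCES dim_date,
                          amount REAL);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date    VALUES (1, 2022), (2, 2023);
INSERT INTO fact_sales  VALUES (1, 1, 20.0), (1, 2, 30.0), (2, 2, 50.0);
""")

# A typical BI query: join facts to dimensions, then group and aggregate.
rows = cur.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d    ON f.date_id = d.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)  # [('Books', 2022, 20.0), ('Books', 2023, 30.0), ('Games', 2023, 50.0)]
```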