
Data Ingestion Use Cases

In this lesson, we will discuss some common data ingestion use cases in the industry.

WE'LL COVER THE FOLLOWING

• Moving Big Data Into Hadoop
• Streaming Data from Databases to Elasticsearch Server
• Log Processing
• Stream Processing Engines for Real-Time Events

In this part, I’ll talk about some of the data streaming use cases commonly required in the industry.

Moving Big Data Into Hadoop #

This is the most popular use case of data ingestion. As discussed before, Big Data from IoT devices, social apps & other sources streams through data pipelines into Hadoop, the most popular distributed data processing framework, for analysis.
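
For a concrete picture, here is a minimal sketch of writing an ingested event into HDFS using Hadoop’s Java FileSystem API. The NameNode address, landing path & sample record are assumptions for the example, not part of the lesson.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngest {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; "hdfs://namenode:9000" is an assumed address.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf);
             // Hypothetical landing path for ingested IoT events.
             FSDataOutputStream out = fs.create(new Path("/ingest/iot/events.json"))) {
            out.write("{\"deviceId\":\"sensor-1\",\"temp\":21.4}\n"
                    .getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

In practice, a dedicated ingestion tool would batch & move these events rather than a hand-rolled writer; the sketch only shows the final write into the cluster.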

Streaming Data from Databases to Elasticsearch Server #


Elastic search is an open-source framework for implementing search in web
applications. It is a defacto search framework used in the industry simply
because of its advanced features, & it being open-source. These features
enable businesses to write their own custom solutions when they need them.

In the past, with a few of my friends, I wrote a product search software as a service using Java, Spring Boot & Elasticsearch. Speaking of its design, we would stream & index quite a large amount of product data from the legacy storage solutions to the Elasticsearch server in order to make the products come up in the search results.

All the data intended to show up in the search was replicated from the main storage to the Elasticsearch storage. Also, as new data was persisted in the main storage, it was asynchronously pushed to the Elasticsearch server in real-time for indexing.
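
A minimal sketch of that indexing step, using the Elasticsearch Java high-level REST client: the index name, document id & fields below are made up for illustration.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class ProductIndexer {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Index one product document; "products" & "sku-1001" are hypothetical.
            IndexRequest request = new IndexRequest("products")
                    .id("sku-1001")
                    .source("{\"name\":\"running shoes\",\"price\":59.99}",
                            XContentType.JSON);
            client.index(request, RequestOptions.DEFAULT);
        }
    }
}
```

Running this call asynchronously, off the main write path, is what keeps search indexing from ever blocking writes to the primary store.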

Log Processing #

If your project isn’t a hobby project, chances are it’s running on a cluster. When we talk about running a large-scale service, monolithic systems are a thing of the past. With so many microservices running concurrently, a massive number of logs is generated over a period of time. And logs are the only way to move back in time, track errors & study the behaviour of the system.

So, to study the behaviour of the system holistically, we have to stream all the logs to a central place. We ingest the logs into a central server and run analytics on them with the help of solutions like the ELK (Elasticsearch, LogStash, Kibana) stack.
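
As a rough sketch, each microservice could ship its log events as JSON lines to a LogStash TCP input, which then forwards them to Elasticsearch for querying in Kibana. The hostname, port & the assumption that LogStash is listening with a tcp input using the json_lines codec are all hypothetical here.

```java
import java.io.PrintWriter;
import java.net.Socket;
import java.time.Instant;

public class LogShipper {
    public static void main(String[] args) throws Exception {
        // Assumed setup: LogStash listening on TCP port 5000 with a json_lines codec,
        // forwarding to Elasticsearch; "logstash.internal" is a hypothetical hostname.
        try (Socket socket = new Socket("logstash.internal", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String event = String.format(
                    "{\"@timestamp\":\"%s\",\"service\":\"order-service\","
                    + "\"level\":\"ERROR\",\"message\":\"payment gateway timed out\"}",
                    Instant.now());
            out.println(event); // one JSON object per line, as the json_lines codec expects
        }
    }
}
```

In production, a logging framework appender or a log shipper like Filebeat does this job; the point is simply that every service streams its logs to one central place.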

Stream Processing Engines for Real-Time Events #

Real-time streaming & data processing is the core component in systems handling LIVE information, such as sports scores. It’s imperative that the architectural setup in place is efficient enough to ingest data, analyse it, figure out the behaviour in real-time & quickly push the updated information to the fans. After all, the whole business depends on it.

Message queues like Kafka, and stream computation frameworks like Apache Storm, Apache NiFi, Apache Spark, Apache Samza, Amazon Kinesis, etc., are used to implement real-time large-scale data processing features in online applications.
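
For instance, here is a minimal sketch of publishing a live score update with the Kafka Java producer client. The broker address, topic name & payload are assumptions for the example.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ScoreProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by match id keeps every event of a match on one partition,
            // so consumers see the score updates in order.
            producer.send(new ProducerRecord<>("live-scores", "match-42",
                    "{\"home\":2,\"away\":1,\"minute\":67}"));
        }
    }
}
```

A stream processor such as Storm or Spark would then consume from this topic, compute the updated state & push it out to the fans.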

This is a good read on the topic:

An Insight into Netflix’s real-time streaming platform

Alright! Time to have a look at data pipelines in the next lesson.
