Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications
Fabian Hueske and Vasiliki Kalavri
ISBN 9781491974292
Stream Processing with
Apache Flink
Fundamentals, Implementation, and Operation of
Streaming Applications
Constant width
Used for program listings, as well as within paragraphs to refer to
program elements such as variable or function names, databases,
data types, environment variables, statements, and keywords.
Also used for module and package names, and to show
commands or other text that should be typed literally by the user
and the output of commands.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
WARNING
This element signifies a warning or caution.
NOTE
For almost 40 years, O’Reilly has provided technology and business
training, knowledge, and insight to help companies succeed.
How to Contact Us
Please address comments and questions concerning this book to the
publisher:
Sebastopol, CA 95472
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples,
and any additional information. You can access this page at
http://bit.ly/stream-proc.
To comment or ask technical questions about this book, send email
to [email protected].
For more information about our books, courses, conferences, and
news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Follow the authors on Twitter: @fhueske and @vkalavri
Acknowledgments
This book would not have been possible without the help and support
of several amazing people. We would like to thank and acknowledge
some of them here.
This book summarizes knowledge obtained through years of design,
development, and testing performed by the Apache Flink community
at large. We are grateful to everyone who has contributed to Flink
through code, documentation, reviews, bug reports, feature
requests, mailing list discussions, trainings, conference talks, meetup
organization, and other activities.
Special thanks go to our fellow Flink committers: Alan Gates,
Aljoscha Krettek, Andra Lungu, ChengXiang Li, Chesnay Schepler,
Chiwan Park, Daniel Warneke, Dawid Wysakowicz, Gary Yao, Greg
Hogan, Gyula Fóra, Henry Saputra, Jamie Grier, Jark Wu, Jincheng
Sun, Konstantinos Kloudas, Kostas Tzoumas, Kurt Young, Márton
Balassi, Matthias J. Sax, Maximilian Michels, Nico Kruber, Paris
Carbone, Robert Metzger, Sebastian Schelter, Shaoxuan Wang, Shuyi
Chen, Stefan Richter, Stephan Ewen, Theodore Vasiloudis, Thomas
Weise, Till Rohrmann, Timo Walther, Tzu-Li (Gordon) Tai, Ufuk
Celebi, Xiaogang Shi, Xiaowei Jiang, Xingcan Cui. With this book, we
hope to reach developers, engineers, and streaming enthusiasts
around the world and grow the Flink community even larger.
We would also like to thank our technical reviewers, who made
countless valuable suggestions that helped us improve the
presentation of the content. Thank you, Adam Kawa, Aljoscha Krettek, Kenneth
Knowles, Lea Giordano, Matthias J. Sax, Stephan Ewen, Ted
Malaska, and Tyler Akidau.
Finally, we say a big thank you to all the people at O’Reilly who
accompanied us on our two and a half year long journey and helped
us to push this project over the finish line. Thank you, Alicia Young,
Colleen Lobner, Christina Edwards, Katherine Tozer, Marie
Beaugureau, and Tim McGovern.
Chapter 1. Introduction to
Stateful Stream Processing
Transactional Processing
Companies use all kinds of applications for their day-to-day business
activities, such as enterprise resource planning (ERP) systems,
customer relationship management (CRM) software, and web-based
applications. These systems are typically designed with separate
tiers for data processing (the application itself) and data storage (a
transactional database system) as shown in Figure 1-1.
Analytical Processing
The data that is stored in the various transactional database systems
of a company can provide valuable insights about a company’s
business operations. For example, the data of an order processing
system can be analyzed to obtain sales growth over time, to identify
reasons for delayed shipments, or to predict future sales in order to
adjust the inventory. However, transactional data is often distributed
across several disconnected database systems and is more valuable
when it can be jointly analyzed. Moreover, the data often needs to
be transformed into a common format.
Instead of running analytical queries directly on the transactional
databases, the data is typically replicated to a data warehouse, a
dedicated datastore for analytical query workloads. In order to
populate a data warehouse, the data managed by the transactional
database systems needs to be copied to it. The process of copying
data to the data warehouse is called extract–transform–load (ETL).
An ETL process extracts data from a transactional database,
transforms it into a common representation that might include
validation, value normalization, encoding, deduplication, and schema
transformation, and finally loads it into the analytical database. ETL
processes can be quite complex and often require technically
sophisticated solutions to meet performance requirements. ETL
processes need to run periodically to keep the data in the data
warehouse synchronized.
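The extract, transform, and load steps described above can be sketched as plain functions. This is a deliberately minimal illustration, not a production ETL tool; the record fields (order_id, customer, amount) and the list-backed "warehouse" are hypothetical.

```python
def extract(source_rows):
    """Pull raw rows from a (mock) transactional store."""
    return list(source_rows)

def transform(rows):
    """Normalize values, deduplicate, and map rows to a common schema."""
    seen = set()
    out = []
    for row in rows:
        key = row["order_id"]
        if key in seen:          # deduplication
            continue
        seen.add(key)
        out.append({
            "order_id": key,
            "customer": row["customer"].strip().lower(),   # value normalization
            "amount_eur": round(float(row["amount"]), 2),  # schema transformation
        })
    return out

def load(warehouse, rows):
    """Append the transformed rows to the analytical store."""
    warehouse.extend(rows)

warehouse = []
source = [
    {"order_id": 1, "customer": " Alice ", "amount": "19.90"},
    {"order_id": 1, "customer": " Alice ", "amount": "19.90"},  # duplicate
    {"order_id": 2, "customer": "BOB", "amount": "5"},
]
load(warehouse, transform(extract(source)))
```

A real ETL job replaces each stage with connectors and distributed operators, but the extract-transform-load shape stays the same.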
Once the data has been imported into the data warehouse, it can be
queried and analyzed. Typically, two classes of queries are
executed on a data warehouse. The first type is periodic report
queries that compute business-relevant statistics such as revenue,
user growth, or production output. These metrics are assembled into
reports that help management assess the business’s overall
health. The second type is ad-hoc queries that aim to provide
answers to specific questions and support business-critical decisions,
for example a query to collect revenue numbers and spending on
radio commercials to evaluate the effectiveness of a marketing
campaign. Both kinds of queries are executed by a data warehouse
in a batch processing fashion, as shown in Figure 1-3.
Data Pipelines
Today’s IT architectures include many different datastores, such as
relational and special-purpose database systems, event logs,
distributed filesystems, in-memory caches, and search indexes. All of
these systems store data in different formats and data structures
that provide the best performance for their specific access pattern. It
is common that companies store the same data in multiple different
systems to improve the performance of data accesses. For example,
information for a product that is offered in a webshop can be stored
in a transactional database, a web cache, and a search index. Due to
this replication of data, the data stores must be kept in sync.
A traditional approach to synchronize data in different storage
systems is periodic ETL jobs. However, they do not meet the latency
requirements for many of today’s use cases. An alternative is to use
an event log to distribute updates. The updates are written to and
distributed by the event log. Consumers of the log incorporate the
updates into the affected data stores. Depending on the use case,
the transferred data may need to be normalized, enriched with
external data, or aggregated before it is ingested by the target data
store.
Ingesting, transforming, and inserting data with low latency is
another common use case for stateful stream processing
applications. This type of application is called a data pipeline. Data
pipelines must be able to process large amounts of data in a short
time. A stream processor that operates a data pipeline should also
feature many source and sink connectors to read data from and
write data to various storage systems. Again, Flink does all of this.
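The event-log approach described above can be sketched as follows. Here the log is a plain list and the derived "stores" are dicts; the product fields and the per-store enrichment step are hypothetical, and a real pipeline would use a distributed log and external systems instead.

```python
# Updates are written once to the event log; each consumer applies them
# to its own derived data store, optionally transforming them first.
event_log = []

def publish(update):
    event_log.append(update)

class StoreConsumer:
    """Applies log entries to one derived data store."""
    def __init__(self, store, enrich=lambda u: u):
        self.store = store
        self.enrich = enrich   # per-store transformation before ingestion
        self.offset = 0        # position of the next unread log entry

    def poll(self):
        while self.offset < len(event_log):
            update = self.enrich(event_log[self.offset])
            self.store[update["product_id"]] = update
            self.offset += 1

cache = {}
search_index = {}
cache_consumer = StoreConsumer(cache)
index_consumer = StoreConsumer(
    search_index,
    enrich=lambda u: {**u, "tokens": u["name"].split()})

publish({"product_id": "p1", "name": "red shoe"})
publish({"product_id": "p1", "name": "red running shoe"})  # later update wins
for consumer in (cache_consumer, index_consumer):
    consumer.poll()
```

Because every consumer reads the same ordered log, all derived stores converge to a consistent view without coordinating with each other.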
Streaming Analytics
ETL jobs periodically import data into a datastore and the data is
processed by ad-hoc or scheduled queries. This is batch processing
regardless of whether the architecture is based on a data warehouse
or components of the Hadoop ecosystem. While periodically loading
data into a data analysis system has been the state of the art for
many years, it adds considerable latency to the analytics pipeline.
Depending on the scheduling intervals it may take hours or days
until a data point is included in a report. To some extent, the latency
can be reduced by importing data into the datastore with a data
pipeline application. However, even with continuous ETL there will
always be a delay until an event is processed by a query. While this
kind of delay may have been acceptable in the past, applications
today must be able to collect data in real time and immediately act
on it (e.g., by adjusting to changing conditions in a mobile game or
by personalizing user experiences for an online retailer).
Instead of waiting to be periodically triggered, a streaming analytics
application continuously ingests streams of events and updates its
result by incorporating the latest events with low latency. This is
similar to the maintenance techniques database systems use to
update materialized views. Typically, streaming applications store
their result in an external data store that supports efficient updates,
such as a database or key-value store. The live updated results of a
streaming analytics application can be used to power dashboard
applications as shown in Figure 1-6.
Figure 1-6. A streaming analytics application
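The core loop of such an application can be sketched in a few lines: each incoming event immediately updates an aggregate in a key-value result store rather than waiting for a periodic batch job. The event fields (region, revenue) are hypothetical, and the dict stands in for an external store that a dashboard would read.

```python
result_store = {}   # stands in for an external key-value store

def on_event(event):
    """Incorporate one event into the live result with low latency."""
    key = event["region"]
    result_store[key] = result_store.get(key, 0.0) + event["revenue"]

for event in [
    {"region": "eu", "revenue": 10.0},
    {"region": "us", "revenue": 7.5},
    {"region": "eu", "revenue": 2.5},
]:
    on_event(event)  # a dashboard reading result_store sees each update at once
```

This is exactly the incremental-maintenance pattern the text compares to materialized views: the result is never recomputed from scratch, only updated.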
A Bit of History
The first generation of distributed open source stream processors
(2011) focused on event processing with millisecond latencies and
provided guarantees against loss of events in the case of failures.
These systems had rather low-level APIs and did not provide built-in
support for accurate and consistent results of streaming applications
because the results depended on the timing and order of arriving
events. Moreover, even though events were not lost, they could be
processed more than once. In contrast to batch processors, the first
open source stream processors traded result accuracy for better
latency. The observation that data processing systems (at this point
in time) could either provide fast or accurate results led to the
design of the so-called lambda architecture, which is depicted in
Figure 1-7.
$ cd flink-1.7.1
$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host xxx.
Starting taskexecutor daemon on host xxx.