CSA Overview
https://docs.cloudera.com/
Legal Notice
© Cloudera Inc. 2024. All rights reserved.
The documentation is and contains Cloudera proprietary information protected by copyright and other intellectual property
rights. No license under copyright or any other intellectual property right is granted herein.
Unless otherwise noted, scripts and sample code are licensed under the Apache License, Version 2.0.
Copyright information for Cloudera software may be found within the documentation accompanying each component in a
particular release.
Cloudera software includes software from various open source or other third party projects, and may be released under the
Apache Software License 2.0 (“ASLv2”), the Affero General Public License version 3 (AGPLv3), or other license terms.
Other software included may be released under the terms of alternative open source licenses. Please review the license and
notice files accompanying the software for additional licensing information.
Please visit the Cloudera software product page for more information on Cloudera software. For more information on
Cloudera support services, please visit either the Support or Sales page. Feel free to contact us directly to discuss your
specific needs.
Cloudera reserves the right to change any products at any time, and without notice. Cloudera assumes no responsibility nor
liability arising from the use of products, except as expressly agreed to in writing by Cloudera.
Cloudera, Cloudera Altus, HUE, Impala, Cloudera Impala, and other Cloudera marks are registered or unregistered
trademarks in the United States and other countries. All other trademarks are the property of their respective owners.
Disclaimer: EXCEPT AS EXPRESSLY PROVIDED IN A WRITTEN AGREEMENT WITH CLOUDERA,
CLOUDERA DOES NOT MAKE NOR GIVE ANY REPRESENTATION, WARRANTY, NOR COVENANT OF
ANY KIND, WHETHER EXPRESS OR IMPLIED, IN CONNECTION WITH CLOUDERA TECHNOLOGY OR
RELATED SUPPORT PROVIDED IN CONNECTION THEREWITH. CLOUDERA DOES NOT WARRANT THAT
CLOUDERA PRODUCTS NOR SOFTWARE WILL OPERATE UNINTERRUPTED NOR THAT IT WILL BE
FREE FROM DEFECTS NOR ERRORS, THAT IT WILL PROTECT YOUR DATA FROM LOSS, CORRUPTION
NOR UNAVAILABILITY, NOR THAT IT WILL MEET ALL OF CUSTOMER’S BUSINESS REQUIREMENTS.
WITHOUT LIMITING THE FOREGOING, AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE
LAW, CLOUDERA EXPRESSLY DISCLAIMS ANY AND ALL IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, QUALITY, NON-INFRINGEMENT, TITLE, AND
FITNESS FOR A PARTICULAR PURPOSE AND ANY REPRESENTATION, WARRANTY, OR COVENANT BASED
ON COURSE OF DEALING OR USAGE IN TRADE.
Cloudera Streaming Analytics What is Apache Flink?
Cloudera Streaming Analytics Core features of Flink
DataStream API
The DataStream API is the core API for developing Flink streaming applications in Java or
Scala. It provides the core building blocks of a Flink streaming application: the data stream
and the transformations on it. In a Flink program, the incoming data streams from a source are
transformed by defined operations, which results in one or more output streams that are
written to a sink.
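The source, transformation, and sink roles described above can be illustrated with a minimal sketch. This is a conceptual model in plain Java, not the Flink DataStream API: the class `PipelineSketch` and its methods `run` and `runAndCollect` are hypothetical names introduced only for this illustration, and a real Flink job would process unbounded streams rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Conceptual model only: these are NOT Flink classes.
public class PipelineSketch {

    /** Reads each record from the source, transforms it, and hands the result to the sink. */
    public static <I, O> void run(Iterable<I> source,
                                  Function<I, O> transformation,
                                  Consumer<O> sink) {
        for (I record : source) {
            sink.accept(transformation.apply(record));
        }
    }

    /** Hypothetical helper: runs the pipeline with a sink that collects into a list. */
    public static <I, O> List<O> runAndCollect(Iterable<I> source,
                                               Function<I, O> transformation) {
        List<O> collected = new ArrayList<>();
        run(source, transformation, collected::add);
        return collected;
    }

    public static void main(String[] args) {
        // Source: a bounded stand-in for an incoming data stream.
        List<String> source = List.of("flink", "kafka", "hbase");
        // Transformation: the operation applied to every record.
        Function<String, String> toUpperCase = String::toUpperCase;
        // Sink: where the output stream ends up (here, standard output).
        Consumer<String> printSink = System.out::println;

        run(source, toUpperCase, printSink);
    }
}
```

Keeping the three roles as separate parameters mirrors the structure in the text: the same transformation can be reused with a different source or sink without changing its logic.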
Operators
Operators transform one or more DataStreams into a new DataStream. Programs can combine
multiple transformations into sophisticated dataflow topologies. Besides standard
transformations such as map, filter, and aggregation, you can also create windows and join windows
with Flink operators. One or more operations can be defined on a dataflow, and they can be processed
in parallel and independently of each other. With windowing functions, different computations can be
applied to different streams within a defined time window to further control the processing of events.
The following image illustrates the parallel structure of dataflows.
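The windowing idea described in this section can be sketched in plain Java. This is a conceptual model, not Flink operator code: the class `TumblingWindowSketch` and its methods are hypothetical, and the sketch assumes fixed-size, non-overlapping windows and non-negative timestamps in milliseconds. It assigns each event to a window by its timestamp and counts the events per window, which is one simple form of windowed aggregation.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: plain Java, not Flink operators.
public class TumblingWindowSketch {

    /** Start of the fixed-size window containing the timestamp (non-negative timestamps assumed). */
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Counts events per window; the map key is the window start time in milliseconds. */
    public static Map<Long, Integer> countPerWindow(long[] eventTimestampsMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long timestampMs : eventTimestampsMs) {
            counts.merge(windowStart(timestampMs, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000L, 4_000L, 5_500L, 9_999L, 10_000L};
        // Five-second windows: [0, 5000), [5000, 10000), [10000, 15000)
        System.out.println(countPerWindow(timestamps, 5_000L));
        // prints {0=2, 5000=2, 10000=1}
    }
}
```

Because each window's count depends only on the events assigned to it, windows can in principle be computed independently of each other, which is what makes this kind of operation suitable for parallel processing.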
You can create your Flink application based either on the time when the event is created or on
the time when it is processed by the operator.
With event time alone, it is not clear when the events are processed in the application. To track
the progress of time in an event-time-based application, watermarks can be used.
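How a watermark tracks progress in event time can be sketched as follows. This is a conceptual model in plain Java, not the Flink watermark API: the class `WatermarkSketch` is hypothetical, and it assumes one common strategy in which the watermark trails the highest event timestamp seen so far by a fixed out-of-orderness allowance. In this sketch, an event whose timestamp is older than the current watermark is treated as late.

```java
// Conceptual model only: plain Java, not the Flink watermark API.
public class WatermarkSketch {

    private final long maxOutOfOrdernessMs;
    private long maxSeenTimestampMs = Long.MIN_VALUE;

    public WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Current watermark: highest timestamp seen minus the allowed out-of-orderness. */
    public long currentWatermark() {
        if (maxSeenTimestampMs == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // no events observed yet
        }
        return maxSeenTimestampMs - maxOutOfOrdernessMs;
    }

    /** Records an event and returns true if it arrived behind the watermark (late). */
    public boolean observe(long eventTimestampMs) {
        boolean late = eventTimestampMs < currentWatermark();
        maxSeenTimestampMs = Math.max(maxSeenTimestampMs, eventTimestampMs);
        return late;
    }

    public static void main(String[] args) {
        WatermarkSketch tracker = new WatermarkSketch(2_000L); // assumed 2 s allowance
        long[] eventTimes = {10_000L, 9_000L, 12_000L, 9_500L};
        for (long eventTime : eventTimes) {
            boolean late = tracker.observe(eventTime);
            System.out.println("event=" + eventTime
                    + " late=" + late
                    + " watermark=" + tracker.currentWatermark());
        }
    }
}
```

In the example run, the event at 9000 is out of order but still within the allowance, while the event at 9500 arrives after the watermark has advanced to 10000 and is therefore flagged as late.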
Related Information
Flink application structure
Configuring RocksDB state backend
Enabling checkpoints for Flink applications
Enabling savepoints for Flink applications