
Chapter 8 Flume - Massive Log Aggregation

Foreword

 Flume is an open-source, distributed, reliable, and highly available massive
log aggregation system. It supports custom data transmitters for data
collection, performs simple processing on the data, and writes the data to
data receivers.
Objectives

 Upon completion of this course, you will be able to:


 Know what Flume is.
 Understand what Flume can do.
 Know the system architecture of Flume.
 Grasp the key features of Flume.
 Master Flume applications.
Contents

1. Overview and Architecture

2. Key Features

3. Applications

What Is Flume?
 Flume is a streaming log collection tool. It performs simple processing on
data and writes the data to data receivers. Flume can collect data from local
files (spooling directory source), real-time logs (taildir and exec sources),
REST messages, Thrift, Avro, syslog, Kafka, and other data sources.
What Can Flume Do?
 Collect log information from a fixed directory to a destination (HDFS, HBase, or
Kafka).
 Collect log information to the destination in real time (taildir).
 Support cascading (connecting multiple Flume agents) and data conflation.
 Support user-defined data collection tasks.
Flume Agent Architecture
 Infrastructure: Flume can collect data directly through a single agent. This mode is mainly used for data collection within a cluster.

[Diagram] Log -> Agent (Source -> Channel -> Sink) -> HDFS

 Multi-agent architecture: Flume can connect multiple agents to collect raw data and store it in the final
storage system. This architecture is used to import data from outside the cluster into the cluster.

[Diagram] Log -> Agent 1 (Source -> Channel -> Sink) -> Agent 2 (Source -> Channel -> Sink) -> HDFS
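 As a minimal sketch of this hand-off (agent names, host, port, and file paths are assumptions for
illustration, not from this course), Agent 1 forwards events through an Avro sink to the Avro source of
Agent 2:

# Agent 1: tail a local log and forward events over Avro (host/port are assumptions)
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /var/log/app.log
agent1.sources.r1.channels = c1
agent1.channels.c1.type = memory
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = agent2-host
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1

# Agent 2: receive Avro events and write them to HDFS
agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1
agent2.channels.c1.type = memory
agent2.sinks.k1.type = hdfs
agent2.sinks.k1.hdfs.path = /tmp/flume_cascade
agent2.sinks.k1.channel = c1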
Flume Multi-Agent Consolidation
[Diagram] Three logs -> Agent 1, Agent 2, Agent 3 (each: Source -> Channel -> Sink) -> Agent 4 (Source -> Channel -> Sink) -> HDFS

 Using Flume, you can configure multiple level-1 agents and point them all to the source of a single
level-2 agent. The source of the level-2 agent consolidates the received events and sends them into a
single channel. The events in the channel are consumed by a sink and then pushed to the destination.
Flume Agent Principles

[Diagram] Within an agent, events flow from the Source into the Channel Processor, which applies
Interceptors and then uses the Channel Selector to choose the target Channel(s); the Sink Runner drives
the Sink Processor, which picks a Sink to consume events from a Channel.
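 To make the channel selector concrete, here is a minimal sketch (agent, source, channel, and header
names are assumptions) of a multiplexing selector that routes events to different channels based on the
value of an event header:

# Route events by the value of the "state" header (all names are assumptions)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c1

Events whose header matches no mapping are sent to the default channel.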
Basic Concepts - Source (1)
 A source receives events or generates events using a special mechanism, and
places the events into one or more channels. There are two types of sources:
driver-based sources and polling sources.
 Driver-based source: An external system proactively sends data to Flume, driving
Flume to receive the data.
 Polling source: Flume periodically polls for and obtains data.
 A source must be associated with at least one channel.
Basic Concepts - Source (2)

Source Type                | Description
exec source                | Executes a command or script and uses its output as the data source.
avro source                | Provides an Avro-based server that binds to a port and waits for data sent by Avro clients.
thrift source              | Same as the Avro source, but uses Thrift as the transport protocol.
http source                | Receives data sent in HTTP POST requests.
syslog source              | Collects syslog messages.
spooling directory source  | Collects local static files.
jms source                 | Obtains data from a message queue.
Kafka source               | Obtains data from Kafka.
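 As an example of one of these source types, the following minimal sketch (agent name, file paths, and
position-file location are assumptions) configures a taildir source that tails log files in real time and
records read positions so that collection can resume after a restart:

# Tail all .log files under /var/log/app in real time (paths are assumptions)
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/.*\.log
a1.sources.r1.positionFile = /var/flume/taildir_position.json
a1.sources.r1.channels = c1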
Basic Concepts - Channel (1)
 A channel is located between a source and a sink. A channel functions as a
queue and is used for caching temporary events. When a sink successfully sends
events to the channel of the next hop or the final destination, the events are
removed from the channel.
 The persistency of channels varies with the channel types:
 Memory channel: The memory in this channel type is not persistent.
 File channel: It is implemented based on write-ahead logs (WALs).
 JDBC channel: It is implemented based on the embedded database.
 Channels support transactions and provide weak ordering guarantees. They can
work with any number of sources and sinks.
Basic Concepts - Channel (2)
 Memory channel: Events are stored in memory, which provides high
throughput but no reliability guarantee. Data may be lost.
 File channel: Data is persisted to disk. The configuration is more complex: you
need to configure a data directory and a checkpoint directory, and a separate
checkpoint directory must be configured for each file channel.
 JDBC channel: A built-in Derby database persists events with high reliability.
It can be used as an alternative to the file channel when persistence is
required.
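 As a minimal sketch of a persistent channel (agent name and directories are assumptions), a file
channel needs its checkpoint and data directories to be configured:

# File channel persisted through write-ahead logs (directories are assumptions)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000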
Basic Concepts - Sink
 A sink sends events to the next hop or the final destination and then removes
the events from the channel.
 A sink must work with a specific channel.

Sink Type          | Description
HDFS sink          | Writes data to HDFS.
Avro sink          | Sends data to the next-hop Flume agent using the Avro protocol.
Thrift sink        | Same as the Avro sink, but uses Thrift as the transport protocol.
File roll sink     | Saves data to the local file system.
HBase sink         | Writes data to HBase.
Kafka sink         | Writes data to Kafka.
MorphlineSolr sink | Writes data to Solr.
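 For example, the file roll sink from the table above can be sketched as follows (agent name, output
directory, and roll interval are assumptions):

# Roll events into a new local file every 60 seconds (names and paths are assumptions)
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /var/flume/out
a1.sinks.k1.sink.rollInterval = 60
a1.sinks.k1.channel = c1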
Contents

1. Overview and Architecture

2. Key Features

3. Applications

Multi-level Cascading and Multi-channel Replication
 Flume supports cascading of multiple Flume agents and data replication within the cascading nodes.

[Diagram] Log -> Agent 1 (Source -> Channel -> Sink) -> Agent 2 (Source -> Channel 1 -> Sink 1 -> HDFS; Channel 2 -> Sink 2 -> HBase)
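 A minimal sketch of the replication step on Agent 2 (agent, channel, sink, and HBase table names are
assumptions): a replicating channel selector copies every event into two channels, one drained to HDFS
and the other to HBase:

# Copy every event into both channels (all names are assumptions)
a2.sources.r1.selector.type = replicating
a2.sources.r1.channels = c1 c2
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = /tmp/flume_replica
a2.sinks.k1.channel = c1
a2.sinks.k2.type = hbase
a2.sinks.k2.table = flume_events
a2.sinks.k2.columnFamily = cf
a2.sinks.k2.channel = c2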
Cascading Message Compression and Encryption
 Data transmission between cascaded Flume agents can be compressed and
encrypted, improving data transmission efficiency and security.

[Diagram] Application -> Flume API -> Flume agent (compression and encryption) -> Flume agent (decompression and decryption) -> HDFS/Hive/HBase/Kafka
Data Monitoring

[Diagram] MRS Manager collects Flume monitoring metrics along the path Application -> Flume API -> Flume -> HDFS/Hive/HBase/Kafka: the received data volume (Source), the cached data volume (Channel), and the sent data volume (Sink).
Failover
 Data can be automatically switched to another channel for transmission when the next-
hop Flume agent is faulty or data receiving is abnormal during Flume data
transmission.

[Diagram] Log -> Agent (Source -> Channel -> active Sink / standby Sink) -> two next-hop agents (each: Source -> Channel -> Sink -> HDFS); when the active path fails, events are switched to the standby sink.
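 Apache Flume expresses this behavior with a sink group and a failover sink processor. The following
minimal sketch (agent, group, and sink names are assumptions) keeps the higher-priority sink active and
fails over to the other sink when it breaks:

# Failover sink group: k1 is active, k2 takes over on failure (names are assumptions)
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000

maxpenalty caps the back-off time (in milliseconds) applied to a failed sink before it is retried.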
Data Filter During Data Transmission
 During data transmission, Flume can perform simple filtering and cleaning of data to remove
unneeded data. If the data to be filtered is complex, users need to develop filter plugins based on
their data characteristics. Flume supports third-party filter plugins.

[Diagram] Source -> Channel Processor -> Interceptor (filters events) -> Channel Selector -> Channels
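 As a minimal sketch of such filtering with a built-in interceptor (agent and interceptor names and the
pattern are assumptions), a regex_filter interceptor can drop events whose body matches a pattern:

# Drop all events whose body matches the DEBUG pattern (names are assumptions)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = .*DEBUG.*
a1.sources.r1.interceptors.i1.excludeEvents = true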
Contents

1. Overview and Architecture

2. Key Features

3. Applications

Flume Operation Example 1 (1)
 Description
 This example shows how Flume ingests logs generated by applications (such as e-
banking systems) in a cluster to HDFS.
 Prepare data.
 Create a log directory /tmp/log_test on a node in the cluster (mkdir /tmp/log_test).
 Use this directory as the monitoring directory.
 Download the Flume client.
 Log in to MRS Manager. On the Clusters page, choose Services > Flume > Download
Client.

Flume Operation Example 1 (2)
 Install the Flume client.
 Decompress the client.

tar -xvf MRS_Flume_Client.tar
tar -xvf MRS_Flume_ClientConfig.tar
cd /tmp/MRS-client/MRS_Flume_ClientConfig/Flume
tar -xvf FusionInsight-Flume-1.6.0.tar.gz

 Install the client.

./install.sh -d /opt/FlumeClient -f hostIP -c flume/conf/client.properties.properties
Flume Operation Example 1 (3)
 Configure the Flume source.

server.sources = a1
server.channels = ch1
server.sinks = s1
# the source configuration of a1
server.sources.a1.type = spooldir
server.sources.a1.spoolDir = /tmp/log_test
server.sources.a1.fileSuffix = .COMPLETED
server.sources.a1.deletePolicy = never
server.sources.a1.trackerDir = .flumespool
server.sources.a1.ignorePattern = ^$
server.sources.a1.batchSize = 1000
server.sources.a1.inputCharset = UTF-8
server.sources.a1.deserializer = LINE
server.sources.a1.selector.type = replicating
server.sources.a1.fileHeaderKey = file
server.sources.a1.fileHeader = false
server.sources.a1.channels = ch1

Flume Operation Example 1 (4)
 Configure the Flume channel.

# the channel configuration of ch1
server.channels.ch1.type = memory
server.channels.ch1.capacity = 10000
server.channels.ch1.transactionCapacity = 1000
server.channels.ch1.channelfullcount = 10
server.channels.ch1.keep-alive = 3
server.channels.ch1.byteCapacityBufferPercentage = 20
Flume Operation Example 1 (5)
 Configure the Flume sink.

server.sinks.s1.type = hdfs
server.sinks.s1.hdfs.path = /tmp/flume_avro
server.sinks.s1.hdfs.filePrefix = over_%{basename}
server.sinks.s1.hdfs.inUseSuffix = .tmp
server.sinks.s1.hdfs.rollInterval = 30
server.sinks.s1.hdfs.rollSize = 1024
server.sinks.s1.hdfs.rollCount = 10
server.sinks.s1.hdfs.batchSize = 1000
server.sinks.s1.hdfs.fileType = DataStream
server.sinks.s1.hdfs.maxOpenFiles = 5000
server.sinks.s1.hdfs.writeFormat = Writable
server.sinks.s1.hdfs.callTimeout = 10000
server.sinks.s1.hdfs.threadsPoolSize = 10
server.sinks.s1.hdfs.failcount = 10
server.sinks.s1.hdfs.fileCloseByEndEvent = true
server.sinks.s1.channel = ch1

Flume Operation Example 1 (6)
 Name the configuration file of the Flume agent properties.properties.
 Upload the configuration file.

Flume Operation Example 1 (7)
 Produce data in the /tmp/log_test directory.

mv /var/log/log.11 /tmp/log_test

 Check whether HDFS has data obtained from the sink.

hdfs dfs -ls /tmp/flume_avro

 In this case, log.11 is renamed log.11.COMPLETED by Flume, indicating that
the collection is successful.
Flume Operation Example 2 (1)
 Description
 This example shows how Flume ingests clickstream logs to Kafka in real time for
subsequent analysis and processing.
 Prepare data.
 Create a log directory named /tmp/log_click on a node in the cluster.
 Ingest data to Kafka topic_1028.

Flume Operation Example 2 (2)
 Configure the Flume source.

server.sources = a1
server.channels = ch1
server.sinks = s1
# the source configuration of a1
server.sources.a1.type = spooldir
server.sources.a1.spoolDir = /tmp/log_click
server.sources.a1.fileSuffix = .COMPLETED
server.sources.a1.deletePolicy = never
server.sources.a1.trackerDir = .flumespool
server.sources.a1.ignorePattern = ^$
server.sources.a1.batchSize = 1000
server.sources.a1.inputCharset = UTF-8
server.sources.a1.selector.type = replicating
server.sources.a1.basenameHeaderKey = basename
server.sources.a1.deserializer.maxBatchLine = 1
server.sources.a1.deserializer.maxLineLength = 2048
server.sources.a1.channels = ch1

Flume Operation Example 2 (3)
 Configure the Flume channel.

# the channel configuration of ch1
server.channels.ch1.type = memory
server.channels.ch1.capacity = 10000
server.channels.ch1.transactionCapacity = 1000
server.channels.ch1.channelfullcount = 10
server.channels.ch1.keep-alive = 3
server.channels.ch1.byteCapacityBufferPercentage = 20
Flume Operation Example 2 (4)
 Configure the Flume sink.

# the sink configuration of s1


server.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
server.sinks.s1.kafka.topic = topic_1028
server.sinks.s1.flumeBatchSize = 1000
server.sinks.s1.kafka.producer.type = sync
server.sinks.s1.kafka.bootstrap.servers = 192.168.225.15:21007
server.sinks.s1.kafka.security.protocol = SASL_PLAINTEXT
server.sinks.s1.requiredAcks = 0
server.sinks.s1.channel = ch1

Flume Operation Example 2 (5)
 Upload the configuration file to Flume.
 Run the Kafka command to view the data ingested into Kafka topic_1028.
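 As a sketch of that check (the client installation path and security configuration are assumptions that
depend on the cluster), a Kafka console consumer can read the topic:

# Consume from topic_1028 to verify ingestion (client path and security settings are assumptions)
kafka-console-consumer.sh --topic topic_1028 \
--bootstrap-server 192.168.225.15:21007 \
--consumer.config /opt/client/Kafka/kafka/config/consumer.properties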
Summary

 This chapter introduces the functions and application scenarios of Flume and
gives details on its basic concepts, key features, reliability mechanisms, and
configuration. After learning this chapter, you will be able to understand the
functions, application scenarios, configuration, and usage of Flume.
Thank you.
Bring digital to every person, home, and
organization for a fully connected,
intelligent world.

Copyright © 2020 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements,
including, without limitation, statements regarding future financial and
operating results, the future product portfolio, new technology, etc. There
are a number of factors that could cause actual results and developments to
differ materially from those expressed or implied in the predictive
statements. Therefore, such information is provided for reference purposes
only and constitutes neither an offer nor an acceptance. Huawei may change
the information at any time without notice.
