
TencentCLS: The Cloud Log Service with High Query Performances

Muzhi Yu
Peking University
Beijing, China
[email protected]

Zhaoxiang Lin
Tencent Cloud Computing (Beijing) Co., Ltd.
Beijing, China
[email protected]

Jinan Sun
Peking University
Beijing, China
[email protected]

Runyun Zhou
Tencent Cloud Computing (Beijing) Co., Ltd.
Beijing, China
[email protected]

Guoqiang Jiang
Tencent Cloud Computing (Beijing) Co., Ltd.
Beijing, China
[email protected]

Hua Huang
Tencent Cloud Computing (Beijing) Co., Ltd.
Beijing, China
[email protected]

Shikun Zhang
Peking University
Beijing, China
[email protected]

ABSTRACT
With the trend of cloud computing, cloud log service is becoming increasingly important, as it plays a critical role in tasks such as root cause analysis, service monitoring and security auditing. Cloud Log Service at Tencent (TencentCLS) is a one-stop solution for log collection, storage, analysis and dumping. It currently hosts more than a million tenants, and the top tenant can generate up to PB-level logs per day.
The most important challenge that TencentCLS faces is to support both low-latency and resource-efficient queries on such large quantities of log data. To address that challenge, we propose a novel search engine based upon Lucene. The system features a novel procedure for querying logs within a time range, an indexing technique for the time field, as well as optimized query algorithms dedicated to multiple critical and common query types.
As a result, the search engine at TencentCLS gains significant performance improvements over Lucene: a 20x performance increase with standard queries and a 10x performance increase with histogram queries in massive log query scenarios. In addition, TencentCLS supports storing and querying with microsecond-level time precision, as well as microsecond-level time order preservation.

PVLDB Reference Format:
Muzhi Yu, Zhaoxiang Lin, Jinan Sun, Runyun Zhou, Guoqiang Jiang, Hua Huang, and Shikun Zhang. TencentCLS: The Cloud Log Service with High Query Performances. PVLDB, 14(1): XXX-XXX, 2020. doi:XX.XX/XXX.XX

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at URL_TO_YOUR_ARTIFACTS.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 1 ISSN 2150-8097. doi:XX.XX/XXX.XX

1 INTRODUCTION
With the trend of cloud computing, cloud log service has become increasingly popular. Log services significantly simplify the collection and analysis of logs, and provide a one-stop solution for scenarios such as root cause analysis, service monitoring and security auditing.
Cloud log service also has huge business value and therefore attracts many companies. Not only have there been commercially successful enterprises dedicated to log services, such as Splunk [9] and Elastic [6], but many cloud vendors have also launched their own log service products [1, 4, 5].
Tencent Cloud Log Service (TencentCLS) is the log service product provided on Tencent Cloud, and it has experienced rapid growth in the past year (500% annual growth). In this paper, we describe some characteristics and challenges of the business scenarios faced by TencentCLS, and explain the architecture and the techniques employed within TencentCLS. We also provide experimental evaluations of the major techniques, in order to show the benefits of those designs.
The TencentCLS business scenarios have the following characteristics and challenges.
Heavy and Skewed Log Writes
The logs stored in TencentCLS are of large and skewed quantity. Currently, TencentCLS has millions of log topics, only about 10% of which are monthly active.
The logs collected per day for the active topics are highly skewed. Concretely, the top topic has more than 100 billion logs collected per day, while 90% of the active topics generate less than 10 million logs per day each.
Heavy and Skewed Log Queries
Not only the quantity of logs but also the query latency has a skewed distribution. As is shown in Figure 1, although the average latency of queries is below 1 second, there is a long-tail effect, and some queries take up to 30 seconds or even time out.
To be more precise, suppose that we used Lucene [12] as the search engine of TencentCLS: the index of the timestamp field of 10 billion logs would have a size of around 30 GB. Even loading the index from a disk drive with a speed of 150 MB/s takes a total of 200 seconds.

Figure 1: The latency distribution of different types of queries

According to the online data, around 95% of the queries ask for the latest daily logs. For those queries to be answered in less than 30 seconds, the number of logs written on each drive per day is limited to 1.5 billion. Therefore, for the top topic that generates 100 billion logs per day, a total of at least 67 disks (with a speed of 150 MB/s) would be required, which is very costly.
Histogram Queries are Common
In addition, for each query, TencentCLS shows the distribution over time of the logs that meet the conditions. We call the queries that support such visualization histogram queries; they collect the counts of hits in different time segments. Histogram queries are extremely common, but also resource demanding.
The above challenges become worse if we use higher precision for timestamps, because the index of the timestamp grows larger as the precision increases, ultimately slowing down the query process.
Therefore, it is both a necessary and a challenging task to design a system that supports low-latency queries on such large-quantity, highly-skewed log data.
Our solution is a novel log search engine featuring a time series index. It is based on Lucene and is optimized especially for log data. Compared with Lucene, it differs mainly in the following aspects.
(1) We keep the documents sorted according to their timestamps.
(2) We design a time series index dedicated to the time field.
(3) We design a search algorithm dedicated to tail queries (queries that are expected to return the last few hits).
(4) We optimize histogram queries (queries that are expected to return the distribution of the number of logs over time).
Thanks to this design, TencentCLS significantly lowers the query latency, and supports microsecond-level precision for timestamps at little cost. In the scenario described above, we only need 3 disks instead of 67 to achieve the same query latency.
It is worth noting that higher timestamp precision not only allows storing and querying with higher precision, but also keeps a microsecond-level time order among the retrieved logs. This is useful when applications generate multiple logs within the same second. With TencentCLS, it is more probable that those logs are retrieved in the same order as they were written.
We have conducted detailed experiments using an open benchmark to demonstrate the superiority of our solution compared to Lucene. Generally, our solution gains 7.5x to 38x performance increases across different types of queries. More detailed comparisons and analyses are shown in the experimental evaluation section. In addition to the open benchmark, we also show the performance increases using real-world data collected by Cloud Log Service at Tencent.
The paper is organized as follows. Section 2 gives the background on log search solutions. Section 3 describes the overall architecture of TencentCLS and its modules. Section 4 elaborates on the design of the search engine of TencentCLS. Section 5 provides both offline and online experimental evaluations of the search engine of TencentCLS.

2 BACKGROUND
Currently, there are many search engines in the industry, including Lucene [2, 12] and its variants [16, 21], Sphinx [8], Xapian [10], MG4J [7], etc. The most widely used are Lucene-based products and Sphinx. We list the characteristics of each solution below, explain why we choose to base the search engine of TencentCLS on Lucene, and describe a few weaknesses of Lucene that we need to address.

2.1 Search Engine Options
Sphinx [8] is a search engine library that features high-speed index generation and high-speed distributed search. It has good support for MySQL. However, as a static search engine, Sphinx is not suitable for real-time search or for scenarios with frequent data updates. It also has a high disk IO overhead.
Xapian [10] shares a similar design with Lucene, and also provides a rich and extensible set of APIs. It even achieves higher query performance than Lucene. However, it lacks the concepts of fields or columns, which makes it rather different from traditional databases.
Lucene [2], on the other hand, can generate indexes on fields and supports retrieval by field. It also supports real-time index generation, query and update. Lucene itself has an excellent object-oriented system architecture and integrates a powerful search engine. Based on Lucene, Solr and ElasticSearch both support distributed storage and distributed query.
Solr [3] was the most mature and stable Lucene-based indexing component before ElasticSearch. After its launch, ElasticSearch became far more popular than Solr, because ElasticSearch has more powerful real-time search capabilities as well as other advantages, including ease of use, near-zero configuration, a friendly RESTful query interface, and convenient cluster deployment and management. It also has the following features.
• High availability: ES supports indexing on shards and multi-node distributed query.
• Persistency: ES supports multi-machine backup, self-monitoring and balancing.
• Scalability: ES supports discovering and joining new nodes and horizontal auto scaling. Node failures do not affect the cluster.
Due to its many advantages and rapid development, ElasticSearch now enjoys a great community, and has many well-known enterprise users as well as a large number of startup companies. After comparative analyses, we have finally decided to use Lucene / ElasticSearch as the basis of our distributed log storage and distributed full-text search solution.

2.2 Weaknesses of Lucene
Lucene's support for range queries was not provided at the beginning, and when it was finally introduced, it had performance issues. In practice, the search can be very slow when there are many occurrences of terms in a single document. Although Lucene is often regarded as an efficient full-text search engine, its high performance is mostly limited to boolean queries.
Starting with Lucene version 6.0, a new index data structure, the BKD-Tree [19], was introduced for numeric datatypes to optimize the performance of range queries in Lucene.
The complexity of the BKD algorithm is linearly correlated with the index cardinality and the number of hits. Therefore, the original BKD algorithm is not suitable for massive log queries, whose timestamps are of high cardinality.

3 ARCHITECTURE
The architecture of TencentCLS is shown in Figure 2. The entire system is deployed on Tencent Cloud, using cloud services such as Elastic Compute Service, Cloud Object Storage, etc. Its components are described as follows.

Figure 2: The TencentCLS Architecture

3.1 Access Layer
The access layer receives, processes, and forwards requests to the other layers. It consists of multiple modules that take charge of authentication, validation, centralized flow control, and so on. Valid requests are eventually forwarded to the write layer or the query layer according to their types. Modules at this layer are deployed as containers, and their resources are automatically adjusted as demands change.

3.2 Stateless Write Layer
The write layer processes the write requests. It writes the data to the corresponding topic in the message queue of the data store layer. The layer is designed to be stateless, and the mapping between the tenant topics and the topics in the message queue is maintained by the Multi-Tenant Resource Manager, which is described below. The layer is deployed as containers, and supports auto scaling.

3.3 Stateless Query Layer
The query layer processes the query requests, and consists of functionalities such as query parsing, query translation, and the collection and aggregation of retrieved information. The layer is also designed to be stateless, thanks to the Multi-Tenant Resource Manager module. The layer is deployed as containers, and supports auto scaling.

3.4 Multi-Tenant Resource Manager
The multi-tenant [11] resource manager maintains the mappings from the topics of the tenants to three kinds of resources in the data store layer: the topics in the message queue, the indexes, and the buckets. To be more precise, each tenant topic corresponds to one topic in the message queue, many indexes, and one bucket. Therefore, we achieve good isolation between the data of different tenants.
We also conducted two optimizations. First, we slice the data into many indexes according to their timestamps, so that we can perform basic pre-filtering on queries. Second, since a large proportion of the tenants never write any data, we postpone the resource allocation in the data store layer to the point where the actual data write happens. To do so, we introduce the concept of a virtual storage resource (VSR), an abstraction of the storage resource. However, this optimization by itself increases the latency of the initial write. To mitigate the effect, we maintain a pool of resources in the data store layer so that the allocation is done beforehand, and only the data binding is done at the initial write.
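To make the lazy-binding idea concrete, the following sketch shows one way such a manager could be structured. It is our illustration under assumed names (VsrManager, StorageResource, resolve), not TencentCLS production code.

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: lazy binding of tenant topics to pre-allocated storage resources. */
public class VsrManager {
    /** A storage resource that was already allocated in the data store layer. */
    public record StorageResource(String messageQueueTopic, String bucket) {}

    // Resources allocated ahead of time, so the first write only pays for binding.
    private final Queue<StorageResource> warmPool = new ConcurrentLinkedQueue<>();
    // Bindings from tenant topic to its (virtual) storage resource.
    private final Map<String, StorageResource> bindings = new ConcurrentHashMap<>();

    /** Called by a background task to keep the pool filled. */
    public void preAllocate(StorageResource resource) {
        warmPool.add(resource);
    }

    /** Resolve the resource for a topic, binding one from the pool at the first write. */
    public StorageResource resolve(String tenantTopic) {
        return bindings.computeIfAbsent(tenantTopic, topic -> {
            StorageResource resource = warmPool.poll();
            if (resource == null) {
                // Pool exhausted: fall back to a slow, on-demand allocation.
                resource = allocateOnDemand(topic);
            }
            return resource;
        });
    }

    private StorageResource allocateOnDemand(String topic) {
        // Placeholder for the slow path that creates resources in the data store layer.
        return new StorageResource("mq-" + topic, "bucket-" + topic);
    }
}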
3.5 Data Store Layer
The data store layer consists of three parts: 1) the message queue, 2) the index storage, and 3) the cloud storage. The message queue smooths out the latency of write requests. To ensure data reliability, multiple copies of the data are kept in the queue, and a write request is acknowledged only after more than two copies have been successfully written.

3.6 Index Storage Layer
The index storage layer maintains the indexes for the different tenant topics. The implementation is based on Lucene. In order to support various kinds of queries, we build various indexes such as inverted indexes [14], SkipList [17] indexes and BKD-Tree [19] indexes. Also, column-oriented storage is adopted to support efficient analyses.
3.7 Object Storage Layer
The object storage layer takes care of data persistence. It also supports demands such as re-indexing from the stored objects in the event of an exception.

4 A SEARCH ENGINE OPTIMIZED FOR LOG QUERY
This section describes the search engine used in TencentCLS. The search engine is built upon Lucene and is highly optimized for log queries.
We begin with some basic examples of queries, and then briefly describe indexing and searching in Lucene. Next, we demonstrate the characteristics of log queries and explain why the default indexing and searching functionalities provided by native Lucene are not satisfactory. Finally, we propose our design, and elaborate on its differences from the Lucene search engine.

4.1 An Example Log Document and Log Query
A typical log document consists of a timestamp, text, and properties. Below is an example.

[2021-09-28T10:10:39.1234] [ip=192.168.1.1] XXXXXXXX

Normally, to accelerate log queries, the system creates indexes for the timestamp, the text and the properties respectively.
A typical log query specifies a few conditions and a time range. Below is an example.

SELECT * FROM xxxx_index
WHERE ip = 192.168.1.1
  AND timestamp >= 2021-09-28T00:00:00
  AND timestamp < 2021-09-29T00:00:00

4.2 Indexing and Searching in Lucene
In Lucene, every log document is assigned a unique number called a docid. When creating an index, an inverted index storing a mapping from contents to sets of docids is created.
For example, for the timestamp field, Lucene creates a postings list that maintains a mapping from all possible timestamps to sets of docids. Based on that, Lucene can quickly respond to queries that search for a given timestamp. The algorithmic complexity of such a query is O(log(n)), where n is the number of possible timestamps.

4.3 Characteristics and Challenges with Log Queries
Although Lucene is known to be good at full-text queries thanks to the design of the inverted index, its performance drops dramatically when searching numeric fields [15]. The performance gets even worse when searching high-cardinality numeric fields. Unfortunately, the timestamp field of log data is a high-cardinality numeric field. In fact, a maximum of 24*60*60*1000 = 86,400,000 unique values can be generated every day when using millisecond-level indexing. Therefore, searching with time conditions on massive log data with Lucene can be extremely slow.
What makes it even worse is that most log queries do not specify a single timestamp, but instead specify a range of time, as is shown in the above example. Such queries require even more time to finish, because Lucene has to scan through all the timestamps in the range and retrieve the corresponding docids. Therefore, a time range query on massive log data is almost guaranteed to time out.
Although there have been various optimizations and variants based on the inverted index [13, 15, 16, 18, 20], very few of them are battle tested. What we adopt here is a lightweight solution.

4.4 Our Solution: A Search Engine with Time Series Index
To achieve better performance with log queries and address the above problems, we propose a Lucene-based search engine with a time series index.
The core design choice we make with the TencentCLS search engine is that the log documents are always sorted by timestamp in ascending order.
We first explain how this design benefits the performance of log queries, then describe the overhead of applying it, and next provide some implementation details. Finally, we also describe other optimizations for specific query types.

4.4.1 Why keeping the log sorted in time. In order to explain why TencentCLS keeps the log sorted in time, we first describe what this changes the range query procedure into. The new procedure is provided below.
(1) Suppose the timestamp range is specified as [ts_i, ts_j]. We use the index to find the smallest docid, docid_p, that corresponds to ts_i, and the largest docid, docid_q, that corresponds to ts_j.
(2) The document id list is directly calculated as [docid_p, ..., docid_q]. Previously it had to be constructed by merging all postings lists for the timestamps within that range.
(3) Set operations might be performed on this document id list and other document id lists, in order to generate the final result.
Figure 3 and Figure 4 also demonstrate how the range query works, before and after applying the feature.

Figure 3: Range query with unordered documents. It requires visiting every timestamp index within that range in order to collect the documents.

Figure 4: Range query with ordered documents. It requires visiting only two timestamps, and the documents can be calculated based on the first docid and the last docid.

Given the above procedure, it can be concluded that once we successfully keep the documents sorted, the following merits are promised.
• In the aspect of storage, the BKD index (the data structure provided by Lucene to support range queries) for the timestamp is no longer required, since the column-oriented storage for timestamps is already sorted.
• The index read frequency is reduced, since we only need to locate the docids corresponding to the beginning and the end of the timestamp range.
• The CPU usage is reduced, since we can construct the postings directly from the docids corresponding to the beginning and the end of the timestamp range.
• The support for timestamps of higher precision becomes feasible.
Theoretically, keeping the documents sorted reduces the complexity of each query from O(n) to O(log(n)), where n is the number of hit documents.
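To make steps (1) and (2) concrete, the sketch below shows the core of a range query over a timestamp column that is already sorted by docid. It is our illustration of the procedure, not the TencentCLS implementation, and the helper names are ours.

import java.util.Arrays;

/** Illustration only: range query over a timestamp column that is sorted by docid. */
public final class SortedRangeQuery {

    /** Returns [docid_p, docid_q], the docid interval whose timestamps fall in [tsI, tsJ],
     *  or null if no document matches. Assumes timestamps[docid] is non-decreasing. */
    public static long[] range(long[] timestamps, long tsI, long tsJ) {
        int p = lowerBound(timestamps, tsI);          // first docid with timestamp >= tsI
        int q = lowerBound(timestamps, tsJ + 1) - 1;  // last docid with timestamp <= tsJ
        return (p <= q) ? new long[] {p, q} : null;
    }

    /** Classic lower-bound binary search: O(log n) instead of scanning every timestamp. */
    private static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] ts = {10, 20, 20, 35, 50, 70};          // timestamps, already sorted by docid
        System.out.println(Arrays.toString(range(ts, 20, 50))); // prints [1, 4]
    }
}

Any further filter (step 3) can then intersect its own docid list with this contiguous interval instead of merging per-timestamp postings lists.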
4.4.2 Implementation of the Sorting Mechanism. The function that keeps the documents sorted is implemented using the existing index-sorting feature in Lucene. The native index-sorting has two functionalities. First, the specified field is kept sorted. Second, early termination is applied to increase performance. The early-termination feature is explained as follows.
By default, a search request in Lucene must visit every document that matches the query in order to return the top documents under a defined sort. When the index sort and the search sort are the same, it is feasible to limit the number of documents that must be visited per segment in order to obtain the top N documents globally. With early termination, Lucene only compares the first N documents per segment if it detects that the top docs of each segment are already sorted in the index. The remaining documents that fit the query are still gathered in order to count the overall number of results and create aggregations.
For example, when we want the latest 10 log entries, if index-sorting is not enabled, we have to sort all the log data by timestamp and return the latest 10. With index-sorting enabled, we only need to iterate over the latest 10 log entries.
However, in practice we find that simply turning on the index-sorting function implemented in Lucene achieves little to even negative performance improvement with log queries. After some analysis, we find that there are some other issues that need to be solved before we can benefit from index-sorting when processing log queries. Those optimizations are described in Section 4.5.1.
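For reference, the following minimal example shows how index sorting on a numeric timestamp field is typically enabled in plain Lucene. It is a sketch assuming a recent Lucene version (the sort field must be indexed as doc values), not the TencentCLS configuration, and the index path is arbitrary.

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.FSDirectory;

public class SortedLogIndex {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // Keep every segment sorted by the timestamp doc-values field (ascending).
        config.setIndexSort(new Sort(new SortField("timestamp", SortField.Type.LONG)));

        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/log-index")), config)) {
            long ts = System.currentTimeMillis();
            Document doc = new Document();
            doc.add(new NumericDocValuesField("timestamp", ts)); // used by the index sort
            doc.add(new LongPoint("timestamp", ts));             // used by range queries
            doc.add(new StoredField("raw", "[ip=192.168.1.1] XXXXXXXX"));
            writer.addDocument(doc);
        }
    }
}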
4.4.3 Overhead of Keeping the Log Sorted. The overhead of keeping the log sorted is also important to consider. According to our experiments and analysis, enabling index-sorting has only a slight effect on log writes, increasing the CPU usage by approximately 6.5%, a value that is perfectly acceptable for our system. For example, if the average CPU usage was 30% before enabling index-sorting, it would now become about 32%.

4.4.4 Microsecond-level Time Order Preservation. We have also noticed that many commercial and open-source log service solutions do not support microsecond-level time order preservation, which is an urgent need for many time-critical log analysis scenarios. Thanks to the above design, our solution already guarantees microsecond-level time order preservation with no additional effort, while still keeping the query latency low.

4.5 Additional Optimizations
In addition to the major design described above, we have also implemented other optimizations in the search engine of TencentCLS. The motivation for and the description of those optimizations are given below.

4.5.1 Optimization 1. Secondary Indexing. We analyzed the reason why simply using index-sorting on the timestamp field yields little performance gain. In Lucene, searching the sorted field is accomplished by performing binary search in the column-oriented storage of that column. The problem is that the index for the log data is too large (a few tens of gigabytes), so the binary search for the beginning and end timestamps requires a few tens of random accesses, which are expensive on slow storage devices.
For example, the index for 10 billion log entries has a size of around 30 GB, and therefore even the process of loading the index data would cost 300 seconds at a speed of 100 MB/s.
To address that problem, we build a secondary index that decreases the number of disk accesses from a few tens to around 3, as is demonstrated in Figure 5 and Figure 6.

Figure 5: Binary search for timestamp endpoints directly on column-oriented storage

Figure 6: Binary search for timestamp endpoints with secondary index
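The sketch below illustrates the idea with a sparse, in-memory secondary index: one timestamp sample per block of docids is kept in memory, so locating an endpoint touches only one block of the on-disk column. This is our simplification; the actual on-disk layout and block size used by TencentCLS are not shown here.

/** Illustration of a sparse secondary index over a timestamp column sorted by docid. */
public final class SecondaryTimestampIndex {

    /** Abstracts the docid-ordered timestamp column kept on disk. */
    public interface ColumnReader {
        long timestampOf(int docid);  // one random read per call
        int size();
    }

    private final long[] sampled;     // in memory: timestamp of every blockSize-th docid
    private final int blockSize;

    public SecondaryTimestampIndex(ColumnReader column, int blockSize) {
        this.blockSize = blockSize;
        int blocks = (column.size() + blockSize - 1) / blockSize;
        this.sampled = new long[blocks];
        for (int b = 0; b < blocks; b++) {
            sampled[b] = column.timestampOf(b * blockSize);
        }
    }

    /** Smallest docid whose timestamp is >= ts, or column.size() if there is none.
     *  Only one block of the on-disk column is read, instead of ~log2(n) random reads. */
    public int firstDocIdAtOrAfter(long ts, ColumnReader column) {
        int lo = 0, hi = sampled.length;           // binary search over in-memory samples
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sampled[mid] < ts) lo = mid + 1; else hi = mid;
        }
        int block = Math.max(0, lo - 1);           // the block that may contain the boundary
        int start = block * blockSize;
        int end = Math.min(start + blockSize, column.size());
        for (int docid = start; docid < end; docid++) {
            if (column.timestampOf(docid) >= ts) return docid;
        }
        return end;                                // boundary sits at the start of the next block
    }
}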
Figure 7: Head query and tail query

4.5.2 Optimization 2. Reverse Binary Search Algorithm for Tail Queries. We find that queries can be divided into two groups, head queries and tail queries, and that the latter can be optimized.
We define head queries as queries that search for the first few entries that satisfy the given conditions, and tail queries as queries that search for the last few entries, as is shown in Figure 7. Given that the log data are sorted in ascending order by time, head queries search for the oldest logs that meet the conditions, while tail queries search for the newest logs. We also provide an example of a tail query below.

SELECT * FROM xxx_index
WHERE ...
ORDER BY timestamp
DESC LIMIT 10;

Although both kinds of queries look similar, when it comes to tail queries, Lucene's implementation can be very inefficient, for the following reasons.
The iterators implemented in Lucene only support one-way iteration. Therefore, for tail queries, we have to iterate through all the data till the end, as is shown in Figure 7. The complexity of this process is O(n), where n is the number of documents that meet the condition.
Even if we added support for reverse iteration on top of Lucene, tail queries would still be inefficient. The reason is that reverse access to the disks renders the file cache provided by the operating system ineffective.
Therefore, to address the inefficiency of tail queries, we propose the Reverse Binary Search algorithm. The algorithm is implemented on top of the existing iterators of Lucene. In effect, the algorithm reduces the complexity of tail queries from O(n) to O(log(n)).
The execution of the algorithm consists of two steps. The first step uses a binary search to locate the second-to-last document that meets the given conditions, storing every middle point visited during the search. The second step iterates over the collection of middle points. For every middle point, we examine whether there exist K documents that meet the conditions. If there are K such documents, the execution is finished and K documents are returned. If not, we continue the process and examine the next middle point.
The algorithm is demonstrated in Figure 8, as well as in Algorithm 1.

Figure 8: Demonstration of the Reverse Binary Search algorithm for tail queries.

Algorithm 1 The Reverse Binary Search algorithm.
MiddlePoints ← BinarySearch(Hits)
    ⊲ Here BinarySearch refers to a modified algorithm that returns a series of middle points instead of the found document
for each MiddlePoint ∈ MiddlePoints do
    iterator ← Iterator(MiddlePoint)
    count ← 0
    documents ← {}
    for each document ∈ iterator do
        documents.add(document)
        count ← count + 1
    end for
    if count >= K then
        return the first K elements of documents
    end if
end for
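The following sketch is our reading of Algorithm 1 in plain Java: the middle points of a binary search toward the end of the docid space define progressively smaller tail windows, and the smallest window that still contains K hits is scanned with an ordinary forward iterator. The names and the predicate-based interface are ours, not the TencentCLS API.

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustration of the reverse-binary-search idea for tail queries. */
public final class TailQuerySearch {

    /** Returns the docids of the newest (largest) k matches, oldest first,
     *  using only forward scans that start at binary-search middle points. */
    public static List<Integer> lastKMatches(int numDocs, IntPredicate matches, int k) {
        // Step 1: record the middle points of a binary search that walks toward the end;
        // each recorded point halves the remaining tail window.
        List<Integer> middlePoints = new ArrayList<>();
        int lo = 0, hi = numDocs;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            middlePoints.add(mid);
            lo = mid + 1;
        }
        // Step 2: try the middle points from the one nearest the end backwards.
        for (int i = middlePoints.size() - 1; i >= 0; i--) {
            int start = middlePoints.get(i);
            List<Integer> found = new ArrayList<>();
            for (int docid = start; docid < numDocs; docid++) {   // forward-only iteration
                if (matches.test(docid)) found.add(docid);
            }
            if (found.size() >= k) {
                return found.subList(found.size() - k, found.size()); // the newest k hits
            }
        }
        return List.of(); // fewer than k matches in the whole index
    }

    public static void main(String[] args) {
        // Example: docids divisible by 3 match; ask for the 2 newest among docids 0..19.
        System.out.println(lastKMatches(20, d -> d % 3 == 0, 2)); // prints [15, 18]
    }
}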
4.5.3 Optimization 3. The Histogram Query. Lastly, an optimization is carried out for histogram queries. The histogram query is a very common type of query, which asks for the distribution over time of the number of logs that meet a set of conditions.
By default, to handle such queries, Lucene first filters the logs by the conditions, and then checks the timestamps of the remaining logs. However, this process may cause tens of thousands of look-ups in the table, which leads to huge latency. This matters especially for TencentCLS, where a histogram query almost always accompanies a normal query, in order to provide a better user experience.
Our solution is that, instead of checking the timestamp of every hit log document, we only gather the edges of each bin of the histogram. Since the logs are sorted, the number of logs within a particular bin can be directly calculated from the docids of its two endpoints. With this technique, we reduce the number of look-ups from tens of thousands to under ten.
The process is shown in Figure 9.

Figure 9: Optimization for histogram queries. The number of logs in each bin is directly calculated using the docids of the endpoints. For example, for bin [ti, tj), the corresponding number of logs is calculated as docid_j - docid_i.
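A minimal sketch of the bin counting follows. It is our illustration, with assumed array-based inputs rather than the actual TencentCLS structures: each bin edge is resolved to a docid with one endpoint lookup, and the per-bin count is the difference of consecutive edge docids.

import java.util.Arrays;

/** Illustration: histogram counts from endpoint docids over a time-sorted docid space. */
public final class HistogramCounts {

    /** binEdges has length numBins + 1; counts[b] = number of docs with
     *  binEdges[b] <= timestamp < binEdges[b + 1]. Timestamps are sorted by docid. */
    public static long[] counts(long[] timestamps, long[] binEdges) {
        long[] counts = new long[binEdges.length - 1];
        // One endpoint lookup per bin edge instead of one lookup per hit document.
        int[] edgeDocids = new int[binEdges.length];
        for (int e = 0; e < binEdges.length; e++) {
            edgeDocids[e] = lowerBound(timestamps, binEdges[e]);
        }
        for (int b = 0; b < counts.length; b++) {
            counts[b] = edgeDocids[b + 1] - edgeDocids[b]; // docid_j - docid_i
        }
        return counts;
    }

    private static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] ts = {100, 101, 105, 110, 111, 119, 130};
        long[] edges = {100, 110, 120, 140};              // bins [100,110), [110,120), [120,140)
        System.out.println(Arrays.toString(counts(ts, edges))); // prints [3, 3, 1]
    }
}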
5 EXPERIMENTAL EVALUATION
The experimental evaluations are mainly to demonstrate the effectiveness of the design of the search engine in TencentCLS.
Overall, the experiments consist of two parts: offline experiments with open benchmarks and online experiments with real-world data. The first part is relatively cheap to perform, and we use it to analyze the performance gains of our methods under various scenarios. The second part, on the other hand, provides more convincing evidence of the effectiveness of our solution, since it uses real-world data at TencentCLS.

5.1 Open Benchmark Evaluation
In the open dataset experiments, we quantitatively investigate the effectiveness of the query optimizations in a single-machine scenario.
The experiments are performed on Tencent Cloud machines, each with a 16-core vCPU and 64 GB of RAM. The storage devices are local NVMe SSD drives (IT3.4XLARGE64), local SATA HDD drives (D3.4XLARGE64) and Tencent Premium Cloud Storage.
The benchmark we use is the NYC taxi benchmark provided by esrally. The dataset consists of taxi ride information in New York in 2015, and contains a total of up to 12 billion documents. Some important statistics for this benchmark are listed in Table 1.

Table 1: Statistics of the NYC Taxi Benchmark

Name                                   Value
No. of documents                       ~12 b
No. of shards                          6
Average ES segment size                ~5 GB
No. of documents per ES segment        ~24 m
Average No. of hits per query          ~40 m

The experiments are designed with the goal of analyzing the performance increases in the following scenarios.
(1) Different types of queries: head queries, tail queries, and histogram queries (defined in Section 4.5.2 and Section 4.5.3).
(2) Different types of storage devices: Tencent Premium Cloud Storage, NVMe SSD drives, and SATA HDD drives.
(3) Different numbers of users: 1, 2, 4, 6, 8, 10, 15, 20, 50, 100, 150, 200.
(4) Different timestamp precisions: second-level and millisecond-level.
The most important scenario is the one that uses Tencent Premium Cloud Storage as the storage device and adopts second-level timestamp precision. The reason for prioritizing Tencent Premium Cloud Storage is that TencentCLS is built on Tencent Cloud, and the reason for using second-level timestamp precision is that the timestamps in the benchmark dataset have second-level precision.
Based on the results, we have been able to answer the following research questions.

5.1.1 RQ1. What is the overall performance increase compared with Lucene? Under the most important scenario described above (Tencent Premium Cloud Storage + second-level time precision), using the inverse of the service time (in milliseconds) as the indicator of performance, we observe that the performance increases by 38x for head queries, 26x for tail queries and 7.5x for histogram queries.
Figure 10 shows a more detailed result, distinguishing the performances under different user counts. Generally, the performance increase grows steadily as the user count gets higher. The reason is that when the user count is low, the workload is low, and the strength of the system design is not fully displayed. Therefore, we always use the results from the heaviest workload as the arguments for our analyses.

5.1.2 RQ2. How much does each of the optimization techniques contribute to the performance improvements? There are four optimization techniques described in this paper:
• O0: Keeping the documents sorted.
• O1: Constructing the secondary index for the timestamp field.
• O2: The reverse binary search algorithm for tail queries.
• O3: Optimizing histogram queries.
We turn them on and off individually (where we can) in order to understand the contribution of each technique to the overall performance increase. The rest of the experimental setup is the same as the one used in RQ1.
Results show that turning on O0 alone increases the head query performance by 12x, the tail query performance by 3x, and the histogram query performance by 3x.
On top of that, turning on the secondary index (O1) further increases the head query performance by 3x, but has little effect on the performance of the other types of queries.
Furthermore, the Reverse Binary Search optimization (O2) increases the tail query performance by 3.5x, while the histogram optimization (O3) increases the histogram query performance by 1.6x.
The results are shown in Figure 10, distinguishing the performances under different user counts, as well as in Table 2.

Figure 10: Performances for three types of queries with different optimization options. (a) Head queries. (b) Tail queries. (c) Histogram queries.

Table 2: Performances when turning on and off different optimization techniques. Multiplier refers to the boost multiplier of the current optimization config compared with the previous one. Accumulative Multiplier refers to the accumulative boost multiplier of the current optimization config. CPU / query refers to the CPU usage per query (CPU usage percentage * time). rMB refers to the disk read per query.

Head Query
                      Service Time    CPU / query    rMB / query
No Optimizations      604124.0        200.5          452.7
O0                    50318.2         7.3            37.3
  Multiplier          12.0            27.6           12.1
  Acc. Multiplier     12.0            27.6           12.1
O0 + O1               17224.8         5.5            12.5
  Multiplier          2.9             1.3            3.0
  Acc. Multiplier     35.1            36.5           36.2
O0 + O1 + O2 + O3     15904.2         5.2            12.1
  Multiplier          1.1             1.1            1.0
  Acc. Multiplier     38.0            38.9           37.3

Tail Query
                      Service Time    CPU / query    rMB / query
No Optimizations      585014.0        196.0          438.4
O0                    193487.0        831.7          144.3
  Multiplier          3.0             0.2            3.0
  Acc. Multiplier     3.0             0.2            3.0
O0 + O1               194551.0        821.8          82.2
  Multiplier          1.0             1.0            1.8
  Acc. Multiplier     3.0             0.2            5.3
O0 + O1 + O2 + O3     23931.0         34.4           17.1
  Multiplier          8.1             23.9           4.8
  Acc. Multiplier     24.4            5.7            25.6

Histogram Query
                      Service Time    CPU / query    rMB / query
No Optimizations      584511.0        116.4          438.0
O0                    179252.0        66.6           134.0
  Multiplier          3.3             1.7            3.3
  Acc. Multiplier     3.3             1.7            3.3
O0 + O1               183304.0        69.2           137.7
  Multiplier          1.0             1.0            1.0
  Acc. Multiplier     3.2             1.7            3.2
O0 + O1 + O2 + O3     76893.0         39.8           57.0
  Multiplier          2.4             1.7            2.4
  Acc. Multiplier     7.6             2.9            7.7

5.1.3 RQ3. How does the choice of the storage option affect the query performance, before and after the optimizations? Tencent Cloud provides a series of customizable storage options, among which Tencent Premium Cloud Storage, SATA HDD drives, and NVMe SSD drives are the most representative ones.
All the above analyses (RQ1 and RQ2) are based on the experiments using Tencent Premium Cloud Storage as the storage option. However, experimental results with other storage options are also important, because they not only show a comparison of the effectiveness of the optimization techniques, but also serve as guidance for choosing the storage option.
Tencent Cloud Premium Cloud Storage is a hybrid storage option. It adopts a cache mechanism to provide high-performance, SSD-like storage, and employs a three-copy distributed mechanism to ensure data reliability.
SATA HDD is the most economical option, suitable for scenarios that involve sequential reading and writing of large files, but its random access performance is relatively low.
NVMe SSD has the highest performance, but its low cost-performance ratio restricts its strength in log service scenarios.
Table 3 shows the comparison of the specifications of the three storage options.

Table 3: The specifications of different storage solutions at Tencent Cloud. IOPS is tested with 4 KiB IO, and throughput is tested with 256 KiB IO.

Disk Type                  IOPS       Throughput
Premium Cloud Storage      6,000      150 MB/s
NVMe SSD                   650,000    2.8 GB/s
SATA HDD                   200        190 MB/s

The experimental results with different storage options are shown in Table 4. We can draw the following conclusions. First, the NVMe SSD option consistently outperforms the other storage options, while the Tencent Premium Cloud Storage option is less than an order of magnitude behind. Second, compared with the NVMe SSD, the Tencent Premium Cloud Storage consistently enjoys more benefits from the query optimization techniques.

Table 4: Comparison of performance improvements among different storage solutions. For each storage solution, the three rows list the native performance, the performance after the optimizations, and the multipliers of the performance improvements, respectively. The results are tested under 200 concurrent users for Premium Cloud Storage and NVMe SSD, and under 150 concurrent users for SATA HDD because of its limited performance.

Head Query
                        Service Time    CPU / query    rMB / query
Premium Cloud Storage   604124.0        200.5          452.7
                        15904.2         5.2            12.1
                        38.0            38.9           37.3
NVMe SSD                84986.6         405.6          459.4
                        2704.1          9.0            9.6
                        31.4            45.3           47.6
SATA HDD                1426810.0       215.7          423.9
                        108863.0        8.6            14.0
                        13.1            25.1           30.2

Tail Query
                        Service Time    CPU / query    rMB / query
Premium Cloud Storage   585014.0        196.0          438.4
                        23931.0         34.4           17.1
                        24.4            5.7            25.6
NVMe SSD                77402.1         370.8          449.6
                        13134.5         61.1           17.3
                        5.9             6.1            26.0
SATA HDD                1448450.0       211.7          433.2
                        183195.0        35.7           17.7
                        7.9             5.9            24.5

Histogram Query
                        Service Time    CPU / query    rMB / query
Premium Cloud Storage   584511.0        116.4          438.0
                        76893.0         39.8           57.0
                        7.6             2.9            7.7
NVMe SSD                53759.4         237.7          425.5
                        17333.5         77.4           48.9
                        3.1             3.1            8.7
SATA HDD                1326030.0       130.9          411.9
                        465770.0        42.4           58.1
                        2.8             3.1            7.1

5.1.4 RQ4. Will the increase of the timestamp precision level impact the query performance? It is also a goal of Cloud Log Service to support storing and querying higher-precision timestamps. Therefore, it is important to check how the increase of the timestamp precision level impacts the query performance. To this end, we change the timestamp from second to millisecond precision and analyze the query performance. The data also comes from the experiments using Tencent Premium Cloud Storage.
Interestingly, as is shown in Figure 11, increasing the timestamp precision has almost no impact on the query performance, thanks to the search engine design in TencentCLS.
The reason is that although the precision increases, the frequency of the log writes stays the same. Although theoretically some operations such as locating the endpoints get slower, after applying the secondary index optimization the difference in costs is significantly reduced. Also, those precision-sensitive operations do not take up a large proportion of the total service time. Therefore, generally speaking, the performance is virtually unaffected by the time precision.
In fact, the online version of TencentCLS runs with microsecond-level time precision thanks to the search engine design, while many vendors provide second-level time precision log services.

5.1.5 RQ5. What is the bottleneck of our system? We have also investigated the bottlenecks of our system by analyzing the CPU usage and the disk IO during the above experiments.
As is shown in Table 4, the IO performance is the main bottleneck for Premium-Cloud-Storage-based and SATA-HDD-based solutions, while the CPU performance becomes the bottleneck for NVMe-SSD-based solutions.
5.2 Online Test
In addition to the offline experiments with open benchmarks, we have also tested the system with real-world data.
The experiments involve two clusters, one equipped with ElasticSearch (version 7.10.1), and the other equipped with the search engine of TencentCLS. Each cluster consists of 3 master nodes as well as 40 data nodes. We select a single large log topic as input, and its data is written to both clusters at the same time.

Figure 11: Performances with second-level timestamp precision and millisecond-level timestamp precision, evaluated using the total service time (in milliseconds). (a) Head query performance. (b) Tail query performance. (c) Histogram query performance.

The results are shown in Table 5. Generally, the head/tail query performance increases by 20x, while the histogram query performance increases by 10x. Moreover, the new system supports histogram queries on 100 billion log documents and can process the queries within 20 seconds, while the original system already starts to time out on only 10 billion log documents.

Table 5: Results of the online experiment.

Head Query
# Log               10^9       10^10
Original (ms)       12882      16904
Ours (ms)           399        780
Boost Multiplier    32x        21x

Tail Query
# Log               10^9       10^10
Original (ms)       10577      17483
Ours (ms)           391        1299
Boost Multiplier    27x        13x

Histogram Query
# Log               10^9       10^10      5*10^10    10^11
Original (ms)       16623      >42764     TIMEOUT    TIMEOUT
Ours (ms)           1144       4253       10300      17920
Boost Multiplier    15x        >10x       N/A        N/A

6 CONCLUSION
In this paper, we introduce the motivation for TencentCLS and propose the architecture of TencentCLS. Then we elaborate on the design and optimizations of the search engine in TencentCLS, a system that supports low-latency queries over massive, high-cardinality data. Finally, we evaluate and analyze the performance of our search engine, both with open benchmarks and with online data in TencentCLS.

ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. We thank the Tencent Cloud staff for providing cloud resources and technical support. We also thank the ElasticSearch team for their support.

REFERENCES
[1] 2022. Amazon CloudWatch - Application and Infrastructure Monitoring. https://aws.amazon.com/cloudwatch/.
[2] 2022. Apache Lucene. https://lucene.apache.org/.
[3] 2022. Apache Solr. https://solr.apache.org/.
[4] 2022. Azure Monitor | Microsoft Azure. https://azure.microsoft.com/en-us/services/monitor/.
[5] 2022. Cloud Logging | Google Cloud. https://cloud.google.com/logging.
[6] 2022. Elastic. https://www.elastic.co/.
[7] 2022. MG4J: High-Performance Text Indexing for Java™. https://mg4j.di.unimi.it/.
[8] 2022. Sphinx: Open Source Search Engine. http://sphinxsearch.com/.
[9] 2022. Splunk. https://www.splunk.com.
[10] 2022. The Xapian Project. https://xapian.org/.
[11] Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger. [n.d.]. Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques. ([n.d.]), 12.
[12] Andrzej Białecki, Robert Muir, and Grant Ingersoll. 2012. Apache Lucene 4. 24 pages.
[13] Matteo Catena, Craig Macdonald, and Iadh Ounis. 2014. On Inverted Index Compression for Search Engine Efficiency. In Advances in Information Retrieval (Lecture Notes in Computer Science), Maarten de Rijke, Tom Kenter, Arjen P. de Vries, ChengXiang Zhai, Franciska de Jong, Kira Radinsky, and Katja Hofmann (Eds.). Springer International Publishing, Cham, 359–371. https://doi.org/10.1007/978-3-319-06028-6_30
[14] D. Cutting and J. Pedersen. 1990. Optimization for Dynamic Inverted Index Maintenance. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '90. ACM Press, Brussels, Belgium, 405–411. https://doi.org/10.1145/96749.98245
[15] Marcus Fontoura, Ronny Lempel, Runping Qi, and Jason Zien. 2005. Inverted Index Support for Numeric Search.
[16] Xiaoming Gao, Vaibhav Nachankar, and Judy Qiu. 2011. Experimenting Lucene Index on HBase in an HPC Environment. In Proceedings of the First Annual Workshop on High Performance Computing Meets Databases - HPCDB '11. ACM Press, Seattle, Washington, USA, 25. https://doi.org/10.1145/2125636.2125646
[17] Maurice Herlihy, Yossi Lev, Victor Luchangco, and Nir Shavit. 2007. A Simple Optimistic Skiplist Algorithm. In Structural Information and Communication Complexity, Giuseppe Prencipe and Shmuel Zaks (Eds.). Vol. 4474. Springer Berlin Heidelberg, Berlin, Heidelberg, 124–138. https://doi.org/10.1007/978-3-540-72951-8_11
[18] Giulio Ermanno Pibiri and Rossano Venturini. 2021. Techniques for Inverted Index Compression. Comput. Surveys 53, 6 (Nov. 2021), 1–36. https://doi.org/10.1145/3415148 arXiv:1908.10598
[19] Octavian Procopiuc, Pankaj K. Agarwal, Lars Arge, and Jeffrey Scott Vitter. 2003. Bkd-Tree: A Dynamic Scalable Kd-Tree. In Advances in Spatial and Temporal Databases, Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, Thanasis Hadzilacos, Yannis Manolopoulos, John Roddick, and Yannis Theodoridis (Eds.). Vol. 2750. Springer Berlin Heidelberg, Berlin, Heidelberg, 46–65. https://doi.org/10.1007/978-3-540-45072-6_4
[20] Hao Yan, Shuai Ding, and Torsten Suel. 2009. Inverted Index Compression and Query Processing with Optimized Document Ordering. In Proceedings of the 18th International Conference on World Wide Web - WWW '09. ACM Press, Madrid, Spain, 401. https://doi.org/10.1145/1526709.1526764
[21] Peilin Yang, Hui Fang, and Jimmy Lin. 2017. Anserini: Enabling the Use of Lucene for Information Retrieval Research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, Shinjuku, Tokyo, Japan, 1253–1256. https://doi.org/10.1145/3077136.3080721
