Detecting Logic Bugs of Join Optimizations in DBMS
1 INTRODUCTION
In the past decades, we have witnessed the evolution of modern DBMSs (Database Management
Systems) to support various new architectures such as cloud platforms and HTAP [15, 27], which
require increasingly sophisticated optimizations for query evaluation. The query optimizer is considered
one of the most complex and important components in a DBMS. It parses the input SQL queries and
generates an efficient execution plan with the assistance of built-in cost models. Implementation
errors in a query optimizer can result in bugs, including crashes and logic bugs. Crashes are easier
to detect, as the system halts immediately. Logic bugs, in contrast, are prone to be ignored because
they simply cause the DBMS to return incorrect result sets that are hard to detect. In this paper, we
focus on detecting these silent bugs.
∗ Sai Wu is the corresponding author.
Authors’ addresses: Xiu Tang, [email protected], Zhejiang University, China; Sai Wu, [email protected], Zhejiang
University, China; Dongxiang Zhang, [email protected], Zhejiang University, China; Feifei Li, Alibaba Group, China,
[email protected]; Gang Chen, Zhejiang University, China, [email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/5-ART55 $15.00
https://doi.org/10.1145/3588909
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 55. Publication date: May 2023.
Pivoted Query Synthesis (PQS) has recently emerged as a promising way to detect logic bugs
in DBMS [50]. Its core idea is to select a pivot row from a table and generate queries that fetch
this row as the result. A logic bug is detected if any synthesized query fails to return the pivot
row. PQS is mainly designed to support selection queries on a single table, and 90% of its reported
bugs are for queries involving only one table. A substantial research gap remains regarding
multi-table queries with different join algorithms and join structures, which are more error-prone
than single-table queries.
In Figure 1, we illustrate two logic bugs of MySQL for join queries. These two bugs can be
detected by our proposed tool in this paper. Figure 1(a) demonstrates a logic bug of hash join in
MySQL 8.0.18. In this example, the first query returns the correct result set because it is executed
with the block nested loop join. However, when the second query is issued with an inner hash join,
an incorrect empty result set is returned. This is because the underlying hash join algorithm asserts
that “0” and “−0” are not equal. In Figure 1(b), the logic bug is caused by semi-join processing in
MySQL’s newest version (8.0.28). In the first query, the nested loop inner join casts the data type
𝑣𝑎𝑟𝑐ℎ𝑎𝑟 to 𝑏𝑖𝑔𝑖𝑛𝑡 and produces a correct result set. But when the second query is executed with
hash semi-join, the data type 𝑣𝑎𝑟𝑐ℎ𝑎𝑟 is converted to 𝑑𝑜𝑢𝑏𝑙𝑒, resulting in data accuracy loss and
incorrect equivalence comparison.
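Both failure modes can be mimicked in a few lines. The sketch below is only an illustration and is not MySQL's implementation: it assumes a hypothetical hash join that buckets keys by their textual form, and it uses Python floats to stand in for the conversion to double. All function names are made up for this example.

```python
def nested_loop_join(left, right):
    """Compare keys numerically, so "0" and "-0" are treated as equal."""
    return [(l, r) for l in left for r in right if float(l) == float(r)]


def text_keyed_hash_join(left, right):
    """Assumed buggy variant: buckets are keyed by the raw text of the value,
    so "0" and "-0" fall into different buckets and never match."""
    buckets = {}
    for r in right:
        buckets.setdefault(r, []).append(r)
    return [(l, r) for l in left for r in buckets.get(l, [])]


left, right = ["0"], ["-0"]
print(nested_loop_join(left, right))      # one matching pair
print(text_keyed_hash_join(left, right))  # empty result

# Precision loss: 2**53 + 1 has no exact double representation,
# so two different integers compare equal after conversion.
as_text, as_bigint = "9007199254740993", 9007199254740992
print(int(as_text) == as_bigint)            # exact integer comparison
print(float(as_text) == float(as_bigint))   # comparison after conversion to double
```

The first pair of calls mirrors Figure 1(a) (same query, different join algorithm, different result); the last two lines mirror the cast difference described for Figure 1(b).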
Adopting query synthesis for logic bug detection in multi-table join queries is much more difficult
than that in single-table selection queries, due to two unique challenges:
• Result Verification: Previous approaches adopt the differential testing strategy to verify the
correctness of query results. The idea is to process a query using different physical plans. If
the plans return inconsistent result sets, a possible logic bug is detected. However, differential
testing has two drawbacks. First, some logic bugs affect multiple physical plans and
make them all generate the same incorrect result. Second, when inconsistent result sets are
observed, we need to manually check which plan generates the correct one, incurring high
overhead. A possible solution to the above problem is to obtain the ground-truth results for
an arbitrary testing query, which is not supported by existing tools.
• Search Space: The number of join queries that can be generated from a given database
schema is exponential in the number of tables and columns. Since we are unable to enumerate
all possible queries for verification, we need an effective query space exploration
mechanism that allows us to detect logic bugs as efficiently as possible.
In this paper, we propose Transformed Query Synthesis (TQS) as a remedy. It is a novel, general,
and cost-effective tool to detect logic bugs of join optimizations in DBMS. To address the first
challenge above, we propose the DSG (Data-guided Schema and query Generation) approach.
Given a dataset denoted as one wide table, DSG splits the dataset into multiple tables based on
detected normal forms. To speed up bug discovery, DSG also inserts some artificial noise data into
the generated database. We first convert the database schema into a graph whose nodes are the
tables and columns and whose edges are the relationships between the nodes. DSG performs random
walks on the schema graph to select tables for queries, and uses those tables to generate join expressions. For a
specific join query spanning over multiple tables, we can easily identify its ground-truth results
from the wide table. In this way, DSG can effectively generate (query, result) pairs for database
verification.
For the second challenge, we design the KQE (Knowledge-guided Query space Exploration)
approach. We first extend the schema graph to a plan-iterative graph, which represents the entire
query space. Each join query is then represented as a sub-graph. KQE adopts an embedding-based
graph index to score the generated query graphs by searching whether there are structurally similar
query graphs in already-explored space. The coverage score guides the random walk generator to
explore the unknown query space as much as possible.
To demonstrate the generality and effectiveness of our approach, we evaluated TQS on four
popular DBMSs: MySQL [41], MariaDB [37], TiDB [27] and PolarDB [15]. After 24 hours of running,
TQS successfully found 115 bugs, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and
23 in PolarDB. Root cause analysis attributes them to 7 types of bugs in MySQL, 5 in MariaDB,
5 in TiDB, and 3 in PolarDB. All the detected bugs have been submitted to the respective
communities, and we received positive feedback.
2 OVERVIEW
In this section, we formulate the problem definition and present an overview of our proposed
solution.
Fig. 2. Overview of TQS. TQS designs DSG (Data-guided Schema and query Generation) and KQE (Knowledge-
guided Query space Exploration) to detect the logic bugs of join optimizations in DBMS.
silent bugs (acting as hidden bombs) are more dangerous, since they are hard to detect and may
affect the correctness of applications.
In this paper, we focus on detecting logic bugs introduced by the query optimizer for multi-table
join queries. Specifically, we refer to those bugs as join optimization bugs. Using the notations
listed in Table 1, the join optimization bug detection is formally defined as:
Definition 2.1. For each query 𝑞𝑖 in the query workload 𝑄, we let the query optimizer execute
joins of 𝑞𝑖 with multiple physical plans, and verify its result sets 𝑆𝑞𝑖 with its ground truth 𝐺𝑇𝑞𝑖 . If
𝑆𝑞𝑖 ≠ 𝐺𝑇𝑞𝑖 , we find a join optimization bug.
Notation Definition
𝑑 input dataset
𝑇𝑤 wide table which has multiple columns
𝑆 generated database schema
𝐺𝑠 database schema graph
𝐺 plan-iterative graph
𝐺𝑞 query graph
𝑞𝑖 generated SQL query 𝑞𝑖
𝑡𝑟𝑎𝑛𝑠_𝑞 transformed SQL query with hints for query 𝑞
𝑆𝑞 query result set for a query 𝑞
𝐺𝑇𝑞 ground truth of query 𝑞’s query result set
enable the DBMS to execute multiple different physical plans for bug searching. The ground-truth
results of a join are identified by mapping the join graph back to the wide table.
After the schema is set up and the data are split, KQE extends the schema graph to a plan-iterative
graph. Each query is represented as a sub-graph. KQE builds an embedding-based graph index
over the embeddings of historical query graphs (i.e., those in the already-explored query space). The
intuition of KQE is to ensure that a newly generated query graph is as far away from its nearest neighbors in
the history as possible, namely, to explore new query graphs instead of repeating existing ones. KQE
achieves its effectiveness by scoring the generated query graphs based on their structural similarity
(to query graphs in history) and applying an adaptive random walk method for generation.
We summarize the core idea of TQS in Algorithm 1, which shows its two main components:
DSG (lines 2, 10 and 12) and KQE (lines 4, 8 and 9).
Given a dataset 𝑑 and a wide table 𝑇𝑤 sampled from 𝑑, DSG splits the single wide table 𝑇𝑤 into
multiple tables that form the database schema 𝑆 with normal form guarantees (line 2). Schema 𝑆
can be considered as a graph 𝐺𝑠 , where tables and columns denote vertices and edges represent
the relationships between them. DSG applies random walk on 𝐺𝑠 to generate the join expressions
of queries (line 10). In fact, a join query can be projected as a sub-graph of 𝐺𝑠 . By mapping the
sub-graph back to the wide table 𝑇𝑤 , DSG can easily retrieve the ground-truth results for the query
(line 12).
KQE extends the schema graph to a plan-iterative graph (line 4). To avoid tests on similar paths,
KQE builds an embedding-based graph index GI to index embeddings of the existing query graphs
(line 9). KQE updates the edge weights 𝜋 of the plan-iterative graph 𝐺 according to how much the
current query graph is structurally similar to existing query graphs (line 8). KQE scores the next
possible paths, which guides the random walk generator to favor exploring unknown query space.
For a query 𝑞𝑖 , TQS transforms the query with hint sets into 𝑡𝑟𝑎𝑛𝑠_𝑞𝑖 to execute multiple different
physical query plans (line 11). Finally, the result set of query 𝑡𝑟𝑎𝑛𝑠_𝑞𝑖 is compared with the ground
truth 𝐺𝑇𝑞𝑖 (line 14). If they are not consistent, a join optimization bug is detected (line 15).
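The final checking step can be sketched as follows. Algorithm 1 itself is not reproduced here; the executor callback, the hint names and the sample rows are hypothetical stand-ins, and comparing result sets as multisets is an assumption about how consistency is decided.

```python
from collections import Counter


def failing_hint_sets(execute, query, hint_sets, ground_truth):
    """Run `query` once per hint set and return the hint sets whose result
    multiset differs from the ground truth."""
    expected = Counter(ground_truth)
    return [hints for hints in hint_sets
            if Counter(execute(query, hints)) != expected]


def fake_execute(query, hints):
    # Hypothetical executor: pretends the hash-join plan loses the only row.
    return [] if hints == "hash_join" else [(10,)]


bad = failing_hint_sets(fake_execute, "SELECT ...",
                        ["nested_loop", "hash_join"], [(10,)])
print(bad)  # hint sets that disagree with the ground truth
```

Because every plan is compared against the ground truth rather than against another plan, a bug shared by all plans would still be reported.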
TQS currently focuses on bug detection of equi-join queries. However, the idea of DSG and KQE
can be extended to non-equal joins. The only challenge is how to generate and manage ground-truth
results, whose sizes increase exponentially for non-equal joins. We leave it to our future work.
Fig. 3. Schema generation of the shopping order dataset. Data in black is the original dataset, and data in
color is the noisy data which is injected in schema tables and then synchronized in the wide table.
To effectively support bug detection of multi-table join queries, we generate a test database
as a wide table and leverage schema normalization to split the wide table into multiple tables
in normal form. In addition, we propose an effective noise injection technique to increase the
probability of triggering logic bugs with inconsistent database states. With the constructed database,
join queries are generated via random walk on the schema graph and represented as abstract syntax
trees. Finally, we efficiently retrieve the ground-truth result sets of the generated queries,
assisted by the proposed bitmap index.
primary key 𝑅𝑜𝑤𝐼 𝐷 for all generated tables in order to recover ground-truth results. Moreover,
metadata about the implicit primary and foreign key relationships are also maintained.
Example 3.1. We use the example wide table in Figure 3 to illustrate the idea. The FD discovery
algorithm finds four valid FDs: {𝑜𝑟𝑑𝑒𝑟𝐼𝑑, 𝑔𝑜𝑜𝑑𝑠𝐼𝑑, 𝑢𝑠𝑒𝑟𝐼𝑑 → 𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒, 𝑢𝑠𝑒𝑟𝑁𝑎𝑚𝑒, 𝑝𝑟𝑖𝑐𝑒},
{𝑔𝑜𝑜𝑑𝑠𝐼𝑑 → 𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒, 𝑝𝑟𝑖𝑐𝑒}, {𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒 → 𝑝𝑟𝑖𝑐𝑒} and {𝑢𝑠𝑒𝑟𝐼𝑑 → 𝑢𝑠𝑒𝑟𝑁𝑎𝑚𝑒}. The FDs are
automatically selected for decomposing the wide table 𝑇𝑤 , and the explicit 𝑅𝑜𝑤𝐼𝐷 columns are
created. The example table is decomposed into schema 𝑆 with four tables: {𝑇1 (RowID, orderId,
goodsId, userId), 𝑇2 (RowID, userId, userName), 𝑇3 (RowID, goodsId, goodsName), 𝑇4 (RowID,
goodsName, price)}, where {orderId, goodsId, userId}, {goodsId}, {goodsName} and {userId} are implicit
primary keys, and {𝑇1.𝑢𝑠𝑒𝑟𝐼𝑑 → 𝑇2.𝑢𝑠𝑒𝑟𝐼𝑑}, {𝑇1.𝑔𝑜𝑜𝑑𝑠𝐼𝑑 → 𝑇3.𝑔𝑜𝑜𝑑𝑠𝐼𝑑} and {𝑇3.𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒 →
𝑇4.𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒} are implicit foreign key mappings.
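The decomposition in Example 3.1 can be sketched as a projection of the wide table onto each column group with duplicate elimination. The sample rows below are invented (they are not the data of Figure 3), the FD discovery step is assumed to have already produced the column groups, and the function name is hypothetical.

```python
def decompose(wide_rows, table_columns):
    """Project the wide table onto each column group and drop duplicates.

    Returns (tables, rowid_map): tables[name] is a list of projected rows whose
    list position serves as the RowID; rowid_map[i][name] is the RowID in table
    `name` that wide row i was split into."""
    tables = {name: [] for name in table_columns}
    seen = {name: {} for name in table_columns}
    rowid_map = []
    for row in wide_rows:
        mapping = {}
        for name, cols in table_columns.items():
            key = tuple(row[c] for c in cols)
            if key not in seen[name]:
                seen[name][key] = len(tables[name])
                tables[name].append(key)
            mapping[name] = seen[name][key]
        rowid_map.append(mapping)
    return tables, rowid_map


# Column groups follow the schema of Example 3.1; the rows are made up.
schema = {
    "T1": ["orderId", "goodsId", "userId"],
    "T2": ["userId", "userName"],
    "T3": ["goodsId", "goodsName"],
    "T4": ["goodsName", "price"],
}
wide = [
    {"orderId": 1, "goodsId": 10, "userId": 100, "goodsName": "book", "userName": "Ann", "price": 5},
    {"orderId": 2, "goodsId": 11, "userId": 100, "goodsName": "pen", "userName": "Ann", "price": 2},
    {"orderId": 3, "goodsId": 10, "userId": 101, "goodsName": "book", "userName": "Bob", "price": 5},
]
tables, rowid_map = decompose(wide, schema)
print(tables["T2"])   # one row per distinct user
print(rowid_map[2])   # where wide row 2 ended up in each table
```

The returned mapping plays the role of the RowID map table: it records, for each wide row, which row of each generated table it produced.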
To facilitate join query synthesis and ground-truth result generation, we also create a RowID
mapping table 𝑇𝑅𝑜𝑤𝐼𝐷𝑀𝑎𝑝 , which defines a mapping relation [𝑅𝑜𝑤𝐼𝐷, 𝑇𝑖 , 𝑟𝑜𝑤𝑗 ]. Here, 𝑟𝑜𝑤𝑗 denotes
the row id in table 𝑇𝑖 . The mapping relation lists the rows of the wide table 𝑇𝑤 that are split
to create the 𝑟𝑜𝑤𝑗-th row of table 𝑇𝑖 . Based on the RowID map table, we build a join bitmap index,
which is used to speed up the retrieval of ground-truth results. The bitmap index consists of 𝑘
bit arrays of size 𝑛, where 𝑘 and 𝑛 are the number of schema tables and data rows, respectively.
Each row is assigned a distinct 𝑅𝑜𝑤𝐼𝐷. In the bit array of table 𝑇𝑗 , the 𝑖-th bit is set to “1” if
the 𝑖-th record of table 𝑇𝑤 has produced some row in table 𝑇𝑗 ; otherwise, the 𝑖-th bit is set to “0”. If
the table is too large and results in a sparse bitmap, we apply an RLE (run-length encoding)-based
technique, the WAH encoding [57], to compress consecutive sequences of “0” or “1”.
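A minimal sketch of the join bitmap index, assuming the RowID map is available as a list of per-row dictionaries in which a missing or None entry means that the wide row produced no row in that table. Python integers serve as uncompressed bit arrays; WAH compression is not shown, and the sample map is invented.

```python
def build_join_bitmap(rowid_map, table_names):
    """Bit i of bitmaps[t] is 1 iff wide row i produced a row in table t."""
    bitmaps = {t: 0 for t in table_names}
    for i, mapping in enumerate(rowid_map):
        for t in table_names:
            if mapping.get(t) is not None:
                bitmaps[t] |= 1 << i
    return bitmaps


def set_bits(bitmap):
    """Return the RowIDs whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical RowID map with three wide rows.
rowid_map = [
    {"T1": 0, "T2": None, "T3": 0, "T4": 0},
    {"T1": 1, "T2": 0, "T3": 1, "T4": 1},
    {"T1": None, "T2": 1, "T3": None, "T4": None},
]
bitmaps = build_join_bitmap(rowid_map, ["T1", "T2", "T3", "T4"])
print(set_bits(bitmaps["T1"]))  # wide rows that contributed to T1
print(set_bits(bitmaps["T2"]))  # wide rows that contributed to T2
```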
Example 3.2. Figure 4 illustrates examples of RowID map table and join bitmap index of the data
in Figure 3. The row with RowID 5 of rowID map table records a split process, which splits the
row with RowID 5 in wide table 𝑇𝑤 to produce rows in table 𝑇1 , 𝑇2 , 𝑇3 and 𝑇4 with RowID 5, 1, 2
and 2 respectively. The row with RowID 0 of join bitmap index represents that tables (𝑇1 , 𝑇3 , 𝑇4 )
have rows split from the wide table with RowId 0, while table 𝑇2 has no corresponding row. Data in
color represents data updates after noise injection and will be discussed later.
Fig. 4. RowID map table 𝑇𝑅𝑜𝑤𝐼 𝐷𝑀𝑎𝑝 and the join bitmap index are built to retrieve the ground-truth of query
joins. Data in color represents data updates after noise injection.
𝐷𝑠 to violate the FDs and primary-foreign key relationships. The injected noise produces traceable
join results. The idea of noise injection is to corrupt a small fraction (𝜖) of the primary-foreign
key relationships by replacing original values with (1) boundary values (e.g., for an integer column and
a char(10) column, we replace the value with 65535 and ‘ZZZZZZZZZZ’, respectively), and (2) NULL values. For each
primary and foreign key column, we randomly pick an 𝜖 fraction of the tuples to perform the value replacement. The
produced noisy database follows the same schema 𝑆 as the original 𝐷𝑠 . When we inject noise, we
guarantee that the injected values are unique and do not violate the ground-truth results
of normal data.
The introduction of noises violates the consistency between the generated tables and the original
wide table 𝑇𝑤 . Since our ground-truth result is recovered from the wide table, we need to update
𝑇𝑤 according to the injected noise so that the wide table and the noisy database become consistent.
Suppose a noise is introduced into column 𝑐𝑜𝑙𝑘 of the 𝑟𝑜𝑤 𝑗 th row in table 𝑇𝑖 , we have two cases:
Case 1: If 𝑐𝑜𝑙𝑘 is the implicit primary key column, the affected rows in 𝑇𝑤 can be represented as
𝑅̄ = 𝑅𝑜𝑤𝑀𝑎𝑝(𝑇𝑖 , 𝑟𝑜𝑤𝑗 ), and the affected columns in 𝑇𝑤 can be represented as 𝐶̄ = 𝐹𝑑(𝑐𝑜𝑙𝑘 ), where
𝐹𝑑(𝑐𝑜𝑙𝑘 ) denotes the columns that are functionally dependent on column 𝑐𝑜𝑙𝑘 . Then, cells in table
𝑇𝑤 should be updated as follows:
\[
\begin{aligned}
\mathit{insertion}:\quad & T_w[N+1][col_k] = T_i[row_j][col_k], \\
& T_w[N+1][c_k] = T_w[r][c_k] \mid r \in \bar{R},\ \forall c_k \in \bar{C}; \\
\mathit{update}:\quad & T_w[r_j][c_k] = \mathit{NULL} \mid \forall r_j \in \bar{R},\ \forall c_k \in \bar{C}.
\end{aligned}
\]
Here, 𝑁 denotes the number of tuples in 𝑇𝑤 , and Case 1 involves two insertion rules and one
update rule. The insertion rules create a new tuple by copying the noisy data from 𝑇𝑖 and
its function-determined values, leaving the remaining values as NULL. In the wide table, we can
find multiple rows (∀𝑟𝑗 ∈ 𝑅̄) that can be functionally derived from the noisy row; we only need to
copy one of them (𝑟 ∈ 𝑅̄). The update rule, on the other hand, modifies the rows of 𝑇𝑤 that
relate to the 𝑟𝑜𝑤𝑗-th row of 𝑇𝑖 by setting the corresponding column values to NULL, since the
primary-foreign key joins are no longer valid.
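The Case 1 rules can be sketched on a wide table held as a list of dictionaries. This is an illustrative reading of the rules, not the authors' code: the function name, the argument layout and the sample data are hypothetical, and the set of affected rows is assumed to have been looked up in the RowID map beforehand.

```python
def sync_primary_key_noise(wide, affected, col_k, dependents, noisy_value):
    """Synchronize the wide table after a primary-key value was replaced by noise.

    affected:   RowIDs of wide rows that map to the noisy row (the set R-bar)
    dependents: columns functionally dependent on col_k (the set C-bar)
    Returns the RowID of the newly inserted wide row."""
    donor = dict(wide[affected[0]])          # copy one affected row before nulling
    new_row = {c: None for c in donor}       # remaining values stay NULL
    new_row[col_k] = noisy_value             # insertion rule 1: the noisy key
    for c in dependents:                     # insertion rule 2: dependent values
        new_row[c] = donor[c]
    for r in affected:                       # update rule: joins no longer match
        for c in dependents:
            wide[r][c] = None
    wide.append(new_row)
    return len(wide) - 1


# Made-up data: the userId of one user row is replaced by the boundary value 65535.
wide = [
    {"orderId": 1, "userId": 100, "userName": "Ann"},
    {"orderId": 2, "userId": 100, "userName": "Ann"},
    {"orderId": 3, "userId": 101, "userName": "Bob"},
]
new_id = sync_primary_key_noise(wide, [0, 1], "userId", ["userName"], 65535)
print(wide[new_id])  # the row that now represents the noisy user tuple
print(wide[0])       # an order whose user no longer joins
```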
Case 2: If the noise is created in the foreign key column of table 𝑇𝑖 , the tuples in table 𝑇𝑤 should
be updated as follows:
\[
\begin{aligned}
\mathit{insertion}:\quad & T_w[N+1][c_k] = T_w[r][c_k] \mid r \in \bar{R},\ \forall c_k \in col_k \cup \bar{C}; \\
\mathit{update}:\quad & T_w[r_j][col_k] = T_i[row_j][col_k] \mid \forall r_j \in \bar{R}, \\
& T_w[r_j][c_k] = \mathit{NULL} \mid \forall r_j \in \bar{R},\ \forall c_k \in \bar{C}.
\end{aligned}
\]
Case 2 thus involves one insertion rule and two update rules. This update process is explained
in Example 3.3.
After updating the wide table 𝑇𝑤 , we should adjust the RowID map table as well. We denote the
affected columns of the RowID map table as 𝐶𝑑𝑒𝑝 = T(𝑐𝑜𝑙𝑘 ∪ 𝐶̄), where T(·) returns the tables whose
columns are a subset of the given columns.
\[
\begin{aligned}
\mathit{insertion}:\quad & T_{RowIDMap}[N+1][C_{dep}] = T_{RowIDMap}[r][C_{dep}] \mid r \in \bar{R}; \\
\mathit{update}:\quad & T_{RowIDMap}[r_j][C_{dep}] = \mathit{NULL} \mid \forall r_j \in \bar{R}.
\end{aligned}
\]
The two rules are defined similarly to those for 𝑇𝑤 , and we omit the details.
Example 3.3. Figure 3 illustrates the process of noise injection. Initially, we inject noise into
tables 𝑇1 and 𝑇2 , which are highlighted with colored fonts. To maintain a correct ground-truth
result set, the corresponding rows of wide table 𝑇𝑤 also need to be synchronized according to the
injected noise in Tables 𝑇1 − 𝑇4 .
Fig. 5. Example of join query generation. The join expressions are generated by random walk on the schema
graph.
For table 𝑇2 , we inject a noisy value into the primary key 𝑢𝑠𝑒𝑟𝐼𝑑 of the first tuple. For 𝑇𝑤 , we
locate the rows corresponding to the noise via the RowID map table in Figure 4(a). The results are
the rows in 𝑇𝑤 with RowID in [0 − 2]. Using the insertion rule, a new tuple (tuple 8) is created in
𝑇𝑤 to refer to the noisy tuple in 𝑇2 . Similarly, based on the update rule, the tuples with RowID in
[0 − 2] need to change their userName to NULL values.
Similarly, table 𝑇1 is polluted with a noisy value in its foreign key column 𝑔𝑜𝑜𝑑𝑠𝐼𝑑. We locate
involved rows in wide table 𝑇𝑤 by mapping the noise through the RowID map table in Figure 4(a),
which is the tuple in 𝑇𝑤 with RowID 6. Because 𝑔𝑜𝑜𝑑𝑠𝐼𝑑 is the primary key of 𝑇3 , when joining
them together, the missing contents should be recovered. Therefore, a new row with RowID 9 is
created in 𝑇𝑤 to maintain contents of columns 𝑐𝑜𝑙𝑘 ∪ 𝐶. ¯ On the other hand, the tuples with column
𝑔𝑜𝑜𝑑𝑠𝐼𝑑 should be updated to noisy data, and the columns that can be functionally determined by
𝑔𝑜𝑜𝑑𝑠𝐼𝑑 should be updated to NULL.
Finally, the RowID map table is updated to reflect the injected noise, and the bits in the join bitmap
index whose RowID can no longer be found in the wide table are set to 0.
a join relationship is obtained and we move to the new table vertex 𝑣 𝑗 . On the other hand, if the
random walk process picks a (table-column) edge (say (𝑣𝑖 , 𝑣 𝑗 ) s.t. 𝑣𝑖 ∈ 𝑉𝑡 and 𝑣 𝑗 ∈ 𝑉𝑥 ), random
filters (i.e., a selection condition) are generated on the column 𝑣 𝑗 and the random walk process
continues from the preceding table vertex 𝑣𝑖 (but now excluding 𝑣 𝑗 ).
After the join expressions are generated by random walk on schema graph 𝐺𝑠 , DSG randomly
generates other expressions based on the join clauses. Generating these expressions is implemented
similarly to RAGS [53] and SQLSmith [52]. Note that we support sub-queries inside the IN/EXISTS
expressions of the WHERE clause.
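The walk described above can be sketched as follows. The graph encoding (two adjacency dictionaries), the placeholder filter `IS NOT NULL`, and the function names are assumptions made for this illustration; the generation of select lists, sub-queries and aggregates is not shown.

```python
import random


def random_walk_query(join_edges, column_edges, start, steps, rng):
    """Walk the schema graph from table `start`.

    A table-table edge yields a join condition and moves to the new table;
    a table-column edge yields a filter and stays on the current table."""
    tables, joins, filters = [start], [], []
    used = set()
    current = start
    for _ in range(steps):
        moves = [("join", e) for e in join_edges.get(current, [])
                 if e[0] not in tables]
        moves += [("filter", c) for c in column_edges.get(current, [])
                  if (current, c) not in used]
        if not moves:
            break
        kind, edge = rng.choice(moves)
        if kind == "join":
            other, col = edge
            joins.append(f"{current}.{col} = {other}.{col}")
            tables.append(other)
            current = other
        else:
            used.add((current, edge))  # exclude this column from later steps
            filters.append(f"{current}.{edge} IS NOT NULL")  # placeholder filter
    return tables, joins, filters


def to_sql(tables, joins, filters):
    sql = f"SELECT * FROM {tables[0]}"
    for table, cond in zip(tables[1:], joins):
        sql += f" INNER JOIN {table} ON {cond}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


# Schema graph of Example 3.1 (foreign-key edges in both directions).
join_edges = {
    "T1": [("T2", "userId"), ("T3", "goodsId")],
    "T2": [("T1", "userId")],
    "T3": [("T1", "goodsId"), ("T4", "goodsName")],
    "T4": [("T3", "goodsName")],
}
column_edges = {"T1": ["orderId"], "T2": ["userName"],
                "T3": ["goodsName"], "T4": ["price"]}
tables, joins, filters = random_walk_query(
    join_edges, column_edges, "T1", 4, random.Random(7))
print(to_sql(tables, joins, filters))
```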
Example 3.4. Figure 5 depicts a running example of join query generation. The random walk process
selects three tables (𝑇1 , 𝑇3 , 𝑇4 ) to join using join conditions involving columns 𝑔𝑜𝑜𝑑𝑠𝐼𝑑 and
𝑔𝑜𝑜𝑑𝑠𝑁𝑎𝑚𝑒 (e.g., nodes 5 and 6 in Figure 5). The join type of node 5 can be inner/outer/cross
join, and that of node 6 can be semi/anti-join. As Figure 5 shows, the table expressions of the from clause
(node 3) are set to the tables involved in the join processing. Then, the select clause (node 2)
and the where clause (node 7) are randomly constructed from the tables of the from clause and the random
walk results (i.e., the columns and their types). Aggregation operators are also supported
(node 8). Finally, we transform the AST back to a SQL statement.
Example 3.5. Now consider the query "SELECT price FROM T3 INNER JOIN T4 WHERE
T3.goodsName = T4.goodsName AND T3.goodsName = 'flower'". First, using the rules in Table 2,
we get the join bitmap of the query, 𝐵𝑖𝑡(𝑇3) ∧ 𝐵𝑖𝑡(𝑇4), and retrieve redundant data with RowID
{0 − 5, 7, 9} in the wide table 𝑇𝑤 by using the join bitmap index in Figure 4(b). Then, the duplicates are
removed based on the primary key 𝑔𝑜𝑜𝑑𝑠𝐼𝑑. Hence, the remaining results of the query include
tuples with RowID {0, 1, 5} in the wide table 𝑇𝑤 . Finally, filters and projections are applied,
and the ground truth of the query is “10”.
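The retrieval steps of Example 3.5 can be sketched as below. The data are invented and much smaller than Figures 3 and 4, bitmaps are plain Python integers, and an inner join is assumed to correspond to a bitwise AND of the participating tables' bitmaps (the full rule set of Table 2 is not reproduced).

```python
def ground_truth(wide, bitmaps, tables, dedup_key, predicate, projection):
    """Recover the ground-truth rows of an inner join from the wide table."""
    mask = bitmaps[tables[0]]
    for t in tables[1:]:
        mask &= bitmaps[t]                     # rows present in every joined table
    seen, result = set(), []
    for rowid, row in enumerate(wide):
        if not (mask >> rowid) & 1:
            continue
        key = tuple(row[c] for c in dedup_key)
        if key in seen:                        # drop redundant copies of a join row
            continue
        seen.add(key)
        if predicate(row):                     # apply the filter
            result.append(tuple(row[c] for c in projection))
    return result


# Hypothetical wide table: rows 0 and 1 are two orders of the same goods.
wide = [
    {"goodsId": 1, "goodsName": "flower", "price": 10},
    {"goodsId": 1, "goodsName": "flower", "price": 10},
    {"goodsId": 2, "goodsName": "vase", "price": 30},
    {"goodsId": None, "goodsName": None, "price": None},
]
bitmaps = {"T3": 0b0111, "T4": 0b0111}
gt = ground_truth(wide, bitmaps, ["T3", "T4"], ["goodsId"],
                  lambda r: r["goodsName"] == "flower", ["price"])
print(gt)  # ground truth of the query on this toy data
```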
Fig. 6. Plan-iterative graph. Queries can be mapped to sub-graphs in the plan-iterative graph.
Unfortunately, we cannot directly evaluate exact graph isomorphism for every generated
query, because determining whether a sub-graph exists in a graph (i.e., sub-graph
isomorphism matching) has been proven NP-complete [11, 22]. Recently, learning-based models have been
proposed for sub-graph analysis tasks. For example, NeuroMatch [61] and LMKG [17] are developed
for the sub-graph isomorphism problem. These algorithms treat the presence or absence of a
sub-graph as a binary classification problem. To support more general sub-graph isomorphism
search, GNNs (Graph Neural Networks) have been adopted by [20] and [60].
Motivated by learning-based approaches for sub-graph isomorphism search, we extend our
random walk scheme in Section 3.3 with an adaptive weighting strategy. In order to avoid repeatedly
exploring similar graph structures, we adjust the probabilities of random walks based on the
exploration history. Figure 7 illustrates our idea. To support the approximate evaluation of subgraph
isomorphism, KQE builds a graph index GI. Given a query 𝑞 and its corresponding sub-graph 𝐺𝑞
from the plan-iterative graph, GI first applies the similarity-oriented graph embedding approach [20]
to generate a unique high dimensional embedding 𝐸 (𝐺𝑞 ) for 𝑞. If two sub-graphs are isomorphic
or structurally similar, the cosine distance of their embeddings is expected to be below a threshold.
Then, GI applies the HD-Index [3] to support approximate KNN (K-Nearest Neighbor) search in the
high-dimensional space.
[Fig. 7: figure omitted. Recoverable labels: Historical Query Graphs, Embedding Model, Graph Index,
Nearest Neighbor Search, adaptive weighting, Query Graph, Plan-iterative Graph.]
We define the coverage score of a generated query graph 𝐺𝑞 with respect to historical sub-graphs (i.e., query
graphs of queries that have already been explored) as
\[
\mathit{coverage}(G_q) = \frac{1}{k} \sum_{i=1}^{k} \mathit{cosine\_similarity}\bigl(E(G_q), E(G_i)\bigr), \tag{2}
\]
where 𝐺𝑖 is the 𝑖-th nearest neighbor of the query graph 𝐺𝑞 returned by GI. A higher coverage
value indicates that similar graph structures have been explored and hence, this indicator allows us
to avoid repeatedly generating similar queries.
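Equation (2) can be sketched directly. The version below replaces the approximate KNN search of the graph index with an exact brute-force scan, which is an assumption made for brevity; embeddings are plain lists of floats, and if fewer than 𝑘 historical graphs exist the mean is taken over those available.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coverage(embedding, history, k):
    """Mean cosine similarity between `embedding` and its k most similar
    historical embeddings (exact search instead of an approximate index)."""
    if not history:
        return 0.0
    sims = sorted((cosine_similarity(embedding, h) for h in history), reverse=True)
    nearest = sims[:k]
    return sum(nearest) / len(nearest)


history = [[1.0, 0.0], [0.0, 1.0]]
print(coverage([1.0, 0.0], history, k=1))  # identical structure already explored
print(coverage([1.0, 0.0], history, k=2))  # averaged over both neighbors
```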
To guide the query generator to explore more diversified search spaces, KQE performs the random
walk according to the adaptive weights of the next possible edges. Consider a random walk that
follows a path 𝑃𝑞 = (𝑣0 , 𝑒1 , 𝑣1 , ..., 𝑒𝑘 , 𝑣𝑘 ) to generate a sub-graph 𝐺𝑞 , with 𝑣𝑘 being the current vertex.
KQE decides its next step by evaluating the transition probability 𝜋𝑒𝑥 for each
edge 𝑒𝑥 connected to vertex 𝑣𝑘 . A possible next query graph 𝐺′𝑞 can be created by adding edge
𝑒𝑥 and the corresponding vertex 𝑣𝑥 (i.e., 𝑒𝑥 = (𝑣𝑘 , 𝑣𝑥 )) to 𝐺𝑞 (i.e., 𝑃′𝑞 = (𝑣0 , 𝑒1 , 𝑣1 , ..., 𝑒𝑘 , 𝑣𝑘 , 𝑒𝑥 , 𝑣𝑥 )).
The transition probability on edge 𝑒𝑥 is set to
\[
\pi_{e_x} = \frac{1}{\mathit{coverage}(G'_q) + 1}. \tag{3}
\]
We also establish a termination mechanism with a probability 𝜋𝑒𝑛𝑑 : when the scores of all next
possible graphs are lower than that of the current query graph, the graph expansion is stopped. The
pseudocode for the adaptive random walk on the plan-iterative graph 𝐺 is given in Algorithm 2. At every
step of the walk, alias sampling is conducted based on the transition probability 𝜋𝑒𝑥 . Sampling of
nodes while simulating the random walk can be done efficiently in 𝑂(1) time complexity using
alias sampling.
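A minimal sketch of the weighting in Equation (3) combined with alias sampling (Vose's construction). The coverage values are assumed to be precomputed for the graph each candidate edge would produce and to be greater than -1; the weights are normalized inside the alias table, and all names are illustrative.

```python
import random


def transition_weights(coverages):
    """Equation (3): one weight per candidate edge, from the coverage of the
    query graph that edge would produce (assumes every coverage > -1)."""
    return [1.0 / (c + 1.0) for c in coverages]


def build_alias_table(weights):
    """Vose's alias method: O(n) setup, O(1) per sample."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]        # l donates the mass that s lacks
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias                       # leftovers keep probability 1.0


def alias_sample(prob, alias, rng):
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


weights = transition_weights([0.0, 1.0, 3.0])   # unexplored edge gets the largest weight
prob, alias = build_alias_table(weights)
rng = random.Random(0)
picks = [alias_sample(prob, alias, rng) for _ in range(20000)]
print([round(picks.count(i) / len(picks), 2) for i in range(3)])
```

Edges that lead to already well-covered structures receive smaller weights, so the walk is biased toward unexplored parts of the plan-iterative graph.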
The existence of KQE allows us to perform parallel query space exploration, where a central
server hosts the graph index and applies the adaptive random walk approach. When a new query is
generated, the server disseminates it to a random client, which maintains a replica of the database
and hosts an individual DSG process. This strategy effectively improves the efficiency of database
debugging, and the only bottleneck is the synchronization cost of the KQE on the server side.
Table 3. We tested a diverse set of popular and emerging DBMSs. All numbers are the latest as of July 2022.
DBMS      DB-Engines rank   Stack Overflow rank   GitHub stars   LOC    First release
MySQL     2                 1                     8.0k           3.8M   1995
MariaDB   12                7                     4.3k           3.6M   2009
TiDB      96                –                     31.8k          0.8M   2017
5 EXPERIMENT
We evaluate TQS on multiple DBMSs. Specifically, our goal is to answer the following
questions:
• Can TQS detect logic errors in the implementations of multi-table queries in real-world
production-level DBMSs? (section 5.1)
• Can TQS outperform state-of-the-art testing tools? (section 5.2)
• What are the main roles of the two core modules (namely DSG and KQE) of TQS? (section 5.3)
Tested DBMSs. In our evaluation, we consider three open-source DBMSs designed for different
purposes (see Table 3) to demonstrate the generality of TQS. According to the DB-Engines Ranking
[18], Stack Overflow’s annual Developer Survey [42], and GitHub, these DBMSs are among the
most popular and widely used DBMSs. We also run TQS against our own cloud-native database
system (PolarDB). PolarDB [15] is designed to run on elastic computation and storage resources
with high scalability and concurrency support. All selected DBMSs support hints for their optimizers
to intentionally change the physical plans [36, 40, 54].
We have evaluated TQS with both randomly generated TPC-H data and a dataset from the UCI
machine learning repository². However, since the TPC-H dataset follows a uniform distribution and
has a simple schema, all bugs reported on the TPC-H dataset are also covered by the UCI dataset.
Therefore, we only show the results on the UCI data. As TQS runs, it continuously reports bugs;
to be fair to all DBMSs, we only report the results for the first 24 hours. All DBMSs are run
with default configurations and compilation options.
Baseline approaches. We compare TQS with SQLancer³, which is the state-of-the-art
approach to detecting logic bugs in databases. SQLancer is not designed to test multi-table queries.
However, it can be tailored for multi-table queries by artificially generating queries and tuples
across more than one table. Similar to the single-table case, all queries and tuples are randomly
generated. We use three methods in SQLancer as our baselines. The first one is PQS [50], which
constructs queries to fetch a randomly selected tuple from a table. The tested DBMS may contain
a bug if it fails to fetch that tuple. The second one is TLP [50], which decomposes a query into
three partitioning queries whose combined results must equal the result of the original query. The third one is NoRec
[47], which targets logic bugs introduced by the optimization process in DBMS. It compares
the results of randomly generated queries with those of their rewritten counterparts.
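The TLP baseline can be illustrated with a small check: a query without a filter must return the same rows as the union of three partitions in which a predicate evaluates to TRUE, FALSE and NULL. The sketch uses Python's built-in sqlite3 module purely for convenience (SQLite is not one of the DBMSs tested here), and the table and predicate are made up.

```python
import sqlite3
from collections import Counter


def tlp_check(conn, base_query, predicate):
    """Return True if the three partitions of `predicate` reassemble the
    result of `base_query` (compared as multisets)."""
    whole = conn.execute(base_query).fetchall()
    parts = []
    for cond in (f"({predicate})",
                 f"NOT ({predicate})",
                 f"({predicate}) IS NULL"):
        parts += conn.execute(f"{base_query} WHERE {cond}").fetchall()
    return Counter(whole) == Counter(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (0,), (None,)])
print(tlp_check(conn, "SELECT c0 FROM t0", "c0 > 0"))
```

A False result would indicate that some row was lost or duplicated when the predicate was applied, which is the kind of inconsistency TLP reports.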
DBMS versions. Note that DBMS bugs will be fixed once reported to the community. In our
experiments, we report the results of the latest released versions during our testing, namely, MySQL
8.0.28, MariaDB 10.8.2, TiDB 5.4.0 and PolarDB beta 8.0.18 respectively.
2 https://archive.ics.uci.edu/ml/datasets/KDD+Cup+1998+Data
3 https://github.com/sqlancer/sqlancer
Table 4. Detected bugs. TQS found 20 bug types: 7 from MySQL, 5 from MariaDB, 5 from TiDB and 3 from
PolarDB.
Database         ID  Status    Severity       Description
MySQL 8.0.28     1   Fixed     S1 (Critical)  Semi-join gives wrong results.
MySQL 8.0.28     2   Fixed     S2 (Serious)   Incorrect inner hash join when using materialization strategy.
MySQL 8.0.28     3   Verified  S2 (Serious)   Incorrect semi-join execution results in unknown data.
MySQL 8.0.28     4   Verified  S2 (Serious)   Incorrect left hash join with subquery in condition.
MySQL 8.0.28     5   Verified  S2 (Serious)   Incorrect nested loop antijoin.
MySQL 8.0.28     6   Fixed     S2 (Serious)   Bad caching of converted constants in NULL-safe comparison.
MySQL 8.0.28     7   Verified  S2 (Serious)   Incorrect hash join with materialized subquery.
MariaDB 10.8.2   8   Verified  Major          Incorrect join execution by not allowing BKA and BKAH join.
MariaDB 10.8.2   9   Verified  Major          Incorrect join execution by not allowing BNLH and BKAH join.
MariaDB 10.8.2   10  Verified  Major          Incorrect join execution when controlling outer join operations.
MariaDB 10.8.2   11  Verified  Major          Incorrect join execution by limiting the usage of the join buffers.
MariaDB 10.8.2   12  Verified  Major          Incorrect join execution when controlling join cache.
TiDB 5.4.0       13  Fixed     Critical       Incorrect Merge Join Execution.
TiDB 5.4.0       14  Fixed     Critical       Merge Join executed incorrect resultset which missed -0.
TiDB 5.4.0       15  Fixed     Critical       Merge Join executed an incorrect empty resultset.
TiDB 5.4.0       16  Fixed     Critical       Merge Join executed an incorrect NULL resultset.
TiDB 5.4.0       17  Fixed     Critical       Merge Join executed an incorrect resultset which missed rows.
PolarDB 8.0.18   18  Fixed     2 (High)       Left join convert to inner join returns wrong result sets.
PolarDB 8.0.18   19  Fixed     2 (High)       Hash join returns wrong result sets.
PolarDB 8.0.18   20  Verified  2 (High)       Incorrect semi-join with materialize execution.
Setup. All experiments are conducted on our in-house server equipped with Intel Xeon CPU
E5-2682 (2.50GHz) 16 cores and 128GB memory. For our cloud-native database PolarDB, we run an
instance on our public cloud with similar computation capability to our in-house server.
query:
SELECT t0.c0 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE (t0.c0 NOT IN (
SELECT t0.c0 FROM t0 WHERE t0.c0)) = (t0.c0));
ground truth:
Empty set
transformed query:
SELECT t0.c0 FROM t0 WHERE t0.c0 IN (SELECT /*+ semijoin() */ t0.c0 FROM t0 WHERE (
t0.c0 NOT IN (SELECT t0.c0 FROM t0 WHERE t0.c0)) = t0.c0);
result:
+------------+
| c0         |
+------------+
| 0000001985 |
| 0000001996 |
+------------+
transformed query:
SELECT t0.c0 FROM t0 WHERE t0.c0 IN (SELECT /*+ no_semijoin() */ t0.c0 FROM t0 WHERE
(t0.c0 NOT IN (SELECT t0.c0 FROM t0 WHERE t0.c0)) = t0.c0);
result:
Empty set
Listing 1. MySQL’s incorrect semi-join execution.
By analyzing the query plans of the above query, we find that a hash join with semi-join produces
incorrect results when the materialization technique is used for optimization. The MySQL community
responded that a fix is documented in the changelog of MySQL 8.0.30. The changelog explains that
incorrect results may be generated by the execution of a semi-join with materialization when the
WHERE clause has an equality condition. In some cases, such as when the equality condition is
expressed as an IN or NOT IN expression, the equality is neither pushed down for materialization
nor evaluated as part of the semi-join. This could also cause issues with inner hash joins.
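The check behind Listing 1 can be sketched as a small harness. This is only an illustration under stated assumptions, not the TQS implementation: `with_hint`, `check` and the `execute` callback are hypothetical names, and the stub executor simply replays the rows reported in Listing 1 instead of connecting to MySQL.

```python
import re
from collections import Counter


def with_hint(query, hint, occurrence=0):
    """Insert an optimizer hint comment after the chosen SELECT keyword (0-based)."""
    matches = list(re.finditer(r"\bSELECT\b", query, flags=re.IGNORECASE))
    if occurrence >= len(matches):
        raise ValueError("not enough SELECT keywords in query")
    end = matches[occurrence].end()
    return f"{query[:end]} /*+ {hint} */{query[end:]}"


def check(query, hints, ground_truth, execute, occurrence=0):
    """Return the hints whose result multiset differs from the ground truth."""
    expected = Counter(ground_truth)
    failing = []
    for hint in hints:
        rows = execute(with_hint(query, hint, occurrence))
        if Counter(rows) != expected:
            failing.append(hint)
    return failing


QUERY = ("SELECT t0.c0 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE "
         "(t0.c0 NOT IN (SELECT t0.c0 FROM t0 WHERE t0.c0)) = t0.c0)")


def replay_listing_1(sql):
    # Stub executor (assumption): replays the results shown in Listing 1.
    if "no_semijoin()" in sql:
        return []
    return [("0000001985",), ("0000001996",)]


# The hint goes into the second SELECT (the subquery), as in Listing 1.
failing = check(QUERY, ["semijoin()", "no_semijoin()"], [], replay_listing_1, occurrence=1)
print(failing)
```

Results are compared as multisets because the queries carry no ORDER BY, so row order is not meaningful. In a real run, `execute` would be replaced by a function that sends the SQL to the DBMS under test.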
CREATE TABLE t0 (
  c0 text NOT NULL,
  primary key (c0));
CREATE TABLE t1 (
  c0 tinyint(3) unsigned zerofill,
  c1 varchar(15) NOT NULL,
  primary key (c0),
  key t1_fk1 (c1),
  constraint t1_ibfk_1 foreign key (c1)
  references t0 (c0));
query:
SELECT t1.c0 FROM t1 LEFT OUTER JOIN t0 ON t1.c1 = t0.c0 WHERE (t1.c1 IN (SELECT
t0.c0 FROM t0)) OR (t0.c0);
ground truth:
+------+
| c0   |
+------+
| NULL |
| NULL |
+------+
Listing 2. MySQL’s incorrect left hash join execution.
We also test schemas with foreign key constraints. Listing 2 illustrates a bug that occurs when
MySQL optimizes a left hash join under the subquery_to_derived condition: MySQL retrieves an
additional NULL value.
5.1.2 Bugs in MariaDB. Unlike in MySQL, many of the bugs reported in MariaDB concern the nested
loop join and the hash join. Listing 3 shows an example bug, which is produced by transforming a
block nested loop hash join into a block nested loop join. An incorrect result set is obtained
because, when MariaDB executes the join, a NULL value is mistakenly changed to an empty value.
CREATE TABLE t1 (
  c0 varchar(100) NOT NULL,
  KEY ic1 (c0));
CREATE TABLE t2 (
  c0 varchar(100) NOT NULL);
query:
SELECT t2.c0 FROM t2 RIGHT OUTER JOIN t1 ON t1.c0 = t2.c0;
ground truth:
+------+
| c0   |
+------+
| NULL |
| NULL |
+------+
transformed query:
SET optimizer_switch = 'join_cache_hashed=off';
SELECT t2.c0 FROM t2 RIGHT OUTER JOIN t1 ON t1.c0 = t2.c0;
result:
+------+
| c0   |
+------+
|      |
| NULL |
+------+
Listing 3. MariaDB’s incorrect loop join execution.
Listing 4 shows another bug when transforming batch key access join to block nested loop join.
When MariaDB executes the join, the optimizer mistakenly changes the null value to empty.
CREATE TABLE t1 (
  c0 bigint(20) DEFAULT NULL);
CREATE TABLE t2 (
  c0 double NOT NULL,
  c1 varchar(100) NOT NULL,
  PRIMARY KEY (c1));
CREATE TABLE t3 (
  c0 mediumint(9) NOT NULL,
  c1 tinyint(1) NOT NULL);
CREATE TABLE t4 (
  c0 double NOT NULL,
  c1 varchar(100) NOT NULL,
  PRIMARY KEY (c1));
query:
SELECT t3.c0 FROM t3 RIGHT OUTER JOIN t4 ON t3.c1 = t4.c1 JOIN t2 ON t2.c1 = t4.c1
CROSS JOIN t1;
ground truth:
+------+
| c0   |
+------+
| NULL |
| NULL |
+------+
transformed query:
SET optimizer_switch = 'join_cache_bka=off';
SELECT t3.c0 FROM t3 RIGHT OUTER JOIN t4 ON t3.c1 = t4.c1 JOIN t2 ON t2.c1 = t4.c1
CROSS JOIN t1;
result:
Empty set
Listing 4. MariaDB’s incorrect index join execution.
5.1.3 Bugs in TiDB. Merge join and join index are the main causes of bugs in TiDB. Listing 5
shows an example query for which an incorrect result set is obtained when a hash join is converted
to a merge join (due to space constraints, the ground-truth results and the incorrect results are
omitted here).
CREATE TABLE t1 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 varchar(511) DEFAULT NULL,
  PRIMARY KEY (id));
CREATE TABLE t2 (
CREATE TABLE t3 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 varchar(511) DEFAULT NULL,
  PRIMARY KEY (id));
We find that when TiDB executes the above merge join, intermediate data are mistakenly
materialized as NULL. The TiDB developers stated that the cause was that an outer merge join cannot
keep the prop (property) of its inner child, and the bug has been fixed.
5.1.4 Bugs in PolarDB. In PolarDB, most bugs are produced by hash join and semi-join.
Listing 6 illustrates an example query in which a left join is transformed into an inner join.
PolarDB fails to return a correct result set. We submitted it to our developers, and they located
the cause of the bug: the inner join cannot distinguish NULL from 0.
CREATE TABLE t1 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 int(16) NOT NULL,
  PRIMARY KEY (id, col1));
CREATE TABLE t2 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 int(16) NOT NULL,
  PRIMARY KEY (id, col1));
CREATE TABLE t3 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 varchar(511) DEFAULT NULL,
  PRIMARY KEY (id));
query:
SELECT t1.id FROM (t1 LEFT JOIN t2 ON t1.col1 = t2.id) JOIN t3 ON t2.col1 = t3.col1
where t1.col1 = 1;
ground truth:
empty set
transformed query:
SELECT /*+ JOIN_ORDER(t3, t1, t2) */ t1.id FROM (t1 LEFT JOIN t2 ON t1.col1 = t2.id)
JOIN t3 ON t2.col1 = t3.col1 where t1.col1 = 1;
result:
+------+
| id   |
+------+
| NULL |
| NULL |
+------+
As another example, Listing 7 shows a test case that causes the semi-join to return a wrong
result set. When the query is executed by an inner semi hash join without the materialization
strategy, PolarDB returns an incorrect result.
CREATE TABLE t0 (c0 float);
CREATE TABLE t1 (c0 float);
query:
SELECT ALL t1.c0 FROM t1 RIGHT OUTER JOIN t0 ON t1.c0 = t0.c0 WHERE t1.c0 IN (
SELECT t0.c0 FROM t0 WHERE (t1.c0 NOT IN (SELECT t1.c0 FROM t1)) = (1) IN (t1.c0));
ground truth:
Empty set
transformed query:
SET optimizer_switch = 'materialization=off';
SELECT ALL t1.c0 FROM t1 RIGHT OUTER JOIN t0 ON t1.c0 = t0.c0 WHERE t1.c0 IN (
SELECT t0.c0 FROM t0 WHERE (t1.c0 NOT IN (SELECT t1.c0 FROM t1)) = (1) IN (t1.c0));
result:
+-----------+
| c0        |
+-----------+
| 292269000 |
+-----------+
Fig. 8. Comparison with existing tools in the query graph diversity and the efficiency of detecting bugs.
Panels (a)-(d): diversity on MySQL, MariaDB, TiDB and PolarDB; panels (e)-(h): efficiency on MySQL,
MariaDB, TiDB and PolarDB.
Efficiency. Figure 8(e), (f), (g) and (h) show the testing efficiency on MySQL, MariaDB, TiDB
and PolarDB, respectively. Not surprisingly, the results are similar to those of the diversity
experiment: the more query structures are tested, the more bugs can be discovered.
In fact, we ran TQS for 48 hours and continued to find logic bugs in all DBMSs. However, there
are too many bugs, and we have not yet submitted all of them to the developer communities for
verification, so we only show the bug types of the first 24 hours. Figure 9 illustrates that the
number of bugs increases linearly with the testing time, while the number of bug types does not.
This indicates that most bugs are caused by a small set of improperly implemented operators.
Fig. 9. Bug types vs bug counts on MySQL.
Fig. 10. Effect of parallel search.
In Section 4, we briefly explained the idea of using KQE to build a distributed computation
framework to speed up the testing process. Figure 10 shows the results of deploying the framework
on different numbers of clients. We run the experiments on MySQL for 24 hours, and the results
show that parallel computation effectively speeds up the testing process.
DBMS            Approach        Query Graph Diversity  Bug Count
MySQL 8.0.28    TQS             460k                   31
                $TQS_{!Noise}$  460k                   14
                $TQS_{!GT}$     460k                   21
                $TQS_{!KQE}$    228k                   16
MariaDB 10.8.2  TQS             475k                   30
                $TQS_{!Noise}$  475k                   15
                $TQS_{!GT}$     475k                   18
                $TQS_{!KQE}$    234k                   12
TiDB 5.4.0      TQS             462k                   31
                $TQS_{!Noise}$  462k                   20
                $TQS_{!GT}$     462k                   22
                $TQS_{!KQE}$    231k                   18
PolarDB 8.0.18  TQS             465k                   23
                $TQS_{!Noise}$  465k                   12
                $TQS_{!GT}$     465k                   18
                $TQS_{!KQE}$    225k                   15
Noise vs No-Noise. We artificially generate noise data during our testing. As shown in Table
5, the number of discovered bugs decreases dramatically if we remove the noise-injection module
(denoted as $TQS_{!Noise}$). This verifies that a large portion of logic bugs are triggered by
outliers or unexpected values. DBMS developers should therefore pay close attention to boundary
testing.
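To make the idea of noise injection concrete, the following sketch replaces a fraction of cell values with boundary values or NULL. It is a minimal illustration and not the noise-injection module of TQS: the function name `inject_noise`, the `BOUNDARY_VALUES` table and the injection rate are assumptions chosen for the example. The candidate values mirror the kinds of values involved in Table 4, such as NULL, 0 and -0.

```python
import random

# Hypothetical boundary values per column type (an assumption for illustration).
BOUNDARY_VALUES = {
    "int": [0, -1, 2**31 - 1, -2**31],
    "double": [0.0, -0.0, 1.7976931348623157e308],
    "varchar": ["", " ", "0"],
}


def inject_noise(rows, column_types, nullable, rate, rng):
    """Return a copy of rows where roughly `rate` of the cells hold a boundary value or NULL."""
    noisy = []
    for row in rows:
        new_row = dict(row)
        for col, col_type in column_types.items():
            if rng.random() >= rate:
                continue  # leave this cell untouched
            candidates = list(BOUNDARY_VALUES[col_type])
            if col in nullable:
                candidates.append(None)  # None stands for SQL NULL
            new_row[col] = rng.choice(candidates)
        noisy.append(new_row)
    return noisy


rows = [{"col1": i, "col2": i * 1.5} for i in range(1, 6)]
types = {"col1": "int", "col2": "double"}
noisy_rows = inject_noise(rows, types, nullable={"col1", "col2"},
                          rate=0.5, rng=random.Random(7))
for row in noisy_rows:
    print(row)
```

Whatever form the injection takes, the expected results have to be derived from the same noisy data, otherwise a mismatch would reflect the test setup rather than a DBMS bug.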
CREATE TABLE t1 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 int(16) DEFAULT NULL,
  col2 double DEFAULT NULL,
  PRIMARY KEY (id));
CREATE TABLE t2 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 int(16) NOT NULL,
  col2 double DEFAULT NULL,
  col3 varchar(511) DEFAULT NULL,
  PRIMARY KEY (id, col1));
CREATE TABLE t3 (
  id bigint(64) NOT NULL AUTO_INCREMENT,
  col1 double DEFAULT NULL,
  col2 varchar(511) DEFAULT NULL,
  PRIMARY KEY (id));
GT vs No-GT. We show the power of ground-truth verification. TQS without GT (ground truth),
denoted as $TQS_{!GT}$, judges the correctness of query results by comparing the results executed
with different query plans, i.e., it uses differential testing. We found that 7 bugs could
not be detected using differential testing on PolarDB. For example, Listing 8 shows a hash join
bug in PolarDB 8.0.18. The query result remains the same for different plans but differs from the
ground truth of this query, indicating a logic bug. As shown in Table 5, some bugs cannot be
revealed by differential testing, whereas ground-truth results allow us to identify them.
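The reason differential testing can miss such bugs is that two plans may share the same defect. The toy example below illustrates this with two hypothetical join implementations that both treat NULL keys as equal. Everything here is an assumption made for illustration: the functions are not DBMS code, and `reference_join` is a hand-written stand-in for the ground truth, which TQS derives from the wide table instead.

```python
def nested_loop_join(left, right):
    """Plan A: nested-loop equi-join. Shared bug: NULL (None) keys match each other."""
    return [(l, r) for l in left for r in right if l == r]


def hash_join(left, right):
    """Plan B: hash equi-join with the same NULL-handling bug."""
    buckets = {}
    for r in right:
        buckets.setdefault(r, []).append(r)
    return [(l, r) for l in left for r in buckets.get(l, [])]


def reference_join(left, right):
    """Stand-in ground truth: under SQL semantics, NULL = NULL is not true."""
    return [(l, r) for l in left for r in right
            if l is not None and r is not None and l == r]


left_keys = [1, 2, None]
right_keys = [2, None]

# Sort by repr so that rows containing None can be ordered deterministically.
plan_a = sorted(nested_loop_join(left_keys, right_keys), key=repr)
plan_b = sorted(hash_join(left_keys, right_keys), key=repr)
truth = sorted(reference_join(left_keys, right_keys), key=repr)

differential_alarm = plan_a != plan_b   # both plans agree, so nothing is reported
ground_truth_alarm = plan_a != truth    # the extra (None, None) row is caught
print(differential_alarm, ground_truth_alarm)
```

Both plans return the spurious `(None, None)` row, so comparing them against each other raises no alarm, while comparing either of them against the expected result does.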
KQE vs No-KQE. KQE (knowledge-guided query space exploration) allows us to avoid exploring
similar query structures. In Table 5, we observe that TQS is superior to $TQS_{!KQE}$ on all four
databases, indicating the effectiveness of applying KQE to generate queries. Note that because
iterating over all possible isomorphic sets of a graph is an NP-complete problem, exhaustive
testing is infeasible. The intuition of KQE is to generate as many new queries as possible, not
to iterate over all isomorphic sets.
In summary, noise injection, ground-truth results and KQE modules are important for bug
detection. They either improve the effectiveness of TQS, or speed up the testing process.
6 RELATED WORK
Differential Testing of DBMS. Differential testing is a widely adopted approach for detecting
logic bugs in software systems. It compares results of the same query from multiple versions of
the system or uses different physical plans to discover possible bugs. Differential testing has been
shown to be effective in many areas [12, 16, 30, 33, 38, 59]. It was first used in RAGS to find
bugs in DBMSs [53]. APOLLO also applies differential testing to find performance regression bugs
by executing SQL queries on multiple versions of DBMSs [29]; it found 10 previously unknown
performance regression bugs in SQLite and PostgreSQL. Although differential testing is effective,
our experiments show that some logic bugs can only be revealed by using ground-truth results.
Generator-based Testing of DBMS. Various database data generators [6, 13, 23, 26, 31] and
query generators [5, 14, 29, 39, 45, 52, 55] have been proposed to artificially create test cases,
but test oracles, which should give feedback on the correctness of the system, have received less
attention. Generation-based testing approaches [24, 32, 34, 39, 46, 58, 62] have been adopted for
extensive testing of DBMSs for purposes such as bug finding and benchmarking. SQLSmith is a widely
used generation-based DBMS tester [52]. It synthesizes a schema from initial databases and
generates limited types of queries, targeting code coverage. Squirrel focuses on generating queries
to detect memory corruption bugs [62]. All of the above random generators are mostly applied to
detect crashing bugs, whereas our focus is on logic bugs.
Logic Bug Testing of DBMS. SQLancer [49] is the current state-of-the-art tool for testing DBMSs
for logic bugs and is the work most closely related to ours. SQLancer proposes three approaches to
detect logic bugs. PQS constructs queries to fetch a randomly selected tuple from a table [50]. TLP
decomposes a query into three partitioning queries, each of which computes its result on a selected
tuple [48]. NoRec compares the results of randomly generated optimized queries and rewritten
queries that the DBMS cannot optimize [47]. SQLancer targets single-table queries, and 90.0% of its
bug reports include only one table. In contrast, TQS targets logic bugs of multi-table joins,
which are more prone to bugs.
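As a concrete illustration of two of these oracles, the sketch below runs a TLP-style and a NoRec-style check against SQLite through Python's standard sqlite3 module. It follows the general idea of each oracle rather than SQLancer's code; the table, the data and the predicate are made up for the example, and on a correct engine both checks are expected to pass.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (2,), (None,), (5,)])

predicate = "c0 > 1"  # hypothetical predicate under test

# TLP-style check: for every row, exactly one of p, NOT p, p IS NULL holds,
# so the three partitions together must equal the unfiltered result.
original = conn.execute("SELECT c0 FROM t0").fetchall()
partitions = []
for p in (predicate, f"NOT ({predicate})", f"({predicate}) IS NULL"):
    partitions += conn.execute(f"SELECT c0 FROM t0 WHERE {p}").fetchall()
tlp_ok = Counter(original) == Counter(partitions)

# NoRec-style check: the row count of the filtered query must equal the number
# of rows for which the predicate, evaluated in the select list, is true.
optimized = conn.execute(f"SELECT COUNT(*) FROM t0 WHERE {predicate}").fetchone()[0]
unoptimized = sum(1 for (flag,) in conn.execute(f"SELECT ({predicate}) FROM t0")
                  if flag == 1)
norec_ok = optimized == unoptimized

print(tlp_ok, norec_ok)
```

Both checks operate on a single table, which matches the observation above that such oracles mainly exercise single-table queries.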
Database Schema Normalization. The well-known database design framework for relational
databases is centered around the notion of data redundancy [7, 35]. The redundant data value
occurrences originate from functional dependencies (FDs) [8]. The data-driven normalization
algorithms [19, 44, 56] can remove FD-related redundancy effectively. Schema normalization has
been well studied. In our approach, we adopt an existing database normalization algorithm to split
a wide table into multiple tables, so that the ground-truth results of join queries over those
tables can be recovered from the wide table.
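The following sketch shows why the wide table can serve as the source of ground truth. It is a minimal illustration with made-up data and hypothetical helper names (`project`, `natural_join`), not the normalization algorithm used by TQS: when a functional dependency holds (here customer -> city), splitting the wide table on it is lossless, so joining the parts must reproduce the wide table.

```python
def project(rows, columns):
    """Duplicate-eliminating projection that preserves first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def natural_join(left, right, on):
    """Equi-join of two lists of dict rows on a single shared column."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical wide table in which the FD customer -> city holds.
wide = [
    {"order_id": 1, "customer": "ann", "city": "Oslo"},
    {"order_id": 2, "customer": "bob", "city": "Rome"},
    {"order_id": 3, "customer": "ann", "city": "Oslo"},
]

# Split the wide table along the functional dependency.
orders = project(wide, ["order_id", "customer"])
customers = project(wide, ["customer", "city"])

# The join of the two parts must give back the rows of the wide table.
joined = natural_join(orders, customers, on="customer")
ground_truth = project(wide, ["order_id", "customer", "city"])
print(joined == ground_truth)
```

A DBMS under test that returns anything other than `ground_truth` for the corresponding join query over `orders` and `customers` would therefore exhibit a logic bug.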
Synthetic Graph Generation. In our approach, we adopt a random walk-based approach to
generate valid join queries by enumerating sub-graphs from the schema graph. In fact, synthetic
graph generation and the corresponding graph exploration approaches have been studied for years.
For example, in support of the experimental study of graph data management systems, a variety of
synthetic graph tools such as SP2Bench [51], LDBC [21], LUBM [25], BSBM [9], Grr [10], WatDiv [2]
and gMark [4] have been developed in the research community. In our scenario, iterating over all
possible sub-graphs is an NP-hard problem; hence, we adopt a novel neural encoding-based approach
to avoid generating similar sub-graphs, significantly reducing the overhead.
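A self-avoiding random walk over a schema graph can be sketched as follows. This is an illustration under stated assumptions, not the TQS generator: the schema graph is hypothetical (two edges reuse the join conditions of Listing 6, the third is invented), and exact de-duplication on the set of used edges is a deliberately simple stand-in for the neural encoding-based similarity check.

```python
import random

# Hypothetical schema graph: nodes are tables, edges carry a join condition.
SCHEMA_EDGES = {
    ("t1", "t2"): "t1.col1 = t2.id",
    ("t2", "t3"): "t2.col1 = t3.col1",
    ("t1", "t3"): "t1.id = t3.id",  # invented edge for illustration
}


def random_walk_query(start, steps, rng):
    """Walk to unvisited neighbours; return the join SQL and the set of edges used."""
    current, visited, edges, clauses = start, {start}, [], []
    for _ in range(steps):
        options = []
        for (a, b), cond in sorted(SCHEMA_EDGES.items()):
            if a == current and b not in visited:
                options.append((b, (a, b), cond))
            elif b == current and a not in visited:
                options.append((a, (a, b), cond))
        if not options:
            break  # dead end: every neighbour is already part of the query
        current, edge, cond = rng.choice(options)
        visited.add(current)
        edges.append(edge)
        clauses.append(f"JOIN {current} ON {cond}")
    return " ".join([f"SELECT * FROM {start}"] + clauses), frozenset(edges)


def generate(n_queries, rng):
    """Collect up to n_queries queries whose join structures are pairwise distinct."""
    seen, queries = set(), []
    for _ in range(n_queries * 10):  # bounded number of attempts
        start = rng.choice(["t1", "t2", "t3"])
        sql, structure = random_walk_query(start, steps=2, rng=rng)
        if structure and structure not in seen:
            seen.add(structure)
            queries.append(sql)
        if len(queries) == n_queries:
            break
    return queries


queries = generate(3, random.Random(0))
for q in queries:
    print(q)
```

Skipping a structure that has already been produced keeps the generator from spending time on repeated join graphs; a similarity-based filter would additionally skip structures that are merely close to earlier ones.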
7 CONCLUSION
In this paper, we proposed a framework, TQS (Transformed Query Synthesis), for detecting
logic bugs in the implementations of multi-table join queries in DBMSs. TQS employs two novel
techniques, DSG (Data-guided Schema and query Generation) and KQE (Knowledge-guided Query
space Exploration), to generate effective SQL queries and their ground-truth results for testing.
We evaluated TQS on four DBMSs: MySQL, MariaDB, TiDB and PolarDB. In total, 115 bugs were
discovered in the tested DBMSs within 24 hours. Based on root cause analysis from the developer
communities, there are 7, 5, 5 and 3 types of bugs in MySQL, MariaDB, TiDB and PolarDB,
respectively. Compared with existing database debugging tools, TQS is more efficient and effective
in detecting logic bugs caused by different join operators. It can be considered an essential
DBMS development tool.
ACKNOWLEDGMENT
This work was supported by the Key Research Program of Zhejiang Province (Grant No. 2023C01037)
and the Fundamental Research Funds for Alibaba Group through Alibaba Innovative Research (AIR)
Program. We thank the anonymous reviewers and Shanshan Ying for their valuable feedback.
REFERENCES
[1] Günes Aluç, Olaf Hartig, M. Tamer Özsu, and Khuzaima Daudjee. 2014. Diversified Stress Testing of RDF Data
Management Systems. In ISWC. Springer, 197–212.
[2] Güneş Aluç, Olaf Hartig, M Tamer Özsu, and Khuzaima Daudjee. 2014. Diversified stress testing of RDF data
management systems. In International Semantic Web Conference. 197–212.
[3] Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya. 2018. HD-Index: Pushing the Scalability-Accuracy
Boundary for Approximate kNN Search in High-Dimensional Spaces. Proc. VLDB Endow. 11, 8 (2018), 906–919.
[4] Guillaume Bagan, Angela Bonifati, Radu Ciucanu, George H. L. Fletcher, Aurélien Lemay, and Nicky Advokaat. 2017.
gMark: Schema-Driven Generation of Graphs and Queries. In ICDE. 63–64.
[5] Hardik Bati, Leo Giakoumakis, Steve Herbert, and Aleksandras Surna. 2007. A genetic approach for random testing of
database systems. In VLDB. ACM, 1243–1251.
[6] Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer Özsu. 2007. QAGen: generating query-aware test databases.
In SIGMOD. ACM, 341–352.
[7] Joachim Biskup. 1995. Achievements of Relational Database Schema Design Theory Revisited. In Semantics in Databases.
Springer, 29–54.
[8] Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing Independent Database Schemas. In
SIGMOD, Philip A. Bernstein (Ed.). ACM, 143–151.
[9] Christian Bizer and Andreas Schultz. 2009. The berlin sparql benchmark. IJSWIS 5, 2 (2009), 1–24.
[10] Daniel Blum and Sara Cohen. 2011. Grr: generating random RDF. In Extended Semantic Web Conference. Springer,
16–30.
[11] Manuel Bodirsky. 2015. Graph homomorphisms and universal algebra course notes. TU Dresden (2015).
[12] Robert Brummayer and Armin Biere. 2009. Fuzzing and delta-debugging SMT solvers. In Proceedings of the 7th
International Workshop on Satisfiability Modulo Theories. 1–5.
[13] Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In VLDB. ACM, 1097–1107.
[14] Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. 2006. Generating Queries with Cardinality Constraints for DBMS
Testing. IEEE Trans. Knowl. Data Eng. 18, 12 (2006), 1721–1725.
[15] Wei Cao, Yingqiang Zhang, Xinjun Yang, Feifei Li, Sheng Wang, Qingda Hu, Xuntao Cheng, Zongzhi Chen, Zhenjun
Liu, Jing Fang, Bo Wang, Yuhui Wang, Haiqing Sun, Ze Yang, Zhushi Cheng, Sen Chen, Jian Wu, Wei Hu, Jianwei Zhao,
Yusong Gao, Songlu Cai, Yunyang Zhang, and Jiawang Tong. 2021. PolarDB Serverless: A Cloud Native Database for
Disaggregated Data Centers. In SIGMOD. ACM, 2477–2489.
[16] Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012.
Testing Static Analyzers with Randomly Generated Programs. In NASA Formal Methods - 4th International Symposium,
NFM. Springer, 120–125.
[17] Angjela Davitkova, Damjan Gjurovski, and Sebastian Michel. 2022. LMKG: Learned Models for Cardinality Estimation
in Knowledge Graphs. In EDBT. OpenProceedings.org, 2:169–2:182.
[18] DB-Engines. 2018. DB-Engines Ranking. [EB/OL]. https://db-engines.com/en/ranking.
[19] Jim Diederich and Jack Milton. 1988. New Methods and Fast Algorithms for Database Normalization. ACM Trans.
Database Syst. 13, 3 (1988), 339–365.
[20] Chi Thang Duong, Dung Hoang, Hongzhi Yin, Matthias Weidlich, Quoc Viet Hung Nguyen, and Karl Aberer. 2021.
Efficient Streaming Subgraph Isomorphism with Graph Neural Networks. VLDB 14, 5 (2021), 730–742.
[21] Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and
Peter Boncz. 2015. The LDBC social network benchmark: Interactive workload. In SIGMOD. 619–630.
[22] M. R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H.
Freeman.
[23] Jim Gray, Prakash Sundaresan, Susanne Englert, Kenneth Baclawski, and Peter J. Weinberger. 1994. Quickly Generating
Billion-Record Synthetic Databases. In SIGMOD. ACM Press, 243–252.
[24] Zhongxian Gu, Mohamed A Soliman, and Florian M Waas. 2012. Testing the accuracy of query optimizers. In Proceedings
of the Fifth International Workshop on Testing Database Systems. 1–6.
[25] Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. 2005. LUBM: A benchmark for OWL knowledge base systems. Journal
of Web Semantics 3, 2-3 (2005), 158–182.
[26] Kenneth Houkjær, Kristian Torp, and Rico Wind. 2006. Simple and Realistic Data Generation. In VLDB. ACM, 1243–1246.
[27] Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang,
et al. 2020. TiDB: a Raft-based HTAP database. VLDB 13, 12 (2020), 3072–3084.
[28] Yka Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An efficient algorithm for discovering
functional and approximate dependencies. The computer journal 42, 2 (1999), 100–111.
[29] Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim, and Woon-Hak Kang. 2019. APOLLO: Automatic Detection and
Diagnosis of Performance Regressions in Database Systems. VLDB 13, 1 (2019), 57–70.
[30] Timotej Kapus and Cristian Cadar. 2017. Automatic testing of symbolic execution engines via program generation and
differential testing. In ASE. IEEE Computer Society, 590–600.
[31] Shadi Abdul Khalek, Bassem Elkarablieh, Yai O. Laleye, and Sarfraz Khurshid. 2008. Query-Aware Test Generation
Using a Relational Constraint Solver. In ASE. IEEE Computer Society, 238–247.
[32] Shadi Abdul Khalek and Sarfraz Khurshid. 2010. Automated SQL query generation for systematic testing of database
engines. In ASE. ACM, 329–332.
[33] Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In PLDI. ACM,
216–226.
[34] Eric Lo, Carsten Binnig, Donald Kossmann, M. Tamer Özsu, and Wing-Kai Hon. 2010. A framework for testing DBMS
features. VLDB J. 19, 2 (2010), 203–230.
[35] David Maier. 1983. The Theory of Relational Databases. Computer Science Press.
[36] MariaDB. 2022. MariaDB hints. [EB/OL]. https://mariadb.com/kb/en/optimizer-switch/.
[37] MariaDB. 2022. MariaDB Homepage. [EB/OL]. https://mariadb.org/.
[38] William M. McKeeman. 1998. Differential Testing for Software. Digit. Tech. J. 10, 1 (1998), 100–107.
[39] Chaitanya Mishra, Nick Koudas, and Calisto Zuzarte. 2008. Generating targeted queries for database testing. In
SIGMOD. ACM, 499–510.
[40] MySQL. 2022. MySQL hints. [EB/OL]. https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html.
[41] MySQL. 2022. MySQL Homepage. [EB/OL]. https://www.mysql.com.
[42] Stack Overflow. 2021. Developer Survey Results. [EB/OL]. https://insights.stackoverflow.com/survey/2021.
[43] Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD,
Fatma Özcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 821–833.
[44] Thorsten Papenbrock and Felix Naumann. 2017. Data-driven Schema Normalization. In EDBT. OpenProceedings.org,
342–353.
[45] Meikel Poess and John M. Stephens. 2004. Generating Thousand Benchmark Queries in Seconds. In VLDB. Morgan
Kaufmann, 1045–1053.
[46] Kim-Thomas Rehmann, Changyun Seo, Dongwon Hwang, Binh Than Truong, Alexander Boehm, and Dong Hun Lee.
2016. Performance Monitoring in SAP HANA’s Continuous Integration Process. SIGMETRICS Perform. Evaluation Rev.
43, 4 (2016), 43–52.
[47] Manuel Rigger and Zhendong Su. 2020. Detecting optimization bugs in database engines via non-optimizing reference
engine construction. In ACM Joint Meeting on ESEC and FSE. 1140–1152.
[48] Manuel Rigger and Zhendong Su. 2020. Finding bugs in database systems via query partitioning. Proceedings of the
ACM on Programming Languages 4, OOPSLA (2020), 1–30.
[49] Manuel Rigger and Zhendong Su. 2020. SQLancer. [EB/OL]. https://github.com/sqlancer/sqlancer.
[50] Manuel Rigger and Zhendong Su. 2020. Testing database engines via pivoted query synthesis. In OSDI 20. 667–682.
[51] Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. 2009. SP^2Bench: a SPARQL performance
benchmark. In ICDE. IEEE, 222–233.
[52] Andreas Seltenreich. 2020. SQLSmith. [EB/OL]. https://github.com/anse1/sqlsmith.
[53] Donald R. Slutz. 1998. Massive Stochastic Testing of SQL. In VLDB, Ashish Gupta, Oded Shmueli, and Jennifer Widom
(Eds.). Morgan Kaufmann, 618–622.
[54] TiDB. 2022. TiDB hints. [EB/OL]. https://docs.pingcap.com/tidb/v5.3/optimizer-hints.
[55] Manasi Vartak, Venkatesh Raghavan, and Elke A. Rundensteiner. 2010. QRelX: generating meaningful queries that
provide cardinality assurance. In SIGMOD. ACM, 1215–1218.
[56] Ziheng Wei and Sebastian Link. 2019. Embedded Functional Dependencies and Data-completeness Tailored Database
Design. VLDB 12, 11 (2019), 1458–1470.
[57] Kesheng Wu, Ekow J. Otoo, and Arie Shoshani. 2002. Compressing Bitmap Indexes for Faster Search Operations. In
SSDBM. IEEE Computer Society, 99–108.
[58] Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, and Allison W. Lee. 2018. Snowtrail: Testing with Production
Queries on a Cloud Database. In SIGMOD. ACM, 4:1–4:6.
[59] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In
SIGPLAN, PLDI. ACM, 283–294.
[60] Yixing Yang, Yixiang Fang, Maria E. Orlowska, Wenjie Zhang, and Xuemin Lin. 2021. Efficient Bi-triangle Counting
for Large Bipartite Networks. VLDB 14, 6 (2021), 984–996.
[61] Rex Ying, Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, and Jure Leskovec. 2020. Neural Subgraph
Matching. CoRR abs/2007.03092 (2020).
[62] Rui Zhong, Yongheng Chen, Hong Hu, Hangfan Zhang, Wenke Lee, and Dinghao Wu. 2020. SQUIRREL: Testing
Database Management Systems with Language Validity and Coverage Feedback. In CCS. ACM, 955–970.