
openSAP

A First Step Towards SAP HANA Query Optimization

Week 3 Unit 1

00:00:05 Hello, and welcome back to Week 3 for the course A First Step Towards SAP HANA Query
Optimization.
00:00:13 In this week, we will talk about methods for query performance analysis. And in this first unit of week three, we will give an overview of query analysis.
00:00:27 When we say query execution, it is the process of getting records from tables through various operators.
00:00:37 So there are base tables, and through the various operators, such as INNER JOIN, LEFT OUTER JOIN, GROUP BY, and so on, the final results are returned.
00:00:46 Depending on the generated plan or other factors,
00:00:56 such as the execution engines or the functions used, the performance may be slow.
00:01:04 When we face such a performance issue, we need to look into the query operators and find out where and which operators make the performance slow.
00:01:17 So we can say that the analysis of the query performance issue is a process of observing
how data is processed based on the generated SQL plan.
00:01:31 Let's find out whether the query is a part of business logic. If the query is a part of the business logic,
00:01:38 we need to understand the overall business flow that could affect the execution of the
problematic query.
00:01:49 For example, there is business logic that contains many query executions. When the job is
started, different queries are executed.
00:02:00 The compiled plan can vary depending on the data statistics and the time when the query is compiled.
00:02:09 Between different query executions, the time of data insertion or the time of query compilation can vary.
00:02:19 Therefore, there are important checkpoints, which are query compilation, query execution,
and data flow.
00:02:28 Let's say there is a plan compiled with a small amount of data; this plan may not be suitable for a large amount of data.
00:02:39 Different data statistics at the time of the compilation may not guarantee the same query
performance.
00:02:47 Therefore, we need to understand the data flow and check the compilation and execution times. In order to check the query compilation and execution time,
00:03:00 we can use the monitoring view M_SQL_PLAN_CACHE. Or, if there is no SQL plan cache entry found in that monitoring view,
00:03:09 you can use HOST_SQL_PLAN_CACHE, since it contains the information for the last 42 days by default.
00:03:20 And as you already know, to search the compilation and execution information, we can use the following useful columns:
00:03:29 STATEMENT_STRING, IS_VALID, LAST_INVALIDATION_REASON, STATEMENT_HASH, USER_NAME, PLAN_ID, EXECUTION_COUNT, PREPARATION_COUNT,
00:03:39 LAST_EXECUTION_TIMESTAMP, and LAST_PREPARATION_TIMESTAMP. The IS_VALID column tells us whether the query plan is still valid or not.
00:03:51 And LAST_INVALIDATION_REASON shows the reason for the last invalidation. And through EXECUTION_COUNT and PREPARATION_COUNT,
00:04:02 we can get the number of executions and preparations of the plan.
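For reference, a minimal query against M_SQL_PLAN_CACHE using these columns might look like this (the LIKE filter is just a placeholder for the statement you are investigating):

    SELECT statement_string, is_valid, last_invalidation_reason,
           statement_hash, user_name, plan_id,
           execution_count, preparation_count,
           last_execution_timestamp, last_preparation_timestamp
    FROM m_sql_plan_cache
    WHERE statement_string LIKE '%MY_TABLE%'   -- placeholder filter
    ORDER BY last_execution_timestamp DESC;

If the entry has already been evicted, the same kind of lookup can typically be done against the statistics history table _SYS_STATISTICS.HOST_SQL_PLAN_CACHE.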
00:04:11 Another way of performance issue analysis is understanding the data flow, which is one of the key points in performance issue analysis.
00:04:26 There are various ways to understand the data flow, but the most useful tool is PlanViz.
00:04:35 By executing the visualized plan, you can get information about the data flow as well as an overview of the query plan.
00:04:46 In most cases, we need to first check the column search in PlanViz in order to see the data
flow.
00:04:53 In this example, there are three column searches. To see the data flow, we look into the
inner plan of each column search.
00:05:10 As I explained last week, you can see a more detailed plan when you look inside the column search.
00:05:16 So you right-click each column search, and you can open the inner plan as logical or physical. Since the logical plan shows the shape of the query optimizer tree
00:05:29 and contains the structural information, we recommend that you analyze the logical plan first before you analyze the physical plan.
00:05:41 So when we look into the logical inner plan of this example, the query structure will be like
this.
00:05:50 Let’s check out how data is processed through the operators. Firstly, the majority of the
data, around 7,500,000 rows,
00:06:00 is extracted from table T1. And this data is first processed through the INNER JOIN with other tables,
00:06:08 and it returns 2,349 rows. And in the column search 2, with the results from column search
1,
there is a LEFT OUTER JOIN with the tables T13 and T14. Lastly, at column search 3, the
final data is processed through LEFT OUTER JOIN
00:06:30 with table T15, table T16, and the result from the column search 2. With the PlanViz tool, we
can get the information of the data flow.
00:06:41 And it is important to look into query operators and find out where and which operators were
used to process the data.
00:06:52 That's the end of unit one of week three. In the upcoming unit, we will discover
00:06:58 the concepts of dominant operators and possible key reducers. Looking forward to
meeting you there.
00:07:05 Thank you for your attention, bye.

Week 3 Unit 2

00:00:05 Hello, and welcome to unit two of week three. In this unit, I will present to you the concepts
of dominant operators
00:00:12 and possible key reducers as a way of performance issue analysis. Finding dominant
operators and possible key reducers
00:00:24 can be a key to creating a better plan and rewriting a query in a better way. Here, what we call the dominant operator is an operator that takes a long time to process data.
00:00:39 It usually generates a huge intermediate result, and reducing the result size of the dominant operator is key to optimizing the queries.
00:00:49 And we define a possible key reducer as an ancestor operator of the dominant operator that could reduce the result generated by the dominant operator.
00:01:02 If the plan does not have any such key reducers, then there is no possibility of benefiting from reducing the intermediate result.
00:01:13 So now, let's find out how to find the dominant operator. When you collect a visualized plan using the PlanViz tool,
00:01:21 the dominant operators are displayed in the Dominant Operators section in PlanViz. It shows the top three most expensive operators, as you can see.
00:01:36 These are the top three dominant operators. In this example, the operators take around 880 milliseconds,
00:01:44 637 milliseconds, and 110 milliseconds respectively. You can also check what percentage of the total execution time those operators occupied.
00:01:56 But as I mentioned in the last session, to get the overview of a query structure, we usually
look at the logical plan.
00:02:04 Therefore, we don't have to look at these specific operators. It is important to check out the
plan at the column search level.
00:02:13 And you can also click on any operator name to move to the corresponding visualized
operator in the graph.
00:02:22 So, when you click the most expensive operator here, it directly goes to the corresponding
part within the column search.
00:02:32 So you can easily check the dominant operator using this PlanViz feature. Here, we recommend that you look into the dominant operators at the column search level.
00:02:45 And as another way, you can reach the most dominant operators along the lines highlighted in orange, like this.
00:02:54 This is the column search level overview. So we can see that the analytical search contains the dominant operator.
00:03:05 You can also find the dominant operators by comparing the time consumed to process the
operators at column search level.
00:03:17 As you can see, the column search that consumes the most time is the analytical search, which has 1,637 milliseconds as its exclusive time.
00:03:29 Therefore, we could know that this column search contains the dominant operator. When
you recall the definition of inclusive time and exclusive time in PlanViz,
00:03:43 inclusive time is the time taken to execute the complete operation, including the time of the child operators and excluding compilation time.
00:03:57 And exclusive time is the time taken to execute a single operation. In most cases, we are
checking exclusive time
00:04:07 in order to check the time for the execution of one single operation. Let's check out the
result size of each column search.
00:04:19 From the last slide, we know that the analytical column search contains the dominant operator,
because it has the longest exclusive time, and the highlighted orange line indicates it.

00:04:36 So we now know the dominant operator, and we can find the possible key reducer. From the last slide, we defined the possible key reducer
00:04:47 as an ancestor operator of the dominant operator that could reduce the result from the dominant operator.
00:04:57 Since the records from the dominant operator are reduced at the topmost column search, the possible key reducer could be this column search.
00:05:11 Okay, then now let's check out the logical inner plan for each column search and find the
dominant operator and possible key reducer within this column search.
00:05:25 The logical plan gives you the big picture and an overview of the plan, therefore it will be
very useful to understand the query optimizer tree and structural information.
00:05:40 To see the logical plan, you can just click the column search and choose Open Inner Plan,
and select Logical.
00:05:50 Then you will see the following logical inner plan. We name the column searches as column
search 1,
00:05:58 column search 2, and column search 3. When we look at the column search 1's logical inner
plan,
00:06:08 the base table T1 is ordered by T1.A in ascending order. After that, a limit operation is applied, and it generates 20 rows.
00:06:21 When you check the column search 2's logical inner plan, there is an INNER JOIN between the column tables T2 and T3.
00:06:28 After the JOIN is processed, the aggregation (GROUP BY) is applied. And in column search 3, there is a LEFT OUTER JOIN between these two column searches,
00:06:41 column search 1 and column search 2. After this JOIN, the result is ordered,
00:06:48 and finally, 20 rows are returned. So with this logical plan, let's find out the dominant
operator and possible key reducer.
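The transcript does not show the SQL behind this plan, but a hypothetical query with the same shape (all table and column names assumed) could look like this:

    SELECT cs1.a, cs2.total
    FROM (SELECT a, b FROM t1 ORDER BY a LIMIT 20) cs1        -- column search 1: ORDER BY and LIMIT on T1
    LEFT OUTER JOIN (SELECT t2.c, SUM(t3.d) AS total          -- column search 2: INNER JOIN, then GROUP BY
                     FROM t2 INNER JOIN t3 ON t2.c = t3.c
                     GROUP BY t2.c) cs2
      ON cs1.b = cs2.c                                        -- column search 3: LEFT OUTER JOIN on top
    ORDER BY cs1.a
    LIMIT 20;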
00:07:00 In this example, the dominant operator is column search 2. When you recall the concept of
the column search from last week's session,
00:07:10 when a column search is split into multiple ones, we call this a stacked column search.
00:07:18 And we learned that column search processes the natively supported operators in a
predefined order, which is table, JOIN, GROUP BY, and ORDER BY.
00:07:32 With this knowledge, let's find out the dominant operator. The aggregation operator is located at the top of column search 2,
00:07:41 and this creates the stacked column search in this plan. Therefore, one single column search cannot be formed,
00:07:49 and it takes a long time to process column search 2. So we can call the aggregation operator the dominant operator.
00:08:01 Now, let's find out the possible key reducer. We defined the possible key reducer as usually
an ancestor operator of the dominant operator
00:08:12 that could reduce the result from the dominant operator. Therefore, the possible key reducer
in this example can be this JOIN,
00:08:26 since the generated records are greatly reduced after this JOIN. If there is no operator
that can reduce the result, then the query should be rewritten.
00:08:42 So we found the dominant operator and possible key reducer. Then we can think about the
proper SQL hint to make the query performance better.
00:08:55 If the possible key reducer is pushed down, then intermediate results can be reduced. Here,
we can think about the SQL hint JOIN_THRU_AGGR.
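In SAP HANA, such a hint is appended to the statement with a WITH HINT clause. A minimal sketch of the usage, reusing the hypothetical query shape from above:

    -- JOIN_THRU_AGGR asks the optimizer to push the join below the aggregation
    SELECT cs1.a, cs2.total
    FROM t1 cs1 INNER JOIN (SELECT c, SUM(d) AS total FROM t2 GROUP BY c) cs2
      ON cs1.b = cs2.c
    WITH HINT (JOIN_THRU_AGGR);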
00:09:10 However, hints may not lead you to the desired plan, either because it is logically impossible to move the operators,
00:09:18 or because a different plan than the one you want to make is chosen by cost. When we say logically impossible, it means
00:09:30 that when the optimizer evaluates the results after moving the operator, there can be a case where different query results would be returned,
00:09:39 therefore the optimizer does not allow moving the operator. So when the hint does not work, we can consider whether a different query result would be made.
00:09:53 And you can verify this by simulating and testing the query. We will look into more details in
case study two.
00:10:04 Regarding the second point, that a different plan can be chosen by cost, I will give you an example.
00:10:12 Let's assume you want to push down the aggregation. But when the aggregation is pushed
down, it can go to either
00:10:21 the left-hand side or right-hand side of the JOIN. This decision is chosen by cost,
00:10:28 therefore, the generated plan can be different from what you expect. If the hint does not move the possible key reducer,
00:10:37 then rewriting should be considered to locate the key reducers before the dominant
operators. That was the end of unit two of week three.
00:10:52 In the upcoming unit, we will talk about the techniques using dominant operators and
possible key reducers.
00:10:59 Thank you for your attention. See you.

Week 3 Unit 3

00:00:05 Hello, and welcome to unit three of week three. Today, I will talk about the techniques using
dominant operators and possible key reducers.
00:00:15 As a way of performance issue analysis, there is issue reproduction. When you encounter
the performance issue,
00:00:23 sometimes there can be a case when the query execution at time one and query execution
at time two have different performance.
00:00:34 This may be because the statement was executed on a different host, or the target table size
has been changed, or a different enumeration process is applied.
00:00:49 Therefore, for the performance issue analysis, it is important to check out the information of
the compilation and execution.
00:00:58 And also it is very important whether the issue can be reproduced. To reproduce the issue,
you can install a duplicated system as a test system.
00:01:13 Or you might already have a test system. So, both systems have the same nodes and the same revision information.
00:01:26 In this situation, in order to reproduce the issue, you can export the target data from the production system
00:01:33 and import the data to the test system. So you have the same environment as the production system where you encountered the performance issue.
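On the SQL level, exporting and importing a table could look roughly like this (schema, table, and path are placeholders):

    -- on the production system
    EXPORT "MYSCHEMA"."T1" AS BINARY INTO '/usr/sap/trans/export_t1' WITH REPLACE;
    -- on the test system, after copying the exported files over
    IMPORT "MYSCHEMA"."T1" AS BINARY FROM '/usr/sap/trans/export_t1' WITH REPLACE;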
00:01:45 By investigating the issue and trying various things in the test system, you can find
workarounds,
00:01:51 or you can find the fast case by chance. As another way of performance issue analysis,
00:02:00 you can compare the slow case with the fast case. So, let’s assume you have a fast case to
compare with your slow case.
00:02:11 The fast case can be found by chance, or you may already know the workaround of the
issue.
00:02:20 Let's assume left-hand side is the bad case and right-hand side is the good case.
00:02:26 Let’s check out each query tree. When we look at the query structure, both cases have four
INNER JOINs in total,
00:02:35 and the top INNER JOIN is processed with two INNER JOINs. When you look at the INNER
JOIN on the left side first,
00:02:44 there are base tables called Table 1 and Table 2. And a GROUP BY operation is processed
while extracting data in Table 2.
00:02:55 And when you look at the right-side JOIN, it is processed with Table 3 and another INNER
JOIN.
00:03:02 There are base tables, called Table 4 and Table 5. After the JOIN between Table 4 and
Table 5, the GROUP BY operation is applied.
00:03:13 This is the bad case that caused the performance issue. Now let’s look at the right-side tree.

00:03:21 This is the good case. When you look at the query plan, like in the bad case, there are four
INNER JOINs in total.
00:03:29 And it seems like this is very similar to the bad case. However, when you look at the top of
the tree, there is a difference.
00:03:37 In the good case, the GROUP BY operation is processed at the top. This is the difference
between the bad case and the good case.
00:03:47 Let’s assume the dominant operator is the red circle. When you compare the bad case and
good case,
00:03:56 the very first step is to find a counterpart against the dominant column search. So, this is the
counterpart of the column search.
00:04:08 Because when you look at the dominant operator, there is a JOIN between Table 4 and Table 5,

00:04:14 and after INNER JOIN, the GROUP BY operation is processed. So we need to find where
the INNER JOIN between Table 4 and Table 5 is processed
00:04:26 and also find the GROUP BY operation in the good case. Then, you can see this
counterpart.
00:04:33 So this is the counterpart at the column search level. So now you know the dominant
operator in the bad case,
00:04:41 and you found the counterpart of the dominant column search. And when you look at the
counterpart column search,
00:04:51 you can find there are two more JOINs in the good case, and also Table 3 is processed
within this column search.
00:05:00 These operators are possible key reducers in the counterpart of the column search. You
remember, the possible key reducer is usually an ancestor of a dominant operator
00:05:11 and it could reduce the result from the dominant operator. Now let’s look at the good case
and bad case as a big picture.
00:05:23 We found the counterpart column search in the good case, and we also have found that
there are two more JOINs
00:05:30 and Table 3 is processed within this column search. Let's check where those INNER JOINs and Table 3 are located in the bad case.
00:05:44 As you can see, in the bad case, those INNER JOINs and Table 3 are outside of the column
search.
00:05:53 Here, we can find the possible key reducer. Again, the possible key reducer is usually an
ancestor of the dominant operator
00:06:02 and it could also reduce the result from the dominant operator. Therefore, this INNER JOIN can
be a possible key reducer.
00:06:15 So, we know the possible key reducer in the bad case. Then, what can we do with this
possible key reducer?
00:06:23 The reason why we compare the good case and the bad case is to make the bad case into a good case using an SQL hint
00:06:31 or query rewriting by moving the possible key reducer. Therefore, here we can think about
the SQL hint JOIN_THRU_AGGR
00:06:42 in order to move this INNER JOIN into the dominant column search in the bad case. So to
summarize, by comparing the good case and bad case,
00:06:52 we can try various things in order to make the bad case into a good case. With that, we
come to the end of unit three of week three.
00:07:05 In the next unit, we will continue to explore techniques using dominant operators and
possible key reducers.
00:07:13 Thank you for your attention, and looking forward to meeting you in the next unit.
00:07:18 Bye.

Week 3 Unit 4

00:00:05 Hello and welcome back to unit four of week three. Today, we will continue to explore
techniques using dominant operators
00:00:13 and possible key reducers. So far we found out about dominant operators and possible key
reducers,
00:00:21 and I introduced a way to achieve better performance using this information. However, if there
is no operator that can reduce the results,
00:00:33 the query should be rewritten. Here I will show you a query rewriting case using a possible
key reducer.
00:00:42 This is an example. We create table 1, table 2, and table 3
and also create a view called column_search_1 as follows. The reason why we name the view column_search_1 here
00:00:58 is that this part will be turned into one single column search. And the original query is an INNER JOIN between table 1
00:01:07 and the view column_search_1. Now we are going to draw the query optimizer tree here.
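The exact definitions are on the slide rather than in the transcript; a plausible reconstruction, using the column names implied by the join keys ("b = f", "e = i") and the GROUP BY columns (f, g, h, j, k, l) quoted later in this unit, would be:

    CREATE COLUMN TABLE table1 (a INT, b INT, c INT, d INT);
    CREATE COLUMN TABLE table2 (e INT, f INT, g INT, h INT);
    CREATE COLUMN TABLE table3 (i INT, j INT, k INT, l INT);

    CREATE VIEW column_search_1 AS
      SELECT f, g, h, j, k, l
      FROM table2 INNER JOIN table3 ON table2.e = table3.i
      GROUP BY f, g, h, j, k, l;   -- this GROUP BY is what splits the column search

    -- the original query
    SELECT * FROM table1 INNER JOIN column_search_1 v ON table1.b = v.f;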
00:01:17 So first, let's draw the tree of the view column_search_1. Now, we will draw the query tree
of the original query.
00:01:28 So when you look at the original query, it is about INNER JOIN between the view and table
1.
00:01:36 So if we draw the tree, it will be like this. And from this example,
00:01:41 here the GROUP BY operator prevents one single column search and leads to a stacked column search.
00:01:48 Again, when the column search is split into multiple ones, we call this a stacked column search,
00:01:55 and it brings data materialization where the result of the column search is converted into a
physical temporary table.
00:02:05 And column search processes the natively supported operators in a predefined order, which
is table, JOIN, GROUP BY, and ORDER BY.
00:02:16 Therefore, the dominant operator in this example is the view column_search_1. Then the
possible key reducer would be the INNER JOIN,
00:02:28 which is the ancestor operator of the dominant operator. Here, let's suppose that we want to
get the INNER JOIN pushed down over the GROUP BY.
00:02:40 Then we can write a query from the plan we want to make. Let's rewrite the query.
00:02:49 The original query is about INNER JOIN between table 1 and the view column_search_1.
Here we know that the GROUP BY operator prevents one single column search,
00:03:00 therefore, it is a dominant operator and INNER JOIN is a possible key reducer.
00:03:08 In order to reduce the result generated by the dominant operator, we can think about this
tree structure
00:03:15 by moving the possible key reducer below the GROUP BY. So this is the query tree that we
want to make.
00:03:27 Based on the revisions, or data statistics, there can be different plans generated.
00:03:34 However, in this example, we assume that the plan is generated as follows,
00:03:40 and we made the desired plan as we wanted. Using this desired query tree,
00:03:46 we can rewrite the query. Let's rewrite the query based on this desired query tree.
00:03:55 From the desired query tree, the possible key reducer is INNER JOIN. And this is a join
between table 1 and another INNER JOIN.
00:04:07 So we can rewrite the query like this up to table 1. And table 1 is processed with INNER
JOIN
00:04:17 with the join key "b = f" when you refer to the original query.

00:04:22 So we can write the query up to INNER JOIN like this. Now, we need to care about another
INNER JOIN that is processed with table 2 and table 3.
00:04:36 And the join key is "e = i" when we refer to the original query. So we can write the query like
this.
00:04:47 Now, we need to handle the GROUP BY operator. When you refer to the view definition,
00:04:53 columns f, g, h, j, k, and l are processed with GROUP BY. Therefore, we can rewrite the
query like this using the desired query tree that we made.
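Under the reconstruction sketched earlier, the rewritten query implied by the desired tree might look like this (again a sketch, not the slide's exact text):

    SELECT t2.f, t2.g, t2.h, t3.j, t3.k, t3.l
    FROM table1 t1
      INNER JOIN table2 t2 ON t1.b = t2.f   -- the possible key reducer, now below the GROUP BY
      INNER JOIN table3 t3 ON t2.e = t3.i
    GROUP BY t2.f, t2.g, t2.h, t3.j, t3.k, t3.l;  -- GROUP BY applied on top of the joins

Note that joining first and grouping afterwards is not always equivalent to joining a pre-aggregated view, which is exactly the caveat the unit raises next.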
00:05:07 This might be difficult at first, but with practice, you can achieve better performance
00:05:13 by doing query rewriting using the desired query plan. However, there can be a rare case
where the query result is changed after rewriting.
00:05:24 In that case, you need to consider whether the changed result is suitable for your business
scenario.
00:05:31 Therefore, you need to examine your business logic first. That was about Techniques Using
Dominant Operators and Possible Key Reducers.
00:05:43 In the next unit, my colleague Jinyeon Lee will present to you a simple hands-on session
and SQL tuning tips.
00:05:52 Please join and enjoy it. Thank you for your attention.
00:05:57 Bye.

Week 3 Unit 5

00:00:05 Hello and welcome to unit five of week three. I'm Jinyeon Lee
00:00:10 and today I will present to you a hands-on session about simple query performance analysis.
00:00:16 This hands-on is about finding a possible key reducer and SQL hints to move the possible key reducer
00:00:22 in order to reduce the results generated by a dominant operator. Here is a query tree.
00:00:31 There are four INNER JOINs in total. The first one is an INNER JOIN of table 4 and table 5,
00:00:38 and its result is processed with GROUP BY. And when you look at the second INNER JOIN,
00:00:45 it is about joining table 3 to the result from the first INNER JOIN. And there is the third INNER JOIN, which is table 1 and table 2.
00:00:57 And the fourth INNER JOIN is processed with the results of the second INNER JOIN and
the third one.
00:01:04 And the final result is ordered with ORDER BY. With this query tree,
00:01:10 suppose that we are aware of the dominant column search producing 1 billion rows. In this case, what would be a possible key reducer?
00:01:21 And why is it? Which SQL statement can we try?
00:01:28 Let's have a look together. The possible key reducer is normally an ancestor of a dominant operator.
00:01:39 Then would this second INNER JOIN be a possible key reducer? No, this is not a possible key reducer,
00:01:52 because it does not reduce the result of the column search. The 1 billion rows are not reduced.
00:02:01 After this second INNER JOIN, 2 billion rows are generated. That is, the result is not reduced, but increased.
00:02:12 Then what about the fourth INNER JOIN? Would it be a possible key reducer?
00:02:21 Yes, because after this join is processed, the result is reduced to 1,000.
00:02:31 So, this INNER JOIN reduces the result that is generated from the column search. We found the possible key reducer, which is the fourth INNER JOIN.
00:02:45 Now it's time to think about the SQL hint that moves the possible key reducer
00:02:50 in order to reduce the results generated from the dominant operator. If the INNER JOIN is processed earlier,
00:03:02 then the intermediate result can be massively reduced. So here, we can use the SQL hint
JOIN_THRU_AGGR.
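The full statement is not shown in the transcript; a sketch of a query with this tree shape and the hint applied (all names assumed) might be:

    SELECT x.c1, y.c2, y.total
    FROM (SELECT t1.c1, t1.k                                  -- third INNER JOIN: table 1 and table 2
          FROM t1 INNER JOIN t2 ON t1.k = t2.k) x
    INNER JOIN (SELECT t3.c2, g.k, g.total                    -- second INNER JOIN: table 3 and the aggregate
                FROM t3 INNER JOIN
                     (SELECT t4.k, SUM(t5.v) AS total         -- first INNER JOIN: table 4 and table 5, then GROUP BY
                      FROM t4 INNER JOIN t5 ON t4.k = t5.k
                      GROUP BY t4.k) g
                  ON t3.k = g.k) y
      ON x.k = y.k                                            -- fourth INNER JOIN: the possible key reducer
    ORDER BY x.c1                                             -- final ORDER BY
    WITH HINT (JOIN_THRU_AGGR);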
00:03:14 With the JOIN_THRU_AGGR hint, we can make the possible key reducer, the INNER JOIN, processed first,
00:03:20 then make the GROUP BY aggregation processed later. Then as you can see,
00:03:29 using the SQL hint JOIN_THRU_AGGR, the upper INNER JOIN is processed before the aggregation GROUP BY,
00:03:36 and GROUP BY is processed after the INNER JOIN. Then, we can imagine how the query plan can be transformed
00:03:50 with the SQL hint JOIN_THRU_AGGR. However, such a transformation is only possible
00:03:58 when it does not change the result. That was about the hands-on for simple query
performance analysis.
00:04:10 In the upcoming unit, I will present to you SQL tuning tips. Thank you for your attention.
00:04:17 Looking forward to meeting you. Bye.

Week 3 Unit 6

00:00:06 Hello and welcome to unit six of week three. In the last units, we discovered how to analyze performance issues.
00:00:14 And today, I will present to you SQL tuning tips. There are many ways to tune SQL,
00:00:20 but I will introduce the most useful tuning tips. The first SQL tuning tip that I'd like to
introduce is proper type casting.
00:00:31 It is important to use proper type casting as much as possible in order to avoid costs of
implicit type casting
00:00:38 such as data materialization. Usually, we say that data materialization is expensive.
00:00:46 So if unnecessary data materialization occurs due to implicit type casting, it might lead to
longer execution time and extra memory consumption,
00:00:57 as well as more CPU consumption. Let’s have a look with a simple example of implicit type
casting.
00:01:06 Here we define a table, TABLE_3. It has one INTEGER column and one VARCHAR column
00:01:14 and two SECONDDATE datatype columns. And using this table, we will run this simple query.
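The table and query are shown on the slide; a hypothetical version matching the description (names other than TESTID and ENDTIME are assumptions) could be:

    CREATE COLUMN TABLE TABLE_3 (
      TESTID    INTEGER,
      TESTNAME  VARCHAR(20),
      STARTTIME SECONDDATE,
      ENDTIME   SECONDDATE
    );

    -- one query shape that can trigger the implicit conversions discussed below
    SELECT * FROM TABLE_3
    WHERE ENDTIME >= CAST('2018-04-27 06:00:00' AS TIMESTAMP)  -- explicit cast to TIMESTAMP on the right side
      AND TESTID LIKE '1%';                                    -- string operation on an INTEGER column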
00:01:27 Let’s have a look at EXPLAIN PLAN to see how the plan is optimized. EXPLAIN PLAN
shows the compiled plan without actual execution,
00:01:37 so it is a good tool to check what the plan will look like before you run the statement.
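In SAP HANA, EXPLAIN PLAN can be used like this (the statement name is arbitrary):

    EXPLAIN PLAN SET STATEMENT_NAME = 'cast_check' FOR
      SELECT * FROM TABLE_3
      WHERE ENDTIME >= CAST('2018-04-27 06:00:00' AS TIMESTAMP);

    SELECT operator_name, operator_details
    FROM explain_plan_table
    WHERE statement_name = 'cast_check';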
00:01:44 As you can see, there are two type conversions that occur. The first type conversion is from
SECONDDATE to TIMESTAMP.
00:01:56 The data type SECONDDATE consists of year, month, day, hour, minute, and second
information
00:02:03 to represent a date with a time value. And the data type TIMESTAMP consists of date and time information,
00:02:13 and it also displays the fractional seconds. Since the right part of the following
predicate
00:02:23 is wrapped with explicit type casting to TIMESTAMP, the left side of this also needs to be
changed to TIMESTAMP implicitly.
00:02:37 So, internally, ENDTIME, which is of the SECONDDATE data type, needs to be changed to TIMESTAMP, involving additional data materialization.
00:02:50 And another type conversion is from INTEGER to DECIMAL and DECIMAL to VARCHAR.
00:02:58 This unnecessary type conversion occurs two times, therefore, if you understand the usage
and business needs,
then it would be good to change the data type of the TESTID column to avoid unnecessary type casting.
00:03:17 Another SQL tuning tip is partition pruning. Partitioning horizontally divides a table into different physical parts.
00:03:29 From the query processing perspective, partition pruning can be a way of improving performance
00:03:35 by ruling out irrelevant parts in a first step, restricting the amount of data.
00:03:42 However, there is a checklist when you consider partition pruning. First, it is important to
make sure you set partitioning criteria
00:03:53 in a way that supports the most frequent and expensive queries processed by the system.
For example,
00:04:03 let's assume a table has a YEAR column and this table is partitioned based on a range
predicate on the YEAR column.

00:04:13 When a query having a predicate on the YEAR column needs to be processed, the system
can restrict the aggregation to the rows in individual partitions for YEAR only,
00:04:25 instead of considering all available partitions. However, if the query does not use the YEAR
column as a predicate,
00:04:36 all partitions will be scanned. Another checkpoint is that there is an exception regarding partitioning.
00:04:47 Therefore, you need to consider this exception when you use partition pruning.
00:04:55 And the exception is, partition pruning does not work if a filter has expressions on partition
columns.
00:05:03 I will give you an example in a later slide. Let’s check out partition pruning with an example.
00:05:16 Here we create a table called PART1. Let's look at the definition of the PART1 table.
00:05:25 It has SID as INTEGER, and STARTTIME and ENDTIME as SECONDDATE.
00:05:34 The primary key is SID and STARTTIME, and there is hash-range partitioning with SID and ENDTIME.
00:05:45 As multi-level partitioning, hash partitioning is implemented at the first level for load
balancing,
00:05:53 and range partitioning at the second level for time-based partitioning. Here there are four partitions created
00:06:04 by hash partitioning with the hash partition key SID, and under each partition,
there are range partitions with the key ENDTIME. Since here the range partition key is ENDTIME,
00:06:21 for efficient partition usage, the filter condition has to contain the range partition key ENDTIME.
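A sketch of such a definition (the actual range boundaries are not given in the transcript):

    CREATE COLUMN TABLE PART1 (
      SID       INTEGER,
      STARTTIME SECONDDATE,
      ENDTIME   SECONDDATE,
      PRIMARY KEY (SID, STARTTIME)
    )
    PARTITION BY HASH (SID) PARTITIONS 4,          -- first level: load balancing
                 RANGE (ENDTIME) (                 -- second level: time-based pruning
                   PARTITION '2018-01-01' <= VALUES < '2019-01-01',
                   PARTITION OTHERS
                 );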
00:06:33 Here is another table called TEST_T1. It has two INTEGER columns for TID and YEAR,
00:06:42 and two NVARCHAR columns for MONTH and DAY. And there is range partitioning by the YEAR column.
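And a sketch for TEST_T1 (the concrete year ranges are assumptions; the transcript only says there are seven partitions):

    CREATE COLUMN TABLE TEST_T1 (
      TID   INTEGER,
      YEAR  INTEGER,
      MONTH NVARCHAR(2),
      DAY   NVARCHAR(2)
    )
    PARTITION BY RANGE (YEAR) (
      PARTITION 2013 <= VALUES < 2014,
      PARTITION 2014 <= VALUES < 2015,
      PARTITION 2015 <= VALUES < 2016,
      PARTITION 2016 <= VALUES < 2017,
      PARTITION 2017 <= VALUES < 2018,
      PARTITION 2018 <= VALUES < 2019,
      PARTITION OTHERS               -- seven partitions in total
    );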
00:06:53 So as you can see, this table is partitioned by YEAR
00:06:57 and there are seven partitions generated. For efficient partition usage,
00:07:05 the filter condition has to contain the range partition key YEAR. This is an example query.
00:07:15 Here, what we have to focus on is the WHERE condition. When we look at the filter condition of this example query,
00:07:25 ENDTIME is greater than or equal to CAST('2018/4/27 6:00:00.0 AM' AS TIMESTAMP)
00:07:33 and TO_CHAR(T.YEAR) is equal to 2018. The ENDTIME and YEAR columns are the partition keys for each table.
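Combining the two quoted filters, the example query plausibly has this shape (the join between the tables is an assumption; only the WHERE clause is quoted in the transcript):

    SELECT *
    FROM PART1 P INNER JOIN TEST_T1 T ON P.SID = T.TID
    WHERE P.ENDTIME >= CAST('2018/4/27 6:00:00.0 AM' AS TIMESTAMP)  -- TIMESTAMP does not match SECONDDATE
      AND TO_CHAR(T.YEAR) = '2018';                                 -- expression on the partition column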
00:07:45 Let's check out the EXPLAIN PLAN for this query. As you can see, even though the partition keys are used in the WHERE condition,
00:07:56 the partition pruning did not work. This is because the data type of the ENDTIME column was mismatched,
00:08:07 and the filter on the YEAR column has an expression. If we express this as a diagram,
00:08:17 as you can see, all partitions are scanned. Then let's modify the query like this.
00:08:28 What we changed here is that we matched the data type between ENDTIME and the value as SECONDDATE.
00:08:37 And here we removed the function TO_CHAR. And let's have a look at the EXPLAIN PLAN.
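So the modified query would look roughly like this:

    SELECT *
    FROM PART1 P INNER JOIN TEST_T1 T ON P.SID = T.TID
    WHERE P.ENDTIME >= CAST('2018-04-27 06:00:00' AS SECONDDATE)  -- matches the partition column's type
      AND T.YEAR = 2018;                                          -- plain predicate, no expression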
00:08:47 As you can see, the partition pruning works. So as we see in the diagram,
00:08:57 the highlighted white partitions are scanned instead of scanning all the partitions.
00:09:06 So when you create a partition, and if you want to take advantage of it,
00:09:12 make sure you set the correct partition key and try to avoid expressions on the filter columns.
00:09:21 Today we covered SQL tuning tips regarding type casting and partition pruning. In the next
unit, we will continue to explore other useful SQL tuning tips.
00:09:37 Thank you for your attention. See you.

Week 3 Unit 7

00:00:06 Hello, and welcome to the last unit of week three. Today, I will continue to talk about SQL tuning tips.
00:00:16 Today, the SQL tuning tip is query inlining. The SQLScript optimizer has a logic to combine two or more statements where possible.
00:00:26 And this process is called inlining. Here is a simple example of a procedure that uses the inline feature.
00:00:36 Inlining is used in order to simplify complex queries. If the queries are fully inlined, then
many intermediate sequences
00:00:46 are merged into one big step. Therefore, it is considered beneficial for most procedures.
00:00:56 However, a fully inlined query can make the search space larger, so there can be a rare case where inlining leads to suboptimal plans.
00:01:09 Let's have a look at the example of query inlining. Here, we CALL the main procedure,
called PROC_OUTER.
00:01:19 Let's look at the definition of a PROC_OUTER procedure. Within PROC_OUTER, there is a
variable defined as TV1
00:01:30 and another procedure, PROC_INNER, is called. And when you check out how
PROC_INNER is called,
00:01:40 the defined table variable TV1 is entered as an input. And PROC_INNER returns TV2.
00:01:50 And TV2 has the result of a SELECT query from the input table variable called V_TAB. After PROC_INNER is processed, there is another statement,
00:02:03 SELECT all FROM table variable TV2. And the table variable TV2 is actually from the result
of PROC_INNER procedure.
00:02:16 After that, PROC_OUTER defines two variables, which are T1 and T2. T1 is the result of the
JOIN query of TAB1.
00:02:27 And T2 is a SELECT statement from the table variable T1. Lastly, the final statement, SELECT all FROM the table variable T2, is executed.
00:02:42 Then, the procedure PROC_OUTER is finished. Now, let’s check out how this inlined query
is processed using SQL trace.
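Before turning to the trace, here is a minimal SQLScript reconstruction of the two procedures as described (column lists and the joined table TAB2 are assumptions; the slides show the exact code):

    CREATE PROCEDURE proc_inner (
        IN  v_tab TABLE (id INTEGER, val NVARCHAR(10)),
        OUT tv2   TABLE (id INTEGER, val NVARCHAR(10)))
      LANGUAGE SQLSCRIPT READS SQL DATA AS
    BEGIN
      tv2 = SELECT id, val FROM :v_tab;            -- TV2 selects from the input table variable V_TAB
    END;

    CREATE PROCEDURE proc_outer
      LANGUAGE SQLSCRIPT READS SQL DATA AS
    BEGIN
      tv1 = SELECT id, val FROM tab1;              -- table variable TV1
      CALL proc_inner(:tv1, tv2);                  -- PROC_INNER returns TV2
      SELECT * FROM :tv2;                          -- first result set

      t1 = SELECT a.id, a.val                      -- T1: a JOIN query on TAB1
           FROM tab1 a INNER JOIN tab2 b ON a.id = b.id;
      t2 = SELECT id, val FROM :t1;                -- T2: a SELECT from table variable T1
      SELECT * FROM :t2;                           -- final statement
    END;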
00:02:56 To recall, SQL trace is a trace that captures every single SQL statement that enters the database.
00:03:05 Like this case, when a procedure is run and all the dependent queries are integrated in the
main query,
00:03:14 each statement is not executed individually through the session layer, and this is not captured by the default SQL trace setting.
00:03:24 So to collect trace, we enable the internal statement configuration setting for SQL trace.
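Enabling the SQL trace including internal statements can be done via configuration, roughly like this (assuming the trace and internal parameters of the [sqltrace] section in indexserver.ini):

    ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
      SET ('sqltrace', 'trace')    = 'on',     -- switch the SQL trace on
          ('sqltrace', 'internal') = 'true'    -- also capture internal statements
      WITH RECONFIGURE;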
Here is the SQL trace that I captured to check query inlining.
00:03:40 As all of you now know, in order to find the statement, we can search for the schema name or the query string.
00:03:47 Then, there is information for connection ID and transaction ID. And check out how the
statement SELECT all FROM TV2 is executed.
00:04:03 As you can see in the SQL trace, firstly, it indicates the procedure PROC_OUTER in a comment. And as you can see by the blue and green highlights,
00:04:15 those two statements are inlined with a WITH clause. And the statement using the table
variable TV2 is executed.
00:04:27 And let’s check out how the statement SELECT all FROM the table variable T2 is executed.
As you can see in SQL trace, this statement is also inlined using a WITH clause.

00:04:48 Like the SELECT all FROM the table variable TV2, when you look at the SQL trace, firstly, the procedure PROC_OUTER is indicated at the very beginning in a comment.
00:05:01 Then, using the WITH clause, the yellow and blue highlighted statements are inlined. Then
the statement SELECT all FROM the table T2, highlighted with green, is executed.
00:05:19 In cases when the combined statements do not result in an optimal plan, we can prevent
statements from being combined with other statements
00:05:30 using the NO_INLINE hint. So, with the hint NO_INLINE, all the queries are executed one
by one.
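In SQLScript, the hint is attached to the statement that should not be merged; for example, in the procedure sketched earlier:

    t2 = SELECT id, val FROM :t1 WITH HINT (NO_INLINE);  -- keep this step as a separate execution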
00:05:42 But since each query is executed individually and its result is materialized, more memory can be consumed.
00:05:52 So, let's have a look at the previous example with NO_INLINE hint. This will block the
statements from being inlined
00:06:02 and each statement will be executed individually. As you can see, if we specify the hint
NO_INLINE,
00:06:14 the SQL trace shows that each query is executed individually. CALL PROC_INNER is executed first with the variable TV2.
00:06:26 And SELECT all from the table variable TV2 is run. Then, SELECT all FROM the table
variable T2 is executed.
00:06:41 In this part, we have learned about query inlining. Most of the time, query inlining is considered beneficial,
00:06:52 since it combines different SQL statements in order to optimize the database requests,
therefore many intermediate sequences are merged into one big step,
00:07:04 and there can be a performance gain. However, by doing so, the search space can be larger,
00:07:13 and it can be the case that a suboptimal plan is generated. In this case, we can use the
NO_INLINE SQL hint to prevent inlining.
00:07:25 Then all the statements can be executed individually. However, memory consumption can increase by doing so.
00:07:37 Therefore, when you write queries, you should consider the characteristics of inlined and
non-inlined queries.
00:07:47 That was about SQL tuning tips, and I've come to the end of this week.
00:07:54 Thank you for your attention. In the upcoming week, we will look at case studies
00:08:02 using the knowledge from the last units. See you.
