Adventures in RAC: gc buffer busy acquire and release
So when you see a gc buffer busy event in ASH, traces, v$session and so on, you know it is a cluster
wait that potentially limits your processing throughput. The event also reports the
file#, block# and class# of the buffer, which you can link to v$bh. That view allows
you to find the data object ID for a given combination of these input parameters.
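A query along these lines maps the reported file# and block# to the underlying segment (just a sketch; the bind variables are placeholders for the values taken from the wait event):

-- sketch: map the file#/block# reported with the wait event to its segment
-- :file_no and :block_no are placeholders for the values from ASH/v$session
select bh.inst_id, bh.status, o.owner, o.object_name, o.object_type
  from gv$bh bh
  join dba_objects o on o.data_object_id = bh.objd
 where bh.file#  = :file_no
   and bh.block# = :block_no;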
Using https://orainternals.wordpress.com/2012/04/19/gc-buffer-busy-acquire-
vs-release/ as a source I worked out that the event has to do with acquiring a
buffer (=block) in RAC (gc = global cache) on the local instance. If the block you
need is on the remote instance you wait for it to be released, and the wait event
is gc buffer busy release.
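To get an idea how much of this is going on cluster-wide, a quick look at gv$system_event gives the totals per instance (a sketch, not output from this particular test):

-- sketch: system-wide totals for the two gc buffer busy events, per instance
select inst_id, event, total_waits,
       round(time_waited_micro / 1e6, 1) as seconds_waited
  from gv$system_event
 where event in ('gc buffer busy acquire', 'gc buffer busy release')
 order by inst_id, event;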
Since Oracle will clone blocks in the buffer caches for consistent reads and use a
shared lock on these for reading, I thought that waiting can only happen if
someone requested a block in XCUR (exclusive current) mode. So with that
working hypothesis I went to work.
How to test
I started off writing a small Java class that creates a connection pool against my
RAC database. I initially used the default service name in the connect descriptor,
only to find out that dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE did not trace my
sessions. In the end I created a true RAC service with CLB and RLB
goals against both instances, and I ended up with traces in the diagnostic_dest.
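Enabling trace for the service/module combination looks roughly like this (a sketch; 'RACSVC' is a placeholder for the service name, 'BufferBusy' is the module the test program sets):

-- sketch: trace all sessions of a given service/module, including wait events
begin
  dbms_monitor.serv_mod_act_trace_enable(
    service_name => 'RACSVC',      -- placeholder service name
    module_name  => 'BufferBusy',  -- module set by the test program
    action_name  => dbms_monitor.all_actions,
    waits        => true,
    binds        => false);
end;
/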
After setting up the UCP connection pool the code creates a number of
threads, each of which pulls a connection from the pool, does some work (*) and
hands it back to the pool as good citizens should.
(*) The “do some work” bit is this:
...
try {

    // select a single row "for update" to request the block in current mode
    PreparedStatement pstmt = conn.prepareStatement(
        "select /* bufferbusy001 */ id, to_char(sysdate, 'hh24:mi:ss') as t " +
        "from t1 where id = ? for update");

    // pick a random ID that maps to the contended block (IDs 1450765-1450770, see below)
    int randomID = new Random().nextInt((1450770 - 1450765) + 1) + 1450765;
    System.out.println("thread " + mThreadID + " querying for ID " + randomID);
    pstmt.setInt(1, randomID);

    ResultSet rs = pstmt.executeQuery();

    while (rs.next()) {
        System.out.println("Thread " + mThreadID + " selected ID "
            + rs.getInt("id") + ". Now it rolls back and returns the connection");
    }

    rs.close();
    pstmt.close();
    conn.rollback();
    conn.close();
    conn = null;

    Thread.sleep(2000);

} catch (Exception e) {
    e.printStackTrace();
}
...
I think that’s how a Java developer would do it (with more error handling, of
course), but then I’m not a Java developer. It did work though! What I considered
most important was to generate contention on a single block. Using
dbms_rowid I could find out which IDs belong to a (randomly chosen) block:
SQL> select * from (
  2    select id, DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid, 'BIGFILE') as block
  3    from t1
  4  ) where block = 11981654;

        ID      BLOCK
---------- ----------
   1450765   11981654
   1450766   11981654
   1450767   11981654
   1450768   11981654
   1450769   11981654
   1450770   11981654

6 rows selected.
So if I manage to randomly select from the table where the ID is in the range …765 to
…770, I should be OK and hit just that particular block.
It turned out that the SQL statement completed so quickly that I had to considerably
ramp up the number of sessions in the pool to see anything. I went up from 10 to
500 sessions before I could notice a change. Most of the statements are too quick to
even be caught in ASH: Tanel’s ashtop script showed pretty much nothing
except the occasional ON CPU sample and the odd log file sync event. Snapper also
reported sessions in an idle state.
SQL> r
  1  select count(*), inst_id, status, sql_id, event, state
  2  from gv$session where module = 'BufferBusy'
  3* group by inst_id, status, sql_id, event, state

  COUNT(*)    INST_ID STATUS   SQL_ID        EVENT
---------- ---------- -------- ------------- ---------------------------
       251          1 INACTIVE               SQL*Net message from client
       248          2 INACTIVE               SQL*Net message from client

2 rows selected.
Result! There are gc buffer busy acquire events recorded. I can’t rule out TX row
lock contention, since with all those threads and only 6 IDs to choose from there
was bound to be some locking on the same ID caused by the “for update” clause.
Still, I am now reasonably confident that I worked out at least one scenario causing a
gc buffer busy acquire. You might also find the location of the blocks in the
buffer cache interesting:
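A query against gv$bh along these lines shows how the copies of the block are spread across the instances (a sketch; :file_no is a placeholder for the data file t1 resides in):

-- sketch: buffer status per instance for the contended block
select inst_id, status, count(*)
  from gv$bh
 where file#  = :file_no
   and block# = 11981654
 group by inst_id, status
 order by inst_id, status;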
There is one block in XCUR mode and nine copies in CR mode in the buffer cache for
that block.
Making it worse
Now I didn’t want to stop there; I was interested in what would happen under
CPU load. During my career I have noticed that cluster waits appear primarily when you
are CPU-bound (all other things being equal). This could be the infamous
middle-tier connection pool mismanagement, or an execution plan going
wrong with hundreds of users performing nested loop joins when they should
hash-join large data sets… This is usually the point where OEM users ask the
DBAs to do something about that “sea of grey” in the performance pages.
As with every cluster technology, an overloaded CPU does not help. Well, I guess
that’s true for all computing. To increase the CPU load I created 10 dd sessions
reading from /dev/zero and writing to /dev/null. It sounds silly, but each of these hogs
one CPU core at 100%. With 10 out of 12 cores on node 1 occupied that way, I
relaunched my test. The hypothesis that CPU overload has an effect was proven
right: suddenly there were ASH samples of my sessions.
SQL> @ash/ashtop sql_id,session_state,event "sql_id='6a5jfvpcqvbk6'" s

    Total
  Seconds     AAS %This   SQL_ID        SESSION EVENT
--------- ------- ------- ------------- ------- ------------------------------
      373     1.2    79% | 6a5jfvpcqvbk6 WAITING enq: TX - row lock contention
       54      .2    11% | 6a5jfvpcqvbk6 WAITING gc buffer busy release
       20      .1     4% | 6a5jfvpcqvbk6 ON CPU
       11      .0     2% | 6a5jfvpcqvbk6 WAITING gc buffer busy acquire
       11      .0     2% | 6a5jfvpcqvbk6 WAITING gc current block busy
        1      .0     0% | 6a5jfvpcqvbk6 WAITING gc current block 2-way

6 rows selected.
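If you don’t have ashtop at hand, a query along these lines against gv$active_session_history paints a similar picture (a sketch; add a time window as required):

-- sketch: ASH samples per event for the test statement
select session_state, event, count(*) as samples
  from gv$active_session_history
 where sql_id = '6a5jfvpcqvbk6'
 group by session_state, event
 order by samples desc;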
Further Reading
I’m sure there is a wealth of resources available out there; in my case Riyaj’s blog
helped me a lot. He even tagged posts with gc buffer busy:
https://orainternals.wordpress.com/tag/gc-buffer-busy
Have a look at the Oaktable World 2014 agenda and watch Tanel Poder’s session
attentively. You’d be surprised how many scripts he has made publicly available to
troubleshoot performance. Like snapper? It’s only the tip of the iceberg. And if you
can, you should really attend his advanced troubleshooting seminar.
Responses
Stefan Koehler
December 16, 2014
Hi Martin,
nice blog post.
I usually have seen extensive “gc buffer busy release” waits in case of
LMS / LGWR (gcs log flush sync / log file sync) issues. The long
duration of “gcs log flush sync / log file sync” was mostly caused by
“broadcast on commit” bugs (in case of fast log file parallel write), but
CPU load can be an influencing factor as well as you have
demonstrated nicely :-))
Regards
Stefan
Martin Bach
December 16, 2014
Hi Stefan!
Always nice to get comments from the experts. If memory serves
me right then Riyaj’s post had an example of an overloaded lgwr
process for anyone who wants to see what happens in that case.
Martin