Friday, December 9, 2011

Singular CBC Latch Acquisition Pattern Diagnosis


Quick Introduction

In my previous posting on this topic I focused on determining cache buffer chain (CBC) wait time acquisition patterns: disperse or singular. A disperse pattern occurs when many CBC latches are active. In contrast, there are situations when only one or a few CBC child latches are extremely active, hence the singular pattern. In that posting I also listed some disperse pattern solutions.

In this posting I'm focusing on the additional diagnostic steps needed before a specific solution can be determined with a singular wait acquisition pattern.

Important: To understand what I write below you need a basic understanding of Oracle buffer cache internals. In particular the buffer, latch, child latch, cache buffer chain, and buffer header. These are described (including nice diagrams) in my posting here. In that posting a visualization tool is also used which you can download for free here.

Situations with singular CBC acquisition patterns

Two of the most common buffer concurrency situations that can stress a single CBC latch are:

First, when a single block contains application reference information that is read often but changed infrequently. The popular buffer makes its cache buffer chain popular, which in turn makes the CBC child latch covering that chain popular. If concurrency is high enough, this can become an issue. However, if the buffer were being changed often, we'd more likely see buffer busy waits.

Second, a popular index root block. Any time an index is accessed, its single root block must also be accessed, which repeatedly forces the question, "Is this index root block in the buffer cache?" If concurrency becomes high enough, eventually you'll see CBC latch contention associated with a single CBC child latch.

There are other singular CBC access situations. Next I'll present how to determine the acquisition pattern.

Diagnosing the acquisition pattern

We need to determine if the CBC child latch access pattern is disperse or singular. One way to do this is shown in the SQL below, which is essentially my latchchild.sql OSM script. The script collects access details for latch number 150 (the CBC latch on this system) over a 300 second period. Run the script a few times (i.e., collect a few sample sets) to ensure what you think is occurring actually is occurring.
def latch=150
def sleep=300
drop table op_interim
/
create table op_interim as
select addr,gets,sleeps
from   v$latch_children
where  latch# = &latch
/
exec dbms_lock.sleep(&sleep);
select t1.addr,
       t1.gets-t0.gets delta_gets,
       t1.sleeps-t0.sleeps delta_sleeps
from   op_interim t0,
       (
         select addr,gets,sleeps
         from   v$latch_children
         where  latch# = &latch
       ) t1
where  t0.addr = t1.addr
order by 3,2
/

ADDR             DELTA_GETS     DELTA_SLEEPS
-------- ------------------ ----------------
...
76C33818            198,038                3
76C644A8             81,535                4
76C7759C             96,993                4
76C2D1CC            145,096                4
76E03FEC            149,355                4
76CDD394            160,718                4
76C68904            182,300                4
76D69374             65,250                5
76D7800C            133,134                5
76DA2650            156,578                5
76C02B88            158,293                5
76CE59E0            145,169                7
76C7B9F8            158,243                7
76CFF120             65,228                9
76CFCF30            148,187               11
76D38668            966,345               62
76CDF508            981,384               96
76DAF26C          1,962,752              187

1024 rows selected.
Figure 1. A 300 second CBC child latch acquisition activity collection SQL and report. Notice the last three child latches are the most active and have the most sleeps (which we see as "waits"). 
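As a side note, the latch# for the CBC latch (150 above) varies by Oracle release, so before running the script you may want to confirm the number on your own system. A quick sketch:

```sql
-- Confirm the CBC latch number on this instance before setting
-- the "def latch=" value; 150 is not guaranteed across releases.
select latch#, name
from   v$latchname
where  name = 'cache buffers chains'
/
```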

Figure 1 shows three CBC child latches received a disproportionately high amount of activity, especially the 76DAF26C child latch. Therefore, we'll focus on the three child latches with addresses 76DAF26C, 76CDF508, and 76D38668.

Determine the hot buffer(s) and buffer header(s)

More precisely, we need to determine the hot buffer headers that are on the CBC chains protected by the unusually active CBC child latch(es). The x$bh performance table contains information about each buffer header that has an associated buffer in the buffer cache. (More about buffer headers) There are four columns of particular interest to us:

HLADDR is the "hash latch address" hence the column name hladdr. This is a foreign key to the v$latch_children view's addr column.

FILE# is the buffer header's (also Oracle block and buffer) file number that links to many performance views and tables, such as dba_data_files.file_id and dba_segments.header_file.

DBABLK is the buffer header's (also Oracle block and buffer) block number which may be referenced in dba_segments.header_block.

TCH is the buffer header's touch count number, which is a popularity indicator...with a few twists I'll detail below. In every Oracle release I have checked (including 11.2), the tch column is not in v$bh, only x$bh.

Keep in mind that each child latch will normally "cover" multiple cache buffer chains (perhaps >100) and each chain can have zero or more associated buffer headers (average chain length is usually 0 to 1). And of course each buffer header is related to a cached Oracle block residing in the buffer cache.
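If you're curious how many buffer headers each child latch is currently covering on your own system, a quick x$bh aggregation will show the distribution. This is just a sketch (it requires SYS access, since x$bh is not exposed through a v$ view), not part of the formal diagnosis:

```sql
-- Rough distribution of cached buffer headers per CBC child latch.
-- Run as SYS; order the busiest child latches to the top.
select hladdr, count(*) buffer_headers
from   x$bh
group by hladdr
order by buffer_headers desc
/
```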

In the report shown in Figure 1, we saw which CBC child latches are relatively busy: 76DAF26C, 76CDF508, and 76D38668. Now we need to know which CBC chains and buffers are related to these relatively active CBC child latches. By querying x$bh we can easily determine the buffer headers associated with a given CBC child latch. We can also get a clue as to each buffer header's popularity. The SQL below is one such query.
SQL> l
  1  select hladdr, file#, dbablk, tch
  2  from   x$bh
  3  where  hladdr in ('76DAF26C','76CDF508','76D38668')
  4* order by 4
SQL> /

HLADDR        FILE#     DBABLK        TCH
-------- ---------- ---------- ----------
...
76D38668          1      70197          5
76D38668          1      39365          5
76DAF26C          1     117328        185
76CDF508          1     117329        186

47 rows selected.
As I wrote many years ago in my touch count algorithm paper and detailed in my Oracle Performance Firefighting book, Oracle uses the "touch count" algorithm to essentially tag popular buffers. There is a little twist though... a buffer's touch count can get reset to zero. So to truly determine the popular buffers, we must repeatedly sample x$bh.

The SQL below can be used to repeatedly sample x$bh, finding the most popular buffers given their CBC latch address. If you look closely at the SQL, you'll notice it collects and stores 300 one-second-interval x$bh samples. If you use a larger sleep time, you'll want to increase the number of samples collected. The final select statement reports the key tch-based popularity statistics.
drop table op_interim;
create table op_interim (hladdr raw(4), file# number, block# number, tch number);
begin
  for i in 1..300
  loop
    insert into op_interim 
      select hladdr,file#, dbablk, tch
      from   x$bh
      where  hladdr in ('76DAF26C','76CDF508','76BE93CC');
    dbms_lock.sleep(1);
  end loop;
end;
/
select hladdr, file#, block#,
       count(*) count, min(tch) min, median(tch) med,
       round(avg(tch)) avg, max(tch) max
from   op_interim
group by hladdr, file#, block#
order by 7
/

HLADDR    FILE#     BLOCK#      COUNT        MIN        MED        AVG        MAX
-------- ------ ---------- ---------- ---------- ---------- ---------- ----------
...
76BE93CC      1      39364        300          6          6          6          6
76BE93CC      1      69730        300         13         14         14         14
76CDF508      1     117329        600          1         36         64        181
76DAF26C      1     117328        300         70        125        125        180
76BE93CC      4    1552339        300         76        131        131        186

42 rows selected.
The statistics that I'm most interested in are the median (MED) and the maximum (MAX). And I'm hoping there is a clear buffer header or very few buffer headers that are relatively really, really active (i.e., hot).

As a side note, you may have noticed that one of the COUNTs is 600 while the others are 300. This could have occurred because there are two buffer headers related to buffer 1,117329. Perhaps one is the current mode (CU) buffer and the other a consistent read (CR) buffer. The above SQL could be improved by adding the state column to more uniquely identify a buffer header by its file#, block#, and state... but I digress.
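For the curious, that improvement might look like the sketch below, which simply adds the state column to the base query (I haven't reworked the full sampling script here):

```sql
-- Sketch: distinguish CU/CR copies of the same file#/block# by
-- including the buffer header's state column.
select hladdr, file#, dbablk, tch, state
from   x$bh
where  hladdr in ('76DAF26C','76CDF508','76BE93CC')
order by tch
/
```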

Based on the above SQL output, we know the hot buffers (file#, block#) are (1,117328) and (4,1552339). You could also argue to include 1,117329, but this buffer header is not consistently hot because its median is much lower than the other two. But in all honesty, if this were a real production system, and because the above SQL could be improved, I would investigate further. Now we need to understand why these two buffer headers are so popular.

Determine the hot buffer details

I will investigate the two most popular buffers. But honestly, my focus in this posting is on the second buffer: 4,1552339.

Investigating Buffer 1, 117328

To determine the segment's name and type, I'm going to use my dba_extents based OSM script, objfb.sql. The SQL is a little tricky, so you may want to check it out. Let's first look at block 1,117328.
SQL> @objfb 1     117328

Database: prod18                                               18-NOV-11 09:45am
Report:   objfb.sql            OSM by OraPub, Inc.                Page         1
            Object Details For A Given File #(1) and block #(117328)

File number    :1
Block number   :117328
Owner          :SYSTEM
Segment name   :OP_LOAD_PARAMS
Segment type   :TABLE
Tablespace     :SYSTEM
File name      :/u01/oradata/prod18/system01.dbf
OK, so we're dealing with a table. But this is strange because the op_load_params table is used to interactively change the load intensity of my workload generator tool. (You can download an older version here. I haven't posted the latest version... just lazy I guess.)

When I'm investigating hot table buffers, I also check if the buffer is the segment header block (header blocks contain special stuff...sorry to be so non-specific... but I digress). To determine if the buffer is a segment header block, I ran the below code snippet:
SQL> l
  1  select owner, segment_name, segment_type
  2  from   dba_segments
  3  where  header_file=&hdr_file
  4*   and  header_block=&hdr_block
SQL> /
Enter value for hdr_file: 1
Enter value for hdr_block: 117328

OWNER      SEGMENT_NAME              SEGMENT_TY
---------- ------------------------- ----------
SYSTEM     OP_LOAD_PARAMS            TABLE
Since a row was returned, this popular buffer header is indeed the op_load_params table header block! This is not what I expected and not the focus of this blog posting... so I'll move on. But if this was a production system, you better believe I would figure it out!

Investigating Buffer 4, 1552339

Now let's turn our attention to the other hot buffer header, namely buffer header 4,1552339. First I'll determine the object type by running my objfb.sql script.
SQL> @objfb 4    1552339

Database: prod18                                               18-NOV-11 09:51am
Report:   objfb.sql            OSM by OraPub, Inc.                Page         1
            Object Details For A Given File #(4) and block #(1552339)

File number    :4
Block number   :1552339
Owner          :MG
Segment name   :SPECIAL_CASES_BOGUS
Segment type   :INDEX
Tablespace     :USERS
File name      :/u01/oradata/prod18/users01.dbf

1 row selected.
Very interesting... an index. So we know this hot buffer header is an index block. I'm willing to bet it's the index's root block! Why? Because every time an index is accessed, its root block buffer header (and buffer) is also accessed. And a very active index root block buffer header can cause its hash chain to be very active, which can cause problems when a process attempts to acquire the hash chain's child latch. But how can we tell if an index block is the root block?

As I wrote in my November 11, 2011 posting, we can expect an index's root block to have a block number that is one greater than its segment header block. If you look closely at the SQL below, the block number I entered is one less than the popular block (1552338 = 1552339 - 1). If block 1552339 is indeed an index root block, then its segment header will have a block number of 1552338... let's check it out!
SQL> /
Enter value for hdr_file: 4
Enter value for hdr_block: 1552338

OWNER      SEGMENT_NAME              SEGMENT_TY
---------- ------------------------- ----------
MG         SPECIAL_CASES_BOGUS       INDEX

1 row selected.

SQL> l
  1  select owner, segment_name, segment_type
  2  from   dba_segments
  3  where  header_file=&hdr_file
  4*   and  header_block=&hdr_block
SQL> 
Fantastic! A row was returned, which means the popular buffer header is indeed the index's root block!

So the initial diagnosis is complete and now we need a solution. A poor solution would be to increase the number of CBC latches. Adding CBC latches does indeed significantly help during a disperse CBC latch acquisition pattern because each latch covers fewer chains. But when there exists a singular CBC acquisition pattern, the additional latches won't help much. What we need to do is somehow make the popular index root block buffer header less popular. One solution is to hash partition the index, which effectively creates multiple root blocks because each partition has a root block. At this point we have our diagnosis and a solution.
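As a sketch only, hash partitioning the hot index might look like the statement below. The table name and column name are hypothetical (the real index definition would come from dba_ind_columns); the partition count is illustrative:

```sql
-- Sketch: recreate the hot index hash partitioned so each of the
-- eight partitions has its own root block, spreading root block
-- access across multiple cache buffer chains and child latches.
-- MG.SPECIAL_CASES and CASE_ID are hypothetical names.
create index mg.special_cases_bogus
  on mg.special_cases (case_id)
  global partition by hash (case_id)
  partitions 8
/
```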

Summary

This posting has a number of objectives:

  1. How to identify CBC child latch acquisition patterns; disperse or singular
  2. How to determine the hot buffer headers related to specific CBC child latches
  3. What information is needed to further diagnose the hot buffer header
  4. Focus on the situation (diagnosis and solution) when the hot buffer header is an index root block

This is one of my more complicated (and perhaps confusing) posts because to really understand what I'm getting at, you must have a good understanding of Oracle buffer cache internals.

Personally, I find the hot index root block situation particularly satisfying. It's a very real example of a case where the top wait event is "latch: cache buffers chains" yet simply increasing the number of CBC child latches will not significantly improve performance. But with just a couple of extra diagnostic steps, we can nail the core Oracle contention area.

It took a while to get there but I hope you enjoyed the ride!

Thanks for reading!

Craig.

If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting and Advanced Oracle Performance Analysis. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.

P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.

Monday, December 5, 2011

Important article published in DB Trends magazine


Greetings,

This is an unusual post because it's rare my work is published by traditional media. The magazine Database Trends and Applications (DBTA) published an article I wrote entitled, Uniting Operations Research With Time-Based DB Performance Analysis. If you receive the printed magazine, it's on page 28 of the December 2011 issue, but you can also read it on-line here.

What's the article about? In the short article, I introduce

Unit of Work Time Based Analysis as the intersection of Oracle performance firefighting and Oracle forecasting and predictive analysis.

This intersection is one of the main themes in my two day class, Advanced Oracle Performance Analysis. Understanding and applying the content unlocks deep performance insights and allows you to achieve a much higher level of performance analysis. Why? Because you can objectively compare various performance solutions both numerically and visually. And this comparison can be done from a very high and abstract level (think: pictures) down to a very detailed operations research mathematical level (think: formulas). It's powerful.

It's no coincidence my three classes are similarly entitled: Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and finally Oracle Forecasting & Predictive Analysis.

Because I rarely teach my forecasting class, I want to mention that I will be offering my Oracle Forecasting & Predictive Analysis class in Frankfurt, Germany, February 6 - 8, 2012. I just finished teaching this class in Santa Clara, CA last week and we all had a great time for sure!

My firefighting courses will also be taught in Sweden the week of January 30 and also the week of February 27 in Santa Clara, California.

I hope you enjoy the article! And if you have any questions, feel free to email me at orapub.general@orapub.com.

All the best in your performance endeavors!

Craig.

Friday, November 18, 2011

CBC Latch Diagnosis & Acquisition Patterns


Cache buffer chain (CBC) latch contention is a common top Oracle wait event. There are a number of interrelated causes but also a number of solutions. The trick is to properly diagnose the problem which results in a short list of solutions.

The CBCs are created as a hashing structure and are primarily used to determine if a block currently resides in the buffer cache. (More->>) As you can imagine, even the smallest Oracle systems ask, "Is this block in the buffer cache?" a ga-zillion times each day. If CBC access continues to intensify, at some point the time to acquire the desired CBC latch will become a performance problem.

Is there a CBC issue?

When CBC latch contention is raging, your system is likely to have a crippling CPU bottleneck because the application SQL is concurrently and repeatedly checking if specific blocks reside in the buffer cache. If the answer to "Is the block in the buffer cache?" is usually "Yes," then IO reads are minimized and memory structure access is maximized... hence the crippling CPU bottleneck and CBC latch contention.

The OraPub System Monitor (OSM) script I use to interactively determine the overall response time situation is rtpctx.sql. Here's an example of a 707 second interval.
Figure 1. Typical CBC latch contention result based on rtsysx.sql, response time report.

In Figure 1, the "% WT" column of the rtsysx.sql output shows the percentage of wait time by wait event over the report interval. The "% RT" column shows each line's percentage of the total response time (CPU time plus wait time). The values in Figure 1 are typical when there is a very serious CBC latch contention issue. An AWR or Statspack report will tell a similar story: the top wait event being "latch: cache buffers chains" and most of the system's CPU resources being consumed by Oracle.

While there are a number of causes for CBC latch contention, I tend to see two CBC acquisition patterns. The first pattern is when many CBC latches are very active, that is, the access pattern is very disperse. The second pattern is when a single CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the acquisition pattern characteristic. Read on!

Determining CBC Wait Pattern

To determine the CBC wait pattern, you can run a very simple script like this:
select p1 latch_addr,
       p2 latch_#
from   v$session
where  status    = 'ACTIVE'
  and  wait_time = 0
  and  event     = 'latch: cache buffers chains'
/

LATCH_ADDR    LATCH_#
---------- ----------
1993834384        150
1993834384        150
1993834384        150
1993834384        150

4 rows selected.
Notice that all four sessions are sleeping (i.e., they are posting the wait event) while trying to acquire the same CBC latch (note the latch address is identical). While the above snippet and its result are interesting, you could easily be misled by a single sample. A more statistically sound method is to gather multiple samples. Using my OSM script latchchild.sql, we can take a sample each second and glean insight through statistical analysis. The latchchild.sql script essentially does this:
def latch=150
def sleep=300
drop table op_interim
/
create table op_interim as
select addr,gets,sleeps
from   v$latch_children
where  latch# = &latch
/
exec dbms_lock.sleep(&sleep);
select t1.addr,
       t1.gets-t0.gets delta_gets,
       t1.sleeps-t0.sleeps delta_sleeps
from   op_interim t0,
       (
         select addr,gets,sleeps
         from   v$latch_children
         where  latch# = &latch
       ) t1
where  t0.addr = t1.addr
order by 3,2
/
Below is some actual output. The "delta" columns are simply the difference between the beginning and ending values for gets and sleeps. Notice there is not a massive gap between the delta_gets and delta_sleeps values, and there is not a single latch (or a few latches) that is significantly more active than the others. This would be classified as disperse CBC latch contention.
ADDR             DELTA_GETS     DELTA_SLEEPS
-------- ------------------ ----------------
...
76D1ABC4            302,356                5
76D23210            309,631                5
76C8413C            324,284                5
76C75428            232,780                6
76CFCFAC            247,324                6
76DB79B0            247,332                6
76BFE7A8            253,808                6
76DB9C98            281,330                6
76C534AC            323,395                6
76D0BCC0            337,938                6
76C17DF0            203,694                7
76C04DF4            240,050                7
76DE64CC            295,872                7
76BF5EF0            232,782                8
76D05864            274,886                8

1024 rows selected.
To really grasp the situation, a visual histogram based on the sleeps is very useful.

Figure 2. Histogram of the number of CBC latch address and their respective sleep activity.
Figure 2 is a standard histogram I copied from the statistical analysis Mathematica notebook (you can download it below). Just over 600 CBC latches have zero sleeps while only two CBC latches had eight sleeps. Notice that while there are differences in the number of sleeps, we don't see a pattern with a massive jump like 0,0,0,1,2,3,5,6,7,1021. Again, this is an example of disperse CBC latch contention. But sometimes the situation is not dispersed at all, but singular, focused on just a couple or perhaps even one single CBC latch!
Figure 3. Histogram of the number of CBC latch address and their respective sleep activity.
Figure 3 is the result, as we'll see, of three very popular buffers, each related to a different buffer chain. While nearly 1000 CBC latches have zero sleeps (far left vertical bar in the histogram), just as important is the obvious large delta_sleeps gap below the most active CBC latches. For some people the histogram tells a better story, but for others the below numeric snippet better captures the situation. (I personally like to use both.)
ADDR             DELTA_GETS     DELTA_SLEEPS
-------- ------------------ ----------------
...
76C33818            198,038                3
76C644A8             81,535                4
76C7759C             96,993                4
76C2D1CC            145,096                4
76E03FEC            149,355                4
76CDD394            160,718                4
76C68904            182,300                4
76D69374             65,250                5
76D7800C            133,134                5
76DA2650            156,578                5
76C02B88            158,293                5
76CE59E0            145,169                7
76C7B9F8            158,243                7
76CFF120             65,228                9
76CFCF30            148,187               11
76D38668            966,345               62
76CDF508            981,384               96
76DAF26C          1,962,752              187

1024 rows selected.
What is important to us is that there is a substantial delta_sleeps gap separating the top three CBC latches from the rest of the pack.

I'm hoping you can see the differences in these access patterns. I would suggest running the latchchild.sql script on one of your systems to see this for yourself.

If you want to see the latchchild.sql output (latchchild.txt) and the statistical details for above CBC activity, here are the links:
  1. Disperse CBC latch contention (latchchild.txt, PDF, Mathematica Notebook)
  2. Singular CBC latch contention (latchchild.txt, PDF, Mathematica Notebook)
Solutions for Disperse CBC Latch Contention

Figure 2 and the code snippet directly above Figure 2 show a typical example of what you'll likely see when many CBC latches are active enough to cause a significant performance problem. Here's the likely situation: There is an intense CBC latch situation along with a raging CPU bottleneck, and you can probably easily see the heavy logical IO (sysstat.session logical IO) SQL as well. There are a number of solutions, with some of them listed below.

An Oracle focused solution is to increase the number of CBC latches by increasing the hidden instance parameter, _db_block_hash_latches.
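A sketch of this approach is below. Check the current child latch count first; since _db_block_hash_latches is a hidden parameter, test carefully on a non-production system, and expect an instance restart for the change to take effect. The value shown is illustrative, not a recommendation:

```sql
-- How many CBC child latches exist now?
select count(*)
from   v$latch_children lc,
       v$latchname      ln
where  lc.latch# = ln.latch#
  and  ln.name   = 'cache buffers chains'
/

-- Sketch: increase the count (hidden parameter -- test system first!).
alter system set "_db_block_hash_latches" = 65536 scope = spfile
/
```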

An application focused solution is to find the most logical IO intensive SQL and reduce the LIO's by executing it less often or tuning it. Either way, your objective is to reduce the LIOs generated during times of critical performance.
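To find candidate SQL, something like the sketch below, ordering v$sqlstats by buffer gets (logical IO), is usually a good start:

```sql
-- Top 10 logical IO (buffer gets) SQL statements since instance start.
select *
from ( select sql_id, buffer_gets, executions,
              round(buffer_gets/nullif(executions,0)) gets_per_exec
       from   v$sqlstats
       order  by buffer_gets desc )
where rownum <= 10
/
```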

An operating system focused solution is to increase CPU resources by removing CPU consuming processes if possible, adding more CPU cores, or increasing CPU speed.

There are of course other solutions, but I think you get the idea.

Solutions for Singular CBC Latch Contention

Figure 3 and the output snippet directly below it are typical when there is intense CBC contention focused on one or perhaps a few CBC latches. When this is the situation, additional diagnosis is needed to determine specifically why the intense singular activity is occurring. This is the topic of my next posting...

Summary

The Cache buffer chain (CBC) structure is used to answer the question, "Is this block in the buffer cache?" At some point, this question can get asked often enough to cause significant performance problems, known as CBC latch contention. While there are a number of causes for CBC latch contention, I tend to see two CBC acquisition patterns: The first is when many CBC latches are very active, that is, the access pattern is very disperse. The second is when a single CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the acquisition pattern characteristic.

In this posting I focused on how to determine the CBC latch contention acquisition pattern; disperse or singular. I then presented some disperse CBC latch contention solutions. In my next posting I'll focus on additional steps to diagnose singular CBC latch contention, two common situations, and some possible solutions.

Thanks for reading!

Craig.

Friday, November 11, 2011

Are you sure it's the index root block?

The Situation

Suppose you want to check if a specific Oracle block is an index root block. Why? Here are two very real situations. You notice a specific block is very active and want to know if it's an index root block. Even more common is, perhaps there is a very active cache buffer chain latch related to a specific block/buffer and you want to know if this hot buffer is an index root block. Besides these very real examples, it's also an interesting journey into Oracle internals!

Folklore States...

Some very respectable blogs, and a simple test I ran, indicate an index root block is the block immediately after its segment header block.

Figure 1. Diagram of an Oracle index segment, highlighting the index root block.
Figure 1 is a diagram of an Oracle index segment. If it weren't for the index root block, Figure 1 would be a good diagram for any Oracle segment. The light blue colored block is the segment header block. Notice the orange colored index root block follows the segment header block.

As mentioned above, folklore says if the segment is indeed an index, then the orange block will be the index root block. And not just now, but for the life of the index! Wow... This is a pretty strong statement and one that needs to be tested. So that's what I did and what this posting is all about.

It's Kind of Complicated

We need to determine if the block following an index segment header block is the index root block... for always and forever until the index is dropped. First, just dump the index and locate the root block's data block address (DBA). Second, get the DBA for the block following the index segment header block. And finally, compare them. If they match, then we have shown a situation where the block following the index segment header block is indeed the index root block. So let's do that.

Once we get the object_id (from dba_objects), here's how to dump an index:
alter session set events 'immediate trace name treedump level :ObjectId';
And here's a snippet of the trace file from near the top.
...
----- begin tree dump
branch: 0x4c5461 5002337 (0: nrow: 8, level: 2)
   branch: 0x4c575e 5003102 (-1: nrow: 141, level: 1)
      leaf: 0x4c5462 5002338 (-1: nrow: 96 rrow: 96)
      leaf: 0x4c63b7 5006263 (0: nrow: 78 rrow: 78)
...     
      leaf: 0x4c554d 5002573 (139: nrow: 100 rrow: 100)
   branch: 0x4c63c7 5006279 (0: nrow: 213, level: 1)
      leaf: 0x4c629d 5005981 (-1: nrow: 88 rrow: 88)
      leaf: 0x4c554e 5002574 (0: nrow: 60 rrow: 60)
      leaf: 0x4c62a0 5005984 (1: nrow: 54 rrow: 54)
...
The first/top mentioned "branch" block is the index's root block. In this case, the index root block has a data block address (DBA) of 5002337. Now let's get the data block address for the block after the index's segment header block. But first we need to get the file number and block number of the index segment header block.
SQL> select header_file, header_block
  2  from dba_segments
  3  where segment_name = 'CH_6_IRB_I';

HEADER_FILE HEADER_BLOCK
----------- ------------
          1       808032
Now let's get the data block address (DBA) for the block just following the header block. We must remember to add one to the header block number, so the block number we are interested in is 808033.
SQL> select dbms_utility.make_data_block_address(1,808033) from dual;

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(1,808033)
----------------------------------------------
                                       5002337
Do you see it? The DBA just above (with the header block + 1) matches the first/top "branch" block's DBA (5002337) from the index trace file! So now we know how to check if the block following the index's segment header block is truly the index root block.
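You can also run the check in the other direction; dbms_utility provides functions to decompose a DBA back into its relative file# and block#. A sketch using the root DBA from the trace file:

```sql
-- Decompose the treedump root DBA (5002337) back into file# and block#;
-- the block# should equal the segment header block (808032) plus one.
select dbms_utility.data_block_address_file(5002337)  file#,
       dbms_utility.data_block_address_block(5002337) block#
from   dual
/
```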

Now the question becomes, does it always remain this way? For example, what if we create the table, create the index, and then insert rows into the table? Or what if we create the table, then insert rows, and finally create the index? If that's not enough, how about this: What if the index grows and splits? What if we delete all the table's rows, then insert rows until the index splits? Or what if we truncate the related table? As you can see, there are an infinite number of possibilities and there is no way we can test all of them.

The Experimental Setup

I created a number of tests that could be repeatedly run and easily modified and extended. There are two related scripts. The driving script is a SQL script called doRbExpr.sql, which takes a single argument, called the prefix. The prefix is the beginning of the name of every object the script creates, which allows you to quickly and easily re-run the script without first removing the objects from the previous run. The second script, getIdxRtBlk.sql, retrieves the index root block's DBA both from the data dictionary and by dumping the index, then displays the two values together so you can easily see if there is a difference. I also show the index depth (blevel) as an added check to help ensure I'm looking at the current statistics.

The Experimental Results

Click here to see the results. As you can see, in every case the DBA of the index segment header block plus one, matches the index trace file's root block. I have rerun this test many times, and the results are always the same.

What Does This Prove?

Actually the experiments prove very little, yet they yield tremendous value. The experiments clearly and repeatedly demonstrate that I have not found a way to disprove that an index root block is the block immediately following its segment header block. All it would take is just one of my experiments to break the "block after" rule... but I could not break the rule! If you can devise a situation that breaks the rule, please let me know and I'll post it.

So next time you need to check if a particular block is an index root block, simply get its segment header file and block number, add one to the block number, and compare. In my opinion, that's much easier and faster than dumping the index, parsing it, etc.

Thanks for reading!


If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting and Advanced Oracle Performance Analysis. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.

P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.





Tuesday, October 11, 2011

Understanding Wait Event Time Patterns


What's The Point?

If the average db file sequential read is 20ms, the typical value is probably something more like 5ms. If you look at your system's wait event time distributions, you will see (details below) that the average Oracle wait event time is not likely to be the typical wait time, and the distribution of wait times is not likely to be normal. So when you approach your IO team about the 20ms sequential read time, they may look at you strangely and honestly say, "When we watch the IO times, we see nothing like this." This posting takes a closer look at the typical wait event time, wait event time distributions/patterns, how YOU can determine the typical wait times (plus other statistical data), and how to plot a histogram... all based on a standard Statspack or AWR report. ...good stuff!

I first need to mention that many Oracle IO requests do not result in a wait event occurrence. I blogged about this in my "IO Read Wait Occurrence Mismatch - Part 2" posting back in January of 2011. So that in itself can be enough to drive a wedge between you and the IO team.

In this posting, I'm going to focus on wait event times, not why the IO subsystem response times may not match Oracle IO related wait event times. It's an interesting discussion for sure, but not the focus of this posting.

The Journey

Way back on November 20, 2010 I posted an entry entitled The Average Challenge... Part 1, which focused on why using the statistical average to describe common Oracle performance happenings can lead to a gross misunderstanding of reality. To demonstrate my point, I operating-system traced (Linux: strace) server processes, the log writer, and the database writer. I created histograms based on the data collected and posted those graphics in the blog entry. In this posting, I will present how you can do the same thing using data from v$event_histogram and from an AWR or Statspack report.

The challenge is that averages are usually reported to us and they are easy to calculate, so we tend to use them. Worse though, when we say something like, "The average multi-block read wait time is 20ms," most people immediately assume most values are pretty close to 20ms and that a value is just as likely to be greater than the average as less than it. That is far from reality. Not even close. This means we are effectively misleading people by failing to communicate properly. And as a consultant and a teacher, it really really really bothers me when I'm not being clear or correctly understood.

All this thinking about averages led me on a fantastic quest into some very interesting and surprisingly practical study. First, I needed to gain a better understanding of statistical distributions. To document my research, I posted a blog entry entitled, Important Statistical Distributions... Really. I also spent some time investigating "average" SQL elapsed times and SQL arrival rate patterns. The results were stunning, disappointing, and enlightening all at once! Currently, I am presenting some of the results of this research in my conference presentation, SQL Elapsed Time Analysis.

At this point in this journey, it's time to blog about wait times. Not a specific wait event, but the actual wait times and how to describe what's actually occurring. Knowing that saying the average wait time is Xms can be seriously misunderstood, the practicality of this is huge. I am no longer satisfied in speaking about averages. We need to get a much better understanding of what the typical value is and understand the wait time pattern (i.e., distribution). That's exactly what this blog entry is all about; understanding wait time patterns, how to get good data, understanding the data, how to numerically and visually show the data, and how to better communicate what the data actually means.

The Data Source

Oracle does us a big favor by capturing wait event times and placing them into buckets, or bins, of a specific time range. For example, there are bins from 0ms to 1ms, from >1ms to 2ms, from >2ms to 4ms, and on up in powers of two. The good news is the data is being automatically captured, but the bad news is:

1. The bin size can be quite large. For example, the bin from >8ms to 16ms has a size of 8ms! Plus many of the really interesting times center around 10ms.

2. The bin sizes change, that is, they increase in powers of two. This makes visualizing the distribution (think: histogram) of the wait times extremely difficult for us humans. That's why I created a tool to help us accurately visualize the data... more below.
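The doubling-bucket rule described above can be sketched in a few lines. This is an illustration of the binning scheme, not Oracle's actual implementation; each bucket is identified by its upper bound, as in v$event_histogram's wait_time_milli column:

```python
import math

def histogram_bin_ms(wait_ms: float) -> int:
    """Return the upper bound (in ms) of the doubling bucket a wait
    time lands in: <=1ms, <=2ms, <=4ms, <=8ms, and so on."""
    if wait_ms <= 1:
        return 1
    return 2 ** math.ceil(math.log2(wait_ms))

print(histogram_bin_ms(0.3))   # 1
print(histogram_bin_ms(5.0))   # 8
print(histogram_bin_ms(12.0))  # 16
```

Notice how coarse the buckets get: a 9ms wait and a 16ms wait land in the same bucket, which is exactly the visualization problem described above.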

For every wait event, Oracle stores the wait event occurrences in the 10g performance view v$event_histogram. The view is very straightforward; here are a couple of links (swhist.sql and swhistx.sql) to tools in my OSM Toolkit that pull directly from this view. In addition to pulling directly from the source v$event_histogram view, more recent Statspack and AWR reports provide this information in varying levels of completeness. The formatting is a little bizarre, but with a little practice you can grab the histogram bin values necessary to create the statistics and the histogram we need.
Figure 1. Example of Oracle wait time occurrences placed in time-based bins from an AWR report.

Figure 1 above is an example of v$event_histogram data for the wait event cursor: pin S wait on X. (my ref: 20110425_0800_0900.html) If you look at the top part of Figure 1 you'll notice the average wait time is 5,111ms, which is 5.111 seconds! So what is the typical value and what does the distribution look like? To answer these questions, I created a Mathematica based tool that uses the bin inputs from v$event_histogram, Statspack, or an AWR report. How to get and use the tool is explained next.

Finding the Typical Value and Plotting the Histogram

I created a nice little tool called OraPub's Wait Event Time Distribution Analysis Tool, which can be downloaded and used for free HERE. The interface is not as clean as I'd like, but to make the tool freely available without you having to license Mathematica, this is what I had to do.

Retrieving the Data from the AWR report

The AWR event histogram data is provided in percentages. For example, if you look closely at Figure 1, you'll see that 6.4% of the wait event occurrences for cursor: pin S wait on X were between 8ms and <=16ms. It's a little strange the first few times you retrieve the wait time data, but you'll get the hang of it. For the event cursor: pin S wait on X in Figure 1, the bin data is as follows (0ms to <=1ms, 1ms to <=2ms, 2ms to <=4ms, ..., 16384ms to <=32768ms): 0, 0, 0, 0, 6.4, 19.4, 7.9, 3.7, 5.9, 8.1, 3.2, 1.1, 2.3, 5.0, 33.7, and 3.3. Remember that these values are percentages and should add up to 100, that is, 100%. The situation shown in Figure 1 is indeed an unusual distribution of values. Most wait event occurrences tend to be heavy near the beginning (e.g., 1ms) and extremely light at the end (e.g., 8sec).
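From these bin percentages alone you can roughly recover the summary statistics the tool reports. The sketch below takes the 33.7% bucket as the 8-to-16-second bin (consistent with the analysis of this event later in this posting) and weights each bin's midpoint by its percentage; the bin assignment is my reading of the figure, so treat the result as a rough estimate:

```python
# Doubling bin upper bounds (ms), 1 .. 16384, for the 15 entered
# percentages; the final 3.3% bucket was not entered into the tool.
uppers = [2 ** i for i in range(15)]
pcts = [0, 0, 0, 0, 6.4, 19.4, 7.9, 3.7, 5.9,
        8.1, 3.2, 1.1, 2.3, 5.0, 33.7]

total_pct = sum(pcts)                          # ~96.7%
# Midpoint of (upper/2, upper] is 0.75*upper; first bin is (0, 1].
mids = [0.5] + [0.75 * u for u in uppers[1:]]
est_mean = sum(p * m for p, m in zip(pcts, mids)) / total_pct
print(round(total_pct, 1), round(est_mean))    # 96.7 and ~4773 ms
```

The midpoint-weighted mean comes out near 4.8 seconds, in line with the tool's computed average discussed in the Statistical Output section.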

Using the Analysis Tool

Data Entry

Figure 2 below shows most of the data entry area expanded along with the data entered. This version of my tool doesn't provide entry for the final 3.3% of the wait occurrences, which are wait times between 16384ms and <=32768ms.
Figure 2. Partial data entry area of analysis tool.
Figure 2 above shows the upper portion of the data entry area of the tool. Rather than using the slider bars, data entry is much easier if you click the "+" sign for each entry area and then type in the values. Every time you press the "tab" key the tool will recalculate, so try to avoid this. For the Total Waits entry, start with 100 and then go to 500 or 1000 for the final graphs. Do NOT enter the actual number of waits that occurred, as this will take too long to render the histograms. Plus, once you get over a couple hundred samples, the results will be basically the same.

Statistical Output
Figure 3. Statistical data and three of the possible five histograms based on Figure 1 and Figure 2 data.
Figure 3 above shows some key statistics (e.g., median) and three of the five possible histograms. The first two histograms did not render. The first $Aborted histogram has the bin size set to 1ms and it was not possible to show over 8000 histogram bars! The second $Aborted histogram has the bin size set to 2ms and again, it was not possible to show over 5000 histogram bars.

Let's look at the resulting statistics. Figure 3 shows the following statistics:

Tot Pct is the total percentage of wait time entered. You may recall the tool did not have a data entry field for wait times between 16 and 32 seconds, which in this case was 3.3% of the values. Since the wait event percentages we entered add up to about 97%, it appears we entered the data correctly.

Average is the statistical mean. This should be close to the average wait time reported in the AWR report. The AWR report shows the average wait time is 5111ms; based on the data I entered, combined with the limited bin detail, the tool determined the average was 4792ms. That's a 6% difference. Considering the crazy event time dispersion, that doesn't bother me.

Median is the statistical median, which when there is a single mode (i.e., histogram peak) is usually the typical value. The value is 413ms, which is over 4 seconds less than the average! Wow, what a difference, and this is why taking the time to do this can be very worthwhile! I will write more about this below.

Std Dev is the standard deviation. This gives us an idea of the data dispersion, but when the data is not normally distributed (think: bell curve), which this data set is not, it must be interpreted carefully when communicating the data value dispersion.

p-Value is the statistical p-Value comparing the data with a normal distribution. Basically, if the p-Value is greater than 0.05, the data is likely normally distributed (think: bell curve). I have yet to see wait event times that are normally distributed. It's no surprise, then, that even at 5 decimal places (which you can't tell from the display), the p-Value rounds to zero.

The Histograms

Figure 3 above also shows the resulting histograms. Usually all five histograms will plot. The first three are standard histograms, but with different bin sizes. For the first and second, which aborted, I set bin sizes of 1ms and 2ms respectively. Wait event times are usually relatively short and I want to see the detail at the short durations, hence the 1ms and 2ms bin sizes. However, in this case the interesting times are much larger than one or two milliseconds. For the third histogram, Mathematica automatically sets the bin size.

The fourth histogram is a Probability Histogram. In Figure 3, the Probability Histogram tells us that about 60% of the values occurred within the first bin, which is between 0ms and <= 1,667ms (5000ms/3 bars), that is between 0 and <= 1.7 second.

The fifth histogram is the Cumulative Count Histogram. This is just another way to get an understanding of how the data is distributed; based on the number of wait occurrences, not the percentage.

Analysis of Our Data

Now let's use the tool to learn something interesting, and yes useful, about the cursor: pin S wait on X wait times. First, notice the mean and median are very different. This is typical. It is common for the median (usually the "typical value") to be less than half of the average. In this case it is a factor of 10: the average is around 5 seconds and the typical value is around a half second... a massive difference.

This is important: Notice that the AWR report (shown in Figure 1) and the subsequent data entered into the tool (not shown in Figure 2) shows that 33.7% of the wait time occurrences are between 8 and 16 seconds. In our minds, it is common to think there is a massive buildup around 8 to 16 seconds. But there is not! We need to remember that 33.7% of the wait occurrences are spread, that is dispersed, over 8 seconds! Not 1ms, 2ms, or 4ms as with near the left portion of the histogram. This is why it is so important to SEE the data in a histogram. And the difference between the mean and median also helps accentuate this.

Sixty percent of the wait occurrences occur within 1ms. That's pretty good. However, what is disturbing is that if the wait does not stop at 1ms, it can end up lasting over 2 seconds. This indicates a problem, because Oracle latches and mutexes are designed so that most sleeps (that is what mutex/latch wait time is, "sleep" time) taper off quickly as the wait time increases. I don't want to get into Oracle serialization and mutex internals, but I will say this is a problem that should not be ignored. (I realize that was a gross understatement.)

To summarize: instead of saying, "The average wait time is about 5 seconds and we need to look into this," which horribly oversimplifies the complexity of the situation, we could say something like this: "While CPU consumption and single block buffer IO reads must first be addressed, there is an unusual and potentially significant issue regarding library cache serialization. As if it weren't troubling enough that the average wait time is around 5 seconds, typically 60% of the time the waits are less than 1ms, yet if the wait is not satisfied within 1ms, the wait time can easily be between 8 and 16 seconds. This intense 'get it now or never' situation can result in extremely volatile performance and could be the result of an Oracle mutex bug." Oh yeah... I would also show the standard histogram as well. (That's the first histogram shown in Figure 3.)

In Summary

This is just one example where Oracle wait event times are not normally distributed and therefore speaking in terms of "average" miscommunicates what is really occurring. When you need to have a deeper understanding of the timing situation and visually see the situation, you can use v$event_histogram data combined with OraPub's Wait Event Time Distribution Analysis Tool.

Have fun and all the best in your Oracle performance endeavors!

Craig.

Wednesday, September 7, 2011

Anticipating SQL Elapsed Times


Getting More Practical

Many of my postings can be considered quite theoretical. But in this posting I'm going to make a clear application of quantitatively modeling response time and understanding why Oracle systems behave as Operations Research queuing theory states (which was the topic of my previous posting).

In this posting I'm building from two recent posts. In July I demonstrated that tuning Oracle can reduce the CPU and the total time it takes to process a single LIO. In August I demonstrated that a standard Operations Research queuing theory quantitative model can be used to anticipate the time it takes to process a single LIO. Now it's time to go from micro to macro!

In this posting I will develop a simple SQL statement elapsed time model that takes two input parameters: the time to process a single piece of work (e.g., 1 ms per logical IO) and the number of pieces of work to process (e.g., 1000 logical IOs). As you will see, what we know from our personal experience will play out in this quantitative model, adding confidence that Oracle systems do in fact behave as Operations Research queuing theory would have us believe.

Very Simple SQL Elapsed Time Model

In my July posting, I introduced a very simple SQL statement elapsed time model. The model shows that the time to run a SQL statement can be represented as the amount of work to be processed multiplied by the time it takes to process each piece of work. (This is a very simplistic and limited model.) For example, if a SQL statement must process 1000 LIOs and each LIO takes 1.0 ms, then the elapsed time will be 1,000 ms.
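That arithmetic is trivial, but writing it down makes the two levers explicit. A minimal sketch, using the 1000 LIO example from above:

```python
def elapsed_ms(work_units: float, ms_per_unit: float) -> float:
    """Simple elapsed time model: E = Work x Time-per-work."""
    return work_units * ms_per_unit

# 1000 LIOs at 1.0 ms per LIO -> 1000 ms elapsed.
print(elapsed_ms(1000, 1.0))  # 1000.0
```

Halving either input halves the result, which is exactly the claim tested experimentally later in this posting.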

To make this a little more elegant, let's create an equation.

E = Work X Time per Work

where:

E is the elapsed time (e.g., 1 second)
Work is the amount of work to be completed (e.g., 1000 LIOs)
Time per Work is the time it takes to process a single piece of work (e.g., 1.0 ms/LIO)

This simple (and limited) model indicates we can reduce a statement's elapsed time by reducing either or both the amount of work to be processed or the time to process a single piece of work. Before we get into the actual experiment, let's first review each of these terms.

Important: This is key: I define response time as the time to process a single piece of work. Elapsed time is the time to process many pieces of work.

Representing and quantifying work.

The amount of work to be processed is very straightforward. We can measure this by gathering statistics from v$sesstat, SQL tracing, and perhaps other ways. How we decide to represent work is typically based on what constrains the SQL statement elapsed time. For example, if a SQL statement must process 1M LIOs and 1K PIOs (block reads) and there is a raging CPU bottleneck, choosing LIOs to represent the work is probably a good choice. But if there is a raging IO read bottleneck, then choosing PIOs is probably a better choice. This topic is known as choosing the unit of work and was presented in my previous posting's section Unit of Work Time. I have discussed this in many of my blog postings and cover it extensively in my Advanced Oracle Performance Analysis class.

While I'm not going to detail my experimental results in this posting, as you might expect, if you reduce the amount of work to be processed by 50% you are likely to see a similar reduction in elapsed time. However, I do present experimental evidence of this occurring in my conference presentation, SQL Elapsed Time Analysis.

Quantifying Time per Work.

Our models work very nicely if we classify SQL elapsed time into either CPU consumed or non-idle wait time. As you may know, it is rather simple to gather this time from either the instance statistic (v$sesstat) or the session time model (v$sess_time_model) views. To determine the time to process a single piece of work, we simply divide the total time (i.e., CPU plus non-idle wait time) by the total amount of work (e.g., 1000 LIOs). This can be done for a specific SQL statement, a process, or over an interval of time (think: Statspack/AWR).
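As a sketch of that division (the totals below are hypothetical, not from the experiment):

```python
def time_per_unit_ms(cpu_ms: float, nonidle_wait_ms: float,
                     units: float) -> float:
    """Time to process one unit of work: total time (CPU plus
    non-idle wait) divided by the units processed in the interval."""
    return (cpu_ms + nonidle_wait_ms) / units

# Hypothetical interval: 600 s CPU, 150 s non-idle wait, 500,000 LIOs.
print(time_per_unit_ms(600_000, 150_000, 500_000))  # 1.5 ms/LIO
```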

In my previous posting, I detailed how tuning Oracle reduced the response time for a LIO and how that response time reduction occurred as Operations Research queuing theory states.

In this posting, I'm going to focus on the impact of response time on a SQL statement's elapsed time. Specifically, I want to experimentally demonstrate whether reducing response time by 50% will also reduce the elapsed time of a SQL statement by 50%... just as the above simple elapsed time model indicates.

Experimental Setup

For this posting, I am going to use the exact same data set I used in my previous blog entry. While I didn't present this in the previous posting, in addition to gathering the response time related metrics, I also gathered the elapsed time of a very LIO dependent SQL statement. All the experimental data is included in the Mathematica analysis notebook and can be viewed and downloaded from my previous posting. The data collection script can also be viewed on-line from the previous posting. Link to previous posting.

But to summarize the situation: I created a massive CPU bottleneck by having a number of Oracle sessions run SQL where all their blocks reside in the buffer cache, that is, the SQL is logical IO (v$sysstat: session logical reads) dependent. I gathered 30 ten minute samples with the CBC latches set to 256 and then to 32768. During these ten minute collection periods, I sampled the elapsed time of a specific LIO dependent SQL statement. With 256 CBC latches, around 23 elapsed times were collected. With 32768 CBC latches, around 71 elapsed times were collected. The difference in the number of elapsed time samples was the result of the SQL completing sooner when there were 32768 CBC latches. The SQL elapsed times were gathered using the OraPub SQL Elapsed Time Sampler (beta 3c).

Experimental Results Analyzed
Figure 1.
Figure 1 contains a summary of the experimental results. The Instance Rt (ms/lio) is simply the total time (CPU time plus non-idle wait time) divided by the total number of LIOs processed during the sample interval. The results are stunningly straightforward: By adding additional CBC latches to combat the massive CBC latch contention, the instance LIO response time decreased by 85% and the elapsed time for the specific LIO dependent SQL statement being monitored decreased by 86%. That is truly amazing!

Will this always occur? For this particular SQL statement, probably YES; but for other SQL statements, it depends. The key is understanding what resource is in short supply and whether the SQL statement is dependent on that resource. Because there was massive CBC latch contention caused by an intense LIO workload and limited CPU power, I knew that SQL statements whose performance was massively affected by LIO would also have their elapsed times highly impacted by LIO response time. If I had selected a statement that was all about physical IO or inserts or updates, and not LIO, its performance (i.e., elapsed time) would likely not have been affected.

This may sound very theoretical, but you can probably relate this type of situation to your personal Oracle performance experiences. Relating the OS bottleneck to both the Oracle instance situation and the application SQL is extremely important to more fully understanding the performance situation and also to deriving multiple appropriate solutions. For example, if there is a massive IO read bottleneck, you can expect SQL that is heavily IO read centric to be affected, while LIO dependent SQL will likely not be affected. I discuss this in the Methods & Madness chapter (1) in my book entitled, Oracle Performance Firefighting. This posting helps build a quantitative framework for what we have experienced in real production Oracle systems.

In Summary

Understanding the experimental results should, at a minimum, add confidence to what you have likely already experienced. Hopefully it also opens the door toward anticipating SQL statement elapsed times when tuning either Oracle (reducing response time) or a SQL statement (reducing the work to process).

In this short posting, I've demonstrated that a heavily LIO dependent SQL statement's elapsed time was reduced by nearly the same percentage as the LIO response time. The LIO response time reduction was due to adding additional CBC latches.

A broader application of this posting is that we can increase SQL statement performance by tuning the statement (i.e., reducing the pieces of work it must process) and also by tuning Oracle (i.e., reducing LIO response time). While not mentioned in this entry, it should make sense that an OS centric solution would be to use faster CPUs, as this would also reduce LIO response time! Notice we have a solution from an Oracle, an application, and an OS perspective. I call this approach the OraPub 3-Circle Method.

Thanks for reading!

Craig.


Tuesday, August 23, 2011

Why tuning Oracle works and modeling it


Have you ever wondered why tuning Oracle improves performance? There are of course obvious answers, but then there are the deeper answers. More profound answers. It's like answering the question, "Why is the sky blue?" Sure, you can say it's because the sun's light rays are scattered when they hit the Earth's atmosphere. But then why does scattering the light rays make the sky blue? And it goes on and on. It can be just like that with Oracle performance.

Last month I blogged about CBC latches, CPU consumption, and wait time. In that posting I demonstrated that by adding cache buffer chain (CBC) latches to a CBC latch constrained system the CPU consumption per logical IO decreased.

In this posting I want to demonstrate how a change in CPU consumed per logical IO causes a corresponding change in the time it takes to process a logical IO...just as Operations Research queuing theory states.

Note: When I write, "tuning Oracle" I am referring to altering instance parameters that do not influence the optimizer to change a SQL statement's execution path. For this posting, I'm typically referring to instance parameters that alter the number of cache buffer chains and latches.

For many of you, this posting will be immensely satisfying because we quantified and modeled the Oracle system, applied a tuning solution, quantitatively observed and understood why it altered the system, and then saw the result closely match our quantitative model. If this still seems overly theoretical, in the next blog entry you will see how we can use this understanding to anticipate the impact on a SQL statement's elapsed time!

In my previous CBC latches, CPU consumption, and wait time posting I defined and used a few terms that must be understood before this blog posting will make any sense. The terms are unit of work, service time, queue time, response time, arrival rate, and elapsed time. Response time is the time to complete a single unit of work and elapsed time is the time to process multiple units of work. If this is somewhat confusing, please refer to that previous blog entry.

The Experimental Setup

To meet my objectives I created an experiment that is easily repeatable. I created a massive CPU bottleneck by having a number of Oracle sessions run SQL where all their blocks reside in the buffer cache, that is, the SQL is logical IO (v$sysstat: session logical reads) dependent. I gathered 30 ten minute samples with the CBC latches set to 256 and then to 32768. During these ten minute collection periods, I sampled the elapsed time of a specific LIO dependent SQL statement. With 256 CBC latches, around 23 elapsed times were collected. With 32768 CBC latches, around 71 elapsed times were collected. The difference in the number of elapsed time samples was the result of the SQL completing sooner when there were 32768 CBC latches.

If you look closely at my data collection script, you can easily see how I captured, stored, and retrieved the performance data. You can download the data collection script here. The SQL elapsed times were gathered using the OraPub SQL Elapsed Time Sampler.

You can download the raw data text files (256 latches, 32768 latches) and the Mathematica analysis notebook (PDF, notebook).

Concepts/Terms Quickly Reviewed

Unit of Work Time

Current Oracle performance analysis focuses heavily on the time involved (CPU plus non-idle wait time) related to SQL statement completion, process completion, or an Oracle instance over a specified interval (think: Statspack/AWR). That's great and is a fantastic analytical leap forward from ratio analysis and wait event analysis, because it better reflects what a user is experiencing and it includes both wait time and CPU consumption. But to unite Oracle time based analysis with Operations Research queuing theory, we need the time related to a specific piece (or unit) of work. When we do this, we gain the advantages of our Oracle analysis plus all the years of proven Operations Research! Yeah... it's a big deal.

There are many ways to describe the work being processed in an Oracle system. When we say, "The LIO workload is unusually high today" we are relating performance to the LIO workload. Or how about, "Parsing is hammering performance!" or "Disk reads are intense and really slow today and it's affecting some very key SQL statements." Each of these statements is speaking and relating system performance to a type of work; namely logical IO (session logical reads), parsing (parse count (hard)), and disk reads (physical reads).

We can use this natural way of relating work and performance in very profound ways. What I'm going to show you is how to quantify these performance statements and then demonstrate how tuning Oracle changed the underlying Operations Research queuing theory parameters and then in my next posting how this affects SQL elapsed times.

How Oracle Tuning Reduces CPU per Unit Of Work

Think of it like this: acquiring a latch or mutex consists of repeatedly checking a memory address (which consumes CPU) and possibly sleeping (which can be implemented in a number of ways). If there are 100 sessions requesting a latch and there is only one latch, there will be a lot more spinning and sleeping than if there were 100 latches. By increasing the number of latches, we are effectively reducing the number of spins involved to process a LIO, which translates into reducing the CPU consumed to process a LIO (on average).

Let's get quantitative. For example, if over a one hour period Oracle processes consumed 1,000 seconds of CPU time while processing 5,000,000 logical IOs, then the average CPU time to process a logical IO is 0.20 ms/lio.
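That arithmetic is worth making explicit. A minimal sketch using the numbers above:

```python
# Service time: CPU consumed per unit of work, from interval totals.
cpu_seconds = 1000.0      # CPU consumed over the one hour interval
logical_ios = 5_000_000   # logical IOs processed over the same interval

service_time_ms = cpu_seconds * 1000.0 / logical_ios  # convert s -> ms
print(f"{service_time_ms:.2f} ms/lio")  # prints 0.20 ms/lio
```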

Here are some additional terms quickly defined:

Response Time (Rt or R) is the time it takes to process a single unit of work. Queuing theory states that response time is service time (defined below) plus queue time (defined below).

Service Time (St or S) is the CPU consumed to process a single unit of work. We get this data from v$sys_time_model, summing the DB CPU and background cpu time columns. While I don't detail this in this blog entry, Oracle service time, that is, the CPU it takes to process a unit of work, is nearly constant regardless of the arrival rate... just as the theory indicates.

Queue time (Qt or Q) is the non-idle wait time related to processing a single unit of work. We get this data from v$system_event. For those of you familiar with queue time, when response time increases, it is because the queue time increases, not because service time increases... and Oracle systems behave like the theory indicates.

Arrival Rate (L) is the number of units of work that arrive into the Oracle system per unit of time. For example, 120 physical IOs per second or 120 pio/sec. In a stable system, the arrival rate will equal the workload, which is why I commonly use the word workload. This avoids introducing yet another term and confusing people. The symbol L is used because the arrival rate is conventionally depicted using the Greek letter lambda.
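To tie the four terms together, here is a hedged sketch of the per-unit-of-work arithmetic. The interval deltas below are illustrative placeholders, not values from the experiment; in practice they would come from v$sys_time_model and v$system_event snapshots.

```python
# Illustrative interval deltas (assumed values, not experimental data).
interval_s = 3600.0      # sample interval, seconds
cpu_s      = 1000.0      # delta of DB CPU + background cpu time (v$sys_time_model)
wait_s     = 500.0       # delta of non-idle wait time (v$system_event)
units      = 5_000_000   # units of work processed, e.g. logical IOs

S = cpu_s  * 1000.0 / units        # service time, ms per unit
Q = wait_s * 1000.0 / units        # queue time, ms per unit
R = S + Q                          # response time: R = S + Q
L = units / (interval_s * 1000.0)  # arrival rate, units per ms

print(f"S={S} ms, Q={Q} ms, R={R} ms, L={L:.3f} units/ms")
```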

Now that I've covered the experimental setup and the key terms and concepts, let's take a look at the actual experimental results.

The Experimental Results Analyzed

The objective of this posting is to demonstrate that tuning Oracle by adding CBC latches, in a CPU bound system with significant CBC latch contention:
  1. Reduces the CPU consumed per logical IO (service time), and
  2. Reduces response time, as Operations Research queuing theory states.

The Drop in CPU Consumed per Logical IO

As I demonstrated in my CBC latches, CPU consumption, and wait time posting, in a CPU constrained system experiencing massive CBC latch contention, one of the possible solutions is to increase the number of CBC latches. This causes a decrease in the CPU consumed while processing a LIO, that is, the service time (CPU ms/lio or simply ms/lio). (This solution will only work if CBC latch access is not concentrated on a few CBC latches. Why?) This blog posting's experiment also easily demonstrates this phenomenon.
Figure 1.
Figure 1 above shows the Operations Research queuing theory parameter results. Notice the 72% decrease in average service time when the number of latches was increased from 256 to 32768.  Numerically, it looks like a very real decrease in service time!
Figure 2.
Figure 2 above is a histogram of the service times. The red-like bars are the sample service times when there were 32768 CBC latches and the blue-like bars are the sample times when there were 256 latches. Visually, the service time decrease from adding CBC latches looks very significant!

Just to make sure the service times are statistically significantly different, I performed a significance test. Since the sample distributions were not normally distributed (obvious from the blue-like bars), a location significance test was performed. Mathematica chose the Kruskal-Wallis test and the resulting P-value was 44.3x10^-19, which is far below my chosen alpha of 0.05. Therefore, statistically there is a significant difference (and in this case a decrease) in the service times. You can view all these details in the Mathematica notebook and its PDF. The link is provided in the Experimental Setup section above.

Response Time Decreases as Queuing Theory States

In this experiment I captured both the CPU time (service time, St, S) and the non-idle wait time (queue time, Qt, Q) related to a LIO. Together these make up the time it takes to process a LIO (CPU time plus non-idle wait time), which can be called the response time (Rt, R). Referring once again to Figure 1 above, notice the response time dropped 85%, from 0.0633 ms/lio (with 256 CBC latches) down to 0.0093 ms/lio (with 32768 CBC latches). As with service time, I performed a significance test and the P-value was 3.0x10^-11. The histogram looks very much like Figure 2. You can see the histogram in the Mathematica files (link in the Experimental Setup section above).

That's all good, but this section is really focused on asking the question, "Is this decrease in response time consistent with queuing theory?" Read on!

Develop a Simple Response Time Model

To answer this question, I'm going to develop a very simple quantitative response time model based on the Oracle system when it was configured with only 256 CBC latches. The classic Operations Research queuing theory response time model for a CPU subsystem is:

R = S / ( 1 - ( L*S/M)^M )

where:

R is the response time (ms/lio)
S is the service time (ms/lio)
L is the arrival rate (lio/ms)
M is the number of effective servers (will be close to the number of CPU cores or perhaps threads in an AIX system)

Referring to Figure 1 above, notice we have values for all variables except M, the number of effective servers. In a CPU subsystem, M is the number of CPU cores or perhaps threads. Since we have real data, we can derive the number of effective servers. If the system is CPU bound, the number of effective servers is typically pretty close to the number of actual servers (i.e., CPU cores). Let's check it out!
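The model above is easy to code directly. A sketch, using the 256-latch values quoted in this posting (S = 0.0313 ms/lio, L = 126.851 lio/ms) and the effective-server count derived in the next section:

```python
def response_time(S, L, M):
    """R = S / (1 - (L*S/M)^M): service time inflated by queuing."""
    rho = L * S / M               # utilization per effective server
    return S / (1.0 - rho ** M)

S = 0.0313    # service time, ms/lio (256 CBC latches)
L = 126.851   # arrival rate, lio/ms
M = 4.598     # effective servers (derived below via M-Solver)

print(round(response_time(S, L, M), 4))  # close to the observed 0.0633 ms/lio
```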

You cannot solve for M using standard algebra... it won't work. Even Mathematica's WolframAlpha will tell you this! What is needed is an iterative process that converges on M. In 2010 I created a simple web application, which anyone can access on-line, to solve for M. I call it the OraPub M-Solver and here is the URL: http://filezone.orapub.com/cgi-bin/msolve.cgi If you search Google for "msolver" and especially "orapub msolver" it will be the top result.
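Here is a sketch of what such an iterative solver does, assuming simple bisection (M-Solver's actual algorithm may differ). Predicted response time falls as M grows, so we can bracket the root and bisect:

```python
def response_time(S, L, M):
    return S / (1.0 - (L * S / M) ** M)

def solve_m(S, L, R, hi=64.0, tol=1e-6):
    lo = L * S + 1e-9          # below L*S, utilization exceeds 100%
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if response_time(S, L, mid) > R:
            lo = mid           # predicted R too high -> need more servers
        else:
            hi = mid
    return (lo + hi) / 2.0

# 256-latch values from the text: S=0.0313 ms/lio, L=126.851 lio/ms, R=0.0633 ms/lio
M = solve_m(0.0313, 126.851, 0.0633)
print(round(M, 2))  # lands very near the 4.598 reported by M-Solver
```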

Placing the values from our system into OraPub's M-Solver, you will see what is shown in Figure 3.
Figure 3.
Press the submit button to solve for M and in a few seconds you will receive what is shown in Figure 4.
Figure 4.
Figure 4 shows M at 4.598. There are four physical CPU cores in this system... not bad, and a very typical difference. (While I'm not going to go down this path, notice at the bottom of Figure 4 there is a link to plot the resulting response time curve.) As Figure 4 shows, we now have all the variable values for the response time formula: M, L, S, Q, and R.

Testing the Response Time Model

The question before us is, does the change in the service time (S) produce a corresponding change in the response time (R) as queuing theory states? Let's check!

Placing the modified service time (S) into our response time (R) formula along with the initial arrival rate (L) and effective servers (M):

R = S / ( 1 - ( L*S/M )^M )
  = 0.0087205 / ( 1 - ( 126.851*0.0087205/4.59756 )^4.59756 )
  = 0.008733

Our model anticipates the response time to be 0.008733 ms/lio. The experimentally observed response time was 0.0093257 ms/lio. That's really close! As Figure 5 shows, the difference is only 6.4%.
Figure 5.
Figure 5 shows that when additional CBC latches were added and only incorporating the service time change into our response time model, the predicted response time differed only 6.4%. Considering the simplicity of our model, this is outstanding!
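The arithmetic behind this prediction is easy to check:

```python
S_new = 0.0087205   # service time after adding latches, ms/lio
L_old = 126.851     # initial arrival rate, lio/ms
M     = 4.59756     # effective servers

R_pred = S_new / (1.0 - (L_old * S_new / M) ** M)
R_obs  = 0.0093257  # experimentally observed response time, ms/lio

diff_pct = 100.0 * (R_obs - R_pred) / R_obs
print(round(R_pred, 6), round(diff_pct, 1))  # ~0.008733 and ~6.4
```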

You may have noticed in Figure 1 that when the additional latches were added and the system stabilized, the arrival rate increased by 175%. To be correct (and fair) to our response time model, we need to account for this change in the arrival rate. As most of you know, when we increase the arrival rate the resulting response time can also increase. So, to be fair, we need to incorporate the arrival rate increase into our model.
Figure 6.
Figure 6 shows the results when incorporating both the change in service time (S) and arrival rate (L) into the response time model. In this case, our prediction was off by 10%. Again, considering the simplicity of our model (which can be greatly enhanced as I discuss in my Oracle Forecasting & Predictive Analysis course), this is outstanding!

Very cool, eh? What we have seen is that by tuning Oracle we have reduced the time it takes to process a logical IO (response time) and this reduction is as our classic CPU queuing theory based model indicates.

To Summarize...

The main point of this posting is to demonstrate that when we tuned Oracle by adding additional CBC latches, we effectively altered the Oracle kernel code path, making it more efficient, AND the resulting LIO response time changed as Operations Research queuing theory states!

In a little more detail, this is what occurred:
  1. There was an intense CPU bottleneck along with raging CBC latch contention.
  2. We observed the CPU time to process a single LIO (S) was 0.0313 ms and the total time to process a LIO (R) was 0.0633 ms.
  3. We increased the number of CBC latches from 256 to 32768.
  4. We restarted the system and let it stabilize.
  5. We observed the CPU time to process a single LIO (S) decreased by 72%, the arrival rate (L) increased by 175%, and the total time to process a LIO (R) decreased by 85%.
  6. Our response time model predicted, with the decrease in CPU time to process a LIO (S) and also the increase in the arrival rate (L), a response time (R) that was 10% greater than what actually occurred.
For many of you, this will be immensely satisfying: we quantified and modeled an Oracle system, applied a tuning solution, quantitatively observed the resulting change and understood why it altered the system, and then demonstrated that the observed result closely matched our quantitative model.

If this seems overly theoretical, in the next blog entry you'll see how we can use this information to anticipate the impact on a SQL statement's elapsed time!

Thanks for reading!

Craig.

If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting and Advanced Oracle Performance Analysis. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.

P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.

Wednesday, August 3, 2011

True SQL Elapsed Times... gathering


Getting laughed at is no fun...

Here's the situation: You run a Statspack or AWR report, determine the key SQL statements, and based on the total elapsed time and the number of executions you determine their average elapsed times. You then tell your users that for their key SQL statement the average elapsed time is X. Then you notice they are kind of snickering because, while it does sometimes take this long, it usually doesn't. And in their minds, your reputation and their trust in your skills sinks a little lower. If this situation makes you feel uncomfortable, then read on!

Average elapsed times do not tell us much.

If you have been following my blog, you may recall that last February (2011) I blogged about SQL statement elapsed times. One of the key takeaways was that SQL statement elapsed times are not normally distributed. That is, for a given statement, if you gather a bunch of elapsed time samples, there will not be an equal number of samples below and above the average. Said another way, that nice bell curve histogram most people envision when we say, "The statement takes an average of X seconds to run" is not the truth. Not even close.

Since SQL statement elapsed times are not normally distributed, knowing the average elapsed time is not all that useful and can easily mislead people.
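A quick illustration of why, using synthetic (not experimental) data. Log-normally distributed samples are right-skewed, like real SQL elapsed times, and their mean sits well above the typical value:

```python
import random

random.seed(42)
# Synthetic right-skewed "elapsed times" (log-normal, illustrative only).
samples = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]

mean   = sum(samples) / len(samples)
median = sorted(samples)[len(samples) // 2]
below  = sum(1 for s in samples if s < mean) / len(samples)

# Roughly 69% of executions finish faster than the "average" -
# quoting the mean alone misleads.
print(f"mean={mean:.2f}  median={median:.2f}  below mean={below:.0%}")
```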

Collecting real elapsed times.

To demonstrate this, in the February blog posting I created a procedure to gather elapsed times for a given sql_id and plan_hash_value. The problem was (and still is) the procedure can consume an entire CPU core while collecting data! Perhaps this is OK when gathering experimental data (sometimes), but obviously it's not going to be acceptable when you're performance firefighting on a CPU bottlenecked system and need to get a truer understanding of the elapsed times without hammering your system in the process. But if you'd like, you can download this free tool here.

For months now, I have been planning on creating a better SQL elapsed time collector; a tool that ever-so-lightly collects elapsed times for a given statement's sql_id. At the time of this blog entry, a number of DBAs are beta testing it and I already have some preliminary data from some of them.

But you may be thinking: What about other methods of gathering elapsed times? There are other methods and below I'll discuss both instrumentation and tracing. And then, I'll compare them with the new tool that's in beta as I type.

What about instrumentation and tracing?

Both instrumentation and tracing can produce very good elapsed time samples. When using instrumentation, the SQL you are interested in must (obviously) be instrumented. Not the module, action, user, or chunk of code, but the actual SQL statement. This is what I'm talking about:

Get time T0
Run SQL statement
Get time T1
Save elapsed time as T1-T0
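The pattern above is just a timing wrapper. A Python sketch; the run_statement callable is a hypothetical stand-in for the real instrumented SQL call:

```python
import time

def timed(run_statement):
    """Instrumentation pattern from the text: T0, run, T1, save T1 - T0."""
    t0 = time.perf_counter()   # Get time T0
    run_statement()            # Run SQL statement
    t1 = time.perf_counter()   # Get time T1
    return t1 - t0             # elapsed time in seconds

# Hypothetical stand-in for the real database call.
elapsed = timed(lambda: time.sleep(0.05))
print(f"{elapsed:.3f} s")
```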

Tracing is another option that produces solid elapsed times. To prove this to myself, I enabled tracing for a specific sql_id like this:
alter system set events 'sql_trace [sql:5hy19uf6q4unx]';
exec dbms_lock.sleep(60);
alter system set events 'sql_trace off';
I also wrote a Bourne shell trace file processor which, given the sql_id, will produce all of that sql_id's elapsed times contained within the trace file. It was written for Linux based on an Oracle 11.2 trace file, but I suspect it can be easily modified if necessary. Click here to view the shell script.
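I won't reproduce the shell script here, but here is a hedged sketch of what such a processor does: in an 11.2-style trace file, "PARSING IN CURSOR #n ... sqlid=..." lines tie cursor numbers to sql_ids, and "EXEC #n:...,e=microseconds,..." lines carry each execution's elapsed time. The field layout is an assumption about your trace format; adjust the patterns as needed:

```python
import re

def sql_elapsed_times(trace_path, sql_id):
    """Pull elapsed times (seconds) for one sql_id out of a raw trace file.

    Assumes 11.2-style trace lines: 'PARSING IN CURSOR #n ... sqlid=...'
    maps cursor numbers to sql_ids, and 'EXEC #n:...,e=<microseconds>,...'
    carries each execution's elapsed time.
    """
    cursors, elapsed = set(), []
    with open(trace_path) as f:
        for line in f:
            m = re.match(r"PARSING IN CURSOR #(\d+).*sqlid='(\w+)'", line)
            if m and m.group(2) == sql_id:
                cursors.add(m.group(1))
            m = re.match(r"EXEC #(\d+):.*?\be=(\d+)", line)
            if m and m.group(1) in cursors:
                elapsed.append(int(m.group(2)) / 1e6)  # microseconds -> seconds
    return elapsed
```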

To really prove to myself that these methods worked, I performed an experiment. You can download the master experiment script here. I created a CPU bottleneck (the top wait event was latch: cache buffers chains), instrumented the SQL, and ran it as I just mentioned above (i.e., get time T0, etc.) while also tracing just that statement. I did the entire experiment twice, each time with a different average sleep time between executions of the SQL statement I was gathering elapsed times for. The sleep times were log normal distributed, which is much more realistic than constant (i.e., uniform) or normally distributed sleep times. Each of the two sample periods was around 5 minutes. The experimental setup worked wonderfully.
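For the curious, log-normal sleep times are trivial to generate. The mu/sigma values below are my own illustrative picks (not the experiment's), chosen so the mean lands near the roughly 2 seconds used in the first run:

```python
import random

random.seed(7)
mu, sigma = 0.5, 0.5   # assumed shape parameters, not the experiment's
sleeps = [random.lognormvariate(mu, sigma) for _ in range(10_000)]
mean_sleep = sum(sleeps) / len(sleeps)
# mean of a lognormal is exp(mu + sigma^2/2), about 1.87 s here
print(f"mean sleep ~ {mean_sleep:.2f} s")
```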
Figure 1.
Figure 2.
Figure 1 shows a summary of data collected when the sleep time averaged around 2 seconds, and Figure 2 data was collected when the sleep times averaged around 1.8 seconds. The load was also a little more intense during the Figure 2 collection, hence the elapsed times are slightly longer. Within Figure 1 and Figure 2, notice how similar the captured elapsed times are! Statistically speaking, there is no difference between them (alpha=0.05, p-value 0.980). This means both methods will yield the same test results... pick whichever one works best for you!
Figure 3.
Figure 3 really shows the story! Figure 3 above contains two smoothed histograms based on the data in Figure 2 above. Notice the lines are difficult to distinguish. This means both the instrumentation and SQL tracing elapsed time strategies resulted in nearly the exact same results. And that is what the statistics also told us.

The point of Figure 1, Figure 2, and Figure 3 is both instrumentation and SQL tracing produce good and the same elapsed time data.

That's great if your application is instrumented or you can (or want to) enable SQL tracing... even at the sql_id level. But perhaps this is not a production system option? Is there another option? Read on...

Another option: The OraPub Elapsed Time Sampler

If you need good SQL statement elapsed times and the SQL of interest is not instrumented and tracing is not a viable option, then consider the OraPub Elapsed Time Sampler. The sampling overhead is minuscule, it is simple to use, it is customizable, and it is amazingly accurate. Here is the link to the tool's web-page.  As the web-site states (at this time), if you would like to beta test the tool, please email me at orapub.general@gmail.com. Once the beta testing is completed, the tool will be available on my web-site, but for a few dollars.

Little sampling overhead.

All performance tools place some overhead on the systems they are monitoring. I decided to actually observe and honestly report the overhead. The product incorporates several strategies to reduce the overhead. In the initial beta version there are also three precision options, which impact the gathering overhead. The precision options are low, normal, and high.

While the product is waiting for the specific SQL to complete, the CPU impact is around 1.2%, 1.2% (not a typo), and 20% of a single core for the low, normal, and high precision options respectively. This means if you have a 4 core server and are gathering at the low precision, the collection server process would be consuming 0.3% of the total CPU resources. While the tool is looking for the SQL to monitor, the CPU overhead is 12%, 56%, and 100% of a single CPU core for the low, normal, and high precision options respectively. This means if you have a 4 core server and are gathering at the low precision, the collection server process would be consuming 3% of the total CPU resources (12% of one core / 4 cores = 3%).
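The overhead arithmetic generalizes: one collector process burning a fraction of one core, expressed against the whole server. A minimal check of the figures above:

```python
def total_cpu_overhead_pct(per_core_pct, cores):
    """Single-process overhead on one core, as a % of total server CPU."""
    return per_core_pct / cores

# Figures from the text, 4-core server, low precision.
print(total_cpu_overhead_pct(1.2, 4))   # waiting for the SQL to complete -> 0.3
print(total_cpu_overhead_pct(12.0, 4))  # looking for the SQL -> 3.0
```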

I typically use the low precision setting because, as you'll see below, even at this level the results are stunningly accurate. Keep in mind, these numbers are for the initial beta version (3c) and I'm still working to reduce the overhead. I'm also planning on creating a super-low precision/impact setting.

Simple to use.

First you get the sql_id you want to monitor. (If you found a problem SQL statement in a Statspack or AWR report, the sql_id will be right there in the report.) Then determine the sampling precision and duration. And finally, execute the sampling like this:
exec op_sample_elapsed_v3.sample(600,'5hy19uf6q4unx','low','none','key');
When the sampling is complete, simply query from the sample data table, like this:

SQL> select elapsed_time_s from op_elapsed_samples;

ELAPSED_TIME_S
--------------
      1.767666
      1.518642
      1.518561
      ...

Simple to install.

You create the sample data table and create a single package. Installation done!

Amazingly accurate.

This is the really cool part! My core objectives were low overhead and being accurate "enough." As I mentioned above, the overhead is virtually zero. It's easy to collect accurate data when the elapsed times are long, perhaps greater than 20 seconds. But for statements just a second or two long... it becomes very difficult to maintain the balance of low overhead and accuracy.

You can download all the experimental data, some of which I show in this blog entry: NORMAL precision data and analysis (PDF, Mathematica notebook) and LOW precision data and analysis (PDF, Mathematica notebook). You can also view/download my master experiment text file, which is what I copied and pasted from when I ran the experiment.

The tool's collected elapsed times are extremely accurate. I didn't mention this above (it would have been a distraction), but while gathering the sample data summarized in Figure 1, Figure 2, and Figure 3, I was also gathering elapsed time samples using this tool... I just didn't show that data in those figures. (Sneaky, I know.) Here is the data in both table and histogram format.
Figure 4. Normal precision and overhead.
Figure 5. Normal precision and overhead.
Figure 4 and Figure 5 contain all the data from a 5 minute sample interval at the normal precision level. The sleep time between executions of the SQL statement I was looking for was around 2 seconds (log normal distributed). As you probably inferred by looking at Figure 4 and Figure 5, statistically there is no difference between the elapsed time gathering methods. The smoothed histogram (Figure 5) clearly shows there is virtually no difference in the collection methods.

But what about the low precision setting? After all, the lowest precision setting places a nearly undetectable load on the system. Figure 6 and Figure 7 show the results.
Figure 6. Low precision and overhead.
Figure 7. Low precision and overhead.
By looking at Figure 4, Figure 5, Figure 6 and Figure 7 you probably inferred that any of the collection methods will work fine and produce the same results (statistically speaking, alpha 0.05). Yes, this is correct!

To summarize the OraPub SQL Elapsed Time Sampler option: If your SQL is not instrumented and SQL tracing is not a production option and you have the spending authority of around a box of candy, then this product should satisfy your requirements. In fact, any of the precision settings will produce accurate results along with a shockingly (or perhaps refreshingly) low sampling overhead.

Again, here's the link to the tool's web page.

Send me your data!

If you would like, you can email me your elapsed time data and I'll run it through a Mathematica notebook, which will crank out a number of graphs and tables. Usually the results are very informative and immensely satisfying.

In Summary...

My objective in this blog entry was not to push my new tool (really). I was just as interested in understanding the accuracy and similarity of all three collection methods (instrumentation, tracing, and OraPub's sampling tool). As I mentioned at the top, it's easy to get a SQL statement's average elapsed time... but that can be very misleading and not all that helpful. What is needed are good elapsed time samples. This blog entry presents three ways to get really, really good elapsed times with relatively low overhead and at virtually no cost.

Thanks for reading!

Craig.


