Tuesday, December 10, 2013

Tool Options: Detailing Oracle Process CPU Consumption

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Detailing an Oracle Database process's CPU consumption is amazing, a lot of fun, and can lead to some “ah ha” moments. To make this posting daily digestible, I’m breaking it up into a multi-part series. Today I’m posting the second part entitle, Tools Options. Here we go…

Tool Options


As I mentioned in my previous post entitled, The Situation, there is bad news and there is good news. The bad news, to my knowledge, Oracle does not collect process CPU consumption at the Oracle kernel level. For sure, Oracle collects Oracle process CPU consumption, but it is the total CPU consumption, with the key word being total. What I want is process CPU consumption at the Oracle kernel function level.

When I started my quest I knew there had to be some good news! With all the OS tools out there, there must be one that will meet my requirements. There were four tools that caught my attention: strace/truss, dtrace, gdb, and perf. Here is what I learned in my quest.

System Call Tracing Using STRACE/TRUSS


I started with strace because I am very familiar with it. For many years I have traced Oracle processes at the operating system level using strace. (If you are on Solaris, the command is truss.) It's a fantastic way to learn how Oracle really works. And it's a great teaching tool (which I'm always looking for).

Here is an example of what you can do with strace. Suppose I wanted to know how long it takes an Oracle server process to read a single block from disk. Through Oracle wait interface we would see this as a db file sequential read. To do an Operating system trace on the Oracle server process 2518, I could issue this command:
$ strace –rp 2518
...
     0.000324 clock_gettime(CLOCK_MONOTONIC, {504, 52586559}) = 0
     0.000040 clock_gettime(CLOCK_MONOTONIC, {504, 52625324}) = 0
     0.000040 pread(257, "\6\242\0\f\0"..., 8192, 427270144) = 8192
     0.000047 clock_gettime(CLOCK_MONOTONIC, {504, 52712996}) = 0
     0.000044 clock_gettime(CLOCK_MONOTONIC, {504, 52757393}) = 0
     0.000329 clock_gettime(CLOCK_MONOTONIC, {504, 53086771}) = 0
     0.000040 clock_gettime(CLOCK_MONOTONIC, {504, 53125505}) = 0
     0.000040 pread(257, "\6\76 [y\f\0"..., 8192, 427278336) = 8192
     0.000047 clock_gettime(CLOCK_MONOTONIC, {504, 53213583}) = 0
     0.000040 clock_gettime(CLOCK_MONOTONIC, {504, 53253021}) = 0
     0.000327 clock_gettime(CLOCK_MONOTONIC, {504, 53580561}) = 0
     0.000040 clock_gettime(CLOCK_MONOTONIC, {504, 53619199}) = 0
     0.000040 pread(257, "\6\273\f\0"..., 8192, 427286528) = 8192
     0.000047 clock_gettime(CLOCK_MONOTONIC, {504, 53706779}) = 0
     0.000040 clock_gettime(CLOCK_MONOTONIC, {504, 53752611}) = 0
The beauty of strace is all the system calls are shown in the OS manual pages. For example, if I want to learn about the clock_gettime call, I simply do a "man clock_gettime". I actually did that a few months ago when exploring Oracle Database 12c. My past research has been on 11g and when Oracle asks the OS for the time, I only observed Oracle making the gettimeofday call. What happened to the gettimeofday call? James Morle has a very nice posting about this.

What did I learn from the above? I observed that Oracle is instrumenting a single block read using the clock_gettime call, the server processes is requesting a single 8K block using the pread call, and the first pread shown above took about 0.047 ms (not 0.040 ms).

While that's great, notice there is no reference to CPU consumption and there is no mention of Oracle kernel functions either. In fact, only the "system calls" are shown. Also, based on what the contents of the Linux /proc/sys/kernel/vsyscall64 file is set to, I may not see all the system calls!

So clearly, strace is not going to work.

Detailed Tracing Using DTRACE


If you are a developer on Solaris, you probably have used dtrace.  DTrace is a framework for troubleshooting processes as they are running...and in production! DTrace can be used to capture CPU consumption, IO read times, IO write times, network activity, and many other things. It is amazing!

DTrace programs resemble awk programs in structure; they consist of a list of one or more probes (instrumentation points), and each probe is associated with an action. These probes are comparable to a pointcut in aspect-oriented programming. Whenever the condition for the probe is met, the associated action is executed (the probe "fires"). A typical probe might fire when a certain file is opened, or a process is started, or a certain line of code is executed. The probe may result in some information being displayed.

The good news is, DTrace is very powerful. I think any DBA could see the value in learning how Oracle works by using DTrace. Have you ever wondered how Tanel Poder finds out so much about the Oracle kernel? Tanel is a super big fan of DTrace and you quickly see this when reading his work.

There is a down side though. There is a considerable learning curve. True this rarely stops the performance researcher... but it sure can slows them down. While DTrace is standard for Solaris, it's a little trickier on Linux. The DTrace package and the Unbreakable Enterprise Kernel (UEK) package are available on the Unbreakable Linux Network (ULN), but not on the public yum server. You must register your system with ULN before you can download the required packages. The Oracle Linux Administrator's Solutions Guide for Release 6 lists the packages you need to download from ULN and shows how to install them. ...bummer.

Hmm... So dtrace is not installed and not working by default. That's a problem for me and especially in this quest. Whenever I do my research, I want others to be able to duplicate it. I also usually don't want someone to have to install, configure, and learn about another tool in the process. This combined with the learning curve, put the kibosh on DTrace for this endeavor.

Using the GNU Debugger: GDB


The  purpose  of  a debugger such as GDB is to allow you to see what is going on ‘‘inside’’ another program while it executes—or  what  another program was doing at the moment it crashed.

I have spoken to Frits Hoogland about gdb before and was really impressed with what one can do with it. Frits uses gdb quite a bit, so his blog postings have many great examples. Here is just one good example. At this years IOUG/Collaborate conference in Las Vegas, Frits is going to present a 30 minute Quick Tip on using how DBAs can use gdb. I hope the room is full!

The word "debugger" is a clue that gdb is not going to meet my objective. GDB is great for stepping through Oracle's kernel code (and even interrupting or stopping it!), but it wasn't clear how I could capture an Oracle process's CPU consumption by Oracle kernel function.

As with strace and dtrace, gdb isn't what I'm looking for.

Using the Linux Profiler: Perf


First of all, Perf is a free Linux process profiling tool. When I checked on my pretty-much default OEL 6.2 and Oracle Database 12c box, the perf tools where already there!

On your Linux box, just do:
$ which perf
/usr/sbin/perf
In many ways Perf is like Oracle's ASH facility in that, it samples data. Unlike Oracle's wait interface it is about instrumentation providing timing information. For example, we can ask Perf to sample a process for 5 seconds at the default rate of 1K/sec and to keep a count whenever it sees the process "on CPU". Now suppose over the 5 second interval, the process was "on CPU" for 3K samples. Therefore, we know that that 3/5 of the 5 second interval, that is, 60% of the time it was "on CPU."

Oh yeah... one other thing. Perf keeps track of the "count" at the function level. This means I can determine how much CPU time (more correctly the counts) each Oracle kernel function "consumes." Therefore, it appears Perf will do exactly what I need; keep track of an Oracle's process's CPU consumption down to the Oracle kernel function. Perf-ect! (sorry about that)

To help make this crystal clear, now I can get something like this:
Count    Kernel Function
-------- ---------------
  1500   abc124
   900   td8tr
   200   kcbte
    12   kbce8
------
  2612 counts over a 5 second interval
There is a lot more to Perf than I have mentioned. There are actually suite of Perf tools, that all begin with the word "Perf". For example:

perf stat is used to obtain, that is collect, the event counts.
perf record is used to record the event counts for later reporting.
perf report is used to break down and report on the events process, function, etc.
perf top is like the standard "top" command but provides function level detail for a specific process.

Here is a quick example of "perf stat" collecting the default events for process 28497 for five seconds. Notice the "sleep 5"? The execution duration of the command "sleep 5" is the collection period duration. I could have used another command like, "ls -1" but "sleep 5" is an obvious way to sleep for five seconds.
$ ps -eaf | grep oracleprod35
oracle   28497 28496  8 09:58 ?        00:31:36 oracleprod35 (DESCRIPTION=(LOCAL=YES)...
oracle   42265 28425  0 15:51 pts/1    00:00:00 grep oracleprod35

$ perf stat -p 28497 sleep 5

 Performance counter stats for process id '28497':

      560.686866 task-clock               #    0.112 CPUs utilized          
              32 context-switches         #    0.000 M/sec                  
               1 CPU-migrations           #    0.000 M/sec                  
               0 page-faults              #    0.000 M/sec                  
   1,522,513,670 cycles                   #    2.715 GHz                    [83.11%]
     952,309,947 stalled-cycles-frontend  #   62.55% frontend cycles idle   [83.65%]
     568,827,423 stalled-cycles-backend   #   37.36% backend  cycles idle   [66.93%]
   1,333,413,297 instructions             #    0.88  insns per cycle        
                                          #    0.71  stalled cycles per insn[83.54%]
     249,397,610 branches                 #  444.807 M/sec                  [82.99%]
       4,478,117 branch-misses            #    1.80% of all branches        [83.40%]

      5.000696361 seconds time elapsed
Here is an example of "perf top" for process id 28497 at a 30 second refresh rate.
$ perf top -p 28497 -d 30

PerfTop: 89 irqs/sec  kernel: 5.6%  exact: 0.0% [1000Hz cycles],(target_pid: 28497)
---------------------------------------------------------------------------

             samples  pcnt function                   DSO
             _______ _____ __________________________ __________________________

              205.00  7.6% kcbchg1_main                product/12.1.0/bin/oracle
              148.00  5.5% kcbget                      product/12.1.0/bin/oracle
              124.00  4.6% kcrfw_redo_gen_ext          product/12.1.0/bin/oracle
              114.00  4.2% kdkcmp1                     product/12.1.0/bin/oracle
              100.00  3.7% __intel_ssse3_rep_memcpy    product/12.1.0/bin/oracle
               99.00  3.7% kcrfw_copy_cv               product/12.1.0/bin/oracle
               89.00  3.3% kcoapl                      product/12.1.0/bin/oracle
               84.00  3.1% kcbgcur                     product/12.1.0/bin/oracle
               77.00  2.9% kdiins1                     product/12.1.0/bin/oracle
               62.00  2.3% kduovw                      product/12.1.0/bin/oracle
               56.00  2.1% ktuchg2                     product/12.1.0/bin/oracle
               52.00  1.9% ktugur                      product/12.1.0/bin/oracle
               51.00  1.9% kdimod0                     product/12.1.0/bin/oracle
               50.00  1.9% updrow                      product/12.1.0/bin/oracle
               46.00  1.7% kauupd                      product/12.1.0/bin/oracle
               42.00  1.6% qertbFetch                  product/12.1.0/bin/oracle
               38.00  1.4% kdudcp                      product/12.1.0/bin/oracle
Here is were it really get interesting. I'm going to record just CPU "cycles" for process 28497 for 30 seconds. It's simple:
$ perf record -e cycles -p 28497 sleep 30

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.093 MB perf.data (~4057 samples) ]
At this point the data has been stored in the default perf.data file. Now I'm going to create the default report based on the perf.data file, like this:
$ perf report

# Events: 2K cycles
#
# Overhead          Command      Shared Object                       Symbol
# ........  ...............  .................  ...........................
#
     7.91%  oracle_28497_pr  oracle             [.] kcbchg1_main
     4.87%  oracle_28497_pr  oracle             [.] kcbget
     4.07%  oracle_28497_pr  oracle             [.] kcrfw_redo_gen_ext
     3.86%  oracle_28497_pr  oracle             [.] kdkcmp1
     3.42%  oracle_28497_pr  oracle             [.] __intel_ssse3_rep_memcpy
     3.37%  oracle_28497_pr  oracle             [.] kcoapl
     3.23%  oracle_28497_pr  oracle             [.] kcrfw_copy_cv
     3.08%  oracle_28497_pr  oracle             [.] kcbgcur
     2.56%  oracle_28497_pr  oracle             [.] kdiins1
     2.22%  oracle_28497_pr  oracle             [.] ktuchg2
     2.03%  oracle_28497_pr  oracle             [.] ktbgwl
     2.00%  oracle_28497_pr  oracle             [.] kauupd
     1.88%  oracle_28497_pr  oracle             [.] kduovw
     1.78%  oracle_28497_pr  oracle             [.] kdimod0
     1.56%  oracle_28497_pr  oracle             [.] kdudcp
Oh yeah... I forgot to mention that the "perf report" command has the capability to output the results in comma delimitated format... can anyone say, "external table"? I think you see where I'm going with this. However, we need to get past a potential deal killer first.

Can We Go From Count To Time?


This is actually a very serious issue. Most DBAs will and should pause and consider the situation when jumping from sample counts to time.

Here is an example situation. Suppose over a 30 second interval the Oracle view v$sess_time_model (stat: db cpu) shows a server process consumed 10 seconds of CPU. Also, according to Perf, the Oracle kernel function kcb1 accounts for 60% of the CPU counts and kcb2 accounts for the remaining 40%.

Now the question is, can we attribute 60% of the 10 seconds of CPU time to the Oracle kernel function kcb1 and 40% of the 10 seconds of CPU to kcb2? If so, then kcb1 would be assigned 6 seconds and kcb2 4 seconds.

Using the count-to-seconds math above combined with Perf's default sample rate of 1000/sec, (i.e., 1/ms), then each CPU count is equivalent to 1 ms of CPU consumption. I suspect it is likely either less than 1ms or greater than 1ms. Assuming either direction is just as likely to occur (and I can't prove this to you), then I suspect assessing one count to 1ms of CPU is OK.

Based on the above, I am comfortable jumping from Perf counts to time.

So Where Are We In This Quest?


At this point, in terms of CPU consumption I'd say we are here:
Time Component                                                secs       %
------------------------------------------------------- ---------- -------
cpu : [.] kcbgtcr                                           29.714   66.87
cpu : [.] kdstf000010100001km                                3.716    8.36
cpu : [.] lnxsum                                             3.541    7.97
cpu : [?] sum of funcs consuming less than 2% of CPU ti      2.393    5.38
cpu : [.] kaf4reasrp0km                                      2.180    4.91
As DBAs we already know how to get Oracle wait time. This puts us in the position to combine Oracle wait time and Oracle process CPU consumption by kernel function. Something like this:
Time Component                                                secs       %
------------------------------------------------------- ---------- -------
cpu : [.] kcbgtcr                                           29.714   66.87
cpu : [.] kdstf000010100001km                                3.716    8.36
cpu : [.] lnxsum                                             3.541    7.97
cpu : [?] sum of funcs consuming less than 2% of CPU ti      2.393    5.38
cpu : [.] kaf4reasrp0km                                      2.180    4.91
wait: latch: cache buffers chains                            2.100    4.73
Obviously we are not finished. First, we need to create a tool to collect, merge, and report the above details. And second, we need to gain some understanding about Oracle function names. For example, just what is "kcbgtcr" anyways?! In my next posting I'll cover both of these issues.

But until then, log into your non-production yet active Linux box and some a few "perf top" and a few "perf record" and then "perf report" commands. If it's not production, profile the Oracle log writer or database writer background process or perhaps an active server process!

Feeling dangerous yet?

If you can’t wait until the next posting (or you have lots of time), here are some resources.

1. Presentation: Detailing Oracle CPU Consumption. Download here.
2. Tool: Oracle wait time and kernel function CPU monitor. Download the fulltime.sh script.

Thanks for reading,

Craig.

Friday, November 29, 2013

The Situation: Detailing Oracle Process CPU Consumption

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Detailing an Oracle Database process's CPU consumption is amazing, a lot of fun, and can lead to some “ah ha” moments. To make this posting daily digestible, I’m breaking it up into a multi-part series. Today I’m posting the first part entitle, The Situation. Here we go…

The Situation: The Way It Is...

By now, most Oracle DBAs evaluate Oracle performance based on time, that is a time based analysis or simply TBA. For sure there are other methods. And many methods are in part, time based. But I digress.

The beauty of a time based analysis is it helps us understand what a user is experiencing. Because users experience time, when our analysis is time based we can quantify their experience. Completely? No, but for sure in part. And in part is better than no part at all and much better than many alternatives.

Oracle processes yearn for CPU! It is paramount to understand that an Oracle server or background process either consumes CPU or does not. Currently, there is no third alternative. The “does not consume CPU” is called wait time and the wait time can be classified as either “idle” or “non-idle”. I like to say that Oracle processes want to burn CPU, they want to consume CPU, they yearn for CPU. If they can’t, there must be a very good reason why they most stop and wait before consuming CPU. And it’s the wait events that provide clues as to why CPU can not be consumed.

What Is In The Time?

An Oracle time based analysis includes both Oracle CPU consumption and Oracle wait time. And usually it excludes the time between the Oracle client process and the Oracle server process. This is shock to many, but capturing the time between the Oracle client and server process is not that simple. And it requires more than simply looking at Oracle trace files or v$ views.

The "big bar" figure below displays the "total time." That is, all the Oracle process CPU and wait time over a specific snapshot interval. This is one way to communicate the total CPU and total wait time.


There are a number of ways to collect the basic timing information. Most techniques use Oracle trace files or the Oracle v$ views.

Oracle's wait interface is amazing! Multiple database performance tool vendors will tell you Oracle’s wait interface is far superior than DB2 and SQL Server. Below is a graphical example taken from the Stori product visually showing the top five wait events, that is, all the Oracle wait time for all Oracle processes over a specified snapshot interval grouped by the top five events.


The number of wait events and the detailed data available is amazing. As of Oracle Database 12c (12.1) there are just under 1600 wait events. If this seems overwhelming, most DBAs routinely only work with a handful at a time and probably truly understand even less.

Available CPU Consumption Detail

Well... the detail is limited. However, Gathering CPU consumption is very straightforward via trace files and the Oracle Database v$ views. From the v$ views, CPU consumption can be gathered for a session from v$sess_time_model and for an entire instance from v$sys_time_model. I have blogged numerous times about the quality of these CPU consumption values for both server processes and background processes here and here. (While I will not cover this here, the closer you look at Oracle CPU consumption, the more likely you understand even Oracle is making some CPU consumption assumptions.)

The glaring hole in Oracle time based analysis is Oracle does not provide detailed CPU consumption numbers. For an Oracle process the most detailed information we can obtain from Oracle is the total consumption.

If your boss asked you what Oracle is doing “with all that CPU” the best you can do is provide the total amount and then make inferences based on the Oracle wait events, the operating system situation, the application code and the user experience. That’s a lot of analysis information and obviously allows for a very detailed and spot-on analysis.

But still, answering the “What is Oracle doing with all that CPU?” should in my opinion be much easier to answer and more clear. And compared to the wait interface detail, available CPU consumption detail is simply incomplete. I think as DBAs, we can do better.

What I want is the same level of CPU detail as wait event detail. Or at least something so I can reduce the amount of inferencing required when answering the question, "What is Oracle doing with all that CPU?" It would be nice to gather Oracle CPU consumption for any Oracle process grouped by Oracle kernel function. Below is an example of what I would like to see.


Once I know the Oracle function, then I can find out, with some good detail, what the process is doing while consuming the CPU...and answer “the question.”

Is This Really Worth My Time?

Perhaps not. But consider these benefits. As I mentioned previously, I want to better answer the question, "What is Oracle doing with all that CPU?" Another benefit is to confirm the performance story. For example, if there is a parsing issue, I will expect to see some type of shared pool and/or library cache related CPU consumption. Another less obvious benefit is understanding the performance situation when a small amount of wait time is present. Without the wait time, our inferencing capability is limited. Knowing the CPU consumption details can negate the absence of wait time detail limitation. And finally, you may find an Oracle bug.

Next Steps

This is a good place to end this posting. In the next posting, we will explore the tools available to gather Oracle process CPU consumption down to the Oracle kernel function.

But if you can’t wait until the next posting (or you have lots of time), here are some resources.

1. Presentation: Detailing Oracle CPU Consumption. Download here.
2. Tool: Oracle wait time and kernel function CPU monitor. Download the fulltime.sh script.

Thanks for reading,

Craig.

Wednesday, October 9, 2013

Fixed in 12c? Instance CPU Consumption Reporting via v$sysstat

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Is Oracle 12c Instance Level CPU Consumption Reporting Fixed?


I was curious if Oracle had fixed instance level v$sysstat CPU consumption statistic reporting in 12c. Why was a curious? Because in the past, the Oracle Database 6, 7, 8i, 9i, 10g, 11g have all incorrectly reported instance wide CPU consumption when sourced from the v$sysstat view. Also, when Stori (OraPub's performance analysis product) pulls CPU consumption, I've got to be confident in the result!

As reported in my Firefighting Book, my tests have shown the time model view (v$sess_time_model, v$sys_time_model) CPU consumption for both foreground and background processes are spot on. The lesson was to always use v$sys_time_model for CPU consumption.

Has this been fixed in Oracle Database 12c? This is what I'm going to find out!

If you don't want to get into the details, here is the results of my experiments:
All three workloads tested showed Oracle Database 12c reported instance wide CPU consumption incorrectly when sourced from the v$sysstat view statistic, CPU used by this session.
How did I arrive at this conclusion? Read on... 

On a related note, last March I posted the results of an experiment about the correctness of SQL statement CPU times. The experiment compared SQL CPU times from four different sources. The results were stunning.

Experimental Setup


My test is simple. So simple you can copy and paste the below script into your system, as long as it is Oracle Database 10g or higher. You can download the script HERE and it shown below.

$ cat cpu_sys_delta_expr.sql
set tab off
alter system set resource_manager_plan = '';
show parameter resource_manager_plan

drop table op_bogus;
create table op_bogus (sample_no number, ss_sec number, tm_db_cpu_sec number, tm_bg_cpu_sec number);

begin
declare
  chill_var   number := 180;
  max_sample_var  number := 31;
  sample_number_var  number;

  t0_ss_sec_var number;
  t0_tm_db_cpu_sec_var number;
  t0_tm_bg_cpu_sec_var number;
  t1_ss_sec_var number;
  t1_tm_db_cpu_sec_var number;
  t1_tm_bg_cpu_sec_var number;
  delta_ss_sec_var number;
  delta_tm_db_cpu_sec_var number;
  delta_tm_bg_cpu_sec_var number;
begin
  delete from op_bogus;

  for sample_number_var in 1..max_sample_var
  loop

    select ss.value/100 , 
           stmfg.value/1000000 ,
           stmbg.value/1000000 
    into   t0_ss_sec_var, t0_tm_db_cpu_sec_var, t0_tm_bg_cpu_sec_var
    from   v$sysstat ss,
           v$sys_time_model stmbg,
           v$sys_time_model stmfg
    where  stmfg.stat_name = 'DB CPU'
      and  stmbg.stat_name = 'background cpu time'
      and  ss.name = 'CPU used by this session';

    dbms_lock.sleep(chill_var);

    select ss.value/100 , 
           stmfg.value/1000000 ,
           stmbg.value/1000000 
    into   t1_ss_sec_var, t1_tm_db_cpu_sec_var, t1_tm_bg_cpu_sec_var
    from   v$sysstat ss,
           v$sys_time_model stmbg,
           v$sys_time_model stmfg
    where  stmfg.stat_name = 'DB CPU'
      and  stmbg.stat_name = 'background cpu time'
      and  ss.name = 'CPU used by this session';

    delta_ss_sec_var        := t1_ss_sec_var - t0_ss_sec_var ;
    delta_tm_db_cpu_sec_var := t1_tm_db_cpu_sec_var - t0_tm_db_cpu_sec_var ;
    delta_tm_bg_cpu_sec_var := t1_tm_bg_cpu_sec_var - t0_tm_bg_cpu_sec_var ;

    insert into op_bogus values ( sample_number_var, delta_ss_sec_var, 
      delta_tm_db_cpu_sec_var, delta_tm_bg_cpu_sec_var );

    commit;
  end loop;
end;
end;
/

select sample_no, ss_sec-(TM_DB_CPU_SEC+TM_BG_CPU_SEC) diff_sec
from   op_bogus
order by 1
/

col np noprint

prompt sysstat results
select sample_no np, ss_sec||',' from op_bogus order by 1;

prompt time model  results
select sample_no np, (TM_DB_CPU_SEC+TM_BG_CPU_SEC)||',' from op_bogus order by 1;

Unlike the time model's v$sys_time_model view, the v$sysstat view does not categorize CPU consumption by foreground and by background process. It simply places "all" the CPU consumption into the single statistics, CPU used by this session. Therefore, to compare the CPU consumption values I must sum the v$sys_time_model DB CPU and background cpu time statistics when comparing to the v$sysstat CPU used by this session statistic.

Using my free "opload" toolkit (which you can download HERE), I placed three distinct loads (details in next section) on my Oracle Database 12c (12.1.0.1) system. It's a 64 bit six core Linux (2.6.32-300.3.1.el6uek.x86_64) box.

I only recycled the instance before the first workload change. However, between each workload change, I killed all the server/foreground processes and let the system settle down, a verified the settling via the OS command vmstat. For each of the 3 workloads, 31 samples at 180 seconds each were gathered, resulting in 93 samples for a total of 16740 seconds (4.65 hours) of experimental data.

Experimental Results


To summarize again, Oracle Database 12c v$sysstat still incorrectly reports Oracle process CPU consumption. Here are the experimental details, which are explained in the sections below.


I chose to do the experimental analysis in Mathematica. All the raw experimental results (which are in text files), the Mathematica notepads in both native Mathematica and PDF, the charts, and the data collection script have been zipped into a single file that can be downloaded HERE.  There is a LOT more detail in the Mathematic notepads.

The Workloads

Total instance CPU was gathered from three distinct loads were used: DML, Pin S, and Mixed. The DML load consisted of multiple sessions updated and inserting and deleting rows. The Pin S load was the result of multiple sessions opening and closing cursors. There were enough sessions doing this to causes sessions to wait on simply pinning the cursor in shared mode, hence the wait "cursor: pin S". The Mixed workload contains sessions doing queries and some DML. I didn't hammer the system to death, but there was a nice workload mix.

Load: DML

Summary: Oracle Database 12c v$sysstat and v$sys_time_model CPU consumption is reported differently.

Details: The average instance CPU consumption for the 31 samples (180 seconds each) from v$sysstat is 39.12 seconds and the sample set is not normally distributed (p-value = 0.000). The mean for v$sys_time_model is 41.24 and the sample set is normally distributed (p-value = 0.469).

While that's only a 1.85 second difference, it's also a 4.60% difference. Because both sample set distributions are not normally distributed, a location equivalency test was used to determine if the differences in the means is statistically significant. Mathematica choose to do a Kruskal-Vallis test resulting in a p-value of 0.000 meaning the sample set CPU times from v$sysstat and v$sys_time_model are statistically different. Shown together below are their smoothed histograms.
This image above shows the two DML sample sets (blue:v$sysstat, red:v$sys_time_model) smooth histograms. Statistically the two samples are different and numerically their averages are 4.6% different. 

Load: Pin S

Summary: Oracle Database 12c v$sysstat and v$sys_time_model CPU consumption is reported differently.

Details: The average instance CPU consumption for the 31 samples (180 seconds each) from v$sysstat is 1072.90 seconds and the sample set is not normally distributed (p-value = 0.000). The mean for v$sys_time_model is 1073.17 and the sample set is not normally distributed (p-value = 0.000).

That is only a 0.62 second difference, which is also a 0.10% difference. Because both sample set distributions are not normally distributed, a location equivalency test was used to determine if the differences in the means is statistically significant. Mathematica choose to do a Kruskal-Vallis test resulting in a p-value of 0.000 meaning the sample set CPU times from v$sysstat and v$sys_time_model are statistically different. Shown together below are their smoothed histograms.
This image above shows the two Pin S sample sets (blue:v$sysstat, red:v$sys_time_model) smooth histograms. Statistically the two samples are different and numerically their averages are 0.10% different. 

Load: Mixed

Summary: Oracle Database 12c v$sysstat and v$sys_time_model CPU consumption is reported differently.

Details: The average instance CPU consumption for the 31 samples (180 seconds each) from v$sysstat is 477.24 seconds and the sample set is not normally distributed (p-value = 0.026). The mean for v$sys_time_model is 518.52 and the sample set is normally distributed (p-value = 0.530).

That is a shocking 42.43 second difference, which is also a 8.55% difference! Because both sample set distributions are not normally distributed, a location equivalency test was used to determine if the differences in the means is statistically significant. Somewhat interestingly Mathematica chose to do a K-sample Anderson–Darling test resulting in a p-value of 0.000 meaning the sample set CPU times from v$sysstat and v$sys_time_model are statistically different. Shown together below are their smoothed histograms.
This image above shows the two Mixe sample sets (blue:v$sysstat, red:v$sys_time_model) smooth histograms. Statistically the two samples are different and numerically their averages are 8.55% different. 

What Does This Mean?


There are three basic "take aways" from this experiment:
  1. Oracle Database 12c reports instance level CPU times from v$sysstat incorrectly.
  2. Based on my tests from 11g and reported in my Oracle Performance Firefighting book, instance CPU consumption based on v$sys_time_model is correct.
  3. When gathering instance wide CPU consumption, use the time model view v$sys_time_model.

Thanks for reading!

Craig.

Tuesday, September 17, 2013

Fixed in 12c? Session CPU Consumption Statistics

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

What? Fixed? Problem? Huh?

While going through and checking and updating my Oracle Performance Firefighting course for Oracle Database 12c, I once again compared session level CPU consumption from v$session and v$sess_time_model. In the past v$sesstat server process CPU consumption could be significantly less than what is shown in v$sess_time_model. Also, many background processes showed zero CPU consumption in v$sesstat while v$sess_time_model clearly showed CPU consumption. The lesson was to always use v$sess_time_model for CPU consumption.

Though I have not tested this in 12c yet, in pre-12c releases this difference is very apparent when comparing values from v$sysstat and v$sys_time_model. To check this out for yourself, look at one of your Statspack or AWR reports. You will notice the Instance Statistic CPU used by this session usually to be shockingly less then the Time Model statistic DB CPU.

Should I Care?

If your Oracle performance analysis is highly quantitative driven  (which I hope it is) you need accurate CPU consumption values. ... you should care.

This also has been an issue for those developing performance tuning products! OraPub's Stori product (which helps you find, understand and solve Oracle performance problems) is based highly on quantitative analysis and therefore understanding the raw statistics reliability is extremely important.

This posting will take a quick stroll though Oracle foreground and background process CPU consumption reporting, how 11g and 12c reporting compare, and how you can easily check out the situation on your system. Read on...

So Much To Write About 12c!

I have so much to write about. There are a number of very interesting changes in the Oracle Database 12c release. If you have downloaded my recent NOUG presentation containing some of my initial 12c research, you'll know what I mean. What I presented was raw and not thoroughly tested...but I told you that. It will take me months before I get to the bottom of all of changes and post the results here, convert that into conference presentations and merge into my Oracle performance classes!

Note: When you attend my Oracle Performance classes you will get the most recent results of my research. Even material that I have not brought to conclusion I will talk about... but I'll tell you my "confidence level" and why.

Experiment Setup

My test is simple. So simple you can copy and paste the below script into your system, as long as it is Oracle Database 10g or higher. You can download the script HERE and it shown below.
$ cat cpu_compare.sql

rem Session CPU consumption statistic compare
rem Craig Shallahamer, craig@orapub.com

set tab off
col sid         format 99990
col ss_sec      format 99990.00
col program     format a22 trunc
col db_cpu_sec  format 99990.00
col bg_cpu_sec  format 99990.00

select ss.sid, se.program, ss.value/100 ss_sec,
       stmfg.value/1000000 db_cpu, stmbg.value/1000000 bg_cpu
from   v$session se,
       v$sesstat ss,
       v$statname sn,
       v$sess_time_model stmbg,
       v$sess_time_model stmfg
where  se.sid = ss.sid 
  and  se.sid = stmfg.sid
  and  se.sid = stmbg.sid 
  and  stmfg.stat_name = 'DB CPU'
  and  stmbg.stat_name = 'background cpu time'
  and  ss.statistic# = sn.statistic#
  and  sn.name = 'CPU used by this session'
order by 3 desc, 5 desc, 1
/
Interesting, the 12c instance statistic "CPU used by this session" is statistic number 17 in 12c, yet it is 14 in 11g.  NEVER hard code statistic numbers into your scripts!!!

Using my free "opload" toolkit (which you can download HERE), I placed a number of different loads on my Oracle Database 12c (12.1.0.1) system. It's a 64 bit six core Linux (2.6.32-300.3.1.el6uek.x86_64) box.

It is important to know when looking at the experimental results that I only recycled the instance before the first test. However, between each load change, I killed all the server/foreground processes. Why is this important? Because you may notice some of the background processes having a shockingly large percentage of CPU consumption... this is because they have been running much longer than the server processes.

After killing the server processes and restarting a new and different load, I let the system stabilize for a minute and then waited (patiently) at least three minutes (usually 5+) before running the above script.

The CPU Statistics

A little background on what the statistics mean. Each statistic shown represents the CPU consumption since the session has connected. What is unique about the time model view is there is a statistic for both foreground/server processes and also for background processes. Since the time model view was introduced in 10g, I have always seen the values placed in the correct bucket. The raw CPU time model values are in microseconds. If you look at my script above, you may have noticed I divided the raw statistic value by 1000000 to convert its value into seconds. In contract, the v$sesstat values are in hundredths of a second, that is centi-seconds and therefore I divided by 100.

In the early 10g years, I did quite a bit of research on the trustworthiness of the time model CPU consumption values. The results showed time model CPU values matches the operating system. This is documented in my book, Oracle Performance Firefighting. (Which is still a powerful performance book for 12c by the way...)

In each of the loads below, I ran the above cpu_compare.sql script. In every situation, you can easily see there is a near perfect match...unless the CPU time is simply not reported!

Experimental Results

I ran a number of different loads; light buffer get, intense buffer get, free buffer wait load (physical reads, dml, small buffer cache), physical read (small buffer cache with some IO read related wait time), and an intense DML load to stress the redo mechanism. I figure if the CPU consumption results are consistent for all these workloads, the conclusions will be solid.

The order I present the results below is the same order in which I changed the load and collected the data. It is good know because, after the first experiment, you can see the influence of the load on the background process CPU consumption change.

The "load" sessions are the "sqlplus@sixcore (TNS..." sessions, but one of them is the reporting script being run. For example, for the first load (see directly below) there are three sessions. The first two shown are the load sessions and further the other is the session that runs the cpu_compare.sql script.

Load: Light Buffer Get (LIO) Load

I created a buffer cache large enough to cache all the data, started two Oracle sessions, the CPU utilization was around 5% with a CPU run queue between 0 and 2 (6 core box), and there was virtually no IO reported at the OS level (vmstat "wait for IO" statistic was 0).
SQL> @cpu_compare.sql

   SID PROGRAM                   SS_SEC DB_CPU_SEC BG_CPU_SEC
------ ---------------------- --------- ---------- ----------
   403 sqlplus@sixcore (TNS V   7089.64   7089.65      0.00
   167 sqlplus@sixcore (TNS V    385.94    385.96      0.00
   162 oracle@sixcore (MMNL)      30.52      0.00     30.96
   320 oracle@sixcore (CJQ0)      13.20      0.00     13.20
    82 oracle@sixcore (MMON)       4.67      0.00      4.67
   319 oracle@sixcore (SMON)       1.55      0.00      1.55
     7 sqlplus@sixcore (TNS V      1.32      1.33      0.00
    86 oracle@sixcore (W000)       0.94      0.00      0.97
     5 oracle@sixcore (W001)       0.90      0.00      0.93
   324 oracle@sixcore (W002)       0.70      0.00      0.71
   322 oracle@sixcore (W003)       0.61      0.00      0.64
   239 oracle@sixcore (DBRM)       0.52      0.00      1.79
   163 oracle@sixcore (FBDA)       0.51      0.00      0.51
    80 oracle@sixcore (Q002)       0.46      0.00      0.46
   399 oracle@sixcore (RECO)       0.32      0.00      0.32
   318 oracle@sixcore (DIA0)       0.00      0.00     24.29
    81 oracle@sixcore (CKPT)       0.00      0.00     10.03
   398 oracle@sixcore (DBW0)       0.00      0.00      5.87
   238 oracle@sixcore (PSP0)       0.00      0.00      5.80
   159 oracle@sixcore (PMON)       0.00      0.00      5.44
   397 oracle@sixcore (GEN0)       0.00      0.00      1.77
   160 oracle@sixcore (DIAG)       0.00      0.00      1.43
     2 oracle@sixcore (LGWR)       0.00      0.00      1.31
    85 oracle@sixcore (SMCO)       0.00      0.00      1.17
     3 oracle@sixcore (LREG)       0.00      0.00      1.12
     4 oracle@sixcore (TT00)       0.00      0.00      0.95
     1 oracle@sixcore (MMAN)       0.00      0.00      0.91
   400 oracle@sixcore (QM02)       0.00      0.00      0.43
   396 oracle@sixcore (TMON)       0.00      0.00      0.39
   242 oracle@sixcore (AQPC)       0.00      0.00      0.32
   161 oracle@sixcore (LG00)       0.00      0.00      0.00
   166 oracle@sixcore (Q003)       0.00      0.00      0.00
   240 oracle@sixcore (LG01)       0.00      0.00      0.00
   244 oracle@sixcore (VKRM)       0.00      0.00      0.00
   317 oracle@sixcore (VKTM)       0.00      0.00      0.00

35 rows selected.

Load: Significant Buffer Get (LIO) Load, Minimal Contention

I created a buffer cache large enough to cache all the data, started four Oracle sessions, the CPU utilization was around 30% with a CPU run queue between 1 and 4 (6 core box), and there was virtually no IO reported at the OS level (vmstat "wait for IO" statistic was 0)..
SQL> @cpu_compare

   SID PROGRAM                   SS_SEC DB_CPU_SEC BG_CPU_SEC
------ ---------------------- --------- ---------- ----------
     8 sqlplus@sixcore (TNS V   1111.14    1111.16       0.00
    90 sqlplus@sixcore (TNS V   1110.99    1111.00       0.00
   246 sqlplus@sixcore (TNS V    187.39     187.24       0.00
   405 sqlplus@sixcore (TNS V    179.85     179.72       0.00
   162 oracle@sixcore (MMNL)      31.38       0.00      31.86
   320 oracle@sixcore (CJQ0)      13.68       0.00      13.69
    82 oracle@sixcore (MMON)       4.79       0.00       4.79
   319 oracle@sixcore (SMON)       1.59       0.00       1.59
    86 oracle@sixcore (W000)       0.99       0.00       0.99
     5 oracle@sixcore (W001)       0.94       0.00       0.95
   324 oracle@sixcore (W002)       0.72       0.00       0.73
   322 oracle@sixcore (W003)       0.66       0.00       0.66
   239 oracle@sixcore (DBRM)       0.52       0.00       1.86
   163 oracle@sixcore (FBDA)       0.52       0.00       0.52
    80 oracle@sixcore (Q002)       0.46       0.00       0.47
   399 oracle@sixcore (RECO)       0.32       0.00       0.32
   403 sqlplus@sixcore (TNS V      0.03       0.02       0.00
   318 oracle@sixcore (DIA0)       0.00       0.00      25.00
    81 oracle@sixcore (CKPT)       0.00       0.00      10.33
   238 oracle@sixcore (PSP0)       0.00       0.00       5.96
   398 oracle@sixcore (DBW0)       0.00       0.00       5.93
   159 oracle@sixcore (PMON)       0.00       0.00       5.57
   397 oracle@sixcore (GEN0)       0.00       0.00       1.87
   160 oracle@sixcore (DIAG)       0.00       0.00       1.48
     2 oracle@sixcore (LGWR)       0.00       0.00       1.36
    85 oracle@sixcore (SMCO)       0.00       0.00       1.22
     3 oracle@sixcore (LREG)       0.00       0.00       1.15
     4 oracle@sixcore (TT00)       0.00       0.00       0.98
     1 oracle@sixcore (MMAN)       0.00       0.00       0.93
   400 oracle@sixcore (QM02)       0.00       0.00       0.44
   396 oracle@sixcore (TMON)       0.00       0.00       0.40
   242 oracle@sixcore (AQPC)       0.00       0.00       0.33
   161 oracle@sixcore (LG00)       0.00       0.00       0.00
   166 oracle@sixcore (Q003)       0.00       0.00       0.00
   240 oracle@sixcore (LG01)       0.00       0.00       0.00
   244 oracle@sixcore (VKRM)       0.00       0.00       0.00
   317 oracle@sixcore (VKTM)       0.00       0.00       0.00

37 rows selected.

Load: Significant Buffer Get (LIO) Load, Significant Contention

I created a buffer cache large enough to cache all the data, started twelve Oracle sessions, the CPU utilization was between 85% to 96% with a CPU run queue between 4 and 8 (6 core box), and there was virtually no IO reported at the OS level (vmstat "wait for IO" statistic was 0).
SQL> @cpu_compare

   SID PROGRAM                   SS_SEC DB_CPU_SEC BG_CPU_SEC
------ ---------------------- --------- ---------- ----------
    90 sqlplus@sixcore (TNS V   1643.65    1643.66       0.00
     8 sqlplus@sixcore (TNS V   1642.70    1642.72       0.00
   167 sqlplus@sixcore (TNS V    470.41     472.35       0.00
   404 sqlplus@sixcore (TNS V    469.40     471.35       0.00
   168 sqlplus@sixcore (TNS V    466.80     466.82       0.00
     9 sqlplus@sixcore (TNS V    461.92     464.99       0.00
   246 sqlplus@sixcore (TNS V    281.84     281.68       0.00
   405 sqlplus@sixcore (TNS V    275.22     275.23       0.00
    88 sqlplus@sixcore (TNS V     89.71      89.55       0.00
     7 sqlplus@sixcore (TNS V     88.16      88.16       0.00
   245 sqlplus@sixcore (TNS V     86.97      86.97       0.00
   326 sqlplus@sixcore (TNS V     86.58      86.40       0.00
   162 oracle@sixcore (MMNL)      31.83       0.00      32.33
   320 oracle@sixcore (CJQ0)      13.82       0.00      13.82
    82 oracle@sixcore (MMON)       4.85       0.00       4.85
   319 oracle@sixcore (SMON)       1.60       0.00       1.60
    86 oracle@sixcore (W000)       1.00       0.00       1.01
     5 oracle@sixcore (W001)       0.95       0.00       0.96
   324 oracle@sixcore (W002)       0.72       0.00       0.74
   322 oracle@sixcore (W003)       0.66       0.00       0.67
   163 oracle@sixcore (FBDA)       0.53       0.00       0.53
   239 oracle@sixcore (DBRM)       0.52       0.00       1.87
    80 oracle@sixcore (Q002)       0.46       0.00       0.47
   403 sqlplus@sixcore (TNS V      0.36       0.35       0.00
   399 oracle@sixcore (RECO)       0.33       0.00       0.33
   318 oracle@sixcore (DIA0)       0.00       0.00      25.41
    81 oracle@sixcore (CKPT)       0.00       0.00      10.48
   238 oracle@sixcore (PSP0)       0.00       0.00       6.04
   398 oracle@sixcore (DBW0)       0.00       0.00       5.96
   159 oracle@sixcore (PMON)       0.00       0.00       5.64
   397 oracle@sixcore (GEN0)       0.00       0.00       1.90
   160 oracle@sixcore (DIAG)       0.00       0.00       1.51
     2 oracle@sixcore (LGWR)       0.00       0.00       1.40
    85 oracle@sixcore (SMCO)       0.00       0.00       1.25
     3 oracle@sixcore (LREG)       0.00       0.00       1.16
     4 oracle@sixcore (TT00)       0.00       0.00       0.99
     1 oracle@sixcore (MMAN)       0.00       0.00       0.95
   400 oracle@sixcore (QM02)       0.00       0.00       0.45
   396 oracle@sixcore (TMON)       0.00       0.00       0.41
   242 oracle@sixcore (AQPC)       0.00       0.00       0.33
   161 oracle@sixcore (LG00)       0.00       0.00       0.00
   166 oracle@sixcore (Q003)       0.00       0.00       0.00
   240 oracle@sixcore (LG01)       0.00       0.00       0.00
   244 oracle@sixcore (VKRM)       0.00       0.00       0.00
   317 oracle@sixcore (VKTM)       0.00       0.00       0.00

45 rows selected.

Load: Significant Physical Read IO (PIOR) Load, Significant IO Contention

I created a very small buffer cache to force lots of physical IO reads plus some DML activity, started six Oracle sessions, the CPU utilization was between 50% to 75% with a CPU run queue between 2 and 6 (6 core box), and the vmstat "wait for IO" statistic was between 4 and 15.
SQL> @cpu_compare

   SID PROGRAM                   SS_SEC DB_CPU_SEC BG_CPU_SEC
------ ---------------------- --------- ---------- ----------
    88 sqlplus@sixcore (TNS V    781.91     781.91       0.00
    91 sqlplus@sixcore (TNS V    775.41     775.43       0.00
   167 sqlplus@sixcore (TNS V    774.11     773.95       0.00
   404 sqlplus@sixcore (TNS V    774.07     774.63       0.00
    10 sqlplus@sixcore (TNS V    772.18     772.18       0.00
   162 oracle@sixcore (MMNL)      33.40       0.00      33.96
   247 sqlplus@sixcore (TNS V     24.38      24.38       0.00
   407 sqlplus@sixcore (TNS V     22.15      22.10       0.00
     7 sqlplus@sixcore (TNS V     20.74      20.64       0.00
   320 oracle@sixcore (CJQ0)      14.80       0.00      14.81
    82 oracle@sixcore (MMON)       5.07       0.00       5.08
   319 oracle@sixcore (SMON)       1.66       0.00       1.67
     5 oracle@sixcore (W001)       1.31       0.00       1.32
    86 oracle@sixcore (W000)       1.30       0.00       1.31
   322 oracle@sixcore (W003)       0.94       0.00       0.94
   324 oracle@sixcore (W002)       0.90       0.00       0.90
   163 oracle@sixcore (FBDA)       0.55       0.00       0.56
   239 oracle@sixcore (DBRM)       0.52       0.00       1.97
    80 oracle@sixcore (Q002)       0.48       0.00       0.49
   399 oracle@sixcore (RECO)       0.34       0.00       0.34
   245 oracle@sixcore (W004)       0.04       0.00       0.04
   169 sqlplus@sixcore (TNS V      0.01       0.02       0.00
    81 oracle@sixcore (CKPT)       0.00       0.00     212.42
   318 oracle@sixcore (DIA0)       0.00       0.00      26.89
   398 oracle@sixcore (DBW0)       0.00       0.00      15.20
   238 oracle@sixcore (PSP0)       0.00       0.00       6.31
   159 oracle@sixcore (PMON)       0.00       0.00       5.88
     2 oracle@sixcore (LGWR)       0.00       0.00       2.75
   397 oracle@sixcore (GEN0)       0.00       0.00       2.05
   160 oracle@sixcore (DIAG)       0.00       0.00       1.57
    85 oracle@sixcore (SMCO)       0.00       0.00       1.35
     3 oracle@sixcore (LREG)       0.00       0.00       1.22
     4 oracle@sixcore (TT00)       0.00       0.00       1.03
     1 oracle@sixcore (MMAN)       0.00       0.00       1.00
   400 oracle@sixcore (QM02)       0.00       0.00       0.46
   396 oracle@sixcore (TMON)       0.00       0.00       0.44
   242 oracle@sixcore (AQPC)       0.00       0.00       0.35
   161 oracle@sixcore (LG00)       0.00       0.00       0.00
   166 oracle@sixcore (Q003)       0.00       0.00       0.00
   240 oracle@sixcore (LG01)       0.00       0.00       0.00
   244 oracle@sixcore (VKRM)       0.00       0.00       0.00
   317 oracle@sixcore (VKTM)       0.00       0.00       0.00

42 rows selected.

Load: Significant DML Load Resulting In Significant Redo Related Contention

I created a very small buffer cache to encourage physical IO reads plus significant DML activity, started three Oracle sessions, the CPU utilization was around 5% with a CPU run queue between 1 and 2 (6 core box), and the vmstat "wait for IO" statistic was between 15 and 20.
SQL> @cpu_compare

   SID PROGRAM                   SS_SEC DB_CPU_SEC BG_CPU_SEC
------ ---------------------- --------- ---------- ----------
     7 sqlplus@sixcore (TNS V    136.11     136.13       0.00
   162 oracle@sixcore (MMNL)      34.02       0.00      34.44
   320 oracle@sixcore (CJQ0)      15.20       0.00      15.20
   404 sqlplus@sixcore (TNS V      7.91       7.90       0.00
    82 oracle@sixcore (MMON)       5.15       0.00       5.15
    88 sqlplus@sixcore (TNS V      5.07       5.07       0.00
   319 oracle@sixcore (SMON)       1.69       0.00       1.71
    86 oracle@sixcore (W000)       1.32       0.00       1.33
     5 oracle@sixcore (W001)       1.32       0.00       1.32
   322 oracle@sixcore (W003)       0.95       0.00       0.96
   324 oracle@sixcore (W002)       0.91       0.00       0.92
   163 oracle@sixcore (FBDA)       0.57       0.00       0.56
   239 oracle@sixcore (DBRM)       0.52       0.00       2.00
    80 oracle@sixcore (Q002)       0.50       0.00       0.50
   399 oracle@sixcore (RECO)       0.35       0.00       0.36
   245 oracle@sixcore (W004)       0.05       0.00       0.05
    81 oracle@sixcore (CKPT)       0.00       0.00     214.92
   318 oracle@sixcore (DIA0)       0.00       0.00      27.26
   398 oracle@sixcore (DBW0)       0.00       0.00      21.77
   159 oracle@sixcore (PMON)       0.00       0.00       6.46
   238 oracle@sixcore (PSP0)       0.00       0.00       6.40
     2 oracle@sixcore (LGWR)       0.00       0.00       4.27
   397 oracle@sixcore (GEN0)       0.00       0.00       2.05
   160 oracle@sixcore (DIAG)       0.00       0.00       1.59
    85 oracle@sixcore (SMCO)       0.00       0.00       1.37
     3 oracle@sixcore (LREG)       0.00       0.00       1.23
     4 oracle@sixcore (TT00)       0.00       0.00       1.05
     1 oracle@sixcore (MMAN)       0.00       0.00       1.02
   400 oracle@sixcore (QM02)       0.00       0.00       0.47
   396 oracle@sixcore (TMON)       0.00       0.00       0.44
   242 oracle@sixcore (AQPC)       0.00       0.00       0.35
   161 oracle@sixcore (LG00)       0.00       0.00       0.00
   166 oracle@sixcore (Q003)       0.00       0.00       0.00
   167 sqlplus@sixcore (TNS V      0.00       0.02       0.00
   240 oracle@sixcore (LG01)       0.00       0.00       0.00
   244 oracle@sixcore (VKRM)       0.00       0.00       0.00
   317 oracle@sixcore (VKTM)       0.00       0.00       0.00

37 rows selected.

What Does This Mean?

There are a number of "take aways" from this experiment.
  1. Foreground/Server process CPU consumption values match for both v$sesstat and v$sess_time_model. This is not always the case in Oracle Database 10g and 11g.
  2. Foreground/Server and background process CPU consumption continue to be placed in their proper statistical bucket. Foreground/Server process CPU time into DB CPU and the background processes into background cpu time. This was the same in both Oracle Database 10g and 11g.
  3. Some background process's CPU consumption continues to NOT be reported in the instance statistic view v$sesstat. From my perspective this continues to be a bug. But we can deal with this from two perspectives. First, we have the time model view which looks to report correctly for all background processes. Second, background process CPU consumption is usually very small compared to foreground/server process CPU and if it is significant you can easily see this on an operating system monitor, like top.
Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting,  Advanced Oracle Performance Analysis, and my One-Day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com . I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.

Monday, August 26, 2013

Is Oracle 12c Faster Than 11g?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.


I Know That's Not A Fair Question


Is 12c faster than 11g? Yeah, I know that's not a fair question. But I do want to know if 12c can process 1000 buffer gets faster than 11g... don't you?

I want to know how much CPU time is consumed when Oracle processes a single buffer get without virtually any resource competition. Why? Because when a buffer is processed without any wait time and there is no CPU subsystem queuing, the performance situation turns into Oracle kernel code versus Oracle kernel code!

Just to avoid any confusion, I mean the same thing when writing a buffer get, a logical IO, or simply LIO.

Aren't You A Little Curious?


Suppose my experiment shows running the same SQL statement (and execution plan) running on 11g consumes 0.05 ms of CPU time to process a single LIO yet on 12c it only takes 0.03 ms. Then I know Oracle has made some low level kernel code optimization improvements. And if a SQL statement is LIO dependent, without any additional Oracle performance improvements I would expect the SQL statement's elapsed time to be reduced. (related posting HERE)

However, if the situation is reversed and 12c consumes more CPU to process a single buffer in a "no load" experiment, then Oracle had better have some new features to make up for the initial performance "loss." The new features could be a better caching algorithm, new SQL optimization paths, improved memory serialization control, new process architecture options, slick instance parameter settings that I can change, ... or something. But there had better be something.

Why A Buffer Get?


The CPU time to process a single buffer is an important performance metric. On an active system with little CPU subsystem queuing and virtually no Oracle wait time and the same SQL (and execution plan) running, the performance situation boils down to pure Oracle kernel code. Every line of Oracle source code consumes a little CPU. If a new Oracle kernel release requires 100 more lines of kernel code to be executed to process a buffer get, that will be reflected in the CPU required to process a single buffer get.

It's important to understand this is NOT about the SQL, the SQL execution plan, concurrency, the IO subsystem, the operating system, the operating system version, or even the CPU speed. It's about raw Oracle kernel code processing. It's pure and a great way to compare Oracle versions.

Experimental Setup


The experimental setup was very simple. For example, I did not alter the load on the system or force CPU queuing and Oracle wait time. (That's coming though...) I created a buffer cache large enough to buffer all the blocks my SQL would touch and limited the number of concurrent sessions to ensure the CPU utilization was between 30% and 40%. A low CPU utilization reduces the likelihood of CPU queuing and Oracle wait time.

There was no DML, only queries.

I also compared the SQL execution plans for both 11g and 12c to ensure they were identical. They were... once I disabled parallel query in 12c.

I used my OPLOAD TOOL KIT to place a logical IO (i.e., buffer get) load on the system. Before I began collecting data, I recycled the Oracle instance, started the LIO load, let is settle for a few minutes, and using vmstat ensured there was no IO activity and that CPU utilization was always below 40%.

My data collection was very straightforward. I made an initial statistic collection, slept for 3 minutes, made another statistics collection, calculated the deltas, stored the delta data, slept for 5 seconds, and then repeated 90 times.

The three minute sample time reduces the impact of collection overhead, the 5 second chill time reduces the likelihood that sample X collection will impact sample X+1 collection, and finally, the 90 samples gives me a good statistical picture of reality.

I also turned off AWR collection, the result cache, and the resource manager. For 12c I checked to ensure threaded execution was set to false. It turns out that on my system, I was unable to get the instance parameter threaded_execution to TRUE. Even if it was set to TRUE in the instance parameter file, after the instance started it was always FALSE. Perhaps I don't have thread capability on my system?

The data collection script is well commented and can be seen HERE. You can see all 11g data HERE and all the 12c data HERE. Included is each sample is the sample's duration, total buffer gets processed, total non-idle wait time, and total CPU consumption. While in this posting I only discuss the CPU consumption and buffer gets processed, those of you who have taken my Advanced Oracle Performance Analysis will know what to do with the other stats!

The data source for CPU consumption was from the v$sys_time_model view's DB CPU and background cpu time columns. For buffer gets the view was v$sysstat and column session logical reads. It is then a simple matter to get the average CPU per buffer get by dividing the total CPU consumption by total buffer gets. The file containing just the CPU per LIO samples for 11g is HERE and for 12c HERE. These two CPU/LIO files are what I contrast in this experiment.

The six core Linux box details:

[oracle@sixcore ~]$ uname -a
Linux sixcore 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

The 12c details:

[oracle@sixcore ~]$ sqlplus system/manager

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 16 09:53:46 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Tue Aug 13 2013 18:08:24 -07:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

The 11g details:

[oracle@sixcore ~]$ sqlplus system/manager

SQL*Plus: Release 11.2.0.1.0 Production on Fri Aug 16 09:54:25 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now on to the results...

The Experimental Results


To summarize, the 12c average CPU consumed per buffer get was around 11% less compared to 11g. Honestly, I didn't know what to expect. Why? Because with each new release Oracle puts more and more features into their product, so it must be extremely difficult to reduce the CPU consumption to process a little tiny buffer get! Even keeping the CPU consumption per buffer get the same is an amazing accomplishment!

The free statistical package R was used to perform the statistical analysis. The raw R output is HERE and the R script is HERE.

Here are the results:

version     average      median     p-value   samples
         (ms CPU/LIO) (ms CPU/LIO)

11g        0.004725     0.004725     0.7029     90
12c        0.004420     0.004408     0.0000     90

Numerically we can see there is a difference between the 11g and 12c average CPU per LIO. But this does not mean anything! The key to understanding the results is to check if there is a statistically significant difference between 11g and 12c, not if there is a numeric difference. The statistical analysis is in the next section.

Also, this experiment is not about how much CPU time is involved... it's about the comparison. For example, suppose 11c consumes 0.05 ms of CPU time per LIO. This experiment is not about the 0.05 because that depends on a number of other factors I don't care about in this experiment, such as the CPU speed, the operation system, SQL execution plan, etc.

This experiment is about the difference in CPU consumption per LIO between two Oracle versions... period.

Is 12c Really Faster Than 11g?


Just because, in my experiment, the average and median CPU per LIO for 12c is less than in 11g, it does not mean 12c is noticeably faster than 11g. First we need to see if there is a statistical difference and then second consider the real-life impact.

Because the 12c samples are not normally distributed (p-value < 0.05) a sample t-test is not appropriate when comparing the two sample sets (11g and 12c CPU/LIO samples). We need to perform a location test. R contains an appropriate test, called the "wilcox" test.

Note: My appologies (not really) for the simplicity of this: For this experiment, a p-value of 1.0 means the sample sets are the same, a p-value greater than 0.050 means the sample sets are statistically the same, a p-value less than 0.050 means the sample sets are statistically different, and a p-value of 0.0000 means the sample sets are very different.

The result of the wilcox test is a p-value of 0.0000. Since the wilcox p-value is 0.0000, we can say the sample sets are statistically different... which means, in my test, the CPU per LIO for 12c is statistically speaking significantly less than 11g.

Let's take a look at the two samples using a "smoothed" histogram.

The blue histogram on the left is 12c and the red histogram on the right is 11g. Clearly, the two sample sets look very different. And statistical significance tests say they are different. So for my experiment:
Statistically speaking 12c is faster at processing buffer gets than 11g.
But will we and our users notice the difference?

Will Anyone Notice The Difference?


While in my test 12c consumed a statistically significant 11% less CPU per buffer get compared to 11g, will we notice or feel a difference? Let's look at this from two perspectives. First an elapsed time perspective and second a throughput perspective.

Elapsed Time Perspective

Suppose a query touched 100,000 buffers. Based on my no-load test 11g would complete the query in around 472.5 ms, that is 100,000 lio * 0.004725 ms/lio. In 12c the query would complete in around 442.0 ms, that is 10000 lio * 0.004420 ms/lio. If you could feel the difference, I recommend reducing your caffeine intake!

However, as the CPU workload increases the queuing impact would begin sooner in 11g than in 12c. (The next section explains why.) When this occurs, the time to process a single buffer contains both CPU time and queue time (which we would see as non-idle wait time) AND if a lot of buffers are touched in a SQL statement users may begin to feel the impact. But on a no-load system, I can't imagine anyone could notice the difference.

Throughput Perspective

Let's change our perspective from a single SQL elapsed time perspective to a system-wide throughput perspective. Consuming less CPU to process a LIO means more LIOs can be processed before queuing sets in and slows down performance. To give you a visual picture of this, look at the below response time chart.

Note: You can download the spreadsheet for the below chart HERE and the actual raw tool at the OraPub website HERE. I teach how to do this kind of analysis in my Advanced Oracle Performance Analysis class (always instructor led by me).

Above, the blue line represents the 11g system and the orange line represents the 12c system. The orange line crosses the Y-axis below the blue line because it takes less CPU time to process a single piece of work AND when there is no load (a near zero Arrival Rate) there is no queueing... only CPU consumption!

Here's an example shown by the circles on the above chart. If the 11c workload was at 1150 LIO/ms, the 12c workload could increase to around 1350 LIO/ms before the 12c response time equaled the 11g response time. So the 11% decrease in the 12c CPU per LIO translated into a 19% increase in workload before the 12c response time matched 11c response time (response time is the CPU time plus wait time per LIO). What does this mean? In my experiment:
The CPU subsystem can process more work running 12c than 11g before performance begins to degrade.
So we're off to a good start with 12c and I can't wait to do more exploring!

Thanks for reading!

Craig.

Monday, July 1, 2013

Hmm... You Oracle Database Heretic!

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.


Is it OK to be an Oracle Database heretic? 


Oracle Database performance tuning and analysis has come a long way in the last 20 years. First there was the “just add more resources” approach and tuning the blatantly poor SQL. Then there was ratio analysis, followed by wait event analysis, time based analysis, and unit of work time based analysis.  (Consider searching my blog for "unit of work".) In addition to the performance diagnosis and analysis evolution, the Oracle Database as a product has changed, and architectures are more diverse. Yet with all this change, some things are in many ways timeless. They relate to complexity, basic mathematical statistics, efficiency, and doctrinal purity. Over these past few weeks, I posted three different "myths." I have recently posted about complexity (better tools)basic mathematical statistics (average SQL elapsed times) and a cloud vendor's desire for inefficient systems. This post centers on breaking fundamental Oracle Database principles in the name of improved performance.

Myth #4


The fourth myth is It is heresy to consider the impact of deviating from core relational database principles. How did James Tiberius Kirk beat the Kobayashi Maru? He changed the rules. To beat the performance game, Oracle Database sometimes allows performance analysts to change the rules as well. While this is relational heresy to some, the freedom is intoxicating!

Here is an example of what I’m talking about: Would you seriously consider allowing a commit statement to immediately return as successful when the associated redo has not been written to disk?

If you have been working with Oracle databases for many years, there are probably many concepts that we will naturally agree or disagree on … even without really thinking about it. For sure, I am a big fan of simplicity and choosing consistent performance over the hope of amazing yet inconsistent performance. But sometimes there are ways to have your cake and eat it too.

Back to my example about commits: Oracle’s commit write facility provides the ability for a commit to quickly return even though the redo has not been written to an on-line redo log. The have-your-cake aspect is that this can be applied at the instance and session—and even at the statement—level. However, committed data could be lost if a failure occurs.

A few years ago, while introducing this “feature” in my Oracle Performance Firefighting class, a student proudly announced that her company uses the commit write facility. The other students were shocked and honestly appalled. I was smiling like a Cheshire cat! She went on to explain that their application was about social networking. She posed the question, “If you are adding people to your social network and during the save-your-work process there is a failure, what would you do?” Everyone replied, “Just re-enter the information.” She replied, “Exactly! So why would a company want to spend literally millions more on hardware in the unlikely event of a failure when the business solution impact is to simply have the users re-enter a few email addresses?” There was silence and then a big smile on everyone’s face!

My point is that it may be OK to break some fundamental rules as long as the business and its users are not negatively affected. But it takes a brave person to face the immediate “you're a heretic” response. So the next time you’re facing a serious performance issue, think about a few possible ways to break the rules.

Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting,  Advanced Oracle Performance Analysis, and my One-Day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com . I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently orapub.general@comcast .net.




Thursday, June 20, 2013

Hmm... Cloud Vendors Want Efficient Systems

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Of course they do. Right?


Oracle Database performance tuning and analysis has come a long way in the last 20 years. First there was the “just add more resources” approach and tuning the blatantly poor SQL. Then there was ratio analysis, followed by wait event analysis, time based analysis, and unit of work time based analysis. In addition to the performance diagnosis and analysis evolution, the Oracle Database as a product has changed, and architectures are more diverse. Yet with all this change, some things are in many ways timeless. They relate to complexity, basic mathematical statistics, efficiency, and doctrinal purity. Over the next few weeks, I'll post four different "myths." I have recently posted about complexity (better tools) and basic mathematical statistics (average SQL elapsed times). This posting centers on a cloud vendor's desire for efficiency.

Myth #3


The third myth is Infrastructure-as-a-Service (think: cloud) vendors want efficient systems. Fact: A cloud vendor generates revenue when they charge you for CPU cycles, I/O operations, IO space or network activity. Therefore, cloud vendors who charge you for computing resources want you to consume more CPU cycles, perform more I/O operations, consume more disk space, and send/receive more network packets. If that doesn’t convince you, think about the marketing messages promising that all the computing resources you will ever need will be easily available.

The solution to “buy more hardware” has become very, very easy. There are no additional contracts; no one to call to install more hardware; no approvals, purchase orders, or approval chain of command; and on and on. This is a vendor’s dream!

The message here is “Eat more and be happy, for tomorrow we die,” not “Eat healthy and prosper” (or something like that).

But there is good news! Now performance specialists can more easily quantify their work monetarily. Instead of saying the business process dropped from 2 hours to 2 minutes, we can also say we have saved the company $2 million each year!

How can we make such an outlandish statement? It’s because the quantitative performance analyst knows how much less CPU time will be consumed and how many fewer I/O operations and network transfers are required. And the analyst knows how much CPU, I/O, and network resources cost! (If you don’t know how much they cost, then ask.)

This is yet another reason to get good—very good—at quantitative Oracle performance analysis! We need to understand that it’s now very easy to acquire more computing capacity, so the pressure and priority to optimize can be allowed to decrease ... but tomorrow we die. Focus on the ability to quantify our performance work monetarily, and the priorities will more likely align properly.

Now for the commercial: Where can you get the best quantitative Oracle database performance analysis training in the world? OraPub of course and here is the link to get started: www.orapub.com. (Not just Oracle database performance tuning training, but quantitative Oracle database performance analysis training.)

Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting,  Advanced Oracle Performance Analysis, and my One-Day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com . I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently orapub.general@comcast .net.



Tuesday, May 28, 2013

Hmm... Users Experience the Average SQL Elapsed Time

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

You're joking. Right?


Oracle Database performance tuning and analysis has come a long way in the last 20 years. First there was the “just add more resources” approach and tuning the blatantly poor SQL. Then there was ratio analysis, followed by wait event analysis, time based analysis, and unit of work time based analysis. In addition to the performance diagnosis and analysis evolution, the Oracle Database as a product has changed, and architectures are more diverse. Yet with all this change, some things are in many ways timeless. They relate to complexity, basic mathematical statistics, efficiency, and doctrinal purity. Over the next few weeks, I'll post four different "myths." Last week I posted about increased complexity. This posting centers on a user's experience.

Myth #2

The second myth is Users experience the average SQL elapsed time. Concurrency, multiple execution plans, different bind variables, and the current cache situation make the average elapsed time less relevant—and perhaps even misleading. Let me explain.

If I tell someone the average elapsed time for their key SQL statement is 10 seconds, 99.9% of the time they will picture in their minds a bell curve. They will think to themselves, “OK. This means that usually the statement runs for about 10 seconds, sometimes less and sometimes more.” Unless verified, my research clearly shows that there is likely to be a significant difference between the average and the typical elapsed time(s). This means we are missetting expectations and flat-out misleading users. Not the place we want to be!

It is very simple to calculate the average SQL statement elapsed time. Even a Statspack report will show you the total elapsed time and the total number of executions over the report interval. The average elapsed time is simply the total elapsed time divided by the total number of executions. So it’s very enticing to make a quick statement about the elapsed time using the average.

Now suppose this SQL statement has two execution plans: one typically completes in 2 seconds and the other completes in 18 seconds. Now also assume that they both get executed the same number of times (don’t count on this in your environment!). The average elapsed time would then be 10 seconds. Now I picture myself telling my client that I had discovered the key SQL statement and its average elapsed time is 10 seconds. The “2 seconds” group would think I’m making the situation look worse than it is so I can claim an amazing performance improvement. The “12 seconds” group would think I’m simply an idiot. Either way, I lose. So I needed a way to find out the truth.

What I learned through my research is that if the statement is important to the users and I care about not misleading them, I need to collect some data, calculate the statistics, and create a histogram. I even blogged about the problem here and and provided solutions here! My research clearly showed three reliable ways to collect elapsed times: Oracle tracing by SQL_ID, instrumenting application code, and sampling running SQL. I have created a low overhead tool called “SQL Sampler” to sample SQL statement elapsed times. You can download it for free from my website here. The moral of this story is Do not expect users to experience average elapsed times, and if appropriate, provide details about the typical elapsed times.

Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting,  Advanced Oracle Performance Analysis, and my One-Day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com . I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently orapub.general@comcast .net.