If you have worked in IT long enough then it is hard to miss the acronym "AWR". AWR is short for Automatic Workload Repository report and is probably the first word out of a DBA's mouth at the mention of performance problems in your application. If you are like most people then your head would start spinning when you perchance happen to glance the report. You are not alone, most DBAs don't understand 90% of what is in the report and how to make sense of it. Most times the DBAs tend to look at such reports with a preconceived bias since they are looking for patterns most are familiar with like full table scans, too much CPU use or too much disk I/O, etc and then lean the findings accordingly. So what is a layman with reasonable intelligence to do when you see the report and how does one validate what the DBA is saying.
So here goes....Before we do anything a little history... AWR is the Pièce de résistance of what is called as Oracle Wait Interface (OWI), one of the features that sets Oracle apart from the other databases. So while evolving the Oracle engine over the many years, Oracle realized the importance of measuring every touch point of a SQL as it progress through the Oracle RDBMS engine. The OWI was the result, this was initially very cumbersome to read, analyze and diagnose. As releases of Oracle have come and gone they have fine tuned the OWI such that today it produces a neat report (default every hour) recording all activities in the database and capturing every wait event the SQLs were subjected to. No special switch or extra software is required since Oracle 10g onwards the AWR is ready to go out of the box. The DBA can control the frequency of the report generation based on need and you also control the retention period of the records so that you can go back in time if needed.
Ok coming back to reading the AWR, the first thing you want to make sure is - If the issue is really caused by the DB? To do this the best thing to do is to glance at the DB Time which is reported at the very start of the report.
At the very bottom I have culled out 4 tables from the numerous that you would encounter in an AWR report to illustrate how you can make a fairly good inference based on glancing a few key data points instead of getting intimidated by the sea of data in an AWR report. We will refer to this data below for our analysis.
Let's start with the first table. Looking at the 180 mins of elapsed time (meaning this report is for 3 hrs), the application is roughly spending 320 mins on the DB. What this implies is that roughly 320/180 = 1.8 DB Seconds is being spent for every elapsed second. Confusing? In a DB there are thousands of transactions at any given second and servers have more than one CPU so multiple transactions can run in parallel. For example if we ran 2 transactions a second, the DB Time would be 2 seconds, 10 transactions in a second implies 10 DB seconds and so on. Which is why you see DB Time being more than the wall clock, in our case 320 DB Minutes in 180 wall clock minutes. DB Time is the total time spent by sessions in the DB doing active work which include time spent on CPU, I/O and other waits. Consequently the higher the DB Time for a given hour for example, higher the load on the DB. So for a 60 min period if you saw the DB Time as 600 mins, then that implies a busier DB because you are executing more transaction concurrently in a given minute.
Now let's move on to the second table. Here if you look at DB time spent in a second, you will see that it is 1.8 DB seconds, meaning on avg, there are about 1.8 sessions active in the database doing real work. For example in our case DB Time of 320, divided by wall clock of 180 mins give you roughly 1.8 sessions active sessions per second. The higher the number of active sessions in a given second the more the load on the DB.
To cross check, search for "user commits" in the report or Table 3 below.
So in the 3 hour period we had about 12000 transactions, this times the 1.6 DB seconds per transaction (column 3 of Table 2) will give you back the 320 DB mins spent by the DB executing SQLs. Obviously you want the DB Time spent per transaction to be as small as possible.
Now, we have to see if we can break down this DB Time into its components, how is this time distributed, meaning how many seconds did the SQL spend executing on the CPU, doing I/O or waiting for a lock (enqueues, latches etc are too complicated for now, just imagine them all as being similar to locks primarily use to control concurrency to common objects like tables, rows, etc). I am also excluding interconnect latency, network etc from our discussion for now.
First search for "Top 5 Timed Foreground Events" in the report or look at Table 4 below. Now, look at the % DB Time column, pay attention to those that have a higher value for this column since these are the prime drivers of DB Time. In the above example you can see that almost 40+28=68% of DB Time is consumed by the 2 top events. Both of these are I/O related. So now at least you know where to look, are your SQLs returning too many rows, is the I/O response pretty bad on the server, is DB not sized to cache enough result sets, etc.
The 3rd row in Table 4 indicates 19% of DB Time is spent on row locks, meaning you have sessions wanting to change same set of rows but cannot do so all at once until the holder of the lock doing the change finishes. This indicates a code problem, check for unnecessary access to same rows or single row table to implement serialization, usually applications at the start of transaction update a master table or something and then go do a bunch of stuff before coming back and committing or rolling back the update on the master table. In apps that have a lot of sessions this will cause a backup of waiting sessions because the locks are not released fast enough, eventually your apps server will run out of connection threads and the whole thing stops.
Now, the 4th row in Table 4, DB CPU is critical, in CPU bound databases you will see this as the top event. There is a very easy way to see how much CPU is used by the DB. DB CPU was about 2335s or 39 mins for the whole 3 hours. So 39 mins out of a total DB Time of 320 mins is only 12% and now we can conclude that in our example above most of our DB Time is spent doing I/O.
Another interesting tidbit is this, look for "Host CPU" in the report to look for the number of CPUs on the DB server:
Host CPU (CPUs: 6 Cores: 3 Sockets: )
So we have 6 cores, meaning in a 60 min hour we have 60 X 6 = 360 CPU mins, so for 3 hours we have 1080 CPU mins and we used only 39 CPU mins, meaning only 39/1080 = 3.6% of all available CPU on the box! Tiny indeed! If you had a CPU bound DB, you would probably see DB CPU more like 900 - 1000 mins, and that is not a good sign. Usually indicates contention for latches or you have SQLs doing too many logical I/Os or lot of parsing due to the application not using bind variables, etc. More on these later but at the very least I hope this write-up gives you the ability to quickly look at a few data points and infer what is ailing performance of your database.
Steve 2/9/2013 03:07:12 am
This explanation is so cool! Thanks, at least now I know what to look for before my team starts the blame game!
Valentim Negrellos 5/27/2013 03:54:02 am
Thanks for sharing this very good post about AWR report. I think this post is a very good point to start understand the AWR report. Did you write more about this subject?
Manoj 5/27/2013 04:16:29 am
Hi Valentim,
Thanks for your comments. The AWR analysis can become so voluminous to write about that I have deliberately tried to keep it very simple. I will add more to this as I go on.
Manoj
Great Help for those who wants to start analyzing the reports
Alex 7/23/2013 11:53:05 pm
Indeed a simple, quick and comprehensive post for locating the key points of a AWR in a glance!
Yogesh 7/25/2013 10:55:09 pm
Where are the tables that is been referenced here? I can't seem to find it.
Manoj 7/25/2013 10:59:51 pm
Hi Yogesh, it is write below the blog. If you cannot see it may be because of the browser. I have had the same issues before. I use Firefox and I can see it.
Yogesh 7/25/2013 11:37:33 pm
Thanks Manoj! Good work...looking forward for the next bits...
kuntal 8/19/2013 06:49:12 pm
Thanks Manoj. This is very helpful.
You made it very simple to understand first 2 tables in AWR on which remaining analysis of report depends.Thanks for sharing the knowledge!! 8/23/2013 04:22:00 pm
Swapnil 8/23/2013 04:23:31 pm
You made it very simple to understand first 2 tables in AWR on which remaining analysis of report depends.Thanks for sharing the knowledge!!
Aijaz9/12/2013 02:10:56 am
Awsome explanation about AWR report....Thank you very much...
Siva 9/18/2013 03:25:57 pm
Thanks Manoj for your wonderful explanation on awr report.
Arvind 11/13/2013 05:45:00 pm
Thanks Manoj for wonderful article. And must say that it gives much confidence to dig AWR report at much depper level.
vijaya 12/2/2013 03:17:02 pm
very nice thank you for this post
monto 12/20/2013 01:11:10 pm
Awesome explaination.
Sabyasanchi 1/14/2014 01:53:04 pm
Hi I am not able to see the table in any of the browser, is there any way i can see those table as they will be great help in understanding of your explanation.
Manoj 1/14/2014 10:17:00 pm
Sabyasanchi, Can you see if you view the tables using another browser like Firefox etc, sometimes the older versions may have issues rendering the pages. Let me know.
Ram Limbu 2/9/2014 05:13:51 pm
Great explanation with examples.I need explanations for more categories.
Ravi 8/5/2014 12:33:35 pm
Hi,
This is a very good explanation of the AWR reports, but unfortunately i was not able to see the tables at the end of the blog like a couple of the readers. I have used the firefox, chrome, and IE but no luck. Is it possible to send the article by mail. Or even if you can tell us the heading of the table we can narrow it and correlate the article to those tables from our own AWR reports.
Thanks once again for your efforts and yeah u made a layman a little more intelligent.
Manoj 8/25/2014 12:09:37 pm
You are right, the tables are indeed missing. I will try to put those back.
Thanks, Manoj
Roger 9/24/2014 10:51:51 am
Any idea when these missing tables will be put back?
(or are you able to email them to me?)
As the article makes reference to the tables, trying to read it with the context.
Thanks.
Vinod Kumar 8/26/2014 11:50:58 pm
A great article for AWR analyses .
Manoj 10/14/2014 09:55:08 am
I have updated the tables, I hope you all can see it.
Gokul 3/4/2015 08:03:39 am
Very simple and no nonsense explanation,still too early to grasp everything but at least the overview i got was excellent.
Gboly 11/11/2015 12:00:01 pm
Where are the tables??
Manoj Appully 11/15/2015 10:43:22 am
The table is at the very end. I am not sure why some browsers are not showing.
Table 1
A SAMPLE SNIPPET OF AN ORACLE AWR REPORT
Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 21688 25-Jan-13 01:00:00 135 41.5 End Snap: 21706 25-Jan-13 04:00:00 136 41.2 Elapsed: 180.00 (mins) DB Time: 319.96 (mins)
Table 2
Per Second Per Transaction Per Exec Per Call
DB Time(s): 1.8 1.6 0.08 0.01 DB CPU(s): 0.2 0.2 0.01 0.00
Table 3
Statistic Total per Second per Trans user commits 12,076 1.12 1.00
Table 4
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
db file sequential read 13,930,462 7,574 1 39.45 User I/O direct path read 221,978 5,336 24 27.80 User I/O enq: TX - row lock contention 64,519 3,627 56 18.89 Application DB CPU 2,335 12.16 SQL*Net more data to client 4,989,199 191 0 0.99 Network
Samuel 3/1/2016 01:26:29 am
Nice write up...keep up the good work.
Timothy 5/14/2016 09:14:00 am
Awesome explanation where simplification lies in each and every aspect of article.
JItendra sharma 7/1/2016 11:03:23 pm
Excellent explanation ...
MN 10/13/2017 02:20:23 pm
Very Good explanation