Manoj Appully

A Layman's guide to identifying expensive SQL in Oracle

Updated: Jul 29, 2019

More often than not you may find yourself wondering what is causing all the latency in the application. You quickly decide, without any evidence whatsoever, that since nothing has changed (at least that is what you believe!) it must all be the DBA and his DB's handiwork. So you march down the aisle to the DBA's cube and demand an explanation. And more often than not you find yourself facing a surprised DBA who steadfastly stands his ground, insisting nothing is wrong with the database. So what do you do? Well, follow me; this is a continuation of the previous blog about trying to understand AWR.

If you suspect the database as a prime contributor to latency, you are not in the minority, especially if you are an application developer. That said, it does not mean you are correct in pointing the finger at the DB. The DB is only the messenger in most cases; all it can do is react to the way you write your application logic and SQL, and to the load you put on it. You may be surprised to learn that most often a poorly written application SQL is to blame. So before you march to the DBA to demand an explanation, you may want to do your homework.

Here is a simple method to find the offending SQL, without looking too geeky, using just two Oracle-provided scripts: awrrpt.sql and awrsqrpt.sql.

First, start with a clock: identify the time periods when you noticed latency or someone complained about slowness. Then ask the DBA to provide you the AWR report for that time period. It is always better to ask for AWRs taken over 30-minute or 1-hour intervals rather than one gigantic AWR that covers, say, 3 hours. (You can generate the AWR report yourself on Unix/Linux if you have access: run SQL*Plus, log in as a user who has access to the AWR stored procedures, and from the SQL*Plus command line create the report by calling the script $ORACLE_HOME/rdbms/admin/awrrpt.sql. All AWR scripts are stored in $ORACLE_HOME/rdbms/admin.)
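A minimal sketch of generating the report yourself, assuming you can connect as a user granted access to the AWR packages (the connect identifier below is hypothetical, and the exact prompts vary slightly by Oracle version):

-- From the OS prompt, start SQL*Plus, e.g.: sqlplus myuser@orcl
-- Then, at the SQL*Plus prompt ("?" expands to $ORACLE_HOME):
SQL> @?/rdbms/admin/awrrpt.sql
-- The script prompts for the report type (html or text), how many days of
-- snapshots to list, the begin and end snapshot IDs that bracket your slow
-- window, and a report file name.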

Once you get the AWR report, pay attention to DB Time and DB CPU. If DB Time is inordinately high compared to Elapsed time, then you know something is going on because the DB is busy. My previous blog about AWR should help here.

I have culled some important sections from AWR reports and shown them below. Jump to the SQL ordered by Elapsed Time section (Table 1 below) in the AWR report; this lists the SQLs that are taking a long time to complete. Pay attention to the Executions and Elapsed Time Per Exec columns. You want to see if there are SQLs that are executed a large number of times and whose Elapsed Time Per Exec is high. A single SQL that runs a couple of times in an hour is not what you want to concentrate on here, though it may be a candidate for tuning in general. I am more interested in finding SQLs that are executed many times and consume a lot of resources like CPU or I/O; in essence, concentrate on the longest pole in the tent. If a SQL is executed several hundred times in a 30- or 60-minute AWR and its Elapsed Time Per Exec is close to a second or several seconds, then you have your culprit SQL. In an OLTP system, well-behaved, frequently executed SQLs should usually complete in less than 10-20 ms.
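If you prefer to query rather than read the report, the same "SQL ordered by Elapsed Time" idea can be pulled straight from the AWR repository views. This is only a sketch: it assumes you have SELECT access on DBA_HIST_SQLSTAT, and :begin_snap / :end_snap stand in for the snapshot IDs that bracket your slow window.

-- Top SQL by total elapsed time between two AWR snapshots (the *_delta
-- columns are in microseconds, hence the divisions).
SELECT sql_id, executions, elapsed_sec, elapsed_ms_per_exec
FROM (
  SELECT sql_id,
         SUM(executions_delta)                          AS executions,
         ROUND(SUM(elapsed_time_delta) / 1e6, 1)        AS elapsed_sec,
         ROUND(SUM(elapsed_time_delta) / 1000 /
               NULLIF(SUM(executions_delta), 0), 1)     AS elapsed_ms_per_exec
  FROM   dba_hist_sqlstat
  WHERE  snap_id BETWEEN :begin_snap AND :end_snap
  GROUP  BY sql_id
  ORDER  BY elapsed_sec DESC
)
WHERE ROWNUM <= 10;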

In our case, from Table 1 we see that the SQL with SQL ID cf6jd7kan5y25 has run around 18K times with an elapsed time of nearly 20s. Ouch! This AWR was taken over a 30-min period, and if you look at the % Total column, this sucker accounts for almost 99% of the DB activity.

So, to find out what is wrong with this SQL, we have to look deeper at its plan and statistics. There are many methods, but one easy way is to run another SQL script found under $ORACLE_HOME/rdbms/admin (on UNIX/Linux) called awrsqrpt.sql. This script is a gem. You run it just like you ran awrrpt.sql to generate the AWR report; the only difference is that it also asks you for a SQL ID, which in our case is cf6jd7kan5y25. awrsqrpt.sql is useful because it mines the AWR repository in the database and can show you how a single SQL behaved at various times. It gives you three useful things: the number of plans the SQL has, the runtime statistics of the SQL, and the explain plan for each plan the SQL has.
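Running it looks much the same as before; a sketch (prompts may differ slightly by version):

SQL> @?/rdbms/admin/awrsqrpt.sql
-- Same prompts as awrrpt.sql (report type, snapshot range, file name), plus
-- one extra: the SQL ID to report on -- cf6jd7kan5y25 in our example.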

Now look at Table 2; it is just a summary of the SQL, with the full SQL text and total elapsed time for the period. Remember that awrsqrpt.sql, like awrrpt.sql, asks you for the time period you are interested in.

Table 3 is more interesting: it shows that this SQL has 6 different SQL execution plans (identified by Plan Hash Values, or PHVs). Don't be surprised. A SQL can have multiple execution plans based on many conditions, such as the values passed in binds (bind peeking... more on this later), the type of cursor sharing, permissions, etc. Unfortunately, a bad SQL execution plan can also be in this list. In our case, from Table 2 we see that the SQL executed only one plan, 355709903, during the 30-min period we took this from. In many cases, if you extend your snapshot window you can see that the SQL has executed different plans at various times. What you need to be mindful of is figuring out the good execution plan out of the 6 we see, and checking whether by some chance the SQL flipped to a bad plan and is causing us grief.
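If you want to see the plan history without widening the report window, you can also ask the AWR repository directly which plan ran in which snapshot. A sketch, assuming SELECT access on DBA_HIST_SQLSTAT and DBA_HIST_SNAPSHOT (the SQL ID is the one from our example):

-- Which plan hash value this SQL used in each AWR snapshot.
SELECT sn.begin_interval_time,
       st.plan_hash_value,
       st.executions_delta AS executions
FROM   dba_hist_sqlstat  st
JOIN   dba_hist_snapshot sn
       ON  sn.snap_id         = st.snap_id
       AND sn.dbid            = st.dbid
       AND sn.instance_number = st.instance_number
WHERE  st.sql_id = 'cf6jd7kan5y25'
ORDER  BY sn.begin_interval_time, st.plan_hash_value;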

Table 4 gives you the SQL statistics, like Elapsed Time, CPU, Buffer Gets, etc. I am mainly interested in the Per Execution column. Here we see that Elapsed Time per execution is way too high while CPU is relatively low (though still high), leading me to believe there is some I/O contention or some other concurrency issue like latches, buffer busy waits, etc. If Elapsed Time per Exec is close to CPU Time per Exec, then we know the SQL is CPU bound and doing mostly buffer reads from memory as opposed to I/O. Look also at Buffer Gets per Exec; even though it looks low, given the number of executions these add up. Our goal is to choose a plan for this SQL that has the lowest buffer gets per execution and the lowest elapsed and CPU time.
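To compare the recorded plans on exactly these per-execution numbers, you can aggregate the AWR statistics per plan hash value. Again a sketch against DBA_HIST_SQLSTAT (the time columns are in microseconds):

-- Per-plan cost of the SQL across the AWR history: the plan with the lowest
-- buffer gets and elapsed/CPU time per execution is the keeper.
SELECT plan_hash_value,
       SUM(executions_delta)                        AS executions,
       ROUND(SUM(elapsed_time_delta) / 1000 /
             NULLIF(SUM(executions_delta), 0), 1)   AS elapsed_ms_per_exec,
       ROUND(SUM(cpu_time_delta) / 1000 /
             NULLIF(SUM(executions_delta), 0), 1)   AS cpu_ms_per_exec,
       ROUND(SUM(buffer_gets_delta) /
             NULLIF(SUM(executions_delta), 0))      AS buffer_gets_per_exec
FROM   dba_hist_sqlstat
WHERE  sql_id = 'cf6jd7kan5y25'
GROUP  BY plan_hash_value
ORDER  BY elapsed_ms_per_exec;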

Table 5 gives you the actual SQL plan for this particular Plan Hash Value, 355709903.

The report will similarly list the SQL statistics and execution plan for every one of the 6 plans we saw in Table 3. I have listed another set for Plan Hash Value 157698868, which is the second plan in Table 3.
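If you only need the plan for one specific plan hash value, DBMS_XPLAN.DISPLAY_AWR can print the AWR-captured plan without generating the whole report. A sketch for the plan shown in Table 5:

-- Print the AWR-captured plan for one SQL ID and plan hash value.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_AWR('cf6jd7kan5y25', 355709903, NULL, 'TYPICAL'));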

Once you are armed with this kind of data, you can confidently walk over to the DBA and have an intelligent conversation. Your goal should be to make sure the SQL is picking the right plan. If the SQL has only one plan, look at the explain plan: is it using the right indexes, the right join order, etc.? See if you can change the SQL. If there are multiple plans, figure out which plan is best for this SQL; usually you will know by looking at the explain plans in the report, or at the plan statistics, which may show very different buffer gets per exec for each plan if the SQL has indeed executed multiple plans during the time period you chose. In the example I showed you, the SQL was indeed using the wrong plan, and we could force it to use the right plan by applying a SQL Profile. More on SQL Profiles in the next blog. Once we applied the SQL Profile, the execution time came down to less than 10ms and buffer gets per execution went down from 2400 to 45!
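As a teaser for that next post, one common route to a SQL Profile is via the SQL Tuning Advisor: run a tuning task for the SQL ID and accept the profile it recommends. The sketch below assumes a tuning task has already been created and executed for this statement; the task and profile names are hypothetical.

-- Accept the SQL Profile recommended by an already-executed tuning task.
BEGIN
  DBMS_SQLTUNE.ACCEPT_SQL_PROFILE(
    task_name   => 'tune_cf6jd7kan5y25',     -- hypothetical tuning task name
    name        => 'profile_cf6jd7kan5y25',  -- hypothetical profile name
    force_match => TRUE);                    -- also match statements differing only in literals
END;
/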









