In my previous blog about interpreting AWR reports in Oracle, I had only skimmed the surface, enough to give any one a quick idea of what is happening inside their Oracle database. While it is hard even for the truly tested experienced ace DBAs to really understand all the mumbo-jumbo in an AWR report, it is enough if one can concentrate on the real symptoms and not get drowned in the noise which is the AWR report. So here I will briefly take up the most common Wait Events seen in many Oracle databases - I/O waits. A database and I/O are like twin cousins - inseparable! The three common Wait Events that you will in the Top 5 Wait Events section in the AWR report are:- 1. db file scattered read 2. db file sequential read 3. log file sync Let's take a deeper look at each without getting too technical. 1. db file scattered read - Even though the name is confusing, this event essentially means you have full table scans going on in your database and that is not a good thing. The only "scattered" part about doing a full table scan is the aspect of reading all rows in the table that may be "scattered" all over the disk. While the AWR does not directly tell you which SQL is doing the full table scan, you can use some of the techniques in the previous blog I had written. One has to worry about full table scans when this event is in the Top 5 Wait Events and more importantly is taking a large part of the %DB Time column. So what do you do? Many things but the most easiest would be to take a look at the SQL that is doing the full table scan, change it if you can, add an index, collect statistics or add a SQL Profile or reduce the amount of rows in a table by archiving them. 2. db file sequential read - This is more common than full table scans and is trickier to fix. This Wait Event essentially means the database is doing single block reads of an index or a table that is being accessed by an index. One can ask what is wrong with an index access and that is why this guy is not as innocent as he seems.So for all practical purposes index access is usually a good thing but many times people create so many indexes on a table that many indexes have very poor selectivity. What I mean by poor selectivity is that the if you had to reach a particular entry in a poorly selective index you will have to read a whole bunch of data in the index before you reach what you want. This means a ton of I/O by the database before it can find the entry it wants. Imagine trying to find a word in a book whose index is more than 5000 pages. Same thing here, wrong index created, bad statistics influencing wrong index selection, and bad consequences. So what do you do? Same thing, find the SQLs that are doing these, change them if you can so that you can put extra conditions in the WHERE clause, drop poorly selective indexes and create better ones by choosing the right columns to index on, use partitioning if table is very large and make sure you are collecting statistics. Rebuilding indexes may provide only temporary relief, in fact it may be better to recreate the table by ordering data as the columns of the index. 3. log file sync - This wait is also common and is caused by excessive time taken by the log writer process in writing committed transactions to the redo log. This Wait Event has many sub-parts and one of them is the 'log file parallel write' which is just the pure I/O time taken to write redo records from the log buffer in memory to the disk where redo files are kept. You can search for 'log file parallel write' in an AWR report and compare the times per second or transaction for this and the 'log file sync' event, if they are similar and if they exhibit several milli seconds (say more than 20ms) per write then you may have a poorly performing I/O subsystem and may want your storage guru to take a look. The other cause that keeps log writers busy are applications that do a lot of single row commits, what I mean by this is that the application inserts/updates/deletes a single row and issues a commit, this is not a problem in most cases but if all users end up doing this on a table or set of few tables millions of times then these commits will keep coming faster than the log writer can keep up with. So if this is the case then you may want to check with your application architect.
Thanks for this article - very useful, along with the first part. Cheers.
Comments