More often than not, I come across application managers and developers struggling to figure out what is causing the CPU to spike on their Java application servers. This is not that difficult to nail down, though the exact commands depend on the operating system your JVM is running on. In a nutshell, finding the CPU hog in your JVM takes 3 steps regardless of the OS:
1. Identify the process id (PID) of the JVM (java) process that is hogging the CPU. Usually this is the topmost process if Java is indeed the main CPU consumer.
2. Find which thread in the process identified in step 1 is consuming the most CPU.
3. Convert the thread id from step 2 to hexadecimal and search for it in a thread dump of the Java process, and voilà, you can now see the exact class and method that is causing the CPU to spike.
Let's get into some specifics. If your OS is Sun Solaris or AIX, you can do the following. Run top (or topas on AIX) or prstat and confirm that java is the topmost CPU-hogging process. You would see something like the output below:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
15600 oracle 232M 137M cpu2 0 10 51:19:24 78% java/54
1. The number 54 in java/54 is the number of threads (NLWP) that the process with PID 15600 has, not a thread id. Watch the top or prstat output for a few minutes to see whether a single process or a couple of processes keep appearing at the top. Note down the PID, which here is 15600.
2. Then run prstat -mL -p 15600 and watch which threads within the process stay at the top of the list most of the time. You may see something like the output below:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
15600 oracle 232M 137M cpu2 0 10 51:19:24 78% java/76
15600 oracle 232M 137M cpu2 0 10 10:05:18 5% java/84
Assume for now that the busiest thread is LWP 76, as shown after the slash in the PROCESS/LWPID column.
3. Next, run kill -3 15600. kill -3 sends SIGQUIT, which makes the JVM write a thread dump; it does not kill your java process. Be careful to take the thread dump while you still have some CPU cycles to spare: if you are 100% maxed out on CPU, the kill -3 may not complete.
4. Now open the thread dump produced by kill -3 15600 (usually in your default Java log directory, though this may vary) and search for the hexadecimal equivalent of the LWP id 76 you found in step 2. This is 0x4c, so you should find nid=0x4c in the thread dump. Look at that thread's stack and you will see the exact method or location that is consuming the CPU.
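Steps 2 to 4 can be sketched as a few small shell helpers. This is a minimal sketch, not a definitive tool: the function names lwp_from_prstat_line, lwp_to_nid and find_thread are hypothetical, the dump path is a placeholder, and grep -w -A assumes a grep that supports whole-word matching and context lines (GNU grep does; the stock Solaris grep may not).

```shell
# Hypothetical helper: pull the LWP id out of one prstat data line.
# The last column has the form PROCESS/LWPID, e.g. java/76.
lwp_from_prstat_line() {
  last_field=${1##* }                  # text after the final space, e.g. java/76
  printf '%s\n' "${last_field##*/}"    # text after the slash, e.g. 76
}

# Hypothetical helper: decimal LWP id -> the nid=0x... token used in thread dumps.
lwp_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Hypothetical helper: show the thread dump entry for one LWP.
# $1 = thread dump file, $2 = decimal LWP id, $3 = stack lines to show (default 20).
# -w stops nid=0x4c from also matching longer ids such as nid=0x4c1.
find_thread() {
  grep -w -A "${3:-20}" "$(lwp_to_nid "$2")" "$1"
}

# The prstat line from step 2 above.
line='15600 oracle 232M 137M cpu2 0 10 51:19:24 78% java/76'
lwp=$(lwp_from_prstat_line "$line")
lwp_to_nid "$lwp"    # prints nid=0x4c

# Placeholder path: point this at wherever your JVM wrote the dump.
# find_thread /path/to/thread_dump.log "$lwp"
```

The conversion itself is just printf with the %x format, so a one-off lookup needs nothing more than printf '%x\n' 76.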
There are other, more roundabout methods, such as using pstack, to get to the same information, but the approach above is easier.
If you are running your Java application on Linux, prstat is not available. In this case, first run top to confirm the PID of the Java process, as in step 1 above. Next, while still in top, press Shift+h to show individual threads. Note down the PID shown on this screen for the busiest thread; this is your native thread id, or nid. Convert this nid to its hex equivalent and search for it in the thread dump as outlined in step 4. Alternatively, run ps -eLF and look at the LWP column.
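On Linux, the same lookup can be sketched with ps. This is a rough sketch under assumptions: hottest_lwp is a hypothetical helper name, the -L and -o lwp,pcpu options are those of the procps ps found on most Linux distributions, and ps reports %CPU averaged over the life of each thread, so top with Shift+h remains the better view of a spike that is happening right now.

```shell
# Hypothetical helper: read "LWP %CPU" rows (header line first) on stdin
# and print the LWP id with the highest %CPU.
hottest_lwp() {
  awk 'BEGIN { max = -1 }
       NR > 1 && $2 + 0 > max { max = $2 + 0; lwp = $1 }
       END { if (lwp != "") print lwp }'
}

# Illustrative input only, built from the thread ids used earlier in this post.
sample='LWP %CPU
76 78.0
84 5.0'

lwp=$(printf '%s\n' "$sample" | hottest_lwp)
printf 'nid=0x%x\n' "$lwp"    # prints nid=0x4c

# Against a live JVM, where pid is whatever top reported for java:
# ps -L -o lwp,pcpu -p "$pid" | hottest_lwp
```

Keeping the parsing separate from the ps call makes it easy to feed the helper saved output from a busy period rather than querying a box that is already short on CPU.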
Alf 4/7/2013 11:45:06 pm
Great advice. Please notice that NLWP in the output of prstat is not the LWP id, but the total number of LWPs that the process has. You have to get the LWPid from the output of prstat -p pid -Lm to perform steps 3 to 4
Manoj Appully 4/8/2013 01:40:52 am
Alf, thanks for noticing that. You are absolutely correct in what you pointed out. I have made the corrections. Thanks!
magento website developer 6/16/2013 06:18:27 pm
Firstly I would like to congratulate you that you dared to write on this serious and significant topic related with "Java application". Secondly, your writing has an ease that clearly express how good a writer you are. You know how to involve a reader and increase his curiosity to read more.
Manoj Appully 6/19/2013 11:25:12 am
Thanks for the comments, one of our goals is to take the tech mumbo jumbo to a minimum!