I always get asked this question by my clients: What exactly do I need to monitor in my application, when it comes to performance and availability ?
Thanks (?!) to technologies like bytecode instrumentation and JMX, you can literally have hundreds of thousands of metrics coming from a single Java application. It is easy to get overwhelmed by the amount of information you can capture from a running JVM. The fact of the matter is, you don’t need to monitor all of those hundreds of thousands of metrics.
Monitoring Vs Diagnostics
Don’t get me wrong. In order to monitor availability and performance, you don’t need to monitor every little detail in your application. However, when it comes to diagnostics, where you are trying to uncover a problem in your application, you may indeed need a lot more metrics than you do for monitoring. What I’m discussing here applies only to monitoring, which in most cases is as critical as diagnostics, if not more.
Without further ado, here are the top 5 metrics you must monitor.
1. Process availability
That’s right. It may seem obvious. But I have seen several times that while my customers monitor tons of metrics that are actually useless, they miss the all-important metric to monitor – the availability of the process.
What is it ?
The process is the ‘java’ executable (the JVM) that is running the application. Here is how you find the process on the two most popular operating systems:
Unix:
ps -ef | grep java
Note that you will have to fish through the list of processes to find your application.
Tip: In older Unix flavors (especially the BSD-based ones), you must use ps -auxww to show the full command line of the process.
Windows:
Use the Task Manager to locate the process.
Why do I need to monitor this metric ?
No process = No application = Angry customers.
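If you would rather check availability from code than by eyeballing the output of ps, here is a minimal sketch using the standard ProcessHandle API (Java 9 and later). The class name, the method names and the idea of matching on a "marker" string (for example your main class or jar name) are my own assumptions for illustration, not a prescribed method.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; class and method names are assumptions.
class ProcessAvailabilityCheck {

    // True if a process with this PID exists and is alive (e.g. a PID read from a PID file).
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // True if the command line is known and contains the marker (e.g. the main class name).
    static boolean matches(Optional<String> commandLine, String marker) {
        return commandLine.map(c -> c.contains(marker)).orElse(false);
    }

    // PIDs of all visible processes whose command line contains the marker.
    // Command lines may be unavailable for some processes, depending on the OS and permissions.
    static List<Long> findByMarker(String marker) {
        return ProcessHandle.allProcesses()
                .filter(p -> matches(p.info().commandLine(), marker))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String marker = args.length > 0 ? args[0] : "java";
        List<Long> pids = findByMarker(marker);
        System.out.println(pids.isEmpty()
                ? "DOWN: no process matching '" + marker + "'"
                : "UP: " + pids);
    }
}
```

Run it with your own marker as the first argument. A scheduler or monitoring agent can then alert whenever the result is DOWN.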
2. Java Heap Usage
The heap is the memory your Java application uses for creating and storing objects. To the dismay of many developers, the heap is not unlimited. Running out of heap will stall your application in most situations.
What is it ?
When you start a Java application using the ‘java’ command, one of the command-line options sets the maximum amount of heap memory that the application can use.
java -Xmx1024m <application>
In the above command line, the application gets a maximum of 1GB (which is a lot btw) to use as heap. The heap is where objects created by your application live. When you run out of heap, your application can no longer create new objects and you will see the infamous ‘java.lang.OutOfMemoryError’ in your application’s output or logs.
Why do I need to monitor this metric ?
1. Heap usage that keeps growing, even after garbage collection, usually indicates a memory leak, a serious issue that can cause a production outage.
2. Running into an OutOfMemory error can have very serious effects. For example, it can leave the application hung, force a heap dump (if the JVM is configured to produce one) which will pause the JVM, or, in the worst case, kill the JVM.
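To give an idea of what reading this metric looks like, here is a minimal sketch that samples heap usage from inside the JVM through the standard MemoryMXBean. The class name and the 80% warning threshold are assumptions of mine; pick a threshold that suits your application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sampler for illustration; the 80% threshold is an arbitrary assumption.
class HeapUsageSampler {

    static final double WARN_PERCENT = 80.0;

    // Used heap as a percentage of the maximum; -1 if the maximum is undefined.
    static double usedPercent(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? 100.0 * usedBytes / maxBytes : -1.0;
    }

    // Reads the current heap usage of this JVM through the standard MemoryMXBean.
    static double currentUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return usedPercent(heap.getUsed(), heap.getMax());
    }

    public static void main(String[] args) {
        double pct = currentUsedPercent();
        System.out.printf("Heap used: %.1f%%%s%n", pct, pct >= WARN_PERCENT ? " (WARNING)" : "");
    }
}
```

A single reading tells you little, because usage rises and falls with every GC cycle. Sample it periodically and watch the trend.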
3. Garbage collection Overhead
Yes, it is garbage. And yes, when not cleaned up, it will have a stinking effect on your application. Poor garbage collection performance, whether due to your code or to the application server configuration, can have severe effects on your application.
What is it ?
Garbage collection (GC) is the mechanism that discards unused objects from the heap, reclaiming the space for application use. The JVM initiates garbage collection whenever it needs to. The critical point to note is that garbage collection can be a resource-intensive process and can pause the JVM, which directly results in poor performance.
You must monitor two metrics related to GC.
a. How often GC runs
b. How long each GC cycle takes
Why do I need to monitor this metric ?
1. Frequent GC can mean poor performance, as GC is a resource-intensive process. It can also mean an underlying memory leak that needs to be addressed quickly.
2. Not all applications have the same requirements. Some applications can withstand a longer GC pause time, but some cannot. Monitoring GC will help you determine the behavior and tune the GC accordingly (there are GC policies that you can experiment with to arrive at the right one).
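Both GC metrics (how often and how long) can be read from the standard GarbageCollectorMXBean. The sketch below takes two snapshots and reports the difference. The class name and the one-second interval are assumptions for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sampler for illustration; names and the sampling interval are assumptions.
class GcOverheadSampler {

    // Returns {total collection count, total collection time in ms} summed over all collectors.
    // Both values are cumulative since JVM start; collectors reporting -1 (undefined) are skipped.
    static long[] snapshot() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getCollectionCount() > 0) count += gc.getCollectionCount();
            if (gc.getCollectionTime() > 0) timeMs += gc.getCollectionTime();
        }
        return new long[] {count, timeMs};
    }

    // Share of an interval spent in GC, as a percentage.
    static double overheadPercent(long gcTimeDeltaMs, long intervalMs) {
        return intervalMs > 0 ? 100.0 * gcTimeDeltaMs / intervalMs : 0.0;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 1000; // assumed sampling interval
        long[] before = snapshot();
        Thread.sleep(intervalMs);
        long[] after = snapshot();
        System.out.printf("GC runs: %d, GC overhead: %.2f%%%n",
                after[0] - before[0], overheadPercent(after[1] - before[1], intervalMs));
    }
}
```

Keep in mind that the time reported is the collector's accumulated elapsed time. With concurrent collectors, not all of it is necessarily pause time.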
4. Number of Active Threads
Work enters your application through threads. A thread in the JVM is analogous to a ‘process’ in the operating system. When you have too many threads active, it means your application is working very hard or, worse, there is an underlying issue (perhaps a poorly responding backend that ties up threads). Regardless of the reason, too many threads will slow down your application and in some cases hang not only your application but the entire server.
What is it ?
Your application performs work through ‘threads’. The number of threads you allocate is configurable in most cases, using thread pools. The number of active threads is a direct indicator of the work being performed by the application. In some cases, for reasons that are outside the scope of this document, too many threads become active and can have serious effects on the application.
Why do I need to monitor this metric ?
1. The first thing you will notice when too many threads are active is the CPU utilization on the server. There will most probably be a spike in CPU utilization (though not always). Having too many threads active will severely impact the performance of the application.
2. Having too many threads active leads to heavy context switching on the CPUs, which again will have serious adverse effects on your application.
3. Too many threads active can indicate a poorly responding (or non-responsive) backend.
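Thread counts are exposed by the standard ThreadMXBean. Here is a minimal sketch that reports the number of live threads and breaks them down by state. The class and method names are assumptions for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sampler for illustration; class and method names are assumptions.
class ThreadCountSampler {

    // Current number of live threads in this JVM (daemon and non-daemon).
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Number of live threads per state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) { // a thread may have exited between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("Live threads: " + liveThreads());
        System.out.println("Peak threads: " + ManagementFactory.getThreadMXBean().getPeakThreadCount());
        System.out.println("By state:     " + countByState());
    }
}
```

A growing number of BLOCKED or WAITING threads is often the first hint of the poorly responding backend mentioned above.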
5. Response Time
The obvious one. You must monitor the response time of your application so that you know how well it performs.
What is it ?
The time taken by your application to process requests. This is what the end user sees as the response time.
Why do I need to monitor this metric ?
Let’s face it. Nobody likes a slow application. An increase in response time can have several underlying causes: perhaps you are running out of hardware resources, perhaps you are running out of backend database connections in your connection pool, perhaps garbage collection is taking up all the time, perhaps a remote web service you rely on is responding poorly, and so on. So it is important to monitor response time and act right away when you sense an increase.
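If you want a feel for what measuring response time involves, here is a minimal sketch that times a piece of work and keeps the samples so you can report an average and a percentile. The class name, the method names and the nearest-rank percentile are my own choices for illustration. A real application would typically do the timing in a servlet filter or interceptor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical recorder for illustration; names and the percentile method are assumptions.
class ResponseTimeRecorder {

    private final List<Long> samplesMs = new ArrayList<>();

    // Runs the work, records how long it took (even if it throws) and returns its result.
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            record((System.nanoTime() - start) / 1_000_000);
        }
    }

    synchronized void record(long elapsedMs) {
        samplesMs.add(elapsedMs);
    }

    // Average of all recorded samples; 0 if nothing has been recorded.
    synchronized double averageMs() {
        return samplesMs.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    // Nearest-rank percentile (e.g. p = 95); 0 if nothing has been recorded.
    synchronized long percentileMs(double p) {
        if (samplesMs.isEmpty()) return 0;
        List<Long> sorted = new ArrayList<>(samplesMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.min(Math.max(rank, 1), sorted.size()) - 1);
    }

    public static void main(String[] args) {
        ResponseTimeRecorder recorder = new ResponseTimeRecorder();
        for (int i = 0; i < 5; i++) {
            recorder.time(() -> Math.sqrt(12345.678)); // stand-in for real request handling
        }
        System.out.printf("avg=%.1f ms, p95=%d ms%n", recorder.averageMs(), recorder.percentileMs(95));
    }
}
```

Watching a high percentile alongside the average is useful because a handful of very slow requests can hide behind a healthy-looking average.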
There you have it. 5 important metrics you should monitor in your Java application.
Now this is great. But HOW DO I MONITOR THESE METRICS ? Well, there are tons of ways, but a reliable, efficient way would be to invest in a commercial APM tool. Read my article on ‘6 ways APM tools can make you sleep better at night’ and go from there.
Good Luck
Hi,
I like your blogs. They are very knowledgeable. In this blog you mention 5 metrics but I can count only 4 (CPU utilization, Active Thread, Context Switching). Could you please tell me which one I am missing?
Thanks
Thanks Naik.
1. Process availability
2. Java Heap Usage
3. Garbage collection Overhead
4. Number of Active Threads
5. Response Time
Good luck.
Very informative and easy to understand the way the content is structured and presented
Good one karunsubramanian.