6 ways APM tools can make you sleep better at night

It was 2:35 AM on a Tuesday when the phone rang. I was sound asleep. After some fumbling, I found the cell phone next to my pillow and strained my eyes to see who was calling. The screen read ‘Central Monitoring and Operations’. Without getting up, I held the phone to my ear and said ‘Hello’ in a cold, sleepy tone.

“Oh, hi. Sorry to bother you at this hour. Are you the on-call person for Application Support?”

I knew it was not going to be good. “Yes”, I said.

“We just got a call from XYZ financial. All their transactions coming into our systems have been failing since last night.”

“Hmm... Is this something that can wait till morning?” I knew it could not, but I asked anyway.

“I don’t think so. Their VP was on the line and he sounded pretty upset. He mentioned that this outage has serious financial implications.”

“Great”, I thought to myself – this is going to be a long night (or day, or whatever). I pulled myself up.

The operations guy continued, “There is a bridge line open. If you can join right away, that would be great. The meeting invite should be in your inbox.”

“Yes, give me 5 minutes.” I hung up, got up, connected to the VPN and started looking at things. It turned out that our application was indeed having issues. It was not completely down, but it had been throwing ‘Out Of Memory’ errors since 9:30 PM. It was crawling and was basically not processing any requests. I restarted the application server to restore service, but that was only a temporary fix. I knew there would be tough questions from management later.

Those questions came the next morning when I got to work at 8:30 AM. My manager was waiting for me and, without wasting any time, took me to our Director’s office. I could clearly tell neither of them was happy. Both were veterans with 15+ years of service, and both were technically sound.

“Thanks for your work last night, Karun. Hope you got some sleep,” said my director. “No problem,” I said.

“Listen, I have a meeting with our VP in 15 minutes. I need a plan for how we are going to deal with this XYZ financial outage.” I nodded, waiting for him to proceed.

“Here are the questions I need you to answer:”

“How did our customer find an issue with our application before we did?”

“The issue had been going on for hours before the customer contacted us. Why didn’t we catch it?”

“Do you have all the diagnostic data from last night that you need to find the root cause? And how do we make sure diagnostic information is collected when an issue surfaces?”

“How can we prevent this from happening again?”

I paused for a moment, and then said “A P M”.

“All of your questions can be addressed if we invest in a good-quality APM tool.”

—————————————————————————————————————

Few things in the world are worse than getting called at 2:35 AM to work on a critical issue. As engineers, managers and directors responsible for the health of our IT systems, we must act NOW, before things get out of hand. When it comes to monitoring and diagnosing software applications, APM (Application Performance Management) tools can be lifesavers. Gone are the days when you could monitor your systems with a cron-based shell script written by some admin years ago. I have done that, and paid the price several times.

There are 6 things an APM tool brings to the table that will give you well-deserved, uninterrupted sleep.

Here they are:

1. When configured correctly, APM tools can alert you before a production issue occurs.

How do they do it?

APM tools continuously collect performance metrics from an instrumented application. These metrics are typically gathered in a central place, where they can be used to create monitoring elements such as ‘Alerts’. Modern APM tools ‘learn’ the normal trend and set baselines automatically, though you can set your own thresholds if you prefer. You can easily configure an alert that pages or emails the appropriate support or operations personnel.
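To make the ‘learn a baseline, then alert’ idea concrete, here is a minimal sketch, assuming a simple rolling-window baseline (mean plus k standard deviations) combined with an optional fixed threshold. Commercial APM tools use their own, more sophisticated baselining; the `BaselineAlert` class and its parameters are hypothetical and only illustrate the principle.

```python
from collections import deque
from statistics import mean, stdev


class BaselineAlert:
    """Hypothetical alert rule: fires on a fixed threshold breach, or when a
    sample lands more than k standard deviations above the learned baseline."""

    def __init__(self, window=30, k=3.0, threshold=None, min_samples=10):
        self.samples = deque(maxlen=window)      # rolling history that forms the baseline
        self.k = k                               # sensitivity: std devs above the mean
        self.threshold = threshold               # optional user-defined hard limit
        self.min_samples = max(2, min_samples)   # stdev needs at least two points

    def observe(self, value):
        """Record one sample; return an alert message, or None if it looks normal."""
        alert = None
        if self.threshold is not None and value > self.threshold:
            alert = f"value {value} exceeded fixed threshold {self.threshold}"
        elif len(self.samples) >= self.min_samples:
            limit = mean(self.samples) + self.k * stdev(self.samples)
            if value > limit:
                alert = f"value {value} exceeded learned baseline limit {limit:.1f}"
        self.samples.append(value)
        return alert


# Example: response times (ms) hover around 50, then one request takes 80 ms.
rule = BaselineAlert(window=30, k=3.0, threshold=95)
for sample in [50, 52, 49, 51, 50, 48, 52, 51, 50, 49, 80]:
    message = rule.observe(sample)
    if message:
        print("ALERT (page the on-call):", message)
```

The fixed threshold catches obvious breaches immediately, while the learned baseline catches values that are unusual for this particular application even if they are below any hard limit you would have thought to set.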

In the chart below, you can see heap utilization gradually going up, a classic sign of a memory leak. A correctly configured APM tool could have triggered an alert as soon as things started to go bad.

[Chart: heap utilization over time]
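A steadily climbing heap like this can also be caught by trend analysis rather than a single threshold. The sketch below fits an ordinary least-squares line to recent heap samples and estimates how long until the heap is exhausted. The function name, the sample values and the 1024 MB maximum heap are illustrative assumptions, not data taken from the chart.

```python
def minutes_until_exhaustion(samples, max_heap_mb):
    """Fit a straight line (ordinary least squares) to (minute, used_mb) samples
    and estimate the minutes remaining until used heap reaches max_heap_mb.
    Returns None when there is no upward trend (flat or shrinking heap)."""
    n = len(samples)
    if n < 2:
        return None
    xs = [t for t, _ in samples]
    ys = [mb for _, mb in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None
    # Slope of the fitted line, in MB per minute.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    predicted_now = intercept + slope * xs[-1]
    return max(0.0, (max_heap_mb - predicted_now) / slope)


# Illustrative samples: (minute, heap used in MB), taken every 10 minutes.
samples = [(0, 400), (10, 420), (20, 440), (30, 460)]
eta = minutes_until_exhaustion(samples, max_heap_mb=1024)
if eta is not None and eta < 6 * 60:
    print(f"Possible memory leak: heap full in about {eta:.0f} minutes")
```

Alerting on the projected time to exhaustion gives the on-call person hours of warning instead of an ‘Out Of Memory’ error at 9:30 PM.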

2. APM tools can take actions you configure when an event occurs

How do they do it?

When an event occurs, for example a threshold breach, you can configure an APM tool to take an ‘action’. The action could be running a shell script, sending an email or, my favorite, taking a thread dump. The thread dump can later be analyzed to find the root cause of the issue. This feature, available in most APM tools, effectively gives you a virtual assistant that does the work without you having to wake up in the middle of the night.
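APM tools wire this up through their own configuration screens. The sketch below models the idea in Python with a hypothetical `ActionRegistry` that maps an event name to a list of actions. The thread-dump action uses CPython's `sys._current_frames()` to capture the stack of every thread in the current process; for a Java application server, the equivalent would be a JVM thread dump (for example, via the JDK's `jstack` tool). The event name and the email stand-in are assumptions for illustration.

```python
import subprocess
import sys
import threading
import traceback


def take_thread_dump():
    """Capture the current stack of every thread in this Python process as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f'Thread "{names.get(ident, "unknown")}" (id {ident}):')
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def run_script(command):
    """Run a shell command (e.g. a cleanup script) and return its exit code."""
    return subprocess.run(command, shell=True).returncode


class ActionRegistry:
    """Hypothetical event -> actions mapping, in the spirit of APM 'actions'."""

    def __init__(self):
        self.actions = {}

    def on(self, event, action):
        """Register a zero-argument callable to run whenever the event fires."""
        self.actions.setdefault(event, []).append(action)

    def fire(self, event):
        """Run every action registered for the event and collect the results."""
        return [action() for action in self.actions.get(event, [])]


registry = ActionRegistry()
registry.on("heap_threshold_breached", take_thread_dump)
registry.on("heap_threshold_breached", lambda: "email sent")  # stand-in for a real email
results = registry.fire("heap_threshold_breached")
print(results[0])  # the dump is ready for analysis in the morning
```

Keeping actions as plain callables means the same event can trigger a diagnostic capture, a notification and a remediation script, in whatever order you register them.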

3. APM tools can automatically discover backends

How do they do it?

APM tools automatically detect the backends your application calls. For example, they can detect every backend database and every web service your application talks to. In complex applications, sometimes nobody in the company knows how many database backends are out there. With APM, you can clearly see how your application interacts with other systems. It happens more often than you would think: a system somewhere gets decommissioned, and only then does anyone find out that your production application was using it for one particular function. The 2:35 AM calls are back on the menu. You don’t want that.
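Real APM agents discover backends by instrumenting database drivers and HTTP clients automatically. The sketch below approximates that with an explicit decorator that records every backend an outbound call touches, along with call counts and average latency. The `discovers_backend` decorator, the `BackendRegistry` class and the host names are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps
from urllib.parse import urlparse


class BackendRegistry:
    """Hypothetical registry of every backend the application has called."""

    def __init__(self):
        self.calls = defaultdict(int)       # backend -> number of calls
        self.total_ms = defaultdict(float)  # backend -> cumulative latency in ms

    def record(self, backend, elapsed_ms):
        self.calls[backend] += 1
        self.total_ms[backend] += elapsed_ms

    def report(self):
        return {b: {"calls": n, "avg_ms": self.total_ms[b] / n} for b, n in self.calls.items()}


registry = BackendRegistry()


def discovers_backend(kind):
    """Decorate an outbound-call function whose first argument is a URL or DSN."""
    def decorate(func):
        @wraps(func)
        def wrapper(url, *args, **kwargs):
            parsed = urlparse(url)
            host = parsed.hostname or "unknown"        # hostname omits any credentials
            port = f":{parsed.port}" if parsed.port else ""
            backend = f"{kind}:{host}{port}"
            start = time.perf_counter()
            try:
                return func(url, *args, **kwargs)
            finally:
                registry.record(backend, (time.perf_counter() - start) * 1000)
        return wrapper
    return decorate


@discovers_backend("db")
def run_query(dsn, sql):
    return []  # stand-in for a real database driver call


@discovers_backend("http")
def call_service(url):
    return "ok"  # stand-in for a real HTTP client call


run_query("postgresql://db1.internal:5432/orders", "SELECT 1")
call_service("https://rates.example.com/api/v1/fx")
print(registry.report())  # the application's backend map, built from live traffic
```

Before anyone decommissions `db1.internal`, a glance at this map would show that production still depends on it.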

4. APM tools can take snapshots of the application, including business transactions

How do they do it?

A snapshot of the application is much more than a thread dump. It shows everything the application was doing at the moment the snapshot was taken, including details down to the SQL query and code level. Snapshots are extremely useful when dealing with slow or hung applications.
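To show what ‘down to the SQL query level’ means, here is a minimal sketch of a transaction snapshot that records each code-level step and SQL statement with its nesting depth and duration, roughly like the call graph an APM tool displays. The `Snapshot` class, the `PaymentService.process` step and the queries are hypothetical; the sleeps stand in for real work.

```python
import time
from contextlib import contextmanager


class Snapshot:
    """Hypothetical transaction snapshot: each entry is
    [depth, kind, detail, elapsed_ms], forming a simple call tree."""

    def __init__(self, transaction):
        self.transaction = transaction
        self.entries = []
        self._depth = 0

    @contextmanager
    def step(self, kind, detail):
        """Time one step (method call, SQL query, ...) nested under the current one."""
        index = len(self.entries)
        self.entries.append([self._depth, kind, detail, None])
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self._depth -= 1
            self.entries[index][3] = (time.perf_counter() - start) * 1000

    def render(self):
        lines = [f"Snapshot of {self.transaction}"]
        for depth, kind, detail, ms in self.entries:
            lines.append(f"{'  ' * (depth + 1)}[{kind}] {detail} ({ms:.1f} ms)")
        return "\n".join(lines)


snap = Snapshot("POST /payments")
with snap.step("method", "PaymentService.process"):
    with snap.step("sql", "SELECT balance FROM accounts WHERE id = ?"):
        time.sleep(0.01)
    with snap.step("sql", "UPDATE accounts SET balance = ? WHERE id = ?"):
        time.sleep(0.02)
print(snap.render())
```

Because every SQL statement carries its own timing, a slow transaction points straight at the query responsible instead of just at a busy thread.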

5. APM tools store performance metrics data for future use and analysis

How do they do it?

APM tools store the collected performance metrics in either a file system or an RDBMS. This data is of prime importance when troubleshooting an issue that occurred in the past. It is also useful for trend analysis.
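As a sketch of the RDBMS option, the example below uses Python's built-in `sqlite3` module to store timestamped metrics and then answers the two kinds of questions mentioned above: what happened during a past window, and how a metric trends hour by hour. The table schema, the metric name and the sample values are assumptions for illustration; real APM tools use their own storage formats.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a simple metrics table. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metrics (
               ts    INTEGER NOT NULL,  -- Unix epoch seconds
               name  TEXT    NOT NULL,  -- e.g. 'heap_used_mb'
               value REAL    NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics ON metrics (name, ts)")
    return conn


def record(conn, ts, name, value):
    conn.execute("INSERT INTO metrics (ts, name, value) VALUES (?, ?, ?)", (ts, name, value))
    conn.commit()


def window(conn, name, start_ts, end_ts):
    """Raw samples for a past incident window (troubleshooting)."""
    return conn.execute(
        "SELECT ts, value FROM metrics WHERE name = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (name, start_ts, end_ts),
    ).fetchall()


def hourly_averages(conn, name):
    """Average value per hour bucket (trend analysis); integer division floors ts."""
    return conn.execute(
        "SELECT ts / 3600 * 3600 AS hour, AVG(value) FROM metrics "
        "WHERE name = ? GROUP BY hour ORDER BY hour",
        (name,),
    ).fetchall()


conn = open_store()
for ts, mb in [(0, 400), (1800, 420), (3600, 440), (5400, 460)]:
    record(conn, ts, "heap_used_mb", mb)
print(window(conn, "heap_used_mb", 1800, 3600))
print(hourly_averages(conn, "heap_used_mb"))
```

With the data persisted, last night's 9:30 PM incident can still be examined the next morning, long after the application server has been restarted.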

6. APM tools can create neat reports that audiences of all types can understand

How do they do it?

At the end of the day, you need charts, graphs, tables and stats to prove your point. APM tools can produce polished reports that can be customized and shared with various audiences. It is important to structure each report for its audience. For example, a developer will be interested in JDBC connection pool utilization, whereas that will mean nothing to a VP of Marketing. Most tools can create Excel, PDF or HTML reports (or all three).
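The ‘same data, different audience’ idea can be sketched in a few lines: summarize each metric once, then include only the metrics a given audience cares about. The audience mapping, metric names and sample values below are hypothetical, and CSV is used simply because Excel opens it directly.

```python
import csv
import io
from statistics import mean

# Hypothetical mapping of which metrics each audience cares about.
AUDIENCES = {
    "developer": ["jdbc_pool_utilization_pct", "heap_used_mb", "avg_response_ms"],
    "executive": ["availability_pct", "avg_response_ms"],
}


def summarize(metrics):
    """metrics: {name: [samples]} -> {name: (min, avg, max)}"""
    return {name: (min(v), mean(v), max(v)) for name, v in metrics.items() if v}


def report_csv(metrics, audience):
    """Render a CSV report containing only the metrics this audience cares about."""
    summary = summarize(metrics)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["metric", "min", "avg", "max"])
    for name in AUDIENCES[audience]:
        if name in summary:
            low, avg, high = summary[name]
            writer.writerow([name, f"{low:.1f}", f"{avg:.1f}", f"{high:.1f}"])
    return out.getvalue()


metrics = {
    "jdbc_pool_utilization_pct": [40, 85, 97],
    "heap_used_mb": [400, 700, 1000],
    "avg_response_ms": [120, 180, 900],
    "availability_pct": [100, 100, 92.5],
}
print(report_csv(metrics, "developer"))
print(report_csv(metrics, "executive"))
```

The developer report shows JDBC pool and heap figures, while the executive report sticks to availability and response time.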

There you have it. Losing sleep is no fun, especially when your production systems are running with NO monitoring or broken monitoring and your customer is screaming. Invest your time and money in a quality APM product; the return on that investment will be priceless. Remember, “Prevention is better than cure.”

Once you have an APM tool up and running, you can finally enjoy 7-8 hours of uninterrupted sleep, time that is well spent.
