AlertSite-Blog-CTA-Header-2-1

Ham ‘n Eggin’ with Logs and APM

ham-and-egg-alertsite-logentries

This guest post was written by Trevor Parsons, PhD; Co-founder and Chief Scientist, Logentries.

I recently wrote a post entitled APM is not enough outlining how log data has become a new rich source of information that can be used in conjunction with your APM solution to help you better understand your systems. In short, there are some instances, especially for cloud based systems, such as SaaS or PaaS components, where you may need to use log data to shed light on parts of your system where an APM solution can not reach. This can be particularly convenient in the case where instrumentation might be difficult or impossible.

For example, think of a situation where you are using some SaaS components as part of your end-to-end system; you will unlikely be able to instrument this SaaS service, but such services regularly produce log data or provide APIs that you can poll to get useful information in the service such as errors or performance information.

On the flip side, however, sometimes logs are not enough — or more precisely, sometimes you may simply not have enough fine-grained log events to be able to get full visibility into parts of your system. This is where APM can come in as a great way to be able to “flick a switch” and turn on profiling such that you begin to capture performance data from within software components that you may not be able to otherwise modify. For example, in the case where you wanted to add additional log events to a software component that you did not write and you do not have access to the source.

APM and logs can truly be complementary solutions and where one falls down the other will step up it’s game – in golfing terms often referred to as “Ham n Eggin’,” i.e. When one player is not doing well, another takes up the slack.

Here’s a typical real world example: Think of a typical application with both a Web and mobile app interface where you are tracking both user behavior and related performance. Logging from the client Web browser or from the mobile application (using a client side logging library for example) can give you some great insight into how a user is using the application functionality, what buttons they are clicking on etc.

Here’s a typical log event captured from the client side:

5-12-2014 12:15:111 JS_LOG IP=212.212.212.213 REQUEST=SendMessage UniqueClientID=123456 PAYLOAD={’MessageTO’:’user123’, ‘Message’:’Hey Tom, love the photo.’} RESPONSE=’200’, RESPONSE_TIME=‘10s’

The event tells us that a given user (with user ID 123456) sent a message to their buddy “Tom,” which gives us insight into what the user was doing. You can use this client side log data to aggregate log events like this to give you a view into a particular users behavior over time or to look across group user behaviour.

You can then also cross correlate this with performance metrics contained in the log event to easily see if there are any performance issues or outliers in terms of response times.

logentries-visualizing-feature

Visualizing Feature Usage Trends and Response Time Data from Logs (click to enlarge image)

In the example above, however, we can see that it took 10 seconds to get a response, which is not an acceptable wait time – at least not according to the 3 response time limits as outlined by Jakob Nielsen in his seminal publication on Usability Engineering. To investigate further a next logical step would be to look at the backend logs to figure out how long it took the server to provide a response. This helps figure out if the long response time was due to time spent on the network or, rendering on the clients browser/mobile app or if indeed there was an issue with performance within the application backend.

A quick look at the Web server logs shows us that the issue is indeed in the backend as the apache Web log still shows a response time of 9.8 seconds:

123.345.678.910 – - [20/Jun/2014:20:36:10 +0000]  301 “GET /api/send_message/?&from=1403295811054 HTTP/1.1″ 178 9800

However, at this point we need to deeper dive into the application logic to better understand where this time is being spent. Consider a situation where your backend consists of a number of middleware components and application logic where you do not have access to the source code and where no meaningful log events containing performance data is available. Enabling an APM solution will not only give you visibility into the performance of the application logic, it also allows you to trace a given transaction across your different components so that you can pinpoint exactly where in your application the bottleneck is.

When you turn on the APM solution it can dynamically instrument your application (using techniques like bytecode instrumentation for example) such that it immediately starts to capture performance information and transaction traces through your code. Probes can capture the time it takes each method to execute (in the case of a Java application, for example). Transaction traces can also be recorded using a unique identifier and passing it along the transaction so that different method or function calls can be linked together and the sequence of method calls presented, resulting in something like the transaction below:

1. request URL = ‘GET /api/send_message/’, execution_time= 9854
2. –> method called = ‘org.apache.catalina.connector.RequestFacade.getSession()’ execution_time = 9701
3. —->method called = ‘org.apache.catalina.connector.RequestFacade.lookupReceiver()’ execution_time= 9500
4. —->method called= ‘org.apache.catalina.connector.RequestFacade.postMessage()’ execution_time= 199
5. –> method called = ‘org.apache.catalina.connector.cleanupActions()’ execution_time= 46
 

The above example shows that a significant percentage of time is actually being spent in a specific method (lookupReceiver())which we should look at optimizing. APM solutions like AlertSite will roll up these transactions into a nice dashboard and allow you to easily see where the highest percentage of time is being spent and what methods are being called most frequently for example.

alertsite-uxm-transaction-tracing

Transaction tracing with Alertsite (click to enlarge image)

So if you want complete visibility across your systems, one size doesn’t always fit all.  Log management and APM solutions fit very well together and are very much complementary. Where one solution falls down the other generally picks up the slack .… you might even say they go together like ham and eggs!

The good news is you can check out both AlertSite and Logentries for free.

See also:

Comments

  1. Lamaur Cheetum says:

    Sometimes the info. Log. Data will rewrite great.

  2. My brother recommended I might like this web site. He was totally
    right. This post truly made my day. Yoou cann’t
    imagine simply how much time I had spent for this information! Thanks!

Speak Your Mind

*