Embedded monitoring for business applications

Business Application State Monitoring

11.03.2009
by Elena Popretinskaya, CTO

We have developed a Monitoring Framework that simplifies the investigation of problems in complex software systems, such as intermittent performance issues or functional problems that appear only under specific circumstances.

This article describes such problems, our approach to solving them, and the structure of our monitoring framework.


Introduction


Enterprise software applications are often highly complex, typically handling billions of simple or complex transactions every day, performed either interactively by users or automatically by the system.
Any of these thousands of actions per minute may begin to behave abnormally at any time: sometimes only at peak loads, sometimes continuously (and therefore reproducibly), but often in a sporadic, random and apparently non-reproducible way. Typical symptoms are sudden performance slowdowns, unexpected errors, loss of data, or even crashes, mostly for reasons that are difficult to find.
In general, the causes are incorrect program implementation, database deadlocks or performance issues, and incorrect system settings.
Problems of this kind are often extremely hard to reproduce in a development environment, because they may show up only under a specific load in the production environment. Investigating them can therefore become a real nightmare.

 

Problem Investigation Difficulties


For a developer or administrator, finding the causes of such problems and preventing them is difficult and time-consuming. It usually involves questions such as:
  • How can abnormal behavior, sporadic errors, or occasional performance problems be found and reproduced?
  • How can they be traced down to the level of a software module or component?
  • How can they be debugged?

The most common techniques used today are logging and source code review and analysis.
A much more effective tool is an embedded application monitoring system, as described in this article.

 

Weakness of the Logging Method


In most cases, developers and administrators rely on various logs, which causes the following issues:
  • On production systems, logging usually runs in a limited mode (only the most important information, warnings and errors are logged), and the information it provides is not enough to investigate the problem.
  • Switching logging to debug level makes the system produce hundreds of megabytes of logs with an enormous number of records, most of them useless. Without writing some kind of log parser, it is nearly impossible to find the cause of a problem in debug-level logs. Besides that, debug-level logging may itself cause performance problems, since the system has to write files of several hundred megabytes, and in most cases it is not even permitted on a production system.
  • Logs are of little help if a problem occurs randomly and only occasionally.
 

The Power of Monitoring


Monitoring has several advantages:
  • Monitoring provides relevant, informative facts for problem investigation. It makes it possible to produce logs that are filtered of irrelevant information.
  • Through a notification mechanism, monitoring lets you concentrate on problem areas. This is especially useful for detecting and investigating sporadic problems.
  • Monitoring can be used on production systems, since it does not noticeably degrade system performance.
 

Monitoring Indicators


In any application, one can define a set of key values and characteristics that indicate the current state of the system. Typical examples of such indicators are:
  • Performance of key methods (e.g. the database access methods). Measuring it may reveal bottlenecks and show a developer or administrator where to pay particular attention. For example, an unusually slow method may indicate that it has been implemented suboptimally, or that the component it relies on (e.g. the database) is not properly tuned.
  • Occurrence rates of certain events (e.g. calls of a method per minute, user requests per minute, errors per method call). Monitoring these rates helps to discover abnormal system behavior. For example, a high error rate for a certain method is in most cases evidence of an incorrect implementation.
When the system works normally, the values of these indicators lie within etalon (reference) boundaries and do not need to be logged, because this information is not significant for system maintenance. If the value of one or more indicators exceeds its etalon boundaries (e.g. the error rate becomes greater than zero), this indicates abnormal system behavior. Such an event should be recorded and reported to the system administrator.
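To make the idea of etalon boundaries concrete, here is a minimal Python sketch, assuming each indicator has a simple numeric range. The names `Etalon`, `ETALONS` and `check`, as well as the indicator names and limits, are hypothetical illustrations and not part of the GERSIS framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Etalon:
    """Hypothetical etalon (reference) range of one indicator."""
    lower: float
    upper: float

    def is_abnormal(self, value: float) -> bool:
        # Abnormal means: the value lies outside the etalon boundaries.
        return not (self.lower <= value <= self.upper)


# Hypothetical indicators and limits, chosen only for illustration.
ETALONS = {
    "orders.save.duration_ms": Etalon(0.0, 200.0),    # performance of a key method
    "orders.save.errors_per_call": Etalon(0.0, 0.0),  # any error is abnormal
}


def check(indicator: str, value: float) -> bool:
    """Report and return True only if the value leaves its etalon range."""
    etalon = ETALONS[indicator]
    if etalon.is_abnormal(value):
        # Only abnormal values are reported; normal values are not recorded.
        print(f"ABNORMAL: {indicator}={value} outside [{etalon.lower}, {etalon.upper}]")
        return True
    return False
```

With these values, `check("orders.save.duration_ms", 120.0)` returns `False` and prints nothing, while `check("orders.save.errors_per_call", 0.02)` returns `True`, because any error rate above zero leaves the etalon range.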
 

Monitoring Approach


The simplest way to monitor the state of a system is to watch its key indicators according to this simplified scheme:
  • Every time the value of a monitored indicator changes, the change is analyzed.
  • If the indicator value becomes abnormal (it deviates from the etalon value by more than expected), the incident has to be processed; in most cases, this means writing a record to the log and/or notifying the administrator.

Figure 1 – Basic approach to monitoring
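The scheme in Figure 1 can be sketched in Python as an indicator object that analyzes every new value as soon as it is set. This is only an illustration under our own assumptions: `WatchedIndicator` and `log_incident` are hypothetical names, and "abnormal" is taken to mean a deviation from the etalon value larger than a fixed tolerance.

```python
import logging
from typing import Callable

logger = logging.getLogger("monitoring")


class WatchedIndicator:
    """Hypothetical indicator that analyzes every change of its value."""

    def __init__(self, name: str, etalon: float, tolerance: float,
                 on_abnormal: Callable[[str, float], None]) -> None:
        self.name = name
        self.etalon = etalon        # expected (etalon) value
        self.tolerance = tolerance  # largest deviation still considered normal
        self.on_abnormal = on_abnormal
        self._value = etalon

    @property
    def value(self) -> float:
        return self._value

    @value.setter
    def value(self, new_value: float) -> None:
        self._value = new_value
        # Step 1: analyze the change against the etalon value.
        if abs(new_value - self.etalon) > self.tolerance:
            # Step 2: process the incident (here: delegate to a callback).
            self.on_abnormal(self.name, new_value)


def log_incident(name: str, value: float) -> None:
    """Default incident processing: write a warning to the log."""
    logger.warning("Indicator %s has abnormal value %s", name, value)
```

For example, `WatchedIndicator("db.query.avg_ms", etalon=50.0, tolerance=30.0, on_abnormal=log_incident)` stays silent when its value is set to 60.0 and logs a warning when it is set to 95.0. Passing the processing step in as a callback keeps analysis and processing separate.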


 

Requirements for System Monitoring

 
  • Monitoring must be easy to use. Ideally, the monitoring logic should be separated from the application logic. We do not want to write the same code over and over throughout the application. We only want to add monitoring points for key indicators, let the monitoring system analyze the indicators, and have it process any abnormal state. In this respect, adding a monitoring point is similar to logging: we don't implement a logging function every time we want to log something; we just tell the logger what should be recorded, and it takes care of the details.
  • The influence on the application must be minimal. Monitoring should not cause any performance decrease; detected errors must not cause the monitored operation to fail.
  • The monitoring system should be flexible and expandable, and the monitor settings should be simple and easy to use. It should be possible to implement custom monitoring functionality for values that are significant for the business logic (e.g. a project budget could be monitored to make sure it does not exceed its limits).
Meeting these requirements minimizes development effort.
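One possible way to meet the second requirement (minimal influence on the application) is to let the monitored operation merely enqueue a measurement, while the analysis runs in a background thread. The sketch below uses Python's standard `queue` and `threading` modules; `AsyncMonitorRunner`, `record` and `flush` are hypothetical names. A bounded queue is assumed so that, under overload, samples are dropped instead of slowing down the application, and exceptions raised by a faulty monitor are logged rather than propagated.

```python
import logging
import queue
import threading
from typing import Callable, Tuple

logger = logging.getLogger("monitoring")


class AsyncMonitorRunner:
    """Hypothetical runner: analysis happens outside the monitored operation."""

    def __init__(self, analyze: Callable[[str, float], None],
                 max_pending: int = 10_000) -> None:
        self._analyze = analyze
        # Bounded queue: memory used by monitoring cannot grow without limit.
        self._queue: "queue.Queue[Tuple[str, float]]" = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, name: str, value: float) -> None:
        """Called at a monitoring point; cheap and non-blocking."""
        try:
            self._queue.put_nowait((name, value))
        except queue.Full:
            pass  # drop the sample rather than slow down the application

    def _run(self) -> None:
        while True:
            name, value = self._queue.get()
            try:
                self._analyze(name, value)
            except Exception:
                # A faulty monitor is logged, never propagated to the application.
                logger.exception("Monitor failed while analyzing %s", name)
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until all queued samples have been analyzed (handy in tests)."""
        self._queue.join()
```

Dropping samples is a deliberate trade-off in this sketch: losing an occasional measurement is usually preferable to blocking a business transaction.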

Here is a simple example:
Let's assume that we do not use any kind of framework and want to implement monitoring logic for 3 similar indicators (e.g. the performance of 3 methods).
In general, for every indicator we need to implement:
  • etalon indicator value initialization logic
  • analyzing logic
  • abnormal state processing logic
We need to implement 3x3=9 methods/classes and call them in the same order at every monitoring point. With N monitoring points, this requires about 3x3xN additional lines of code.
With a well-thought-out approach, we would need only 3 methods/classes, which can also be reused for other indicators, plus 1 additional line of code per monitoring point.
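In Python, the "one additional line per monitoring point" idea can be sketched with a decorator that bundles the three pieces of logic listed above (etalon initialization, analysis, abnormal state processing). The `monitored` decorator, the `incidents` list and the example functions are hypothetical; a real framework would route incidents to adapters instead of a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical stores; a real framework would use initializers and adapters.
ETALON_SECONDS: Dict[str, float] = {}
incidents: List[str] = []


def monitored(name: str, max_seconds: float) -> Callable[[F], F]:
    """Reusable performance monitor: one decorator line per monitoring point."""
    ETALON_SECONDS[name] = max_seconds  # 1) etalon initialization

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > ETALON_SECONDS[name]:  # 2) analysis
                    incidents.append(f"{name} took {elapsed:.3f}s")  # 3) processing
        return wrapper  # type: ignore[return-value]

    return decorator


@monitored("load_customer", max_seconds=0.5)
def load_customer(customer_id: int) -> Dict[str, int]:
    return {"id": customer_id}


@monitored("build_report", max_seconds=0.01)
def build_report() -> str:
    time.sleep(0.05)  # simulate a slow operation
    return "done"
```

Each additional method to be monitored needs only one more `@monitored(...)` line, while the three pieces of logic are written once.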

 

The GERSIS Monitoring Framework


Our Monitoring System is built on a monitoring framework whose main parts are:
  • Monitor Manager. It provides access to different types and instances of so-called monitors and manages the monitor lifecycle: instantiation, initialization, and destruction. Using a Monitor Manager simplifies adding monitoring points to the system.
  • Monitors. Every monitor watches a specified indicator value and discovers abnormal behavior. Since the logic of indicator analysis can be the same for a whole group of indicators, one monitor type can be used for many indicators (e.g. the Method Performance Monitor can be used for different methods in the system). Monitors work in parallel with the monitored operations in order to minimize their influence on the system. Custom monitors can be implemented from scratch or by extending standard monitors, for example to monitor a department's budget and notify the manager when it is exceeded. This makes the framework flexible.
  • Monitor Adapters. When a monitor detects an abnormal indicator state, it delegates the processing to its Monitor Adapters, which keep the analysis logic and the processing logic separate. This allows different types of monitors to use the same adapters. For example, a typical way of processing an abnormal state is logging, so logger adapters can be used for almost every monitor in the system. For additional flexibility, a set of monitor adapters can be configured for a given monitor instance (e.g. via a file).
  • Monitor Initializers. Every monitor needs etalon values to discover abnormal states. In most cases, monitors deal with the same types of indicator values, so their etalon values have the same type as well. This means the logic for initializing etalon values can be shared by these monitors (e.g. loading etalon values from a file), and we don't want to implement it in every monitor and duplicate logic. A concrete monitor initializer can be configured for a concrete monitor instance (e.g. via a file). In addition, developers can implement custom monitor initializers (e.g. an initializer that reads etalons from a database) and configure the monitoring system for their needs. This makes the framework flexible and expandable.
  • Monitoring Settings Factory. This component provides access to the settings of a concrete monitor instance (its set of Monitor Adapters and Monitor Initializers). The settings can be stored in different ways: in an XML file, in the database, hardcoded, etc. The Monitoring Settings Factory encapsulates these details and provides a common way to retrieve settings. The Monitor Manager is configured to use a concrete Monitoring Settings Factory implementation, either a standard one or a custom one developed for a specific system.

 
Figure 2 – Monitoring System built on the base of Monitoring Framework
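The following Python sketch mirrors the components listed above: Monitor Manager, Monitors, Monitor Adapters, Monitor Initializers and a Monitoring Settings Factory. The article does not publish the GERSIS interfaces, so all class and method names here (`observe`, `process`, `load_etalons`, `settings_for`, and so on) are our own assumptions, and the monitor is a simple "value above etalon limit" check. It is meant only to show how the responsibilities could be split.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List

# All names below are hypothetical; they are not the actual GERSIS API.


class MonitorInitializer(ABC):
    """Loads etalon values for a monitor (e.g. from a file or a database)."""
    @abstractmethod
    def load_etalons(self, monitor_name: str) -> Dict[str, float]: ...


class MonitorAdapter(ABC):
    """Processes an abnormal state (e.g. writes a log record, sends a notification)."""
    @abstractmethod
    def process(self, monitor_name: str, indicator: str, value: float) -> None: ...


@dataclass
class MonitorSettings:
    initializer: MonitorInitializer
    adapters: List[MonitorAdapter] = field(default_factory=list)


class MonitoringSettingsFactory(ABC):
    """Hides where settings come from: XML file, database, hardcoded, ..."""
    @abstractmethod
    def settings_for(self, monitor_name: str) -> MonitorSettings: ...


class Monitor:
    """Generic limit monitor: one monitor type reused for many indicators."""

    def __init__(self, name: str, settings: MonitorSettings) -> None:
        self.name = name
        self.adapters = settings.adapters
        self.etalons = settings.initializer.load_etalons(name)

    def observe(self, indicator: str, value: float) -> None:
        limit = self.etalons.get(indicator)
        if limit is not None and value > limit:
            for adapter in self.adapters:  # processing is delegated to adapters
                adapter.process(self.name, indicator, value)


class MonitorManager:
    """Creates, caches and destroys monitor instances."""

    def __init__(self, settings_factory: MonitoringSettingsFactory) -> None:
        self._factory = settings_factory
        self._monitors: Dict[str, Monitor] = {}

    def get(self, monitor_name: str) -> Monitor:
        if monitor_name not in self._monitors:
            settings = self._factory.settings_for(monitor_name)
            self._monitors[monitor_name] = Monitor(monitor_name, settings)
        return self._monitors[monitor_name]

    def shutdown(self) -> None:
        self._monitors.clear()


# Minimal standard components for the sketch.
class DictInitializer(MonitorInitializer):
    def __init__(self, etalons: Dict[str, Dict[str, float]]) -> None:
        self._etalons = etalons

    def load_etalons(self, monitor_name: str) -> Dict[str, float]:
        return dict(self._etalons.get(monitor_name, {}))


class ListAdapter(MonitorAdapter):
    def __init__(self) -> None:
        self.records: List[str] = []

    def process(self, monitor_name: str, indicator: str, value: float) -> None:
        self.records.append(f"{monitor_name}: {indicator}={value}")


class HardcodedSettingsFactory(MonitoringSettingsFactory):
    def __init__(self, settings: Dict[str, MonitorSettings]) -> None:
        self._settings = settings

    def settings_for(self, monitor_name: str) -> MonitorSettings:
        return self._settings[monitor_name]
```

A monitoring point then needs a single line, e.g. `manager.get("MethodPerformance").observe("OrderDao.save", 350.0)`; which adapters process an abnormal value and where the etalons come from is decided entirely by the settings factory.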



The GERSIS Monitoring Framework has been developed with the following targets in mind:

  • Requirement: Monitoring must be easy to use.
    How it is fulfilled: The Monitor Manager takes care of the monitor lifecycle. A uniform Monitor Interface, together with a set of standard components (Monitors, Monitor Adapters, Initializers), makes it possible in most cases to use the Monitoring Framework with only 1-2 additional lines of code per monitoring point.
  • Requirement: Influence on the main application must be minimal.
    How it is fulfilled: Monitors use multithreading to work in parallel with the monitored system.
  • Requirement: The monitoring system must be flexible and expandable.
    How it is fulfilled: Monitor Adapters and Initializers allow the same sets of components to be reused for similar tasks (monitor initialization, abnormal state processing) across different monitor types. Every Monitoring Framework component is described by a corresponding interface, which allows custom components to be implemented and provides all necessary system-specific capabilities. Settings are handled by the Monitoring Settings Factory and can be changed easily: only the configuration file needs to be edited, which avoids time-consuming code refactoring.
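As an illustration of file-based settings, here is a sketch of a settings factory that reads the initializer and the adapter list of each monitor from XML, using Python's standard `xml.etree.ElementTree`. The XML format, the `XmlSettingsFactory` class and the initializer/adapter identifiers are hypothetical; the article does not describe the actual GERSIS configuration format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical configuration format, for illustration only.
CONFIG_XML = """
<monitoring>
  <monitor name="MethodPerformance" initializer="file:etalons.properties">
    <adapter type="log"/>
    <adapter type="email"/>
  </monitor>
  <monitor name="DepartmentBudget" initializer="database">
    <adapter type="log"/>
  </monitor>
</monitoring>
"""


@dataclass(frozen=True)
class MonitorConfig:
    initializer: str
    adapters: List[str]


class XmlSettingsFactory:
    """Reads per-monitor settings (initializer + adapters) from XML text."""

    def __init__(self, xml_text: str) -> None:
        root = ET.fromstring(xml_text.strip())
        self._configs: Dict[str, MonitorConfig] = {
            m.get("name"): MonitorConfig(
                initializer=m.get("initializer", "none"),
                adapters=[a.get("type") for a in m.findall("adapter")],
            )
            for m in root.findall("monitor")
        }

    def settings_for(self, monitor_name: str) -> MonitorConfig:
        return self._configs[monitor_name]
```

With this approach, adding an e-mail adapter to the `DepartmentBudget` monitor would only require adding an `<adapter type="email"/>` element to the file, without changing any code.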


If you are interested in the GERSIS Monitoring Framework and would like more information or a test version, please contact us or contact Valery Kireitchik directly.

