Saturday, September 12, 2009

Distributed Application Management - End-to-End Management

When a problem occurs, the time it takes to identify the root cause and apply a remedy directly affects the overall service level of the application. Because modern application environments are built on a wide array of technologies, each of which may affect performance and availability in its own way, it is important to have a complete set of tools that can access the specific diagnostic information for each technology. If one relies on point solutions, troubleshooting invariably involves a significant degree of context switching: looking at multiple consoles and cutting and pasting information between them to get to the root cause, which delays resolution and increases stress for administrators. Point solutions also slow down diagnosis of the actual problem because they invite finger-pointing between different organizations. It is therefore highly desirable that these tools be integrated, providing a comprehensive view of the performance and availability of the applications and the underlying infrastructure, as well as the ability to rapidly diagnose problems when service levels are violated or close to being violated.

In troubleshooting, the first step is to isolate the components that may be causing the problem. This task can be greatly simplified through integrated configuration management capabilities and the dependency information stored in the CMDB, which helps narrow down the list of components that may be contributing to a problem. Once the components are identified, the change history stored in the CMDB can rapidly provide insight into why a previously working component began to malfunction. In the e-commerce application example, the IT staff could use the CMDB to identify the components associated with checking out, which would include the checkout logic, the application server and the database. They could then search the CMDB for changes made to any of those components to see whether the change in behavior can be attributed to a change in configuration.
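The two CMDB lookups described above (walk the dependencies, then filter the change history) can be sketched in a few lines of Python. This is only an illustration: the component names, the in-memory `DEPENDS_ON` and `CHANGES` structures, and the helper functions are all hypothetical and do not correspond to any particular CMDB product or API.

```python
from collections import deque
from datetime import datetime

# Hypothetical dependency data: each component maps to what it depends on.
DEPENDS_ON = {
    "checkout": ["checkout-logic"],
    "checkout-logic": ["app-server"],
    "app-server": ["database"],
    "database": [],
    "catalog-search": ["search-index"],
    "search-index": [],
}

# Hypothetical change records: (component, timestamp, description).
CHANGES = [
    ("database", datetime(2009, 9, 10, 22, 0), "added nightly batch job"),
    ("search-index", datetime(2009, 9, 11, 8, 0), "rebuilt index"),
    ("app-server", datetime(2009, 8, 1, 9, 0), "applied patch"),
]


def components_behind(service, depends_on):
    """Breadth-first walk: the service plus everything it depends on."""
    seen = [service]
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


def recent_changes(components, changes, since):
    """Changes to the given components at or after `since`, newest first."""
    wanted = set(components)
    hits = [c for c in changes if c[0] in wanted and c[1] >= since]
    return sorted(hits, key=lambda c: c[1], reverse=True)


# Step 1: isolate the components behind the failing function.
suspects = components_behind("checkout", DEPENDS_ON)
# Step 2: look for recent changes to any of them.
candidates = recent_changes(suspects, CHANGES, datetime(2009, 9, 1))

for component, when, what in candidates:
    print(f"{when:%Y-%m-%d %H:%M}  {component}: {what}")
```

With this sample data, the unrelated search index change is filtered out because the search index is not in the checkout dependency chain, and the old application server patch is filtered out by date, leaving only the database change to investigate.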

For many kinds of problems, administrators need access to historical data about multiple tiers of computing before the root cause can be identified. An integrated tool that can correlate end-user response times with middleware and database processing times can save administrators precious time and effort. From the recorded performance data, one can visualize the demands that were placed on an application at a given point in time, along with resource consumption and potential contention. In the e-commerce checkout example, the IT staff could retrieve performance data collected from the application, the web server, the application server, the database, and the operating system in order to visualize the behavior of the environment. They might discover that the database server was under very high load because of a competing batch workload, which slowed down checkout processing.
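As a rough illustration of what "correlating" tiers can mean, the sketch below ranks each tier's metric by how closely it tracks end-user response time, using a plain Pearson correlation. The metric names and sample values are invented for the example, and a real management tool would use far richer data and analysis; a high correlation only suggests where to look first and does not prove cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same six points in time.
SAMPLES = {
    "response_ms": [200, 210, 205, 900, 950, 220],
    "web_cpu_pct": [30, 32, 31, 33, 30, 31],
    "app_cpu_pct": [40, 42, 41, 43, 40, 42],
    "db_load": [1.0, 1.1, 1.0, 7.5, 8.0, 1.2],
}


def rank_tiers(samples, target="response_ms"):
    """Rank every other metric by strength of correlation with the target."""
    scores = {
        name: pearson(series, samples[target])
        for name, series in samples.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranking = rank_tiers(SAMPLES)
for name, score in ranking:
    print(f"{name:12s} {score:+.2f}")
```

In this made-up data set the database load rises and falls with the response time while the web and application server CPU stay flat, so the database comes out at the top of the ranking, matching the batch workload scenario described above.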

Another technology that is useful for troubleshooting performance problems is transaction tracing. In modern distributed application environments, processing a request frequently involves multiple components, which may not even run on the same server machine. Using transaction tracing tools designed specifically for the type of application being analyzed, one can follow these requests through their processing and find out how much time is spent at each step in order to identify bottlenecks. Recall the management-aware discussion? If your application platform and management tools actually understand each other, you will get better and more timely information, and you will make better decisions. Ask your vendor to demonstrate the depth of the management tool's ability to learn about your application and database infrastructure. In the e-commerce application example, the IT staff could look up the trace information collected for checkout operations, starting at the application server mid-tier level, and discover that most of the time was spent in the database. Using tools designed specifically for troubleshooting databases, administrators could then drill down into the database and analyze the SQL statements to look for ways to optimize them, such as rewriting the statements or adjusting table indexes.
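To make the idea of "time spent at each step" concrete, here is a minimal sketch that takes the spans of one traced request and works out how much time each component spent on its own work, excluding the time it spent waiting on the components it called. The span layout (`id`, `parent`, `component`, `start`, `end`) and the timings are hypothetical and not the format of any particular tracing tool, and the calculation assumes that the calls made by a component run one after another rather than in parallel.

```python
from collections import defaultdict

# Hypothetical trace of a single checkout request; times are in milliseconds.
TRACE = [
    {"id": 1, "parent": None, "component": "web-server", "start": 0, "end": 1000},
    {"id": 2, "parent": 1, "component": "app-server", "start": 20, "end": 980},
    {"id": 3, "parent": 2, "component": "database", "start": 100, "end": 500},
    {"id": 4, "parent": 2, "component": "database", "start": 520, "end": 950},
]


def self_time_by_component(spans):
    """Sum each component's own time: span duration minus its child spans."""
    child_time = defaultdict(int)
    for span in spans:
        if span["parent"] is not None:
            child_time[span["parent"]] += span["end"] - span["start"]

    totals = defaultdict(int)
    for span in spans:
        duration = span["end"] - span["start"]
        totals[span["component"]] += duration - child_time[span["id"]]
    return dict(totals)


def bottleneck(spans):
    """Component with the largest total self time in the trace."""
    totals = self_time_by_component(spans)
    return max(totals, key=totals.get)


totals = self_time_by_component(TRACE)
for component, ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{component:12s} {ms:5d} ms")
print("bottleneck:", bottleneck(TRACE))
```

For this sample trace the request takes 1000 ms end to end, of which 830 ms is spent in the two database calls, 130 ms in the application server and 40 ms in the web server, which is the kind of breakdown that would send the administrators to the SQL statements next.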

Picture: Transaction Diagnostic for Siebel CRM