Thursday, May 15, 2008

Demystifying Siebel Application Response Measurement

Siebel Application Response Measurement (SARM) is a performance-tracing framework that was originally introduced in Siebel 7.5. Even though the technology has existed for almost five years, it seems there are still some misconceptions about its design and intended use. Since I was the original product manager for SARM, I guess I can try to offer some explanations.

Myth #1 – SARM is Siebel ARM

Back when Siebel was an independent company, our strategy for providing Siebel management tools was to instrument the Siebel platform and work with 3rd party ISVs to adapt their tools to work with Siebel. As part of this strategy, we thought it would be a good thing to try to comply with industry standards such as Application Response Measurement (ARM), so that tools that support ARM could be used to monitor and diagnose Siebel performance. Therefore, it is possible to consume SARM data by using an ARM-compliant tool.

However, strictly speaking, SARM is not an implementation of ARM. The problem with standards is that they often have to sacrifice capabilities for compatibility and provide the lowest common denominator solution. We found that ARM, specifically ARM 2.0, was not rich enough to capture Siebel-specific performance data. As a result, we built SARM to capture a superset of the information and pass a subset of that to the ARM API. Specifically, contextual information such as the names of the Siebel UI views, business components, workflow processes and scripts is not passed through the ARM API, which makes it difficult to tell from the ARM data alone what goes on while a transaction request is processed.

In other words, to fully take advantage of the rich information captured by SARM, you need a tool that processes the native SARM data stream.

Myth #2 – SARM has high overhead

The driver behind SARM was the need for a way to identify transaction request performance bottlenecks, especially for interactive user workload. It used to be rather straightforward to do this in the Siebel 2000 (version 6) days, as Siebel applications were deployed with 2-tier client/server topologies, with direct connections from clients to the database. In Siebel 7, the topology became truly multi-tiered, and with database connection pooling, there was no deterministic way to tie a database transaction to the user request. SARM was intended to be the remedy by providing a way to trace transaction requests throughout the Siebel mid-tier.

As a performance management tool, the last thing we needed was for SARM to introduce more performance problems. Consequently, we were obsessed with squeezing every last bit of performance out of the tool and making its overhead as low as possible. This was achieved through several means:
- Record timing information while doing as little secondary processing as possible in real-time
- Use highly optimized buffered I/O to persist performance data
- Provide various throttling mechanisms to control the amount of SARM data captured
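
To make these three ideas concrete, here is a minimal sketch in Python. It is not SARM's actual code or file format: the class name `BufferedTimerLog`, the record layout, and the `sample_every` and `flush_at` parameters are all hypothetical, chosen only to show cheap recording, batched writes, and a simple throttle working together.

```python
import io
import struct
import time


class BufferedTimerLog:
    """Hypothetical low-overhead timing recorder (illustration only, not SARM)."""

    # Assumed fixed-size binary record: area id, start time (ns), duration (ns).
    RECORD = struct.Struct("<IQQ")

    def __init__(self, sink, sample_every=1, flush_at=64):
        self.sink = sink                  # any binary file-like object
        self.sample_every = sample_every  # throttle: keep 1 of every N timings
        self.flush_at = flush_at          # records held in memory before a write
        self._pending = []
        self._seen = 0

    def record(self, area_id, start_ns, end_ns):
        """Store one timing; return True if it was kept, False if throttled."""
        self._seen += 1
        if self._seen % self.sample_every:
            return False                  # skipped by the throttle
        # Only raw numbers are stored here; analysis is left for later.
        self._pending.append((area_id, start_ns, end_ns - start_ns))
        if len(self._pending) >= self.flush_at:
            self.flush()
        return True

    def flush(self):
        """Write all pending records to the sink in a single call."""
        self.sink.write(b"".join(self.RECORD.pack(*r) for r in self._pending))
        self._pending.clear()


# Usage: time four operations, keep every second one, write in batches of two.
sink = io.BytesIO()
log = BufferedTimerLog(sink, sample_every=2, flush_at=2)
for _ in range(4):
    t0 = time.perf_counter_ns()
    log.record(area_id=1, start_ns=t0, end_ns=time.perf_counter_ns())
log.flush()
```

The point of the design is that the hot path does nothing but append a tuple, and the cost of I/O is paid once per batch rather than once per measurement.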

Prior to releasing SARM, we ran SARM through numerous load-testing scenarios. For example, in the Call Center 1 load tests, which simulated hundreds of simultaneous users running against a single Siebel app server, we observed SARM overhead to be less than 3%, well within our product performance requirement. We thought this was a reasonable cost to realize the benefit of having good management data for optimizing the application.

Myth #3 – SARM is only for production diagnostics

While a lot of the initial discussions about SARM were for performance diagnostics, we have always intended SARM to be a framework that supports the full set of application performance lifecycle activities.

SARM really is just a set of timers that measure the timing of transaction requests, as well as the timing at various points in the “call graph” of the Siebel software stack that processes the requests. SARM doesn’t care whether the timing came from actual user operations while the application is live or from activities generated by pre-production load tests. In production, the data that it captures can be used for day-to-day monitoring, for diagnostics, and for longer-range capacity management.
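
The "set of timers in a call graph" idea can be illustrated with a small Python sketch. This is my own illustration under stated assumptions, not how SARM is implemented: `CallGraphTimer`, the `area` method, and the layer names used below are hypothetical.

```python
import time
from contextlib import contextmanager


class CallGraphTimer:
    """Hypothetical nested timer (illustration only, not SARM's implementation)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.records = []   # (depth, name, elapsed), appended as each area exits
        self._depth = 0

    @contextmanager
    def area(self, name):
        """Time one region of code; nesting depth reflects the call graph."""
        depth = self._depth
        self._depth += 1
        start = self.clock()
        try:
            yield
        finally:
            self._depth = depth
            self.records.append((depth, name, self.clock() - start))


# Usage: one request passing through two assumed layers of the stack.
timer = CallGraphTimer()
with timer.area("request"):
    with timer.area("business_component"):
        time.sleep(0.001)
    with timer.area("database"):
        time.sleep(0.001)

for depth, name, elapsed in timer.records:
    print("  " * depth + f"{name}: {elapsed * 1000:.2f} ms")
```

Nothing in the timer knows whether the request came from a live user or a load-test script, which is why the same data can serve testing, monitoring, diagnostics and capacity planning.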
