Building the Right Event Foundation for BSM

by doug on February 1, 2006

in Best Practices, Business Service Management, Complex Events, E2E Service Management, Event Driven Architecture, Event Processing, Events, Implementation

Whether you’re following a route, path, or some other plan to find value with Business Service Management, you’ve got to start someplace. I am an “always begin with the end in mind” type of guy, so I often start from the desired end state and work backwards from there to map out how to get there.

A key component existing within nearly every organization regardless of industry, vertical, country or otherwise is the notion of an ‘event’. Some may call this a message, indicator, notification or something similar.

Events simply communicate a message to something. The message communicated can be good, bad or indifferent, true or false, on or off, verbose or terse. The something on the receiving end of an event can be a human, a system, or another application, piece of code or similar logic. If you’d like to dig into the nitty-gritty details, look into the programming discipline called event-driven programming.

There are many points of view from which we can have a discussion on events. I’d like to focus initially on the importance of creating the right events with the right data (message) that can be used for building a foundation for BSM (among other very important IT Operations things).

Sources

If the goal of BSM is to align IT to the business, we must identify all of the sources capable of enabling this. Nearly every IT organization has some form of network, server, application or other technology monitoring and management tool. These may be homegrown, open source or commercial off-the-shelf (COTS) tools. Inventory what you have available that can provide information about all components in the organization. Don’t forget to venture outside the organizational silo you may be in. I guarantee that other groups have something valuable they can contribute.

A few of the common sources include the network (router, switch, LAN/WAN, etc.), security (firewall, VPN, etc.), server (hardware, operating system, storage, etc.), and application (database, webserver, appserver, etc.). Sources you may not immediately think of are your change management application and your help desk application. It’s extremely valuable to generate events that track each change request through its lifecycle, for any IT infrastructure component.

One of the key sources that may not yet exist in your IT organization is one that simply relates the detailed information from each source to the functional role or purpose that it plays. This could be a simple asset inventory database or a complete configuration management database (CMDB). This source ideally captures information about every single component within the IT organization and what role it plays in delivering IT and business services.

Perspectives

The idea here is that we need to collect from sources that represent all of the different perspectives. This could be the customer’s view, business view, third-party view, application view, service view, inside/outside-the-datacenter view, etc. A good example is a CRM solution. The business unit sees it as a portal with all of its interfaces and tools in one place, while the IT group sees it as a dozen servers, databases, application servers and network components. Each group has a different perspective on what the CRM solution is. As we identify our event sources, be cognizant of these different perspectives and identify events that represent them.

Raw Events

Everything generated from any of your sources should be treated as a raw event. A raw event is simply one that communicates a single piece of data, which is usually very precise and specific to the context of whatever generated it. For example, a server is capable of generating events about its CPU utilization. The event generated when CPU utilization exceeds the desired threshold communicates a discrete data point about the CPU on that server. By default, it doesn’t communicate how that condition impacts other things, such as the application that runs on that server or the key business process enabled by that application. Raw events require further processing as they move northbound to increase their value and usefulness.
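A raw event of this kind might be modeled roughly as follows. This is a minimal sketch, assuming a hypothetical `RawEvent` structure and a hypothetical 90% CPU threshold; note that nothing in the event says which application or business process the server supports.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RawEvent:
    """Hypothetical raw event: one precise data point, in the context of its emitter."""
    node: str            # host that generated the event
    summary: str         # short human-readable description
    metric: str          # what was measured
    value: float         # the measured value
    timestamp: datetime  # when the measurement crossed the threshold


def check_cpu(node: str, cpu_percent: float, threshold: float = 90.0) -> Optional[RawEvent]:
    """Emit a raw event only when CPU utilization exceeds the (assumed) threshold."""
    if cpu_percent <= threshold:
        return None
    return RawEvent(
        node=node,
        summary="CPU Utilization High",
        metric="cpu_percent",
        value=cpu_percent,
        timestamp=datetime.now(timezone.utc),
    )
```

The event knows only the node, the metric and the value; everything else has to be added further northbound.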

Normalization

The first step in processing raw events into more useful events should include passing them through some sort of normalization routine. The goal here is to develop a uniform event structure where every event, regardless of source or perspective, contains the same fundamental makeup. You will no doubt quickly determine that every event source you identified has some variation in the quality and quantity of events it provides.
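One way to picture a uniform event structure is a per-source field map that translates each tool’s own field names and severity vocabulary into a single common shape. The sketch below is illustrative only: the two source names (`network_monitor`, `server_agent`), their field names and their severity codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NormalizedEvent:
    """Common structure every event shares, regardless of source."""
    node: str
    summary: str
    severity: str
    source: str


# Hypothetical per-source field mappings: common field -> source-specific key.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "network_monitor": {"node": "host", "summary": "msg", "severity": "sev"},
    "server_agent": {"node": "hostname", "summary": "text", "severity": "level"},
}

# Hypothetical severity translations so every source uses one vocabulary.
SEVERITY_MAP: Dict[str, Dict[Any, str]] = {
    "network_monitor": {"crit": "critical", "warn": "warning", "info": "informational"},
    "server_agent": {5: "critical", 3: "warning", 1: "informational"},
}


def normalize(source: str, raw: Dict[str, Any]) -> NormalizedEvent:
    """Translate one source-specific raw event into the common structure."""
    fields = FIELD_MAPS[source]
    raw_severity = raw[fields["severity"]]
    return NormalizedEvent(
        node=raw[fields["node"]],
        summary=raw[fields["summary"]],
        # Unmapped severities are flagged rather than silently guessed.
        severity=SEVERITY_MAP[source].get(raw_severity, "unknown"),
        source=source,
    )
```

Keeping the mappings as data rather than code makes it easier to add a new source once its quirks are understood.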

Here’s a simple scenario. Let’s assume that you have a useful and functional naming scheme for the components within your datacenter (such as those recommended by Sun Microsystems). Let’s say that you name a server ERP-APP-01.dmz.mycompany.com. This name would obviously be present in the event as the source or node name that generated it, but on its own it may not be very useful in the long run. One approach is to have my normalization routine (which is mapped to my event structure) parse out key information and add it to the more useful event. I know from my naming standard that “ERP” is the functional application that runs on this server, so this gets mapped into a previously empty field in the event reserved for the application type. The next section of the server name is “APP”, which refers to the type of server in our environment; in this case, the server is an application server. This gets mapped into a field reserved for server function or type. You get the picture.

Node/Source: ERP-APP-01.dmz.mycompany.com | Summary: CPU Utilization High

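becomes

Node/Source: ERP-APP-01.dmz.mycompany.com | Application Type: ERP | Server Type: Application Server | Summary: CPU Utilization High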

The event now communicates much more useful information. For instance, I know my Problem Management team will want to investigate how many incidents have occurred in our environment involving application servers, and the new server type field lets them do exactly that.
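The hostname parsing described above could be sketched like this, assuming the APPLICATION-ROLE-NN.zone.domain pattern from the example. Only the “APP” role code comes from the scenario; the other role codes and the field names `application_type` and `server_type` are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical lookup for the server-role segment of the hostname.
# Only "APP" comes from the example; the other codes are placeholders.
ROLE_CODES: Dict[str, str] = {
    "APP": "Application Server",
    "DB": "Database Server",
    "WEB": "Web Server",
}


def parse_node_name(node: str) -> Dict[str, str]:
    """Split APPLICATION-ROLE-NN.zone.domain into fields for the normalized event.

    Names that don't follow the convention leave the fields empty
    rather than guessing.
    """
    host = node.split(".", 1)[0]  # e.g. "ERP-APP-01"
    parts = host.split("-")
    if len(parts) != 3:
        return {"application_type": "", "server_type": ""}
    app, role, _instance = parts
    return {
        "application_type": app.upper(),
        "server_type": ROLE_CODES.get(role.upper(), ""),
    }


def normalize_event(node: str, summary: str) -> Dict[str, str]:
    """Build a normalized event by adding the parsed fields to the raw ones."""
    event = {"node": node, "summary": summary}
    event.update(parse_node_name(node))
    return event
```

Because this routine relies only on the static naming standard, it belongs in normalization rather than enrichment.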

You should put considerable thought and planning into what your normalization rules and routines will be. This is where you should really think out into the future with the end in mind. What do you want to use these events for? What process, workflow, or activity will your events enable? Incident Management? Problem Management? Historical Trend Reporting?

I will write on this topic extensively later and offer more detailed recommendations on event structure and normalization practices.

Enriched Events

I think there is a distinct difference between normalization and enrichment. Normalization as described above can generally take place with data and information you already have. The data and information used in the normalization process is generally static in nature and doesn’t change frequently.

Enrichment, on the other hand, uses a portion of a normalized event as a trigger to pull in additional information, a kind of “advanced dynamic normalization”. Enrichment only occurs on normalized events and should be reserved for cases where the event couldn’t be normalized with information that was already known or static in nature. Enrichment often involves some form of integration with other data sources, tools or interfaces. It may even require manual intervention.

An example of enrichment based on our scenario above could be as follows. We now know that a CPU Utilization event was generated by one of our application servers. The help desk application is the official repository for on-call support information for our IT organization. An enrichment routine has been developed that is triggered when an event is missing its on-call support group information. Because we know the Application Server group is responsible for this event, the enrichment routine fetches the on-call information for the Application Server group from the help desk application and adds it to the previously blank support fields.

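becomes

Node/Source: ERP-APP-01.dmz.mycompany.com | Application Type: ERP | Server Type: Application Server | Support Group: Application Server | On-Call: (current on-call contact from the help desk) | Summary: CPU Utilization High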

The enrichment routine is used to fetch information that changes frequently based on the type of event, time of day, etc. This information is dynamic in nature and is usually stored in an authoritative source location such as the help desk solution.
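The enrichment trigger might look something like the sketch below. It assumes a hypothetical `OWNING_GROUP` table and hypothetical field names, and it takes the help desk lookup as a plain function (`fetch_on_call`) rather than calling any real help desk product’s API.

```python
from typing import Callable, Dict

# Hypothetical static mapping from server type to the owning support group.
OWNING_GROUP: Dict[str, str] = {"Application Server": "Application Server Support"}


def enrich(event: Dict[str, str], fetch_on_call: Callable[[str], str]) -> Dict[str, str]:
    """Fill blank support fields from an authoritative, frequently changing source.

    `fetch_on_call` stands in for the help desk integration: it takes a
    support group name and returns whoever is on call right now.
    """
    if event.get("on_call"):
        # Support info already present: enrichment is not triggered.
        return event
    group = OWNING_GROUP.get(event.get("server_type", ""))
    if group is None:
        # Nothing to look up with; leave the event for manual handling.
        return event
    enriched = dict(event)  # copy so the normalized event stays untouched
    enriched["support_group"] = group
    enriched["on_call"] = fetch_on_call(group)
    return enriched
```

Injecting the lookup keeps the trigger logic separate from whichever system happens to be the authoritative source for on-call data.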

Messages

The goal of transforming raw events into normalized, enriched events is to aid in communicating the right message. That message may be very detailed information about a single specific component in the IT infrastructure, or it may be a summary covering all of the IT infrastructure components that support a complex application or service.

Remember that the goal of BSM is to clearly communicate how the business may be affected by events within the IT environment. At the very least, every event should clearly communicate which business or customer entity, application, service, process, etc. it most directly affects. Keep the following bigger picture in mind. It illustrates the alignment BSM aims for, expressed so that even the most junior systems administrator or help desk technician can clearly understand it:

The CPU Utilization High event caused the Application Server to slow down. The Application Server is a key component of the SAP ERP application. The SAP ERP application hosts the Customer Billing application for MyCompany.Com. Because of the slowdown in the Customer Billing application, the nightly transaction job failed to complete, causing a $1M loss in revenue recognition for the month.

To summarize, the CPU Utilization High event on this server was directly responsible for a $1M loss to the business for the month. Events and messages should communicate the impact to the business in terms that everyone understands, convey the severity of the problem and prompt the appropriate actions.
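Tracing an event from a component up to the business services it supports is essentially a walk over the kind of relationships a CMDB records. The sketch below uses a breadth-first search over a hypothetical `SUPPORTS` table built from the chain in the example; it identifies what is affected but does not attempt to compute dollar figures.

```python
from collections import deque
from typing import Dict, List

# Hypothetical "supports" relationships, as a CMDB might record them.
SUPPORTS: Dict[str, List[str]] = {
    "ERP-APP-01.dmz.mycompany.com": ["SAP ERP"],
    "SAP ERP": ["Customer Billing"],
    "Customer Billing": ["Nightly Transaction Job"],
}


def impacted(component: str) -> List[str]:
    """Return everything that depends, directly or indirectly, on `component`, nearest first."""
    seen = {component}
    order: List[str] = []
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for parent in SUPPORTS.get(current, []):
            if parent not in seen:  # guard against cycles and shared dependencies
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def impact_message(component: str, summary: str) -> str:
    """Render a plain-language message naming what the event affects."""
    chain = impacted(component)
    if not chain:
        return f"{summary} on {component}: no known business impact"
    return f"{summary} on {component} affects: " + " -> ".join(chain)
```

For the example server, the message reads “CPU Utilization High on ERP-APP-01.dmz.mycompany.com affects: SAP ERP -> Customer Billing -> Nightly Transaction Job”.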

Topics for Future Discussion:

Event Construction
Normalization in Detail
Enumeration
Naming Schemas
Incident Classifications (Category, Type, Item, etc.)
Severity/Priority/Impact Classifications


John DelRe

We are in the process of undertaking a major BPM initiative. I am looking to develop a Service Management Communication Campaign. The campaign’s goal is to keep the Service Management Initiatives in the limelight by communicating successes, updates, next steps, etc. I would like to speak with you on this matter. Regards, John DelRe AVP Service Management, Bank Of New York 212-815-3384.

Dougie Stevenson

In its raw form, each event received acts as a trigger of sorts for crossing a state or condition boundary.

For example, you have a Web Server in your environment. In the normal, day to day operation, it takes in requests and posts pages back to clients, maintains connections to a database, and empowers both servlets and CGI programs to run on demand. Normally, everything is cool or in a normal state. When you get an event like High CPU utilization, that event becomes a trigger to illuminate a condition.

I also classify events as direct, observed, or implied. Direct events are those that come straight from the function. Observed events are generated from polled or passively captured data. Implied events are those derived specifically from other events.

Not every event is one you act on or even care about, but when I think of event management in these terms, it is easier for me to envision workflow, both care and feeding and incident/problem management.

doug

Thanks for commenting Dougie! You’re a huge part of my inspiration for taking event management concepts to a higher level!

Dougie’s work goes waaay back. I just tried to Google for your similar “You’ve Got Events” work but couldn’t find it. Please share a link to that if you know where it may be hiding on the net.

Doug
