thoughts on business, service and technology operations and management
Category — Design Patterns

WYNTK on TBSM Design Patterns: Architectural Model for Composite Applications Follow Up

As I was catching up on feed reading today, I came across an article I originally read back in August that can be useful in decomposing custom composite applications and developing representative TBSM service models (see WYNTK on TBSM Design Patterns for COTS and Custom Composite Applications). Brandon Satrom is an Enterprise Architect who is involved with developing enterprise architectures and custom composite applications. He proposes a Composite Application Framework (CAF), which you can review on his blog postings here and here.

Why is this important? You can review his work and references to gain insight into why custom composite applications are built and why they provide value to the business. With this, decomposing them into respective components, layers and tiers may become easier within your environment. From there, you can then see what kinds of visibility you have into those areas in terms of performance, availability, user experience, capacity, reliability and most importantly how all of these things may impact the business in meeting their goals and objectives.

January 16, 2008   1 Comment

All I want for the new year is a BSM Profile for ITM 6.x

One of the foundations of Business Service Management (BSM) is to see things from the business perspective. One of the best ways to get there is to instrument for BSM at the source. The majority of server agent deployments for HW/OS monitoring and COTS application agents (DB, AppServer, etc.) are deployed in a pure out of the box (OOB) manner. Most clients will take what their vendor offers up within these OOB configurations as their own “best practices” and leave it at that. Some may take the time to modify thresholds in an effort to control the signal to noise ratio in the event console. This practice generally leads to a “needle in a haystack” approach for event management and a bad reputation for the tool in use for monitoring.

There will always be a need for the “best practice” OOB monitoring offered by vendors in their solutions. The primary audience is the NOC, EOC and support organizations. They speak this language and have been programmed to know what to do when the “95% CPU Utilization” event comes in. The BSM Profile concept is to help organizations move beyond this to focus on how these lower level things may impact the business services, applications and activities that help the business meet its goals and objectives.

I want to see things change. I want to see the ability to configure monitoring with a purpose that’s above and beyond the OOB configurations. The establishment of a BSM Profile for managing the server HW/OS and deployed applications enables visibility into what that server or application really exists for - supporting the business.

The ideal BSM Profile for ITM 6.x would include the following:

  1. Custom instrumentation into very specific business services, applications and transactions ALIGNED to the LoB, services, applications and key business activities that they enable (and impact).
  2. Custom free form text fields that enable creation of a specific and unique message relative to the above.
  3. Mapping into custom event fields/slots of the ideal BSM Event Format.
  4. Generation of purpose built events, KPI/KPM or other data for the purposes of driving a service model or dashboard.
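
As a rough illustration of items 2 through 4, a BSM-oriented event might carry business context alongside the usual technical fields. The sketch below is plain Python, not an ITM or EIF configuration, and every field name (`line_of_business`, `business_service`, `bsm_message` and so on) is a hypothetical placeholder rather than a real ITM attribute or OMNIbus column.

```python
from dataclasses import dataclass, asdict


@dataclass
class BsmEvent:
    """Hypothetical BSM event; field names are illustrative, not product slots."""
    node: str
    severity: str
    summary: str            # the technical, out-of-the-box message
    line_of_business: str   # who is affected
    business_service: str   # which service the component supports
    application: str
    business_activity: str
    bsm_message: str = ""   # free-form text aimed at the business audience


def to_slots(event: BsmEvent) -> dict:
    """Flatten the event into name/value pairs, dropping empty fields."""
    return {name: value for name, value in asdict(event).items() if value}


event = BsmEvent(
    node="dbsrv01",
    severity="critical",
    summary="Tablespace ORDERS is 98% full",
    line_of_business="Retail",
    business_service="Online Ordering",
    application="Order Entry",
    business_activity="Submit order",
    bsm_message="Order submission may fail for Retail customers",
)
slots = to_slots(event)
```

Keeping the technical `summary` and the business-facing `bsm_message` as separate fields lets the NOC keep its familiar message while a service model or dashboard consumes the business context.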

In an effort to collaborate on how to create such a BSM Profile, DevCampTivoli has been created. The theme for this event is “Collaborative Development of End-to-End BSM Solutions”. The desired outcome is to come up with various approaches for developing a BSM Profile for ITM 6.x, along with the necessary configurations within the Tivoli EIF probe, Netcool/OMNIbus and TBSM 4.x, that can be easily customized and implemented at any client. Whatever DevCampTivoli produces will be freely available to anyone to take, modify and use to improve their BSM deployments.

Take a few minutes to visit DevCampTivoli. This event will be held May 17-18, 2008, which is the weekend before the annual IBM Tivoli User Conference, Pulse 2008, in Orlando, FL. The thought and hope is that SMEs and practitioners in ITM, Netcool/OMNIbus and TBSM will already be coming to Pulse 2008 and will be able to come in a couple of days earlier to participate.

More to follow…

January 16, 2008   4 Comments

WYNTK on TBSM: TBSM Design Pattern - Architectural Model (COTS and Custom Composite Applications)

The purpose of this TBSM Design Pattern is to create models representative of the key relationships and dependencies within COTS and Custom Composite applications. The implementation of this design pattern should provide a foundation for visibility into the applications by operations support teams, SMEs, developers and IT management. The level of detail expected in models of this type is generally abstracted from executives and the line of business in favor of higher level model design patterns.

One of the “holy grails” of Business Service Management is to have an understanding of the cause and effect relationships between the IT environment, the business services and applications provided, and their actual use and/or performance in meeting the business’s goals and objectives. Clients are beginning to deploy synthetic and real user monitoring solutions to help provide this insight. IT and the LoB are reporting weekly how well they’re performing and meeting business goals and objectives. This design pattern lays a foundation for being able to relate fine grained IT components to business service and application user experience and performance, and for understanding how one may impact the other. Without this foundation, we will continue to force operations and support teams to seek “the needle in the haystack” when the business reports problems within key business services and applications.

The first goal of modeling within this design pattern is to understand the architectural and implementation characteristics that applications, databases and custom composite applications can take. This is the nuts and bolts; the model that represents deployment onto the servers, the networks and into the datacenters. It’s how load balancers, virtualization, redundancy, clustering and high availability come into play in supporting applications, databases and custom composite applications. For example, an Oracle deployment may be in an Active-Passive, Active-Active (Cluster) or Oracle Grid architecture, deployed on one or more servers within one or more datacenters. It’s one thing to model within this area, but most clients struggle with knowing the state or status of the architecture. Clients must develop an approach for understanding what architecture components are active, online, in standby, load sharing, etc. through improved instrumentation.
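
One way to think about the state question above is to track each cluster member’s role explicitly and derive an architecture-level status from it. The following is a minimal Python sketch under stated assumptions: the state names, the two modes and the good/marginal/bad thresholds are all illustrative choices, not TBSM rules or anything a database vendor defines.

```python
from enum import Enum


class MemberState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    OFFLINE = "offline"


def cluster_status(members: dict, mode: str) -> str:
    """Derive an overall status from per-member states.

    mode is "active-passive" or "active-active"; thresholds are assumptions.
    """
    states = list(members.values())
    active = states.count(MemberState.ACTIVE)
    standby = states.count(MemberState.STANDBY)
    offline = states.count(MemberState.OFFLINE)

    if active == 0:
        return "bad"  # nothing is serving requests
    if mode == "active-passive":
        # Serving, but full health needs a failover target in standby.
        return "good" if standby >= 1 else "marginal"
    if mode == "active-active":
        # Serving, but any lost member reduces capacity.
        return "good" if offline == 0 else "marginal"
    raise ValueError(f"unknown mode: {mode}")


pair = {"dbsrv01": MemberState.ACTIVE, "dbsrv02": MemberState.OFFLINE}
degraded = cluster_status(pair, "active-passive")
```

The point of the sketch is that the same member failure means different things depending on the architecture, which is why the model needs instrumentation that reports role and state, not just up/down.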

The second and most important goal is to continue to improve the level of visibility and understanding of how these components operate, perform and ultimately their role in supporting business services and applications. How is the business service impacted when an application or database has a problem? What happens when one member of the database cluster fails? This doesn’t come from your vendor but from your SMEs, support groups, developers, and the business. The path to this point may be achieved with a simple model or require a fairly sophisticated model depending on your environment. Follow-on TBSM design patterns will focus on interacting within this area.

The first challenge a client often has is that their monitoring tools group has not matured its fundamental management and monitoring to the point of having enough visibility into these areas. Clients continue to battle with keeping up with the basics of hardware and operating system changes, new versions, etc. Most clients have invested in some fundamental hardware and operating system monitoring capability or are making use of custom scripts, system logging or similar. They may have set up some basic logfile or process/daemon monitoring, but not yet invested in capabilities that enable deep visibility into applications. More often than not, it’s the SME groups (SysAdmins, Application Support or DBAs) that have the best visibility into these areas. Do not accept “they have their tools and we have our tools” as the answer. There is tremendous value in integrating SME tooling, scripts, etc. into the collective management and monitoring environment, even if it’s only for the purposes of driving these models.

More and more, COTS applications and databases include some capability for internal instrumentation and visibility into state, status, availability and performance characteristics. This is usually enabled through some additional configuration option, module, via an administrative console or a complete stand alone application (Oracle Enterprise Manager). Third party vendors and open source solutions may exist for management and monitoring of these applications or databases. Think agents for things like Exchange, SAP, DB2 or Oracle (IBM Tivoli, Quest Software, etc.) here or specialty applications that tend to fall into the advanced diagnostics and performance tuning areas (Quest Spotlight on Oracle).

Custom composite applications are going to expand the usage of COTS technology to include custom software, application servers, integration techniques (SOA, EAI, etc.), and other distributed and mainframe systems. The same approaches described above should be followed, with more emphasis on working with the various development and support SMEs.

Creating an accurate model is the easy part. Bringing it to life requires data points, metrics or other information that can be used to determine state, status, performance, availability and impact. Modeling requires working with the various SME groups to fully understand what has been deployed, its operating characteristics and how it supports the business services and applications.

Once a good understanding has been achieved, the monitoring tools group will need to perform a gap analysis on what they have and don’t have to be able to represent the model in a production TBSM deployment. Do yourselves a favor here and partner up with the SMEs. Work through the politics and find a way to integrate their information and insight into the core management, monitoring and event collection solutions. Identify the gaps you have in your core tools and visibility, then identify a plan to fill those gaps so your BSM solutions can provide the expected value.

I generally see a couple of different approaches for building out these Architectural Models, and they tend to play into the two scenarios above. Most often, due to the weaknesses in fundamental instrumentation and monitoring, clients will simply create a template for the server and a template for the COTS application. Each template will generally include the expected “up/down” type status rules looking for incoming events. The COTS application template may include a few more rules to cover more specifics about the application, but more often than not they are broad based status rules looking for process, daemon and log file messages.

There’s nothing wrong with this approach at all, but getting to fine grained visibility is difficult with it. Do you need to know that a specific transaction, process or table space out of dozens or hundreds is the cause and that it’s only impacting a portion of your business service, or is knowing that you have an application or database problem good enough? My operations background says that it’s always best to help folks get to the specific problem as quickly as possible and to not generalize things.

The preferred approach is to break things out into more specific components of the architecture deployed. This approach DEPENDS on instrumentation and visibility into all of these areas. This approach DEPENDS on your ability to generate unique events, metrics, KPIs, etc. that can be directly associated to these areas. This approach gives you the most flexibility to tie discrete components into the specific business service and application areas that they may most directly support, enable or impact.
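
To make the “portion of your business service” idea concrete, here is a small Python sketch that maps fine grained components to the business functions they support and reports which functions a failure touches. The component names, function names and the `SUPPORTS` mapping are invented for illustration; in practice this knowledge would come from your SMEs and the business.

```python
# Hypothetical mapping of fine grained components to the business functions
# they support. All names are made up for the example.
SUPPORTS = {
    "tablespace:ORDERS": {"Order Entry"},
    "tablespace:CATALOG": {"Product Search", "Order Entry"},
    "process:payment_gateway": {"Checkout"},
}

ALL_FUNCTIONS = {"Order Entry", "Product Search", "Checkout"}


def impacted_functions(failed_components):
    """Return the set of business functions touched by the failed components."""
    impacted = set()
    for component in failed_components:
        # Components with no known mapping contribute nothing.
        impacted |= SUPPORTS.get(component, set())
    return impacted


def impact_summary(failed_components):
    """Split the business functions into impacted and unaffected lists."""
    impacted = impacted_functions(failed_components)
    return {
        "impacted": sorted(impacted),
        "unaffected": sorted(ALL_FUNCTIONS - impacted),
    }


summary = impact_summary(["tablespace:ORDERS"])
```

With only a generic “database problem” event, all three functions would have to be treated as suspect; with a component-specific event, only Order Entry is flagged.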

Focus on simple templates for the COTS application/database or custom composite application PLUS simple templates for the supporting infrastructure server(s), application/database core components and marry them together based on the behaviors of the application as it relates to the underlying infrastructure components. This is the key here. This is the only way to manage each uniquely and as a whole based on how the overall architecture was designed and implemented. If you can get this right, you will be able to achieve the powerful business service and application models needed for Business Service Management plus the ability to put the right information into the hands of the operations and support staff so they can identify and resolve the problems faster.

Upcoming TBSM Design Patterns will focus on the service, functional/sub-service, and process/transactional design patterns.

**NOTE to Readers: Would examples of these within TBSM or Visio be helpful?**

January 14, 2008   9 Comments

DevCamp Tivoli - Collaborative Development of End-to-End BSM Solutions

Business Service Management (BSM) requires some level of visibility and insight into the core networks, systems, applications, transactions and processes happening across the IT environment. This visibility and insight requires some contextual understanding of how those things support and enable the key business services, applications, transactions, processes and activities that are critical to the business meeting their goals and objectives. The more emphasis on this contextual understanding we can establish directly from the source systems and applications, the easier and more efficient that operations, event and business service management can be in upstream solutions.

My findings and overall assumption are that most fundamental Tivoli monitoring is implemented in such a way that it’s only enabling the SME groups (SysAdmins, EOC/NOC, etc.) to identify, triage and resolve low level problems. I posed a series of questions to the only two Tivoli Monitoring gurus that I know about to try and gauge what could be done to better equip Tivoli Monitoring clients to implement fundamental system HW/OS and application/database monitoring so that it enables a client to implement true BSM solutions upstream. My intent from this dialog was to start a new series of blog postings called “The Top 10 Things an ITM Client Can/Should Do to Enable BSM and How to Implement Them”.

John “The Uber ITM Guru” Willis bit and we had breakfast to discuss. John’s got a lot of great ideas on what can and should be done from the ITM perspective. He mentioned a few of his clients that really get it and what they’ve done in the past to get there. We talked about the realities of client deployments today, politics, and keeping up with constant changes and releases in the products and IT environment. Apparently the game really changed from ITM 5.x to ITM 6.x, and things really need to be thought of in a different way, making use of the Universal Agent. John’s answers continued to amaze me because of the level of effort it sounded like it would take to do something as simple as this.

I kept coming back to a couple simple scenarios:

  • How can I get something as simple as the operating system name/version, server location, datacenter rack/row embedded in every event coming from an ITM agent?
  • How can I get the business service/application that this server/application/database supports embedded in every event coming from an ITM agent?
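
Conceptually, both scenarios are a lookup-and-enrich step applied to each event. The Python sketch below shows only that idea and says nothing about how ITM 6.x, the Universal Agent or the EIF probe would actually do it; the `SERVER_CONTEXT` table, its keys and its values are hypothetical stand-ins for whatever inventory or CMDB source a client has.

```python
# Hypothetical context lookup, e.g. loaded from an inventory export.
# All names and values are made up for the example.
SERVER_CONTEXT = {
    "appsrv07": {
        "os": "AIX 5.3",
        "location": "Orlando DC1",
        "rack": "R12",
        "row": "C",
        "business_service": "Claims Processing",
    },
}


def enrich(event: dict, context: dict = SERVER_CONTEXT) -> dict:
    """Return a copy of the event with context fields added for its node.

    Unknown nodes are flagged rather than silently passed through, so gaps
    in the lookup data stay visible.
    """
    enriched = dict(event)
    details = context.get(event.get("node", ""))
    if details is None:
        enriched["context_missing"] = True
        return enriched
    for key, value in details.items():
        enriched.setdefault(key, value)  # never overwrite what the agent sent
    return enriched


raw = {"node": "appsrv07", "summary": "CPU utilization 95%"}
enriched = enrich(raw)
```

Whether the enrichment happens at the agent, in the probe rules or in the event server is a design decision; doing it as close to the source as possible is the approach argued for in this post.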

We’d been discussing collaboration in the community via wikis, blogs, mailing lists, etc. for some time now. We landed on the idea of a scenario-based collaboration event focused on how one could solve real world problems using Tivoli products within the Business Service Management space. Something straight from the experts and practitioners out there. Something that shows what can/should be done from end-to-end using ITM 6.2 (and its dependencies) and TBSM 4.1.1 (and its dependencies) to create real world BSM solutions that any Tivoli client could implement.

Introducing DevCamp Tivoli. Our thoughts are that we’d meet before next year’s annual Tivoli Technical User Conference (TTUC), now called IBM Pulse. The conference is planned for May 18-22, so we’re targeting having this DevCampTivoli on Saturday, May 17th. We’re betting on SMEs and practitioners being able to fly in early for the conference they may already be attending and being able to participate in this event. Whatever the outcome of the DevCampTivoli is, we’d like to present that during a BoF session during the conference and on the OPAL site for everyone’s benefit. Listen to my first podcast ever on this topic with John here. Read over John’s blog posting announcing the event here. Visit the DevCampTivoli website and sign up!

More to come on this as we noodle through the concepts. Visit the site and sign up if you’d like to help out. We’re certainly interested in your input towards scenarios and development approaches within ITM 6.2, Tivoli EIF Probe, Netcool/OMNIbus and TBSM 4.1.1.

November 27, 2007   3 Comments

What You Need to Know on Tivoli Business Service Manager: TBSM Design Patterns Pt. 2

A pure Architectural Model (Infrastructure) isn’t sustainable if your goal is to implement true value oriented business service management solutions. One of the key things missing from this design pattern is the contextual information required to understand the impact any of the technology buckets has on other technology buckets, services, applications, IT or the business. We also don’t have any understanding of the role the technology buckets or their contents play in service delivery. In this design pattern, one server is just as important and “faceless” as another. When a server turns red, we simply don’t know whether it’s the most critical server in the ERP application or just a backup file server.

Maturing from the Architecture Model (Infrastructure) requires a few things. First, we MUST ensure that we have a solid foundation in the basics of infrastructure management and monitoring. We must know when an infrastructure component is up, down, performing as expected or not, at capacity or over capacity, operating with a fault or error condition, etc. This is fundamental “monitoring 101” here. If it blinks green in the datacenter or somewhere in the network, branch office, or elsewhere, you need to know the fundamentals. To mature beyond this technology bucketing approach, the separation of function and purpose from the core component must happen. The application or other component installed upon a hardware/operating system platform must be as visible as the hardware and operating system. If you can’t do this, you may want to think about your overall strategy, roadmap and approach towards business service management.

*soapbox*

This is something that you’ve got to define within your company and religiously ensure that your standards and expectations are followed. It continues to amaze me that most clients still struggle in this area. My guess is that better than 75% of clients I’ve worked with or heard about in the past few years don’t have a solid handle on these “monitoring 101” fundamentals across their environments. I think this is partly due to the constant change with technology, but we as vendors have equal if not more blame here with our constant updates, new releases, new acquisitions, etc. This does more harm than anything, as the monitoring tool groups are thinly staffed these days and are often seen as low in the “value chain”. Simply put, most vendors don’t make it easy for our clients to get ahead and stay ahead of the other changes in the datacenter.

*soapbox*

As mentioned in the first design patterns post, most clients are starting out with models using broad based rules that match any event by node name. They may create a technology bucket consisting of “Application Servers” and have hardware/operating system and application server events (and others) all contributing to the status of that instance. A better approach to this is to have a template for the hardware/operating system (by type, version, etc.) and a template for the application server (by type, version, etc.). The application server “depends on” the hardware/operating system and is modeled using child dependency rules or other “loose coupling” concepts. I now have immediate separation, understand the relationships and can easily create separate service models, dashboards and scorecards for the hardware/operating system group and application server group as needed (horizontal service model) in addition to the traditional (top down/bottom up service model). SLAs, reporting, root cause, right click tools and integrations are other things much easier to do with this approach.
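
The separation described above can be sketched outside of TBSM in a few lines of Python. This is only a conceptual model: the `Template` and `ServiceInstance` classes, the event type names and the “worst status wins” propagation are assumptions made for the example, not TBSM’s actual template or dependency rule implementation.

```python
from dataclasses import dataclass, field

SEVERITY = {"good": 0, "marginal": 1, "bad": 2}


@dataclass
class Template:
    name: str
    rules: dict  # maps an event type to the status it implies


@dataclass
class ServiceInstance:
    name: str
    template: Template
    children: list = field(default_factory=list)  # instances this one depends on
    events: list = field(default_factory=list)    # event types received so far

    def own_status(self) -> str:
        """Status from this instance's own template rules only."""
        statuses = [self.template.rules[e] for e in self.events
                    if e in self.template.rules]
        return max(statuses, key=SEVERITY.get, default="good")

    def status(self) -> str:
        """Worst of this instance's own status and its children's statuses."""
        candidates = [self.own_status()] + [c.status() for c in self.children]
        return max(candidates, key=SEVERITY.get)


os_template = Template("unix_os", {"host_down": "bad", "fs_full": "marginal"})
app_template = Template("app_server", {"jvm_heap_exhausted": "bad"})

host = ServiceInstance("appsrv07-os", os_template)
app = ServiceInstance("appsrv07-appserver", app_template, children=[host])

host.events.append("fs_full")
```

After the `fs_full` event, the application server’s own status is still good while its overall status is marginal because of the dependency, so the OS group and the application group can each see their own picture and the relationship between them.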

Each template contains the SPECIFIC rules that MOST DIRECTLY indicate impact or potential impact to the hardware/operating system or application/component installed onto the hardware/operating system. This is where a direct working relationship with the hardware/operating system and application/component SMEs is most beneficial for modeling behaviors on templates. Take the time to document what instrumentation is available from the monitoring tools group and the hardware/operating system and application/component deployment/support groups. I can almost guarantee that they have their own instrumentation and monitoring tools in place. Know what events you have access to, what causes them, what clears them and what they really impact under all scenarios (normal operation, low load, high load, when direct or indirect dependencies have problems, etc.). Leverage the instrumentation and events that the SMEs recommend and get woken up in the middle of the night for. Don’t use instrumentation or events such as disk space or other utilization events UNLESS you’re sure it will impact the operation of the application or component installed on the hardware/operating system platform. Iterate and test here until you get it right. Controlling false positives and false negatives begins here!

Using separate templates may not be something you see the need for at this time. The benefits of this approach will be seen as I discuss the additional TBSM Design Patterns. Not to leave you hanging on forever, this design approach enables the greatest flexibility and openness for the future. It will enable “loosely coupled” service models and a high degree of reuse required for modeling end-to-end services and complex composite applications. It will help you to avoid “the big ugly all encompassing model” that many clients struggle with today that’s always red and provides little value. It will allow you to accurately model virtually anything within your IT and business environment in the most realistic manner possible. You should start thinking about how to develop these building blocks that will give you a much finer level of visibility and control in building end-to-end service models for business service management down the road.

Up next, Architectural Models for COTS Applications/Databases and Custom Composite Applications.

October 31, 2007   1 Comment