thoughts on business, service and technology operations and management

Category — Guest Authors

Guest SME Author Abbas Haider Ali - Burning questions: myCMDB

Abbas Haider Ali has recently joined the Managed Objects team as their new VP of Product Strategy. We’ve been communicating over the past few months, and while he doesn’t bring “direct BSM” experience, his vision for shaping and guiding Managed Objects is likely to bring a fresh perspective to a well-established company and to the overall BSM marketplace.

I’ve been after some technical practitioners at Managed Objects to participate as guest SME authors on my blog for some time now. They’ve recently launched their own blog, but I hope someone from their team will still participate here. While Abbas’ initial post is a bit more “sales/marketing” in feel than I prefer, he provides some good information to “clear the air” on why the launch of the myCMDB application is unique.

Feel free to join in the conversation and ask Abbas and the Managed Objects team anything! I know they’ll respond!

* For a more interesting debate on the myCMDB topic, feel free to visit the ITSkeptic’s lively debate here or Charles Araujo’s post here.
———-

In late June, Managed Objects announced myCMDB and the reception to date has been far-reaching and overwhelmingly positive. We’ve also received a few follow up questions which we’ll do our best to answer here. Thanks for the opportunity, Doug!

What exactly is myCMDB?

myCMDB is an application which incorporates structured social-networking and community principles to fulfill the promises of the CMDB to IT organizations:

• Improves CMDB data accuracy and accessibility – ensuring the CMDB provides a complete picture about the IT infrastructure
• Introduces the next generation of CMDB interaction – incorporating Web 2.0 principles for greater CMDB usability and personalization
• Delivers new analytics that provide better IT decision support
• Incorporates fast and intuitive search to make CMDB data retrieval easier
• Expands usage of the CMDB to a broad set of IT and business users
• Supports a complete range of control over how the CMDB is updated – from Wikipedia-style complete freedom – to approval based change control

Allow me to underscore the word “application.” myCMDB is designed to sit on top of an existing CMDB, be that our own product, CMDB360, a home-grown CMDB or another product such as BMC’s Atrium.

Why did you build myCMDB?

Quite simply, our customers (and those of our competitors) asked us to develop myCMDB. Okay, perhaps not in those exact words, but over the course of two years our research and experience have taught us that the biggest barriers to a successful CMDB implementation are the accuracy, currency, and usability of the data, especially across federated sources.

Federation allows IT organizations to jumpstart their CMDB projects by taking advantage of the data they already have in areas such as asset management, discovery tools, dependency mapping solutions, element managers, etc. Managed Objects’ CMDB360 has been highly successful in the market because it delivers on these capabilities. Customers have been telling us, however, that federating data isn’t enough: it gives them a 70-80% solution, and the rest is in people’s heads.

The key elements that are missing include an easy way to populate the missing information (which isn’t stored in any uniform fashion), such as relationships between IT elements, applications, and business services. In addition, the goal of a CMDB project is rarely to build a repository of information just for the sake of having it; rather, it should deliver real value to all the parties who contribute to it. Extracting and finding information in a CMDB typically requires a deep understanding of its underlying structure and schema, which is a real barrier to increasing adoption. We built myCMDB to address these challenges in the CMDB market.

Isn’t social-networking more about fun than work?

It seems like just yesterday enterprises were tackling the same question about instant messaging: businesses viewed IM as more pleasure than business. Today, I can hardly scope out IBM/Tivoli’s Web site without having a friendly customer service representative offering me assistance via chat. In the end, IM has earned a well-deserved reputation as a tool for connecting with customers and collaborating.

Managed Objects believes that the same is true for social-networking – but allow me the luxury of emphasizing the word “structure.” Structure can be likened to governance in that we clearly anticipate the need to set policies governing who has permission to view what data, or more importantly, to make changes to data.

We foresee the myCMDB world being divided into two broad camps: producers and consumers. At a high level, producers are the domain experts – the people who maintain the trusted federated sources, such as the network managers or application owners. Consumers might range from an executive audience viewing their service – to the change manager conducting an impact analysis – to the administrative assistant trying to get a definitive list of serial numbers for every laptop in his or her division without sending out yet another unit-wide e-mail with a spreadsheet requesting an update.
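The governance described above, with producers who own data and consumers who mostly read it, plus the range of control from Wikipedia-style freedom to approval-based change control, can be pictured as a per-CI update policy. The following is a minimal sketch under assumed names (`Policy`, `ConfigurationItem`, `propose` and `approve` are all hypothetical, not part of myCMDB or CMDB360). In this sketch owners always edit directly, while a non-owner's edit is applied, queued, or rejected depending on the policy.

```python
from enum import Enum


class Policy(Enum):
    OPEN = "open"          # Wikipedia-style: any user may edit directly
    OWNERS = "owners"      # only owners (producers) may edit; others view
    APPROVAL = "approval"  # non-owner edits are queued for owner approval


class ConfigurationItem:
    """Hypothetical CI with a per-item update policy (not the myCMDB API)."""

    def __init__(self, name, owners, policy):
        self.name = name
        self.owners = set(owners)
        self.policy = policy
        self.attributes = {}
        self.pending = []  # (user, key, value) edits awaiting approval

    def propose(self, user, key, value):
        """Apply, queue, or reject an edit according to the CI's policy."""
        if self.policy is Policy.OPEN or user in self.owners:
            self.attributes[key] = value
            return "applied"
        if self.policy is Policy.APPROVAL:
            self.pending.append((user, key, value))
            return "pending"
        return "rejected"  # Policy.OWNERS and the user is a consumer

    def approve(self, approver, index=0):
        """Let an owner accept a queued edit."""
        if approver not in self.owners:
            raise PermissionError(f"{approver} cannot approve edits to {self.name}")
        _user, key, value = self.pending.pop(index)
        self.attributes[key] = value


laptop = ConfigurationItem("laptop-4711", owners={"asset-mgmt"}, policy=Policy.APPROVAL)
print(laptop.propose("admin-assistant", "location", "Dallas, floor 3"))
laptop.approve("asset-mgmt")
print(laptop.attributes)
```

Keeping the policy on each CI, rather than globally, is what allows a loosely governed laptop inventory and a tightly controlled core router to coexist in the same system.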

How specifically will social-networking techniques be applied?

We’ve taken pages from the playbooks of several social-networking sites, and while I’ve described these individually below, in reality many of them are interwoven in myCMDB functionality along with the appropriate security and controls. Here are just a few examples:

Facebook: The community experience afforded by Facebook permits participants to create groups relevant to their area of responsibility. Communities in myCMDB can be based on function, such as databases or networks, or on geography, such as the datacenter in Dallas.

LinkedIn: When a member of your LinkedIn network makes a change to their profile, you are automatically notified of that change. myCMDB has adopted this feature for notification, and it can be adapted for workflow or approval processes.
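LinkedIn-style change notification is essentially publish/subscribe on configuration items. A minimal sketch, assuming a hypothetical in-memory `CIStore` (not a myCMDB API): users follow a CI, and whenever someone else changes its attributes, each follower receives a notification listing the old and new values. Routing that notification into a workflow or approval step would hang off the same hook.

```python
from collections import defaultdict


class CIStore:
    """Hypothetical in-memory CI store with follow/notify semantics."""

    def __init__(self):
        self.cis = defaultdict(dict)       # ci_id -> attribute dict
        self.followers = defaultdict(set)  # ci_id -> users following it
        self.inbox = defaultdict(list)     # user -> received notifications

    def follow(self, user, ci_id):
        self.followers[ci_id].add(user)

    def update(self, ci_id, editor, **changes):
        """Apply changes and notify followers (except the editor) of real diffs."""
        ci = self.cis[ci_id]
        diff = {k: (ci.get(k), v) for k, v in changes.items() if ci.get(k) != v}
        ci.update(changes)
        if diff:
            for user in self.followers[ci_id] - {editor}:
                self.inbox[user].append((ci_id, editor, diff))
        return diff


store = CIStore()
store.follow("alice", "db-dallas-01")
store.update("db-dallas-01", "bob", owner="dba-team", version="11g")
print(store.inbox["alice"])
```

Notifying only on attributes whose value actually changed keeps a re-sync from a federated source from flooding followers with no-op updates.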

Wikipedia: Allowing experts to fill in, tag, correct, update, or delete the information they possess results in an overall complete article or, in the case of the CMDB, a complete configuration item (CI). Owners and authorized parties can update information while others can view or link to it.

Del.icio.us: Social bookmarking techniques enable a community of myCMDB users to share relevant information more quickly.

Google Finance: One powerful feature of Google Finance is the ability to correlate events, such as earnings calls or scandals, with the price of a stock over time. We’ve applied this feature to CIs to provide a historical view that correlates changes with their impact.
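The Google Finance-style overlay amounts to lining up change events and impact events on one timeline. A minimal sketch, assuming changes and incidents are available as simple `(timestamp, ci_id, description)` tuples and using an arbitrary two-hour lookback window (both assumptions, not myCMDB details): for each incident, list the changes made to the same CI shortly before it.

```python
from datetime import datetime, timedelta


def changes_before_incidents(changes, incidents, window=timedelta(hours=2)):
    """Pair each incident with changes to the same CI in the preceding window.

    changes and incidents are lists of (timestamp, ci_id, description) tuples.
    """
    report = []
    for inc_time, ci_id, inc_desc in incidents:
        suspects = [
            desc
            for chg_time, chg_ci, desc in changes
            if chg_ci == ci_id and inc_time - window <= chg_time <= inc_time
        ]
        report.append((inc_time, ci_id, inc_desc, suspects))
    return report


changes = [
    (datetime(2008, 7, 1, 9, 0), "app-srv-3", "JVM heap resized"),
    (datetime(2008, 7, 1, 13, 30), "app-srv-3", "patch applied"),
]
incidents = [(datetime(2008, 7, 1, 14, 15), "app-srv-3", "response time SLA breached")]

for row in changes_before_incidents(changes, incidents):
    print(row)
```

This matches on the same CI only; following CI relationships (for example, from a database to the services that depend on it) would be the natural extension.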

Why is Managed Objects uniquely positioned to deliver this solution?

To understand the significance of the myCMDB application, it’s important to understand how differently Managed Objects approaches CMDB projects with our CMDB360 product.

As noted in this Data Center Journal article, Managed Objects views CMDBs as a component of BSM, and to that end, our customers and experience tell us a CMDB should not be a big hunky centralized database, but it should provide a centralized service view. “Some CMDB products require IT to extract, transform and load (ETL) data from their existing tools into a centralized database, but the challenge here is multi-faceted: 1) it’s expensive, 2) the minute you extract the data, its accuracy becomes dated, and 3) the amount of data being jammed into a centralized repository quickly becomes unmanageable. A better solution is the federated approach we advocate, with application programming interface (API) level integration that “points” to the original source in real time or near-real time.”
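The contrast in the quote, copying data into a central store versus pointing at the original source, can be shown in a few lines. This is a minimal sketch with hypothetical source adapters (plain Python callables standing in for API calls to an asset database and a discovery tool); it is not Managed Objects’ integration or reconciliation engine. Because the view is assembled at read time, a change in the source of record shows up on the next read without any reload, avoiding the staleness an ETL copy suffers from the moment it is extracted.

```python
from typing import Callable, Dict, Tuple

# A source adapter is a hypothetical function that queries one system of
# record (asset DB, discovery tool, element manager) for a CI at read time.
SourceAdapter = Callable[[str], Dict[str, object]]


class FederatedCI:
    """CI view assembled on demand from federated sources; nothing is copied."""

    def __init__(self, ci_id: str, sources: Dict[str, SourceAdapter]):
        self.ci_id = ci_id
        self.sources = sources

    def view(self) -> Dict[str, Tuple[object, str]]:
        """Return attribute -> (value, source name), fetched fresh on every call."""
        merged: Dict[str, Tuple[object, str]] = {}
        for source_name, fetch in self.sources.items():
            for key, value in fetch(self.ci_id).items():
                # Simplification: a later source overrides an earlier one. A real
                # reconciliation engine would apply per-attribute precedence rules.
                merged[key] = (value, source_name)
        return merged


asset_db = {"srv-42": {"serial": "SN-1001", "owner": "finance-it"}}
discovery = {"srv-42": {"ip": "10.0.4.42", "os": "RHEL 5"}}

ci = FederatedCI("srv-42", {
    "asset": lambda cid: asset_db.get(cid, {}),
    "discovery": lambda cid: discovery.get(cid, {}),
})
print(ci.view())
discovery["srv-42"]["ip"] = "10.0.4.99"  # the system of record changes...
print(ci.view()["ip"])                   # ...and the next read reflects it, no reload
```

Recording which source each value came from is a small design choice that pays off later: it tells a user whom to ask (or which tool to fix) when a value looks wrong.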

Managed Objects has built a reputation over the last decade as a master of integration: not just SNMP integration, but bi-directional, API-level integration with a powerful engine to reconcile, synchronize and automatically model nearly any federated source or existing IT management tool. This sets us apart from our competitors, and this agnostic approach, based on open and modern technology, gave us the opportunity to embrace social-networking techniques to solve a real business problem. Managed Objects just might be the first company to do so.


Abbas Haider Ali is the vice president of Product Strategy at Managed Objects.

myCMDB Historical Timeline

** An interesting note: Managed Objects is using the killer MIT Simile Timeline code. I told the IBM Tivoli BSM PMs over a year ago that this was something they needed to include in our BSM products, with no response!

July 10, 2008   4 Comments

What Problems Can Real Time Analytics Solutions Solve?

In my last post, I discussed the issues facing IT Operations today that make real time analytics-based solutions a “must have”. In this post, I’d like to address more specifically the problems they can solve. Since I want to give enough detail on the problems and how real time analytics can help, I’ll spread this over a couple of posts.

“Too many alerts that don’t help me solve problems”

One of the biggest issues facing Operations teams is that when performance problems occur with mission-critical applications, they have to expend a massive amount of manual effort to identify and repair them. They sift through the endless stream of alerts from their siloed monitoring solutions and try to correlate them by hand based on tribal knowledge. Much of this effort stems from static threshold-based monitoring solutions, which produce alert storms for perfectly normal behavior while masking the abnormalities that are the earliest precursors to problems.

Real time analytics-based solutions solve this by obviating the need for static thresholds. Instead, these solutions learn the normal behavior of every metric being collected. Armed with this understanding of normal, they alert only on the abnormal behaviors that are the true precursors to problems, so there is no longer a need to sift through alert storms to determine which alerts are relevant to the current problem and which are not. Sophisticated dynamic thresholding algorithms learn normal behavior down to the most granular level possible using clustering calculations. Algorithmic sophistication is critical because different metrics behave very differently and cannot be modeled with a single algorithm or an assumed distribution. These solutions must also have mechanisms to handle seasonal events that may differ greatly from normal behavior but do not indicate a real problem. Without this level of sophistication, large numbers of false positives result, as was seen in early forays into dynamic thresholding that assumed normally distributed data (which IT metric data rarely is). My company’s solution, Integrien Alive, can import large amounts of historical data very quickly and use it to calculate normal behavior immediately, removing the need for a learning period. Alive also performs topology-based rollup of alerts to produce fewer total alerts with better context. The bottom line is that dynamic threshold-based alerting eliminates a ton of manual effort in problem solving.
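To make “learning normal” concrete, here is a minimal sketch of one simple dynamic threshold. It is not Integrien Alive’s algorithm (which the post describes as clustering-based). It assumes history arrives as `(timestamp, value)` pairs, buckets them by hour of the week so daily and weekly cycles each get their own baseline, and uses empirical percentiles (1st and 99th here, an arbitrary choice) instead of assuming a normal distribution. Class and parameter names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


class DynamicThreshold:
    """Per-hour-of-week 'normal band' for one metric, learned from history."""

    def __init__(self, low_q=1.0, high_q=99.0):
        self.low_q, self.high_q = low_q, high_q
        self.bands = {}  # bucket -> (low, high)

    @staticmethod
    def bucket(ts):
        # 0..167: one baseline per hour of the week captures daily/weekly cycles
        return ts.weekday() * 24 + ts.hour

    def fit(self, history):
        """history: iterable of (timestamp, value) pairs, e.g. a bulk import."""
        by_bucket = defaultdict(list)
        for ts, value in history:
            by_bucket[self.bucket(ts)].append(value)
        for b, values in by_bucket.items():
            values.sort()
            self.bands[b] = (percentile(values, self.low_q),
                             percentile(values, self.high_q))
        return self

    def is_abnormal(self, ts, value):
        band = self.bands.get(self.bucket(ts))
        if band is None:
            return False  # no history for this hour yet, so stay silent
        low, high = band
        return value < low or value > high


start = datetime(2008, 1, 7)  # a Monday
history = []
for week in range(8):
    history.append((start + timedelta(weeks=week, hours=9), 40 + 2.5 * week))         # busy Monday 09:00
    history.append((start + timedelta(weeks=week, days=6, hours=3), 2 + 0.4 * week))  # quiet Sunday 03:00

model = DynamicThreshold().fit(history)
print(model.is_abnormal(datetime(2008, 3, 3, 9), 50))  # Monday 09:00
print(model.is_abnormal(datetime(2008, 3, 9, 3), 50))  # Sunday 03:00
```

The same reading of 50 is normal at the Monday-morning peak but abnormal in the quiet early-Sunday hour, which a single static threshold cannot express. Fitting on bulk-imported history, as `fit` does here, is what lets a model like this skip a live learning period.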

“We don’t understand what leads to problems so we are always reactive”

Because IT Operations teams rely on human correlation of alerts after a problem occurs, they are always in a reactive state. In some cases they may have tribal knowledge of the behavior patterns of certain problems that gives them a heads-up; however, even in these cases it is most often too late to do anything before the problem occurs. Some Operations teams attempt to capture their tribal knowledge in correlation rules. These rules may help for a while; however, as the business or infrastructure changes, they soon become obsolete, resulting in more manual effort to maintain them. The underlying problem is the sheer number of devices and metrics being managed in today’s infrastructures. There is no way a human can correlate hundreds of thousands (even millions) of metrics from tens of thousands of devices. This is another problem that real time analytics solutions are built to solve.

Armed with the knowledge of normal, these solutions can correlate previous alert and metric behaviors and predict future abnormal behaviors based on currently observed abnormalities. For example, the solution can alert on a problem in the application server tier based on the number of devices in that tier that are performing abnormally. The Level 3 application expert receiving this alert is also given predictions of future abnormal behavior. In this case, the alert indicates that a key database performance indicator will be breached in 15 minutes or less with 86% probability, bringing down the database. The Level 3 application expert forwards this information to the DBA, who makes a quick configuration change that avoids the database crash. This type of automated correlation allows a proactive approach to problem solving that isn’t possible with manual methods.
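The two ideas in this example, a tier-level alert driven by how many devices are abnormal and a probability attached to a predicted breach, can be sketched as follows. Everything here is an assumption for illustration: the 30% tier threshold, the 15-minute horizon, and especially the estimator, which is just the historical fraction of tier alerts that were followed by a breach. A production system would use a far richer model than this plain frequency count.

```python
from datetime import datetime, timedelta


def tier_alert(device_states, min_fraction=0.3):
    """Raise a tier-level alert when enough devices in the tier are abnormal.

    device_states maps device name -> True if currently abnormal.
    """
    abnormal = sum(device_states.values())
    return abnormal / len(device_states) >= min_fraction


def breach_probability(past_alerts, past_breaches, horizon=timedelta(minutes=15)):
    """Fraction of past tier alerts followed by a KPI breach within the horizon."""
    if not past_alerts:
        return 0.0
    followed = sum(
        any(alert <= breach <= alert + horizon for breach in past_breaches)
        for alert in past_alerts
    )
    return followed / len(past_alerts)


app_tier = {"app1": True, "app2": True, "app3": False, "app4": False, "app5": True}
t0 = datetime(2008, 3, 1)
alerts = [t0 + timedelta(days=d) for d in range(5)]
breaches = [a + timedelta(minutes=10) for a in alerts[:4]]  # 4 of 5 alerts preceded a breach

if tier_alert(app_tier):
    p = breach_probability(alerts, breaches)
    print(f"App tier abnormal; DB KPI breach within 15 min: {p:.0%} (historical)")
```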

My company’s Integrien Alive solution takes correlation one step further, allowing users to set key performance indicators at the business, user experience, or IT infrastructure level. When these key indicators are breached, Alive captures a model of the building pattern of abnormalities that led to the problem, up to an hour before it occurred. These problem models (called Problem Fingerprints™) focus troubleshooting efforts and reduce Mean Time To Identify (MTTI) and Mean Time To Repair (MTTR), even the first time the problem occurs, by indicating exactly which tiers of the application (and which specific metrics) are performing abnormally. Once these models have been captured, Alive can scan real-time metric data for a recurrence of that pattern. If a problem pattern defined in one of the Problem Fingerprints is matched with high enough probability, Alive sends a predictive alert informing the Operations team of the looming problem, the probability that it will occur, when it is likely to occur, what to look for, and how it was solved previously.
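A simplified version of the fingerprint idea: when a KPI is breached, record which (tier, metric) pairs went abnormal in the preceding hour; later, score how much of that pattern is currently present. Integrien’s actual Problem Fingerprints™ model is not described in the post, so the set representation, the coverage score, and the 0.6 alert threshold below are all hypothetical stand-ins.

```python
from datetime import datetime, timedelta


def capture_fingerprint(abnormal_events, breach_time, lookback=timedelta(hours=1)):
    """Record which (tier, metric) pairs went abnormal before a KPI breach.

    abnormal_events is a list of (timestamp, tier, metric) tuples.
    """
    return frozenset(
        (tier, metric)
        for ts, tier, metric in abnormal_events
        if breach_time - lookback <= ts <= breach_time
    )


def match_score(fingerprint, current_abnormal):
    """Fraction of the fingerprint's abnormalities that are present right now."""
    if not fingerprint:
        return 0.0
    return len(fingerprint & current_abnormal) / len(fingerprint)


breach = datetime(2008, 3, 1, 14, 0)
events = [
    (breach - timedelta(minutes=50), "web", "queue_depth"),
    (breach - timedelta(minutes=30), "app", "gc_pause_ms"),
    (breach - timedelta(minutes=10), "db", "lock_waits"),
    (breach - timedelta(hours=3), "web", "cpu_pct"),  # outside the lookback window
]
fp = capture_fingerprint(events, breach)

now_abnormal = {("web", "queue_depth"), ("app", "gc_pause_ms")}
score = match_score(fp, now_abnormal)
if score >= 0.6:
    print(f"Predictive alert: {score:.0%} of a known problem pattern is under way")
```

Because the fingerprint records tiers and metrics rather than raw values, a partial match also tells the responder where to look next (here, the database lock waits that have not yet appeared).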

As we’ve seen in this discussion, real time analytics solutions are all about:

  • large reductions in the manual effort associated with static threshold-based alerting
  • increased focus for troubleshooting efforts to reduce MTTI/MTTR
  • predictive alerting to allow proactive performance management

In my next post we’ll discuss additional problems solved by real time analytics. We’ll also delve into BSM and how these solutions are an essential catalyst to achieving it.

April 3, 2008   1 Comment

Why Real Time Analytics?

First of all, I’d like to thank Doug for inviting me to the conversation. I’m really looking forward to discussing how solutions such as Integrien’s will be an essential part of the future of Business Service Management. We’ll look at what problems these solutions address, how they do what they do and real life implementation and operationalization issues. Now to the topic at hand! Why Real Time Analytics? Why now? 

IT Operations executives are facing a dichotomous situation today. On one hand, their infrastructures are growing rapidly. I was recently talking with the Senior Director of Operations at a large social networking site, and he told me that his server infrastructure (at the time 15K servers) was growing at 6% a week! They are also dealing with increasing complexity due to new technologies like virtualization and SOA. Look at virtualization alone. While it can provide the tremendous cost benefits of server consolidation, it considerably increases the complexity of the management problem. You now have to deal with the hypervisor, virtual machines and guest operating systems as problem sources, in addition to the physical servers, O/S and applications. Let’s not even get into the issue of dynamically moving VMs…

On the other hand, Operations executives are being asked to reduce their budgets or at the very least keep them flat. Historically, growing infrastructure and complexity have been handled by throwing more bodies at the problem. Hmmm… if 70% or more of the Operations budget is currently labor spend and you have to reduce it or keep it flat, how can you scale to meet the needs of the business?

There is an obvious parallel with what happened in manufacturing 30+ years ago with the advent of Total Quality Management (TQM) practices. Increasing complexity and scale in the manufacturing process was making it impossible to keep up with the quality and just-in-time delivery demands imposed by customers. Deterministic, rules-based methods of analyzing manufacturing lines that relied on trying to identify and measure every variable to stop problems from slipping by were no longer working. Some manufacturers (such as Toyota) started to take a different approach using advanced probability and statistics, collecting a subset of variables and looking at possible outcomes to get more proactive in approaching problems. The manufacturers that adopted these techniques gained a competitive advantage that persists to this day.

IT Operations is in a similar situation today. Alerting based on static monitoring thresholds, and collecting more metric data at faster intervals, simply hasn’t provided a proactive approach to business service performance and availability issues. It also requires massive manual effort from IT staff to process the alert storms and perform manual or rule-based correlation to solve problems. Consider as well that in a system with thousands of servers and hundreds of thousands, even millions, of metrics, the correlation problem is humanly unsolvable. Manual efforts can no longer scale in the face of increasing infrastructure and complexity.
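To put “humanly unsolvable” in numbers, count only the distinct pairs of metrics someone would have to compare, ignoring time lags and multi-metric patterns entirely. With n metrics there are n(n-1)/2 pairs, so a million metrics already yields roughly half a trillion pairs. The function name is illustrative.

```python
def pairwise_correlations(n_metrics: int) -> int:
    """Number of distinct metric pairs: n choose 2 = n(n-1)/2."""
    return n_metrics * (n_metrics - 1) // 2


for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} metrics -> {pairwise_correlations(n):,} pairs")
```

The count grows with the square of the metric count, so each jump in infrastructure size makes manual or rule-based correlation fall further behind.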

That is why real time analytics-based solutions that leverage existing monitoring infrastructures are a necessity in today’s environment. I’ll go into the requirements for these solutions in a future post; however, the basic premise is to use advanced statistics and probability to learn normal system behaviors, alert only on abnormal behaviors that are true precursors to problems, and apply advanced correlation techniques to predict future abnormalities and specific performance and availability problems. This new approach will be a competitive advantage for the companies that adopt it now, just as similar approaches were for early adopters in manufacturing years earlier.

February 27, 2008   5 Comments

SME Guest Author: Steve Henning

One of the goals I have for this blog is to complement my thoughts and views with those of other like-minded people. I started this blog over two years ago with an invitation to others within the IBM Tivoli Business Service Management (BSM) team to contribute, but they have never been able to commit the five or ten minutes needed to share their thoughts. I have always envisioned that the comments folks leave would be a significant part of this blog’s “spirit.” I think we’ve had some good conversations via the comment threads recently, but I’m always looking for more.

With that, I’m introducing what I hope will be the first of many SME Guest Authors for my blog. My goal here is simple: to enable others to share their thoughts on business, service and technology operations and management. I’ve laid out some “ground rules” for these SME Guest Authors and set them free. Free to discuss how emerging technologies and products align with the goals and objectives of Business Service Management. Free to talk about how practitioners can be truly successful. Free to offer practical implementation knowledge and insight. Free to make the sales and marketing slicks “come to life” and become something believable, implementable and manageable over the lifecycle (and free from the sales and marketing hype you all deal with weekly).

I’m introducing Steve Henning. Steve is currently the VP of Product for an exciting company called Integrien. While I don’t know as much as I’d like about their company, technology or products, I do know that what they bring to the market could dramatically improve the status quo within the typical operations environment. I believe capabilities such as theirs are desperately needed within any maturing Business Service Management solution and will play a key role in the next generation of Business Service Management solutions. I’ve invited Steve to share his thoughts and ideas on the technology and capabilities in this market segment. I expect that the conversations will be straight up, down to earth, transparent and honest, and something we can all learn from. Please welcome Steve to the conversation!

February 27, 2008   No Comments