thoughts on business, service and technology operations and management

Category — Monitoring

OpenNMS Replacing and/or Complementing Netcool/OMNIbus & Impact

The weekly source for hot IT management news and gossip is the IT Management Podcast hosted by Coté of RedMonk and John M Willis of Zabovo. This week’s episode featured OpenNMS’s Tarus Balog.

During the conversation on network management, Tarus dropped a few interesting tidbits about a couple of very large IBM Tivoli Netcool clients that were either moving off their existing architecture or complementing it with OpenNMS. One was a large telecommunications company in Italy (Telecom Italia?) and another a very large mobile telecommunications company in Switzerland named Swisscom.

This led to some discussions around product scalability, licensing models, etc. Tarus didn’t have any specifics to share other than one requirement for OpenNMS to handle event storms of 2K-3K per second. He said they’re working through architecture approaches to ensure that their backend databases can continue to scale in ways similar to Netcool/OMNIbus’s in-memory database.

Tarus also mentioned capabilities in OpenNMS on par with what Netcool/Impact offers. I believe he called them Automations. It’d be neat to hear more on this and if they’ll have a library of data source interfaces/integrations similar to Netcool/Impact.

Everything that Tarus and the OpenNMS team do is ultimately driven back into the main code tree for all to take advantage of. The OpenNMS DevCamp kicks off in a week or two, where the foundations for OpenNMS 2.0 will be worked on. This is taking place in my backyard down at GA Tech, if I recall correctly.

Congrats to the OpenNMS team for your entrance into the telco space with these clients. I also really want to learn more about your Papa John’s deployment and if I heard glimpses of Business Service Management (BSM) there or if you were just using that as an example!

July 18, 2008   3 Comments

What are the Top 5 critical things a DBA should care about?

For an Oracle or DB2 database? Thoughts? Reasons why?

What about the manager of a DBA group or other higher level manager? What DBA specific information should be aggregated to present at this higher level?

June 17, 2008   No Comments

Does a “Proactive/Predictive” Tool make for a “Proactive/Predictive” Organization?

Just some rambling thoughts here…feel free to join in.

Is another tool what’s really required here? What should/could be done in domain-specific resource monitoring solutions to address the problems at the edge? Should I really be monitoring everything that comes out of the box in a default configuration? Why do I have all of these profiles, situations, thresholds, events, etc. in the first place? Do I even know what I’m monitoring and why?

What if I have a multi-vendor, multi-sourced environment where I may or may not have visibility? What if I don’t have a CMDB or other source of topology, relationships and dependencies? What if I don’t even know the state and status of the applications, databases or services to begin with? What will I be able to do with investments into these technologies?

What if I have adopted a “manager of managers” concept where I have a consolidated operations eventing environment with feeds from across the entire business environment (facilities, plant, IT, datacenter, logistics, telephony, manufacturing, contact centers, etc.)? Shouldn’t this dynamic “learning” and “thresholding” concept be really applied at this level for some sort of “intelligent event management” free from manual intervention, policies, codebooks, etc? How about the context of the business calendar and schedule merged with the IT operations calendar and schedule? I doubt that this can all be “learned” magically.

If I invest in a BMC ProactiveNet, Netuitive or Integrien (or other fundamental dynamic “learning” or “trending” tool - my favorite was a company called Premonitia - now defunct, based on research from acoustic modelling of whales and shrimp IIRC), how will I recognize and measure the value from that investment? How should the operations environment change to adopt the promises of the “secret sauce” within these emerging technology areas? Will IT operations and second/third tier support teams need to change the ways they work today? If so, how? Does IT operations know how to respond to a future state that hasn’t occurred or someone stating that a service is “slow”? I think most operations and support teams are still in their infancy here.

I’m all for emerging technologies that speak towards making the lives of the folks on the front line better and for sensing, isolating and resolving issues within complex IT environments before they impact the business services, but will investing in these tools really improve the status quo within the typical operations environment? The Next Generation Operations Center, Command Center, Service Management Center or whatever we want to call it must be enabled with these types of technology, but its people must also be prepared to think, operate and respond differently than they do today.

How are you changing? Will you change? Where’s your value proposition? Is it at the front line, second/third line of the support process, at the LoB? Is it about efficiencies in workflow? Do more, with less? Automation? Availability? Becoming proactive? Do you know the real root causes prompting your interests in this technology? What are your vendors doing about it? What is your monitoring tools group doing about it? Should they be doing something different?

Please share your thoughts on how best to operationalize and really recognize value from your investments into these technologies or what you’re doing to address the real root causes of the symptoms this technology addresses.

June 3, 2008   13 Comments

Oracle Enterprise Manager?

This sounds interesting, and they’re making some technology acquisitions to beef it up. Is anyone using it? Is this something that’d fit into the traditional EMS/NMS tools group portfolio or is it something that’d only be in the DBA group? My guess is the latter, which generally means it’s a “SNIP” or Stand Alone Non-Integrated Product and not contributing information, data, events, etc. into the broader operations management and monitoring ecosphere.

William?

-snip-

About Oracle Enterprise Manager 10g

Spanning applications, middleware, and database management, Oracle Enterprise Manager 10g delivers integrated enterprise management for Oracle and non-Oracle systems to more than 21,000 customers worldwide. Employing a unique “top-down” approach to managing application and IT infrastructure resources, Oracle Enterprise Manager allows customers to focus on what matters for its business — greater agility, better service quality and lower operational costs. With a broad set of administration, configuration management, provisioning, end-to-end monitoring, service level management, and security capabilities, Oracle Enterprise Manager 10g helps customers manage service levels, proactively isolate business exceptions before they become emergencies, and remediate issues at any level spanning application and heterogeneous infrastructure — all within one management solution. Learn more at http://www.oracle.com/enterprisemanager.

April 12, 2008   6 Comments

Video of my BarCampESM - Where’s the Beef? Presentation

Thanks to whurley, I’m now “online” in video format. The presentation is available here.

Let me know what you think. Am I way out in left field here? Will an Open ESM player step up and “own” BSM in the SMB space or battle the “Big4”? I think it’s possible and it would certainly raise the bar for the incumbents!

January 28, 2008   4 Comments

WYNTK on TBSM: TBSM Design Pattern - Architectural Model (COTS and Custom Composite Applications)

The purpose of this TBSM Design Pattern is to create models representative of the key relationships and dependencies within COTS and Custom Composite applications. The implementation of this design pattern should provide a foundation for visibility into the applications by operations support teams, SMEs, developers and IT management. The level of detail expected in models of this type is generally abstracted from executives and the line of business in favor of higher level model design patterns.

One of the “holy grails” of Business Service Management is to have an understanding of the cause and effect relationships between the IT environment, the business services and applications provided and their actual use and/or performance to meet the business’s goals and objectives. Clients are beginning to deploy synthetic and real user monitoring solutions to help provide this insight. IT and the LoB are reporting weekly how well they’re performing and meeting business goals and objectives. This design pattern lays a foundation for relating fine-grained IT components to business service and application user experience and performance, and for understanding how one may impact the other. Without this foundation, we will continue to force operations and support teams to seek “the needle in the haystack” when the business reports problems within key business services and applications.

The first goal of modeling within this design pattern is to understand the architectural and implementation characteristics that applications, databases and custom composite applications can take. This is the nuts and bolts; the model that represents deployment onto the servers, the networks and into the datacenters. It’s how load balancers, virtualization, redundancy, clustering and high availability come into play in supporting applications, databases and custom composite applications. For example, an Oracle deployment may be in an Active-Passive, Active-Active (Cluster) or Oracle Grid architecture, deployed on one or more servers within one or more datacenters. It’s one thing to model within this area, but most clients struggle with knowing the state or status of the architecture. Clients must develop an approach for understanding what architecture components are active, online, in standby, load sharing, etc. through improved instrumentation.

The second and most important goal is to continue to improve the level of visibility and understanding of how these components operate, perform and ultimately their role in supporting business services and applications. How is the business service impacted when an application or database has a problem? What happens when one member of the database cluster fails? This doesn’t come from your vendor but from your SMEs, support groups, developers, and the business. The path to this point may be achieved with a simple model or require a fairly sophisticated model depending on your environment. Follow on TBSM design patterns will focus on interacting within this area.
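To make the "what happens when one member of the database cluster fails?" question concrete, here is a minimal sketch in Python. It is not TBSM rule syntax; the state names, the `min_active` parameter and the good/marginal/bad statuses are assumptions chosen for illustration, and the real answer for your environment has to come from your SMEs.

```python
from enum import Enum


class MemberState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    FAILED = "failed"


def cluster_status(members: dict[str, MemberState], min_active: int = 1) -> str:
    """Roll the states of individual members up into one status for the cluster."""
    active = sum(1 for state in members.values() if state is MemberState.ACTIVE)
    failed = sum(1 for state in members.values() if state is MemberState.FAILED)
    if active < min_active:
        return "bad"       # not enough active members left to serve the application
    if failed > 0:
        return "marginal"  # still serving, but redundancy is reduced
    return "good"


# Hypothetical two-node Active-Passive database pair.
oracle_pair = {"dbnode1": MemberState.ACTIVE, "dbnode2": MemberState.STANDBY}
print(cluster_status(oracle_pair))  # good

oracle_pair["dbnode2"] = MemberState.FAILED
print(cluster_status(oracle_pair))  # marginal
```

The point of the sketch is that the roll-up only works if instrumentation can tell you which members are active, in standby or failed in the first place.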

The first challenge a client often has is that their monitoring tools group has not matured its fundamental management and monitoring to the point of having enough visibility into these areas. Clients continue to battle with keeping up with the basics of hardware and operating system changes, new versions, etc. Most clients have invested in some fundamental hardware and operating system monitoring capability or are making use of custom scripts, system logging or similar. They may have set up some basic logfile or process/daemon monitoring, but not yet invested in capabilities that enable deep visibility into applications. More often than not, it’s the SME groups (SysAdmins, Application Support or DBAs) that have the best visibility into these areas. Do not accept “they have their tools and we have our tools” as the answer. There is tremendous value in integrating SME tooling, scripts, etc. into the collective management and monitoring environment, even if it’s only for the purposes of driving these models.

More and more, COTS applications and databases include some capability for internal instrumentation and visibility into state, status, availability and performance characteristics. This is usually enabled through some additional configuration option, module, via an administrative console or a complete stand alone application (Oracle Enterprise Manager). Third party vendors and open source solutions may exist for management and monitoring of these applications or databases. Think agents for things like Exchange, SAP, DB2 or Oracle (IBM Tivoli, Quest Software, etc.) here or specialty applications that tend to fall into the advanced diagnostics and performance tuning areas (Quest Spotlight on Oracle).

Custom composite applications are going to expand the usage of COTS technology to include custom software, application servers, integration techniques (SOA, EAI, etc.), and other distributed and mainframe systems. The same approaches described above should be followed, with more emphasis on working with the various development and support SMEs.

Creating an accurate model is the easy part; bringing it to life requires data points, metrics or other information that can be used to determine state, status, performance, availability and impact. Modeling requires working with the various SME groups to fully understand what has been deployed, its operating characteristics and how it supports the business services and applications.

Once a good understanding has been achieved, the monitoring tools group will need to perform a gap analysis on what they have and don’t have to be able to represent the model in a production TBSM deployment. Do yourselves a favor here and partner up with the SMEs. Work through the politics and find a way to integrate their information and insight into the core management, monitoring and event collection solutions. Identify the gaps you have in your core tools and visibility, then put together a plan to fill those gaps so your BSM solutions can provide the expected value.
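The gap analysis itself can be as simple as comparing what the model needs against what your tools deliver today. A rough Python sketch follows; the component names and data points are made up for illustration and would come from your own model and tool inventory.

```python
def gap_analysis(
    required: dict[str, set[str]],
    available: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per model component, the data points the model needs but no tool provides."""
    gaps = {}
    for component, needed in required.items():
        missing = needed - available.get(component, set())
        if missing:
            gaps[component] = missing
    return gaps


# Hypothetical inputs: what the model needs vs. what the core tools collect today.
required = {
    "orders database": {"instance up/down", "tablespace usage", "listener status"},
    "dbhost01": {"cpu", "disk", "ping"},
}
available = {
    "orders database": {"instance up/down"},
    "dbhost01": {"cpu", "disk", "ping"},
}

for component, missing in gap_analysis(required, available).items():
    print(component, "is missing:", sorted(missing))
```

In this made-up example the server is covered and the database is not, which is the usual place where the DBA group's own tooling can fill the gap.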

I generally see a couple of different approaches for building out these Architectural Models, and they tend to play into the two scenarios above. Most often, due to the weaknesses in fundamental instrumentation and monitoring, clients will simply create a template for the server and a template for the COTS application. Each template will generally include the expected “up/down” type status rules looking for incoming events. The COTS application template may include a few more rules to cover more specifics about the application, but more often than not they are broad-based status rules looking for process, daemon and log file messages.

There’s nothing wrong with this approach at all, but it’s a matter of getting to the fine grained visibility that’s difficult with this approach. Do you need to know that a specific transaction, process or table space out of dozens or hundreds is the cause and that it’s only impacting a portion of your business service or is knowing that you have an application or database problem good enough? My operations background says that it’s always best to help folks get to the specific problem as quickly as possible and to not generalize things.

The preferred approach is to break things out into more specific components of the architecture deployed. This approach DEPENDS on instrumentation and visibility into all of these areas. This approach DEPENDS on your ability to generate unique events, metrics, KPIs, etc. that can be directly associated with these areas. This approach gives you the most flexibility to tie discrete components into the specific business service and application areas that they may most directly support, enable or impact.

Focus on simple templates for the COTS application/database or custom composite application PLUS simple templates for the supporting infrastructure server(s), application/database core components and marry them together based on the behaviors of the application as it relates to the underlying infrastructure components. This is the key here. This is the only way to manage each uniquely and as a whole based on how the overall architecture was designed and implemented. If you can get this right, you will be able to achieve the powerful business service and application models needed for Business Service Management plus the ability to put the right information into the hands of the operations and support staff so they can identify and resolve the problems faster.
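As a rough illustration of marrying simple templates together, the Python sketch below models each piece as a small component with its own status and a list of dependencies. This is not how TBSM stores templates or rules; the class, the template labels, the component names and the "worst status wins" roll-up are all assumptions made for the example (a real model might use percentage or weighted rules instead).

```python
from dataclasses import dataclass, field

SEVERITY = {"good": 0, "marginal": 1, "bad": 2}


@dataclass
class Component:
    name: str
    template: str                # e.g. "server", "db_component", "application"
    own_status: str = "good"     # set from events tied directly to this component
    depends_on: list["Component"] = field(default_factory=list)

    def status(self) -> str:
        """Worst of this component's own status and everything it depends on."""
        worst = self.own_status
        for dep in self.depends_on:
            dep_status = dep.status()
            if SEVERITY[dep_status] > SEVERITY[worst]:
                worst = dep_status
        return worst

    def causes(self) -> list[str]:
        """Names of components at or below this one whose own status is not good."""
        found = [self.name] if self.own_status != "good" else []
        for dep in self.depends_on:
            for name in dep.causes():
                if name not in found:
                    found.append(name)
        return found


# Hypothetical model: one server, two database components, one application.
server = Component("dbhost01", "server")
tablespace = Component("USERS tablespace", "db_component", depends_on=[server])
listener = Component("listener", "db_component", depends_on=[server])
orders_db = Component("orders database", "application", depends_on=[tablespace, listener])

tablespace.own_status = "bad"
print(orders_db.status())  # bad
print(orders_db.causes())  # ['USERS tablespace']
```

Because the tablespace is its own component with its own events, operations sees the specific cause rather than a generic "database problem", which is the payoff of the fine-grained approach.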

Upcoming TBSM Design Patterns will focus on the service, functional/sub-service, and process/transactional design patterns.

**NOTE to Readers: Would examples of these within TBSM or Visio be helpful?**

January 14, 2008   7 Comments

Sending IBM Tivoli Monitoring Situation Events to Tivoli Netcool/Impact

Our third Netcool/Impact OPAL contribution!

Available here.

-snip-

This integration will send IBM Tivoli Monitoring Situation Events to Netcool/Impact for event enrichment, advanced data analysis and correlation, and for notifications and escalations.

The IBM Tivoli Monitoring Situation Events are sent to Netcool/Impact via Web Services. Upon receipt of an event, Netcool/Impact will run one or more Netcool/Impact policies. These policies make use of Netcool/Impact’s wide array of Data Source Adapters and the Netcool/Impact Policy Language to perform event enrichment and advanced data analysis, and to send notifications and escalations.

After the policy is finished, Netcool/Impact can write the results back to the IBM Tivoli Monitoring Universal Message Console where the results can be viewed by support staff.
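For readers who have not seen this kind of flow, here is the general shape of it in plain Python. This is not Netcool/Impact Policy Language and not the code in the OPAL contribution; the event fields, the situation name, the owner lookup table and the action strings are hypothetical stand-ins for what a policy and a data source would provide.

```python
def enrich_event(event: dict, owners: dict[str, dict]) -> dict:
    """Stand-in for a policy: add ownership details and decide on an action."""
    enriched = dict(event)  # work on a copy so the incoming event is untouched
    owner = owners.get(event["hostname"], {"team": "unassigned", "escalate_to": None})
    enriched["team"] = owner["team"]
    if event["severity"] == "critical" and owner["escalate_to"]:
        enriched["action"] = f"page {owner['escalate_to']}"
    else:
        enriched["action"] = "notify queue"
    return enriched


# Hypothetical lookup data, standing in for a CMDB or asset database query.
owners = {"dbhost01": {"team": "DBA", "escalate_to": "dba-oncall"}}

# Hypothetical situation event as it might arrive from the monitoring side.
situation_event = {
    "situation": "Example_DB_Space_Critical",
    "hostname": "dbhost01",
    "severity": "critical",
}

print(enrich_event(situation_event, owners)["action"])  # page dba-oncall
```

The enriched result is the sort of thing that would then be written back for support staff to view.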

July 13, 2007   No Comments