Posts tagged as: Usability

Performance, scalability, user experience, responsiveness, quality of experience and the like are a challenge for many modern web- and Java-based applications. Countless resources on the Internet cover the design, deployment, tuning, profiling, testing, monitoring and management of each of these areas. What it all boils down to is everything the end user (or administrator) deals with, and whether it adds up to a pleasant experience or a painful one.

It’s easy to jump out and point the finger at the TBSM software as the culprit for a poor user experience. I’d like to point out a few things to keep in mind as you work through diagnosing potential performance, usability or similar issues with TBSM. I’ve seen the gamut of performance, scalability and user experience issues since the Netcool/RAD 2.0 days, and I’ve seen tremendous improvements, especially since the acquisition by IBM Tivoli. There’s a significant focus on TBSM performance and scalability: each major release has formal performance goals set before development begins, and these become the basis for the performance and scalability testing a dedicated team completes for each release.

Unfortunately, theoretical use cases and testing only go so far. Verification and test teams do their best to make sure everything works according to their scenarios and use cases. The performance and test teams strongly encourage you to submit information that can improve their performance-testing use cases so that they align more closely with how TBSM is used in the field. Contact me if you’re interested in contributing.

The bottom line is this: if you do not establish expected performance and user experience baselines and targets, you’ll pull your hair out and never be happy with your investment, or worse, your end users won’t be. Document the expected performance and user experience, and test them frequently to make sure they meet the needs of the people who will use TBSM. Put yourself in their shoes: work from their location and use their laptop or desktop to experience things as they do. As part of your development, test and release-to-production process, run through your established, documented tests and capture the timings of your documented “flows” and renderings. Don’t guess at performance or user experience. Measure it, then manage it!
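Capturing flow timings does not need elaborate tooling to get started. Below is a minimal sketch, assuming each documented flow can be scripted as a Python callable (for example, an HTTP request to a TBSM page); `time_flow` and the `open_service_view` stand-in are hypothetical names. A scripted request measures server and network time only, not browser rendering, so pair it with timings taken in the browser on the user’s own workstation.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_flow(flow: Callable[[], None], runs: int = 5) -> Dict[str, float]:
    """Run one scripted user flow `runs` times and summarize wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        flow()  # the flow under test, e.g. a scripted login or view launch
        samples.append(time.perf_counter() - start)
    return {
        "runs": float(runs),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical stand-in: replace with a real scripted step against your TBSM server.
    def open_service_view() -> None:
        time.sleep(0.05)

    summary = time_flow(open_service_view, runs=3)
    print({name: round(value, 3) for name, value in summary.items()})
```

Reporting the median alongside min and max keeps one slow outlier from hiding or exaggerating the typical experience.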

I’m very fond of the Apdex initiative. Visit their site and review the documents to see whether it makes sense in your environment. Also keep an eye out for a new book from O’Reilly called “Complete Web Monitoring”. I was a reviewer for this book, and it has a lot of great content!
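The Apdex specification turns response-time samples into a single score between 0 and 1 against a target time T: samples at or under T count as satisfied, samples above T up to 4T count as tolerating (half credit), and anything slower counts as frustrated. Here is a minimal sketch of that calculation; the 4-second target and the sample timings are illustrative assumptions, not TBSM recommendations.

```python
from typing import Iterable


def apdex(response_times: Iterable[float], target: float) -> float:
    """Apdex_T = (satisfied + tolerating / 2) / total samples."""
    if target <= 0:
        raise ValueError("target must be positive")
    times = list(response_times)
    if not times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in times if t <= target)
    tolerating = sum(1 for t in times if target < t <= 4 * target)
    # Frustrated samples (> 4T) contribute nothing to the numerator.
    return (satisfied + tolerating / 2) / len(times)


if __name__ == "__main__":
    # Illustrative timings (seconds) for one flow, with an assumed target T = 4 s.
    samples = [1.0, 2.0, 2.5, 3.0, 5.0, 10.0, 20.0, 30.0]
    print(round(apdex(samples, 4.0), 3))  # prints 0.625
```

Four satisfied samples, two tolerating and two frustrated give (4 + 2/2) / 8 = 0.625, a single number you can track against a target for each flow over time.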

We’ve recently published our formal performance tuning recommendations for TBSM v4.2 on the TBSM developerWorks Wiki. The formal TBSM Performance Report is available under NDA from your IBM Tivoli account team. The TBSM v4.2 manuals also provide insight and guidance here.

That said, I focus on the following areas when I see or hear of performance, scalability and user experience issues with TBSM. If you suspect such an issue in your TBSM environment, I’d suggest capturing some of this information to send along with your PMR.

Architecture and Design

  • What is the fundamental deployed architecture for TBSM? What options have been deployed? How are things deployed in terms of software configuration, high availability, load balancing, security, integrations, etc.? Are multiple products installed on the same servers?
  • Are you using a front end server load balancer? Is it configured and optimized to direct users to the best server?
  • What has been designed/implemented within TBSM? What features and capabilities are being used? What are the intended design outcomes and expectations?
  • How does TBSM fit into the overall network, security and application architectures?
  • What security or access controls are enforced for the administration and user access of TBSM?
  • Is the network connectivity between the administrator/user segments and TBSM optimized?
  • A sound architecture doesn’t make up for a poorly designed solution within the software.

Systems Build and Provisioning

  • What are the core system resources provisioned per server? How much CPU, Memory, Local Disk, Other Disk, Network?
  • What is the operating system in use? What fixpacks, service packs, updates, etc. have been applied?
  • Is TBSM deployed into a virtualized systems environment? Is it supported? Is it optimized? Are the “VM Police” enforcing resource controls?
  • Are high performance systems and network components being used? (Disk, Network, Memory, CPU, Bus, etc.)
  • Is DNS working well within your environment? How do you measure and manage DNS performance?
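One low-effort way to start answering the DNS question is to time name lookups from the TBSM server and from user workstations. Below is a minimal sketch using Python’s standard `socket.getaddrinfo`; the `time_lookups` name is hypothetical, and operating-system resolver caching may make repeated lookups look faster than a cold one.

```python
import socket
import time
from typing import Callable, List


def time_lookups(host: str, attempts: int = 5,
                 resolve: Callable[..., object] = socket.getaddrinfo) -> List[float]:
    """Return how long each name lookup for `host` took, in milliseconds."""
    durations: List[float] = []
    for _ in range(attempts):
        start = time.perf_counter()
        resolve(host, None)  # raises socket.gaierror if the name does not resolve
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

For example, `time_lookups("tbsm.example.com", attempts=10)` (hypothetical host name) run from each user location gives a quick per-site picture; the `resolve` parameter exists only so the lookup can be swapped out for testing.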

Core TBSM Installation, Integrations and Tuning

  • Have you tuned the core TBSM and TIP JVM environment?
  • Are the data sources TBSM is integrating with (and their network, systems and applications) optimized for the queries and data exchanges you’re doing via TBSM data sources, data fetchers, events, web services, XML, TCR ODBCs, etc.?
  • Have you optimized the TBSM – TADDM integration? Are you filtering out what you don’t need?
  • Are the reports, charts or graphs you’ve developed optimized? Are the SQL queries as efficient as possible? Have you optimized the databases/datasources you’re querying (views, etc.)?
  • Are TCR reports scheduled to run during off-peak hours?
  • Are you managing events in the ObjectServer? Are they being deleted? Do you have timeouts or “reaper” automations where needed?
  • Are you integrated with an external authentication repository (LDAP, AD, OMNIbus)? Are authentication lookups and responses performing as expected? How do you know?

Netcool/OMNIbus Configurations and Tuning

  • Are you managing events in the ObjectServer? Do you have the events that you *really* need? Are you deleting what you don’t need?
  • Are you using custom triggers/automations? Are they operating efficiently and achieving the desired results?
  • Have you implemented custom granularity settings? Are they operating as expected and are you achieving the desired results?

Netcool/WebTop Configurations and Tuning

  • How many concurrent users are interacting with Netcool/WebTop components?
  • Has Netcool/WebTop’s server runtime JVM been optimized?
  • Do you have complex pages being viewed by users? How many map, AEL, LEL, TableView portlets do you have on the page?
  • Are you using efficient filters and views? Are they simple or complex?
  • Are you refreshing portlets efficiently?
  • Are you using restriction filters in WebTop?

TBSM Solution Development

  • Do you have a solid template modeling standard?
  • Are you making efficient use of template rules?
  • Are you matching events, metrics, KPIs, etc. in the most efficient way? Are your data sources (events, etc.) optimized to support your desired template rules?
  • How frequently are you collecting data via your data sources, data fetchers, event readers, etc?
  • Is caching being used appropriately?
  • Have you optimized the refresh intervals within TBSM?
  • Do you *really* need every single device in your environment in TBSM?
  • Are you applying segmentation and grouping approaches so you do not have *lots* of instances at the same level?
  • Do you really need to show users the entire service tree every time they log in?
  • Have you created optimized view definitions that control the levels up/down to what’s really needed?
  • Are you launching the *right* number of pages within your view? Do you need to launch them all?
  • Do you launch the *right* view when logging in? Have you considered launching something different (“lighter”) and then allowing the end user to manually launch more detailed views later?
  • Do you *really* want the default TBSM look and feel?
  • Do you *really* need everything in your IT environment in TBSM? Do you *really* know what the most important things are? How about focusing here?
  • What’s the right number and mixture of portlets on a page?
  • How many simultaneous users are logging into TBSM?
  • How many simultaneous users are logging into TBSM and launching the same view, interacting with the same page, portlet, chart, report, etc.?

Potential TBSM Performance and Scalability Impacting Dimensions

  • # Simultaneously Logged in and Active Users
  • # Pages Simultaneously Launched in a View (directly related to above)
  • # and types of Portlets on Pages (directly related to above)
  • Datafetcher use, quantity, efficiency
  • Quality of event structure
  • Autopopulation rule use, quantity, efficiency
  • ESDA rule use, quantity, efficiency
  • Status rule use, quantity, efficiency
  • Inefficient use of the CLASS discriminator in status rules (related to event normalization, rules above)
  • Quantity of events in OMNIbus
  • # and complexity of TIP Charts (number in use, complexity)
  • # and complexity of TBSM Charts (number in use, complexity)
  • # and complexity of TCR Reports (number in use, complexity)
  • Custom canvas use (number in use, complexity)
  • Service Tree Scorecard use (number of columns, quantity, custom policy)
  • Use of internal policies associated with any type of rule, visualization control, etc. (inefficient IPL use)
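Several of the dimensions above multiply together. As a rough illustration, assuming (hypothetically) that each refreshing portlet on each open page issues one request to the server per refresh interval, the background load grows with users × pages × portlets and shrinks as the refresh interval lengthens. The function name and the numbers below are illustrative, not measured TBSM behavior.

```python
def refresh_requests_per_second(active_users: int, pages_per_user: int,
                                refreshing_portlets_per_page: int,
                                refresh_interval_s: float) -> float:
    """Rough steady-state request rate generated by portlet refreshes alone.

    Assumes (hypothetically) that every refreshing portlet on every open page
    issues one server request per refresh interval.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    open_portlets = active_users * pages_per_user * refreshing_portlets_per_page
    return open_portlets / refresh_interval_s
```

With 50 active users, 3 launched pages each and 4 refreshing portlets per page on a 60-second interval, this gives 10.0 requests per second; stretching the interval to 120 seconds halves it to 5.0, and launching a single “lighter” page at login cuts it to a third.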

End User Workstation/Laptop Build and Provisioning

  • What is the standard build for end user workstations or laptops? How much CPU, Memory, Disk?
  • What’s the network connection? WiFi, 10/100/1000 Mb Ethernet?
  • Are end user environments cluttered with multiple programs running at the same time? (especially Java based programs)
  • Are firewall, virus scanner or other security, access control or content screening programs running?
  • Are remote access programs such as Citrix, Remote Desktop, Terminal Services, X, VNC, etc. used to access TBSM?

End User Browser Version and Tuning

  • Have you determined the ideal browser type (IE, FF) and version to use?
  • Have you determined the ideal JRE type (Sun, IBM) and version to use?
  • Have you determined the ideal browser + JRE + Workstation/Laptop combination to use?
  • Have you tuned the end user JRE in accordance with the TBSM recommendations?
  • Have you established a baseline for end user performance, for each key use case or scenario, from each key end user environment and location?
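Once baselines exist per use case and location, comparing each new round of measurements against them is easy to automate. Here is a minimal sketch, assuming timings are kept in seconds under a hypothetical “location/flow” label and that a slowdown of more than 20% is worth investigating (pick your own threshold).

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.2) -> List[str]:
    """Return flows whose current time exceeds baseline by more than `tolerance` (0.2 = 20%)."""
    flagged: List[str] = []
    for flow, base in sorted(baseline.items()):
        now = current.get(flow)
        if now is None:
            continue  # flow not measured in this run; report it separately if that matters
        if now > base * (1 + tolerance):
            flagged.append(flow)
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only.
    baseline = {"NYC/login": 4.0, "NYC/open-service-tree": 8.0, "London/login": 6.0}
    current = {"NYC/login": 4.5, "NYC/open-service-tree": 11.0, "London/login": 6.1}
    print(regressions(baseline, current))  # prints ['NYC/open-service-tree']
```

Keying by location as well as flow keeps a slow branch-office link from being averaged away by fast headquarters measurements.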

End Users

  • Do you teach your end users how they should use the developed solutions in TBSM?
  • Do you teach your end users what the expected performance and user experience should be?
  • Do you teach your end users how TBSM really works?
  • Do you teach your end users when things turn red, yellow, green, update, change, refresh, etc?
  • If you don’t, do you think they are going to form the right expectations?

Resources

There are a number of “must have” tools that the typical TBSM administrator should have at their disposal when investigating TBSM performance, scalability and user experience concerns. Be sure to obtain the appropriate approvals to install and use these tools. They may not be appropriate for your use, but they do provide tremendous insight. Request waivers and approvals for them as needed!

Microsoft Support Professionals Toolkit for Windows

The User Mode Process Dumper (userdump) dumps the memory image of any running Win32 process on the fly, without attaching a debugger or terminating the target process.

The Desktop Heap Monitor is a tool that examines usage of desktop heap.

Reclaim Memory by Mastering Windows’ Task Manager

Increase Firefox Speed and Decrease Firefox Memory Usage +20 Tips

Minimize Firefox Memory Usage

Network Performance Tools

Wireshark
Wireshark Tutorials

Firebug: Firebug integrates with Firefox to put a wealth of web development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page.

Firebug Lite (For IE and Safari)

Internet Explorer Developer Toolbar (Similar to Firebug for Firefox)

Fiddler is a Web Debugging Proxy which logs all HTTP(S) traffic between your computer and the Internet. Fiddler allows you to inspect all HTTP(S) traffic, set breakpoints, and “fiddle” with incoming or outgoing data. Fiddler includes a powerful event-based scripting subsystem, and can be extended using any .NET language. Fiddler is freeware and can debug traffic from virtually any application, including Internet Explorer, Mozilla Firefox, Opera, and thousands more.

neXpert is an add-on to Fiddler which aids in performance testing web applications. neXpert was created to reduce the time it takes to look for performance issues with Fiddler and to create a deliverable that can be used to educate development teams.

HttpWatch is an HTTP viewer and debugger that integrates with IE and Firefox to provide seamless HTTP and HTTPS monitoring without leaving the browser window.

External URL Testing (also consider ITCAM for Transactions v7!)

Client-side Java Console: Tracing Console, Trace Logging

Java Performance Resources: Java Passion, Java Performance Tuning, Glassbox

TIP, ISC and eWAS tools:

** I am currently investigating options for understanding performance within TIP, ISC and eWAS. These may or may not work, exist or be supported at this time. **

ITCAM for WAS

ITM Agent Configurations (MOSWOS) for TIP: How to build an ITM agent for monitoring TIP, ISC, eWAS?

WebSphere – Tivoli Performance Viewer (TPV): embedded in the WebSphere v6.x and later administrative console

This seems like an obvious choice; I need to find out how to install it into the TIP, ISC and eWAS environment.

From TPV, you can view current activity and summary reports, or log Performance Monitoring Infrastructure (PMI) performance data. TPV provides a simple viewer for the performance data collected by the Performance Monitoring Infrastructure.

Typical metrics available include:

Average response time: Includes statistics such as servlet or enterprise bean response time. Response time statistics indicate how much time is spent in various parts of WebSphere Application Server and might quickly show where the problem is (for example, in the servlet or in the enterprise beans).

Number of requests (transactions): Shows how much traffic WebSphere Application Server is processing, helping you determine the capacity you have to manage. As the number of transactions increases, your system’s response time might increase too, showing the need for more system resources or for retuning your system to handle the increased traffic.

Number of live HTTP sessions: The number of live HTTP sessions reflects the concurrent usage of your site. The more concurrent live sessions, the more memory is required. As the number of live sessions increases, you might adjust the session time-out values or the available Java virtual machine (JVM) heap.

Web server thread pools: Interpret the Web server thread pools, the Web container thread pools and the Object Request Broker (ORB) thread pools together with the data source (connection pool) size. These thread pools might constrain performance because of their size. A thread pool setting can be too small or too large, and either can cause performance problems. Setting thread pools too large increases the memory needed on a system and might push too much work downstream if downstream resources cannot handle a high influx of work. Setting thread pools too small can cause bottlenecks if the downstream resources could handle more workload.

The Web and Enterprise JavaBeans (EJB) thread pools

Database and connection pool size

Java virtual machine (JVM): Use the JVM metric to understand JVM heap dynamics, including the frequency of garbage collection. This data can help you set the optimal heap size. You can also use the metric to identify potential memory leaks.

CPU, I/O, System Paging: You must observe these system resources to ensure that you have enough system resources, for example, CPU, I/O, and paging, to handle the workload capacity.
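For the “too small versus too large” thread pool trade-off described above, Little’s law (a general queueing result, not a WebSphere sizing rule) gives a starting estimate: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch assumes one thread per in-flight request and an arbitrary 25% headroom factor; both are assumptions to check against what TPV actually reports, and `threads_needed` is a hypothetical name.

```python
import math


def threads_needed(requests_per_second: float, avg_response_s: float,
                   headroom: float = 1.25) -> int:
    """Little's law: average in-flight requests L = arrival rate * time in system.

    `headroom` is an assumed safety factor for bursts, not a WebSphere default.
    """
    if requests_per_second < 0 or avg_response_s < 0:
        raise ValueError("rates and times must be non-negative")
    in_flight = requests_per_second * avg_response_s
    return math.ceil(in_flight * headroom)
```

At 40 requests per second with a 0.5-second average response time, about 20 requests are in flight on average, so `threads_needed(40, 0.5)` suggests 25 threads. If TPV shows the pool pinned well below that while response times climb, the pool may be too small; if most threads sit idle, it may be larger than it needs to be.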


TopRunner WebSphere Resource Analyzer (TRWRA)

The TRWRA tool helps a system administrator tune application servers running within WebSphere and identify their performance problems. It can monitor multiple servers on a node, or many servers on many nodes in a cluster.

What is the TRWRA tool?

The TRWRA tool is written entirely in Java and requires the IBM WebSphere AdminClient (PMI API), version 6.1 or later. It currently runs only on Linux or UNIX. It can monitor and display the following resources in a terminal window (much like the top command):

• JDBC Connection Pools
• JVM Runtime
• Servlet Session Manager
• Thread Pools
  – Default
  – HAManager.thread.pool
  – Message Listener
  – Object Request Broker
  – Process Discovery
  – SOAPConnectorThreadPool
  – TCPChannelDCS
  – WebContainer
• Web Applications
• Transaction Manager

It also logs resource information to a file for later analysis or graphing. It can monitor all servers on a node, or all servers on all nodes in a cluster, at the same time. If a server on a node or in a cluster stops, crashes or reports an error during a monitoring session, the tool tells you which server on which node has the problem.


Evaluating a BSM Solution: Measuring Effectiveness

by Robin Harwani on February 23, 2009

In my first post, we talked about what is wrong with current solutions, followed by a post sharing my experience of making BSM happen (realizing and implementing it). I then sidetracked for a post to share a really great research invention by folks at IBM, the Strategic Capability Network, and its relevance to BSM.

In this post, I intend to share insights from my experience evaluating BSM/SQM solutions for clients to gauge their effectiveness and performance. I am sure most consultants on the ground have encountered this situation: being hired to evaluate someone else’s BSM solution and recommend changes to make it WORK!!

Measuring the effectiveness of a BSM solution is not easily quantifiable, because it involves multiple factors that are not just statistical but also relate to the company’s organizational structure, architectural implications, the rationale behind decisions, culture, process, usability and ecosystem. Guess what: to do all of the above, I was given 4 weeks, plus 1 week for planning. The planning week was the most challenging, with debates over which factors and indicators to include and which to leave out. Eventually we settled on these priorities: measure usability, effectiveness, completeness (coverage) and accuracy.

After extensive research on how to accomplish this, WE agreed on the following approach to measure the holistic performance of the BSM solution:

$\text{Performance} = \text{Complexity}^{\text{Process}} \times \text{Team} \times \text{Tools}$ [1]

Let us take these terms one at a time. I have illustrated each factor with real examples and the lessons I learnt from them:

>> Complexity: Does an executive really care about server memory displayed on an executive dashboard? Are the indicators really accurate and reliable? If so, how accurate? These indicators are very seldom measured. Complexity is also driven by the context and environment we deal with; to capture this, we measured utilization, ease of information access for stakeholders, number of influenced decisions per quarter, time to address issues (before vs. after) and a few other indicators, some subjective and some quantitative.

Real incident: While evaluating a BSM solution built by a great Service Assurance team, we found that the dashboards for the production support teams (across various silos) showed fault management metrics that made no sense to their users. Of course, no one used them!! The only change we made to turn this dashboard into a hit was to change the metric terms and the status aggregation pattern (auto-population logic and SLA rules). In this case, accuracy and reliability added to the complexity for users, who were too skeptical to use an interface that did not even speak their language when checking on the applications they supported. The change was not a big development effort; it was simply a matter of adapting to the environment and reducing accidental complexity by presenting information in domain-driven language.

Lesson learnt: Well-defined processes reduce both planned and accidental complexity; measure effectiveness by how well the organization understands how to use the solution.

>> Team: How much information is easily accessible to the stakeholder? Is every category of stakeholder considered in the solution? Does everyone think this “Dashboard” is of any value, or do they prefer some other medium to achieve the same objective? In all these cases, we need to adapt to the environment and put forward a balanced approach.

Real incident: In one environment where I was working on a solution, executives had imposed Netcool on an Operations team that was used to custom-built tools, and the situation was that of a RIOT!! For months, users complained that Netcool did not show the accurate device status they used to get from the old custom tool. Everyone in the Service Assurance team shooed them away :) After talking to them, I realized they had a point. The old tool did not just report status after pinging the server; when the server came back up, it also checked sysUpTime and reported whether the server had been unavailable due to a power outage or some other reason. The poor users did not know the logic or details behind the homemade tool.

Lesson learnt: If users are using something, there is a valid reason; look for it!! A hammer will take you only so far. Balance personalization with layering and tiering the solution so that everyone gets the information they need, the way they need it, when they need it. Most importantly, BSM is not about turning the organization around 180 degrees; it’s about increasing productivity and reporting the information needed to make the best business decisions.

>> Tools are critical not only to accomplishing tasks but also to the overall productivity of the organization. Caution: imposing tools is not BSM!! Personalization is the only way BSM can really be a successful offering. In my experience, implementations where a team selects what suits it best and communicates information upstream to the enterprise instance have been far more successful and more widely used. Plenty of experiences with tools are already out there, but the lesson I learnt is that we should not look for silver bullets when evaluating tools. It is best left to the users to choose the tools they are comfortable with.

>> And finally, Performance: Although some of my friends will argue that performance is not a holistic term, we took an objective rather than a subjective approach to ensure that WE had statistics to back our results. This helped us immensely!!

All in all, evaluating a BSM solution was much more challenging than building one, because of the merging and conflicting visions and principles followed during the original implementation. I think this underscores the need for standards and guidelines for BSM solutions. (Remember: only once X.733 was in place did we know how to define events in a standardized way.) I am not lobbying for enforcement via standards, but the industry really needs at least some vendor-neutral guidelines to retain the value, vision and capabilities of Business Service Management solutions.

References:

[1] Grady Booch used this definition of performance in his well-known talk at the 9th Annual Turing Lecture.


Business Service Management Strategy Tip of the Week #4

Why should I have a Business Service Management (BSM) Strategy?
In our previous posts, we talked about how important it is to establish a BSM Strategy for long-term BSM success. We then talked about the first two core components of your BSM Strategy. The first component is a personal and intimate definition for what [...]


TBSM v4.1.1 IF 007 Available

A new IF is available for TBSM v4.1.1 addressing a few new areas (don’t see mine in there!). This depends on IF 001 and supersedes IF 004, 005 and 006. IF 007 can be downloaded here.
These are the new issues addressed:

IZ15914 INCONSISTENT SERVICE NAME TRUNCATIONS IN SERVICE TREE
Service name truncation is not consistent [...]


Customizing Tivoli Business Service Manager v4 and TADDM Integration

We have the start of a pretty good integration between TBSM v4 and TADDM, Tivoli’s Application Discovery and Dependency Mapping product. I say it’s a start because it has a looong way to go to get to where it needs to be for the typical client’s use. There are plenty of challenges with this [...]


Tivoli Netcool/Impact 4.0 Performance Testing Guide

A new Tivoli Netcool/Impact contribution on OPAL detailing internal performance testing results and a guide to testing the performance of your Tivoli Netcool/Impact v4.0 server.
Available here.


What You Need to Know (WYNTK) on Tivoli Business Service Manager (TBSM) 4.1: Services

Service Instances (Instances, or just Services, as they are now called) are the unique instantiations of templates within TBSM 4.1. Services are the “living things” within TBSM 4.1 and enable us to create the unique relationships and linkages to the underlying data sources.
Building Services starts with naming. Just like I mentioned for your [...]


Tivoli Business Service Manager (TBSM 4.1) Interim Fix 0002

I should have checked the TBSM 4.1 support site sooner: they released a nice interim fix for TBSM 4.1 about a month ago.
Here’s a snip from the README. You can get the package here.
SERVICE TREE SORT
This fix provides the ability to sort the Service Tree in ascending or descending order. The user [...]
