Tivoli Business Service Manager (TBSM) Support Tip of the Week #11

by doug on May 8, 2009

Performance, scalability, user experience, responsiveness and quality of experience are often a challenge for modern web or Java-based applications. There are countless resources available on the Internet that talk about design, deployment, tuning, profiling, testing, monitoring and managing each of these areas. What it all boils down to is everything the end user (or administrator) deals with that makes for either a pleasant experience or a painful one.

It’s easy to jump out and point the finger at the TBSM software as the culprit for poor user experience. I’d like to point out a few things that you should keep in mind as you go through the motions of diagnosing potential performance, usability or other similar issues with TBSM. I’ve seen the gamut of performance, scalability and user experience issues since the Netcool/RAD 2.0 days. I’ve seen tremendous improvements made, especially since the acquisition by IBM Tivoli. There’s a significant focus on TBSM performance and scalability, with each major release having formal performance goals set prior to development. These become the basis for performance and scalability testing completed by a dedicated team for each release.

Theoretical use cases and testing only go so far, unfortunately. Verification and test teams do their best to make sure everything works in accordance with their scenarios and use cases. The performance and testing teams strongly urge you to submit information that can be used to improve the performance testing use cases so that they’re more closely aligned with how TBSM is used in the field. Contact me if you’re interested in contributing here.

The bottom line comes down to this: if you do not establish expected performance and user experience baselines and targets, you’ll pull your hair out and never be happy with your investments or, worse, your end users won’t be. Document and frequently test performance and user experience to ensure that they meet the needs of those who will use TBSM. Put yourself in their shoes and their location, and use their laptop/desktop to experience things as they would. As part of your development, test and release-to-production process, run through your established and documented tests. Capture the timings of your established and documented “flows” and renderings. Don’t guess at performance or user experience. Measure it, then manage it!
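To make “capture the timings” concrete, here is a minimal sketch in Python of timing a documented flow several times and summarizing the results. The names `time_flow`, `summarize` and `example_flow` are hypothetical, and the flow body is only a placeholder; in practice it would drive whatever steps you have documented for your own environment.

```python
import statistics
import time


def time_flow(flow, runs=5):
    """Run a flow several times and return wall-clock timings in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        flow()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Summarize timings so they can be recorded alongside the documented flow."""
    ordered = sorted(timings)
    return {
        "runs": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def example_flow():
    # Placeholder (assumption): stands in for a documented flow such as
    # "log in and open a view". Replace with your own steps.
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(time_flow(example_flow, runs=3)))
```

Recording the median along with the minimum and maximum gives you a number to compare against on the next release, rather than a single run that may have been unusually fast or slow.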

I’m very fond of the efforts of the Apdex initiative. Visit their site and review the documents to see if this makes sense in your environment. Also keep an eye out for a new book from O’Reilly called “Complete Web Monitoring” (WWW). I was a reviewer for this book and it’s got a lot of great content!
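For those who haven’t come across it, the Apdex score boils a set of response time samples down to a single number between 0 and 1, based on a target threshold T: samples at or under T count as satisfied, samples between T and 4T count as tolerating (at half weight), and anything slower counts as frustrated. A minimal sketch in Python follows; the sample timings and the 2-second target are made-up values for illustration, not TBSM recommendations.

```python
def apdex(response_times, target):
    """Apdex score for response times (seconds) against a target threshold T.

    Satisfied: t <= T. Tolerating: T < t <= 4T. Frustrated: t > 4T.
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerating = sum(1 for t in response_times if target < t <= 4 * target)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Hypothetical timings, in seconds, for one documented flow.
    samples = [0.5, 1.2, 2.0, 3.5, 9.0]
    print(apdex(samples, target=2.0))
```

With these samples, three are satisfied, one is tolerating and one is frustrated, so the score is (3 + 0.5) / 5 = 0.7.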

We’ve recently published our formal performance tuning recommendations for TBSM v4.2 on the TBSM developerWorks Wiki. The formal TBSM Performance Report is available under NDA from your IBM Tivoli account team. The TBSM v4.2 manuals provide insight and guidance here as well.

That said, I focus in on the following areas when I see or hear of performance, scalability and user experience issues with TBSM. If you suspect a performance or user experience issue with your TBSM environment, I’d suggest capturing some of these things to send along with your PMR.

Architecture and Design

What is the fundamental deployed architecture for TBSM? What options have been deployed? How are things deployed in terms of software configuration, high availability, load balancing, security, integrations, etc.? Are multiple products installed on the same servers?
Are you using a front end server load balancer? Is it configured and optimized to direct users to the best server?
What has been designed/implemented within TBSM? What features and capabilities are being used? What are the intended design outcomes and expectations?
How does TBSM fit into the overall network, security and application architectures?
What security or access controls are enforced for the administration and user access of TBSM?
Is the network connectivity between the administrator/user segments and TBSM optimized?
A sound architecture doesn’t make up for a poorly designed solution within the software.

Systems Build and Provisioning

What are the core system resources provisioned per server? How much CPU, Memory, Local Disk, Other Disk, Network?
What is the operating system in use? What fixpacks, service packs, updates, etc. have been applied?
Is TBSM deployed into a virtualized systems environment? Is it supported? Is it optimized? Are the “VM Police” enforcing resource controls?
Are high performance systems and network components being used? (Disk, Network, Memory, CPU, Bus, etc.)
Is DNS working well within your environment? How do you measure and manage DNS performance?
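On the DNS question, one simple way to get a number is to time lookups through the system resolver from the machines that matter (servers and end user workstations). This is a minimal Python sketch using the standard library’s `socket.getaddrinfo`; the `time_lookup` helper is a hypothetical name, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (seconds, ok) for a single name lookup via the given resolver."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        ok = True
    except socket.gaierror:
        # Resolution failed; the elapsed time still matters, since slow
        # failures are often what users experience as a "hang".
        ok = False
    return time.perf_counter() - start, ok


if __name__ == "__main__":
    # "localhost" is used here only so the sketch runs anywhere.
    # Substitute the hostnames your users and servers actually resolve.
    elapsed, ok = time_lookup("localhost")
    print(f"localhost: {elapsed:.4f}s ok={ok}")
```

Keep in mind that the operating system or a local cache may answer repeated lookups, so the first lookup and later lookups can differ quite a bit.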

Core TBSM Installation, Integrations and Tuning

Have you tuned the core TBSM and TIP JVM environment?
Are the data sources TBSM is integrating with (and their network, systems and applications) optimized for the queries and data exchanges you’re doing via TBSM data sources, data fetchers, events, web services, XML, TCR ODBCs, etc?
Have you optimized the TBSM – TADDM integration? Are you filtering out what you don’t need?
Are the reports, charts or graphs you’ve developed optimized? Are the SQL queries as efficient as possible? Have you optimized the databases/datasources you’re querying (views, etc.)?
Are TCR reports scheduled to run during off-peak hours?
Are you managing events in the ObjectServer? Are they being deleted? Do you have timeouts or “reaper” automations where needed?
Are you integrated with an external authentication repository (LDAP, AD, OMNIbus)? Are authentication lookups and responses performing as expected? How do you know?

Netcool/OMNIbus Configurations and Tuning

Are you managing events in the ObjectServer? Do you have the events that you *really* need? Are you deleting what you don’t need?
Are you using custom triggers/automations? Are they operating efficiently and achieving the desired results?
Have you implemented custom granularity settings? Are they operating as expected and are you achieving the desired results?

Netcool/WebTop Configurations and Tuning

How many concurrent users are interacting with Netcool/WebTop components?
Has Netcool/WebTop’s server runtime JVM been optimized?
Do you have complex pages being viewed by users? How many map, AEL, LEL, TableView portlets do you have on the page?
Are you using efficient filters and views? Are they simple or complex?
Are you refreshing portlets efficiently?
Are you using restriction filters in WebTop?

TBSM Solution Development

Do you have a solid template modeling standard?
Are you making efficient use of template rules?
Are you matching events, metrics, KPIs, etc. in the most efficient way? Are your datasources (events, etc) optimized to support your desired template rules?
How frequently are you collecting data via your data sources, data fetchers, event readers, etc?
Is caching being used appropriately?
Have you optimized the refresh intervals within TBSM?
Do you *really* need every single device in your environment in TBSM?
Are you applying segmentation and grouping approaches so you do not have *lots* of instances at the same level?
Do you really need to show the entire service tree every time users log in?
Have you created optimized view definitions that control the levels up/down to what’s really needed?
Are you launching the *right* number of pages within your view? Do you need to launch them all?
Do you launch the *right* view when logging in? Have you considered launching something different (“lighter”) and then allowing the end user to manually launch more detailed views later?
Do you *really* want the default TBSM look and feel?
Do you *really* need everything in your IT environment in TBSM? Do you *really* know what the most important things are? How about focusing here?
What’s the right number and mixture of portlets on a page?
How many simultaneous users are logging into TBSM?
How many simultaneous users are logging into TBSM and launching the same view, interacting with the same page, portlet, chart, report, etc.?
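To put numbers behind the simultaneous user questions, one rough approach is to run the same action from several threads at once and see how the slowest timing changes as the user count grows. The Python sketch below shows the idea; `timed_call`, `run_concurrently` and `sweep` are hypothetical names, and the action is a placeholder for whatever a user would really do. The thread pool does not guarantee that every action starts at exactly the same instant, so treat the results as indicative only, and point it at a test environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(action):
    """Run one action and return how long it took, in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_concurrently(action, users):
    """Run the same action from `users` threads; return each thread's timing."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_call, action) for _ in range(users)]
        return [f.result() for f in futures]


def sweep(action, user_counts):
    """Map each concurrency level to the slowest timing observed at that level."""
    return {n: max(run_concurrently(action, n)) for n in user_counts}


if __name__ == "__main__":
    # Placeholder action (assumption): stands in for "launch the same view".
    print(sweep(lambda: time.sleep(0.01), [1, 5, 10]))
```

If the slowest timing climbs sharply between two user counts, that range is where to look more closely.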

Potential TBSM Performance and Scalability Impacting Dimensions

# Simultaneously Logged in and Active Users
# Pages Simultaneously Launched in a View (directly related to above)
# and types of Portlets on Pages (directly related to above)
Datafetcher use, quantity, efficiency
Quality of event structure
Autopopulation rule use, quantity, efficiency
ESDA rule use, quantity, efficiency
Status rule use, quantity, efficiency
Inefficient use of the CLASS discriminator in status rules (related to event normalization, rules above)
Quantity of events in OMNIbus
# and complexity of TIP Charts (number in use, complexity)
# and complexity of TBSM Charts (number in use, complexity)
# and complexity of TCR Reports (number in use, complexity)
Custom canvas use (number in use, complexity)
Service Tree Scorecard use (number of columns, quantity, custom policy)
Use of internal policies associated with any type of rule, visualization control, etc. (inefficient IPL use)

End User Workstation/Laptop Build and Provisioning

What is the standard build for end user workstations or laptops? How much CPU, Memory, Disk?
What’s the network connection? WiFi, 10/100/1000 Mb Ethernet?
Are end user environments cluttered with multiple programs running at the same time? (especially Java based programs)
Are firewall, virus scanner or other security, access control or content screening programs running?
Are remote access programs such as Citrix, Remote Desktop, Terminal Services, X, VNC, etc. used to access TBSM?

End User Browser Version and Tuning

Have you determined the ideal browser type (IE, FF) and version to use?
Have you determined the ideal JRE type (Sun, IBM) and version to use?
Have you determined the ideal browser + JRE + Workstation/Laptop combination to use?
Have you tuned the end user JRE in accordance with the TBSM recommendations?
Have you established a baseline for end user performance, for each key use case or scenario, from each key end user environment and location?
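Once a baseline exists per use case and per end user environment, checking new measurements against it can be automated. Here is a minimal Python sketch; the `find_regressions` name, the scenario and environment labels, the timings and the 20% tolerance are all assumptions for illustration.

```python
def find_regressions(baseline, measured, tolerance=0.20):
    """Return entries whose measured time exceeds baseline by more than tolerance.

    Both dicts map (scenario, environment) -> seconds.
    The result maps the same keys to (baseline_seconds, measured_seconds).
    """
    regressions = {}
    for key, base in baseline.items():
        current = measured.get(key)
        if current is not None and current > base * (1 + tolerance):
            regressions[key] = (base, current)
    return regressions


if __name__ == "__main__":
    # Hypothetical baseline and measurements, in seconds.
    baseline = {
        ("open service tree", "HQ desktop"): 4.0,
        ("open service tree", "remote laptop"): 6.0,
    }
    measured = {
        ("open service tree", "HQ desktop"): 4.5,
        ("open service tree", "remote laptop"): 9.0,
    }
    print(find_regressions(baseline, measured))
```

In this example only the remote laptop entry is flagged, since 9.0 seconds is more than 20% over its 6.0 second baseline while 4.5 seconds is within tolerance of 4.0.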

End Users

Do you teach your end users how they should use the developed solutions in TBSM?
Do you teach your end users what the expected performance and user experience should be?
Do you teach your end users how TBSM really works?
Do you teach your end users when things turn red, yellow, green, update, change, refresh, etc?
If you don’t, do you think they are going to form the right expectations???

Resources

There are a number of “must have” tools that the typical TBSM administrator should have at their disposal when investigating TBSM performance, scalability and user experience concerns. Be sure to obtain the appropriate approvals to install and use these tools. They may not be appropriate for your use, but they do provide tremendous insight. Request waivers and approvals for them as needed!

Microsoft Support Professionals Toolkit for Windows

The User Mode Process Dumper (userdump) dumps any running Win32 process’s memory image on the fly, without attaching a debugger or terminating the target process.

The Desktop Heap Monitor is a tool that examines usage of desktop heap.

Reclaim Memory by Mastering Windows’ Task Manager

Increase Firefox Speed and Decrease Firefox Memory Usage +20 Tips

Minimize Firefox Memory Usage

Network Performance Tools

Wireshark
Wireshark Tutorials

Firebug: Firebug integrates with Firefox to put a wealth of web development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page.

Firebug Lite (For IE and Safari)

Internet Explorer Developer Toolbar (Similar to Firebug for Firefox)

Fiddler is a Web Debugging Proxy which logs all HTTP(S) traffic between your computer and the Internet. Fiddler allows you to inspect all HTTP(S) traffic, set breakpoints, and “fiddle” with incoming or outgoing data. Fiddler includes a powerful event-based scripting subsystem, and can be extended using any .NET language. Fiddler is freeware and can debug traffic from virtually any application, including Internet Explorer, Mozilla Firefox, Opera, and thousands more.

neXpert is an add-on to Fiddler which aids in performance testing web applications. neXpert was created to reduce the time it takes to look for performance issues with Fiddler and to create a deliverable that can be used to educate development teams.

HttpWatch is an HTTP viewer and debugger that integrates with IE and Firefox to provide seamless HTTP and HTTPS monitoring without leaving the browser window.

External URL Testing: also consider use of ITCAM for Transactions v7!

Client-side Java Console Tracing: Console, Trace Logging

Java Performance Resources: Java Passion, Java Performance Tuning, Glassbox

TIP, ISC and eWAS tools:

** I am currently investigating options for understanding performance within TIP, ISC and eWAS. These may / may not work, exist or be supported at this time. **

ITCAM for WAS

ITM Agent Configurations (MOSWOS) for TIP: How to build an ITM agent for monitoring TIP, ISC, eWAS?

WebSphere – Tivoli Performance Viewer (TPV) – This is embedded in the v6.x + Administrative Console

This seems like an obvious choice here, but I need to find out how to install it into the TIP, ISC, eWAS environment.

From TPV, you can view current activity and summary reports, or log Performance Monitoring Infrastructure (PMI) performance data. TPV provides a simple viewer for the performance data collected by the Performance Monitoring Infrastructure.

Typical metrics available include:

Average response time: Includes statistics such as servlet or enterprise bean response time. Response time statistics indicate how much time is spent in various parts of WebSphere Application Server and might quickly indicate where the problem is (for example, the servlet or the enterprise beans).

Number of requests (transactions): Enables you to look at how much traffic is processed by WebSphere Application Server, helping you to determine the capacity that you have to manage. As the number of transactions increases, the response time of your system might increase, showing the need for more system resources or the need to retune your system to handle increased traffic.

Number of live HTTP sessions: The number of live HTTP sessions reflects the concurrent usage of your site. The more concurrent live sessions, the more memory is required. As the number of live sessions increases, you might adjust the session time-out values or the Java virtual machine (JVM) heap available.

Web server thread pools: Interpret the Web server thread pools, the Web container thread pools, the Object Request Broker (ORB) thread pools, and the data source or connection pool size together. These thread pools might constrain performance due to their size. The thread pool settings can be too small or too large, either of which causes performance problems. Setting the thread pools too large increases the amount of memory that is needed on a system and might cause too much work to flow downstream if downstream resources cannot handle a high influx of work. Setting thread pools too small might cause bottlenecks even when the downstream resource could handle an increase in workload.

The Web and Enterprise JavaBeans (EJB) thread pools

Database and connection pool size

Java virtual machine (JVM): Use the JVM metric to understand the JVM heap dynamics, including the frequency of garbage collection. This data can assist in setting the optimal heap size. In addition, use the metric to identify potential memory leaks.

CPU, I/O, System Paging: You must observe these system resources to ensure that you have enough system resources, for example, CPU, I/O, and paging, to handle the workload capacity.
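The request count and average response time metrics above can be combined into a rough starting estimate for thread pool sizing using Little’s law: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. A minimal Python sketch follows; the function names, the 1.5x headroom factor and the example numbers are assumptions for illustration, not TBSM or WebSphere recommendations.

```python
import math


def average_busy_threads(requests_per_second, avg_response_seconds):
    """Little's law: mean requests in flight = arrival rate x mean time in system."""
    return requests_per_second * avg_response_seconds


def suggested_pool_size(requests_per_second, avg_response_seconds, headroom=1.5):
    """Starting estimate only: average busy threads plus headroom for bursts.

    The headroom factor is an assumption; validate any value against
    measured behavior before relying on it.
    """
    busy = average_busy_threads(requests_per_second, avg_response_seconds)
    return math.ceil(busy * headroom)


if __name__ == "__main__":
    # Hypothetical numbers: 40 requests/second at 0.25 seconds average response.
    print(average_busy_threads(40, 0.25))
    print(suggested_pool_size(40, 0.25))
```

At 40 requests per second and a 0.25 second average response time, about 10 threads are busy on average, and the sketch suggests 15 with headroom. If the configured pool is far below the average busy count, requests will queue; if it is far above, the extra threads mostly cost memory.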

TopRunner WebSphere Resource Analyzer (TRWRA)

The TRWRA tool helps an SA tune application servers running within WebSphere and identify their performance problems. It can monitor multiple servers on a node or many servers on many nodes in a cluster.

What is TRWRA tool?

The TRWRA tool is written in 100% Java and uses the IBM WebSphere AdminClient (PMI API), version 6.1+ (required). Right now it runs only on Linux or Unix. It can monitor and display the following resources in a terminal window (just like the top command):

• JDBC Connection Pools
• JVM Runtime
• Servlet Session Manager
• Thread Pools
  –> Default
  –> HAManager.thread.pool
  –> Message Listener
  –> Object Request Broker
  –> Process Discovery
  –> SOAPConnectorThreadPool
  –> TCPChannelDCS
  –> WebContainer
• Web Applications
• Transaction Manager

It also logs resource information to a file for later use or for plotting a graph. It can monitor resources on all servers on a node, or all servers on all nodes within a cluster, at the same time. If a server in a node or cluster stops, crashes or errors during the monitoring session, the tool tells you which server on which node has the problem.

sanil

Doug, what a fantastic and informative post. Thanks a million for this. It will help me debug a lot of issues in our current troublesome env and give some focussed direction on how to go about debugging TBSM related issues. The links to third-party web tools are especially interesting!

Wade Dobson

# Simultaneously Logged in and Active Users ~ I have concerns about the number of active users. I have attempted to find the documentation that tells me how to limit them, but for Red Hat 5 all I am seeing is mention of setting the ulimit to limit the number of threads any one process or user can have at one time. I thought there was a way to actually tell TBSM how many active sessions you want to allow?
