Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


WebSphere Application Server Monitoring


There are three main dimensions of WebSphere monitoring:

  1. Monitoring end user response time. This is the external perspective: how the overall Web site performs from the end user's point of view, and how long an end user waits for a response. From this perspective, it is important to understand the load and response time on your site. To monitor at this level, many industry monitoring tools, for example Tivoli Monitoring for Transaction Performance, let you inject and monitor synthetic transactions, which helps you identify when your Web site experiences a problem.

  2. Monitoring overall system health. It is fundamentally important to understand the health of every system involved, including Web servers, application servers, databases, back-end systems, and any other systems critical to running your Web site. If any system has a problem, it might have a ripple effect and cause slow servlet response times. IBM and several other business partners leverage the WebSphere APIs to capture this kind of performance data and incorporate it into an overall 24-by-7 monitoring solution across multiple products. WebSphere Application Server provides Performance Monitoring Infrastructure (PMI) data to help monitor the overall health of the WebSphere Application Server environment. PMI provides average statistics on WebSphere Application Server resources, application resources, and system metrics. Many statistics are available in WebSphere Application Server, and you might want to understand the ones that most directly measure your site in order to detect problems.

  3. Monitoring application flow. This is the application view: a basic monitoring strategy built on understanding how the application flow satisfies the end user request.

JVM heap usage

The JVM™ heap size allocated for the application server should be tuned and the heap usage should be continuously monitored. Verbose Garbage Collection (GC) tracing can be enabled for the application server without any significant overhead. It is a good practice to keep this on for monitoring purposes. The GC overhead should not be more than 5%; if it does go above this amount, the application should be reviewed.
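
As a rough illustration of the 5% guideline, GC overhead can be treated as the share of elapsed wall-clock time spent in collection pauses. The Python sketch below is a minimal example under that assumption; `gc_overhead_percent` and `needs_review` are hypothetical helpers, and the pause durations are made-up sample values rather than output from any real verbose GC trace.

```python
def gc_overhead_percent(pause_ms, interval_ms):
    """Return the percentage of the interval spent in GC pauses."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * sum(pause_ms) / interval_ms


def needs_review(pause_ms, interval_ms, threshold_percent=5.0):
    """True when GC overhead exceeds the guideline threshold."""
    return gc_overhead_percent(pause_ms, interval_ms) > threshold_percent


# Hypothetical sample: pause times (ms) observed over a 60-second window.
sample_pauses_ms = [120.0, 95.5, 210.0, 180.5]
overhead = gc_overhead_percent(sample_pauses_ms, 60_000)
print(f"GC overhead: {overhead:.2f}%")
```

In practice the pause times would be extracted from the verbose GC output with one of the analysis tools; the arithmetic stays the same.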

If the JVM heap runs out of memory, the verbose GC traces and JVM heap dumps should be analyzed. There are a number of tools that can be used to analyze the JVM heap; these are listed in more detail in chapter 4, "Diagnostic tools for WebSphere for z/OS", in WebSphere for z/OS Problem Determination Means and Tools, REDP-6880. Depending on the results of the analysis, the problem may or may not be application related.

High CPU utilization

The CPU utilization of the WebSphere Application Server address space may be high because of looping application code or for some other reason, and this can slow down the rest of the work being processed by the server. System monitors and RMF information should be used to identify the source. If they do not, profiling tools such as Eclipse TPTP can pinpoint the problem to specific applications and their components. If this still does not reveal the cause, an MVS™ console dump of the address space experiencing the high CPU activity should be taken for analysis. It is important to maximize the MVS systrace before taking the console dump.

High response time

The average response time of the application may not meet your goals. If this is the case, you should try to isolate which particular application request or requests are experiencing the delay. You should try to determine whether the delay is in the application request processing or the server processing. If it is determined that it is in the application, you would need to know how much time is spent in each phase of the particular request. There are application profiling tools that can be used to break down the transaction to see how much time each method is taking and this can be reviewed with the application developer to address the response time problem.

If the delay is in getting the application requests picked up by the WebSphere Application Server servant, then the WebSphere Application Server WLM goals should also be reviewed using the RMF Workload Activity reports. The number of threads per servant and number of servants may also be a contributing factor depending on workload. If the number of users of the application drastically increased and the response time went up, it could be that the application needs more resources to handle the new workload.

In WebSphere Application Server for z/OS, the default action is to ABEND the servant address space when a timeout occurs for an application request. The ABEND code is EC3 with a reason code of the form 0413000x. The server automatically takes an SVCDUMP in this case and the dump can be analyzed to determine which thread timed out and what it was doing when the timeout occurred. It also can give information about how long the request was processing in the servant and how long it took to get to the servant from the controller. If it spent most of its time waiting in the WLM queue to be picked up by the servant, then WLM goals should be looked at. If it spent all the time being processed in the servant, then that gives details on what the application is doing and can be further investigated using application profiling tools. Either way, it gives a clear indication of where the time was spent.  
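
The dump analysis described above amounts to splitting the request's elapsed time into WLM queue wait and servant processing time. The Python sketch below shows that split; `classify_delay`, its timestamp parameters, and the "longer phase first" rule are illustrative assumptions, not anything the dump or WebSphere produces directly.

```python
def classify_delay(received_s, dispatched_s, ended_s):
    """Split a request's elapsed time into queue wait and servant processing.

    received_s   -- when the controller accepted the request
    dispatched_s -- when a servant thread picked it up from the WLM queue
    ended_s      -- when the request completed or timed out
    """
    if not received_s <= dispatched_s <= ended_s:
        raise ValueError("timestamps must be in chronological order")
    queue_s = dispatched_s - received_s
    servant_s = ended_s - dispatched_s
    # Assumption: whichever phase took longer is the first place to look.
    suspect = "WLM goals" if queue_s > servant_s else "application code"
    return queue_s, servant_s, suspect


# Hypothetical request that waited 100 s in the queue and ran for 20 s.
print(classify_delay(0.0, 100.0, 120.0))
```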

Monitoring overall system health

Monitoring overall system health is fundamentally important to understanding the health of every system involved in running your site. This includes Web servers, application servers, databases, back-end systems, and any other systems critical to running your Web site.

Before you begin

If any system has a problem, it might cause the "servlet is slow" message to appear. IBM and several other business partners leverage the WebSphere APIs to capture performance data and to incorporate it into an overall 24-by-7 monitoring solution. WebSphere Application Server provides Performance Monitoring Infrastructure (PMI) data to help monitor the overall health of the WebSphere Application Server environment. PMI provides average statistics on WebSphere Application Server resources, application resources, and system metrics. Many statistics are available in WebSphere Application Server, and you might want to understand the ones that most directly measure your site's resources to detect problems.

About this task

To monitor overall system health, monitor the following statistics at a minimum:

Average response time: Include statistics such as servlet or enterprise bean response time. Response time statistics indicate how much time is spent in various parts of WebSphere Application Server and might quickly indicate where the problem is (for example, in the servlet or in the enterprise beans).

Number of requests (transactions): Lets you see how much traffic WebSphere Application Server processes, which helps you determine the capacity that you have to manage. As the number of transactions increases, the response time of your system might increase, showing the need for more system resources or the need to retune your system to handle the increased traffic.

Number of live HTTP sessions: The number of live HTTP sessions reflects the concurrent usage of your site. The more concurrent live sessions, the more memory is required. As the number of live sessions increases, you might adjust the session time-out values or the Java virtual machine (JVM) heap available.

Web server thread pools, Web and Enterprise JavaBeans (EJB) thread pools, database and connection pool size: Interpret the Web server thread pools, the Web container thread pools, the Object Request Broker (ORB) thread pools, and the data source or connection pool size together. These pools might constrain performance because of their size. A pool setting can be too small or too large, and either causes performance problems. Setting the thread pools too large increases the amount of memory that is needed on a system, or might cause too much work to flow downstream if downstream resources cannot handle a high influx of work. Setting thread pools too small might also cause bottlenecks if the downstream resource can handle an increase in workload.

Java virtual machine (JVM) memory: Use the JVM metric to understand the JVM heap dynamics, including the frequency of garbage collection. This data can assist in setting the optimal heap size. In addition, use the metric to identify potential memory leaks.

CPU, I/O, system paging: You must observe these system resources to ensure that you have enough of them to handle the workload capacity.

To monitor several of these statistics, WebSphere Application Server provides the Performance Monitoring Infrastructure to obtain the data, and provides the Tivoli Performance Viewer (TPV) in the administrative console to view this data.
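
The trade-off between pools that are too small and pools that are too large can be made concrete with a simple check on sampled pool usage. The Python sketch below is illustrative only: `assess_pool`, its thresholds (20% of samples at the maximum, peak below 50% of the maximum), and the sample numbers are assumptions, not PMI or TPV features.

```python
def assess_pool(max_size, active_samples, saturated_share=0.2, idle_peak=0.5):
    """Classify a pool as 'possibly too small', 'possibly too large', or 'ok'."""
    if max_size <= 0 or not active_samples:
        raise ValueError("need a positive max_size and at least one sample")
    # Count the samples in which every thread or connection was in use.
    at_max = sum(1 for n in active_samples if n >= max_size)
    if at_max / len(active_samples) >= saturated_share:
        return "possibly too small"
    # A pool whose peak usage stays far below its maximum may be oversized.
    if max(active_samples) / max_size < idle_peak:
        return "possibly too large"
    return "ok"


# Hypothetical samples of active threads for a pool with a maximum of 50.
print(assess_pool(50, [50, 50, 48, 50, 30]))
```

A verdict for a single pool is only a hint. As noted above, the Web server, Web container, ORB, and connection pools should be read together, because enlarging one pool can simply move the bottleneck downstream.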

Procedure

  1. Enable PMI through the administrative console to begin data collection.
  2. Use TPV or third-party performance monitoring and management solutions to monitor performance.
  3. Extend monitoring capabilities by developing your own monitoring applications or extending PMI.

The Performance Monitoring Infrastructure (PMI) uses a client-server architecture.

The server collects performance data from various WebSphere Application Server components. A client retrieves performance data from one or more servers and processes the data. WebSphere Application Server, Version 6 supports the Java™ 2 Platform, Enterprise Edition (J2EE) Management Reference Implementation (JSR-77).

In WebSphere Application Server, Version 6 and later, PMI counters are enabled, based on a monitoring or instrumentation level. The levels are None, Basic, Extended, All, and Custom. These levels are specified in the PMI module XML file. Enabling the module at a given level includes all the counters at the given level plus counters from levels below the given level. So, enabling the module at the extended level enables all the counters at that level plus all the basic level counters as well.
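
The cumulative behaviour of the levels can be sketched in a few lines of Python. This is only a model of the rule described above: `enabled_counters` and the counter names in `example` are hypothetical, and the sketch does not read the real PMI module XML file.

```python
# Levels in increasing order. Custom is left out because it selects
# counters individually rather than by level.
LEVELS = ["none", "basic", "extended", "all"]


def enabled_counters(level, counters_by_level):
    """Return the counters enabled at `level`, including all lower levels."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    enabled = []
    for name in LEVELS[: LEVELS.index(level) + 1]:
        enabled.extend(counters_by_level.get(name, []))
    return enabled


# Hypothetical assignment of counters to levels, for illustration only.
example = {
    "basic": ["servlet_response_time"],
    "extended": ["connection_pool_usage"],
    "all": ["per_method_timing"],
}
print(enabled_counters("extended", example))
```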

JSR-77 defines a set of statistics for J2EE components as part of the StatisticProvider interface. The PMI monitoring level of Basic includes all of the JSR-77 specified statistics. PMI is set to monitor at a Basic level by default.

As shown in the following figure, the server collects PMI data in memory. This data consists of counters such as servlet response time and data connection pool usage. The data points are then retrieved using a Web client, a Java client, or a Java Management Extensions (JMX) client. WebSphere Application Server contains Tivoli Performance Viewer, a Java client that displays and monitors performance data. See the Monitoring performance with Tivoli Performance Viewer (TPV), Third-party performance monitoring and management solutions, and Developing your own monitoring applications topics for more information on monitoring tools.

The figure shows the overall PMI architecture. On the right side, the server updates and keeps PMI data in memory. The left side displays a Web client, a Java client, and a JMX client retrieving the performance data.

Old News ;-)

[Aug 2, 2007] OpenSMART - The Open (SystemSource) Monitoring and Reporting Tool

An extract from the documentation follows (OpenSMART has excellent documentation!)

10.7. Monitoring WebSphere Application Server.

What should you configure in osagent.conf.xml to monitor your WebSphere application server / J2EE application?

10.7.1. Checks for monitoring WebSphere application server

10.7.2. Example configuration for monitoring WebSphere

<!-- ... -->
<PROC>
  <PROCESS>
    <!-- check the node agent (listed as process in ps -ef) -->
    <PROCNAME>/usr/WebSphere/nodeagent/bin/java</PROCNAME>
    <ERRORLEVEL>ERROR</ERRORLEVEL>
    <DESCRIPTION>
      The NodeAgent of the application server isn't running.
      This may be a bad sign - but you can try to restart it with
      /usr/WebSphere/Deployment/bin/startNode.sh
    </DESCRIPTION>
  </PROCESS>
  <PROCESS>
    <!-- check the websphere jvm (listed as process in ps -ef) -->
    <PROCNAME>/usr/WebSphere/appsrv/bin/java</PROCNAME>
    <ERRORLEVEL>ERROR</ERRORLEVEL>
    <DESCRIPTION>
    WebSphere isn't running - look at your SystemOut.log / SystemErr.log
    and restart it - if possible
    </DESCRIPTION>
  </PROCESS>
</PROC>

<SOCKETS>
  <CHECK4SOCKET>
    <!-- You may have to define the right interface definition for
    your system: http port, https port, admin console, servlet container etc. -->
    <INTERFACE>0.0.0.0</INTERFACE>
    <PORT>9080</PORT>
    <ERRORLEVEL>FATAL</ERRORLEVEL>
    <DESCRIPTION>Socket of servlet container doesn't exist - check logfiles</DESCRIPTION>
  </CHECK4SOCKET>
  <!-- More sockets you want to check -->
</SOCKETS>

<WEBAPP>
  <APP2CHECK>
    <!-- Test your servlet container if the response is correct, also report
    the response time (this will be done by default) -->
    <APPNAME>Web-Application-1</APPNAME>
    <HOST>www.myownwebserver.de</HOST>
    <PORT>80</PORT>
    <HTTPRQST>GET / HTTP/1.0</HTTPRQST>
    <TEXT2CHECK>if you can read this, the servlet works correctly</TEXT2CHECK>
    <ERRORLEVEL>ERROR</ERRORLEVEL>
  </APP2CHECK>
  <!-- More webapps you want to check -->
</WEBAPP> 

<LOGS>
  <LOGFILE>
    <LOGFILENAME>/usr/WebSphere/appsrv/logs/SystemOut.log</LOGFILENAME>
    
    <!-- java exceptions -->
    <LOGFILTER>
      <REGEX>.*Exception in thread.*</REGEX>
      <ERRORLEVEL>ERROR</ERRORLEVEL>
    </LOGFILTER>

    <!-- user logged off the app isn't important -->
    <LOGFILTER><REGEX>user.*logged off.*</REGEX></LOGFILTER>
    <LOGFILTER><REGEX>user.*no longer active in.*</REGEX></LOGFILTER>

    <LOGFILTER>
      <REGEX>.*</REGEX><!-- Everything unknown -->
      <PRIORITY>1000</PRIORITY>
      <!-- That is AFTER all default priorities! -->
      <ERRORLEVEL>WARNING</ERRORLEVEL>
    </LOGFILTER>
  </LOGFILE>
  <!-- More logs you want to check -->
</LOGS>
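
The way the log filters above interact can be modelled with a short Python sketch. It rests on assumptions that the extract does not confirm: that filters are tried in ascending priority order, that the default priority is 0, that the first matching filter decides the outcome, and that a filter without an ERRORLEVEL suppresses the line. `FILTERS` and `classify` are hypothetical names, not part of OpenSMART.

```python
import re

# (regex, priority, error level); None as level means "ignore the line".
# Assumption: lower priority numbers are evaluated first, default is 0.
FILTERS = [
    (r".*Exception in thread.*", 0, "ERROR"),
    (r"user.*logged off.*", 0, None),
    (r"user.*no longer active in.*", 0, None),
    (r".*", 1000, "WARNING"),  # catch-all for everything unknown
]


def classify(line, filters=FILTERS):
    """Return the error level of the first matching filter, or None."""
    # sorted() is stable, so equal-priority filters keep their listed order.
    for pattern, _priority, level in sorted(filters, key=lambda f: f[1]):
        if re.search(pattern, line):
            return level
    return None


print(classify("user alice logged off the app"))
```

Under these assumptions the catch-all filter only fires for lines that no earlier filter has claimed, which is why it carries the high priority number.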

 

Recommended Links

IBM - WebSphere Studio Application Monitor - Product Overview

Analyzing WebSphere Application Server Logs Using the WebSphere Studio Log Analysis Tool

Websphere portal webbook

IBM WebSphere® Application Server for Distributed Platforms and z/OS®: An Administrator's Guide

 


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: February 28, 2008