An executing HPSS server should be monitored to ensure that it remains running to provide services to HPSS. SSM monitors each executable server and informs the SSM user of the server's abnormal termination or error conditions. The subsections in this section describe how to monitor a server's state and status and how to monitor a server's managed objects to diagnose and resolve problems.
As soon as a server is configured, SSM starts to monitor its execution status. If the server is up and running, SSM establishes a connection with the server. SSM reports the server execution and connection status in the Status field of the HPSS Servers window. The reported status will be one of the following:
CONNECTED. Server is up and running and communicating normally with SSM.
UP/UNCONNECTED. Server is up and running (according to the Startup Daemon) but SSM cannot connect to it. Server cannot be completely controlled and monitored through SSM.
DOWN. Server is down. SSM can be used to start up the server.
INDETERMINATE. The server's state cannot be determined by SSM and the Startup Daemon is either not running or not connected to SSM.
CHECK CONFIG. SSM detected an incomplete or inconsistent configuration for the server. The server's configuration should be carefully reviewed to see that it is correct and complete.
NOT EXECUTABLE. The server is configured as non-executable, so SSM will not monitor its status. (Note that the server may already be running.)
In addition to the status values above, the Status field reports one of the following transient statuses as the result of a user request on the server:
STARTING... The server is being started.
STOPPING... The server is being shut down.
HALTING... The server is being halted.
REINITING... The server is reinitializing.
CONNECTING... SSM is trying to establish a connection to the server.
REPAIRING... The server is repairing its states and statuses.
A server that is configured to execute and is running should have a CONNECTED status. If its status is anything other than CONNECTED (excluding the transient status values), the following actions should be taken:
1. The server status is UP/UNCONNECTED. Monitor the server status closely for a few minutes. SSM will periodically try to establish a connection with the server. The Force Connect feature described in Section 1.14 can be used to speed up the process.
2. The server status is DOWN. Use the server startup feature described in Section 1.6 to start up the server. The Startup Daemon will ensure that only one instance of the server is executing.
3. The server status is INDETERMINATE. Verify whether the server is running. If the server is not running, start it up. If the server is running, ensure that the Startup Daemon configured for the same node is running and has a connection to SSM. If the Startup Daemon is not running, start it up using the /etc/rc.hpss script. Otherwise, use the Force Connect operation described in Section 1.14 to establish the connections for the server and the Startup Daemon. If this does not correct the server's status, review the HPSS Alarms and Events window for problems that the server and SSM may have reported. In addition, delog the HPSS logs and examine the server's and SSM's log messages to help determine the problem.
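The actions above can be summarized as a small lookup. The following is only an illustrative sketch in Python: SSM is operated through its windows, and the function name next_action and its two parameters are hypothetical, not part of any SSM interface. The entries for CHECK CONFIG and NOT EXECUTABLE restate the status descriptions given earlier.

```python
# Hypothetical model of the triage steps above; this is not an SSM API.
TRANSIENT = {
    "STARTING...", "STOPPING...", "HALTING...",
    "REINITING...", "CONNECTING...", "REPAIRING...",
}


def next_action(status, server_running=None, daemon_running=None):
    """Return the operator action suggested for a Status field value.

    server_running and daemon_running are consulted only for INDETERMINATE;
    None means the operator has not checked yet.
    """
    if status in TRANSIENT:
        return "wait for the request to complete"
    if status == "CONNECTED":
        return "no action needed"
    if status == "UP/UNCONNECTED":
        return "watch for a few minutes or use Force Connect"
    if status == "DOWN":
        return "start the server from SSM"
    if status == "CHECK CONFIG":
        return "review the server configuration"
    if status == "NOT EXECUTABLE":
        return "no action needed; SSM does not monitor this server"
    if status == "INDETERMINATE":
        if server_running is None:
            return "verify whether the server is running"
        if not server_running:
            return "start the server"
        if daemon_running is None:
            return "verify that the Startup Daemon is running"
        if not daemon_running:
            return "start the Startup Daemon with /etc/rc.hpss"
        return "use Force Connect, then check alarms and logs if unresolved"
    raise ValueError(f"unrecognized status: {status!r}")
```

For example, next_action("INDETERMINATE", server_running=True, daemon_running=False) returns the step that restarts the Startup Daemon, which matches the order of checks in item 3.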
SSM will continuously monitor the servers' execution and connection status and update the Status fields when there are any changes.
If a server is configured to execute and is not running, SSM will report it as an error. Therefore, if a server is not intended to run for an extended period, its Executable flag should be set to OFF. SSM will then stop monitoring the server and will not report the server-not-running condition as a critical error. This will also help reduce the workload for the SSM servers.
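The effect of the Executable flag on error reporting can be pictured with a short Python sketch. The Server record, its field names, and the server names below are hypothetical and serve only to illustrate the rule: a server is reported as an error only when it is marked executable and is not running.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record; the fields mirror the concepts in the text,
    # not an actual SSM data structure.
    name: str
    executable: bool  # the Executable flag (True = ON)
    running: bool


def not_running_errors(servers):
    """Names of servers that would be reported as not running.

    Servers whose Executable flag is OFF are skipped, since SSM
    does not monitor them.
    """
    return [s.name for s in servers if s.executable and not s.running]


servers = [
    Server("server_a", executable=True, running=True),
    Server("server_b", executable=True, running=False),
    Server("server_c", executable=False, running=False),
]
flagged = not_running_errors(servers)
```

Here only server_b is flagged; server_c is down as well, but its Executable flag is OFF, so it is not reported.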
An HPSS server that is running and connected to SSM will report any error conditions that it is experiencing by setting its operational state to an error value. SSM reports this information in the OpState field on the HPSS Servers window as follows:
ENABLED. The server operates normally.
DISABLED. The server is not operational, usually due to a shutdown/halt request.
SUSPECT. The server may have a problem.
MINOR. The server encountered a problem that does not seriously affect the HPSS operation.
MAJOR. The server encountered a problem that may seriously impact the overall HPSS operation.
BROKEN. The server encountered a critical error and shut itself down.
UNKNOWN. The server's state is unknown to SSM. SSM should be able to obtain an Operational State for the server but cannot because SSM cannot communicate with it, and the server has not set its state to DISABLED or BROKEN. The server may or may not be running. The reason may simply be that the server is marked executable but is not running.
INVALID. SSM has obtained an Operational State from the server, but its value is not recognized as a valid one. This may indicate a bug in the server or a communications problem which is garbling data.
NONE. The server does not have an Operational State (because the server is not executable, for example), or its Operational State is, by design, unobtainable (as for certain server types which do not report their states to SSM).
SSM will continuously monitor the server's operational state and update the OpState field in the HPSS Servers window when there are any changes. When this field indicates an error, select the entry of the server with the error and click on the Server Info... button to view the server's managed object window for a possible indication of the error, provided that SSM can communicate with the server.
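When several servers show an error in the OpState field, it can help to decide which one to inspect first. The Python sketch below is one way to picture that. The severity ranking (SUSPECT lowest, BROKEN highest) is an assumption inferred from the state descriptions above, and the function and server names are hypothetical. DISABLED, UNKNOWN, INVALID and NONE are left out because the text does not describe them as error values set by the server.

```python
# Assumed ranking of the error OpState values; higher means more severe.
SEVERITY = {"SUSPECT": 1, "MINOR": 2, "MAJOR": 3, "BROKEN": 4}


def servers_to_inspect(opstates):
    """Return server names with an error OpState, most severe first.

    opstates maps a server name to its OpState string. Ties are
    broken alphabetically by server name.
    """
    flagged = [(SEVERITY[state], name)
               for name, state in opstates.items()
               if state in SEVERITY]
    flagged.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in flagged]


order = servers_to_inspect({
    "server_a": "ENABLED",
    "server_b": "MINOR",
    "server_c": "BROKEN",
    "server_d": "MAJOR",
})
```

With these inputs, order lists server_c, then server_d, then server_b; server_a is ENABLED and is omitted.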
A server that is running and connected to SSM will allow the SSM user to view and update its information. This section describes the server execution statuses and configuration information. Other information maintained by the servers is described in Chapter 7, Monitoring HPSS Information.
A typical HPSS server allows the SSM users to control its execution and monitor its server-related data through the Basic Server Information window and the server specific information windows. These windows are described in the following subsections.
Most HPSS servers allow the SSM user to view the server's states and statuses and to change its Administrative State to lock its service, shut it down, or repair its states and statuses.
Under normal conditions, the server states and statuses reported on the Server Information window are as follows:
Administrative State: Unlocked
Communication Status: Normal
However, when the server is experiencing errors or encountering abnormal conditions, it will change the appropriate states and statuses to error values, notify SSM of the changes, and issue an alarm to SSM. Refer to Section 1.16.2 for more information on how to track a problem using the HPSS alarms and the HPSS logging services.
From the HPSS Servers window (Figure 1-1), select the appropriate server entry and then click on the Basic... button in the Server Info button group. The Basic Server Information window is displayed as shown in Figure 1-3. Refer to the window's help file for more information on the individual fields as well as the supported operations available from the window.
The Startup Daemon and the SSM System Manager do not have their own Basic Server Information windows.
The Basic Server Information window will not be available for viewing if the server's operational state is BROKEN, NOT ENABLED, or UNKNOWN (i.e., the Basic Server Information window is only available if the operational state is MAJOR or MINOR).
Figure 1-3 Server Information Window
The majority of the HPSS servers also allow the SSM user to view and change their server specific data through the SSM windows. A server specific information window displays all the essential data maintained by the server. A typical server allows authorized users to change the value of the fields and use them immediately in the current execution.
There are server specific information windows for the following servers:
To display a server specific information window, from the Basic Server Information window (Figure 1-3), click on the Show Type-specific Info button. The server specific information window will be displayed. This window can also be brought up from the HPSS Servers window by selecting the appropriate server entry and then clicking on the Type-specific... button in the Server Info button group. An example of one of these windows is shown in Figure 1-4. A unique window is used to display the data for each type of HPSS server; Figure 1-4 shows the Name Server Information window. Refer to the window's help file for more information on the individual fields as well as the supported operations available from the window.
Any changes made on a server's information window will be effective only during the server's current execution. To make the changes permanent, make the identical changes on the server's configuration window (Chapter 6 in the HPSS Installation Guide).
If the server's operational state is BROKEN or UNKNOWN, the server specific information window will not be available for viewing.
Figure 1-4 Name Server Information Window