Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Installation of Oracle SGE Execution Host


Introduction

The execution host installation creates the appropriate directory hierarchy required by sge_execd. In some versions of SGE it starts the sge_execd daemon on the execution host. In others it should be done manually.

You can automate installation on multiple execution hosts with the GUI installer: just add as many hosts as you wish and they will be installed one by one in a single batch.

If the prerequisites are met and everything is checked, this allows you to install SGE execution hosts on all, or a substantial part, of the nodes of the cluster.

Before installing an execution host, you first need to install and configure the master.

Installation consists of two major steps: the pre-installation checklist and the installation of the SGE client.

Pre-installation checklist

On the execution host you need to check the following six preconditions:

  1. Register and patch the server

  2. Configure NTP.  Check using ntpdate -u ntp1.firm.com

  3. Update or install java  (usually not needed if version of OS is patched)
  4. Share common directory from the master host via NFS

  5. Create passwordless login from the master host to execution host

  6. Add SGE services to /etc/services

Update or install java

Java is usually already installed, but you still need to verify that. If it is not, install it:

yum install java
Loaded plugins: rhnplugin, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package java-1.6.0-openjdk.x86_64 1:1.6.0.0-1.27.1.10.8.el5_8 set to be updated
--> Processing Dependency: tzdata-java for package: java-1.6.0-openjdk
--> Processing Dependency: libgif.so.4()(64bit) for package: java-1.6.0-openjdk
--> Running transaction check
---> Package giflib.x86_64 0:4.1.3-7.3.3.el5 set to be updated
---> Package tzdata-java.x86_64 0:2012c-1.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================
 Package                     Arch            Version                                Repository                     Size
========================================================================================================================
Installing:
 java-1.6.0-openjdk          x86_64          1:1.6.0.0-1.27.1.10.8.el5_8            rhel-x86_64-server-5           36 M
Installing for dependencies:
 giflib                      x86_64          4.1.3-7.3.3.el5                        rhel-x86_64-server-5           39 k
 tzdata-java                 x86_64          2012c-1.el5                            rhel-x86_64-server-5          181 k

Transaction Summary
========================================================================================================================
Install       3 Package(s)
Upgrade       0 Package(s)

Total download size: 36 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): giflib-4.1.3-7.3.3.el5.x86_64.rpm                                                         |  39 kB     00:00
(2/3): tzdata-java-2012c-1.el5.x86_64.rpm                                                        | 181 kB     00:01
(3/3): java-1.6.0-openjdk-1.6.0.0-1.27.1.10.8.el5_8.x86_64.rpm                                   |  36 MB     01:31
------------------------------------------------------------------------------------------------------------------------
Total                                                                                   339 kB/s |  36 MB     01:50
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : giflib                                                                                           1/3
  Installing     : tzdata-java                                                                                      2/3
  Installing     : java-1.6.0-openjdk                                                                               3/3

Installed:
  java-1.6.0-openjdk.x86_64 1:1.6.0.0-1.27.1.10.8.el5_8

Dependency Installed:
  giflib.x86_64 0:4.1.3-7.3.3.el5                            tzdata-java.x86_64 0:2012c-1.el5 
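
Before running yum it may be worth checking whether a java binary is already on the PATH. The following is only a minimal sketch for a POSIX shell; have_cmd and report_java are illustrative helper names, not part of SGE or RHEL.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command in $PATH.
# (have_cmd and report_java are hypothetical helper names.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_java: print where java was found, or the install command
# used in this article if it is missing.
report_java() {
    if have_cmd java; then
        echo "java found: $(command -v java)"
    else
        echo "java not found; install it with: yum install java"
    fi
}
```

Calling report_java on the execution host prints one line either way, so it can be dropped into a larger pre-installation check without side effects.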

Share common directory from the master host

Most SGE installations share the whole /sge directory from the master host. It should be mounted under the same name on the execution host. See Usage of NFS in Grid Engine.

For large installations you can share less to improve efficiency. If you fail to share at least the $SGE_ROOT/$SGE_CELL/common directory from the qmaster host, you will not be able to install execution hosts on nodes other than the qmaster host.

A sample entry in /etc/exports on the master host:

/sge 10.194.186.254(rw,no_root_squash) 10.194.181.26(rw,no_root_squash) 

After changing /etc/exports you need to restart the NFS daemon on the qmaster host so that it rereads the exports file:

# service nfs restart
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
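
Once the export is active and the directory is mounted on the execution host, you may want to confirm that it really is an NFS mount and not a local directory with the same name. A rough sketch, assuming Linux (it reads /proc/mounts by default); is_nfs_mounted is an illustrative name, and the optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# is_nfs_mounted DIR [MOUNTS_FILE]
# Succeed if DIR appears as a mount point with an nfs* filesystem type.
# MOUNTS_FILE defaults to /proc/mounts (Linux); the fields used are
# $2 = mount point and $3 = filesystem type.
# (is_nfs_mounted is a hypothetical helper name.)
is_nfs_mounted() {
    awk -v dir="$1" '
        $2 == dir && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

On the execution host something like "is_nfs_mounted /sge || echo '/sge is not NFS-mounted'" would then flag a missing mount before the installer is started.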

Create passwordless login from the master host to execution host

Set up passwordless ssh login for root from the master host to the execution host.

Tip: If you have already configured it on another execution host, just copy the authorized_keys file from that host:

cd /root/.ssh
scp sge01:/root/.ssh/authorized_keys  . 

Check ssh access from the master host to the node on which you install the execution host (b5 in the example below):

[0]root@m17: # ssh b5
The authenticity of host 'b5 (10.194.181.46)' can't be established.
RSA key fingerprint is 18:35:6e:96:11:77:27:fc:ac:1c:8e:46:36:2b:ae:2b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'b5,10.194.181.46' (RSA) to the list of known hosts.
Last login: Thu Jul 26 08:29:41 2012 from sge_master.firma.net
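
For more than a couple of nodes the same check can be scripted. The sketch below relies on the standard OpenSSH options BatchMode and ConnectTimeout, so ssh fails instead of prompting for a password; the function names are illustrative. Note that in batch mode ssh also refuses hosts whose key is not yet known, so the first interactive login shown above is still needed once per host.

```shell
#!/bin/sh
# can_ssh_without_password HOST
# Succeed only if ssh can log in and run "true" without any prompt.
# BatchMode=yes disables password/passphrase prompts.
# (Both function names here are hypothetical helpers.)
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_hosts HOST...
# Print one status line per host; return non-zero if any host failed.
check_hosts() {
    rc=0
    for h in "$@"; do
        if can_ssh_without_password "$h"; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
            rc=1
        fi
    done
    return $rc
}
```

Run on the master host as root, for example "check_hosts b5 b6 b7", with the host names replaced by your own execution hosts.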

Add SGE services to /etc/services

Update /etc/services: you need to add the two ports used by SGE.

vi /etc/services

and add the following lines (people typically use the default ports 6444 and 6445, but your setup may differ):

sge_qmaster     6444/tcp                # Grid Engine Qmaster Service
sge_qmaster     6444/udp                # Grid Engine Qmaster Service
sge_execd       6445/tcp                # Grid Engine Execution Service
sge_execd       6445/udp                # Grid Engine Execution Service
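
To see which of these entries are still missing on a given host, something along the following lines can be used. It is only a sketch: it assumes the default ports 6444 and 6445 shown above, and has_service and missing_sge_services are illustrative names.

```shell
#!/bin/sh
# has_service FILE NAME PORT/PROTO
# Succeed if FILE has a line whose first two fields are NAME and PORT/PROTO.
# Commented-out lines do not match because their first field starts with '#'.
# (has_service and missing_sge_services are hypothetical helper names.)
has_service() {
    awk -v name="$2" -v pp="$3" '
        $1 == name && $2 == pp { found = 1 }
        END { exit !found }
    ' "$1"
}

# missing_sge_services [FILE]
# Print every expected SGE entry that is absent (FILE defaults to /etc/services).
# The port numbers are the defaults assumed in this article.
missing_sge_services() {
    file=${1:-/etc/services}
    for entry in "sge_qmaster 6444/tcp" "sge_qmaster 6444/udp" \
                 "sge_execd 6445/tcp" "sge_execd 6445/udp"; do
        has_service "$file" "${entry% *}" "${entry#* }" || echo "$entry"
    done
}
```

An empty output from missing_sge_services means all four lines are already present.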

Installation of SGE client

  1. On the execution host: verify that the SGE directory is NFS-mounted. We assume that you are using an NFS-mounted directory (/sge in this example) and that it is already mounted, as required by the prerequisites:

    cd /sge && ls
    	
  2. On the execution host and the master host: verify the $SGE_ROOT setting for your shell session. If the $SGE_ROOT environment variable is not set, set it by typing:

    # SGE_ROOT=/sge; export SGE_ROOT

    To confirm that you have set the $SGE_ROOT environment variable, type:

    # echo $SGE_ROOT
  3. On the master host: change directory (cd) to the installation directory, $SGE_ROOT:

    cd $SGE_ROOT
  4. On the master host:

    Add the IP address of the execution host to /etc/hosts (RHEL uses the long name as the host name, which is not very convenient for SGE purposes)

  5. On the master host: add the host to the list of execution hosts (this is not strictly necessary)

    qconf -ae

    The -ae option (add execution host) displays an editor that contains a configuration template for an execution host. The editor is either the default vi editor or the editor that corresponds to the EDITOR environment variable.

    In this template you specify the hostname, which should be the name of the execution host you want to configure. In the vi screen, change the name and save the template. See the host_conf(5) man page for a detailed description of the template entries to be changed.

      hostname              template
      load_scaling          NONE
      complex_values        NONE
      user_lists            NONE
      xuser_lists           NONE
      projects              NONE
      xprojects             NONE
      usage_scaling         NONE
      report_variables      NONE
  6. On the master host: export the display to your workstation/PC. An X11 server must be running on your workstation (for example Exceed, if you use a Windows workstation). For example:
    export DISPLAY=10.14.17.7:0; echo $DISPLAY
  7. On the master host: launch the GUI installer by executing the command
    ./start_gui_installer

    That should start the installer in an X session on your workstation/PC.

    Click Next

    Click Next again. You will see the host selection screen

  8. On the execution host: register the execution daemon and start it. Ensure the proper environment after reboot:

    NOTE: You can automate these steps with a small script:

    #!/bin/bash
    #
    # Post install operations for SGE execution host
    #
    # SGE_ROOT must already be set (for example to /sge); abort otherwise
    : "${SGE_ROOT:?SGE_ROOT is not set}"
    . "$SGE_ROOT/default/common/settings.sh"
    
    # Enable sgeexecd.$SGE_CLUSTER_NAME (or whatever your cluster name is) at boot
    chkconfig sgeexecd.$SGE_CLUSTER_NAME on
    
    # On the execution host: start the sge_execd service
    service sgeexecd.$SGE_CLUSTER_NAME start
    
    # Add the necessary command to /etc/profile (only if it is not there yet)
    grep -qF "$SGE_ROOT/default/common/settings.sh" /etc/profile ||
        echo ". $SGE_ROOT/default/common/settings.sh" >> /etc/profile

  9. On the master host: specify a queue for this host. That can be done either by adding the host to an existing queue or by copying an existing queue, renaming it and saving it under a new name.

    To add a new queue using an existing queue as a template, use the following commands:

    1. # qconf -sq c32.q > m40a.q 
    2. Edit the template (vi m40a.q): set qname to the new queue name and change the following parameters as needed
      hostlist              lus 
      processors            32
      slots                 32
      shell                 /bin/bash
      pe_list               ms 
    3. Add the queue from the file under the new name
      qconf -Aq m40a.q 
      root@lus17 added "m40a.q" to cluster queue list

    See Creating and modifying SGE Queues

  10. Verify that the execution host has been declared with the command

    qconf -sel

    which lists all execution hosts.

    You can also use qconf -se <hostname> to see the configured parameters (usually only the hostname is configured). See Configuring Hosts From the Command Line

  11. On the execution host: reboot the execution host and verify that the NFS directory is mounted correctly after the reboot
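
The copy-and-rename of a queue in step 9 can also be done without an editor. This is a rough sketch only: retarget_queue is an illustrative name, and it assumes that the qname and hostlist values in the qconf -sq output each fit on a single line (long host lists that are wrapped with a trailing backslash would need extra handling).

```shell
#!/bin/sh
# retarget_queue TEMPLATE NEWNAME HOSTLIST
# Print the queue configuration from TEMPLATE with the qname and hostlist
# values replaced; all other lines are passed through unchanged.
# Assumption: both values are single-line entries in the template.
# (retarget_queue is a hypothetical helper name.)
retarget_queue() {
    awk -v q="$2" -v h="$3" '
        $1 == "qname"    { printf "%-21s %s\n", "qname", q; next }
        $1 == "hostlist" { printf "%-21s %s\n", "hostlist", h; next }
        { print }
    ' "$1"
}

# Possible usage on the master host (queue and host names from step 9):
#   qconf -sq c32.q > c32.q.txt
#   retarget_queue c32.q.txt m40a.q lus > m40a.q
#   qconf -Aq m40a.q
```

Other parameters such as slots or pe_list are left as they are in the template, so review the generated file before loading it with qconf -Aq.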

Tip: For details about how you can verify that the execution host has been set up correctly, see How to Verify that the Daemons are Running on the Execution Hosts.

  1. Log in to the execution hosts on which you ran the execution host installation procedure.
  2. Verify that the daemons are running by typing one of the following commands, depending on the operating system you are running.
    • On BSD-based UNIX systems, type the following command.
      % ps -ax | grep sge
    • On systems running a UNIX System V-based operating system (such as the Solaris Operating System), type the following command.
      % ps -ef | grep sge
  3. Verify that the sge_execd daemon is running by looking for the sge_execd string in the output.

  4. If you do not see sge_execd in the output, the daemon required on the execution host is not running. Restart the daemon by hand. For example, on Linux you can use the service command:
    /sbin/service sgeexecd.p6444 start
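
If you prefer a check that can be used from a script instead of reading the ps output by eye, the following sketch may help. It uses the POSIX form "ps -e -o comm=" so that only command names are listed (and the grep process itself cannot produce a false match); the function names are illustrative.

```shell
#!/bin/sh
# lists_execd: read process names, one per line, on stdin and succeed
# if one of them is sge_execd (with or without a leading path).
# (lists_execd and execd_running are hypothetical helper names.)
lists_execd() {
    grep -Eq '(^|/)sge_execd$'
}

# execd_running: succeed if an sge_execd process exists on this host.
# "ps -e -o comm=" prints just the command name of every process.
execd_running() {
    ps -e -o comm= | lists_execd
}
```

For example, "execd_running || /sbin/service sgeexecd.p6444 start" would start the daemon only when it is not already running; the service name is the one from the example above and depends on your cluster name.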
