Softpanorama


Creating and modifying SGE Queues


Introduction

A queue is a container for a class of jobs. It operates in conjunction with a parallel environment (or a set of parallel environments). Each queue has a name (qname) and a list of execution hosts (hostlist). The hosts in the list must be separated by spaces, not commas. The queue also defines how many slots are available on each host that belongs to it. Slots are treated within each host as a consumable resource: once a job occupies slots on a host, the next job is scheduled to a different host unless the first host still has enough free slots for it.

SGE creates one default queue during installation, all.q. Other queues need to be created manually. Frequently used operations include showing, modifying and deleting queues, and excluding a queue from scheduling.

Most important options of qconf

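The command qconf is used to configure and modify the parameters of each queue. Among the most important options:

Show: qconf -sql (list all queues), qconf -sq queue_name (show one queue)

Add: qconf -aq [queue_name] (from the default template, in an editor), qconf -Aq fname (from a file)

Modify: qconf -mq queue_name (in an editor), qconf -Mq fname (from a file)

Delete: qconf -dq queue_name,...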

Files:

   $SGE_ROOT/$SGE_CELL/common/act_qmaster         Grid Engine master host file
   $SGE_ROOT/$SGE_CELL/spool/qmaster/cqueues/    Queues Configuration Directory

 

Number of slots per host

The key parameter of a queue is the number of slots per host.

When you submit a job or script with the qsub command, SGE checks the list of requirements for the job, such as the number of cores requested, the amount of memory needed, the run time, etc. It then tries to find all the machines that meet these requirements and, if there are more than necessary to run the job, removes heavily loaded machines from that list.

For example, if machine m32 with 32 cores has a small job running on 16 cores and server m40 with 40 cores has a larger job running on 28 cores, then the next job that requires 12 cores will be sent to m32, despite the fact that placing it on m40 would "pack" the 40-core server better. If the job after that requires 16 cores, it will enter the wait state, because m32 now has only 4 free cores and m40 only 12.
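The selection in this example can be mimicked with a few lines of awk. This is a deliberately simplified model, not SGE's actual scheduler: it assumes that "load" is just used cores divided by total cores, and that the only requirement is the number of free cores. The function name pick_host and the "host total used" input format are hypothetical.

```shell
# pick_host NEED
# Reads lines of the form "host total_cores used_cores" on stdin and prints
# the least-loaded host that still has NEED free cores, or "wait" if none does.
# Simplified model for illustration only; the real scheduler weighs more factors.
pick_host() {
    awk -v need="$1" '
        {
            free = $2 - $3          # cores still available on this host
            load = $3 / $2          # fraction of cores already in use
            if (free >= need && (best == "" || load < best_load)) {
                best = $1
                best_load = load
            }
        }
        END { print (best == "" ? "wait" : best) }
    '
}

# The 12-core job: both hosts have room, m32 is less loaded.
printf 'm32 32 16\nm40 40 28\n' | pick_host 12
# The following 16-core job: m32 now has 4 free cores, m40 has 12.
printf 'm32 32 28\nm40 40 28\n' | pick_host 16
```

The first call prints m32 and the second prints wait, matching the narrative above.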

Additional parameters that SGE can take into account during scheduling of a job include but are not limited to:

Every parameter in a cluster queue definition can be redefined on a per-host basis. For example, the slots parameter can be defined per host, in which case it sets the maximum number of slots that the queue can use simultaneously on that host. It is typically set to the number of cores on the execution host.

NOTE: if you want to limit the number of cores used on each host in the queue, it is better to set this limit in the PE, not in the slots parameter of the queue. In the slots parameter it is prudent to always specify the real number of cores for each host.
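Per-host overrides appear in the queue definition in the form shown in the qconf -sq output on this page, e.g. slots 1,[b08=4],[b09=4]: a queue-wide default followed by bracketed host=value pairs. A minimal sketch of how such a value resolves for one host, assuming the whole value is on a single line and the overrides name plain hosts (not host groups); the function name slots_for_host is hypothetical:

```shell
# slots_for_host SLOTS_VALUE HOST
# Prints the slot count that applies to HOST: the per-host override if one
# exists, otherwise the queue-wide default (the first comma-separated item).
slots_for_host() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -v host="$2" '
        NR == 1 { slots = $0; next }       # queue-wide default
        {
            gsub(/\[|\]/, "")              # strip the surrounding brackets
            split($0, kv, "=")             # kv[1] = host, kv[2] = slots
            if (kv[1] == host) slots = kv[2]
        }
        END { print slots }
    '
}

slots_for_host '1,[b08=4],[b09=4]' b08    # host with an override
slots_for_host '1,[b08=4],[b09=4]' b10    # host that falls back to the default
```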

Queues in SGE are autonomous entities: the danger of oversubscription

Queues in SGE are autonomous entities and do not communicate with each other about which hosts they use. This might be considered a design fault, as you often need several different queues whose jobs must block each other on the same hosts.

If you have two different queues with intersecting sets of hosts, scheduling to the same host is blocked only on the basis of host load: if the load on the host is above the threshold defined in the load_thresholds attribute of the queue, no new jobs are placed on the host (see Some interesting SGE queue params).

Mutual blocking of hosts between two or more queues can be achieved using Resource Quota Sets (RQS).

So generally the sets of hosts for different queues should be disjoint. Overlapping queues are possible if you run computational tasks and can determine that a host is busy from its average load. The np_load_avg threshold in load_thresholds, which is set to 1.75 by default, can be lowered to, say, 0.33 to prevent oversubscription of nodes. There is also another, more complex, way of preventing oversubscription when two queues share the same hosts.
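Before letting two queues overlap, it helps to know which hosts they actually share. A minimal sketch, assuming both hostlists are plain space-separated host names as they appear in the hostlist attribute (host groups such as @allhosts are not expanded here); the function name shared_hosts and the sample host names in the second list are hypothetical:

```shell
# shared_hosts "LIST_A" "LIST_B"
# Prints, one per line, every host that occurs in both space-separated lists.
shared_hosts() {
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                echo "$a"
            fi
        done
    done
}

# Two hostlists that overlap on b53 and b55.
shared_hosts 'b52 b53 b55 b56' 'b53 b55 b60'
```

An empty result means the queues are disjoint and cannot oversubscribe each other's hosts.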

An SGE "job" is simply a Unix shell script that executes on a remote host. You need to create the environment for running the script yourself, as by default SGE does not replicate the environment from the submission host to the execution host. This is typically done in SGE Submit Scripts.
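A minimal sketch of such a script is shown here. Lines starting with #$ are qsub directives embedded in the script; the job name demo_job and the application directory /opt/myapp are hypothetical placeholders, and NSLOTS is the variable SGE sets to the number of slots granted to the job.

```shell
#!/bin/bash
#$ -N demo_job
#$ -cwd
#$ -q all.q
#$ -S /bin/bash

# The submission environment is not copied by default, so set up
# everything the job needs explicitly.
APP_HOME="${APP_HOME:-/opt/myapp}"        # hypothetical install location
export PATH="$APP_HOME/bin:$PATH"
export OMP_NUM_THREADS="${NSLOTS:-1}"     # fall back to 1 outside of SGE

echo "running on $(uname -n) with ${OMP_NUM_THREADS} thread(s)"
```

Because the directives are ordinary comments to the shell, the same file can be run by hand for a quick check before it is submitted with qsub.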

Typical operations

 

Listing all existing queues

To list all configured queues use the qconf -sql command, e.g.:

qconf -sql

all.q
c12.q
c32.q
m12a.q
m40a.q
If you don't use a database for spooling, queue definitions are stored as plain files:
qconf -sql
all.q
long.q
short.q
ls -lA $SGE_ROOT/$SGE_CELL/spool/qmaster/cqueues/
-rw-r--r--  1 sgeadmin sgeadmin 1324 Apr 26 16:26 all.q
-rw-r--r--  1 sgeadmin sgeadmin 1331 Apr 26 17:01 background.q
-rw-r--r--  1 sgeadmin sgeadmin 1333 Apr 26 17:01 long.q

If you use a database, the data are in the $SGE_ROOT/$SGE_CELL/spool/qmaster/spooldb database.

If there is a need to disable a particular queue for some reason, e.g. when scheduling a node for maintenance, use qmod -d Q, where Q is the queue name. You need SGE manager privileges, such as those of the root account, in order to disable a queue. You can also use wildcards to select a particular range of queues.

You can get info about individual queues using the command

qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi
rerun                 FALSE
slots                 1,[b08=4],[b09=4]
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Adding a new queue

You can add a new queue using two methods:

Using an existing queue as a template

To add a new queue using an existing queue as a template:

  1. Save the definition of the existing queue to a file:
    # qconf -sq c32.q > m40a.q
  2. Edit the file. Change qname to the name of the new queue, then change the parameters that differ (typically hostlist, processors, slots, shell and pe_list):
    vi m40a.q
    hostlist b52 b53 b55 b56
    processors 32
    slots 32
    shell /bin/bash
    pe_list               ms
  3. Register the edited file as a new queue:
    qconf -Aq m40a.q
    root@lus17 added "m40a.q" to cluster queue list
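The editing step can also be scripted instead of done in vi. A minimal sketch, assuming the template is in the two-column format printed by qconf -sq and that only qname and hostlist need to change; the function name clone_queue_def is hypothetical:

```shell
# clone_queue_def NEW_QNAME "NEW_HOSTLIST"
# Reads a queue definition on stdin and writes it to stdout with the qname
# and hostlist lines replaced; every other line is passed through unchanged.
clone_queue_def() {
    awk -v qname="$1" -v hosts="$2" '
        $1 == "qname"    { printf "%-21s %s\n", "qname", qname; next }
        $1 == "hostlist" { printf "%-21s %s\n", "hostlist", hosts; next }
        { print }
    '
}

# Intended use on a live cluster (not run here):
#   qconf -sq c32.q | clone_queue_def m40a.q "b52 b53 b55 b56" > m40a.q
#   qconf -Aq m40a.q
printf 'qname c32.q\nhostlist b1 b2\nslots 32\n' | clone_queue_def m40a.q 'b52 b53 b55 b56'
```

Other attributes such as slots or pe_list could be handled with additional rules of the same shape.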

Using a default template

Above we used the option -Aq fname of the qconf command, which reads the content of the file fname and registers it as a new queue definition.

To add a new queue using default template use the command qconf -aq your_new_queue_name

qconf retrieves the default queue configuration (see queue_conf(5)) and executes vi (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the queue configuration.

Upon exit from the editor, the queue is registered with sge_qmaster(8). A minimal configuration requires only that the queue name and queue hostlist be set. Requires root or manager privileges.

Generally you need to change a few parameters in the template, typically hostlist, processors, slots, shell and pe_list.

The pe_list parameter is especially important, as it specifies the parallel environment.

You can compare two queues by first writing them into files (if you store parameters in a database they are not readable as plain files) and then using the diff command:

Example:

qconf -sq all.q > /tmp/all.q
qconf -sq long.q > /tmp/long.q
diff /tmp/all.q /tmp/long.q
1c1
< qname              all.q
---
> qname              long.q
13c13
< pe_list            make mpi
---
> pe_list            mpi
15c15
< slots              1,[merlin08=4],[merlin09=4]
---
> slots              4
35,36c35,36
< s_rt               INFINITY
< h_rt               INFINITY
---
> s_rt               12:00:00
> h_rt               12:30:00

You can extract selected fields:

qconf -sq background.q | egrep '(_rt|_cpu)'
min_cpu_interval      00:05:00
s_rt                  300:00:00
h_rt                  300:30:00
s_cpu                 INFINITY
h_cpu                 INFINITY
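When a single value is needed in a script, for example to check a run-time limit, the attribute name can be matched exactly rather than by pattern. A minimal sketch that reads qconf -sq output on stdin; the function name queue_attr is hypothetical:

```shell
# queue_attr NAME
# Reads "name value..." lines (the format printed by qconf -sq) on stdin
# and prints only the value of the attribute called NAME.
queue_attr() {
    awk -v key="$1" '
        $1 == key {
            $1 = ""              # drop the attribute name
            sub(/^ +/, "")       # and the padding that followed it
            print
            exit
        }
    '
}

# Intended use on a live cluster (not run here):
#   qconf -sq background.q | queue_attr h_rt
printf 'qname   long.q\nqtype   BATCH INTERACTIVE\nh_rt    12:30:00\n' | queue_attr h_rt
```

Multi-word values such as qtype come back with their words separated by single spaces.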

When no database is used for spooling, the queue configurations are stored as text files in the directory $SGE_ROOT/$SGE_CELL/spool/qmaster/cqueues/, e.g.:

cat /sge/default/spool/qmaster/cqueues/long.q
qname              long.q
hostlist           @allhosts
seq_no             0
load_thresholds    np_load_avg=1.75
suspend_thresholds NONE
nsuspend           1
suspend_interval   00:05:00
priority           0
min_cpu_interval   00:05:00
processors         UNDEFINED
qtype              BATCH INTERACTIVE
ckpt_list          NONE
pe_list            mpi
rerun              FALSE
slots              4
tmpdir             /tmp
shell              /bin/bash
prolog             NONE
epilog             NONE
shell_start_mode   unix_behavior
starter_method     NONE
suspend_method     NONE
resume_method      NONE
terminate_method   NONE
notify             00:00:60
owner_list         NONE
user_lists         NONE
xuser_lists        NONE
subordinate_list   NONE
complex_values     NONE
projects           NONE
xprojects          NONE
calendar           NONE
initial_state      default
s_rt               12:00:00
h_rt               12:30:00
s_cpu              INFINITY
h_cpu              INFINITY
s_fsize            INFINITY
h_fsize            INFINITY
s_data             INFINITY
h_data             INFINITY
s_stack            INFINITY
h_stack            INFINITY
s_core             INFINITY
h_core             INFINITY
s_rss              INFINITY
h_rss              INFINITY
s_vmem             INFINITY
h_vmem             INFINITY

Modifying queue parameters

Modifications to a queue configuration do not affect an active queue, taking effect on next invocation of the queue (i.e., the next job).

qconf -mq queue_name 

This command retrieves the current configuration for the specified queue, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with the sge_qmaster(8). Refer to queue_conf(5) for details on the queue configuration format.

Requires root or manager privilege.

Removing a queue

qconf -dq queue_name

Removes the specified queue(s). Active jobs will be allowed to run to completion. Requires root or manager privileges. 

 List of all options

Unless denoted otherwise, the following options and the corresponding operations are available to all users with a valid account.

-Aattr obj_spec fname obj_instance,...
<add to object attributes>
Similar to -aattr (see below) but takes specifications for the object attributes to be enhanced from file named fname. As opposed to -aattr, multiple attributes can be enhanced. Their specification has to be enlisted in fname following the file format of the corresponding object (see queue_conf(5) for the queue, for example).
Requires root/manager privileges.
-Acal fname
<add calendar>
Adds a new calendar definition to the Grid Engine environment. Calendars are used in Grid Engine for defining availability and unavailability schedules of queues. The format of a calendar definition is described in calendar_conf(5).

The calendar definition is taken from the file fname. Requires root/ manager privileges.

-Ackpt fname
<add ckpt. environment>
Add the checkpointing environment as defined in fname (see checkpoint(5)) to the list of supported checkpointing environments. Requires root or manager privileges.
-Aconf file_list
<add configurations>
Add the configurations (see ge_conf(5)) specified in the files enlisted in the comma separated file_list. The configuration is added for the host that is identical to the file name.
Requires root or manager privileges.
-Ae fname
<add execution host>
Add the execution host defined in fname to the Grid Engine cluster. The format of the execution host specification is described in host_conf(5). Requires root or manager privileges.
-Ahgrp file
<add host group config>
Add the host group configuration defined in file. The file format of file must comply to the format specified in hostgroup(5). Requires root or manager privileges.
-Arqs fname
<add RQS configuration>
Add the resource quota set (RQS) defined in the file named fname to the Grid Engine cluster. Requires root or manager privileges.
-Ap fname
<add PE configuration>
Add the parallel environment (PE) defined in fname to the Grid Engine cluster. Requires root or manager privileges.
-Aprj fname
<add new project>
Adds the project description defined in fname to the list of registered projects (see project(5)). Requires root or manager privileges.
-Aq fname
<add new queue>
Add the queue defined in fname to the Grid Engine cluster. Requires root or manager privileges.
-Au fname
<add an ACL>
Add the user access list (ACL) defined in fname to Grid Engine. User lists are used for queue usage authentication. Requires root/manager/operator privileges.
-cb

This parameter can be used since Grid Engine version 6.2u5 in combination with the command line switch -sep. In that case the output of the corresponding command will contain information about the added job to core binding functionality.

If -cb switch is not used then -sep will behave as in GE version 6.2u4 and below.

Please note that this command-line switch will be removed from Grid Engine with the next major release.

-Dattr obj_spec fname obj_instance,...

<del. from object attribs>
Similar to -dattr (see below) but the definition of the list attributes from which entries are to be deleted is contained in the file named fname. As opposed to -dattr, multiple attributes can be modified. Their specification has to be enlisted in fname following the file format of the corresponding object (see queue_conf(5) for the queue, for example).
Requires root/manager privileges.

-Mattr obj_spec fname obj_instance,...
<mod. object attributes>
Similar to -mattr (see below) but takes specifications for the object attributes to be modified from file named fname. As opposed to -mattr, multiple attributes can be modified. Their specification has to be enlisted in fname following the file format of the corresponding object (see queue_conf(5) for the queue, for example).
Requires root/manager privileges.
-Mc fname
<modify complex>
Overwrites the complex configuration by the contents of fname. The argument file must comply to the format specified in complex(5). Requires root or manager privilege.
-Mcal fname
<modify calendar>
Overwrites the calendar definition as specified in fname. The argument file must comply to the format described in calendar_conf(5). Requires root or manager privilege.
-Mckpt fname
<modify ckpt. environment>
Overwrite an existing checkpointing environment with the definitions in fname (see checkpoint(5)). The name attribute in fname has to match an existing checkpointing environment. Requires root or manager privileges.
-Mconf file_list
<modify configurations>
Modify the configurations (see ge_conf(5)) specified in the files enlisted in the comma separated file_list. The configuration is modified for the host that is identical to the file name.
Requires root or manager privileges.
-Me fname
<modify execution host>
Overwrites the execution host configuration for the specified host with the contents of fname, which must comply to the format defined in host_conf(5). Requires root or manager privilege.
-Mhgrp file
<modify host group config.>
Allows changing of host group configuration with a single command. All host group configuration entries contained in file will be applied. Configuration entries not contained in file will be deleted. The file format of file must comply to the format specified in hostgroup(5).
-Mrqs fname [rqs_name]
<modify RQS configuration>
Same as -mrqs (see below) but instead of invoking an editor to modify the RQS configuration, the file fname is considered to contain a changed configuration. The name of the rule set in fname must be the same as rqs_name. If rqs_name is not set, all rule sets are overwritten by the rule sets in fname. Refer to ge_resource_quota(5) for details on the RQS configuration format. Requires root or manager privilege.
-Mp fname
<modify PE configuration>
Same as -mp (see below) but instead of invoking an editor to modify the PE configuration the file fname is considered to contain a changed configuration. Refer to ge_pe(5) for details on the PE configuration format. Requires root or manager privilege.
-Mprj fname
<modify project config.>
Same as -mprj (see below) but instead of invoking an editor to modify the project configuration the file fname is considered to contain a changed configuration. Refer to project(5) for details on the project configuration format. Requires root or manager privilege.
-Mq fname
<modify queue configuration>
Same as -mq (see below) but instead of invoking an editor to modify the queue configuration the file fname is considered to contain a changed configuration. Refer to queue_conf(5) for details on the queue configuration format. Requires root or manager privilege.
-Msconf fname
<modify scheduler configuration from file>
The current scheduler configuration (see sched_conf(5)) is overridden with the configuration specified in the file. Requires root or manager privilege.
-Mstree fname
<modify share tree>
Modifies the definition of the share tree (see share_tree(5)). The modified sharetree is read from file fname. Requires root or manager privileges.
-Mu fname
<modify ACL>
Takes the user access list (ACL) defined in fname to overwrite any existing ACL with the same name. See access_list(5) for information on the ACL configuration format. Requires root or manager privilege.
-Muser fname
<modify user>
Modify the user defined in fname in the Grid Engine cluster. The format of the user specification is described in user(5). Requires root or manager privileges.
-Rattr obj_spec fname obj_instance,...
<replace object attribs>
Similar to -rattr (see below) but the definition of the list attributes whose content is to be replace is contained in the file named fname. As opposed to -rattr, multiple attributes can be modified. Their specification has to be enlisted in fname following the file format of the corresponding object (see queue_conf(5) for the queue, for example).
Requires root/manager privileges.
-aattr obj_spec attr_name val obj_instance,...
<add to object attributes>
Allows adding specifications to a single configuration list attribute in multiple instances of an object with a single command. Currently supported objects are the queue, the host, the host group, the parallel environment, the resource quota sets and the checkpointing interface configuration being specified as queue , exechost , hostgroup , pe , rqs or ckpt in obj_spec. For the obj_spec queue the obj_instance can be a cluster queue name, a queue domain name or a queue instance name. Find more information concerning different queue names in sge_types(1). Depending on the type of the obj_instance this adds to the cluster queues attribute sublist the cluster queues implicit default configuration value or the queue domain configuration value or queue instance configuration value. The queue load_thresholds parameter is an example of a list attribute. With the -aattr option, entries can be added to such lists, while they can be deleted with -dattr, modified with -mattr, and replaced with -rattr.
For the obj_spec rqs the obj_instance is a unique identifier for a specific rule. The identifier consists of a rule-set name and either the number of the rule in the list, or the name of the rule, separated by a /
The name of the configuration attribute to be enhanced is specified with attr_name followed by val as a name=value pair. The comma separated list of object instances (e.g., the list of queues) to which the changes have to be applied are specified at the end of the command.
The following restriction applies: For the exechost object the load_values attribute cannot be modified (see host_conf(5)).
Requires root or manager privileges.
-acal calendar_name
<add calendar>
Adds a new calendar definition to the Grid Engine environment. Calendars are used in Grid Engine for defining availability and unavailability schedules of queues. The format of a calendar definition is described in calendar_conf(5).

With the calendar name given in the option argument qconf will open a temporary file and start up the text editor indicated by the environment variable EDITOR (default editor is vi(1) if EDITOR is not set). After entering the calendar definition and closing the editor the new calendar is checked and registered with ge_qmaster(8). Requires root/manager privileges.

-ackpt ckpt_name
<add ckpt. environment>
Adds a checkpointing environment under the name ckpt_name to the list of checkpointing environments maintained by Grid Engine and to be usable to submit checkpointing jobs (see checkpoint(5) for details on the format of a checkpointing environment definition). Qconf retrieves a default checkpointing environment configuration and executes vi(1) (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the checkpointing environment configuration. Upon exit from the editor, the checkpointing environment is registered with ge_qmaster(8). Requires root/manager privileges.
-aconf host,...
<add configuration>
Successively adds configurations (see ge_conf(5)) for the hosts in the comma separated host list. For each host, an editor ($EDITOR indicated or vi(1)) is invoked and the configuration for the host can be entered. The configuration is registered with ge_qmaster(8) after saving the file and quitting the editor.
Requires root or manager privileges.
-ae [host_template]
<add execution host>
Adds a host to the list of Grid Engine execution hosts. If a queue is configured on a host this host is automatically added to the Grid Engine execution host list. Adding execution hosts explicitly offers the advantage to be able to specify parameters like load scale values with the registration of the execution host. However, these parameters can be modified (from their defaults) at any later time via the -me option described below.
If the host_template argument is present, qconf retrieves the configuration of the specified execution host from ge_qmaster(8) or a generic template otherwise. The template is then stored in a file and qconf executes vi(1) (or the editor indicated by $EDITOR if the EDITOR environment variable is set) to change the entries in the file. The format of the execution host specification is described in host_conf(5). When the changes are saved in the editor and the editor is quit the new execution host is registered with ge_qmaster(8). Requires root/manager privileges.
-ah hostname,...
<add administrative host>
Adds hosts hostname to the Grid Engine trusted host list (a host must be in this list to execute administrative Grid Engine commands, the sole exception to this being the execution of qconf on the ge_qmaster(8) node). The default Grid Engine installation procedures usually add all designated execution hosts (see the -ae option above) to the Grid Engine trusted host list automatically. Requires root or manager privileges.
-ahgrp group
<add host group config.>
Adds a new host group with the name specified in group. This command invokes an editor (either vi(1) or the editor indicated by the EDITOR environment variable). The new host group entry is registered after changing the entry and exiting the editor. Requires root or manager privileges.
-arqs [rqs_name]
<add new RQS>
Adds one or more Resource Quota Set (RQS) description under the names rqs_name to the list of RQSs maintained by Grid Engine (see ge_resource_quota(5) for details on the format of a RQS definition). Qconf retrieves a default RQS configuration and executes vi(1) (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the RQS configuration. Upon exit from the editor, the RQS is registered with ge_qmaster(8). Requires root or manager privileges.
-am user,...
<add managers>
Adds the indicated users to the Grid Engine manager list. Requires root or manager privileges.
-ao user,...
<add operators>
Adds the indicated users to the Grid Engine operator list. Requires root or manager privileges.
-ap pe_name
<add new PE>
Adds a Parallel Environment (PE) description under the name pe_name to the list of PEs maintained by Grid Engine and to be usable to submit parallel jobs (see ge_pe(5) for details on the format of a PE definition). Qconf retrieves a default PE configuration and executes vi(1) (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the PE configuration. Upon exit from the editor, the PE is registered with ge_qmaster(8). Requires root/manager privileges.
-at thread_name
<activates thread in master>
Activates an additional thread in the master process. thread_name might be either "scheduler" or "jvm". The corresponding thread is only started when it is not already running. There might be only one scheduler and only one jvm thread in the master process at the same time.
-aprj
<add new project>
Adds a project description to the list of registered projects (see project(5)). Qconf retrieves a template project configuration and executes vi(1) (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the new project. Upon exit from the editor, the template is registered with ge_qmaster(8). Requires root or manager privileges.
-aq [queue_name]
<add new queue>
Qconf retrieves the default queue configuration (see queue_conf(5)) and executes vi(1) (or $EDITOR if the EDITOR environment variable is set) to allow you to customize the queue configuration. Upon exit from the editor, the queue is registered with ge_qmaster(8). A minimal configuration requires only that the queue name and queue hostlist be set. Requires root or manager privileges.
-as hostname,...
<add submit hosts>
Add hosts hostname to the list of hosts allowed to submit Grid Engine jobs and control their behavior only. Requires root or manager privileges.
-astnode node_path=shares,...
<add share tree node>
Adds the specified share tree node(s) to the share tree (see share_tree(5)). The node_path is a hierarchical path ([/]node_name[[/.]node_name...]) specifying the location of the new node in the share tree. The base name of the node_path is the name of the new node. The node is initialized to the number of specified shares. Requires root or manager privileges.
-astree
<add share tree>
Adds the definition of a share tree to the system (see share_tree(5)). A template share tree is retrieved and an editor (either vi(1) or the editor indicated by $EDITOR) is invoked for modifying the share tree definition. Upon exiting the editor, the modified data is registered with ge_qmaster(8). Requires root or manager privileges.
-Astree fname
<add share tree>
Adds the definition of a share tree to the system (see share_tree(5)) from the file fname. Requires root or manager privileges.
-au user,... acl_name,...
<add users to ACLs>
Adds users to Grid Engine user access lists (ACLs). User lists are used for queue usage authentication. Requires root/manager/operator privileges.
-Auser fname
<add user>
Add the user defined in fname to the Grid Engine cluster. The format of the user specification is described in user(5). Requires root or manager privileges.
-auser
<add user>
Adds a user to the list of registered users (see user(5)). This command invokes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) for a template user. The new user is registered after changing the entry and exiting the editor. Requires root or manager privileges.
-clearusage
<clear sharetree usage>
Clears all user and project usage from the sharetree. All usage will be initialized back to zero.
-cq wc_queue_list
<clean queue>
Cleans queue from jobs which haven't been reaped. Primarily a development tool. Requires root/manager/operator privileges. Find a description of wc_queue_list in sge_types(1).
-dattr obj_spec attr_name val obj_instance,...
<delete in object attribs>
Allows deleting specifications in a single configuration list attribute in multiple instances of an object with a single command. Find more information concerning obj_spec and obj_instance in the description of -aattr
-dcal calendar_name,...
<delete calendar>
Deletes the specified calendar definition from Grid Engine. Requires root/manager privileges.
-dckpt ckpt_name
<delete ckpt. environment>
Deletes the specified checkpointing environment. Requires root/manager privileges.
-dconf host,...
<delete local configuration>
The local configuration entries for the specified hosts are deleted from the configuration list. Requires root or manager privilege.
-de host_name,...
<delete execution host>
Deletes hosts from the Grid Engine execution host list. Requires root or manager privileges.
-dh host_name,...
<delete administrative host>
Deletes hosts from the Grid Engine trusted host list. The host on which ge_qmaster(8) is currently running cannot be removed from the list of administrative hosts. Requires root or manager privileges.
-dhgrp group
<delete host group configuration>
Deletes host group configuration with the name specified in group. Requires root or manager privileges.
-drqs rqs_name_list
<delete RQS>
Deletes the specified resource quota sets (RQS). Requires root or manager privileges.
-dm user[,user,...]
<delete managers>
Deletes managers from the manager list. Requires root or manager privileges. It is not possible to delete the admin user or the user root from the manager list.
-do user[,user,...]
<delete operators>
Deletes operators from the operator list. Requires root or manager privileges. It is not possible to delete the admin user or the user root from the operator list.
-dp pe_name
<delete parallel environment>
Deletes the specified parallel environment (PE). Requires root or manager privileges.
-dprj project,...
<delete projects>
Deletes the specified project(s). Requires root/manager privileges.
-dq queue_name,...
<delete queue>
Removes the specified queue(s). Active jobs will be allowed to run to completion. Requires root or manager privileges.
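A minimal sketch of removing a queue from a script, assuming you want to stop new jobs from being dispatched before the definition is deleted. The `run` and `remove_queue` helpers and the queue name `test.q` are hypothetical; `qmod -d` (disable queue) is a separate Grid Engine command and is an optional extra step, not something `-dq` requires. The helper prints the commands by default, so nothing is changed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: disable a cluster queue, then delete its definition.
remove_queue() {
    q=$1
    run qmod -d "$q"     # disable the queue so no new jobs are dispatched to it
    run qconf -dq "$q"   # remove the queue (requires root or manager privileges)
}

remove_queue test.q      # test.q is an example name
```

Run it once in the default dry-run mode to review the commands, then repeat with `DRY_RUN=0` on an administrative host.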
-ds host_name,...
<delete submit host>
Deletes hosts from the Grid Engine submit host list. Requires root or manager privileges.
-dstnode node_path,...
<delete share tree node>
Deletes the specified share tree node(s). The node_path is a hierarchical path ([/]node_name[[/.]node_name...]) specifying the location of the node to be deleted in the share tree. Requires root or manager privileges.
-dstree
<delete share tree>
Deletes the current share tree. Requires root or manager privileges.
-du user,... acl_name,...
<delete users from ACL>
Deletes one or more users from one or more Grid Engine user access lists (ACLs). Requires root/manager/operator privileges.
-dul acl_name,...
<delete user lists>
Deletes one or more user lists from the system. Requires root/manager/operator privileges.
-duser user,...
<delete users>
Deletes the specified user(s) from the list of registered users. Requires root or manager privileges.
-help

Prints a listing of all options.

-k{m|s|e[j] {host,...|all}}

<shutdown Grid Engine>
Note: The -ks switch is deprecated and may be removed in a future release. Please use the -kt switch instead.
Shuts down Grid Engine components (daemons). In the form -km, ge_qmaster(8) is forced to terminate in a controlled fashion. In the same way, the -ks switch causes termination of the scheduler thread. Shutdown of running ge_execd(8) processes currently registered is initiated by the -ke option. If -kej is specified instead, all jobs running on the execution hosts are aborted prior to termination of the corresponding ge_execd(8). The comma-separated host list specifies the execution hosts to be addressed by the -ke and -kej options. If the keyword all is specified instead of a host list, all running ge_execd(8) processes are shut down. Job abortion initiated by the -kej option will result in the dr state for all running jobs until ge_execd(8) is running again.
Requires root or manager privileges.
-kt thread_name
<terminate master thread>
Terminates a thread in the master process. Currently only the "scheduler" and "jvm" threads can be shut down this way. The command succeeds only if the corresponding thread is running.
-kec {id,...|all}
<kill event client>
Used to shutdown event clients registered at ge_qmaster(8). The comma separated event client list specifies the event clients to be addressed by the -kec option. If the keyword all is specified instead of an event client list, all running event clients except special clients like the scheduler thread are terminated. Requires root or manager privilege.
-mattr obj_spec attr_name val obj_instance,...
<modify object attributes>
Allows changing a single configuration attribute in multiple instances of an object with a single command. Find more information concerning obj_spec and obj_instance in the description of -aattr
-mc
<modify complex>
The complex configuration (see complex(5)) is retrieved, an editor is executed (either vi(1) or the editor indicated by $EDITOR) and the changed complex configuration is registered with ge_qmaster(8) upon exit of the editor. Requires root or manager privilege.
-mcal calendar_name
<modify calendar>
The specified calendar definition (see calendar_conf(5)) is retrieved, an editor is executed (either vi(1) or the editor indicated by $EDITOR) and the changed calendar definition is registered with ge_qmaster(8) upon exit of the editor. Requires root or manager privilege.
-mckpt ckpt_name
<modify ckpt. environment>
Retrieves the current configuration for the specified checkpointing environment, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with the ge_qmaster(8). Refer to checkpoint(5) for details on the checkpointing environment configuration format. Requires root or manager privilege.
-mconf [host,...|global]
<modify configuration>
The configuration for the specified host is retrieved, an editor is executed (either vi(1) or the editor indicated by $EDITOR) and the changed configuration is registered with ge_qmaster(8) upon exit of the editor. If the optional host argument is omitted or if the special host name global is specified, the global configuration is modified. The format of the configuration is described in ge_conf(5).
Requires root or manager privilege.
-me hostname
<modify execution host>
Retrieves the current configuration for the specified execution host, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the changed configuration with ge_qmaster(8) upon exit from the editor. The format of the execution host configuration is described in host_conf(5). Requires root or manager privilege.
-mhgrp group
<modify host group configuration>
The host group entries for the host group specified in group are retrieved and an editor (either vi(1) or the editor indicated by the EDITOR environment variable) is invoked for modifying the host group configuration. By closing the editor, the modified data is registered. The format of the host group configuration is described in hostgroup(5). Requires root or manager privileges.
-mrqs [rqs_name]
<modify RQS configuration>
Retrieves the resource quota set (RQS) configuration defined in rqs_name or, if rqs_name is not given, all resource quota sets, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with ge_qmaster(8). Refer to ge_resource_quota(5) for details on the RQS configuration format. Requires root or manager privilege.
-mp pe_name
<modify PE configuration>
Retrieves the current configuration for the specified parallel environment (PE), executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with the ge_qmaster(8). Refer to ge_pe(5) for details on the PE configuration format. Requires root or manager privilege.
-mprj project
<modify project>
Data for the specific project is retrieved (see project(5)) and an editor (either vi(1) or the editor indicated by $EDITOR) is invoked for modifying the project definition. Upon exiting the editor, the modified data is registered. Requires root or manager privileges.
-mq queuename
<modify queue configuration>
Retrieves the current configuration for the specified queue, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with the ge_qmaster(8). Refer to queue_conf(5) for details on the queue configuration format. Requires root or manager privilege.
-msconf
<modify scheduler configuration>
The current scheduler configuration (see sched_conf(5)) is retrieved, an editor is executed (either vi(1) or the editor indicated by $EDITOR) and the changed configuration is registered with ge_qmaster(8) upon exit of the editor. Requires root or manager privilege.
-mstnode node_path=shares,...
<modify share tree node>
Modifies the specified share tree node(s) in the share tree (see share_tree(5)). The node_path is a hierarchical path ([/]node_name[[/.]node_name...]) specifying the location of an existing node in the share tree. The node is set to the number of specified shares. Requires root or manager privileges.
-mstree
<modify share tree>
Modifies the definition of the share tree (see share_tree(5)). The present share tree is retrieved and an editor (either vi(1) or the editor indicated by $EDITOR) is invoked for modifying the share tree definition. Upon exiting the editor, the modified data is registered with ge_qmaster(8). Requires root or manager privileges.
-mu acl_name
<modify user access lists>
Retrieves the current configuration for the specified user access list, executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable) and registers the new configuration with the ge_qmaster(8). Requires root or manager privilege.
-muser user
<modify user>
Data for the specific user is retrieved (see user(5)) and an editor (either vi(1) or the editor indicated by the EDITOR environment variable) is invoked for modifying the user definition. Upon exiting the editor, the modified data is registered. Requires root or manager privileges.
-purge queue attr_nm,... obj_spec
<purge divergent attribute settings>
Deletes the values of the attributes defined in attr_nm from the object defined in obj_spec. Obj_spec can be "queue_instance" or "queue_domain". The names of the attributes are described in queue_conf(5).
This operation only works on a single queue instance or domain. It cannot be used on a cluster queue. In the case where the obj_spec is "queue@@hostgroup", the attribute values defined in attr_nm which are set for the indicated hostgroup are deleted, but not those which are set for the hosts contained by that hostgroup. If the attr_nm is '*', all attribute values set for the given queue instance or domain are deleted.
The main difference between -dattr and -purge is that -dattr removes a value from a single list attribute, whereas -purge removes one or more overriding attribute settings from a cluster queue configuration. With -purge, the entire attribute is deleted for the given queue instance or queue domain.
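The shape of a -purge call can be sketched as follows. This is a dry-run illustration only: `build_purge` is a hypothetical helper that prints the command instead of running it, and `all.q`, `node01` and `@bigmem` are example names. It rejects a bare cluster queue name, since -purge works only on a queue instance (queue@host) or queue domain (queue@@hostgroup).

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) a qconf -purge command line.
build_purge() {
    attrs=$1    # comma-separated attribute names, or '*' for all attributes
    target=$2   # queue instance (queue@host) or queue domain (queue@@hostgroup)
    case $target in
        *@*)
            # Quote the attribute list so that '*' is not expanded by the shell.
            printf "qconf -purge queue '%s' %s\n" "$attrs" "$target"
            ;;
        *)
            echo "error: -purge cannot be used on a cluster queue: $target" >&2
            return 1
            ;;
    esac
}

build_purge slots all.q@node01     # drop the host-specific slots override
build_purge '*' all.q@@bigmem      # drop all overrides set for the host group
```

After reviewing the printed command, it can be pasted into a shell on an administrative host.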
-rattr obj_spec attr_name val obj_instance,...
<replace object attributes>
Allows replacing a single configuration list attribute in multiple instances of an object with a single command. Find more information concerning obj_spec and obj_instance in the description of -aattr .
Requires root or manager privilege.
-rsstnode node_path,...
<show share tree node>
Recursively shows the name and shares of the specified share tree node(s) and the names and shares of its child nodes. (see share_tree(5)). The node_path is a hierarchical path ([/]node_name[[/.]node_name...]) specifying the location of a node in the share tree.
-sc
<show complexes>
Display the complex configuration.
-scal calendar_name
<show calendar>
Display the configuration of the specified calendar.
-scall
<show calendar list>
Show a list of all calendars currently defined.
-sckpt ckpt_name
<show ckpt. environment>
Display the configuration of the specified checkpointing environment.
-sckptl
<show ckpt. environment list>
Show a list of the names of all checkpointing environments currently configured.
-sconf [host,...|global]
<show configuration>
Print the global or local (host specific) configuration. If the optional comma separated host list argument is omitted or the special string global is given, the global configuration is displayed. The configuration in effect on a certain host is the merger of the global configuration and the host specific local configuration. The format of the configuration is described in ge_conf(5).
-sconfl
<show configuration list>
Display a list of hosts for which configurations are available. The special host name global refers to the global configuration.
-sds
<show detached settings>
Displays detached settings in the cluster configuration.
-se hostname
<show execution host>
Displays the definition of the specified execution host.
-sel
<show execution hosts>
Displays the Grid Engine execution host list.
-secl
<show event clients>
Displays the Grid Engine event client list.
-sep
<show licensed processors>
Note: Deprecated, may be removed in future release.

Displays a list of virtual processors. This value is taken from the underlying OS and it depends on underlying hardware and operating system whether this value represents sockets, cores or supported threads.

If this option is used in combination with the -cb parameter, two additional columns are shown in the output for the number of sockets and the number of cores. Currently SGE lists these values only if the operating system of the execution host is Linux with kernel >= 2.6.16, or Solaris 10. Other operating systems or versions might be supported in future update releases. If these values cannot be retrieved, '0' is displayed.

-sh
<show administrative hosts>
Displays the Grid Engine administrative host list.
-shgrp group
<show host group config.>
Displays the host group entries for the group specified in group.
-shgrpl
<show host group lists>
Displays a name list of all currently defined host groups which have a valid host group configuration.
-shgrp_tree group
<show host group tree>
Shows a tree like structure of host group.
-shgrp_resolved group
<show host group hosts>
Shows a list of all hosts which are part of the definition of the host group. If the host group definition contains sub host groups, these groups are also resolved and their hostnames are printed.
-srqs [rqs_name_list]
<show RQS configuration>
Show the definition of the resource quota sets (RQS) specified by the argument.
-srqsl
<show RQS-list>
Show a list of all currently defined resource quota sets (RQSs).
-sm
<show managers>
Displays the managers list.
-so
<show operators>
Displays the operator list.
-sobjl obj_spec attr_name val
<show object list>
Shows a list of all configuration objects for which val matches at least one configuration value of the attributes whose name matches with attr_name.

Obj_spec can be "queue", "queue_domain", "queue_instance" or "exechost". Note: when "queue_domain" or "queue_instance" is specified as obj_spec, matching is done only against the attribute overrides for the host group or the execution host. In this case queue domain names or queue instances, respectively, are returned.

Attr_name can be any of the configuration file keywords listed in queue_conf(5) or host_conf(5). Wildcards can also be used to match multiple attributes.

Val can be an arbitrary string or a wildcard expression.

-sp pe_name
<show PE configuration>
Show the definition of the parallel environment (PE) specified by the argument.
-spl
<show PE-list>
Show a list of all currently defined parallel environments (PEs).
-sprj project
<show project>
Shows the definition of the specified project (see project(5)).
-sprjl
<show project list>
Shows the list of all currently defined projects.
-sq wc_queue_list
<show queues>
Displays one or multiple cluster queues or queue instances. A description of wc_queue_list can be found in sge_types(1).
-sql
<show queue list>
Show a list of all currently defined cluster queues.
-ss
<show submit hosts>
Displays the Grid Engine submit host list.
-ssconf
<show scheduler configuration>
Displays the current scheduler configuration in the format explained in sched_conf(5).
-sstnode node_path,...
<show share tree node>
Shows the name and shares of the specified share tree node(s) (see share_tree(5)). The node_path is a hierarchical path ([/]node_name[[/.]node_name...]) specifying the location of a node in the share tree.
-sstree
<show share tree>
Shows the definition of the share tree (see share_tree(5)).
-sst
<show formatted share tree>
Shows the definition of the share tree in a tree view (see share_tree(5)).
-sss
<show scheduler status>
Currently displays the host on which the Grid Engine scheduler is active or an error message if no scheduler is running.
-su acl_name
<show user ACL>
Displays a Grid Engine user access list (ACL).
-sul
<show user lists>
Displays a list of names of all currently defined Grid Engine user access lists (ACLs).
-suser user,...
<show user>
Shows the definition of the specified user(s) (see user(5)).
-suserl
<show users>
Shows the list of all currently defined users.
-tsm
<trigger scheduler monitoring>
The Grid Engine scheduler is forced by this option to print trace messages of its next scheduling run to the file $SGE_ROOT/$SGE_CELL/schedd_runlog. The messages indicate the reasons for jobs and queues not being selected in that run. Requires root or manager privileges.

Note that the reasons for job requirements being invalid with respect to resource availability of queues are displayed using the format described for the qstat(1) -F option (see the description of Full Format in section OUTPUT FORMATS of the qstat(1) manual page).

Environmental Variables

Files

Grid Engine master host file

$SGE_ROOT/$SGE_CELL/common/act_qmaster

Queues Configuration Directory

$SGE_ROOT/$SGE_CELL/spool/qmaster/cqueues/

See also




Old News ;-)

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status by using either the qstat command, the GUI interface QMON, or by email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

You can refer to the man pages for a complete description of all the options of the qstat command.

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent and the -M option to specify the address it should be sent to. In a script this looks like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email: the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s), as in the sample script lines above.

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh
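To make the flag string easier to read, the letters accepted by -m can be decoded with a small shell function. This is an illustrative sketch, not part of Grid Engine: `describe_mail_events` is a hypothetical helper, and the meanings are the ones documented for qsub (b, e, a, s, plus n for no mail).

```shell
#!/bin/sh
# Hypothetical helper: explain each letter of a qsub -m argument.
describe_mail_events() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}           # everything after the first character
        c=${flags%"$rest"}        # the first character itself
        case $c in
            b) echo "b: beginning of job" ;;
            e) echo "e: end of job" ;;
            a) echo "a: job aborted or rescheduled" ;;
            s) echo "s: job suspended" ;;
            n) echo "n: no mail" ;;
            *) echo "unknown flag: $c" >&2; return 1 ;;
        esac
        flags=$rest
    done
}

describe_mail_events beas
```

The function prints one line per letter and returns a non-zero status on an unrecognized letter, which makes it usable as a sanity check before submitting.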

How do I control my jobs

Based on the status of the job displayed, you can control the job by the following actions:

Monitoring and controlling with QMON

You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs, and the buttons are self explanatory.


For further information, see the SGE User's Guide (PDF, HTML).


[Sep 19, 2014] Bug in Univa Grid engine

The queue structure differs somewhat between Univa Grid Engine and Sun Grid Engine 6.2u7:

You can't directly import a queue from Oracle Grid Engine into Univa Grid Engine, because the structure of the queue definition is slightly different.


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: July 28, 2019