
Grid Engine Config Tips



Most SGE commands follow a common option template: an option letter in uppercase means "read the object definition from a file", while the corresponding lowercase option means "edit it interactively".

The command-line user interface is a set of ancillary programs (commands) for configuring the cluster, submitting and controlling jobs, and monitoring queues and hosts.
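
For example, a queue definition can be edited either interactively or via a file (a sketch; test.q is a hypothetical queue name):

qconf -mq test.q               # lowercase: opens the definition in an editor
qconf -sq test.q > test.q.txt  # dump the current definition to a file
qconf -Mq test.q.txt           # uppercase: load the (edited) definition from the file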

Handy aliases

Here are some handy aliases I find useful in my ~/.bashrc file:
alias qcall='qconf -mq all.q'        # edit the all.q cluster queue definition
alias qerrors='qstat -f -explain E'  # list queue instances in error state, with explanations
alias qsummary='qstat -g c'          # cluster-wide summary by queue
alias qclear='qmod -c "*"'           # clear error states on all queues


Old News ;-)

[Nov 10, 2014] Enabling qstat -j information

See also Enabling scheduling information in qstat -j

schedd_job_info

The default scheduler can keep track of why jobs could not be scheduled during the last scheduler run. This parameter enables or disables that observation: the value true enables the monitoring, false turns it off.

It is also possible to activate the observation only for certain jobs: set the parameter to job_list followed by a comma-separated list of job ids.

If schedd_job_info=true, the user can obtain the collected information with the command

qstat -j <job_number>

One of the important parameters in this file is schedd_job_info, which determines whether qstat -j provides scheduling information about jobs (Chris Dagdigian):

In this case the change is that with 6.2 the parameter "schedd_job_info" now defaults to FALSE, whereas in the past it was TRUE.

I *completely* understand why the change happened since the 6.2 design goal was for massive scalability and schedd_job_info can put a massive load on the SGE system particularly in massive clusters like Ranger where 6.2 was tested out.

But ... are most 6.2 deployments going on to systems where the exechost count or job throughput rates means that setting schedd_job_info=FALSE has a measurable performance gain, significant enough to offset the massive loss of end-user-accessible troubleshooting information? I suspect ... not.

The schedd_job_info output appended in the output of "qstat -j" is the single most effective troubleshooting and "why does my job not get dispatched" resource that is available to non SGE administrators. Taking this tool away from users (in my opinion) has a bigger negative impact than any performance gains realized (at least for the types of systems I work on most often).

So -- just like I recommend and tell people to use classic spooling on smaller systems I also plan on telling people to re-enable schedd_job_info feature on their 6.2 systems (if their system and workflow allows).

I'm bringing this up on the list for two reasons:

- Just to see what others think

Here is a sample scheduler configuration (as displayed by qconf -ssconf) with schedd_job_info enabled:

algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY
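
To re-enable the feature non-interactively, the file-based variant of the same command can be used (a sketch; the temporary file name is arbitrary, and sed -i assumes GNU sed):

qconf -ssconf > /tmp/sched.conf
sed -i 's/^schedd_job_info .*/schedd_job_info                   true/' /tmp/sched.conf
qconf -Msconf /tmp/sched.conf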

[Sep 20, 2014] Slot hacking in SGE

Typically a cluster is fully loaded with jobs almost all of the time, which makes it difficult to submit small (1-2 min) test jobs that run on more than one node.

bioteam.net

1. Same queue structure as before (see bioteam.net)

2. Attach "slots=2" as a host resource on all nodes

3. Submit test jobs to all queues

The wizard solution:

qconf -aattr exechost complex_values slots=2 <host>

What did we do?

• Slot limits "solve" the oversubscription problem
• Still have these problems:
  • FIFO job execution
  • Priority is handled by OS after SGE scheduling
• We can still do better (stay tuned)…
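
To apply the same limit to every execution host, one can loop over the execution host list (a sketch; use qconf -mattr or -rattr instead of -aattr if complex_values is already set on the hosts):

for host in `qconf -sel`; do
    qconf -aattr exechost complex_values slots=2 $host
done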


Grid Engine Config Tips
Here are some of the Grid Engine configuration steps we should take on a new install. I recommend doing all of these from the very beginning, to prevent later changes that may confuse users or break their workflows.

There is one thing we must always do with a new compute cluster, and that is enable hard memory limits. Users are usually not too keen on any kind of limit, because jobs will eventually run into them. Once users realize that limits ensure node stability and uptime, however, they will demand them. Without limits, one bad job can crash a node and bring down many other jobs.

To enable hard memory limits, we modify the complex configuration to make h_vmem requestable.

# qconf -mc
h_vmem              h_vmem     MEMORY      <=    YES         YES        1g       0
Once this complex is set, it is a good idea to define a default option for qsub in the $SGE_ROOT/default/common/sge_request file. When enabling h_vmem, we should also set a default value for h_stack. h_vmem sets a limit on virtual memory, while h_stack sets a limit on stack space for binary execution. Without a sufficient value for h_stack, programs like Python, Matlab or IDL will fail to start. Here, we are also binding each job to a single core.
-binding linear:1
-q all.q
-l h_vmem=1g
-l h_stack=128m
If we want to manually set values for each individual node, like slots and memory, a for-loop is very helpful.
# qconf -rattr exechost complex_values slots=8,num_proc=8,h_vmem=8g node01
# for ((I=1; I <= 16 ; I++)); do
>   NODE=`printf "node%02d" $I`
>   # total physical memory and total swap, in bytes
>   MEM=`ssh $NODE free -b | awk '/^Mem:/ {print $2}'`
>   SWAP=`ssh $NODE free -b | awk '/^Swap:/ {print $2}'`
>   VMEM=`echo $MEM+$SWAP | bc`
>   qconf -rattr exechost complex_values slots=8,num_proc=8,h_vmem=$VMEM $NODE
> done
To submit a job with a 4 gig limit, use the -l command line option.
$ qsub -l h_vmem=4g -l h_stack=256m myjob.sh
To see available memory, use qstat.
$ qstat -F h_vmem
It is also a good idea to place limits on the amount of memory any single process on the login node may allocate, in the /etc/security/limits.conf file. This example will limit any user in the clusterusers group to 4 gigs per process. Anything larger should be run via qlogin. When adding new users, make sure to add them to this group, as sketched below.
# limit any process to 4GB = 1024*1024*4KB = 4194304
@clusterusers      hard    rss             4194304
@clusterusers      hard    as              4194304
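
New users then need to be placed in that group (a sketch; alice is a placeholder username, and usermod -aG adds a supplementary group):

# usermod -aG clusterusers alice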
There should also be a limit on how many jobs a single user can queue at once. If a user needs to submit over 2000 jobs simultaneously, they should consider a more manageable workflow using array jobs (see the sketch after the configuration below).
# qconf -mconf
max_u_jobs 2000
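
For example, a single array job with 2000 tasks is much lighter on the qmaster than 2000 individual jobs; each task selects its work item via $SGE_TASK_ID (a sketch):

# one array job with 2000 tasks instead of 2000 separate submissions
$ qsub -t 1-2000 myjob.sh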
If we want to limit the number of jobs a single user can have in the running state simultaneously:
# qconf -msconf
 max_reservation 128
 maxujobs 128
If the queue will be accepting multi-slot parallel jobs, slot reservation should be enabled to prevent starvation. Otherwise, single-slot jobs will constantly fill in space ahead of the big job. This can be done by submitting multi-slot jobs with the "-R y" option.
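
For example (a sketch, assuming a parallel environment named mpi):

# reserve slots as they free up instead of waiting behind single-slot jobs
$ qsub -pe mpi 64 -R y bigjob.sh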

To enable a simple fairshare policy between all users, there are only three options to check:

# qconf -mconf
enforce_user auto
auto_user_fshare 100
# qconf -msconf
weight_tickets_functional 10000
To be a bit more verbose, we should collect some job scheduler info.
# man sched_conf
# qconf -msconf
 schedd_job_info true
Now we can see why or why not a job is scheduled.
$ qstat -j 427997
$ qacct -j 427997
If we plan to allow graphical GUI programs in the queue, we must setup a qlogin wrapper script with proper X11 forwarding.
# vim /usr/global/sge/qlogin_wrapper
# chmod +x /usr/global/sge/qlogin_wrapper
qlogin_wrapper:
#!/bin/sh
HOST=$1
PORT=$2
shift
shift
echo /usr/bin/ssh -Y -p $PORT $HOST
/usr/bin/ssh -Y -p $PORT $HOST
Set the qlogin wrapper and ssh shell:
# qconf -mconf
 qlogin_command /usr/global/sge/qlogin_wrapper
 qlogin_daemon /usr/sbin/sshd -i
If we have a floating license server with a limited number of seats, we will want to configure a consumable complex resource. When a user submits a job, the qsub option '-l idl=1' must be used. In this example, the number of jobs that specify idl will be limited to 15 at any one time.
# qconf -mc
matlab ml INT <= YES YES 0 0
idl idl INT <= YES YES 0 0
# qconf -me global
complex_values matlab=10,idl=15
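
Submitting against the consumable and checking remaining capacity might look like this (a sketch; analyze.sh is a placeholder script):

$ qsub -l idl=1 analyze.sh     # consumes one of the 15 idl tokens while running
$ qstat -F idl                 # show current availability per queue instance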
If we want to have multiple queues across the same hosts, we can define a policy so that nodes do not become oversubscribed.
# qconf -arqs
{
    name         limit_slots_to_cores_rqs
    description  Prevents core oversubscription across queues. 
    enabled      TRUE    
    limit        hosts {*} to slots=$num_proc
}

Grid Engine Configuration Recipes

Dave Love
2013-08-30

Table of Contents

Script Execution
  • Unix behaviour
  • Modules environment
Parallel Environments
  • Heterogeneous/Isolated Node Groups
  • JSVs
  • Wildcarded PEs
  • Checking for Windows-style line endings in job scripts
Scheduling Policies
  • Host Group Ordering
  • Fill Up Hosts
  • Avoiding Starvation (Resource Reservation/Backfilling)
  • Fair Share
Resource Management
  • Slot Limits
  • Memory Limits
  • Licence Tokens
  • Killing Detached Processes
  • Core Binding
Administration
  • Maintenance Periods
  • Rebooting execution hosts
  • Broken/For-testing Hosts

    This is a somewhat-random collection of commonly-required configuration recipes. It is written mainly from the point of view of high performance computing clusters, and some of the configuration suggestions may not be relevant in other circumstances or for old versions of gridengine. See also the other howto documents. Suggestions for additions/corrections are welcome (to d.love @ liverpool.ac.uk).

    Script Execution

    Unix behaviour

    Set shell_start_mode to unix_behavior in your queue configurations to get the normally-expected behaviour of starting job scripts with the shell specified in the initial #! line.
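
    A non-interactive way to set this (a sketch; all.q is a placeholder queue name):

    $ qconf -mattr queue shell_start_mode unix_behavior all.q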

    Modules environment

    A side-effect of unix_behavior is usually not getting the normal login environment, specifically not having the module command available for those sites that use environment modules. At least for use with the bash shell, add the following to the site sge_request file to avoid users having to source modules.sh etc. in job scripts, assuming sessions from which jobs are submitted have modules available:

    -v module -v MODULESHOME -v MODULEPATH

    This may not work with other shells.

    Parallel Environments

    Heterogeneous/Isolated Node Groups

    Suppose you have various sets of compute nodes over which you want to run parallel jobs, but each job must be confined to a specific set, for example because the sets differ in hardware (such as core counts or architecture) or form isolated network islands.

    Then you'll want to define multiple parallel environments and host groups. There will typically be one PE and one host group (with possibly an ACL) for each node type or island. The PEs will all be the same, unless you want a different fixed allocation_rule for each, but with different names. The names need to be chosen so that you can conveniently use wildcard specifications for them. Normally the names will all have the same name base, e.g. mpi-…. As an example, for different architectures, with different numbers of cores which get exclusive use for any tightly integrated MPI:

    $ qconf -sp mpi-8
    pe_name            mpi-8
    slots              99999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    NONE
    stop_proc_args     NONE
    allocation_rule    8
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE
    qsort_args         NONE
    $ qconf -sp mpi-12
    pe_name            mpi-12
    slots              99999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    NONE
    stop_proc_args     NONE
    allocation_rule    12
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE
    qsort_args         NONE

    with corresponding host groups @quadcore and @hexcore for each type of dual-processor box. Those PEs are assigned one-to-one to host groups, ensuring that jobs can't run across the groups (since parallel jobs are always granted a unique PE by the scheduler, whereas they can be split across queues).

    $ qconf -sq parallel
    ...
    seq_no    2,[@quadcore=3],[@hexcore-eth=4],...
    ...
    pe_list   NONE,[@quadcore=make mpi-8 smp],[@hexcore=make mpi-12 smp],...
    ...
    slots     0,[@quadcore=8],[@hexcore=12],...
    ...

    Now the PE naming comes in useful, since you can submit to a wildcarded PE, -pe mpi-*, if you're not fussy about the PE you actually get. See Wildcarded PEs for the next step.

    Suppose you want to retain the possibility of running across all the PEs (assuming they're not isolated). Then you can define an extra PE, say allmpi, which isn't matched by the wildcard.

    Note SGE 8.1.1+ allows general PE wildcards (actually patterns), as documented, fixing the bug which meant that only * was available in older versions. Proper pattern support can be useful with such configurations, e.g. selecting mpi-[1-4].

    JSVs

    See jsv(1) and jsv_script_interface(3) for documentation on job submission verifiers, as well as the examples in $SGE_ROOT/util/resources/jsv/.

    See also the Resource Reservation section.

    Wildcarded PEs

    When you would use a wildcarded PE as above, for convenience and abstraction, you can use a JSV to write the wildcard pattern. This JSV fragment from jsv_on_verify in Bourne shell re-writes -pe mpi to -pe mpi-*:

    if [ $(jsv_is_param pe_name) = true ]; then
        pe=$(jsv_get_param pe_name)
        ...
        case $pe in
            mpi)
                jsv_set_param pe_name "$pe-*"
                pe="$pe-*"
                modified=1
        ...

    Suppose you want to retain the possibility of running across all the PEs (assuming the groups aren't isolated). Then you can define an extra PE, say allmpi, which isn't re-written by the JSV.

    Checking for Windows-style line endings in job scripts

    Users sometimes transfer job scripts from MS Windows systems in binary mode, so that they end up with CRLF line endings, which typically fail, often with a rather obscure failure to execute if shell_start_mode is set to unix_behavior. The following fragment from jsv_on_verify in a shell script JSV prevents submitting such job scripts:

    # Avoid having to use literal ^M
    ctrlM=$(printf "\15")
    ...
    jsv_on_verify () {
    ...
      cmd=$(jsv_get_param CMDNAME)
      case $(jsv_get_param b) in y|yes) binary=y;; esac
      [ "$cmd" != NONE -a "$cmd" != STDIN -a "$binary" != y ] &&
          [ -f "$cmd" ] &&
          grep -q "$ctrlM\$" "$cmd" &&
          # Can't use multi-line messages, unfortunately.
          jsv_reject "\
      Script has Windows-style line endings; transfer in text mode or use dos2unix"
    ...

    Scheduling Policies

    See sched_conf(5) for detailed information on the scheduling configuration.

    Host Group Ordering

    To change scheduling so that hosts in different host groups are preferentially used in some defined order, set queue_sort_method to seqno:

    $ qconf -ssconf
    ...
    queue_sort_method seqno
    ...

    and define the ordering in the relevant queue(s) as required with seq_no:

    $ qconf -sq ...
    ...
    seq_no 10,[@group1=4],[@group2=3],...
    ...

    It is possible to use seqno, for instance, to schedule serial jobs preferentially to one 'end' of the hosts and parallel jobs to the other 'end'.

    Fill Up Hosts

    To schedule preferentially to hosts which are already running a job, as opposed to the default of roughly round robin according to the load level, change the load_formula:

    $ qconf -ssconf
    ...
    queue_sort_method load
    load_formula slots
    ...

    assuming the slots consumable is defined on each node.

    Reasons for compacting jobs onto as few nodes as possible include avoiding fragmentation (so that parallel jobs which require complete nodes have a better chance of being fitted in), or being able to power down complete unused nodes.

    Note Scheduling is done according to load values reported at scheduling time, without lookahead, so that it only takes effect over time.

    Since the load formula is used to determine scheduling when hosts are equal according to queue_sort_method, you can schedule to the preferred host groups by seqno as above, and still compact jobs onto the nodes using slots in the load formula, as above, i.e. with this configuration:

    $ qconf -ssconf
    ...
    queue_sort_method seqno
    load_formula slots
    ...

    Avoiding Starvation (Resource Reservation/Backfilling)

    To avoid "starvation" of larger, higher-priority jobs by smaller, lower-priority ones (i.e. the smaller ones always run in front of the larger ones) enable resource reservation by setting max_reservation to a reasonable value (maybe around 100), and arrange that relevant jobs are submitted with -R y, e.g. using a JSV. Here is a JSV fragment suitable for client side use, to add reservation to jobs over a certain size, assuming that PE slots is the only relevant resource:

        if [ $(jsv_is_param pe_name) = true ]; then
            pe=$(jsv_get_param pe_name)
            pemin=$(jsv_get_param pe_min)
    ...
            # Check for an appropriate pe_min with no existing reservation.
            if [ $(jsv_is_param R) = false ]; then
                if [ $pemin -ge $pe_min_reserve ]; then
                    jsv_set_param R y
                    modified=1
                fi
            fi
    Note For "backfilling" (shorter jobs can fill the gaps before reservations) to work properly with jobs which do not specify an h_rt value at submission, the scheduler default_duration must be set to a value other then the default infinity, e.g. to the longest runtime you allow.

    To monitor reservations, set MONITOR=1 in sched_conf(5) params and use qsched(1) after running process_scheduler_log; qsched -a summarizes all the current reservations.

    Fair Share

    It is often required to provide a fair share of resources in some sense, whereby heavier users get reduced priority. There are two SGE policies for this. The share tree policy assigns priorities based on historical usage with a specified lifetime, and the functional policy only takes current usage into account, i.e. is similar to the share tree with a very short decay time. (It isn't actually possible to use the share tree with a lifetime less than one hour.) Use one or the other, but not both, to avoid confusion.

    With both methods, ensure that the default scheduler parameters are changed so that weight_ticket is a lot larger than weight_urgency and weight_priority, or set the latter two to zero if you don't need them. Otherwise it is possible to defeat the fair share by submitting with a high priority (-p) or with resources with a high urgency attached. See sge_priority(5) for details.
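
    An illustrative set of values (set with qconf -msconf; the exact magnitudes are a matter of taste, these just make tickets dominate):

    weight_ticket     1.000000
    weight_urgency    0.000000
    weight_priority   0.000000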

    You may also want to set ACCT_RESERVED_USAGE in execd_params to use effectively 'wall clock' time in the accounting that determines the shares.

    Functional

    For simple use of the functional policy, add

    weight_tickets_functional 10000

    to the default scheduling parameters (qconf -msconf) and define a non-zero fshare for each user (qconf -muser). If you use enforce_user auto in the configuration,

    auto_user_fshare 1000

    could be used to set up automatically-created users (new ones only).

    Warning enforce_user auto implies not using CSP security, which typically is not wise.

    Share Tree

    See share_tree(5).

    To make a simple tree, use qconf -Astree with a file with contents similar to:

    id=0
    name=Root
    type=0
    shares=1
    childnodes=1
    id=1
    name=default
    type=0
    shares=1000
    childnodes=NONE

    and give the share tree policy a high weight, like (qconf -msconf):

    weight_tickets_share 10000

    If you have auto-creation of users (see the warning above), you probably want to ensure that they are preserved with:

    auto_user_delete_time 0

    The share tree usage decays with a half-life of 7 days by default; modify halftime (specified in hours) to change it.

    Resource Management

    Slot Limits

    You normally want to prevent over-subscription of cores on execution hosts by limiting the slots allocated on a host to its core (or actually processor) count - where "processors" might mean hardware threads. There are multiple ways of doing so, according to taste, administrative convenience, and efficiency.

    If you only have a single queue, you can get away with specifying the slot counts in the queue definition (qconf -mq), e.g. by host group

       slots 0,[@hexcore=12],[@quadcore=8]...

    but with multiple queues on the same hosts, you may need to avoid over-subscription due to contributions from each queue.

    An easy way for an inhomogeneous cluster is with the following RQS (with qconf -arqs), although it may lead to slow scheduling in a large cluster:

    {
       name         host-slots
       description  restrict slots to core count
       enabled      true
       limit        hosts {*} to slots=$num_proc
    }

    This would probably be the best solution if num_proc, the processor count, is variable by turning hardware threads on and off.

    Alternatively, with a host group for each hardware type, you can use a set of limits like

       limit        hosts {@hexcore} to slots=12
       limit        hosts {@quadcore} to slots=8

    which will avoid the possible scheduling inefficiency of the $num_proc dynamic limit.

    Finally, and possibly the most foolproof way in normal situations is to set the complex on each host, e.g.

    $ for n in 8 16; do qconf -mattr exechost complex_values slots=$n \
       `qconf -sobjl exechost load_values "*num_proc=$n*"`; done

    Memory Limits

    Normally it is advisable to prevent jobs swapping. To do so, make the h_vmem complex consumable, and give it a default value that is (probably slightly less than) the lowest memory/core that you have on execution hosts, e.g.:

    $ qconf -sc | grep h_vmem
    h_vmem              h_vmem       MEMORY      <=      YES         YES        2000m    0

    (See complex(5) and the definition of memory_specifier.)

    Also set h_vmem to an appropriate value on each execution host, leaving some head-room for system processes, e.g. (with bash-style expansion):

    $ qconf -mattr exechost complex_values h_vmem=31.3G node{1..32}

    Then single-process jobs can't over-subscribe memory on the hosts (at least the jobs on their own), and multi-process ones can't over-subscribe long term (see below).

    Jobs which need more than the default (2000m per slot above) need to request it at job submission with -l h_vmem=…, and may end up under-subscribing hosts' slots to get enough memory in total.

    Each process is limited by the system to the requested memory (see setrlimit(2)), and attempts to allocate more will fail. If it is a stack allocation, the program will typically die; if it is an attempt to malloc(3) too much, well-written programs should report an allocation failure. Also, the qmaster tracks the total memory accounted to the job, and will kill it if allocated memory exceeds the total requested.

    These mechanisms are not ideal in the case of MPI-style jobs, in particular. The rlimit applied is the h_vmem request multiplied by the slot count for the job on the host, but it applies to each process in the job separately; the limit does not apply to the process tree as a whole. This means that MPI processes, for instance, can over-subscribe in the PDC_INTERVAL before the execd notices, and out-of-memory system errors may still occur. Future use of memory control groups will help address this on Linux.

    Note Killing by qmaster due to the memory limit may occur spuriously, at least under Linux, if the execd over-accounts memory usage. Older SGE versions, and possibly newer ones on old Linux versions, use the value of VmSize that Linux reports (see proc(5)); that includes cached data, and takes no account of sharing. The current SGE uses a more accurate value if possible (see execd_params USE_SMAPS). Also, if a job maps large files into memory (see mmap(2)), that may cause it to fail due to the rlimit, since that counts the memory mapped data, at least under Linux. A future version of SGE is expected to provide control over using the rlimit.
    Note Suspended jobs contribute to the h_vmem consumed on the host, which may need to be taken into account if you allow jobs to preempt others by suspension.
    Note Setting h_vmem can cause trouble with programs using pthreads(7), typically appearing as a segmentation violation. This is apparently because the pthreads runtime (at least on GNU/Linux) defines a per-thread stack from the h_vmem limit. The solution is to specify a reasonable value for h_stack in addition; typically a few 10s to 100 or so of MB is a good value, but may depend on the program.
    Note There is also an issue with recent OpenJDK Java. It allegedly tries to allocate 1/4 of physical memory for the heap initially by default, which will fail with typical h_vmem on recent systems. The (only?) solution is to use java -XmxN explicitly, with N derived from h_vmem.

    Licence Tokens

    For managing Flexlm licence tokens, see Olesen's method. This could be adapted to similar systems, assuming they can be interrogated suitably. There's also the licence juggler for multiple locations.

    Killing Detached Processes

    If any of a job's processes detach themselves from the process tree under the shepherd, they are not killed directly when the job terminates. Use ENABLE_ADDGRP_KILL to turn on finding and killing them at job termination. It will probably be on by default in a future version.
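
    ENABLE_ADDGRP_KILL is set in execd_params in the global configuration; a sketch:

    $ qconf -mconf
    ...
    execd_params ENABLE_ADDGRP_KILL=TRUE
    ...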

    Core Binding

    Binding processes to cores (or 'CPU affinity') is normally important for performance on 'modern' systems (in the mainstream at least since the SGI Origin). Assuming cores are not over-subscribed, a good default (since SGE 8.0.0c) is to set a default in sge_request(5) of

    -binding linear:slots

    The allocated binding is accessible via SGE_BINDING in the job's environment, which can be assigned directly to GOMP_CPU_AFFINITY for the benefit of the GNU OpenMP implementation, for instance. If you happen to use OpenMPI, good defaults matching the SGE -binding are (at least for OpenMPI 1.6):

    rmaps_base_schedule_policy = core
    orte_process_binding = core

    Administration

    Maintenance Periods

    Rejecting Jobs

    In case you want to drain the system, adding $SGE_ROOT/util/resources/jsv/jsv_reject_all.sh as a server JSV will reject all jobs at submission with a suitable message.
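
    That means pointing jsv_url in the global configuration at the script; a sketch (see sge_conf(5) for the exact URL syntax):

    $ qconf -mconf
    ...
    jsv_url script:$SGE_ROOT/util/resources/jsv/jsv_reject_all.sh
    ...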

    Down Time

    If you want jobs to stay queued, there are two approaches to avoid starting ones that might run into a maintenance period, assuming you enforce a runtime limit and the maintenance won't start any sooner than that period: a calendar and an advance reservation.

    Calendar

    You can define a calendar for the shutdown period and attach it to all your queues, e.g.

    # qconf -scal shutdown
    calendar_name    shutdown
    year             6.9.2013-9.9.2013=off
    week             NONE
    # qconf -mattr queue calendar shutdown serial parallel
    root@head modified "serial" in cluster queue list
    root@head modified "parallel" in cluster queue list
    Note To get the scheduler to look ahead to the calendar, you need to enable resource reservation (issue #493); reservation may interact poorly with calendars (issue #722), though it's not clear whether this is still a problem.
    Advance reservation

    Define a fake PE with allocation_rule 1 and access only by the admin ACL, say, and attach it to all your hosts, possibly via a new queue if you already have a complex pe_list setup:

    $ qconf -sp all
    slots              99999
    user_lists         admin
    ...
    allocation_rule    1
    ...
    $ qconf -sq shutdown
    qname                 shutdown
    hostlist              @allhosts
    ...
    pe_list               all
    ...

    Now you can make an advance reservation (assuming max_advance_reservations allows it, and you're in arusers as well as admin):

    $ qrsub -l exclusive -pe all $(qselect -pe all|wc -l) -a 201309061200 -d 201309091200

    Rebooting execution hosts

    To reboot execution hosts, you need to ensure they're empty and avoid races with job submission. Thus, submit a job which requires exclusive access to the host and then does a reboot. Since you want to avoid root being able to run jobs for security reasons, use sudo(1) with appropriate settings to allow password-less executions of the commands by the appropriate users. You want to comment out Defaults requiretty from /etc/sudoers, add !requiretty to the relevant policy line, or use -pty y on the job submission. It is cleanest to shut down the execd before the reboot.

    The job submission parameters will depend on what is allowed to run on the hosts in question, but assuming you can run SMP jobs on all hosts (some might not be allowed serial jobs), a suitable job might be

    qsub -pe smp 1 -R y -N boot-$node -l h=$node,exclusive -p 1024 -l h_rt=60 -j y <<EOF
    /usr/bin/sudo /sbin/service sgeexecd.ulgbc5 softstop
    /usr/bin/sudo /sbin/reboot
    EOF

    where $node is the host in question, and we try to ensure the job runs early by using a resource reservation and a high priority.
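
    A matching sudoers fragment might look like the following (a sketch; opsadmin is a placeholder for the account submitting the reboot jobs, and the service name is site-specific):

    Cmnd_Alias SGE_REBOOT = /sbin/service sgeexecd.ulgbc5 softstop, /sbin/reboot
    Defaults!SGE_REBOOT !requiretty
    opsadmin ALL = (root) NOPASSWD: SGE_REBOOT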

    Broken/For-testing Hosts

    Administrative Control

    A useful tactic for dealing with hosts which are broken, or possibly required for testing and not available to normal users, is to make a host group for them, say @testing (qconf -ahgrp testing), and restrict access to it to admin users only with an RQS rule like

       limit users {!@admin} hosts {@testing} to slots=0

    It can also be useful to have a host-level string-valued complex (say comment or broken) with information on the breakage, say with a URL pointing to your monitoring/ticketing system. A utility script can look after adding to the host group, setting the complex and, for instance, assigning downtime (in Nagios' terms) for the host in your monitoring system.
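
    Such a complex could be defined like any other host-level string complex; a sketch of the line added with qconf -mc:

    #name    shortcut  type    relop  requestable  consumable  default  urgency
    broken   broken    STRING  ==     YES          NO          NONE     0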

    Alternatively the RQS could control access on the basis of the broken complex rather than using host group separately.

    A monitoring system like Nagios (which has hooks for such actions and is allowed admin access to the SGE system) can set the status as above when it detects a problem.

    Using a restricted host group or complex is more flexible than disabling the relevant queues on the host, as sometimes recommended; that stops you running test jobs on them and can cause confusion if queues are disabled for other reasons.

    Using Alarm States

    As an alternative to explicitly restricting access as above, one can put a host into an alarm state to stop it getting jobs. This can be done by defining an appropriate complex and a load formula involving it, along with a suitable load sensor. The sensor executes periodic tests, e.g. using existing frameworks, and sets the load value high via the complex if it detects an error. However, since it takes time for the load to be reported, jobs might still get scheduled for a while after the problem occurs.
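
    A minimal load sensor is simply a script that loops reading its standard input (the word quit terminates it) and reports values between begin/end markers. A sketch, where run_health_checks and the health complex are hypothetical site-specific pieces:

    #!/bin/sh
    # minimal load sensor: report a high 'health' load value when checks fail
    host=`hostname`
    while read line; do
        [ "$line" = quit ] && exit 0
        value=0
        run_health_checks || value=100    # hypothetical site-specific test
        echo begin
        echo "$host:health:$value"
        echo end
    done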

    Tests could also be run in the prolog, potentially setting the queue into an error state before trying to run the job. However, that is queue-specific, and the prolog only runs on parallel jobs' master node.

    Copyright © 2012, 2013, Dave Love, University of Liverpool

    Licence GFDL (text), GPL (code).

    Last updated 2014-02-27 15:36:43 GMT
    [Jun 13, 2014] Other ideas

    HowTo:

    1. We want MPI jobs to eat all of the Infiniband on a node, so that no two MPI jobs can run on the same node. However, we want to be able to have a bunch of instances of the same job on the same node. Solution: It's complicated, but see Daniel Templeton's blog for how to do this.

    [Jun 13, 2014] Brown CS Using GridEngine

    Helpful Hints
    Current Working Directory
    To ensure that your job runs in the directory from which you submit it (and to ensure that its standard output and error files land there) use the -cwd option:
       % qsub -cwd runme
    
    Running Now
    If you want GridEngine to run your job now or else fail, give it the -now option:
       % qsub -now y runme
    
    Embedding Options
    You don't have to remember all the qsub options you need for every job you run. You can embed them in your script:
    % cat runme
    #!/bin/sh
    #
    #  Execute from the current working directory
    #$ -cwd
    #
    #  This is a long-running job
    #$ -l inf
    #
    #  Can use up to 6GB of memory
    #$ -l vf=6G
    #
    ~/project/sim
    
    With all the options in the script, executing it is simple:
       % qsub runme
    
    You can, of course, still use command-line arguments to augment or override embedded options.
    Mail Notification
    To receive email notifications about your job, use the "-m" option:
       % qsub -m as runme
    
    In the example above, you will get mail if the job aborts or is suspended. The mail options are:
       a - abort
       b - begin
       e - exit
       s - suspend
    

    Deleting Your Jobs

    Deleting your submitted jobs can be done with the qdel command:
       % qdel job-id
        The specified job-id is deleted.
    
       % qdel -u username
        All the jobs by username are deleted.
    
    Users can only delete their own jobs.
