The queue configuration parameters take as values strings, integer decimal numbers, booleans, or time and memory specifiers (see time_specifier and memory_specifier in sge_types(5)) as well as comma-separated lists.
Note that Grid Engine allows a backslash (\) to be used to escape a newline character. The backslash and the newline are replaced with a space character before any interpretation.
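The continuation rule above can be illustrated with a minimal Python sketch. The function name join_continuations and the sample text are hypothetical and only mimic the described behaviour; this is not Grid Engine's own parser.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# A value split over two physical lines with a trailing backslash.
raw = "pe_list NONE,\\\n[@dual=all-mpi mpi-4]"
joined = join_continuations(raw)
print(joined)
```

The joined text is what would then be interpreted as a single logical line.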
The list of parameters below specifies the queue configuration file content.
For each parameter except qname and hostlist, it is possible to specify host-dependent values instead of a single value. This "enhanced queue configuration specifier syntax" takes the form parameter parameter_value[,[host_id=parameter_value]]... where host_id is a host_identifier, as defined in sge_types(5), and parameter_value is of the correct form for each parameter, as described below. Spaces are allowed around "," but not inside "[]", except within list-valued parameter_values.
An entry without brackets is always required as the default setting for all queue instances which don't override it. Tuples with a hostgroup_name (see sge_types(1)) host_id override the default setting. Tuples with a host_name host_id override both the default and the host group setting. As an example, PEs with different allocation rules may be specified according to the core count of different node types:
    pe_list NONE,[@dual=all-mpi mpi-4],[@quad=all-mpi mpi-8]
The queue configuration is rejected if a default setting is absent.
Ambiguous configurations (those with more than one attribute setting for a particular host) cause the relevant queue instances to go into a "configuration ambiguous" state and not accept jobs. This is reported as "c" by qstat(1) and qhost(1), and may be diagnosed with qstat -explain c. Configurations containing override values for hosts not in the execution host list are accepted as "detached", as indicated by the -sds argument of qconf(1).
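The precedence of default, host group and host settings can be sketched in Python as follows. This is only an illustration under assumptions: the helper names parse_enhanced and resolve are hypothetical, host group membership is passed in by the caller, a host that belongs to two overriding host groups is treated as the ambiguous case, and values that themselves contain brackets are not handled.

```python
import re


def parse_enhanced(value: str):
    """Split 'default,[host_id=value],...' into (default, {host_id: value})."""
    overrides = {}
    for host_id, val in re.findall(r"\[\s*([^=\]\s]+)\s*=([^\]]*)\]", value):
        overrides[host_id] = val.strip()
    # Everything before the first bracket is the mandatory default setting.
    default = value.split("[", 1)[0].strip().rstrip(",").strip()
    if not default:
        raise ValueError("a default setting is required")
    return default, overrides


def resolve(value: str, host: str, groups_of_host):
    """Return the setting for one host, or None if it is ambiguous."""
    default, overrides = parse_enhanced(value)
    if host in overrides:              # host_name beats host group and default
        return overrides[host]
    matches = [overrides[g] for g in groups_of_host if g in overrides]
    if len(matches) > 1:               # more than one setting for this host
        return None
    return matches[0] if matches else default


spec = "NONE,[@dual=all-mpi mpi-4],[@quad=all-mpi mpi-8]"
print(resolve(spec, "node1", ["@dual"]))
```

Returning None for the ambiguous case is a design choice of this sketch; Grid Engine itself reports such queue instances in the "c" state.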
Regardless of the queue_sort_method setting, qstat(1) reports queue information in the order defined by the value of the seq_no. Set this parameter to a monotonically increasing sequence. (Type: number; template default: 0.)
The syntax is that of a comma-separated list, with each list element consisting of the complex_name (see sge_types(5)) of a load value, an equal sign and the threshold value intended to trigger the overload situation (e.g. load_avg=1.75,users_logged_in=5).
Note: Load values as well as consumable resources may be scaled differently for different hosts if specified in the corresponding execution host definitions (refer to host_conf(5) for more information). Load thresholds are compared against the scaled load and consumable values. Boolean complexes can be used to set an alarm state with the value false, typically from a load sensor which checks a host's "health", e.g. load_avg=1.75,health=false.
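A load_thresholds value of the form shown above can be parsed and evaluated roughly as follows. This Python sketch makes assumptions not stated in the text: numeric thresholds are compared with ">=" (the real comparison depends on the complex definition), scaling is ignored, and the reported load values are supplied as a plain dictionary.

```python
def parse_thresholds(spec: str) -> dict:
    """Turn 'load_avg=1.75,health=false' into {'load_avg': '1.75', 'health': 'false'}."""
    result = {}
    for item in spec.split(","):
        name, _, value = item.strip().partition("=")
        result[name] = value
    return result


def is_overloaded(spec: str, load: dict) -> bool:
    """Return True if any reported load value reaches its threshold."""
    for name, threshold in parse_thresholds(spec).items():
        if name not in load:
            continue
        if threshold.lower() in ("true", "false"):
            # Boolean complex: alarm when the reported value equals the threshold.
            if load[name] == (threshold.lower() == "true"):
                return True
        elif float(load[name]) >= float(threshold):
            return True
    return False


spec = "load_avg=1.75,health=false"
print(is_overloaded(spec, {"load_avg": 0.5, "health": False}))
```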
Note, the value of priority has no effect if Grid Engine adjusts priorities dynamically to implement ticket-based entitlement policy goals. Dynamic priority adjustment is switched off by default due to sge_conf(5) reprioritize being set to false.
On a multiprocessor execution host, a set of processors can be defined to which the jobs executing in this queue are bound. The value type of this parameter is a range description like that of the -pe option of qsub(1) (e.g. 1-4,8,10) denoting the processor numbers for the processor group to be used. The interpretation of these values relies on operating system specifics and is thus performed inside sge_execd(8) running on the queue host. Therefore, the parsing of the parameter has to be provided by the execution daemon and the parameter is only passed through sge_qmaster(8) as a string.
Currently, support is only provided for multiprocessor machines running Solaris, SGI multiprocessor machines running IRIX 6.2 and Digital UNIX multiprocessor machines. In the case of Solaris the processor set must already exist when this processors parameter is configured, so the processor set has to be created manually. In the case of Digital UNIX only one job per processor set is allowed to execute at the same time, i.e. slots (see below) should be set to 1 for this queue.
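As an illustration of the range description format only, the following Python sketch expands a value such as 1-4,8,10 into a list of processor numbers. The function name is hypothetical, and the real interpretation is operating system specific and done by the execution daemon.

```python
def parse_processor_range(spec: str) -> list:
    """Expand a range description like '1-4,8,10' into explicit numbers."""
    procs = []
    for part in spec.split(","):
        low, sep, high = part.strip().partition("-")
        if sep:
            # 'low-high' denotes an inclusive range.
            procs.extend(range(int(low), int(high) + 1))
        else:
            procs.append(int(low))
    return procs


print(parse_processor_range("1-4,8,10"))
```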
Jobs submitted with option -now y can only be scheduled on interactive queues, and -now n targets batch queues. -now y is the default for qsh, qrsh, and qlogin, while -now n is the default for qsub. Nevertheless, the option can be applied to all commands, with either argument, to direct jobs to specific queue types.
The formerly supported types parallel and checkpointing are not allowed anymore. A queue instance is implicitly of type parallel/checkpointing if there is a parallel environment or a checkpointing interface specified for this queue instance in pe_list/ckpt_list, and is implicitly BATCH if it has a parallel environment attached. Formerly possible settings, e.g.
    qtype PARALLEL
could be changed to
    qtype NONE
    pe_list pe_name
(Type string; default: batch interactive.)
The type of this parameter is boolean, thus either TRUE or FALSE can be specified. The default is FALSE, i.e. do not restart jobs automatically.
The type of the parameter is string. The default is /bin/sh.
The default for shell_start_mode is posix_compliant. Note, though, that the shell_start_mode can only be used for batch jobs submitted by qsub(1) and can't be used for interactive jobs submitted by qrsh(1), qsh(1), qlogin(1).
    #!/bin/sh
    # set the environment somehow
    exec "$@"
It is, at best, tricky to write a proper substitute for the builtin method as a shell script, taking account of the variables above. It is probably best to do so in a non-macro expanded scripting language (or a compiled language, as appropriate).
The starter_method will not be invoked for qsh, qlogin, or qrsh acting as rlogin.
The same pseudo-variables can be expanded to compose the command as for the following methods.
If no executable path is given, Grid Engine takes the specified parameter entries as the signal to be delivered instead of the default signal. A signal must be either a positive number or a signal name with "SIG" as prefix and the signal name as printed by kill -l (e.g. SIGTERM).
If an executable path is given (it must be an absolute path starting with a "/") then this command, together with its arguments, is started by Grid Engine to perform the appropriate action. The following special variables are expanded at runtime and can be used (besides any other strings which have to be interpreted by the procedures) to compose a command line:
1. Queuewise subordination
A list of Grid Engine queue names in the format for queue_name in sge_types(1). Subordinate relationships are in effect only between queue instances residing at the same host. The relationship does not apply and is ignored when jobs are running in queue instances on other hosts. Queue instances residing on the same host will be suspended when a specified count of jobs is running in this queue instance. The list specification is the same as that of the load_thresholds parameter above, e.g. low_pri_q=5,small_q. The numbers denote the job slots of the queue that have to be filled in the superordinated queue to trigger the suspension of the subordinated queue. If no value is assigned, a suspension is triggered if all slots of the queue are filled.
On nodes which host more than one queue, you might wish to accord better service to certain classes of jobs (e.g., queues that are dedicated to parallel processing might need priority over low priority production queues). The default is NONE.
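The queuewise rule can be sketched in Python as below. The function name and the slot counts are hypothetical; the sketch only covers queue instances on one host and assumes that "filled" means the number of used slots has reached the configured count.

```python
def queues_to_suspend(subordinate_list: str, used_slots: int, total_slots: int) -> list:
    """Return the subordinated queues to suspend on this host.

    used_slots and total_slots refer to the superordinated queue instance.
    """
    if subordinate_list.strip() == "NONE":
        return []
    suspended = []
    for item in subordinate_list.split(","):
        name, sep, count = item.strip().partition("=")
        # Without an explicit count, all slots must be filled.
        threshold = int(count) if sep else total_slots
        if used_slots >= threshold:
            suspended.append(name)
    return suspended


print(queues_to_suspend("low_pri_q=5,small_q", used_slots=5, total_slots=8))
```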
2. Slotwise preemption
Slotwise preemption provides a means to ensure that high priority jobs get the resources they need, while at the same time low priority jobs on the same host are not unnecessarily preempted, maximizing the host utilization. Slotwise preemption is designed to provide different preemption actions, but with the current implementation only suspension is provided. This means there is a subordination relationship defined between queues similar to the queue-wise subordination, but if the suspend threshold is exceeded, only single tasks running in single slots are suspended rather than the whole subordinated queue.
As with queue-wise subordination, the subordination relationships are in effect only between queue instances residing at the same host. The relationship does not apply and is ignored when jobs and tasks are running in queue instances on other hosts.
The syntax is:
    slots=<threshold>(<queue_list>)
where
    <threshold>  = a positive integer number
    <queue_list> = <queue_def>[,<queue_list>]
    <queue_def>  = <queue>[:<seq_no>][:<action>]
    <queue>      = a Grid Engine queue name in the format for queue_name in sge_types(1).
    <seq_no>     = sequence number among all subordinated queues of the same depth in the tree. The higher the sequence number, the lower the priority of the queue. Default is 0, which is the highest priority.
    <action>     = the action to be taken if the threshold is exceeded. Supported are: "sr": Suspend the task with the shortest run time. "lr": Suspend the task with the longest run time. Default is "sr".
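A value such as slots=2(B.q:1, C.q:2) can be taken apart with a short Python sketch like the one below. The function name is hypothetical, and the placement of the optional action after a second colon is an assumption of this sketch; the defaults (sequence number 0, action "sr") follow the description above.

```python
import re


def parse_slotwise(value: str):
    """Parse 'slots=<threshold>(<queue>[:<seq_no>][:<action>],...)'."""
    match = re.fullmatch(r"\s*slots\s*=\s*(\d+)\s*\((.*)\)\s*", value)
    if not match:
        raise ValueError("not a slotwise subordinate_list value: %r" % value)
    threshold = int(match.group(1))
    queues = []
    for item in match.group(2).split(","):
        fields = [field.strip() for field in item.split(":")]
        name = fields[0]
        seq_no = int(fields[1]) if len(fields) > 1 and fields[1] else 0
        action = fields[2] if len(fields) > 2 and fields[2] else "sr"
        queues.append((name, seq_no, action))
    return threshold, queues


print(parse_slotwise("slots=2(B.q:1, C.q:2)"))
```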
Some examples of possible configurations and their functionalities:
a) The simplest configuration
    subordinate_list slots=2(B.q)
which means the queue "B.q" is subordinated to the current queue (let's call it "A.q"), the suspend threshold for all tasks running in "A.q" and "B.q" on the current host is two, the sequence number of "B.q" is "0" and the action is "suspend task with shortest run time first". This subordination relationship looks like this:
    A.q
     |
    B.q
This could be a typical configuration for a host with a dual core CPU. This subordination configuration ensures that tasks that are scheduled to "A.q" always get a CPU core for themselves, while jobs in "B.q" are not preempted as long as there are no jobs running in "A.q".
If there is no task running in "A.q", two tasks are running in "B.q" and a new task is scheduled to "A.q", the sum of tasks running in "A.q" and "B.q" is three. Three is greater than two, so this triggers the defined action. This causes the task with the shortest run time in the subordinated queue "B.q" to be suspended. After suspension, there is one task running in "A.q", one task running in "B.q", and one task suspended in "B.q".
b) A simple tree
subordinate_list slots=2(B.q:1, C.q:2)
This defines a small tree that looks like this:
       A.q
      /   \
    B.q   C.q
A use case for this configuration could be a host with a dual core CPU and queue "B.q" and "C.q" for jobs with different requirements, e.g. "B.q" for interactive jobs, "C.q" for batch jobs. Again, the tasks in "A.q" always get a CPU core, while tasks in "B.q" and "C.q" are suspended only if the threshold of running tasks is exceeded. Here the sequence number among the queues of the same depth comes into play. Tasks scheduled to "B.q" can't directly trigger the suspension of tasks in "C.q", but if there is a task to be suspended, first "C.q" will be searched for a suitable task.
If there is one task running in "A.q", one in "C.q" and a new task is scheduled to "B.q", the threshold of "2" in "A.q", "B.q" and "C.q" is exceeded. This triggers the suspension of one task in either "B.q" or "C.q". The sequence number gives "B.q" a higher priority than "C.q", therefore the task in "C.q" is suspended. After suspension, there is one task running in "A.q", one task running in "B.q" and one task suspended in "C.q".
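The choice of the task to suspend in examples a) and b) can be reproduced with a small Python sketch. It only handles a root queue with directly subordinated queues on one host, the function name and the run times are hypothetical, and it is not Grid Engine's actual implementation.

```python
def pick_victim(threshold, subordinates, running):
    """Pick the task to suspend, or None if the threshold is not exceeded.

    subordinates: list of (queue, seq_no, action) below the root queue.
    running: list of (queue, run_time_seconds) for all running tasks in the
             root queue and its subordinated queues on this host.
    """
    if len(running) <= threshold:
        return None
    # Queues with the highest sequence number have the lowest priority
    # and are searched first.
    for name, _seq_no, action in sorted(subordinates, key=lambda q: -q[1]):
        tasks = [task for task in running if task[0] == name]
        if tasks:
            choose = min if action == "sr" else max
            return choose(tasks, key=lambda task: task[1])
    return None


# Example b): one task each in A.q and C.q, a new task arrives in B.q.
subs = [("B.q", 1, "sr"), ("C.q", 2, "sr")]
print(pick_victim(2, subs, [("A.q", 100), ("C.q", 50), ("B.q", 0)]))
```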
c) More than two levels
Configuration of A.q:
    subordinate_list slots=2(B.q)
Configuration of B.q:
    subordinate_list slots=2(C.q)
looks like this:
    A.q
     |
    B.q
     |
    C.q
These are three queues with high, medium and low priority. If a task is scheduled to "C.q", first the subtree consisting of "B.q" and "C.q" is checked and the number of tasks running there is counted. If the threshold defined in "B.q" is exceeded, the task in "C.q" is suspended. Then the whole tree is checked: if the number of tasks running in "A.q", "B.q" and "C.q" exceeds the threshold defined in "A.q", the task in "C.q" is suspended. This means that the effective threshold of any subtree is not higher than the threshold of the root node of the tree. If in this example a task is scheduled to "A.q", the number of tasks running in "A.q", "B.q" and "C.q" is immediately checked against the threshold defined in "A.q".
d) Any tree
            A.q
           /   \
         B.q   C.q
         /     /  \
       D.q   E.q  F.q
                    \
                    G.q
The computation of the tasks that are to be (un)suspended always starts at the queue instance that is modified (i.e. one to which a task is scheduled, at which a task ends, whose configuration is modified, or for which a manual or other automatic (un)suspend is issued), except when it is a leaf node, like "D.q", "E.q" and "G.q" in this example. Then the computation starts at its parent queue instance (like "B.q", "C.q" or "F.q" in this example). From there, first all running tasks in the whole subtree of this queue instance are counted. If the sum exceeds the threshold configured in the subordinate_list, a task to be suspended is sought in this subtree. Then the algorithm proceeds to the parent of this queue instance, counts all running tasks in the whole subtree below the parent, and checks if the number exceeds the threshold configured in the parent's subordinate_list. If so, it searches for a task to suspend in the whole subtree below the parent. And so on, until it has done this computation for the root node of the tree.
For consumable resource attributes an available resource amount is determined by subtracting the current resource consumption of all running jobs in the queue from the quota in the complex_values list. Jobs can only be dispatched to a queue if no resource requests exceed any corresponding resource availability obtained by this scheme. The quota definition in the complex_values list is automatically replaced by the current load value reported for this attribute if load is monitored for this resource and if the reported load value is more stringent than the quota. This effectively avoids oversubscription of resources.
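The basic availability check for consumables can be written down as a short Python sketch. It deliberately leaves out the replacement of the quota by a more stringent load value, and the resource name "licenses" as well as the function name are hypothetical.

```python
def can_dispatch(complex_values: dict, running_jobs: list, request: dict) -> bool:
    """Check a job's consumable requests against the queue's remaining quota.

    complex_values: quota per consumable attribute, e.g. {"licenses": 4}
    running_jobs:   consumption per running job, e.g. [{"licenses": 2}]
    request:        what the job to be dispatched asks for
    """
    for name, amount in request.items():
        if name not in complex_values:
            continue  # no quota configured for this attribute in the queue
        used = sum(job.get(name, 0) for job in running_jobs)
        available = complex_values[name] - used
        if amount > available:
            return False
    return True


quota = {"licenses": 4}
running = [{"licenses": 2}, {"licenses": 1}]
print(can_dispatch(quota, running, {"licenses": 1}))
```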
Note: Load values replacing the quota specifications may have become more stringent because they have been scaled (see host_conf(5)) and/or load adjusted (see sched_conf(5)). The -F option of qstat(1) and the load display in the qmon(1) queue control dialog (activated by clicking on a queue icon while the "Shift" key is pressed) provide detailed information on the actual availability of consumable resources and on the origin of the values taken into account currently.
Note also: The resource consumption of running jobs (used for the availability calculation) as well as the resource requests of the jobs waiting to be dispatched either may be derived from explicit user requests during job submission (see the -l option to qsub(1)) or from a "default" value configured for an attribute by the administrator (see complex(5)). The -r option to qstat(1) can be used for retrieving full detail on the actual resource requests of all jobs in the system.
For non-consumable resources Grid Engine simply compares the job's attribute requests with the corresponding specification in complex_values, taking the relation operator of the complex attribute definition into account (see complex(5)). If the result of the comparison is "true", the queue is suitable for the job with respect to the particular attribute. For parallel jobs each queue slot to be occupied by a parallel task is meant to provide the same resource attribute value.
Note: Only numeric complex attributes can be defined as consumable resources, hence non-numeric attributes are always handled on a per queue slot basis.
The default value for this parameter is NONE, i.e. no administrator defined resource attribute quotas are associated with the queue.
Note: Jobs can request queues with a certain calendar model via a "-l c=cal_name" option to qsub(1).
The resource limit parameters s_cpu and h_cpu are implemented by Grid Engine as a job limit. They impose a limit on the amount of combined CPU time consumed by all the processes in the job. If h_cpu is exceeded by a job running in the queue, it is aborted via a SIGKILL signal (see kill(1)). If s_cpu is exceeded, the job is sent a SIGXCPU signal which can be caught by the job. If you wish to allow a job to be "warned" so it can exit gracefully before it is killed, then you should set the s_cpu limit to a lower value than h_cpu. For parallel processes, the limit is applied per slot, which means that the limit is multiplied by the number of slots being used by the job before being applied.
The resource limit parameters s_vmem and h_vmem are implemented by Grid Engine as a job limit. They impose a limit on the amount of combined virtual memory consumed by all the processes in the job. If h_vmem is exceeded by a job running in the queue, it is aborted via a SIGKILL signal (see kill(1)). If s_vmem is exceeded, the job is sent a SIGXCPU signal which can be caught by the job. If you wish to allow a job to be "warned" so it can exit gracefully before it is killed, then you should set the s_vmem limit to a lower value than h_vmem. For parallel processes, the limit is applied per slot which means that the limit is multiplied by the number of slots being used by the job before being applied.
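The interplay of soft limit, hard limit and slot count described for s_cpu/h_cpu and s_vmem/h_vmem can be summarized in a Python sketch. The function name and the numbers are hypothetical; limits are given per slot in plain units (seconds or bytes), and an unset limit is modelled as infinity.

```python
INFINITY = float("inf")


def limit_action(used, soft=INFINITY, hard=INFINITY, slots=1):
    """Return the signal a job would receive for its current usage, if any.

    The per-slot limits are multiplied by the number of slots the job uses
    before being compared with the combined usage of all its processes.
    """
    if used > hard * slots:
        return "SIGKILL"   # hard limit exceeded: job is aborted
    if used > soft * slots:
        return "SIGXCPU"   # soft limit exceeded: job is warned
    return None


# A serial job with s_cpu of 3000 s and h_cpu of 3600 s after 3500 s of CPU time.
print(limit_action(3500, soft=3000, hard=3600))
```

Setting the soft limit below the hard limit, as in this call, is what gives the job a window in which it can react to the warning.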
The remaining parameters in the queue configuration template specify per-job soft and hard resource limits as implemented by the setrlimit(2) system call. See this manual page on your system for more information. By default, each limit field is set to infinity (which means RLIM_INFINITY as described in the setrlimit(2) manual page). The value type for the CPU-time limits s_cpu and h_cpu is time. The value type for the other limits is memory. Note: Not all systems support setrlimit(2).
Note also: s_vmem and h_vmem (virtual memory) are only available on systems supporting RLIMIT_VMEM (see setrlimit(2) on your operating system).
sge_intro(1), sge_intro_types(1), csh(1), qconf(1), qmon(1), qrestart(1), qstat(1), qsub(1), sh(1), nice(2), setrlimit(2), access_list(5), calendar_conf(5), sge_conf(5), complex(5), host_conf(5), sched_conf(5), sge_execd(8), sge_qmaster(8), sge_shepherd(8).
See sge_intro(1) for a full statement of rights and permissions.
Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: September, 12, 2017