Softpanorama

May the source be with you, but remember the KISS principle ;-)

SGE cheat sheet


 

SGE is a complex system, and the value of any given tip depends on how you use it. Very few tips are universal.

Derived from the cheat sheet by Indy Siva (Mar 09, 2011). See also Oliver Wiki SGE.


Old News ;-)

A space is necessary before "-" (minus) in sge_qstat, the default qstat options file

sge_qstat defines the command line switches that will be used by qstat by default. If available, the default sge_qstat file is read and processed by qstat(1).

There is a cluster global and a user private sge_qstat file. The user private file has the highest precedence and is followed by the cluster global sge_qstat file. Command line switches used with qstat(1) override all switches contained in the user private or cluster global sge_qstat file.

The default sge_qstat file may contain an arbitrary number of lines, although it is unclear whether lines after the first have any effect. Blank lines and lines with a '#' sign in the first column are skipped. Each line can contain a set of qstat(1) options; more than one option per line is allowed.

Here is an example of a sge_qstat default options file (note the leading blank before the first "-"):

=====================================================
# Just show me my own running and suspended jobs
 -s rs -u $USER
=====================================================
Having defined a default sge_qstat file like this and using qstat without parameters
qstat
has the same effect as if qstat was executed with:
qstat -s rs -u <current_user>
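A minimal sketch of creating such a file from the shell. It assumes the user private file is $HOME/.sge_qstat, which is the location named in sge_qstat(5); the function name write_qstat_defaults is made up for this illustration.

```shell
# Write a qstat defaults file to the path given as the first argument,
# so the sketch can be tried on a scratch file first.
write_qstat_defaults() {
    target="$1"
    # The quoted 'EOF' keeps $USER literal, exactly as in the example above.
    # Note the leading blank before the first "-".
    cat > "$target" <<'EOF'
# Just show me my own running and suspended jobs
 -s rs -u $USER
EOF
}

# Typical use (assumed location of the user private file):
#   write_qstat_defaults "$HOME/.sge_qstat"
```

Options given on the qstat command line still override whatever the file contains.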

[Oct 07, 2014] Enabling schedd_job_info

Use the command qconf -msconf and set the schedd_job_info parameter to true (see sched_conf(5)).
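Before editing, it can help to check the current setting. The sketch below reads scheduler configuration text on stdin, for example the output of qconf -ssconf, and prints the value of schedd_job_info. The function name schedd_job_info_value is hypothetical.

```shell
# Print the value of schedd_job_info from scheduler configuration text
# supplied on stdin (for example the output of "qconf -ssconf").
schedd_job_info_value() {
    awk '$1 == "schedd_job_info" { print $2 }'
}

# Typical use on a host with SGE installed:
#   qconf -ssconf | schedd_job_info_value
```

If the value printed is false, qstat -j is not expected to show scheduling information for pending jobs.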


Recommended Tips




Commonly Used commands

qacct -j 9999                            show accounting information for finished job 9999
qconf -mhgrp @allhosts                   edit hostgroup "@allhosts"
qstat -f [-q \*@node23]                  full display info [for node23 only]
qconf -sq all.q                          show "all.q" queue info
qconf -mq all.q                          modify "all.q" queue: update hostlist, #slots
qconf -aq all.q                          create queue named "all.q"
qconf -mc                                edit the complex (resource attribute) configuration

qconf -rattr queue slots 0 all.q@node23  #slots -> 0 (== pbsnodes -o)

qstat -s r -q all.q@node23               show all running jobs on node23

qhost -h node23,node24                   show host info for multiple nodes
qhost -q -h node23,node24                same as above, plus queue info

qmod -e all.q@node23                     enable node23 in queue all.q (-d == disable)

qsub -j y -o `pwd` -q all.q test.sh      submit test.sh job on queue all.q

qping -info node23 6445 execd 1          check status of execd on node23

qstat                                    current user jobs
qstat -u "*"                             all user jobs
qstat -g c                               cluster queue summary: load, used and available slots
qstat -f                                 detailed list of machines and job state
qstat -explain c -j job-id               specific job status
qstat -f -u "*"                          detailed list of machines with the jobs of all users

qdel job-id                              delete job
qsub -l h_vmem=### job.sh                mem limit, see queue_conf(5) RESOURCE LIMITS



qsub -w v job.00                         Troubleshoot problems with queue/scheduling
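The qsub options shown above can also be embedded in the job script itself on lines that start with "#$", so that a plain "qsub test.sh" is enough. The sketch below writes such a script. The function name write_test_job and the echoed message are made up for this illustration, and -cwd is used here in place of -o `pwd` on the assumption that output should land in the submission directory.

```shell
# Write a small job script whose "#$" lines carry the qsub options.
write_test_job() {
    target="$1"
    cat > "$target" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -j y
#$ -cwd
#$ -q all.q
echo "job started on $(uname -n)"
EOF
    chmod +x "$target"
}

# Typical use:
#   write_test_job test.sh
#   qsub test.sh
```

Options given on the qsub command line take precedence over the embedded ones.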

Adding and removing administrative privileges from a host

  • qconf -ah <hostname> # gives host administrative privileges
  • qconf -dh <hostname> # removes administrative privileges from host

    Adding and removing an execution host

  • qconf -ae <hostname> # add an execution host (opens an editor)
  • qconf -de <hostname> # remove an execution host

    Adding and removing submit hosts

  • qconf -as <hostname> # host is now a submit host
  • qconf -ds <hostname> # jobs may not be submitted from host

    Displaying current administrative/submit/execution hosts

  • qconf -sh # show current administrative hosts
  • qconf -ss # show current submit hosts
  • qconf -sel # show current execution host list

    Administering queues

  • qconf -aq <queuename> # adding a queue
  • qconf -dq <queuename> # delete a queue
  • qconf -mq <queuename> # modify a queue
  • qconf -Aq <filename> # adding a queue from file
  • qconf -mattr queue ... # change single attributes of more than one queue    
  • qalter -w v <jobid>
    This command lists the reasons why a job is not dispatchable in principle.
    For this purpose a dry scheduling run is performed.
    What is special about this dry run is that all consumable resources
    (including slots) are considered to be fully available for this job. Similarly, all load values are ignored because they vary.

    Job or Queue goes in error state "E"

    Job or queue errors are indicated by an uppercase "E" in the qstat output. A job enters the error state when Grid Engine tried to execute a job in a queue, but it failed for a reason that is specific to the job. A queue enters the error state when Grid Engine tried to execute a job in a queue, but it failed for a reason that is specific to the queue.
    
    Grid Engine offers a set of possibilities for users and administrators to get diagnosis information in case of job execution errors. Since both the queue and the job error state result from a failed job execution, the diagnosis possibilities are applicable to both types of error states:

  • Since Grid Engine 6.0, a one-line error reason for jobs in error state is available through
    qstat -j <jobid> | grep error
    With 6.0 this is the recommended first source of diagnosis information for the end user.
  • For queues in error state a one-line error reason is available through
    qstat -explain E
    With 6.0 this is the recommended first source of diagnosis information for administrators in case of queue errors.
  • user abort mail: If jobs are submitted with the submit option "-m a", an abort mail is sent to the address specified with the "-M user[@host]" option. The abort mail contains diagnosis information about job errors and is the recommended source of information for users.
  • qacct accounting: If no abort mail is available, the user can run
    qacct -j <jobid>
    to get information about the job error from Grid Engine job accounting.
  • administrator abort mail: An administrator can order administrator mails about job execution problems by specifying an appropriate email address (see under administrator_mail in sge_conf(5)). Administrator mails contain more detailed diagnosis information than user abort mails and are the recommended source in case of frequent job execution errors.
  • messages files: If no administrator mail is available, the Qmaster's messages file should be investigated first. Log entries related to a certain job can be found by searching for the appropriate job ID. In the 'default' installation the Qmaster messages file is located at $SGE_ROOT/default/spool/qmaster/messages. Additional information can sometimes be found in the messages of the Execd where the job was started. Use qacct -j <jobid> to figure out the host where the job was started and search in $SGE_ROOT/default/spool/<host>/messages for the job ID.
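The first diagnosis step can be scripted. The sketch below picks the IDs of jobs whose state contains "E" out of plain qstat output read on stdin. It assumes the default qstat layout (two header lines, job ID in column 1, state in column 5) and job names without whitespace; the function name jobs_in_error is made up for this illustration.

```shell
# Print the IDs of jobs in error state from qstat output on stdin.
# Assumes two header lines, job ID in field 1 and state in field 5.
jobs_in_error() {
    awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Typical use on a host with SGE installed:
#   qstat -u "*" | jobs_in_error | while read -r id; do
#       qstat -j "$id" | grep error
#   done
```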
    
    
    

    Suspend and resubmit stalled jobs

    Reference: http://gis.fem-environment.eu/grid-engine-howto/

    # as user:
    qstat | grep neteler | tr -s ' ' ' '  | cut -d' ' -f2 > /tmp/to_suspend.sge
    cat /tmp/to_suspend.sge
    
    # as root (?):
    su -
    for i in `cat /tmp/to_suspend.sge` ; do qmod -sj $i ; done
    qstat
    
    # remove crashed blade from list of execution hosts:
    qconf -de blade14
    # delete host from the host group (opens an editor):
    qconf -mhgrp "@allhosts"
    # show the new list to check it:
    qconf -shgrp "@allhosts"
    # verify queue stats:
    qstat -f
    # resubmit jobs to other nodes (as job user!!)
    for i in `cat /tmp/to_suspend.sge` ; do qresub $i ; done
    qstat
    
    This command sends a signal to a running job:
    qmod -sj | -usj | -cj (suspend | unsuspend | clear error)
    
    qmod -sj 3312136
    
    qmod -usj 3312136
    root - unsuspended job 3312136
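The grep/tr/cut pipeline used above to collect job IDs also matches job names that happen to contain the user name. A slightly stricter sketch compares the user column exactly. It assumes the default qstat layout (two header lines, job ID in column 1, user in column 4); the function name job_ids_for_user is hypothetical.

```shell
# Print the job IDs owned by one user from qstat output on stdin.
# Assumes two header lines, job ID in field 1 and user name in field 4.
job_ids_for_user() {
    awk -v user="$1" 'NR > 2 && $4 == user { print $1 }'
}

# Typical use on a host with SGE installed:
#   qstat -u "*" | job_ids_for_user neteler > /tmp/to_suspend.sge
#   while read -r id; do qmod -sj "$id"; done < /tmp/to_suspend.sge
```

Reading the list with "while read" avoids word-splitting surprises that the backtick form can cause if the file is empty or malformed.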

    Parallel Environment Configuration

    qconf -sp pename 	Show the configuration for the specified parallel environment.
    qconf -spl 	Show a list of all currently configured parallel environments.
    qconf -ap pename 	Add a new parallel environment.
    qconf -Ap filename 	Add a parallel environment from file filename.
    qconf -mp pename 	Modify the specified parallel environment using an editor.
    qconf -Mp filename 	Modify a parallel environment from file filename.
    qconf -dp pename 	Delete the specified parallel environment.
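For the file-based variants (qconf -Ap and qconf -Mp), the file uses the same "attribute value" layout that qconf -sp prints. The sketch below writes an example definition. The PE name mpi_sketch, the slot count and all other values are placeholders chosen for illustration, and the set of attributes follows sge_pe(5) as far as known; compare it with qconf -sp output on your own installation before using it.

```shell
# Write an example parallel environment definition to the given path.
# All values are placeholders; adjust them to the local cluster.
write_pe_template() {
    target="$1"
    # The quoted 'EOF' keeps $fill_up literal.
    cat > "$target" <<'EOF'
pe_name            mpi_sketch
slots              16
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Typical use on a host with SGE installed:
#   write_pe_template /tmp/mpi_sketch.pe
#   qconf -Ap /tmp/mpi_sketch.pe
```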
    

    List of currently defined queues

    qconf -sql 

    How do I control my jobs?

    Based on the displayed status of a job, you can control it with the following actions:

  • Job Priorities

    The Grid Engine software also sometimes lets users set priorities among their own jobs. A user who submits several jobs can specify, for example, that job 3 is the most important and that jobs 1 and 2 are equally important but less important than job 3.
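Priorities among a user's own jobs are set with qalter -p (or qsub -p at submission). The valid range is -1023 to 1024, and ordinary users may only lower the priority, so making job 3 the most important means lowering jobs 1 and 2. The sketch below only prints the qalter command after checking the range; the function name user_priority_cmd is made up for this illustration.

```shell
# Print the qalter command that would set a job's priority, after checking
# that the value is in the range an ordinary user may request (-1023..0).
user_priority_cmd() {
    jobid="$1"
    prio="$2"
    digits="${prio#-}"          # strip one leading minus sign
    case "$digits" in
        ''|*[!0-9]*)
            echo "priority must be an integer" >&2
            return 1 ;;
    esac
    if [ "$prio" -gt 0 ] || [ "$prio" -lt -1023 ]; then
        echo "priority must be between -1023 and 0 for ordinary users" >&2
        return 1
    fi
    echo "qalter -p $prio $jobid"
}

# Typical use: lower jobs 1 and 2 equally, leave job 3 at the default 0.
#   user_priority_cmd 1 -100
#   user_priority_cmd 2 -100
```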

    Clean up!

     # clear the error state of the listed queues:
     /usr/local/SGE/bin/lx24-amd64/qmod -cq lam mpich2 long short nolimit  




    Etc

    FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusivly for research and educational purposes.   If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner. 

    ABUSE: IPs or network segments from which we detect a stream of probes might be blocked for no less then 90 days. Multiple types of probes increase this period.  


    The Last but not Least


    Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

    The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

    Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


    This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...


    Disclaimer:

    The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    Last modified: October 11, 2015