Configuring queues on Torque

Inspecting the jobs in the queue

There are several commands that will give you detailed information about the jobs in the batch system.

Command   Task                 Useful flags
showq     List jobs in queue   -r            only running jobs
                               -i            only idle jobs
                               -b            only blocked jobs
                               -u username   this user only
qstat     List jobs in queue   -f jobid      list details
                               -n            list nodes assigned to job

While both showq and qstat list the jobs in the queue, their output is quite different. showq sorts the jobs by time to completion, which makes it easy to see when resources will become available.
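
The flags listed above can be combined from a script. The following is a minimal Python sketch, not part of Torque or Maui: the helper names `showq_args`, `qstat_args` and `run` are made up for illustration, and only the flags mentioned in this section are used. Whether `-f` and `-n` can be combined is not covered here, so the sketch keeps them separate.

```python
import subprocess

# Flags taken from the list of useful flags in this section.
SHOWQ_STATE_FLAGS = {"running": "-r", "idle": "-i", "blocked": "-b"}


def showq_args(state=None, user=None):
    """Build a showq argument list, optionally filtered by state and user."""
    args = ["showq"]
    if state is not None:
        args.append(SHOWQ_STATE_FLAGS[state])  # KeyError for an unknown state
    if user is not None:
        args += ["-u", user]
    return args


def qstat_args(jobid=None, nodes=False):
    """Build a qstat argument list: details for one job, or node assignments."""
    if jobid is not None:
        return ["qstat", "-f", jobid]
    if nodes:
        return ["qstat", "-n"]
    return ["qstat"]


def run(args):
    """Run a command built above and return its standard output as text."""
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout
```

On a host where the commands are installed, `print(run(showq_args(state="running", user="ken")))` would execute `showq -r -u ken`; the user name is only an example.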

qmgr is the Torque management command.

Friendly advice: back up your working config before modifying the setup:

# qmgr -c "print server" > /tmp/pbsconfig.txt

Roll back to escape from a messed-up system:

# qterm; pbs_server -t create
# qmgr < /tmp/pbsconfig.txt

This will bring you back to where you started. Warning: this wipes the whole queue setup, and all currently queued and running jobs will be lost!
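
Since a backup in a fixed file such as /tmp/pbsconfig.txt is overwritten by the next backup, it may help to keep timestamped copies. The following Python sketch runs the same `qmgr -c "print server"` command as above; the function name `backup_server_config` and the `pbsconfig-<timestamp>.txt` naming scheme are assumptions made for illustration.

```python
import datetime
import pathlib
import subprocess


def backup_server_config(directory, runner=subprocess.run, now=None):
    """Save the output of `qmgr -c "print server"` to a timestamped file.

    `runner` and `now` can be replaced, which allows a dry run on a
    machine without Torque.
    """
    now = now or datetime.datetime.now()
    target = pathlib.Path(directory) / f"pbsconfig-{now:%Y%m%d-%H%M%S}.txt"
    # check=True raises if qmgr fails, so no file is written in that case.
    result = runner(["qmgr", "-c", "print server"],
                    capture_output=True, text=True, check=True)
    target.write_text(result.stdout)
    return target
```

Calling `backup_server_config("/tmp")` as a Torque manager returns the path of the new file, which can later be fed back with `qmgr < file` as described above.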

The default batch configuration from the torque-roll is saved in /opt/torque/pbs.default. Do this to get back the original setup that came with the torque-roll:

# qterm; pbs_server -t create
# qmgr < /opt/torque/pbs.default

> qmgr -c 'p s'

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = kmn
set server managers = ken@kmn
set server operators = ken@kmn
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300

A single queue named batch and a few needed server attributes are created.

2.5.2 pbs_server -t create

The -t create option instructs pbs_server to create the serverdb file and initialize it with a minimum configuration to run pbs_server.

> pbs_server -t create

To see the configuration and verify that Torque is configured correctly, use qmgr:

> qmgr -c 'p s'

#
# Set server attributes.
#
set server acl_hosts = kmn
set server log_events = 511
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 6
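
To compare a configuration like this against expectations, the `set server` lines can be turned into a dictionary. This is a small Python sketch, assuming the dump has the `set server name = value` shape shown above; `parse_server_attributes` is a hypothetical helper, and lines that use `+=` are ignored for brevity.

```python
def parse_server_attributes(dump):
    """Return {attribute: value} for the 'set server' lines of a qmgr dump."""
    prefix = "set server "
    attrs = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # comments, blank lines, queue definitions
        name, sep, value = line[len(prefix):].partition(" = ")
        if sep:  # lines without " = " (for example "+=") are skipped here
            attrs[name.strip()] = value.strip()
    return attrs


# The minimal configuration printed after `pbs_server -t create`.
MINIMAL = """\
#
# Set server attributes.
#
set server acl_hosts = kmn
set server log_events = 511
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 6
"""

attrs = parse_server_attributes(MINIMAL)
print(sorted(attrs))
print("scheduling" in attrs, "default_queue" in attrs)
```

For the minimal configuration the second line prints `False False`: neither `scheduling` nor `default_queue` is set, and no queue exists yet.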

2.5.3 Setting Up the Environment for pbs_server and pbs_mom

The pbs_environment file (default location: TORQUE_HOME/pbs_environment) will be sourced by pbs_mom and pbs_server when they are launched. If there are environment variables that should be set for pbs_server and/or pbs_mom, they can be placed in this file.


    [Mar 20, 2020] Adding and Removing Nodes to a specific queue for PBS-Pro

    Feb 13, 2019

    This is a quick command to add a node to a specific queue:

    # qmgr -c "set node node-name queue=queue-name"

    To remove a node from a specific queue:

    # qmgr -c "unset node node-name queue"
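
When several nodes have to be moved, the two commands above can be generated in a loop. The Python sketch below only builds the directive strings and does not run qmgr; `node_queue_directives` is a hypothetical helper, and the node and queue names are placeholders.

```python
def node_queue_directives(nodes, queue=None):
    """Return qmgr directives that assign nodes to a queue.

    With queue=None the directives remove the queue assignment instead.
    """
    if queue is None:
        return [f"unset node {node} queue" for node in nodes]
    return [f"set node {node} queue={queue}" for node in nodes]


# Placeholder node names; print the full command lines for review.
for directive in node_queue_directives(["node01", "node02"], "express"):
    print(f'qmgr -c "{directive}"')
```

Printing the commands first and running them only after review keeps a typo in a node name from reaching the server.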

    TORQUE ROLL DOCUMENTATION

    > Example: express queue

    Goal: support development and job script testing, but prevent misuse.

    Basic philosophy:

    Create the queue with qmgr:

    create queue express
    set queue express queue_type = Execution
    set queue express resources_max.walltime = 08:00:00
    set queue express resources_default.nodes = 1:ppn=8
    set queue express resources_default.walltime = 08:00:00
    set queue express enabled = True
    set queue express started = True
    

    Increase the priority and limit the usage in the Maui scheduler configuration:

    CLASSWEIGHT             1000
    CLASSCFG[express] PRIORITY=1000 MAXIJOB=1  MAXJOBPERUSER=1 QLIST=express QDEF=express
    QOSCFG[express] FLAGS=IGNUSER
    

    This allows users to test job scripts and run interactive jobs with good turnaround by submitting to the express queue (qsub -q express ...). At the same time, misuse is prevented since only 1 running job is allowed per user.
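
If more queues of this kind are needed, the qmgr part of the recipe can be produced from a template. This Python sketch reproduces the directives of the express queue shown above from three parameters; `queue_directives` is a hypothetical helper, and using one walltime for both the maximum and the default simply mirrors this example.

```python
def queue_directives(name, walltime, nodes):
    """Return qmgr directives for an execution queue like 'express' above."""
    q = f"set queue {name}"
    return [
        f"create queue {name}",
        f"{q} queue_type = Execution",
        f"{q} resources_max.walltime = {walltime}",
        f"{q} resources_default.nodes = {nodes}",
        f"{q} resources_default.walltime = {walltime}",
        f"{q} enabled = True",
        f"{q} started = True",
    ]


print("\n".join(queue_directives("express", "08:00:00", "1:ppn=8")))
```

The printed text can be saved to a file and loaded with `qmgr < file`, the same mechanism used for restoring a configuration. The scheduler settings (CLASSCFG and QOSCFG) still have to be added separately.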

    [Jul 6, 2011] Torque jobs does not enter "E" state (unless "qrun")

    Jobs I add to the queue stay there in the "Queued" state with no attempt to execute them (unless I manually qrun them).

    /var/spool/torque/server_logs says just:
    04/11/2011 12:43:27;0100;PBS_Server;Job;16.localhost;enqueuing into batch, state 1 hop 1
    04/11/2011 12:43:27;0008;PBS_Server;Job;16.localhost;Job Queued at request of test@localhost, owner = test@localhost, job name = Qqq, queue = batch
    
    The job requires just 1 CPU on 1 node.
    # qmgr -c "list queue batch"
    Queue batch
        queue_type = Execution
        total_jobs = 0
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 
        max_running = 3
        acl_host_enable = True
        acl_hosts = localhost
        resources_min.ncpus = 1
        resources_min.nodect = 1
        resources_default.ncpus = 1
        resources_default.nodes = 1
        resources_default.walltime = 00:00:10
        mtime = Mon Apr 11 12:07:10 2011
        resources_assigned.ncpus = 0
        resources_assigned.nodect = 0
        kill_delay = 3
        enabled = True
        started = True
    
    I can't set resources_assigned to a nonzero value; the attempt fails with "Cannot set attribute, read only or insufficient permission resources_assigned.ncpus". When I qrun a task, this goes to the mom's log:
    04/11/2011 21:27:48;0001;   pbs_mom;Svr;pbs_mom;LOG_DEBUG::mom_checkpoint_job_has_checkpoint, FALSE
    04/11/2011 21:27:48;0001;   pbs_mom;Job;TMomFinalizeJob3;job 18.localhost started, pid = 28592
    04/11/2011 21:27:48;0080;   pbs_mom;Job;18.localhost;scan_for_terminated: job 18.localhost task 1 terminated, sid=28592
    04/11/2011 21:27:48;0008;   pbs_mom;Job;18.localhost;job was terminated
    04/11/2011 21:27:48;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
    04/11/2011 21:27:48;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
    04/11/2011 21:27:48;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
    04/11/2011 21:27:48;0080;   pbs_mom;Job;18.localhost;obit sent to server
    
    Scheduler log (/var/spool/torque/sched_logs/20110705):
    07/05/2011 21:44:53;0002; pbs_sched;Svr;Log;Log opened
    07/05/2011 21:44:53;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20110705 opened
    07/05/2011 21:44:53;0002; pbs_sched;Svr;main;/usr/sbin/pbs_sched startup pid 16234
    
    qstat -f:
    Job Id: 26.localhost
        Job_Name = qwe
        Job_Owner = test@localhost
        job_state = Q
        queue = batch
        server = localhost
        Checkpoint = u
        ctime = Tue Jul  5 21:43:31 2011
        Error_Path = localhost:/home/test/jscfi/default/0.738784810485275/qwe.e26
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
        Mail_Points = a
        mtime = Tue Jul  5 21:43:31 2011
        Output_Path = localhost:/home/test/jscfi/default/0.738784810485275/qwe.o26
    
        Priority = 0
        qtime = Tue Jul  5 21:43:31 2011
        Rerunable = True
        Resource_List.ncpus = 1
        Resource_List.neednodes = 1:ppn=1
        Resource_List.nodect = 1
        Resource_List.nodes = 1:ppn=1
        Resource_List.walltime = 00:01:00
        substate = 10
        Variable_List = PBS_O_HOME=/home/test,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=test,
        PBS_O_PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games,
        PBS_O_MAIL=/var/mail/test,PBS_O_SHELL=/bin/sh,PBS_SERVER=127.0.0.1,
        PBS_O_WORKDIR=/home/test/jscfi/default/0.738784810485275,
        PBS_O_QUEUE=batch,PBS_O_HOST=localhost
        euser = test
        egroup = test
        queue_rank = 1
        queue_type = E
        etime = Tue Jul  5 21:43:31 2011
        submit_args = run.pbs
        Walltime.Remaining = 6
        fault_tolerant = False

    How can I make it execute jobs automatically, without a manual qrun?
    
    Vi. 660817

    If you do a qrun to force the job to run, does it work? What do you see on the mom_log on either your scheduler node or the execution node after you do a qrun? I saw this issue once a while back (jobs refusing to autostart), but it was a really weird condition and I'm trying to remember how I fixed it. I'm assuming that restarting pbs_server, pbs_mom, etc makes no difference? – ajdecon Apr 11 '11 at 13:14

    @ajdecon, No, restarting changes nothing. – Vi. Apr 11 '11 at 18:30

    OK, I found my notes from this bug, but I'm not sure it will help. When I saw this issue, it was caused by a mismatch of the /etc/group and /etc/passwd files between the head node and the computes. Only doing qrun as root would make the jobs start. – ajdecon Apr 12 '11 at 0:15

    Running everything on single system. How can all that /etc/{hosts,passwd,group,whatever} affect it, especially without any loud log messages? Is there something like "debug log" or other thing where I can look why is it holding back the task? – Vi. Apr 12 '11 at 0:30

    I do not see any communication between the scheduler and the server in the log. Also, localhost is not a good name for a server. You should configure a proper hostname that can be resolved correctly on every node of the cluster. – Dmitri Chubarov Nov 3 '12 at 16:29

    anonymous, Aug 25 '13

    I spent several hours on a problem with similar symptoms, and in the end it was a single option missing in the server settings:
    qmgr -c "set server scheduling = True"
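
Whether this option is set can be checked from a saved server dump before changing anything. A minimal Python sketch, assuming the dump uses the `set server scheduling = True` form shown earlier on this page; `scheduling_enabled` is a made-up helper name.

```python
import re


def scheduling_enabled(dump):
    """Return True if a 'print server' dump sets scheduling to True."""
    match = re.search(r"^\s*set server scheduling = (\S+)\s*$",
                      dump, re.MULTILINE)
    # A missing attribute counts as "not enabled".
    return bool(match) and match.group(1).lower() == "true"
```

If the function returns False for the output of `qmgr -c "print server"`, the command above is the first thing to try.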

    Norky, May 19 '11

    Normally it would be the scheduler that decides when jobs are to be run, i.e. when there are sufficient resources, and tells the server to run the job. Are you running a scheduler? TORQUE includes a basic scheduler (pbs_sched), or you could install and run the more sophisticated maui (free) or moab (pay-for).

    The pbs_server part of PBS/TORQUE is a "resource manager" - essentially just a 'framework'. It makes no decisions itself: that is the job of the scheduler.

    Yes, scheduler is running (basic Torque scheduler, not maui). Attaching the scheduler log. – Vi. Jul 5 '11 at 18:51

    @Vi: in that case, the standard TORQUE scheduler might have attached a comment to the job: run qstat -f and check for comments at the end of each job's metadata, which might give you a clue as to why it is not running. – Norky Jul 6 '11 at 10:51

    I see nothing strange in qstat -f output. Where to look? (attached it to the question). – Vi. Jul 6 '11 at 12:21
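
Looking for such a comment by eye is easy to get wrong in a long listing. The Python sketch below parses `qstat -f` output for a single job into a dictionary, joining wrapped values such as Variable_List. It assumes the layout of the listing in the question, and it assumes that a scheduler comment would appear as an attribute named `comment`, which that listing does not confirm; `parse_qstat_full` is a hypothetical helper.

```python
import re

# An attribute line looks like "name = value"; names may contain dots.
ATTR = re.compile(r"^(\w[\w.]*) = (.*)$")


def parse_qstat_full(text):
    """Parse `qstat -f` output for one job into {attribute: value}."""
    attrs = {}
    last = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Job Id:"):
            continue
        match = ATTR.match(line)
        if match:
            last = match.group(1)
            attrs[last] = match.group(2)
        elif last is not None:
            attrs[last] += line  # wrapped value continues on this line
    return attrs


# Shortened copy of the listing from the question.
SAMPLE = """Job Id: 26.localhost
    Job_Name = qwe
    job_state = Q
    Resource_List.walltime = 00:01:00
    Variable_List = PBS_O_HOME=/home/test,PBS_O_LANG=en_US.UTF-8,
    PBS_O_LOGNAME=test,
    PBS_O_QUEUE=batch,PBS_O_HOST=localhost
    euser = test
"""

job = parse_qstat_full(SAMPLE)
print(job["job_state"], job.get("comment", "(no comment attribute)"))
```

For the listing in the question no comment is found, which matches the observation that nothing in it explains why the job stays queued.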

    Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Last modified: March, 29, 2020