Softpanorama


Resource requirements and limitations


The SGE batch scheduling system dispatches jobs according to the availability of the resources they need to complete successfully. A resource that is left unspecified, or specified higher than necessary, can therefore delay the scheduling of your jobs for no good reason. Specify the required resources for each of your jobs carefully, so that our HPC systems are used as efficiently as possible.

1. Status information and resource limitations

The following sections give a brief introduction to obtaining information about the total and the currently available resources on the cluster.

1.1 Host specific resource and status information (qhost)

To get an abbreviated overview of the currently available resources on each of the cluster's execution hosts, use the command qhost. For a complete representation of all available host-specific resource attributes execute

qhost -F

For more information about standard resource attributes consult the complex man page.

1.2 Queue specific resource information

To obtain a list of the available queues on the cluster execute

qconf -sql

Use the following command to get detailed resource information for a specific queue:

qconf -sq queue
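
The output of qconf -sq is a list of attribute/value pairs, which lends itself to simple filtering. The following is a minimal sketch of a helper that extracts a single scalar attribute (such as h_rt or h_vmem) from that output. The function name queue_attr, the queue name all.q and any sample values are illustrative assumptions, not settings of this cluster.

```shell
# Print the value of one attribute from "qconf -sq" style output on stdin.
# Assumes one "attribute value" pair per line, separated by whitespace;
# only the first value field is printed, so this suits scalar attributes.
queue_attr() {
  awk -v key="$1" '$1 == key { print $2 }'
}

# Intended use on a cluster (the queue name is a placeholder):
#   qconf -sq all.q | queue_attr h_rt
```

Reading from standard input keeps the helper independent of SGE itself, so it can be tried on saved output before being used against a live queue.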

1.3 Slot limitations

There are limitations on the number of available slots per user, as well as on the maximum number of slots per job. As a rule of thumb, a single job may occupy approximately half of the cluster, and each user may fill up to about 75% of the cluster with their jobs.

Depending on the specific HPC system there are also transient limitations on the number of available slots per user, which come into effect only at times of high cluster load (and consequently increased competition for the available resources). Execute

qquota

to see these limits and your actual resource consumption (nothing is displayed if you have no running jobs on the cluster).

Note: The transient limits are usually not enforced, but they may cause problems for interactive sessions (see the section Submitting interactive jobs on the subject) or for short running jobs. Please contact the ZID cluster administration if you experience problems or need more resources for the progress of an urgent project.

2. Specifying resource requirements for jobs

The subsequent sections will provide sample cases of how to specify the resource requirements for your jobs.

2.1 Runtime limits

For optimal scheduling, i.e. to let your job start as soon as the necessary resources are available, specify the job runtime as closely as possible. This lets SGE backfill your job into gaps left by ongoing resource reservations. Runtime limits are specified with the h_rt resource attribute. For example, if your job will certainly not take more than 4 hours and 30 minutes of wallclock time, submit it with the following command line:

qsub -l h_rt=4:30:00 job_script.sh

Note: Do not set runtime limits too aggressively, and do not set them at all if you are unsure about the actual duration of your jobs, because a job is terminated as soon as it exceeds the specified runtime limit. If no runtime limit is provided, the system's default runtime limit applies; it can be found in the queue specific resource information.
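
One way to avoid overly tight limits is to derive h_rt from a measured runtime plus a safety margin. The following is a minimal sketch under that assumption; the function name h_rt_with_margin and the default margin of 20% are illustrative choices, not SGE settings or site policy.

```shell
# Turn a measured runtime in seconds into an h_rt value (H:MM:SS)
# after adding a safety margin given in percent (default: 20).
h_rt_with_margin() {
  seconds=$1
  margin_pct=${2:-20}
  padded=$(( seconds + seconds * margin_pct / 100 ))
  printf '%d:%02d:%02d\n' \
    $(( padded / 3600 )) $(( (padded % 3600) / 60 )) $(( padded % 60 ))
}

# A job measured at 3 h 45 min (13500 s) with the default margin:
#   qsub -l h_rt="$(h_rt_with_margin 13500)" job_script.sh
```

With these figures the helper yields the 4:30:00 used in the example above; a larger margin trades some backfilling opportunity for a lower risk of the job being terminated.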

2.2 Memory usage

In order to avoid job failures due to memory oversubscription, the maximum available amount of memory per process is by default limited to a cluster specific value (issue the command qconf -sc | grep "default\|h_vmem" to find out the default value).

If your job requires less than 1.5 GByte of memory per process, you can specify this explicitly by setting SGE's resource attribute h_vmem, as in the following example:

qsub -l h_vmem=1500M -pe openmpi-fillup 4 job_script.sh

This will reserve a total of about 6 GByte of memory for your job (4 slots of 1500M each), potentially distributed over several hosts. Memory values are specified in bytes as positive decimal (1500), octal (02734) or hexadecimal (0x5dc) integers. For convenience, one of the following multipliers can be appended: k (1000), K (1024), m (1000*1000), M (1024*1024), g (1000*1000*1000) or G (1024*1024*1024).

Note: If you know that your memory requirements lie below the default limit, please do specify the lower value.
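
To see how a memory value and a slot count combine, the following sketch converts an SGE-style memory value to bytes and multiplies it by the number of slots. It assumes the multiplier meanings k=1000, K=1024, m=1000*1000, M=1024*1024, g=1000*1000*1000 and G=1024*1024*1024; the function name sge_mem_to_bytes is hypothetical and not part of SGE.

```shell
# Convert an SGE-style memory value (decimal, octal or hexadecimal,
# optionally followed by k, K, m, M, g or G) to a number of bytes.
sge_mem_to_bytes() {
  spec=$1
  mult=1
  case $spec in
    *k) mult=1000 ;;
    *K) mult=1024 ;;
    *m) mult=$(( 1000 * 1000 )) ;;
    *M) mult=$(( 1024 * 1024 )) ;;
    *g) mult=$(( 1000 * 1000 * 1000 )) ;;
    *G) mult=$(( 1024 * 1024 * 1024 )) ;;
  esac
  # Strip the multiplier letter, if one was present.
  if [ "$mult" -ne 1 ]; then num=${spec%?}; else num=$spec; fi
  # Shell arithmetic understands the 0... (octal) and 0x... (hex) prefixes.
  echo $(( $num * $mult ))
}

# Total for "-l h_vmem=1500M -pe openmpi-fillup 4":
#   echo $(( $(sge_mem_to_bytes 1500M) * 4 ))
```

For the example above this gives 6291456000 bytes, i.e. roughly the 6 GByte quoted for four slots.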

3. Altering resource requirements for pending jobs

You can alter most of the resource requirements of pending jobs at any time with SGE's qalter command. For example, to change the parallel environment of a waiting parallel job, including the number of desired slots, enter:

qalter -pe openmpi-fillup 8-16 YOUR_JOB_ID



Limiting User Greed: Resource Quotas

Integrated over time, fair-share scheduling should ensure that each user gets their appropriate CPU usage (provided they submit sufficient jobs). Over and above this, we want to prevent any one user dominating any host-group at any given time.

Prevent any one user dominating the serial queue:
  {
    name         C6100-STD-serial.q.rqs
    description  NONE
    enabled      TRUE
    limit        users {*} queues C6100-STD-serial.q to slots=48
        #
        # ..."users {*}" means "each and every user" while "users *" would 
        #    mean "all users together"...
        #
  }
Limit total slot-count for each user on the main queues:
  {
    name         CSF.q.rqs
    description  NONE
    enabled      TRUE
    limit        users {*} queues R815.q,C6100-STD.q,C6100-STD-ib.q, \
    C6100-FAT.q,C6100-VFAT.q,R410-twoday.q to slots=256
  }
Discourage interactive work:
  {
    name         C6100-STD-interactive.q.rqs
    description  NONE
    enabled      TRUE
    limit        users {*} queues C6100-STD-interactive.q to slots=4
  }
Prevent any one user grabbing more than half of this one:
  {
    name         R815.q.rqs
    description  NONE
    enabled      TRUE
    limit        users {*} queues R815.q to slots=256
  }
Since we have so few M610x-hosted GPGPUs, limit to one per user:
  {
    name         M610x.rqs
    description  NONE
    enabled      TRUE
    limit        users {*} hosts @M610x-GPU to slots=1
  }

Limit total usage (sum of all users) on some queues:
{
   name         CSF-Queues-total-users.rqs
   description  NONE
   enabled      TRUE
   limit        users * queues C6100-STD-serial.q to slots=144
   limit        users * queues R410-twoday-interactive.q to slots=12
   limit        users * queues R410-short-interactive.q to slots=12
}
Multiple queues on some hosts, but don't want to overload them:
{
   name         CSF-Hosts-slots.rqs
   description  NONE
   enabled      TRUE
   limit        hosts {@C6100-STD} to slots=12
   limit        hosts {@C6100-FAT} to slots=12
   limit        hosts {@C6100-STD-ib} to slots=12
   limit        hosts {@C6100-STD-test} to slots=12
   limit        hosts {@R815} to slots=32
   limit        hosts {@R410-twoday} to slots=12
   limit        hosts {@R410-short} to slots=12
}
Don't want any individual to hog the precious IB-connected Intel nodes:
{
   name         CSF-PEs-each-user.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} pes orte-12-ib.pe to slots=96
}
Limit MACE use of the non-IB Intel nodes as they contributed only AMD:
{
   name         CSF-Usersets.rqs
   description  NONE
   enabled      TRUE
   limit        users @mace01.userset queues C6100-STD.q to slots=36
}
Limit each user's greed on each (well, most) queues:
{
   name         CSF-Queues-each-user.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues C6100-FAT.q to slots=36
   limit        users {*} queues C6100-STD-serial.q to slots=36
   limit        users {*} queues C6100-STD-interactive.q to slots=4
   limit        users {*} queues R815.q to slots=256
   limit        users {*} queues R815.q,C6100-STD.q,C6100-STD-ib.q, \
   C6100-FAT.q,C6100-VFAT.q,R410-twoday.q to slots=256
   limit        users {*} queues M610x-GPU.q,M610x-GPU-interactive.q to slots=3
}
Limit total usage (sum of users) on some PE/Queue combos:
{
   name         CSF-PEs-total-users.rqs
   description  NONE
   enabled      TRUE
## limit        users * pes orte.pe,orte-12.pe to slots=550
   limit        users * pes orte.pe,orte-12.pe queues C6100-STD.q to slots=96
   #
   # ...above, changed one t'other...
   #
   limit        users * pes smp.pe queues C6100-STD.q to slots=440
## limit        users * pes fluent-smp.pe queues C6100-STD.q to slots=48
    #
    # ...above, replaced by mace.userset quota...
}
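
The comment in the first rule set distinguishes "users {*}" (the limit applies to each user separately) from "users *" (the limit applies to all users together). The following sketch only illustrates that difference on made-up usage figures; it is not how SGE evaluates quotas internally, and the function name over_limit, the user names and the slot counts are hypothetical.

```shell
# Read "user slots" pairs on stdin and report who exceeds a slot limit.
#   scope "each": limit per user, in the spirit of "users {*}"
#   scope "all":  limit on the sum over all users, in the spirit of "users *"
over_limit() {
  scope=$1
  limit=$2
  awk -v scope="$scope" -v limit="$limit" '
    { used[$1] += $2; total += $2 }
    END {
      if (scope == "each") {
        for (u in used) if (used[u] > limit) print u
      } else if (total > limit) {
        print "all-users"
      }
    }' | sort
}

# Example with invented usage figures:
#   printf 'alice 40\nbob 60\ncarol 50\n' | over_limit each 48
#   printf 'alice 40\nbob 60\ncarol 50\n' | over_limit all 144
```

On a real cluster the configured rule sets can be listed with qconf -srqsl and displayed with qconf -srqs, and qquota shows the resulting consumption.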




Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: September, 12, 2017