Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Excluding SGE execution host from scheduling


Introduction

The init script for the execution daemon accepts a softstop option, which stops the daemon without killing the jobs that are running on the host.

You cannot remove a host from the execution host list while jobs are running on it.

Disabling individual node

You can exclude a host from a particular queue, even while jobs are running on that host, by disabling its queue instance with the qmod command:

qmod -d blades.q@b8 # (-d == disable)
root@z99 changed state of "blades.q@b8" (disabled) 
The current job will keep running, but no new jobs will be scheduled on the node.

After maintenance is done, you can re-enable it:

qmod -e blades.q@b8   # enable b8 in queue blades.q
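The disable and re-enable steps above can be wrapped in two small shell functions. This is a minimal sketch, assuming that the wildcard form `*@host` selects every queue instance on the host; the function names `sge_disable_host` and `sge_enable_host` are made up for illustration and are not part of SGE.

```shell
#!/bin/bash
# Hypothetical helpers (the names are illustrative, not part of SGE).

# Disable every queue instance on the given host; running jobs keep running.
sge_disable_host() {
    local host="$1"
    # The pattern is quoted so that the shell passes the "*" to qmod unchanged.
    qmod -d "*@${host}"
}

# Re-enable every queue instance on the given host after maintenance.
sge_enable_host() {
    local host="$1"
    qmod -e "*@${host}"
}
```

For example, `sge_disable_host b8` before maintenance and `sge_enable_host b8` afterwards.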

Disabling a queue

You can also disable a particular queue. No new jobs will then be scheduled on the hosts defined in it, and you can wait for any active jobs to finish before you run the shutdown procedure.

To disable a queue use the qmod -dq command:

% qmod -dq {<cluster-queue> | <queue-instance> | <queue-domain>}

For information about  queues, see Creating and modifying SGE Queues

The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.
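Waiting for a disabled queue to drain can be scripted as a polling loop. The sketch below is hedged: it assumes that qstat prints two header lines followed by one line per listed job, and prints nothing when no job matches. The function names and the default 60-second interval are arbitrary choices for illustration.

```shell
#!/bin/bash
# Count the running jobs that qstat lists for a cluster queue,
# queue instance, or queue domain.
# Assumption: two header lines, then one line per listed job;
# no output at all when nothing matches.
count_running_jobs() {
    local target="$1"
    qstat -s r -u '*' -q "$target" |
        awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Poll until no running jobs are left in the target.
wait_for_drain() {
    local target="$1" interval="${2:-60}"
    while [ "$(count_running_jobs "$target")" -gt 0 ]; do
        sleep "$interval"
    done
}
```

After disabling the queue, `wait_for_drain blades.q` returns once the listing is empty, at which point the daemons can be stopped.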

Removing host from a particular queue

You can also remove the host from the queue that uses it, which prevents any further scheduling on that host. This becomes a problem when you have many queues. In that case you should define the host lists of the queues through hostgroups, so that the host has to be removed in only one place.

The softstop option of the execution daemon init script

Along with the regular "stop", the init script for the execution daemon accepts the option softstop, which preserves running jobs because it does not kill the shepherd process:

# This script can be called with the following arguments:
#
#       start       start execution daemon
#       stop        Terminates the execution daemon
#                   and the shepherd. This only works if the execution daemon
#                   spool directory is in the default location.
#       softstop    do not kill the shepherd process
#
# Unix commands which may be used in this script:
#    cat cut tr ls grep awk sed basename
#
# This script requires the script $SGE_ROOT/util/arch
#

PATH=/bin:/usr/bin:/sbin:/usr/sbin

You can run a "suicidal" script that soft-stops the daemon on all the necessary nodes.

The script should be run as root or under the SGE admin account, and it can contain just one command, for example service sgeexecd.644 softstop.

qsub -q blades.q@b8 node_suicide.sh

This way you can shut down the execution daemon on multiple nodes.
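The two steps above (write the one-line script, then submit it to each node) can be sketched as follows. The service name sgeexecd.644 is taken from the example and differs between installations; the function names are hypothetical.

```shell
#!/bin/bash
# Write the one-line "suicide" job script described above.
# Assumption: the init script is installed as the service "sgeexecd.644".
write_suicide_script() {
    local path="${1:-node_suicide.sh}"
    printf '%s\n' '#!/bin/sh' 'service sgeexecd.644 softstop' > "$path"
    chmod +x "$path"
}

# Submit the script once to each listed node of a queue.
# Usage: softstop_nodes QUEUE SCRIPT NODE...
softstop_nodes() {
    local queue="$1" script="$2" node
    shift 2
    for node in "$@"; do
        qsub -q "${queue}@${node}" "$script"
    done
}
```

For example, `write_suicide_script` followed by `softstop_nodes blades.q node_suicide.sh b8 b9` submits one job per node. The job still has to run under an account that is allowed to stop the daemon.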

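The -q option provides a lot of flexibility in selecting nodes, including the ability to use simple wildcard patterns; see the wc_queue definition in the sge_types man page. For comparison, the PBS version of qsub documents its -q option as follows (note that the references to servers, routing queues and PBS_DEFAULT are specific to PBS and do not apply to SGE):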

   -q destination
               Defines the destination of the job. The destination names a queue, a server, or a queue at a server.

               The qsub command will submit the script to the server defined by the destination argument. If the destination is a routing queue, the job may be routed by the server to a new destination.

               If the -q option is not specified, the qsub command will submit the script to the default server. See PBS_DEFAULT under the Environment Variables section on this man page and the PBS ERS section 2.7.4, "Default Server".

               If the -q option is specified, it is in one of the following three forms:

                   queue
                   @server
                   queue@server

               If the destination argument names a queue and does not name a server, the job will be submitted to the named queue at the default server.

               If the destination argument names a server and does not name a queue, the job will be submitted to the default queue at the named server.

               If the destination argument names both a queue and a server, the job will be submitted to the named queue at the named server.

Suspend and resubmit stalled jobs

Reference:

http://gis.fem-environment.eu/grid-engine-howto/

# as user:
qstat | grep neteler | tr -s ' ' ' '  | cut -d' ' -f2 > /tmp/to_suspend.sge
cat /tmp/to_suspend.sge

# as root (?):
su -
for i in `cat /tmp/to_suspend.sge` ; do qmod -sj $i ; done
qstat

# remove crashed blade from list of execution hosts:
qconf -de blade14

# edit the host group (opens an editor) and delete the host from its hostlist:
qconf -mhgrp "@allhosts"

# show the host group to check the new list:
qconf -shgrp "@allhosts"

# verify queue stats:
qstat -f

# resubmit jobs to other nodes (as job user!!):
exit
for i in `cat /tmp/to_suspend.sge` ; do qresub $i ; done
qstat
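The job ID extraction in the first step can also be done with awk, which matches the user column exactly instead of grepping the whole line (grep would also match a job whose name contains the user name). This is a sketch assuming the default qstat layout, with the job ID in column 1 and the user in column 4 after two header lines; the function names are hypothetical.

```shell
#!/bin/bash
# Print the job IDs owned by a user, reading plain "qstat" output on stdin.
# Assumption: default qstat layout (job ID in column 1, user in column 4,
# two header lines at the top).
job_ids_for_user() {
    local user="$1"
    awk -v user="$user" 'NR > 2 && $4 == user { print $1 }'
}

# Suspend every job whose ID arrives on stdin, one ID per line.
suspend_jobs() {
    local id
    while read -r id; do
        if [ -n "$id" ]; then
            qmod -sj "$id"
        fi
    done
}
```

Usage would look like `qstat | job_ids_for_user neteler > /tmp/to_suspend.sge`, followed by `suspend_jobs < /tmp/to_suspend.sge`.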


These commands send a signal to a running job:
qmod -sj | -usj | -cj (suspend | unsuspend | clear error)

qmod -sj 3312136

qmod -usj 3312136
root - unsuspended job 3312136


Old News ;-)

[Sep 19, 2014] How do I temporarily take a node out from SGE (Sun Grid Engine)

Earthwithsun.com
Q: I'm having some trouble with a specific node. Until I resolve it, I don't want any jobs to run on it. How can I temporarily take this node out of the nodes "pool"?

A1: To disable:

qmod -d '*@node_name'

To re-enable:

qmod -e '*@node_name'

A2: Without knowing your SGE version I cannot say for certain that this will achieve the desired outcome, however, qconf -de foo will delete the execution host foo. qconf -ae foo will then add the host foo back to the execution list.

If you're running 6.1 or better, here's the best way. Create a new hostgroup called @disabled

qconf -ahgrp @disabled

Create a new resource quota set with qconf -arqs (it opens an editor) and put this rule in it:

limit hosts @disabled to slots=0

Now, to disable a host, just add it to the host group

qconf -aattr hostgroup hostlist MYHOST @disabled

To reenable the host, remove it from the host group

qconf -dattr hostgroup hostlist MYHOST @disabled

This process will stop new jobs from being scheduled to the machine and allow the currently running jobs to complete.
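The one-time setup of the host group and the quota rule can also be done without an editor by loading both definitions from files. This is a sketch under assumptions: it relies on the file-based qconf options -Ahgrp and -Arqs, and the rule set name disabled_hosts and the function name are arbitrary illustrative choices.

```shell
#!/bin/bash
# Non-interactive setup of the @disabled host group and its quota rule.
# Assumptions: "qconf -Ahgrp FILE" and "qconf -Arqs FILE" load definitions
# from files; the rule set name "disabled_hosts" is illustrative only.
setup_disabled_group() {
    local tmp
    tmp=$(mktemp -d) || return 1

    # Empty host group; hosts are added later with qconf -aattr.
    cat > "$tmp/hgrp" <<'EOF'
group_name @disabled
hostlist NONE
EOF

    # Quota rule that leaves no slots on any member of @disabled.
    cat > "$tmp/rqs" <<'EOF'
{
   name         disabled_hosts
   description  "No slots on hosts listed in @disabled"
   enabled      TRUE
   limit        hosts @disabled to slots=0
}
EOF

    qconf -Ahgrp "$tmp/hgrp" && qconf -Arqs "$tmp/rqs"
    local rc=$?
    rm -rf "$tmp"
    return "$rc"
}
```

After this has run once, hosts are moved in and out of @disabled with the qconf -aattr and qconf -dattr commands shown above.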

[Sep 17, 2014] Excluding nodes from qsub command under sge

Stack Overflow

I have more than 200 jobs I need to submit to an SGE cluster. I'll be submitting them into two queues. One of the queues has a machine that I don't want to submit jobs to. How can I exclude that machine? The only thing I found that might be helpful is (assuming three valid nodes available to q1 and all the available nodes for q2 are valid):

qsub -q q1.q@n1,q1.q@n2,q1.q@n3,q2.q

Assuming the node you don't want to run on is called n4, adding the following to your script should work.

#$ -l h=!n4
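The same request can be given on the qsub command line, which is convenient when a couple of hundred scripts are submitted in a loop. A minimal sketch, with a hypothetical function name; the "!" is kept inside single quotes because an interactive bash would otherwise try history expansion on it.

```shell
#!/bin/bash
# Submit each given job script with a resource request that excludes one host.
# Usage: submit_excluding HOST SCRIPT...
submit_excluding() {
    local host="$1" script
    shift
    for script in "$@"; do
        # 'h=!' is single-quoted so that "!" reaches qsub literally.
        qsub -l 'h=!'"$host" "$script"
    done
}
```

For example, `submit_excluding n4 job_*.sh` submits every matching script with the n4 exclusion.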


The best way I've found for this is to set up a custom resource on the nodes that you want to allow the execution on, then require that resource when you submit the job.

In qmon, go to the "complex" configuration and add a new attribute. Set the name to something like "my_test" and the shortcut to something like "m_t", the type to BOOL, the relation to ==, requestable to Yes, consumable to No, and "Add" it. Commit your changes to the complex configurations.

The next step is probably easier to do from the command line, but you can do it in qmon as well. You need to add your new complex to each host that you're going to allow your job to run on. In qmon, you can go to the host configuration, select execution host, and open each host in turn, clicking on the consumables/fixed attributes tab and adding the new complex that you just configured above with "True" as the value. From the command line, you can get a list of your execution hosts with "qconf -sel". This list is suitable for passing to a loop after grepping out the host(s) you don't want included. Do something like this:

# Append my_test=True to complex_values on every execution host
# except host_to_exclude.
qconf -sel | grep -v host_to_exclude | while read -r host; do
EDITOR="ed" qconf -me "$host" <<EOL
/complex_values/s/$/,my_test=True/
w
q
EOL
done
This lets you programmatically edit the host (not normally allowed by qconf, as it wants to start up your editor for you). It does this by setting the editor to "ed" (make sure the ed editor is installed; try running it by hand first and type "q" to get out). ed takes the list of editing commands on its stdin, so we give it three commands. The first edits the line with complex_values on it to include the my_test value. The second writes out the temporary file and the third quits ed. Once you've done this, submit your jobs with a resource request that requires your new complex:

qsub -q whatever -l my_test=True my_prog.sh

The -l option requests a resource, and my_test=True says the job can only run on hosts that have the complex my_test with a value of True. Since the complex isn't consumable, each host can still run as many jobs as its slot limit allows, but the scheduler will avoid any hosts that don't have the my_test complex set to True.
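As an alternative to driving ed, the attribute can be appended with qconf -aattr, which needs no editor. This is a sketch under the assumption that `qconf -aattr exechost complex_values ...` is available in your Grid Engine version; the function name is hypothetical, and the excluded host name must be spelled exactly as qconf -sel prints it.

```shell
#!/bin/bash
# Tag every execution host except one with my_test=True.
# Assumption: "qconf -aattr exechost complex_values VALUE HOST" appends
# VALUE to the host's complex_values without starting an editor.
tag_hosts_except() {
    local excluded="$1" host
    # -x -F: drop only the line that equals the excluded host name exactly.
    qconf -sel | grep -v -x -F -e "$excluded" | while read -r host; do
        # stdin is redirected so qconf cannot consume the host list.
        qconf -aattr exechost complex_values my_test=True "$host" < /dev/null
    done
}
```

Calling `tag_hosts_except n4` would tag every host in the list except n4. Unlike the substring match of a plain grep -v, the exact match does not also skip hosts such as n14 or n40.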

There is a nice workaround for this.

Generate a simple bash file:

#!/bin/bash
sleep 6000 #replace 6000 with any long period of time that will be enough to submit your jobs

Submit this job to the node you wish to exclude as many times as needed to fully occupy its slots.
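Filling the node can be done with a short loop. The sketch below is an illustration only: the function name, the script name filler.sh and the job name "filler" are hypothetical, and the number of slots has to be supplied by hand.

```shell
#!/bin/bash
# Submit COUNT copies of a placeholder job to one queue instance.
# Usage: fill_node QUEUE_INSTANCE COUNT [SCRIPT]
fill_node() {
    local qinstance="$1" count="$2" script="${3:-filler.sh}" i
    for ((i = 0; i < count; i++)); do
        # -N gives all placeholder jobs the same, easily recognizable name.
        qsub -q "$qinstance" -N filler "$script"
    done
}
```

For example, `fill_node q1.q@n4 8` occupies eight slots on n4. Once the real jobs are submitted, the placeholder jobs can be removed with qdel.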


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019