During startup, Grid Engine messages can be found in syslog.
After startup, the daemons log their messages in their spool directories, or in syslog if so configured.
There are further logs as well; the reporting file is one of them.
log_consumables: This parameter controls the writing of consumable resources to the reporting file. When it is set to true (log_consumables=true), information about all consumable resources (their current usage and their capacity) is written to the reporting file whenever a consumable resource changes in definition or in capacity, or when its usage changes. When log_consumables is set to false (the default), only those variables are written to the reporting file that are configured in report_variables in the exec host configuration; see host_conf(5) for further information about report_variables.
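To see which value is in effect, one can look for the key in the global configuration. The sketch below assumes that log_consumables appears as a key=value token on the reporting_params line of "qconf -sconf" output, and that the line is not wrapped; the function name is hypothetical. It only parses text from stdin, so it can be tried without a running cluster.

```shell
#!/bin/sh
# Sketch: report the log_consumables setting found in "qconf -sconf"-style
# output read from stdin. Prints "false" (the documented default) when the
# key is absent. Assumes the reporting_params line is not wrapped.
log_consumables_setting() {
    awk '
        $1 == "reporting_params" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^log_consumables=/) {
                    sub(/^log_consumables=/, "", $i)
                    value = $i
                }
            }
        }
        END { print (value == "" ? "false" : value) }
    '
}

# Typical use on a host with Grid Engine installed (not run here):
# qconf -sconf | log_consumables_setting
```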
- http://arc.liv.ac.uk/SGE/howto/rotatelogs.html
- [PDF] Grid Engine 6 - BioTeam (www.bioteam.net/wp-content/uploads/2009/09/07-SGE-6-Admin...): Execd spool logs often hold job specific error ... Will create a 1-time scheduler trace file $SGE_ROOT/default/common ... Increase log verbosity A SGE configuration ...
- SGE CERES Scripting - Earth's energy budget (ceres.larc.nasa.gov/documents/presentations/SGE_Scripting_DMT.pdf): Additional Log File for SGE Session. Contains everything printed to STDOUT and STDERR during SGE session. Also includes a header line with job ID and log file name.
- [PDF] Prototype Scripts for Running PGE's on SGE (ceres.larc.nasa.gov/documents/presentations/SGE_Scripting_SSIT.pdf): Introduction. Need to run PGE's on SGE independent of platform submitted to. Develop scripts to setup for either environment on the fly. Currently in ...
Sun Grid Engine daemons create log files called "messages" in their respective spool directories. An 'accounting' file and a 'statistics' file are created as well. A script for truncating log files is found at: $SGE_ROOT/util/logchecker.sh
The script is not activated automatically by any of the Sun Grid Engine daemons. It is intended to be edited according to the needs of your site. After customizing the script, you can add an entry to your crontab. The script can run in verbose mode or completely silently. It can also run in a mode where it only prints what would be done. The script accepts only two command line parameters: one overrides the ACTION_ON parameter, the other the location of the exec daemon spool directory (see below).
Sun Grid Engine daemons create log files in the qmaster_spool_dir and execd_spool_dir, which are defined in the global cluster configuration. These settings can be overridden in the local cluster configuration of each execution host (usually this is not done). The cell directory is usually called 'default'; it has a different name only if the $SGE_CELL variable is used.
Default location of Sun Grid Engine log files:
- <qmaster_spool_dir>/messages
- <qmaster_spool_dir>/schedd/messages
- <execd_spool_dir>/<hostname>/messages
- <sge_root>/<sge_cell>/common/accounting
- <sge_root>/<sge_cell>/common/statistics
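As a quick illustration of the layout above, the following sketch prints the five default locations for a given set of directories. The function name and the example paths are hypothetical; the real spool directories should be taken from your cluster configuration.

```shell
#!/bin/sh
# Sketch: print the default log file locations for given spool directories.
# All values are supplied by the caller; nothing is read from the cluster.
# Arguments: qmaster_spool_dir execd_spool_dir hostname sge_root [sge_cell]
sge_log_files() {
    qmaster_spool_dir="$1"
    execd_spool_dir="$2"
    host="$3"
    sge_root="$4"
    sge_cell="${5:-default}"   # the cell is 'default' unless stated otherwise
    printf '%s\n' \
        "$qmaster_spool_dir/messages" \
        "$qmaster_spool_dir/schedd/messages" \
        "$execd_spool_dir/$host/messages" \
        "$sge_root/$sge_cell/common/accounting" \
        "$sge_root/$sge_cell/common/statistics"
}

# Example with made-up paths:
# sge_log_files /opt/sge/default/spool/qmaster /opt/sge/default/spool node01 /opt/sge
```

Piping the output through a loop with "ls -l" is one way to check which of these files exist and how large they have grown.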
These directories can all be located in the same directory hierarchy on a shared NFS filesystem, or the execd spool directories can point to a local directory. The ACTION_ON parameter (see below) therefore lets you specify which 'messages' files are rotated when the script is called.
The following variables need to be configured in the script. The "|" character separates alternative values. All variables in the script must be entered in Bourne shell syntax, so there must be no white space before or after the equal sign "=".
Dry-run switch: after the script is configured, set this value to "no". If it is set to "yes" (or any other value), the script will only print what would be done.
Verbosity: the colon ":" is the null command in the shell. If you set the variable to this value, the script works silently (only error messages are printed). If you set the value to "echo", the script prints what it is currently doing.
Enter the path of your SGE_ROOT directory here.
Enter the name of your cell, if it is not 'default'.
ACTION_ON=1|2|3|4
1 = work on qmaster and scheduler "messages" files only
2 = work on "messages" file on current host only
3 = work on all accessible execd "messages" files of global config
4 = work on qmaster "messages" and all accessible execd "messages" files
Rotate or delete a file only if its size exceeds ACTIONSIZE, given in kilobytes. If ACTIONSIZE is set to 0, the "messages" file is rotated each time the script is called.
Defines the number of old messages files to be preserved. E.g. "30" means that "messages.0" to "messages.29" are saved. A value of "0" means no backup is done. The most recent messages file has the extension ".0".
yes = compress rotated "messages.0" file with gzip
no = leave rotated "messages.0" file uncompressed
yes = rotate accounting file when rotating qmaster 'messages' file
no = don't rotate accounting file
delete = delete statistics file
yes = rotate statistics file
no = don't rotate statistics file
The 'statistics' file is not used in this release. You can safely delete it. You can also set the parameter stat_log_time in your global cluster configuration to a very long interval (the default is 48:00:00, i.e. 48 hours).
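The interplay of the size threshold and the number of preserved files can be sketched as follows. This is not the code of logchecker.sh; it is a minimal illustration that assumes a copy-then-truncate strategy, and the function and argument names are hypothetical. Compression of the rotated copy is left out.

```shell
#!/bin/sh
# Sketch of size-triggered rotation (an illustration, not logchecker.sh).
# Usage: rotate_file FILE LIMIT_KB KEEP
#   LIMIT_KB  rotate only if the file is larger than this; 0 = always rotate
#   KEEP      number of old copies to preserve (FILE.0 is the most recent);
#             0 = keep no backup
rotate_file() {
    file="$1"; limit_kb="$2"; keep="$3"
    [ -f "$file" ] || return 0

    # File size in kilobytes, rounded up.
    bytes=$(wc -c < "$file")
    size_kb=$(( (bytes + 1023) / 1024 ))

    # Nothing to do while the file has not exceeded the limit.
    if [ "$limit_kb" -ne 0 ] && [ "$size_kb" -le "$limit_kb" ]; then
        return 0
    fi

    if [ "$keep" -gt 0 ]; then
        # Drop the oldest copy, then shift FILE.N to FILE.N+1.
        rm -f "$file.$((keep - 1))"
        i=$((keep - 2))
        while [ "$i" -ge 0 ]; do
            [ -f "$file.$i" ] && mv "$file.$i" "$file.$((i + 1))"
            i=$((i - 1))
        done
        cp "$file" "$file.0"
    fi

    # Truncate in place (assumed strategy) instead of moving the live file.
    : > "$file"
}

# Example (hypothetical path): keep 30 copies, rotate above 1024 KB.
# rotate_file /var/spool/sge/qmaster/messages 1024 30
```

With KEEP set to 30 this leaves FILE.0 through FILE.29 on disk, which matches the numbering described above.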
Command line parameters
The script accepts the following command line parameters:
- -execd_spool
Define the base directory of the execd spool directory. Do not add the unqualified hostname on the command line; the script adds the hostname automatically.
- -action_on 1|2|3|4
Override the ACTION_ON variable in the script.
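How such options can override values configured inside a script is sketched below. This is an illustration only, not the parser used by logchecker.sh: the default of 4, the variable name EXECD_SPOOL, and the assumption that -execd_spool takes the directory as its next argument are all hypothetical.

```shell
#!/bin/sh
# Sketch: command line options override defaults set in the script.
# Not the parser of logchecker.sh; names and defaults are assumptions.
parse_args() {
    ACTION_ON=4        # default as it might be configured in the script
    EXECD_SPOOL=""     # hypothetical variable for the execd spool base dir
    while [ $# -gt 0 ]; do
        case "$1" in
            -action_on)
                case "$2" in
                    1|2|3|4) ACTION_ON="$2" ;;
                    *) echo "invalid -action_on value: $2" >&2; return 1 ;;
                esac
                shift 2 ;;
            -execd_spool)
                [ $# -ge 2 ] || { echo "missing directory" >&2; return 1; }
                EXECD_SPOOL="$2"
                shift 2 ;;
            *)
                echo "unknown option: $1" >&2
                return 1 ;;
        esac
    done
}

# Example:
# parse_args -action_on 1 && echo "ACTION_ON is now $ACTION_ON"
```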
- All Sun Grid Engine spool directories are shared. You can call the script on any one of your Sun Grid Engine hosts or on your file server.
Set ACTION_ON to "4" in the script. Set the other values according to your needs and add the script to the crontab of one of the above machines.
- Sun Grid Engine execd spool directories are defined only through the global cluster configuration, but point to a local directory.
Set ACTION_ON="3". Add the start of the script to the crontabs of all execds in your cluster. On your qmaster machine (or on your file server) add the following call of the script to your crontab:
<path_to_script>/logchecker.sh -action_on 1
- Sun Grid Engine spool directories of execds are defined in the local configuration.
Set ACTION_ON="2" in the script. On your qmaster machine (or on your file server) add the following call of the script to your crontab:
<path_to_script>/logchecker.sh -action_on 1
On your exec hosts add the following line:
<path_to_script>/logchecker.sh -execd_spool
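The crontab entries mentioned in the examples above also need a schedule, which the text leaves to the site. The sketch below builds such a line; the daily 02:30 schedule, the helper name, and the script directory are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: build a crontab line that runs the script once a day.
# The 02:30 schedule is an arbitrary choice for illustration.
# Usage: cron_entry SCRIPT_DIR [script options...]
cron_entry() {
    script_dir="$1"
    shift
    printf '30 2 * * * %s/logchecker.sh %s\n' "$script_dir" "$*"
}

# Print the entry for the qmaster host (the directory is a placeholder):
cron_entry /path/to/script -action_on 1

# To install it, the line could be appended to the current crontab:
# ( crontab -l 2>/dev/null; cron_entry /path/to/script -action_on 1 ) | crontab -
```

Printing the line first and installing it by hand with "crontab -e" is the more cautious route while the script is still being tested in its print-only mode.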
Rotating and truncating Grid Engine Log Files
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.
Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: July 14, 2016