SGE can produce a job post-mortem when the execd_params setting KEEP_ACTIVE=ERROR is used.
Setting “KEEP_ACTIVE=TRUE” in execd_params, in the global or a host-specific configuration, disables deletion of the job spool directory. That means the directory contents can be analyzed after a job has failed...
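To check whether retention is currently switched on, the configuration can be read back with qconf. The following is a minimal sketch, assuming qconf is on the PATH; the helper name show_execd_params is made up for illustration.

```shell
# Hypothetical helper: print the execd_params line of a configuration.
# With no argument it reads the global configuration; pass a host name
# to read that host's local configuration instead.
show_execd_params() {
    qconf -sconf "${1:-global}" | grep '^execd_params'
}

# To change the value, edit the configuration interactively, e.g.
#   qconf -mconf global
# and set the line to:
#   execd_params   KEEP_ACTIVE=ERROR
```

Calling show_execd_params with no argument shows the global setting; calling it with a host name shows that host's override, if one exists.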
Job spool dir retention
Trigger SGE Debug Output
The qmaster will produce (large amounts of) debugging output if SGE_DEBUG_LEVEL is set appropriately. Source util/dl.sh, run `dl 4' (for instance), and then start sge_qmaster directly. (I don't know what an appropriate debugging level is.) Otherwise strace on GNU/Linux, or a similar tool on other systems, might show why the socket binding fails.
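The steps above could be wrapped roughly as follows. This is only a sketch: the function name run_qmaster_debug is hypothetical, and it assumes that SGE_ROOT points at the installation, that the binaries live under $SGE_ROOT/bin/<arch>, and that $SGE_ROOT/util/arch prints the architecture string. Check these against your own installation.

```shell
# Hypothetical wrapper: start sge_qmaster directly with debug output enabled.
# Assumes SGE_ROOT is set and the binaries are in $SGE_ROOT/bin/<arch>.
run_qmaster_debug() {
    level="${1:-4}"    # 4 is only an example level, as in the text above
    if [ -z "${SGE_ROOT:-}" ] || [ ! -r "$SGE_ROOT/util/dl.sh" ]; then
        echo "SGE_ROOT is unset or util/dl.sh is not readable" >&2
        return 1
    fi
    . "$SGE_ROOT/util/dl.sh"         # defines the dl shell function
    dl "$level"                      # sets the debug level for this shell
    arch=$("$SGE_ROOT/util/arch")    # architecture string for the bin path
    "$SGE_ROOT/bin/$arch/sge_qmaster"
}
```

Run it from a shell belonging to the user that normally starts the qmaster, and redirect the output to a file, since it can be large.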
The retained job directory contains the job's environment, in */active_jobs_dir/*/environment. It is hard to debug something that is running as a daemon, so you need to prevent daemonizing by setting SGE_ND=1. SGE_DEBUG_LEVEL defines the amount of debugging information produced. Among the execd_params, the important one is KEEP_ACTIVE=ERROR.
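Once a job directory has been retained, the environment file can be queried for a single variable. The sketch below assumes the file holds one NAME=value pair per line; the helper name job_env_value is hypothetical, and the path to the file must be supplied by the caller.

```shell
# Hypothetical helper: print the value of one variable from a retained
# job "environment" file. Assumes one NAME=value pair per line.
job_env_value() {
    # $1 = path to the environment file, $2 = variable name
    sed -n "s/^$2=//p" "$1" | head -n 1
}
```

For example, passing the path of a retained environment file and the name PATH prints the search path the job actually ran with, which is often the quickest way to explain a "command not found" failure.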
Grid Engine Trouble Shooting HOWTO
Daniel Templeton: SGE developer, one of the DRMAA evangelists, frequent Sun blogger on SGE topics
Permalink to post on debug output
Enabling Debugging Output (DanT's Grid Blog)