Softpanorama


qh command


The qh program parses SGE host information (the output of qhost -F) much as qb does, but displays different per-machine data, such as each machine's architecture, its total memory, and so on.
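To make the parsing step concrete, here is a minimal Perl sketch that turns qhost -F style text into per-host data. It assumes the layout the qh script below relies on: two header lines, one unindented summary line per host with LOAD in the fourth field (a '-' there means the host is down), then indented `hl:name=value` lines. The sample text and the parse_qhost_f name are illustrative only, not real cluster output and not part of qh.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample in the layout qh expects from "qhost -F".
my $sample = <<'END';
HOSTNAME  ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-----------------------------------------------------------------
global    -           -     -     -       -       -       -
bio-001   lx26-x86    2     0.01  2.0G    150.0M  2.0G    0.0
    hl:arch=lx26-x86
    hl:mem_total=1.995G
compeb-02 lx26-x86    2     -     -       -       -       -
END

# Parse qhost -F style text into { host => { item => value } }.
# A '-' in the LOAD column (field 3) marks the host as down.
sub parse_qhost_f {
    my ($text) = @_;
    my (%info, $host);
    my @lines = split /\n/, $text;
    splice @lines, 0, 2;                  # skip the two header lines
    for my $line (@lines) {
        my @fld = split /\s+/, $line;     # indented lines give $fld[0] eq ''
        next unless @fld;
        if ($fld[0] ne '') {              # unindented line: a new host
            $host = $fld[0];
            $info{$host} = {};
            $info{$host}{down} = 1 if defined $fld[3] && $fld[3] eq '-';
        } elsif (defined $host && defined $fld[1]
                 && $fld[1] =~ /^\w+:(\w+)=(.*)$/) {
            $info{$host}{$1} = $2;        # e.g. hl:arch=lx26-x86
        }
    }
    delete $info{global};                 # not a real execution host
    return \%info;
}

my $info = parse_qhost_f($sample);
for my $h (sort keys %$info) {
    my $arch = $info->{$h}{down} ? 'down' : ($info->{$h}{arch} // 'unk');
    print "$h: $arch\n";
}
```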

jbp@head1 [ 84 ] % qh
Cluster Information (by sub-cluster)
as of Fri Feb  2 10:57:06 2007

  showing arch

aeroel   01:04  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |
         05:08  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |

bio     001:004 |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |
        005:008 |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |
        009:012 |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |
        013:016 |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |

cee      01:04  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |
         05:08  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |
         09:12  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |

chg      01:04  |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |
         05:08  |   lx26-x86 |   lx26-x86 |   lx26-x86 |   lx26-x86 |

compbio  01:04  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |
         05:08  | lx26-amd64 | lx26-amd64 | lx26-amd64 | lx26-amd64 |

compeb   01:04  |   lx26-x86 |       down |   lx26-x86 |   lx26-x86 |
         05:08  |   lx26-x86 |   lx26-x86 |   lx26-x86 |       down |

Use "qh -h" to list the data items that can be shown.
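The row labels in the sample output come from grouping hosts by the part of their name before the first hyphen and labeling each row with the node numbers of its first and last host. The following Perl sketch approximates that logic; node_num, row_labels and the host names are hypothetical, and the real script picks each subcluster's hosts by matching the name prefix rather than by an exact split.

```perl
use strict;
use warnings;

# Digits of a host name after its subcluster prefix, e.g. "bio-005" -> "005".
sub node_num {
    my ($h) = @_;
    (my $n = $h) =~ s/^[^-]*-//;
    $n =~ s/\D//g;
    return $n;
}

# Split hosts into rows of $cols per subcluster (the name before the first
# '-'), labeling each row "first:last" in qh's "%-7s %3s:%-3s" layout.
# The subcluster name appears only on its first row.
sub row_labels {
    my ($cols, @hosts) = @_;          # @hosts assumed already sorted
    my %by_cluster;
    for my $h (@hosts) {
        (my $c = $h) =~ s/-.*//;
        push @{ $by_cluster{$c} }, $h;
    }
    my @labels;
    for my $c (sort keys %by_cluster) {
        my @members = @{ $by_cluster{$c} };
        my $name = $c;
        while (my @chunk = splice @members, 0, $cols) {
            push @labels, sprintf('%-7s %3s:%-3s',
                $name, node_num($chunk[0]), node_num($chunk[-1]));
            $name = '';
        }
    }
    return @labels;
}

# Hypothetical host names modeled on the sample output above.
my @hosts = map { sprintf('bio-%03d', $_) } 1 .. 8;
push @hosts, map { sprintf('chg-%02d', $_) } 1 .. 6;
print "$_ |\n" for row_labels(4, @hosts);
```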

The output is useful, especially alongside qb, for understanding why a given job cannot run yet. While qb may show open nodes, qh may reveal that those nodes are all 32-bit, so a 64-bit job will have to wait.
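As a sketch of that reasoning, the Perl snippet below intersects a list of free hosts (as qb might report them) with each host's arch value (as qh shows it). The host lists are made up, hosts_for_64bit is a hypothetical name, and treating an arch string ending in amd64 as 64-bit is an assumption based only on the sample output above.

```perl
use strict;
use warnings;

# Hypothetical inputs: free hosts as qb might report them, and each host's
# arch string as qh would show it.
my @free = qw(bio-003 bio-004 cee-02);
my %arch = (
    'bio-003' => 'lx26-x86',
    'bio-004' => 'lx26-x86',
    'cee-02'  => 'lx26-amd64',
);

# Assumption: 64-bit hosts report an arch ending in "amd64"; adjust the
# pattern for other architectures.
sub hosts_for_64bit {
    my ($free, $arch) = @_;
    return grep { ($arch->{$_} // '') =~ /amd64$/ } @$free;
}

my @ok = hosts_for_64bit(\@free, \%arch);
if (@ok) { print "64-bit job can use: @ok\n" }
else     { print "no free 64-bit hosts; a 64-bit job must wait\n" }
```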

Some other options:

qh -d mem_total

shows the total amount of memory on each machine.

qh -m -d mem_total

shows the same values in MB (the default is GB).

qh -d mhz

shows the MHz (or GHz) rating of each machine, i.e. a rough indication of which machines are faster than others.

qh -C 4

If the output from qh looks too cramped, you can specify the number of columns you want (by default, qh guesses how many will fit on an 80-column screen).
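The -m conversion and the default column count can be sketched in Perl as follows. This mirrors what the script below appears to do: convert G/M suffixes by a factor of 1024, format each cell, then fit cells into roughly 64 characters of an 80-column line. to_unit and default_cols are hypothetical helper names, and unlike qh the sketch rejects values it cannot parse.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a memory value such as "1.995G" or "512.0M"
# to GB, or to MB when $in_mb is true (qh's -m option). As in qh, only the
# G->MB and M->GB cases are converted; other suffixes pass through as-is.
sub to_unit {
    my ($raw, $in_mb) = @_;
    my ($num, $suffix) = $raw =~ /^([\d.]+)([A-Za-z]?)$/
        or die "unexpected value '$raw'\n";
    if ($in_mb) {
        $num *= 1024 if $suffix eq 'G';
    } else {
        $num /= 1024 if $suffix eq 'M';
    }
    return $num;
}

# Hypothetical helper: qh's default column guess. About 64 of 80 characters
# are left for data, and each cell needs its width plus one '|'.
sub default_cols {
    my $maxlen = 0;
    for my $cell (@_) {
        $maxlen = length($cell) if length($cell) > $maxlen;
    }
    return int(64 / ($maxlen + 1));
}

my @raw = qw(1.995G 3.862G 512.0M);
my @gb  = map { sprintf(' %.1fG ', to_unit($_, 0)) } @raw;
my @mb  = map { sprintf(' %.1fM ', to_unit($_, 1)) } @raw;
print join('|', @gb), "|\n";
print join('|', @mb), "|\n";
printf "default columns: %d (GB), %d (MB)\n",
    default_cols(@gb), default_cols(@mb);   # 9 (GB), 6 (MB)
```

Wider cells (MB values carry more digits) leave room for fewer columns, which is when an explicit -C value helps.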

#!/usr/bin/perl
#
# (C) 2004-2009, John Pormann, Duke University
#      jbp1@duke.edu
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# RCSID $Id: qh,v 1.11 2007/01/02 15:55:39 jbp1 Exp jbp1 $
#
# qh - produce a 'block' view of some cluster/queue data item

use Getopt::Std;
getopts('hvVxC:d:m');

if( defined($opt_h) ) {
  print "usage:  qh [opts]\n"
    . "  -d data         data-item to show (see below)\n"
    . "  -m              convert memory values to MB (default=GB)\n"
    . "  -C cols         use alternate number of columns in output\n"
    . "  -v              verbose\n"
    . "  -V              really verbose\n";
  print "\navailable data-items:\n"
    .   "  arch num_proc mem_total swap_total virtual_total load_avg\n"
    .   "  load_short load_medium load_long mem_free swap_free virtual_free\n"
    .   "  mem_used swap_used virtual_used cpu mhz scr_free np_load_avg\n"
    .   "  np_load_short np_load_medium np_load_long\n";
  exit;
}

if( defined($opt_V) ) {
  $opt_v = $opt_V;
}

if( not defined($opt_d) ) {
  $opt_d = 'mem_free';
}

if( $opt_d eq 'arch' ) {
	$fmt = ' %10s ';
} else {
	if( $opt_d =~ m/(mem|swap|virtual|scr)_/ ) {
		if( defined($opt_m) ) {
			$fmt = ' %.1fM ';
		} else {
			$fmt = ' %.1fG ';
		}
	} else {
		$fmt = ' %.1f ';
	}
}

%hinfo = ();
$nextsym = 0;
%subclusters = ();

$hname = 'none';
$htext = '';

open( FP, "qhost -F |" ) or die "qh: cannot run qhost: $!\n";
# skip first two lines (column titles and separator)
<FP>;
<FP>;
while( <FP> ) {
  chomp( $_ );
  $orig = $_;
  @fld = split( m/\s+/, $orig );
  if( $fld[0] ne '' ) {
		# save last data 
		if( ($hname ne 'global') and ($hname ne 'none') ) {
			$hinfo{$hname} = $htext;
			if( defined($opt_V) ) {
				print "storing host [$hname]\n";
			}
		}
		# start with ':' so every item, including the first (arch),
		# matches m/:item=/ below
		$htext = ':';
		# new host entry
		$hname = $fld[0];
		# is host down?
		# : for SGE6, fields 3 and 5 should say '-'
		if( $fld[3] =~ m/\-/ ) {
			$htext = 'down:';
			if( defined($opt_V) ) {
				print "** host [$hname] is down\n";
			}
		}
		# add to the 'subclusters' list
		$cluster = $fld[0];
		$cluster =~ s/(.*?)\-(.*)/$1/;
		$subclusters{$cluster} = 1;
  } else {
		# this line goes with previous data
		$x = $fld[1];
		$x =~ s/(..)\:(.*)/$2/;
		$htext .= $x . ':';
  }
}
# don't forget the last line!
if( ($hname ne 'global') and ($hname ne 'none') ) {
	$hinfo{$hname} = $htext;
}
close( FP );

if( defined($opt_v) ) {
	print "hinfo:\n";
	foreach $key ( keys(%hinfo) ) {
		$val = $hinfo{$key};
		print "  [$key] [$val]\n";
	}
}

@hostlist = sorthosts( keys(%hinfo) );

# # # # # # # # # # #
# print header info #
# # # # # # # # # # #
$z = localtime;
print "Cluster Information (by sub-cluster)\nas of $z\n\n"
  .   "   showing $opt_d\n\n";

# kludge up a better ordering for subclusters
delete( $subclusters{'global'} );
@subclusterlist = sort(keys(%subclusters));

# try to pretty up the output
# : find max jobs per machine
$maxlen = 0;
foreach $h ( keys(%hinfo) ) {
	$hi = $hinfo{$h};
	if( $hi =~ m/down/ ) {
		$hi = 'down';
	} elsif( $hi =~ m/\:$opt_d\=/ ) {
		$hi =~ s/(.*?)\:$opt_d\=(.*?)\:(.*)/$2/g;
	} else {
		$hi = 'unk';
	}
	# convert memory values (if appropriate)
	if( $opt_d =~ m/(mem|swap|virtual|scr)_/ ) {
		if( defined($opt_m) ) {
			# show as MB
			if( $hi =~ m/G/ ) {
				$hi = ($hi+0)*1024;
			}
		} else {
			# show as GB
			if( $hi =~ m/M/ ) {
				$hi = ($hi+0)/1024;
			}
		}
	}
	# convert values to given format ('down'/'unk' stay as text)
	if( $hi =~ m/^(down|unk)$/ ) {
		$hi = " $hi ";
	} else {
		$hi = sprintf( $fmt, $hi );
	}
	# find length of data-item
	$n = length($hi);
	if( $n > $maxlen ) {
		$maxlen = $n;
	}
}
# : figure 80 cols - 6/3/3/4 = 64 chars in line
$cols = int( 64/($maxlen+1) );
if( defined($opt_C) ) {
	$cols = $opt_C;
}

$n = -1;
$f = 0;
foreach $cluster ( @subclusterlist ) {
  $clst = $cluster;
  $strt = -1;
  $fnsh = -1;
  $text = '';
  $n = 0;
  foreach $h ( @hostlist ) {
    # match the whole subcluster name, so "comp" does not also pick up "compbio-01"
    if( $h !~ m/^\Q$cluster\E(?:-|\.|\z)/ ) {
      next;
    }
    if( $n == $cols ) {
      printf( "%-7s %3s:%-3s |%s\n", $clst, $strt, $fnsh, $text );
      $clst = '';
      $strt = -1;
      $fnsh = -1;
      $text = '';
      $n = 0;
    }
    if( $strt < 0 ) {
      $strt = $h;
      $strt =~ s/(.*?)\-//g;
      $strt =~ s/\D//g;
    }
    $fnsh = $h;
    $fnsh =~ s/(.*?)\-//g;
    $fnsh =~ s/\D//g;
    $hi = $hinfo{$h};
    if( $hi =~ m/down/ ) {
      $hi = 'down';
    } elsif( $hi =~ m/\:$opt_d\=/ ) {
      $hi =~ s/(.*?)\:$opt_d\=(.*?)\:(.*)/$2/g;
    } else {
      $hi = 'unk';
    }
    # convert memory values (if appropriate)
    if( $opt_d =~ m/(mem|swap|virtual|scr)_/ ) {
      if( defined($opt_m) ) {
        # show as MB
        if( $hi =~ m/G/ ) {
          $hi = ($hi+0)*1024;
        }
      } else {
        # show as GB
        if( $hi =~ m/M/ ) {
          $hi = ($hi+0)/1024;
        }
      }
    }
    # convert values to given format ('down'/'unk' stay as text)
    if( $hi =~ m/^(down|unk)$/ ) {
      $hi = " $hi ";
    } else {
      $hi = sprintf( $fmt, $hi );
    }
    # right-align each cell to the widest one
    $l = length($hi);
    if( $l < $maxlen ) {
      $text .= ' ' x ($maxlen-$l);
    }
    $text .= $hi . '|';
    $n++;
  }
  printf( "%-7s %3s:%-3s |%s\n", $clst, $strt, $fnsh, $text );
  print "\n";
}

if( defined($opt_x) ) {
  exit;
}

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

# sorthosts subroutine by Benny Kjellgren <@staff.spray.se>
#       correctly sorts FQDN as well as non-FQDN
sub sorthosts {
  my @unsorted = @_;
  my $fqdn;
  my $host;
  my $domain;
  my %domain;
  my %caps;
  my %nums;

  for( @unsorted ) {
     $fqdn = $_;
     ( $host, $domain ) = split('\.', $fqdn, 2);
     $domain{$fqdn} = uc($domain) || "";
     ( $caps{$fqdn} = uc($host) ) =~ s/\d*$//;
     ( $nums{$fqdn} ) = ( $host =~ /(\d*)$/ );
     $nums{$fqdn} = 0 unless $nums{$fqdn};
  }

  my @list = sort {
    $domain{$a} cmp $domain{$b}
      ||
    $caps{$a} cmp $caps{$b}
      ||
    $nums{$a} <=> $nums{$b}
  } @unsorted;

  return( @list );
}

sub get_header_info {
  my $aref = shift( @_ );
  my $i    = shift( @_ );
  my ($y,$z,$cluster,$j,$jj);

  $cluster = $aref->[$i];
  $cluster =~ s/(.*?)[\-\d](.*)/$1/g;

  # first node number for this cluster
  $y = $aref->[$i];
  $y =~ s/\D+//g;

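  # last node number for this cluster: scan forward until the list
  # ends or a host from a different cluster appears
  $z = '';
  for( $jj = $i; $jj <= scalar(@$aref); $jj++ ) {
    if( $jj >= scalar(@$aref) ) {
      $z = $aref->[scalar(@$aref)-1];
      last;
    } elsif( $aref->[$jj] !~ m/^$cluster/ ) {
      $z = $aref->[$jj-1];
      last;
    }
  }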
  if( $z eq '' ) {
    $z = $aref->[$jj];
  }
  $z =~ s/\D+//g;

  # trim cluster to only 6 letters
  $cluster =~ s/(......)(.*)/$1/;

  return( ($cluster,$y,$z) );
}
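To see what sorthosts buys over a plain sort, here is a compact Perl restatement of its three sort keys (domain, upper-cased host name without trailing digits, trailing digits compared numerically), written as a Schwartzian transform. sort_hosts is a hypothetical name for this sketch, not part of qh, and the host names are made up.

```perl
use strict;
use warnings;

# Same three keys as sorthosts: domain, then the upper-cased host name
# minus trailing digits, then those trailing digits compared numerically.
sub sort_hosts {
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1]
                  || $a->[2] cmp $b->[2]
                  || $a->[3] <=> $b->[3] }
           map  {
               my ($host, $domain) = split /\./, $_, 2;
               my ($num) = $host =~ /(\d*)$/;
               (my $caps = uc $host) =~ s/\d*$//;
               [ $_, uc($domain // ''), $caps, $num || 0 ]
           } @_;
}

my @hosts = qw(node10 node2 node1.b.org node9.a.org node1);
print join(' ', sort @hosts), "\n";        # node1 node1.b.org node10 node2 node9.a.org
print join(' ', sort_hosts(@hosts)), "\n"; # node1 node2 node10 node9.a.org node1.b.org
```

A plain string sort places node10 before node2; comparing the numeric suffix separately keeps node ranges such as 01:04 and 05:08 in order.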