
Sun SGE 6.2u5 - SGE Classic


Introduction

Sun SGE 6.2u5 was the last open source version produced by Sun before the Oracle acquisition. It is classic software and probably the most widely used version of Grid Engine. Installation of the classic Sun version is essentially identical to installation of Oracle Grid Engine.

Classic Sun distribution

It's difficult to find the original Sun 6.2u5 distribution on the Internet. The only place I know of is Open Grid Scheduler - Browse Files at SourceForge.net, which contains SGE 6.2u5p2.

If you need compiled binaries for Red Hat, your best bet is to use Son of Grid Engine, which is pretty close to the classic in spirit and installation details (with important bug fixes and enhancements) and works on RHEL 6.5 pretty well.

Some universities, like Rutgers, have tar files available too. See for example SGE installation from Rutgers for Debian.

Debian packages

Debian packages of SGE 6.2u5 are maintained, but they do not use the Sun installer. They represent a pretty radical deviation from the traditional way of installing Grid Engine. See Debian Package Tracking System - gridengine.

It looks like the initial packager had too much zeal.

For example, here is what the file list of the gridengine-exec package for the amd64 architecture looks like:

/etc/init.d/gridengine-exec
/usr/lib/gridengine/qrsh_starter
/usr/lib/gridengine/sge_coshepherd
/usr/lib/gridengine/sge_execd
/usr/lib/gridengine/sge_shepherd
/usr/sbin/sge_coshepherd
/usr/sbin/sge_execd
/usr/sbin/sge_shepherd
/usr/share/doc/gridengine-exec/NEWS.Debian.gz
/usr/share/doc/gridengine-exec/changelog.Debian.gz
/usr/share/doc/gridengine-exec/copyright
/usr/share/man/man8/sge_execd.8.gz
/usr/share/man/man8/sge_shepherd.8.gz

For execution nodes there is also a separate package, gridengine-client. I do not understand why the utilities are duplicated between /usr/bin and /usr/lib/gridengine...

/usr/bin/qacct
/usr/bin/qalter
/usr/bin/qconf
/usr/bin/qdel
/usr/bin/qhold
/usr/bin/qhost
/usr/bin/qlogin
/usr/bin/qmod
/usr/bin/qping
/usr/bin/qquota
/usr/bin/qrdel
/usr/bin/qresub
/usr/bin/qrls
/usr/bin/qrsh
/usr/bin/qrstat
/usr/bin/qrsub
/usr/bin/qselect
/usr/bin/qsh
/usr/bin/qstat
/usr/bin/qsub
/usr/lib/gridengine/qacct
/usr/lib/gridengine/qalter
/usr/lib/gridengine/qconf
/usr/lib/gridengine/qdel
/usr/lib/gridengine/qhold
/usr/lib/gridengine/qhost
/usr/lib/gridengine/qlogin
/usr/lib/gridengine/qmod
/usr/lib/gridengine/qping
/usr/lib/gridengine/qquota
/usr/lib/gridengine/qrdel
/usr/lib/gridengine/qresub
/usr/lib/gridengine/qrls
/usr/lib/gridengine/qrsh
/usr/lib/gridengine/qrstat
/usr/lib/gridengine/qrsub
/usr/lib/gridengine/qselect
/usr/lib/gridengine/qsh
/usr/lib/gridengine/qstat
/usr/lib/gridengine/qsub
/usr/share/doc/gridengine-client/NEWS.Debian.gz
/usr/share/doc/gridengine-client/changelog.Debian.gz
/usr/share/doc/gridengine-client/copyright
/usr/share/doc/gridengine-client/examples
/usr/share/man/man1/qacct.1.gz
/usr/share/man/man1/qalter.1.gz
/usr/share/man/man1/qconf.1.gz
/usr/share/man/man1/qdel.1.gz
/usr/share/man/man1/qhold.1.gz
/usr/share/man/man1/qhost.1.gz
/usr/share/man/man1/qlogin.1.gz
/usr/share/man/man1/qmod.1.gz
/usr/share/man/man1/qping.1.gz
/usr/share/man/man1/qquota.1.gz
/usr/share/man/man1/qrdel.1.gz
/usr/share/man/man1/qresub.1.gz
/usr/share/man/man1/qrls.1.gz
/usr/share/man/man1/qrsh.1.gz
/usr/share/man/man1/qrstat.1.gz
/usr/share/man/man1/qrsub.1.gz
/usr/share/man/man1/qselect.1.gz
/usr/share/man/man1/qsh.1.gz
/usr/share/man/man1/qstat.1.gz
/usr/share/man/man1/qsub.1.gz
/usr/share/man/man1/sge_submit.1.gz

EPEL packages for RHEL

For RHEL, CentOS, or Fedora, SGE 6.2u5 RPMs can be found in the EPEL repository. They are close in spirit to the Debian distribution. The problem is that they do not work and the installer is seriously buggy ;-). In any case, for those who want to try them themselves, here are the links:

Index of /pub/epel/6/x86_64

  gridengine-6.2u5-10.el6.4.i686.rpm            2012-04-17 21:59   15M
  gridengine-6.2u5-10.el6.4.x86_64.rpm          2012-04-17 21:59   15M
  gridengine-devel-6.2u5-10.el6.4.i686.rpm      2012-04-17 21:59   74K
  gridengine-devel-6.2u5-10.el6.4.x86_64.rpm    2012-04-17 21:59   74K
  gridengine-execd-6.2u5-10.el6.4.x86_64.rpm    2012-04-17 21:59  1.3M
  gridengine-qmaster-6.2u5-10.el6.4.x86_64.rpm  2012-04-17 21:59  1.5M
  gridengine-qmon-6.2u5-10.el6.4.x86_64.rpm     2012-04-17 21:59  1.4M
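A minimal sketch of pulling these in with yum, assuming the EPEL repository is already enabled (package names are taken from the listing above; qmon is the optional GUI):

yum install gridengine gridengine-qmaster gridengine-qmon    # on the master host
yum install gridengine gridengine-execd                      # on each execution host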

Source RPMs are also available:

gridengine RPM build for RedHat EL 6:

 
Name       : gridengine
Version    : 6.2u5
Vendor     : Fedora Project
Release    : 10.el6.4
Date       : 2012-04-17 20:58:35
Group      : Applications/System
Source RPM : gridengine-6.2u5-10.el6.4.src.rpm
Size       : 42.94 MB
Packager   : Fedora Project
Summary    : Grid Engine - Distributed Computing Management software
Description :
In a typical network that does not have distributed resource management
software, workstations and servers are used from 5% to 20% of the time.
Even technical servers are generally less than fully utilized. This
means that there are a lot of cycles that can be used productively if
only users know where they are, can capture them, and put them to work.

Grid Engine finds a pool of idle resources and harnesses it
productively, so an organization gets as much as five to ten times the
usable power out of systems on the network. That can increase utilization
to as much as 98%.

Grid Engine software aggregates available compute resources and
delivers compute power as a network service.

These are the local files shared by both the qmaster and execd
daemons. You must install this package in order to use any one of them.

RPM found in directory: /mirror/download.fedora.redhat.com/pub/fedora/epel/6/x86_64

Download:
ftp.univie.ac.at   gridengine-6.2u5-10.el6.4.x86_64.rpm
ftp.muug.mb.ca   gridengine-6.2u5-10.el6.4.x86_64.rpm
mirror.switch.ch   gridengine-6.2u5-10.el6.4.x86_64.rpm
ftp.pbone.net   gridengine-6.2u5-10.el6.4.x86_64.rpm
ftp.icm.edu.pl   gridengine-6.2u5-10.el6.4.x86_64.rpm
ftp.sunet.se   gridengine-6.2u5-10.el6.4.x86_64.rpm
ftp.is.co.za   gridengine-6.2u5-10.el6.4.x86_64.rpm
      

Provides :
config(gridengine)
libcore.so()(64bit)
libdrmaa.so.1.0()(64bit)
libjgdi.so()(64bit)
libjuti.so()(64bit)
libspoolb.so()(64bit)
libspoolc.so()(64bit)
perl(JSV)
gridengine
gridengine(x86-64)

Requires :
 
/bin/bash  
/bin/csh  
/bin/ksh  
/bin/sh  
/bin/sh  
/bin/sh  
/bin/sh  
/sbin/ldconfig  
/sbin/ldconfig  
/usr/bin/perl  
/usr/bin/tclsh  
/usr/sbin/alternatives  
/usr/sbin/alternatives  
binutils  
config(gridengine) = 6.2u5-10.el6.4  
fedora-usermgmt  
fedora-usermgmt  
libc.so.6()(64bit)  
libc.so.6(GLIBC_2.11)(64bit)  
libc.so.6(GLIBC_2.2.5)(64bit)  
libc.so.6(GLIBC_2.3)(64bit)  
libc.so.6(GLIBC_2.3.4)(64bit)  
libc.so.6(GLIBC_2.4)(64bit)  
libc.so.6(GLIBC_2.7)(64bit)  
libcrypt.so.1()(64bit)  
libcrypt.so.1(GLIBC_2.2.5)(64bit)  
libcrypto.so.10()(64bit)  
libdb-4.7.so()(64bit)  
libdl.so.2()(64bit)  
libdl.so.2(GLIBC_2.2.5)(64bit)  
libgcc_s.so.1()(64bit)  
libgcc_s.so.1(GCC_3.0)(64bit)  
libgcc_s.so.1(GCC_3.3.1)(64bit)  
libjemalloc.so.1()(64bit)  
libm.so.6()(64bit)  
libm.so.6(GLIBC_2.2.5)(64bit)  
libncurses.so.5()(64bit)  
libpam.so.0()(64bit)  
libpam.so.0(LIBPAM_1.0)(64bit)  
libpthread.so.0()(64bit)  
libpthread.so.0(GLIBC_2.2.5)(64bit)  
libpthread.so.0(GLIBC_2.3.2)(64bit)  
libspoolb.so()(64bit)  
libspoolc.so()(64bit)  
libssl.so.10()(64bit)  
libtinfo.so.5()(64bit)  
ncurses  
perl(Env)  
perl(Exporter)  
perl(JSV)  
perl(lib)  
perl(strict)  
perl(warnings)  
rpmlib(CompressedFileNames) <= 3.0.4-1  
rpmlib(FileDigests) <= 4.6.0-1  
rpmlib(PayloadFilesHavePrefix) <= 4.0-1  
rtld(GNU_HASH)  
rpmlib(PayloadIsXz) <= 5.2-1  


Content of RPM :
/etc/profile.d/sge.csh
/etc/profile.d/sge.sh
/etc/sysconfig/gridengine
/usr/bin/qalter-ge
/usr/bin/qconf
/usr/bin/qdel-ge
/usr/bin/qevent
/usr/bin/qhold-ge
/usr/bin/qhost
/usr/bin/qlogin
/usr/bin/qmake-ge
/usr/bin/qmod
/usr/bin/qping
/usr/bin/qquota
/usr/bin/qrdel
/usr/bin/qresub
/usr/bin/qrls-ge
/usr/bin/qrsh
/usr/bin/qrstat
/usr/bin/qrsub
/usr/bin/qselect-ge
/usr/bin/qsh
/usr/bin/qstat-ge
/usr/bin/qsub-ge
/usr/bin/qtcsh
/usr/bin/sge_shadowd
/usr/bin/sgepasswd
/usr/lib64/gridengine
/usr/lib64/gridengine/jgdi.jar
/usr/lib64/gridengine/juti.jar
There are 397 more files in this RPM.

 

The Grid Engine RPMs that you will need are the base gridengine package, gridengine-qmaster, gridengine-execd, and (optionally) gridengine-qmon; see the directory listing above.

Install the qmaster RPM on the server that will be your master host. On this machine, run:

$SGE_ROOT/install_qmaster
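If you want to try it anyway, a minimal sketch of the invocation, assuming the /usr/share/gridengine SGE_ROOT that the EPEL layout uses (the same path appears in the mailing list excerpts below); adjust the path if your packages unpack elsewhere:

export SGE_ROOT=/usr/share/gridengine
cd $SGE_ROOT
./install_qmaster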
I think after that you will have no questions about the quality of those RPMs. You might instead consider Son of Grid Engine 8.1.8 RPMs, which can be installed on RHEL 6.5 without major problems; installation instructions, including recommendations on how to resolve dependencies, are available.

If you are an experienced builder who likes challenges, you might also consider Grid Scheduler, another abandonware version of Sun SGE 6.2u5 with some bug fixes and enhancements (such as support for cgroups in Linux).

Open Grid Scheduler/Grid Engine is released under the Sun Industry Standards Source License (SISSL). New code (new file) is licensed under the BSD license.

The fine print: Most of the code was taken from Sun Grid Engine (more specifically SGE 6.2 update 5 released in 2009), which was developed by Sun Microsystems. Using 6.2u5 as the starting point, we add new features and fixes to create Open Grid Scheduler/Grid Engine.

Configuration

Configuration of the classic Sun distribution is very close to the configuration of Oracle Grid Engine (which is actually a rebranded Sun SGE 6.2u7), so the Oracle documentation can be used. See Oracle Grid Engine.



Old News ;-)

[Nov 08, 2014] malariagen informatics

I've had Sun GridEngine running on our cluster of 12-core HP blades from its earliest days. What has not been working is the inter-host communication (the ability of the system to schedule and distribute jobs across the nodes). I therefore set out to fix this situation. It turns out that the problems that prevented this from working are mainly caused by quirks in the way that the Debian (and by inheritance, Ubuntu) packaging was done.

Prerequisites for gridengine: Most of the problems that I saw with the Debianised gridengine system are due to a lack of these prerequisites:

1. Check the hosts file for localhost.localdomain type entries. If these are present, they will cause host communication to fail. Ensure that, at a minimum, there is an entry in the hosts file of the master for each exec node, and in the hosts file of each exec node there is an entry for the master. For example:

I will set up a cluster between my desktop machine, KWIAT22, and my laptop, caleb.
/etc/hosts on KWIAT22 contains:

127.0.0.1 localhost
#127.0.0.1 localhost.localdomain localhost
129.67.46.129 KWIAT22
129.67.46.255 caleb

plus some other irrelevant entries. Note that localhost.localdomain is commented out.
/etc/hosts on caleb contains:

127.0.0.1 caleb
#127.0.0.1 localhost.localdomain localhost
129.67.46.255 caleb
129.67.46.129 KWIAT22

Note again, the localhost.localdomain entry has been commented out.

2. Java is required for inter-host communication. We will use Sun Java, as it is assumed to be most compatible with Sun GridEngine. Edit /etc/apt/sources.list and uncomment the entries for the partner repository:

deb http://archive.canonical.com/ubuntu maverick partner
deb-src http://archive.canonical.com/ubuntu maverick partner

Then install the JRE:

apt-get install sun-java6-jre

Check which version of java we've got selected:

root@caleb:~# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

From that we can see that I still have OpenJDK selected, so we change that:

root@caleb:~# update-alternatives --config java

There are 2 choices for the alternative java (providing /usr/bin/java).

Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/jvm/java-6-openjdk/jre/bin/java 1061 auto mode
1 /usr/lib/jvm/java-6-openjdk/jre/bin/java 1061 manual mode
2 /usr/lib/jvm/java-6-sun/jre/bin/java 63 manual mode

Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-6-sun/jre/bin/java to provide /usr/bin/java (java) in manual mode.
root@caleb:~# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Now that we have these prerequisites satisfied, we can install the relevant gridengine packages. Installing gridengine on Ubuntu systems is made simple by the packages. We can install the packages on the master node (in our case KWIAT22):

apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master

Configure SGE automatically? Yes
SGE cell name: default
SGE master hostname: KWIAT22 (this should be the fully qualified domain name of the SGE master, not localhost)

Output will typically look something like this:

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  gridengine-common
The following NEW packages will be installed:
  gridengine-client gridengine-common gridengine-exec gridengine-master gridengine-qmon
0 upgraded, 5 newly installed, 0 to remove and 37 not upgraded.
Need to get 0 B/18.7 MB of archives.
After this operation, 44.8 MB of additional disk space will be used.
Do you want to continue [Y/n]?
Preconfiguring packages ...
Selecting previously deselected package gridengine-common.
(Reading database ... 372804 files and directories currently installed.)
Unpacking gridengine-common (from .../gridengine-common_6.2u5-1ubuntu1_all.deb) ...
Selecting previously deselected package gridengine-client.
Unpacking gridengine-client (from .../gridengine-client_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-exec.
Unpacking gridengine-exec (from .../gridengine-exec_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-master.
Unpacking gridengine-master (from .../gridengine-master_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-qmon.
Unpacking gridengine-qmon (from .../gridengine-qmon_6.2u5-1ubuntu1_amd64.deb) ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
Setting up gridengine-common (6.2u5-1ubuntu1) ...
Creating config file /etc/default/gridengine with new version
Setting up gridengine-client (6.2u5-1ubuntu1) ...
Setting up gridengine-exec (6.2u5-1ubuntu1) ...
error: communication error for "KWIAT22/execd/1" running on port 6445: "can't bind socket"
error: commlib error: can't bind socket (no additional information available)
..........................
critical error: abort qmaster registration due to communication errors
daemonize error: child exited before sending daemonize state
Setting up gridengine-master (6.2u5-1ubuntu1) ...
Initializing cluster with the following parameters:
=> SGE_ROOT: /var/lib/gridengine
=> SGE_CELL: default
=> Spool directory: /var/spool/gridengine/spooldb
=> Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete
Setting up gridengine-qmon (6.2u5-1ubuntu1) ...

Note that the execd cannot bind the socket. This occurs because of a left-over execd that failed to stop from a previous install. It also results if you don't have java installed, as the execd won't respond to /etc/init.d/gridengine-exec stop without java. Also, if you're doing an apt-get purge gridengine-* to get back to a fresh slate, typically the execd will not be stopped properly, despite being removed from the system. This can be fixed by:

root@KWIAT22:~# ps aux |grep sge
sgeadmin 22244 0.0 0.0 135172 4940 ? Sl 17:42 0:00 /usr/lib/gridengine/sge_qmaster
sgeadmin 24272 0.0 0.0 58688 2500 ? Sl May16 0:22 /usr/lib/gridengine/sge_execd
root@KWIAT22:~# kill 24272
root@KWIAT22:~# /etc/init.d/gridengine-exec start
root@KWIAT22:~# /etc/init.d/gridengine-master restart
* Restarting Sun Grid Engine Master Scheduler sge_qmaster

The logfiles we can use for tracking down problems in communication between the qmaster and execd processes are not in the standard Debian/Ubuntu locations. Instead, they are stored in /var/spool/gridengine/qmaster/messages for the qmaster and /tmp/execd_messages.[pid] or /var/spool/gridengine/execd/messages for the execd processes. The log messages for our previous socket problem look like this (/tmp/execd_messages.24107):

05/16/2011 20:17:16| main|KWIAT22|E|communication error for "KWIAT22/execd/1" running on port 6445: "can't bind socket"
05/16/2011 20:17:17| main|KWIAT22|E|commlib error: can't bind socket (no additional information available)
05/16/2011 20:17:45| main|KWIAT22|C|abort qmaster registration due to communication errors
05/16/2011 20:17:47| main|KWIAT22|W|daemonize error: child exited before sending daemonize state

If you see any lines containing |E| then you have an error that must be addressed. Any lines with |W| are warnings, and it's probably wise to fix those too.
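A quick way to scan all the relevant message files at once (a sketch; the paths are the ones mentioned above):

grep -E '\|[EWC]\|' /tmp/execd_messages.* /var/spool/gridengine/*/messages 2>/dev/null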

On the exec nodes:

apt-get install gridengine-exec

Configure SGE automatically? yes
SGE cell name: default
SGE master hostname: KWIAT22

After installing, you will see the following error in the /tmp/execd_messages.[pid] file and the process will exit:

05/18/2011 17:53:00| main|caleb|E|getting configuration: denied: host "caleb" is neither submit nor admin host
05/18/2011 17:53:05| main|caleb|C|can't get configuration qmaster - terminating

This occurs because the master doesn't yet know about the exec node. We need to set up a basic configuration on the master. We will use the documentation in /usr/share/doc/gridengine-common/README.Debian, which I will duplicate here, to form the basis of our configuration:
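The README.Debian text itself did not survive in this excerpt; in outline it is the same qconf sequence shown in the wiki.unixh4cks walkthrough later on this page (user and host names below are placeholders):

sudo -u sgeadmin qconf -am myuser                         # add a manager
qconf -au myuser users                                    # add the user to the "users" access list
qconf -as myhost.mydomain                                 # register a submit host
qconf -ahgrp @allhosts                                    # create the host group (save the file unmodified)
qconf -aattr hostgroup hostlist myhost.mydomain @allhosts
qconf -aq main.q                                          # create a queue (save the file unmodified)
qconf -aattr queue hostlist @allhosts main.q
qconf -aattr queue slots "[myhost.mydomain=1]" main.q     # one slot per CPU you want to offer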

[gridengine users] Configure gridengine on CentOS 6.3

Petter Gustad gridengine at gustad.com
Tue Oct 30 13:20:23 UTC 2012
From: Reuti <reuti at staff.uni-marburg.de>
Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
Date: Tue, 30 Oct 2012 11:27:49 +0100

Thank you for your reply.

> Am 30.10.2012 um 10:53 schrieb Petter Gustad:
> 
>> Does anybody have a pointer to GRE installation docs for CentOS 6.3?
>> 
>> I've been running GRE version 6.2u5p2 (built from source) on Gentoo
>> systems for some time. Now I'm trying to add some new nodes running
>> CentOS 6.3 using the gridengine-6.2u5-10.el6.4.x86_64 package.
> 


> Just use the version you have already in the shared /usr/sge or your
> particular mountpoint.

I should probably try this first, at least to verify that it's
working. But later I would like to migrate to CentOS on all my
exechosts and leave the installation to somebody else.

> For exechosts there is no real installation necessary. It's
> sufficient to add the new exechosts as adminhosts, and then start
> the sgeexecd on the nodes (you might want to install it in
> /etc/init.d/sgeexecd or alike with appropriate links so that they
> start while booting). The script you will find in
> /usr/sge/default/common/sgeexecd.

I'll try this manual approach.

> During startup of the sgeexecd they will become exechosts
> automatically in SGE's list of exechosts.

> NB: It's not advisable to mix different versions of SGE in a cluster
> (while it's fine to mix different platforms of the same
> version).

OK. I will try to get the old version running first, then migrate to
the more recent version as I replace Gentoo with CentOS on the
exechosts.


> PS: You installed SGE in addition locally on each node with the
> gridengine-6.2u5-10.el6.4.x86_64 package? It might get confused
> having more than one version of the tools.

There is no PATH pointing to the old version. When I first tried, the
old version was not even mounted.

However, I would assume CentOS users with no previous installation
would experience the same problem.

Thank you again for your helpful reply.

Best regards
//Petter


>> But the installation procedure seems to be somewhat different from what
>> I'm used to. So where can I find an installation guide for the CentOS
>> version?
>> 
>> My problem seems to be related to the fact that sge_coshepherd and sge_shepherd
>> are missing. Is this a problem with the CentOS 6.3 package or is there
>> a different installation procedure on CentOS?
>> 
>> Here's the error message I get when I try to run inst_sge:
>> 
>> 
>>  # export SGE_ROOT=/usr/share/gridengine
>>  # sh ./inst_sge -x
>>  missing program >sge_coshepherd< in directory >./bin/lx26-amd64<
>>  missing program >sge_shepherd< in directory >./bin/lx26-amd64<
>> 
>>  Missing Grid Engine binaries!
>> 
>>  A complete installation needs the following binaries in >./bin/lx26-amd64<:
>> 
>>  qacct           qlogin          qrsh            sge_shepherd
>>  qalter          qmake           qselect         sge_coshepherd
>>  qconf           qmod            qsh             sge_execd
>>  qdel            qmon            qstat           sge_qmaster
>>  qhold           qresub          qsub            qhost
>>  qrls            qtcsh           sge_shadowd     qping
>>  qquota
>> 
>>  and the binaries in >./utilbin/lx26-amd64< should be:
>> 
>>  adminrun       gethostbyaddr  loadcheck      rlogin         uidgid
>>  authuser       checkprog      gethostbyname  now            rsh
>>  infotext       checkuser      gethostname    openssl        rshd
>>  filestat       getservbyname  qrsh_starter   testsuidroot
>> 
>>  Installation failed. Exit.
>> 
>> 
>> Thanks!
>> Best regards
>> //Petter
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users



Reuti reuti at staff.uni-marburg.de
Tue Oct 30 17:23:31 UTC 2012


Hi,

Am 30.10.2012 um 14:20 schrieb Petter Gustad:

>> Am 30.10.2012 um 10:53 schrieb Petter Gustad:
>> 
>>> Does anybody have a pointer to GRE installation docs for CentOS 6.3?
>>> 
>>> I've been running GRE version 6.2u5p2 (built from source) on Gentoo
>>> systems for some time. Now I'm trying to add some new nodes running
>>> CentOS 6.3 using the gridengine-6.2u5-10.el6.4.x86_64 package.
>> 
> 
> 
>> Just use the version you have already in the shared /usr/sge or your
>> particular mountpoint.
> 
> I should probably try this first, at least to verify that it's
> working. But later I would like to migrate to CentOS on all my
> exechosts and leave the installation to somebody else.

Then it's best to start with the qmaster.


>> For exechosts there is no real installation necessary. It's
>> sufficient to add the new exechosts as adminhosts, and then start
>> the sgeexecd on the nodes (you might want to install it in
>> /etc/init.d/sgeexecd or alike with appropriate links so that they
>> start while booting). The script you will find in
>> /usr/sge/default/common/sgeexecd.
> 
> I'll try this manual approach.
> 
>> During startup of the sgeexecd they will become exechosts
>> automatically in SGE's list of exechosts.
> 
>> NB: It's not advisable to mix different versions of SGE in a cluster
>> (while it's fine to mix different platforms of the same
>> version).
> 
> OK. I will try to get the old version running first, then migrate to
> the more recent version as I replace Gentoo with CentOS on the
> exechosts.

Only the exechost? Then I suggest to reinstall the qmaster on the head node with the version you want to use on all exechosts.


>> PS: You installed SGE in addition locally on each node with the
>> gridengine-6.2u5-10.el6.4.x86_64 package? It might get confused
>> having more than one version of the tools.
> 
> There is no PATH pointing to the old version. When I first tried, the
> old version was not even mounted.

It's quite usual to mount /usr/sge (or alike) and /home in the cluster and have only SGE's spool directory local (e.g. /var/spool/sge):

http://arc.liv.ac.uk/SGE/howto/nfsreduce.html

-- Reuti


> However, I would assume CentOS users with no previous installation
> would experience the same problem.
> 
> Thank you again for your helpful reply.
> 
> Best regards
> //Petter
> 
> 
>>> But the installation procedure seems to be somewhat different from what
>>> I'm used to. So where can I find an installation guide for the CentOS
>>> version?
>>> 
>>> My problem seems to be related to the fact that sge_coshepherd and sge_shepherd
>>> are missing. Is this a problem with the CentOS 6.3 package or is there
>>> a different installation procedure on CentOS?
>>> 
>>> Here's the error message I get when I try to run inst_sge:
>>> 
>>> 
>>> # export SGE_ROOT=/usr/share/gridengine
>>> # sh ./inst_sge -x
>>> missing program >sge_coshepherd< in directory >./bin/lx26-amd64<
>>> missing program >sge_shepherd< in directory >./bin/lx26-amd64<
>>> 
>>> Missing Grid Engine binaries!
>>> 
>>> A complete installation needs the following binaries in >./bin/lx26-amd64<:
>>> 
>>> qacct           qlogin          qrsh            sge_shepherd
>>> qalter          qmake           qselect         sge_coshepherd
>>> qconf           qmod            qsh             sge_execd
>>> qdel            qmon            qstat           sge_qmaster
>>> qhold           qresub          qsub            qhost
>>> qrls            qtcsh           sge_shadowd     qping
>>> qquota
>>> 
>>> and the binaries in >./utilbin/lx26-amd64< should be:
>>> 
>>> adminrun       gethostbyaddr  loadcheck      rlogin         uidgid
>>> authuser       checkprog      gethostbyname  now            rsh
>>> infotext       checkuser      gethostname    openssl        rshd
>>> filestat       getservbyname  qrsh_starter   testsuidroot
>>> 
>>> Installation failed. Exit.
>>> 
>>> 
>>> Thanks!
>>> Best regards
>>> //Petter
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>> 
Orion Poplawski orion at cora.nwra.com
Tue Oct 30 14:46:37 UTC 2012


On 10/30/2012 03:53 AM, Petter Gustad wrote:
>
> Does anybody have a pointer to GRE installation docs for CentOS 6.3?
>
> I've been running GRE version 6.2u5p2 (built from source) on Gentoo
> systems for some time. Now I'm trying to add some new nodes running
> CentOS 6.3 using the gridengine-6.2u5-10.el6.4.x86_64 package.
>
> But the installation procedure seems to be somewhat different from what
> I'm used to. So where can I find an installation guide for the CentOS
> version?

Strictly speaking it's not CentOS, but EPEL.  Install guide is in:

/usr/share/doc/gridengine-6.2u5/README

> My problem seems to be related to the fact that sge_coshepherd and sge_shepherd
> are missing. Is this a problem with the CentOS 6.3 package or is there
> a different installation procedure on CentOS?

Have you installed the gridengine-execd package on the exec hosts?  Yes, the 
install is a little different.  See the README.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com
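A minimal sketch of the manual exec-host approach Reuti describes above (the host name is a placeholder; the sgeexecd script path is the one he mentions):

# On the qmaster: add the new node as an administrative host
qconf -ah newnode.example.com
# On the new node, with the shared /usr/sge mounted:
/usr/sge/default/common/sgeexecd start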

Setting up Sun Grid Engine on Ubuntu - Sandbox Chronicles

It looks like the Debian packages are of higher quality and, unlike the RPMs from EPEL, might work out of the box. But again, they changed too much from the Sun classic version, reconfiguring the whole distribution and eliminating the classic Sun installer script, which makes them a mixed blessing. They are in no way of the same quality as Son of Grid Engine 8.1.8, which is the preferable open source distribution of SGE.
June 9, 2011 | wiki.unixh4cks.com

On master node

Installing prerequisites

 apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi xfs xfstt 
nano /etc/apt/sources.list

Uncomment these two lines

deb http://archive.canonical.com/ubuntu natty partner
deb-src http://archive.canonical.com/ubuntu natty partner

Install Java Runtime.

 apt-get install sun-java6-jre 

If you have multiple Java installations, select the required one (Sun Java 1.6 or higher) using:

update-alternatives --config java

Setting up environment

nano /etc/hosts

192.168.122.75       sge0.shadow.local sge0
192.168.122.115      sge1.shadow.local sge1

Install the Gridengine master, client, and exec packages on the master node:

apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec 

Configure SGE automatically? Yes
SGE cell name: default
SGE master hostname: sge0.shadow.local (this should be the fully qualified domain name of the SGE master, not localhost)

On the exec node

Install sun-java6-jre as above and set the proper host definitions in the /etc/hosts file.

Install Gridengine exec package

apt-get install gridengine-exec 

Check the status of the exec process:

root@sge1:~# cat /tmp/execd_messages.[PID]
06/08/2011 21:48:52|  main|sge1|E|can't connect to service
06/08/2011 21:48:52|  main|sge1|E|can't get configuration from qmaster -- backgrounding

This occurs because the master doesn't yet know about the exec node. We need to set up a basic configuration on the master.

Configuration

We will use the documentation in /usr/share/doc/gridengine-common/README.Debian

Syntax : sudo -u sgeadmin qconf -am user_name

root@sge0:~# sudo -u sgeadmin qconf -am basil
sgeadmin@sge0.shadow.local added "basil" to manager list

Syntax: qconf -au myuser users

root@sge0:~# qconf -au basil users
added "basil" to access list "users"

Syntax : qconf -as myhost.mydomain

root@sge0:~# qconf -as sge0.shadow.local
sge0.shadow.local added to submit host list

Syntax : qconf -ahgrp @allhosts

root@sge0:~# qconf -ahgrp @allhosts           # Just save the file without modifying it
root@sge0.shadow.local added "@allhosts" to host group list

Syntax : qconf -aattr hostgroup hostlist myhost.mydomain @allhosts

root@sge0:~# qconf -aattr hostgroup hostlist sge0.shadow.local @allhosts
root@sge0.shadow.local modified "@allhosts" in host group list

Syntax : qconf -aq main.q

root@sge0:~# qconf -aq main.q   # just save the file without modifying it
root@sge0.shadow.local added "main.q" to cluster queue list

Syntax : qconf -aattr queue hostlist @allhosts main.q

root@sge0:~# qconf -aattr queue hostlist @allhosts main.q
root@sge0.shadow.local modified "main.q" in cluster queue list

Syntax : qconf -aattr queue slots "[myhost.mydomain=1]" main.q

root@sge0:~# qconf -aattr queue slots "2, [sge0.shadow.local=3]" main.q
root@sge0.shadow.local modified "main.q" in cluster queue list

2 slots by default for all nodes, plus an explicit per-host value for sge0.shadow.local; set the explicit value below the machine's CPU count if you want to leave a CPU free for the master process.

Adding the exec node to the grid

We then add sge1.shadow.local as a submit and exec host:

root@sge0:~# qconf -as sge1.shadow.local
sge1.shadow.local added to submit host list
root@sge0:~# qconf -ae     # opens the exec host template in an editor; set hostname to sge1.shadow.local
hostname              sge1.shadow.local
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
root@sge0:~# qconf -aattr hostgroup hostlist sge1.shadow.local @allhosts
root@sge0.shadow.local modified "@allhosts" in host group list

Kill the sge_execd process on the exec node and then start it via the init.d script. Check that it doesn't create a log file in /tmp/execd_messages.[pid]; if it doesn't, then it's OK.

Back on our master node, qstat -f should now show us all set up. Use qmon if the master node has X running.

root@sge0:~# qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@sge0.shadow.local       BIP   0/0/3          0.11     lx26-amd64    
---------------------------------------------------------------------------------
main.q@sge1.shadow.local       BIP   0/0/2          0.02     lx26-amd64    
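At this point a quick test submission is a reasonable sanity check (a hedged sketch, not part of the original walkthrough; qsub reads a script from stdin when no file is given):

echo 'echo hello from `hostname`' | qsub -cwd -j y -o hello.log
qstat -f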

SGE installation

FslSge - FslWiki

fsl.fmrib.ox.ac.uk

This is a quick walk through to get Grid Engine going on Linux for those who would like to use it for something like FSL. This documentation is a little old, being written when the Grid Engine software was owned by Sun and often referred to as SGE (Sun Grid Engine). However, this covers the basic requirements. A quick start guide for Ubuntu/Debian is available here, but more detailed setup can be found on this page.

Since the demise of the open source (Sun) Grid Engine, various ports have sprung up. Ubuntu/Debian package the last publicly available release (6.2u5), but users of Red Hat variants (CentOS, Scientific Linux) or Debian/Ubuntu users wishing to use a more modern release should look to installing Son of Grid Engine which makes available RPM and DEB packages and is still actively maintained (last update November 2013).

Grid Engine generally consists of one master (qmaster) and a number of execute (exec) hosts. Note that the qmaster machine can also be an exec host, which is fine for small deployments, but large clusters should look to keeping these functions separate.

This documentation was originally produced by A. Janke ([email protected]) and is now maintained by the FSL team.

Sun Grid Engine - Science IT

sit.auckland.ac.nz

The Sun Grid Engine makes it easy for users to run compute jobs.

http://gridengine.sunsource.net/

The Sun Grid Engine (SGE) allows jobs to be queued for running on a suitable compute host. Suitable means that there is currently spare CPU time, sufficient memory for your job, and any other number of characteristics which you wish to test. It can also be used to launch jobs running MPI.

From an administrator's perspective it allows a lot more control over jobs. If a host is overloaded (due to no fault of SGE's - sometimes jobs consume > 100% CPU) jobs can easily be suspended or rescheduled to run on a less-loaded host. Jobs will be run, and they'll be resubmitted until they complete (depending on job options).

A variety of material regarding SGE is on Stephen Cope's Sun Grid Engine page. This covers both administration, using SGE, and other common questions from users (namely, "I want my job running on the fastest machine, now.")

Available Sun Grid Engine installations:

GridEngine - FarmShare

web.stanford.edu

We're using the Debian packages of "Sun Grid Engine" which isn't quite "Sun" anymore since Oracle bought Sun, and the Debian packages are a bit behind the current forks of Open Grid Engine or Son of Grid Engine or Univa Grid Engine.

Setting up Sun Grid Engine with three nodes on Debian

June 05, 2012 | Lindqvist
Finally, I've got nfs set up to share a folder from the front node (~/jobs) to all my subnodes. See here for instructions on how to set it up: http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-sharing-folder.html

When you use ecce, you can and SHOULD use local scratch folders i.e. use your nfs shared folder as the runtime folder, but set scratch to e.g. /tmp which isn't an nfs exported folder.


Before you start, stop and purge
If you've tried installing and configuring gridengine in the past, there may be processes and files which will interfere. On each computer do
ps aux|grep sge
use sudo kill to kill any sge processes
Then
sudo apt-get purge gridengine-*


First install sun/oracle java on all nodes.

[UPDATE 24 Aug 2013: openjdk-6-jre or openjdk-7-jre work fine, so you can skip this]

There's no sun/oracle java in the debian testing repos anymore, so we'll follow this: http://verahill.blogspot.com.au/2012/04/installing-sunoracle-java-in-debian.html

sudo apt-get install java-package
Download the jre-6u31-linux-x64.bin from here: http://java.com/en/download/manual.jsp?locale=en

make-jpkg jre-6u31-linux-x64.bin

sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb

Then select your shiny oracle java by doing:
sudo update-alternatives --config java
sudo update-alternatives --config javaws

Do that on every node, front and subnodes. You don't have to do all the steps though: you just built oracle-j2re1.6_1.6.0+update31_amd64.deb, so copy that to your nodes, do sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb and then do the sudo update-alternatives dance.




Front node:
sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
(at the moment this installs v 6.2u5-7)

I used the following:

Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
=> SGE_ROOT: /var/lib/gridengine
=> SGE_CELL: rupert
=> Spool directory: /var/spool/gridengine/spooldb
=> Initial manager user: sgeadmin

Once it was installed, I added myself as an sgeadmin:
sudo -u sgeadmin qconf -am ${USER}

sgeadmin@beryllium added "verahill" to manager list
and to the user list:
qconf -au ${USER} users
added "verahill" to access list "users"
We add beryllium as a submit host
qconf -as beryllium
beryllium added to submit host list
Create the group allhosts
qconf -ahgrp @allhosts
group_name @allhosts
hostlist NONE
I made no changes

Add beryllium to the hostlist
qconf -aattr hostgroup hostlist beryllium @allhosts

verahill@beryllium modified "@allhosts" in host group list
qconf -aq main.q

This opens another text file. I made no changes.

verahill@beryllium added "main.q" to cluster queue list
Add the host group to the queue:

qconf -aattr queue hostlist @allhosts main.q

verahill@beryllium modified "main.q" in cluster queue list
1 core on beryllium is added to SGE:

qconf -aattr queue slots "[beryllium=1]" main.q

verahill@beryllium modified "main.q" in cluster queue list
Add execution host
qconf -ae
which opens a text file in vim

I edited hostname (boron) but nothing else. Saving returns

added host boron to exec host list
Add boron as a submit host
qconf -as boron
boron added to submit host list
Add 3 cores for boron:
qconf -aattr queue slots "[boron=3]" main.q


Add boron to the queue
qconf -aattr hostgroup hostlist boron @allhosts

Here's my history list in case you can't be bothered reading everything in detail above.

2015 sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
2016 sudo -u sgeadmin qconf -am ${USER}
2017 qconf -help
2018 qconf user_list
2019 qconf -au ${USER} users
2020 qconf -as beryllium
2021 qconf -ahgrp @allhosts
2022 qconf -aattr hostgroup hostlist beryllium @allhosts
2023 qconf -aq main.q
2024 qconf -aattr queue hostlist @allhosts main.q
2025 qconf -aattr queue slots "[beryllium=1]" main.q
2026 qconf -as boron
2027 qconf -ae
2028 qconf -aattr hostgroup hostlist beryllium @allhosts
2029 qconf -aattr queue slots "[boron=3]" main.q
2030 qconf -aattr hostgroup hostlist boron @allhosts

Next, set up your subnodes:

My example here is a subnode called boron.

On the subnode:
sudo apt-get install gridengine-exec gridengine-client

Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
This node is called boron.

Check whether sge_execd got started after the install:
ps aux|grep sge

sgeadmin 25091 0.0 0.0 31712 1968 ? Sl 13:54 0:00 /usr/lib/gridengine/sge_execd
If not, and only if not, do


/etc/init.d/gridengine-exec start


cat /tmp/execd_messages.*

If there's no message corresponding to the current iteration of sge (i.e. you may have old error messages from earlier attempts) then you're probably in a good place.

Back to the front node:

qhost

HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
beryllium lx26-amd64 6 0.57 7.8G 3.9G 14.9G 597.7M
boron lx26-amd64 3 0.62 3.8G 255.6M 14.9G 0.0
If the exec node isn't recognised (i.e. it's listed but no cpu info or anything else) then you're in a dark place. Probably you'll find a message about "request for user soandso does not match credentials" in your /tmp/execd_messages.* files on the exec node. The only way I got that solved was stopping all sge processes everywhere, purging all gridengine-* packages on all nodes and starting from the beginning -- hence why I posted the history output above.

qstat -f


queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
main.q@beryllium BIP 0/0/1 0.64 lx26-amd64
---------------------------------------------------------------------------------
main.q@boron BIP 0/0/3 0.72 lx26-amd64

install_manual.md at master · vpenso/gengine-chef-cookbook (GitHub)

Debian provides binaries for Grid Engine with the packages: gridengine-master, gridengine-exec, gridengine-client. The queue master stores logging in /var/spool/gridengine/qmaster/messages. It contains scheduling decisions and error information about the daemon as well as failed jobs. The corresponding daemon sge_qmaster needs to be running in order to accept jobs. You can check this by looking for processes from the user sgeadmin. Control the master daemons using the script /etc/init.d/gridengine-master. In order to accept jobs from the queue master each execution node needs to have a correctly configured sge_execd daemon running under the user account sgeadmin. Control the execution daemons using the init-script /etc/init.d/gridengine-exec.

In case of communication problems between the queue master and an exec node, look out for log files like /tmp/execd_messages.[pid]. Also, the queue master indicates authorization problems with execution nodes in its log file. The job spool directory is located in /var/spool/gridengine/execd/.
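A few commands that follow directly from this description (a sketch; the paths and init scripts are the ones named above):

ps -u sgeadmin                                # both sge_qmaster and sge_execd run as sgeadmin
tail /var/spool/gridengine/qmaster/messages   # queue master log: scheduling decisions and errors
/etc/init.d/gridengine-master restart         # control the master daemon
/etc/init.d/gridengine-exec restart           # control the execution daemon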

Installation

The simplest setup configures a single machine to host the Grid Engine queue master, to act as an execution node, and to be a job submit node with the client command-line interface. The following example is built with a virtual machine named lxdev01.devops.test running Debian Wheezy as the operating system.

" apt-get install gridengine-master gridengine-exec gridengine-client
[...SNIP...]
" /etc/init.d/gridengine-exec start
" qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.42  497.0M   64.7M     0.0     0.0

After installing all the packages the queue master daemon sge_qmaster should be running. Once the init-script gridengine-exec starts an instance of the sge_execd daemon, the host can execute jobs. Before a job can be submitted, a host group @default is defined, which in turn is used to configure a queue named default.

" qconf -ahgrp @default
[email protected] added "@default" to host group list
" qconf -shgrp @default
group_name @default
hostlist lxdev01.devops.test
" qconf -aq default
[email protected] added "default" to cluster queue list
" qconf -sq default | head -2
qname                 default
hostlist              @default

The last thing to do is to add the host to the list of submit nodes.

" qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
default                           0.02      0      0      1      1      0      0
" qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
[email protected]    BIP   0/0/1          0.01     lx26-amd64
" qconf -as $(hostname -f)
lxdev01.devops.test added to submit host list

Installation and configuration are done with root privileges; to submit the first job, a user account devops is used.

" cat echo.sge
echo $USER@`hostname`:`pwd`
" qsub -j y -o /tmp/job.log -wd /tmp echo.sge
Your job 1 ("echo.sge") has been submitted
" qstat
job-ID  prior   name       user         state submit/start at     queue    slots
--------------------------------------------------------------------------------
      1 0.00000 echo.sge   devops       qw    12/06/2012 13:48:25          1
" cat /tmp/job.log 
devops@lxdev01:/tmp
" qacct -j 1
==============================================================
qname        default             
hostname     lxdev01.devops.test 
group        devops              
owner        devops              
project      NONE                
department   defaultdepartment   
jobname      echo.sge      
[...SNIP...]

Adding Another Execution Node

To actually build a "cluster" of machines, at least a second execution node lxdev02.devops.test is needed. Before this node is installed, we can already add it to the @default host group.

" qconf -mhgrp @default
lxdev01.devops.test modified "@default" in host group list
" qconf -shgrp @default
group_name @default
hostlist lxdev01.devops.test lxdev02.devops.test
" qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.01  497.0M   65.5M     0.0     0.0
lxdev02.devops.test     -               -     -       -       -       -       -

On the node itself, install only the execution-node package and configure the address of the queue master in the file /var/lib/gridengine/default/common/act_qmaster.

" apt-get install gridengine-exec
[...SNIP...]
" echo "lxdev01.devops.test" > /var/lib/gridengine/default/common/act_qmaster
" service gridengine-exec restart
Restarting Sun Grid Engine Execution Daemon: sge_execd.

After restarting the execution daemon, it should register with the queue master.

" qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.01  497.0M   65.7M     0.0     0.0
lxdev02.devops.test     lx26-amd64      1  0.14 1003.0M   64.5M     0.0     0.0
" for i in {1..10}; do qsub -b y sleep -- 10 ; done
Your job 7 ("sleep") has been submitted
Your job 8 ("sleep") has been submitted
Your job 9 ("sleep") has been submitted
Your job 10 ("sleep") has been submitted
Your job 11 ("sleep") has been submitted
Your job 12 ("sleep") has been submitted
Your job 13 ("sleep") has been submitted
Your job 14 ("sleep") has been submitted
Your job 15 ("sleep") has been submitted
Your job 16 ("sleep") has been submitted
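To watch these sleep jobs spread across both execution hosts, poll the full queue listing (qstat -f is a standard SGE client invocation; the 2-second interval passed to watch is arbitrary):

» watch -n 2 qstat -f

Once all ten jobs have drained, qstat returns an empty listing, and qacct -j can report on each of them as shown earlier.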


