
Grid Engine As a High Quality Unix/Linux Batch System


Introduction

Grid Engine, often called Sun Grid Engine (SGE), is a software classic. It is a batch job controller, essentially the classic batch command on steroids, rather than a typical scheduler. At one point Sun open-sourced the code, so an open source version exists. It is the most powerful (albeit specialized) open source batch scheduler in existence, and one of Sun's most valuable contributions to the open source community: an industrial-strength batch system for Unix/Linux.

This is one of the few classic Unix software systems. SGE 6.2u7 as released by Sun has all the signs of a software classic. It inherited fairly good documentation from its Sun days (although the software vandals from Oracle destroyed a lot of valuable Sun documents). Any engineer or scientist can read the SGE User Manual and Installation Guide, install it (the installer sets up a single queue, all.q, that can be used immediately), and start using it for their needs in a day or two with the defaults, without any training. As long as the networking is reliable and jobs are submitted correctly, SGE runs them with nearly zero administration.

SGE is a very powerful and flexible batch system that probably should become a standard Linux subsystem, replacing or supplementing the very basic batch command. It is available as an installable package from the main repository in several Linux distributions, such as Debian and Ubuntu, and from third-party repositories for CentOS, RHEL and SUSE.

SGE has many options that help to use all computational resources effectively: a grid consisting of a head node and computational nodes, each with a certain number of cores (aka slots).

But the flip side of power and flexibility is complexity. SGE is a complex system that requires study; you need to read the man pages and manuals carefully to get the most out of it. The SGE mailing list is also a great educational resource, so don't hesitate to ask questions; once you become an expert you can help others get up to speed with the product. Installation is easy, but it usually takes from six months to a year for an isolated person to master the basics (much less if you have at least one expert on the floor). And, as with any complex and powerful system, even admins with ten years of experience probably know only 60-70% of SGE.
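For self-study, the man pages shipped with the distribution are the most reliable reference. A short, non-exhaustive list (names as shipped with Son of Grid Engine; other builds may differ slightly):

man qsub        # submitting jobs
man qstat       # monitoring jobs and queues
man qconf       # cluster, queue, parallel environment and complex configuration
man queue_conf  # all queue attributes (slots, calendars, subordinates, ...)
man sge_pe      # parallel environment attributes
man complex     # consumable and requestable resources
man sge_conf    # global and per-host cluster configuration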

Now that the pieces have fallen into place, after Oracle's acquisition of Sun Microsystems and its subsequent abandonment of the product, we can see that open source can help "vendor-proof" important parts of Unix. Unix did not have a decent batch scheduler before Grid Engine; now it has one. Grid Engine is alive and well, with a blog, a mailing list, a git repository, and even a commercial version from Univa. Source code repositories can also be found at the Open Grid Scheduler (Grid Engine 2011.11 is compatible with Sun Grid Engine 6.2u7) and Son of Grid Engine projects. Open Grid Scheduler looks like abandonware (though its user group is active), while Son of Grid Engine is the more actively developed of the two and currently represents the most viable open source SGE implementation.

As of version 8.1.8 it is the best-debugged open source distribution. It might be especially attractive for those who have experience building software, but it can be used by everybody on RHEL, for which precompiled binaries exist.

Installation is pretty raw, but I tried to compensate for that by creating several pages which together document the installation process on RHEL 6.5 or 6.6 pretty well:

Even in their present form they are definitely clearer and more useful than the old Sun 6.2u5 installation documentation ;-).

Most SGE discussion uses the term cluster, but SGE is not tied to cluster technology in any meaningful way. In reality it is designed to operate on a heterogeneous server farm.

We will use the term "server farm" here as an alternative, and less ambitious, term than "grid".

The default installation of Grid Engine assumes that the $SGE_ROOT directory (root directory for Grid Engine installation) is on a shared (for example by NFS) filesystem accessible by all hosts.
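Here is a minimal sketch of such a shared setup, assuming $SGE_ROOT is /opt/sge, the master host is called "master" and the execution hosts are node01..nodeNN (these names and paths are assumptions for illustration, not taken from any particular site):

# on the master host, /etc/exports:
/opt/sge    node*(rw,sync,no_root_squash)

# on each execution host, /etc/fstab:
master:/opt/sge    /opt/sge    nfs    defaults    0 0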

Right now SGE exists in several competing versions (see SGE implementations), but the last version of Son of Grid Engine produced was 8.1.9. After that Dave Love abandoned the project. So while it can be installed on RHEL 7 and works, the future of SGE is again in limbo.

The last version of Son of Grid Engine was released in March 2016 (all versions listed below are also downloadable from the Son of Grid Engine (SGE) releases area at SourceForge.net):

 

Key concepts

The Grid Engine system has functions typical for any powerful batch system:

But as a powerful batch system it is oriented toward running multiple jobs optimally on the available resources, typically multiple computers (nodes) of a computational cluster. In its simplest form, a grid appears to users as a large system that provides a single point of access to multiple computers.

In other words, a grid is just a loose confederation of different computers, possibly running different OSes, connected by regular TCP/IP links. In this sense it is close to the concept of a server farm. Grid Engine does not care about the uniformity of a server farm and, along with scheduling, provides some central administration and monitoring capabilities for the server farm environment.

SGE makes it possible to distribute jobs across a grid and treat the grid as a single computational resource. It accepts jobs submitted by users and schedules them to run on appropriate systems in the grid. Users can submit as many jobs at a time as they want without being concerned about where the jobs run.

The main purpose of a batch system like SGE is to utilize the system resources present in a server farm optimally, that is, to schedule jobs on the available nodes in the most efficient way possible.

Every aspect of the batch system is accessible through the Perl API. There is almost no documentation, but a few sample scripts in gridengine/source/experimental/perlgui, plus some found on the Internet, such as those by Wolfgang Friebel from DESY (see ifh.de), can be used as guidance.

Grid Engine architecture is structured around two main concepts:

Queue

A queue is a container for a class of jobs that are allowed to run on one or more hosts concurrently. Logically a queue is a child of a parallel environment (see below), although it can have several such parents. It defines a set of hosts and the limits on resources available on those hosts.

A queue can reside on a single host, or it can extend across multiple hosts. The latter are called server farm queues. Server farm queues enable users and administrators to work with a server farm of execution hosts by means of a single queue configuration. Each host that is attached to the head node can belong to one or more queues.

A queue determines certain job attributes. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.

Grid Engine always has one default queue, all.q, which is created during the initial installation and updated each time you add another execution host. You can have several additional queues, each defining the set of hosts on which its jobs run and its own resource limits, for example the number of CPUs (aka slots). The problem here is that, without special measures, queues are independent, so if they contain the same set of nodes oversubscription can easily occur.

Each job should not exceed the maximum parameters defined by the queue (directly, or indirectly via the parallel environment). The SGE scheduler can then optimize the job mix for the available resources by selecting the most suitable job from the input queue and sending it to the most appropriate node of the grid.

A queue defines a class of jobs that consume computer resources in a similar way. It also defines the list of computational nodes on which such jobs can run.

Jobs typically are submitted to a queue.  
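A few basic commands for working with queues, as a hedged sketch (the queue name long.q is made up for illustration):

qconf -sql               # list the names of all cluster queues (e.g. all.q)
qconf -sq all.q          # show the full configuration of all.q
qconf -aq long.q         # add a new queue (opens an editor)
qsub -q long.q myjob.sh  # submit a job explicitly to that queue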

In the book Building N1™ Grid Solutions: Preparing, Architecting, and Implementing Service-Centric Data Centers we can find an interesting, although overblown, statement:

The N1 part of the name was never intended to be a product name or a strategy name visible outside of Sun. The name leaked out and stuck. It is the abbreviation for the original project name “Network-1.” The SUN1 workstation was Sun's first workstation. It was designed specifically to be connected to the network.

N1 Grid systems are the first systems intended to be built with the network at their core and be based on the principle that an IP-based network is effectively the system bus.

Parallel environment

Parallel environment (PE) is the central notion of SGE and represents  a set of settings that tell Grid Engine how to start, stop, and manage jobs run by the class of  queues that are using this PE.

It sets the maximum number of slots that can be assigned to all jobs within a given queue. It also sets some parameters for the parallel messaging framework, such as MPI, used by parallel jobs.

The parallel environment is a defining characteristic of each queue and must be specified correctly for the queue to work. It is specified in the pe_list attribute of the queue, which can contain a single PE or a list of PEs. For example:

pe_list               make mpi mpi_fill_up

Each parallel environment determines a class of queues that use it. Its most important attributes are:

  1. slots - the maximum number of job slots that the parallel environment is allowed to occupy at once
  2. allocation_rule - see the man page (sge_pe). $pe_slots allocates all slots for the job on a single host; other rules (e.g. $fill_up, $round_robin) allow the job to be scheduled across multiple machines.
  3. control_slaves - when set to TRUE, Grid Engine takes care of starting the slave MPI tasks (tight integration). In this case the MPI library should be built with SGE support (for Open MPI, the --with-sge configure option).
  4. job_is_first_task  The job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates that the Sun Grid Engine job script already contains one of the tasks of the parallel application (the number of slots reserved for the job is the number of slots requested with the -pe switch), while a value of FALSE indicates that the job script (and its child processes) is not part of the parallel program (the number of slots reserved for the job is the number of slots requested with the -pe switch + 1).

    If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the accounting for the job: A value of TRUE means that accounting for cpu and requested memory gets multiplied by the number of slots requested with the -pe switch, if job_is_first_task is set to FALSE, the accounting information gets multiplied by number of slots + 1.
     

  5. accounting_summary This parameter is only checked if control_slaves (see above) is set to TRUE and thus Sun Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8). In this case, accounting information is available for every single slave task started by Sun Grid Engine.

    The accounting_summary parameter can be set to TRUE or FALSE. A value of TRUE indicates that only a single accounting record is written to the accounting(5) file, containing the accounting summary of the whole job including all slave tasks, while a value of FALSE indicates an individual accounting(5) record is written for every slave task, as well as for the master task.

    Note:
    When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set, and with having accounting_summary enabled in the parallel environment, reserved usage will only be reported by the master task of the parallel job. No per parallel task usage records will be sent from execd to qmaster, which can significantly reduce load on qmaster when running large tightly integrated parallel jobs.

Some important details are well explained in the blog post Configuring a New Parallel Environment
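As an illustration, here is what a tightly integrated MPI parallel environment might look like when displayed with qconf -sp (create one with qconf -ap mpi, which opens an editor). All values below are examples, not defaults from any particular site:

pe_name            mpi
slots              64
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

To make it usable, add "mpi" to the pe_list attribute of the queues that should run such jobs (qconf -mq all.q).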

Architecture

A grid generally consists of a head node and computational nodes. The head node runs the sge_qmaster daemon and is often called the master host. The master host can be, and often is, the NFS server exporting shared filesystems to the computational nodes, but this is not necessary.

Daemons

Two daemons provide the functionality of the Grid Engine system: sge_qmaster on the master host and sge_execd on each execution host. They are started via init scripts.
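A quick sanity check that both daemons are up (assuming the standard settings file location under $SGE_ROOT):

. /opt/sge/default/common/settings.sh
pgrep -l sge_qmaster    # should be running on the master host
pgrep -l sge_execd      # should be running on every execution host
qhost                   # every host should report load and memory, not just dashes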

Documents and code location

Documentation for such a complex and powerful system is fragmentary and generally of low quality. Even some man pages contain questionable information, and many do not explain the available features well, or at all.

This is actually why this set of pages was created: to compensate for insufficient documentation for SGE. 

Although versions of SGE are generally compatible, the implementation of some features depends on the version used. See History for the list of major implementations.

Documentation for the last open source version produced by Sun (version 6.2u5) is still floating around on the Internet. See, for example:

There are docs for older versions too, as well as some presentations.

Some old Sun Blueprints about SGE can still be found too. But generally Oracle behaved horribly as the trustee of the Sun documentation portal. They proved to be simple vandals in this respect: discarding almost everything without mercy, destroying considerable value and an important part of the Sun heritage.

Moreover, those documents, organized into a historical website, might still have earned some money (and respect, which is sorely missing now, after this vandalism) for Oracle had they preserved the site. Instead they discarded everything mercilessly.

Documentation for Oracle Grid Engine, which is now abandonware, may also be floating around.

For more information see SGE Documentation.



NEWS CONTENTS

Old News ;-)

[Jan 29, 2021] I just noticed that in Sept 2020 Univa was bought by Altair

Jan 29, 2021 | finance.yahoo.com

Altair Acquires Univa

So Altair now owns the only commercial SGE offering alongside its own PBS Pro. Univa Grid Engine will now be referred to as Altair Grid Engine.

Altair will continue to invest in Univa's technology to support existing customers while integrating with Altair's HPC and data analytics solutions. These efforts will further enhance the capability and performance requirements for all Altair customers and solidify the company's leadership in workload management and cloud enablement for HPC. Univa has two flagship products:

· Univa ® Grid Engine ® is a leading distributed resource management system to optimize workloads and resources in thousands of data centers, improving return-on-investment and delivering better results faster.

[Dec 16, 2018] Index of -downloads-SGE-releases-8.1.9

Dec 16, 2018 | liv.ac.uk

Index of /downloads/SGE/releases/8.1.9

Name                                          Last modified      Size

Parent Directory                                                 -
README.txt                                    2016-02-29 23:39   1.5K
arco-8.1.6.tar.gz                             2013-11-04 18:03   1.0M
arco-8.1.6.tar.gz.sig                         2013-11-04 18:03   287
dbwriter-8.1.6.tar.gz                         2013-11-04 18:03   3.8M
dbwriter-8.1.6.tar.gz.sig                     2013-11-04 18:03   287
debian.tar.gz                                 2016-02-28 20:29   30K
gridengine-8.1.9-1.el5.src.rpm                2016-02-28 19:54   18M
gridengine-8.1.9-1.el5.x86_64.rpm             2016-02-28 19:54   19M
gridengine-8.1.9-1.el6.x86_64.rpm             2016-03-01 14:25   16M
gridengine-debuginfo-8.1.9-1.el5.x86_64.rpm   2016-02-28 19:54   38M
gridengine-debuginfo-8.1.9-1.el6.x86_64.rpm   2016-03-01 14:25   35M
gridengine-devel-8.1.9-1.el5.x86_64.rpm       2016-02-28 19:54   1.5M
gridengine-devel-8.1.9-1.el6.noarch.rpm       2016-03-01 14:25   1.2M
gridengine-drmaa4ruby-8.1.9-1.el5.x86_64.rpm  2016-02-28 19:54   13K
gridengine-drmaa4ruby-8.1.9-1.el6.noarch.rpm  2016-03-01 14:25   13K
gridengine-execd-8.1.9-1.el5.x86_64.rpm       2016-02-28 19:54   1.6M
gridengine-execd-8.1.9-1.el6.x86_64.rpm       2016-03-01 14:25   1.4M
gridengine-guiinst-8.1.9-1.el5.x86_64.rpm     2016-02-28 19:54   688K
gridengine-guiinst-8.1.9-1.el6.noarch.rpm     2016-03-01 14:25   696K
gridengine-qmaster-8.1.9-1.el5.x86_64.rpm     2016-02-28 19:54   1.7M
gridengine-qmaster-8.1.9-1.el6.x86_64.rpm     2016-03-01 14:25   1.5M
gridengine-qmon-8.1.9-1.el5.x86_64.rpm        2016-02-28 19:54   1.5M
gridengine-qmon-8.1.9-1.el6.x86_64.rpm        2016-03-01 14:25   1.4M
sge-8.1.9.tar.gz                              2016-02-28 19:55   11M
sge-8.1.9.tar.gz.sig                          2016-02-28 19:55   287
sge-common_8.1.9_all.deb                      2016-02-28 20:20   1.4M
sge-dbg_8.1.9_amd64.deb                       2016-02-28 20:20   17M
sge-doc_8.1.9_all.deb                         2016-02-28 20:20   916K
sge_8.1.9.dsc                                 2016-02-28 20:20   1.5K
sge_8.1.9.tar.xz                              2016-02-28 20:20   8.2M
sge_8.1.9_amd64.deb                           2016-02-28 20:20   8.4M

This is Son of Grid Engine version v8.1.9.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on
recent changes.  See <https://arc.liv.ac.uk/trac/SGE> for more
information.

The .deb and .rpm packages and the source tarball are signed with PGP
key B5AEEEA9.

* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig:  Source tarball and PGP signature

* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI
  installer and Hadoop support:

  * gridengine-8.1.9-1.el5.src.rpm:  Source RPM for RHEL, Fedora

  * gridengine-*8.1.9-1.el6.x86_64.rpm:  RPMs for RHEL 6 (and
    CentOS, SL)

  See <https://copr.fedorainfracloud.org/coprs/loveshack/SGE/> for
  hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.

* Debian packages, installing into /opt/sge, not providing the GUI
  installer or Hadoop support:

  * sge_8.1.9.dsc, sge_8.1.9.tar.gz:  Source packaging.  See
    <http://wiki.debian.org/BuildingAPackage>, and see
    <http://arc.liv.ac.uk/downloads/SGE/support/> if you need (a more
    recent) hwloc.

  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb,
    sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages
    built on Debian Jessie.

* debian-8.1.9.tar.gz:  Alternative Debian packaging, for installing
  into /usr.

* arco-8.1.6.tar.gz:  ARCo source (unchanged from previous version)

* dbwriter-8.1.6.tar.gz:  compiled dbwriter component of ARCo
  (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at
<http://copr.fedoraproject.org/coprs/loveshack/SGE/>.

[Dec 16, 2018] GitHub - gawbul-docker-sge Dockerfile to build a container with SGE installed

Dec 16, 2018 | github.com

docker-sge

Dockerfile to build a container with SGE installed.

To build type:

git clone git@github.com:gawbul/docker-sge.git
cd docker-sge
docker build -t gawbul/docker-sge .

To pull from the Docker Hub type:

docker pull gawbul/docker-sge

To run the image in a container type:

docker run -it --rm gawbul/docker-sge login -f sgeadmin

You need the login -f sgeadmin because root isn't allowed to submit jobs.

To submit a job run:

echo "echo Running test from $HOSTNAME" | qsub

[Dec 16, 2018] wtakase-sge-master - Docker Hub

Dec 16, 2018 | hub.docker.com

Docker SGE (Son of Grid Engine) Kubernetes All-in-One Usage

  1. Setup Kubernetes cluster, DNS service, and SGE cluster

    Set KUBE_SERVER, DNS_DOMAIN, and DNS_SERVER_IP correctly, then run ./kubernetes/setup_all.sh with the number of SGE workers.

    export KUBE_SERVER=xxx.xxx.xxx.xxx
    export DNS_DOMAIN=xxxx.xxxx
    export DNS_SERVER_IP=xxx.xxx.xxx.xxx
    ./kubernetes/setup_all.sh 20
    
  2. Submit Job
    kubectl exec sgemaster -- sudo su sgeuser bash -c '. /etc/profile.d/sge.sh; echo "/bin/hostname" | qsub'
    kubectl exec sgemaster -- sudo su sgeuser bash -c 'cat /home/sgeuser/STDIN.o1'
    
  3. Add SGE workers
    ./kubernetes/add_sge_workers.sh 10
    
Kubernetes Step-by-Step Usage
  1. Setup Kubernetes cluster
    ./kubernetes/setup_k8s.sh
    
  2. Setup DNS service

    Set KUBE_SERVER, DNS_DOMAIN, and DNS_SERVER_IP correctly.

    export KUBE_SERVER=xxx.xxx.xxx.xxx
    export DNS_DOMAIN=xxxx.xxxx
    export DNS_SERVER_IP=xxx.xxx.xxx.xxx
    ./kubernetes/setup_dns.sh
    
  3. Check DNS service
    • Boot test client
    kubectl create -f ./kubernetes/skydns/busybox.yaml
    
    • Check normal lookup
    kubectl exec busybox -- nslookup kubernetes
    
    • Check reverse lookup
      kubectl exec busybox -- nslookup 10.0.0.1
      
    • Check pod name lookup
    kubectl exec busybox -- nslookup busybox.default
    
  4. Setup SGE cluster

    Run ./kubernetes/setup_sge.sh with number of SGE workers.

    ./kubernetes/setup_sge.sh 10
    
  5. Submit job
    kubectl exec sgemaster -- sudo su sgeuser bash -c '. /etc/profile.d/sge.sh; echo "/bin/hostname" | qsub'
    kubectl exec sgemaster -- sudo su sgeuser bash -c 'cat /home/sgeuser/STDIN.o1'
    
  6. Add SGE workers
    ./kubernetes/add_sge_workers.sh 10
    
Simple Docker Command Usage
  1. Load nfsd module
    modprobe nfsd
    
  2. Boot DNS server
    docker run -d --hostname resolvable -v /var/run/docker.sock:/tmp/docker.sock -v /etc/resolv.conf:/tmp/resolv.conf mgood/resolvable
    
  3. Boot NFS servers
    docker run -d --name nfshome --privileged cpuguy83/nfs-server /exports
    docker run -d --name nfsopt --privileged cpuguy83/nfs-server /exports
    
  4. Boot SGE master
    docker run -d -h sgemaster --name sgemaster --privileged --link nfshome:nfshome --link nfsopt:nfsopt wtakase/sge-master:ubuntu
    
  5. Boot SGE workers
    docker run -d -h sgeworker01 --name sgeworker01 --privileged --link sgemaster:sgemaster --link nfshome:nfshome --link nfsopt:nfsopt wtakase/sge-worker:ubuntu
    docker run -d -h sgeworker02 --name sgeworker02 --privileged --link sgemaster:sgemaster --link nfshome:nfshome --link nfsopt:nfsopt wtakase/sge-worker:ubuntu
    
  6. Submit job
    docker exec -u sgeuser -it sgemaster bash -c '. /etc/profile.d/sge.sh; echo "/bin/hostname" | qsub'
    docker exec -u sgeuser -it sgemaster cat /home/sgeuser/STDIN.o1
    

[Nov 08, 2018] SGE Installation on Centos 7

Nov 08, 2018 | liv.ac.uk

I installed SGE on Centos 7 back in January this year. If my recollection is correct, the procedure was analogous to the instructions for Centos 6. There were some issues with the firewalld service (make sure that it is not blocking SGE), as well as some issues with SSL.
Check out these threads for reference:

http://arc.liv.ac.uk/pipermail/sge-discuss/2017-January/001050.html

Max

[Sep 07, 2018] Experiences with Sun Grid Engine

Notable quotes:
"... are important ..."
Sep 07, 2018 | auckland.ac.nz

Experiences with Sun Grid Engine

In October 2007 I updated the Sun Grid Engine installed here at the Department of Statistics and publicised its presence and how it can be used. We have a number of computation hosts (some using Māori fish names as fish are often fast) and a number of users who wish to use the computation power. Matching users to machines has always been somewhat problematic.

Fortunately for us, SGE automatically finds a machine to run compute jobs on . When you submit your job you can define certain characteristics, eg, the genetics people like to have at least 2GB of real free RAM per job, so SGE finds you a machine with that much free memory. All problems solved!

Let's find out how to submit jobs ! (The installation and administration section probably won't interest you much.)

I gave a talk on 19 February 2008 to the Department, giving a quick overview of the need for the grid and how to rearrange tasks to better make use of parallelism.

Installation

My installation isn't as polished as Werner's setup, but it comes with more carrots and sticks and informational emails to heavy users of computing resources.

For this very simple setup I first selected a master host, stat1. This is also the submit host. The documentation explains how to go about setting up a master host.

Installation for the master involved:

  1. Setting up a configuration file, based on the default configuration.
  2. Uncompressing the common and architecture-specific binaries into /opt/sge
  3. Running the installation. (Correcting mistakes, running again.)
  4. Success!

With the master setup I was ready to add compute hosts. This procedure was repeated for each host. (Thankfully a quick for loop in bash with an ssh command made this step very easy.)

  1. Login to the host
  2. Create /opt/sge .
  3. Uncompress the common and architecture-specific binaries into /opt/sge
  4. Copy across the cluster configuration from /opt/sge/default/common . (I'm not so sure on this step, but I get strange errors if I don't do this.)
  5. Add the host to the cluster. (Run qhost on the master.)
  6. Run the installation, using the configuration file from step 1 of the master. (Correcting mistakes, running again. Mistakes are hidden in /tmp/install_execd.* until the installation finishes. There's a problem where if /opt/sge/default/common/install_logs is not writeable by the user running the installation then it will be silently failing and retrying in the background. Installation is pretty much instantaneous, unless it's failing silently.)
    • As a sub-note, you receive architecture errors on Fedora Core. You can fix this by editing /opt/sge/util/arch and changing line 248 that reads 3|4|5) to 3|4|5|6) .
  7. Success!

If you are now to run qhost on some host, eg, the master, you will now see all your hosts sitting waiting for instructions.

Administration

The fastest way to check if the Grid is working is to run qhost , which lists all the hosts in the Grid and their status. If you're seeing hyphens it means that host has disappeared. Is the daemon stopped, or has someone killed the machine?

The glossiest way to keep things up to date is to use qmon . I have it listed as an application in X11.app on my Mac. The application command is as follows. Change 'master' to the hostname of the Grid master. I hope you have SSH keys already setup.

ssh master -Y . /opt/sge/default/common/settings.sh \; qmon

Want to gloat about how many CPUs you have in your cluster? (Does not work with machines that have > 100 CPU cores.)

admin@master:~$ qhost | sed -e 's/^.\{35\}[^0-9]\+//' | cut -d" " -f1

Adding Administrators

SGE will probably run under a user you created known as "sgeadmin". "root" does not automatically become all powerful in the Grid's eyes, so you probably want to add your usual user account as a Manager or Operator. (Have a look in the manual for how to do this.) It will make your life a lot easier.
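For example (a sketch; "alice" is a placeholder account name):

qconf -am alice    # add alice as a manager
qconf -ao alice    # add alice as an operator
qconf -sm          # list current managers
qconf -so          # list current operators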

Automatically sourcing environment

Normally you have to manually source the environment variables, eg, SGE_ROOT, that make things work. On your submit hosts you can have this setup to be done automatically for you.

Create links from /etc/profile.d to the settings files in /opt/sge/default/common and they'll be automatically sourced for bash and tcsh (at least on Redhat).
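For example (paths assume $SGE_ROOT=/opt/sge and the default cell):

ln -s /opt/sge/default/common/settings.sh  /etc/profile.d/sge.sh
ln -s /opt/sge/default/common/settings.csh /etc/profile.d/sge.csh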

Slots

The fastest processing you'll do is when you have one CPU core working on one problem. This is how the Grid is setup by default. Each CPU core on the Grid is a slot into which a job can be put.

If you have people logging on to the machines and checking their email, or being naughty and running jobs by hand instead of via the Grid engine, these calculations get mucked up. Yes, there still is a slot there, but it is competing with something being run locally. The Grid finds a machine with a free slot and the lowest load for when it runs your job so this won't be a problem until the Grid is heavily laden.

Setting up queues

Queues are useful for doing crude prioritisation. Typically a job gets put in the default queue and when a slot becomes free it runs.

If the user has access to more than one queue, and there is a free slot in that queue, then the job gets bumped into that slot.

A queue instance is a queue on a particular host that it can run on. 10 hosts, 3 queues = 30 queue instances. In the example below you can see three queues and seven queue instances: all.q@paikea, dnetc.q@paikea, beagle.q@paikea, all.q@exec1, dnetc.q@exec1, all.q@exec2, dnetc.q@exec2. Each queue can have a list of machines it runs on so, for example, the heavy genetics work in beagle.q can be run only on the machines attached to the SAN holding the genetics data. (A queue does not have to include all hosts, ie, @allhosts.)

Diagram to explain hosts, queues, and slots

From this diagram you can see how CPUs can become oversubscribed. all.q covers every CPU. dnetc.q covers some of those CPUs a second time. Uh-oh! (dnetc.q is setup to use one slot per queue instance. That means that even if there are 10 CPUs on a given host, it will only use 1 of those.) This is something to consider when setting up queues and giving users access to them. Users can't put jobs into queues they don't have access to, so the only people causing contention are those with access to multiple queues but don't specify a queue ( -q ) when submitting.

Another use for queues is subordinate queues. I run low priority jobs in dnetc.q. When the main queue gets busy, all the jobs in dnetc.q are suspended until the main queue's load decreases. To do this I edited all.q, and under Subordinates added dnetc.q.
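The command-line equivalent of that qmon change is a one-line edit of the queue definition (shown here as a sketch):

qconf -mq all.q
...
subordinate_list      dnetc.q
...

With this setting dnetc.q is suspended on a host whenever all.q fills all of its slots on that host.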

So far the shortest queue I've managed to make is one that uses 1 slot on each host it is allowed to run on. There is some talk in the documentation regarding user defined resources ( complexes ) which, much like licenses, can be "consumed" by jobs, thus limiting the number of concurrent jobs that can be run. (This may be useful for running an instance of Folding@Home, as it is not thread-safe , so you can set it up with a single "license".)

You can also change the default nice value of processes, but possibly the most useful setting is to turn on "rerunnable", which allows a task to be killed and run again on a different host.

Parallel Environment

Something that works better than queues and slots is to set up a parallel environment . This can have a limited number of slots which counts over the entire grid and over every queue instance. As an example, Folding@Home is not thread safe. Each running thread needs its own work directory.

How can you avoid contention in this case? Make each working directory a parallel environment, and limit the number of slots to 1.

I have four working directories named fah-a to fah-d . Each contains its own installation of the Folding@Home client:

$ ls ~/grid/fah-a/
fah-a
client.cfg
FAH504-Linux.exe
work

For each of these directories I have created a parallel environment:

admin@master:~$ qconf -sp fah-a
pe_name           fah-a
slots             1
user_lists        fah

These parallel environments are made available to all queues that the job can be run in and all users that have access to the working directory - which is just me.

The script to run the client is a marvel of grid arguments. It requests the parallel environment, bills the job to the Folding@Home project, names the project, etc. See for yourself:

#!/bin/sh
# use bash
#$ -S /bin/sh
# current directory
#$ -cwd
# merge output
#$ -j y
# mail at end
#$ -m e
# project
#$ -P fah
# name in queue
#$ -N fah-a
# parallel environment
#$ -pe fah-a 1
./FAH504-Linux.exe -oneunit

Note the -pe argument that says this job requires one slot worth of fah-a please.

Not a grid option, but the -oneunit flag for the folding client is important as this causes the job to quit after one work unit and the next work unit can be shuffled around to an appropriate host with a low load whose queue isn't disabled. Otherwise the client could end up running in a disabled queue for a month without nearing an end.

With the grid taking care of the parallel environment I no longer need to worry about manually setting up job holds so that I can enqueue multiple units for the same work directory. -t 1-20 ahoy!

Complex Configuration

An alternative to the parallel environment is to use a Complex. You create a new complex, say how many slots are available, and then let people consume them!

  1. In the QMON Complex Configuration, add a complex called "fah_l", type INT, relation <=, requestable YES, consumable YES, default 0. Add, then Commit.
  2. I can't manage to get this through QMON, so I do it from the command line. qconf -me global and then add fah_l=1 to the complex_values.
  3. Again through the command line. qconf -mq all.q and then add fah_l=1 to the complex_values. Change this value for the other queues. (Note that a value of 0 means jobs requesting this complex cannot be run in this queue.)
  4. When starting a job, add -l fah_l=1 to the requirements.

I had a problem to start off with, where qstat was telling me that -25 licenses were available. However this is due to the default value, so make sure that is 0!

Using Complexes I have set up license handling for Matlab and Splus .

As one host group does not have Splus installed, I simply set that host group to have 0 Splus licenses available. A license will never be available in the @gradroom host group, thus Splus jobs will never be queued there.
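One possible way to express this (a sketch, using the same bracketed per-host override syntax shown elsewhere on this page; not necessarily how the original setup did it) is inside the queue definition edited with qconf -mq all.q:

complex_values        splus=1,[@gradroom=splus=0]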

Quotas

Instead of Complexes and parallel environments, you could try a quota!

Please excuse the short details:

admin@master$ qconf -srqsl
admin@master$ qconf -mrqs lm2007_slots
{
   name         lm2007_slots
   description  Limit the lm2007 project to 20 slots across the grid
   enabled      TRUE
   limit        projects lm2007 to slots=20
}

Pending jobs

Want to know why a job isn't running?

  1. Job Control
  2. Pending Jobs
  3. Select a job
  4. Why ?

This is the same as qstat -f , shown at the bottom of this page.

Using Calendars

A calendar is a list of days and times along with states: off or suspended. Unless specified the state is on.

A queue, or even a single queue instance, can have a calendar attached to it. When the calendar says that the queue should now be "off" then the queue enters the disabled (D) state. Running jobs can continue, but no new jobs are started. If the calendar says it should be suspended then the queue enters the suspended (S) state and all currently running jobs are stopped (SIGSTOP).

First, create the calendar. We have an upgrade for paikea scheduled for 17 January:

admin@master$ qconf -scal paikeaupgrade
calendar_name    paikeaupgrade
year             17.1.2008=off
week             NONE

By the time we get around to opening up paikea's case and pulling out the memory, jobs will have had several hours to complete after the queue is disabled. Now we have to apply this calendar to every queue instance on this host. You can do this all through qmon, but I'm doing it from the command line because I can. Simply edit the calendar line to append the hostname and calendar name:

admin@master$ qconf -mq all.q
...
calendar              NONE,[paikea=paikeaupgrade]
...

Repeat this for all the queues.

There is a user who likes to use one particular machine and doesn't like jobs running while he's at the console. Looking at the usage graphs I've found out when he is using the machine and created a calendar based on this:

admin@master$ qconf -scal michael
calendar_name    michael
year             NONE
week             mon-sat=13-21=off

This calendar is obviously recurring weekly. As in the above example it was applied to queues on his machine. Note that the end time is 21, which covers the period from 2100 to 2159.

Suspending jobs automatically

Due to the number of slots being equal to the number of processors, system load is theoretically not going to exceed 1.00 (when divided by the number of processors). This value can be found in the np_load_* complexes .

But (and this is a big butt) there are a number of ways in which the load could go past a reasonable level:

For example, with paikea , there are three queues:

  1. all.q (4 slots)
  2. paikea.q (4 slots)
  3. beagle.q (overlapping with the other two queues)

all.q is filled first, then paikea.q. beagle.q, by project and owner restrictions, is only available to the sponsor of the hardware. When their jobs come in, they can get put into beagle.q, even if the other slots are full. When the load average comes up, other tasks get suspended: first in paikea.q, then in all.q.

Let's see the configuration:

qname                 beagle.q
hostlist              paikea.stat.auckland.ac.nz
priority              19,[paikea.stat.auckland.ac.nz=15]
user_lists            beagle
projects              beagle

We have the limited access to this queue through both user lists and projects. Also, we're setting the Unix process priority to be higher than the other queues.

qname                 paikea.q
hostlist              paikea.stat.auckland.ac.nz
suspend_thresholds    NONE,[paikea.stat.auckland.ac.nz=np_load_short=1.01]
nsuspend              1
suspend_interval      00:05:00
slots                 0,[paikea.stat.auckland.ac.nz=4]

The magic here being that suspend_thresholds is set to 1.01 for np_load_short. This is checked every 5 minutes, and 1 process is suspended at a time. This value can be adjusted to get what you want, but it seems to be doing the trick according to graphs and monitoring the load. np_load_short is chosen because it updates the most frequently (every minute), more than np_load_medium (every five), and np_load_long (every fifteen minutes).

all.q is fairly unremarkable. It just defines four slots on paikea.

Submitting jobs

Jobs are submitted to the Grid using qsub. Jobs are shell scripts containing commands to be run.

If you would normally run your job by typing ./runjob , you can submit it to the Grid and have it run by typing: qsub -cwd ./runjob

Jobs can be submitted while logged on to any submit host: sge-submit.stat.auckland.ac.nz .

For all the commands on this page I'm going to assume the settings are all loaded and you are logged in to a submit host. If you've logged in to a submit host then they'll have been sourced for you. You can source the settings yourself if required: . /opt/sge/default/common/settings.sh - the dot and space at the front are important .

Depending on the form your job is currently in they can be very easy to submit. I'm just going to go ahead and assume you have a shell script that runs the CPU-intensive computations you want and spits them out to the screen. For example, this tiny test.sh :

#!/bin/sh
expr 3 + 5

This computation is very CPU intensive!

Please note that the Sun Grid Engine ignores the bang path at the top of the script and will simply run the file using the queue's default shell which is csh. If you want bash, then request it by adding the very cryptic line: #$ -S /bin/sh

Now, let's submit it to the grid for running:

user@submit:~$ qsub test.sh
Your job 464 ("test.sh") has been submitted
user@submit:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                slots ja-task-ID 
-------------------------------------------------------------------------------------------------------
    464 0.00000 test.sh    user         qw    01/10/2008 10:48:03                          1

There goes our job, waiting in the queue to be run. We can run qstat a few more times to see it as it goes. It'll be run on some host somewhere, then disappear from the list once it is completed. You can find the output by looking in your home directory:

user@submit:~$ ls test.sh*
test.sh  test.sh.e464  test.sh.o464
user@submit:~$ cat test.sh.o464
8

The output file is named based on the name of the job, the letter o , and the number of the job.

If your job had problems running have a look in these files. They probably explain what went wrong.

Easiest way to submit R jobs

Here are two scripts and a symlink I created to make it easy as possible to submit R jobs to your Grid:

qsub-R

If you normally do something along the lines of:

user@exec:~$ nohup nice R CMD BATCH toodles.R

Now all you need to do is:

user@submit:~$ qsub-R toodles.R
Your job 3540 ("toodles.R") has been submitted

qsub-R is linked to submit-R, a script I wrote. It calls qsub and submits a simple shell wrapper with the R file as an argument. It ends up in the queue and eventually your output arrives in the current directory: toodles.R.o3540

Download it and install it. You'll need to make the ' qsub-R ' symlink to ' 3rd_party/uoa-dos/submit-R ' yourself, although there is one in the package already for lx24-x86: qsub-R.tar (10 KiB, tar)

Thrashing the Grid

Sometimes you just want to give something a good thrashing, right? Never experienced that? Maybe it's just me. Anyway, here are two ideas for submitting lots and lots of jobs:

  1. Write a script that creates jobs and submits them
  2. Submit the same thing a thousand times

There are merits to each of these methods, and both of them mimic typical operation of the grid, so I'm going to explain them both.

Computing every permutation

If you have two lists of values and wish to calculate every permutation, then this method will do the trick. There's a more complicated solution below .

qsub will happily pass on arguments you supply to the script when it runs. Let us modify our test.sh to take advantage of this:

#!/bin/sh
#$ -S /bin/sh
echo Factors $1 and $2
expr $1 + $2

Now, we just need to submit every permutation to Grid:

user@submit:~$ for A in 1 2 3 4 5 ; do for B in 1 2 3 4 5 ; do qsub test.sh $A $B ; done ; done

Away the jobs go to be computed. If we have a look at different jobs we can see that it works. For example, job 487 comes up with:

user@submit:~$ cat test.sh.?487
Factors 3 and 5
8

Right on, brother! That's the same answer as we got previously when we hard coded the values of 3 and 5 into the file. We have algorithm correctness!

If we use qacct to look up the job information we find that it was computed on host mako (shark) and used 1 units of wallclock and 0 units of CPU.

Computing every permutation, with R

This method of creating job scripts and running them will allow you to compute every permutation of two variables. Note that you can supply arguments to your script, so it is not actually necessary to over-engineer your solution quite this much. This script has the added advantage of not clobbering previous computations. I wrote this solution for Yannan Jiang and Chris Wild and posted it to the r-downunder mailing list in December 2007. ( There is another method of doing this! )

In this particular example the output of the R command is deterministic, so it does not matter that a previous run (which could have taken days of computing time) gets overwritten, however I also work around this problem.

To start with I have my simple template of R commands (template.R):

alpha <- ALPHA
beta <- c(BETA)
# magic happens here
alpha
beta

The ALPHA and BETA parameters change for each time this simulation is run. I have these values stored, one per line, in the files ALPHA and BETA.

ALPHA:

0.9
0.8
0.7

BETA (please note that these contents must work both in filenames, bash commands, and R commands):

0,0,1
0,1,0
1,0,0

I have a shell script that takes each combination of ALPHA x BETA, creates a .R file based on the template, and submits the job to the Grid. This is called submit.sh:

#!/bin/sh

if [ "X${SGE_ROOT}" == "X" ] ; then
         echo Run: . /opt/sge/default/common/settings.sh
         exit
fi

cat ALPHA | while read ALPHA ; do
         cat BETA | while read BETA ; do
                 FILE="t-${ALPHA}-${BETA}"

                 # create our R file
                 cat template.R | sed -e "s/ALPHA/${ALPHA}/" -e "s/BETA/${BETA}/" > ${FILE}.R

                 # create a script
                 echo \#!/bin/sh > ${FILE}.sh
                 echo \#$ -S /bin/sh >> ${FILE}.sh
                 echo "if [ -f ${FILE}.Rout ] ; then echo ERROR: output file exists already ; exit 5 ; fi" >> ${FILE}.sh
                 echo R CMD BATCH ${FILE}.R ${FILE}.Rout >> ${FILE}.sh
                 chmod +x ${FILE}.sh

                 # submit job to grid
                 qsub -j y -cwd ${FILE}.sh
         done
done

qstat

When this script runs it will, for each permutation of ALPHA and BETA,

  1. create an R file based on the template, filling in the values of ALPHA and BETA,
  2. create a script that checks if this permutation has been calculated and then calls R,
  3. submits this job to the queue

... and finally shows the jobs waiting in the queue to execute.

Once computation is complete you will have a lot of files waiting in your directory. You will have:

The output files, stderr and stdout from when R was run, are always empty (unless something goes terribly wrong). For each permutation we receive four files. There are nine permutations (n ALPHA = 3, n BETA = 3, 3 × 3 = 9). A total of 36 files are created. (This example has been pared down from the original for purposes of demonstration.)

My initial question to the r-downunder list was how to get the output from R to stdout and thus t-ALPHA-BETA.sh.oNNN instead of t-ALPHA-BETA.Rout, however in this particular case, I have dodged that. In fact, being deterministic it is better that this job writes its output to a known filename, so I can do a one line test to see if the job has already been run.

I should also point out the -cwd option to the qsub command, which causes the job to be run in the current directory (which if it is in your home directory is accessible in the same place on all machines), rather than in /tmp/* . This allows us to find the R output, since R writes it to the directory it is currently in. Otherwise it could be discarded as a temporary file once the job ends!

Submit the same thing a thousand times

Say you have a job that, for example, pulls in random numbers and runs a simulation, or it grabs a work unit from a server, computes it, then quits. ( FAH -oneunit springs to mind, although it cannot be run in parallel. Refer to the parallel environment setup .) The script is identical every time.

SGE sets the JOB_ID environment variable, which tells you the job number. You can use this as a crude method of generating a unique file name for your output. However, the best way is to write everything to standard output (stdout) and let the Grid take care of returning it to you.

There are also Array Jobs, which are identical tasks differentiated only by an index number, available through the -t option of qsub. This sets the environment variable SGE_TASK_ID.
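A minimal sketch combining both variables (the simulate binary and the run-sim.sh file name are made up; each task writes to its own output file):

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
./simulate > result.${JOB_ID}.${SGE_TASK_ID}

Submitted with qsub -t 1-100 run-sim.sh this produces result.<job>.1 through result.<job>.100 in the current directory.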

For this example I will be using the Distributed Sleep Server . The Distributed Sleep Project passes out work units, packages of time, to clients who then process the unit. The Distributed Sleep Client, dsleepc , connects to the server to fetch a work unit. They can then be processed using the sleep command. A sample script:

#!/bin/sh
#$ -S /bin/sh
WORKUNIT=`dsleepc`
sleep $WORKUNIT && echo Processed $WORKUNIT seconds

Work units of 300 seconds typically take about five minutes to complete, but are known to be slower on Windows. (The more adventurous can add the -bigunit option to get a larger package for themselves, but note that they take longer to process.)

So, let us submit an array job to the Grid. We are going to submit one job with 100 tasks, and they will be numbered 1 to 100:

user@submit:~$ qsub -t 1-100 dsleep
Your job-array 490.1-100:1 ("dsleep") has been submitted

Job 490, tasks 1 to 100, are waiting to run. Later we can come back and pick up our output from our home directory. You can also visit the Distributed Sleep Project and check the statistics server to see if your work units have been received.

Note that running 100 jobs will fill the default queue, all.q. This has two effects. First, if you have any other queues that you can access jobs will be added to those queues and then run. (As the current setup of queues overlaps with CPUs this can lead to over subscription of processing resources. This can cause jobs to be paused, depending on how the queue is setup.) Second, any subordinate queues to all.q will be put on hold until the jobs get freed up.

Array jobs, with R

Using the above method of submitting multiple jobs, we can access this and use it in our R script, as follows:

# alpha+1 is found in the SGE TASK number (qsub -t)
alphaenv <- Sys.getenv("SGE_TASK_ID")
alpha <- (as.numeric(alphaenv)-1)

Here the value of alpha is being pulled from the task number. Some manipulation is done of it, first to turn it from a string into a number, and secondly to change it into the expected form. Task numbers run from 1+, but in this case the code wants them to run from 0+.

Similar can be done with Java, by adding the environment value as an argument to invocation of the main class.

Advanced methods of queue submission

When you submit your job you have a lot of flexibility over it. Here are some options to consider that may make your life easier. Remember you can always look in the man page for qsub for more options and explanations.

qsub -N timmy test.sh

Here the job is called "timmy" and runs the script test.sh. The output files will be in timmy.[oe]*

The working directory is usually somewhere in /tmp on the execution host. To use a different working directory, eg, the current directory, use -cwd

qsub -cwd test.sh

To request specific characteristics of the execution host, for example, sufficient memory, use the -l argument.

qsub -l mem_free=2500M test.sh

The above example requests 2500 megabytes (M = 1024x1024, m = 1000x1000) of free physical memory (mem_free) on the remote host. This means it won't be run on a machine that has 2.0GB of memory, and will instead be put onto a machine with sufficient amounts of memory for BEAGLE Genetic Analysis. There are two other options for ensuring you get enough memory:

If your binary is architecture dependent you can ask for a particular architecture.

qsub -l arch=lx24-amd64 test.bin

This can also be done in the script that calls the binary so you don't accidentally forget about including it.

#$ -l arch=lx24-amd64

This requesting of resources can also be used to ask for a specific host, which goes against the idea of using the Grid to alleviate finding a host to use! Don't do this!

qsub -l hostname=mako test.sh

If your job needs to be run multiple times then you can create an array job. You ask for a job to be run several times, and each run (or task) is given a unique task number which can be accessed through the environment variable SGE_TASK_ID. In each of these examples the script is run 50 times:

qsub -t 1-50 test.sh
qsub -t 75-125 test.sh

You can request a specific queue. Different queues have different characteristics.

qsub -q dnetc.q test.sh

A job can be held until a previous job completes. For example, this job will not run until job 380 completes:

qsub -hold_jid 380 test.sh

Can't figure out why your job isn't running? qstat can tell you:

qstat -j 490
... lots of output ...
scheduling info:            queue instance "[email protected]" dropped because it is temporarily not available
                            queue instance "[email protected]" dropped because it is full
                            cannot run in queue "all.q" because it is not contained in its hard queue list (-q)

Requesting licenses

Should you be using software that requires licenses then you should specify this when you submit the job. We have two licenses currently set up, Splus and Matlab, but can easily add more as requested.

The Grid engine will hold your job until a Splus license or Matlab license becomes available.

Note: The Grid engine keeps track of the license pool independently of the license manager. If someone is using a license that the Grid doesn't know about, eg, an interactive session you left running on your desktop, then the count will be off. Believing a license is available, the Grid will run your job, but Splus will not run and your job will end. Here is a job script that will detect this error and then allow your job to be retried later:

#!/bin/sh
#$ -S /bin/bash
# run in current directory, merge output
#$ -cwd -j y
# name the job
#$ -N Splus-lic
# require a single Splus license please
#$ -l splus=1
Splus -headless < $1
RETVAL=$?
if [ $RETVAL == 1 ] ; then
        echo No license for Splus
        sleep 60
        exit 99
fi
if [ $RETVAL == 127 ] ; then
        echo Splus not installed on this host
        # you could try something like this:
        #qalter -l splus=1,h=!`hostname` $JOB_ID
        sleep 60
        exit 99
fi
exit $RETVAL

Please note that the script exits with code 99 to tell the Grid to reschedule this job (or task) later. Note also that the script, upon receiving the error, sleeps for a minute before exiting, thus slowing the loop of errors as the Grid continually reschedules the job until it runs successfully. Alternatively you can exit with error 100, which will cause the job to be held in the error (E) state until manually cleared to run again.

You can clear a job's error state by using qmod -c jobid .

Here's the same thing for Matlab. Only minor differences from running Splus:

#!/bin/sh
#$ -S /bin/sh
# run in current directory, merge output
#$ -cwd -j y
# name the job
#$ -N ml
# require a single Matlab license please
#$ -l matlab=1

matlab -nodisplay < $1

RETVAL=$?
if [ $RETVAL == 1 ] ; then
        echo No license for Matlab
        sleep 60
        exit 99
fi
if [ $RETVAL == 127 ] ; then
        echo Matlab not installed on this host, `hostname`
        # you could try something like this:
        #qalter -l matlab=1,h=!`hostname` $JOB_ID
        sleep 60
        exit 99
fi
exit $RETVAL

Save this as "run-matlab". To run your matlab.m file, submit with: qsub run-matlab matlab.m

Processing partial parts of input files in Java

Here is some code I wrote for Lyndon Walker to process a partial dataset in Java.

It comes with two parts: a job script that passes the correct arguments to Java, and some Java code that extracts the correct information from the dataset for processing.

First, the job script gives some Grid task environment variables to Java. Our job script is merely translating from the Grid to the simulation:

java Simulation $@ $SGE_TASK_ID $SGE_TASK_LAST

This does assume your shell is bash, not csh. If your job is in 10 tasks, then SGE_TASK_ID will be a number between 1 and 10, and SGE_TASK_LAST will be 10. I'm also assuming that you are starting your jobs from 1, but you can also change that setting and examine SGE_TASK_FIRST.

Within Java we now read these variables and act upon them:

sge_task_id   = Integer.parseInt(args[args.length-2]);
sge_task_last = Integer.parseInt(args[args.length-1]);

For a more complete code listing, refer to sun-grid-qsub-java-partial.java (Simulation.java).

Preparing confidential datasets

The Grid setup here includes machines on which users can login. That creates the problem where someone might be able to snag a confidential dataset that is undergoing processing. One particular way to keep the files secure is as follows:

A script that does this would look like the following:

#!/bin/sh
#$ -S /bin/sh

DATASET=confidential.csv

# check our environment
umask 0077
cd ${TMPDIR}
chmod 0700 .

# find srm
SRM=`which srm`
NOSRM=$?
if [ $NOSRM -eq 1 ] ; then
        echo system srm not found on this host, exiting >> /dev/stderr 
        exit 99
fi

# copy files from data store
RETRIES=0
while [ ${RETRIES} -lt 5 ] ; do
        ((RETRIES++))
        scp user@filestore:/store/confidential/${DATASET} .
        if [ $? -eq 0 ] ; then
                RETRIES=5000
        else
                # wait for up to a minute (MaxStartups 10 by default)
                sleep `expr ${RANDOM} / 542`
        fi
done
if [ ! -f ${DATASET} ] ; then
        # unable to copy dataset after 5 retries, quit but retry later
        echo unable to copy dataset from store >> /dev/stderr
        exit 99
fi
# if you were decrypting the dataset, you would do that here

# copy our code over too
cp /mount/code/*.class .

# process data
java Simulation ${DATASET}

# collect results
# (We are just printing to the screen.)

# clean up
${SRM} -v ${DATASET} >> /dev/stderr

echo END >> /dev/stderr

Code will need to be adjusted to match your particular requirements, but the basic form is sketched out above.

Because the confidential data is only in files and directories that root and the running user can access, and the same precaution is taken on the datastore, only the system administrator and the user who owns the dataset have access to these files.

The one problem here is how to manage the password-less scp securely. As this runs unattended, it is not possible to put a passphrase on the key, nor to forward authentication to some local agent, and someone with sufficient access to the node could potentially copy the key material. There must be a better way to do this. Remember that the job script is stored world-readable in the Grid cell's spool, so nothing secret can be put in there either.
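
One partial mitigation, sketched here only as an assumption about what the file store's OpenSSH configuration would permit, is a dedicated passphrase-less key whose authorized_keys entry on the file store is locked down so it can do little besides serve that one file (this relies on the classic scp protocol, and the source subnet is hypothetical):

# on the file store, a single line in the data owner's ~/.ssh/authorized_keys:
from="192.168.56.0/24",command="scp -f /store/confidential/confidential.csv",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAA...key-material... gridjob-readonly

A key like this can still be copied off the node by root, but it can only be used to fetch that one dataset from the listed subnet, which narrows the damage.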

Talk at Department Retreat

I gave a talk about the Sun Grid Engine to the Department on 19 February 2008, giving a quick overview of the need for the grid and how to rearrange tasks to better make use of parallelism. It was aimed at end users and summarises into neat slides the reasons for using the grid engine, along with a tutorial and example on how to use it all.

Download: Talk (with notes) PDF 5.9MiB

Question time afterwards was very good. Here are, as I recall them, the questions and answers.

Which jobs are better suited to parallelism?

Q (Ross Ihaka): Which jobs are better suited to parallelism? (Jobs with large data sets do not lend themselves to this sort of parallelism due to I/O overheads.)

A: Most of the jobs being used here are CPU intensive. The grid copies your script to /tmp on the local machine on which it runs. You could copy your data file across as well at the start of the job, thus all your later I/O is local.

(This is a bit of a poor answer. I wasn't really expecting it.) Bayesian priors and multiple identical simulations (eg, MCMC differing only by random numbers) lend themselves well to being parallelised.

Can I make sure I always run on the fastest machine?

A: The grid finds the machine with the least load to run jobs on. If you pile all jobs onto one host, then that host will slow down and become the slowest overall. Submit it through the grid and some days you'll get the fast host, and some days you'll get the slow host, and it is better in the long run. Also it is fair for other users. You can force it with -l, however, it is selfish.

Preemptable queues?

Q (Nicholas Horton): Is there support for preemptable queues? A person who paid for a certain machine might like it to be available only to them when they require it all for themselves.

A: Yes, the Grid has support for queues like that. It can all be configured. This particular example will have to be looked into further. Beagle.q, as an example, only runs on paikea and overlaps with all.q. Also, when the load on paikea (again using that as an example) gets too high, jobs in a certain queue (dnetc.q) are stopped.

An updated answer: the owner of a host can have an exclusive queue that preempts the other queues on the host. When the system load is too high, less important jobs can be suspended using suspend_thresholds .

Is my desktop an execution host?

Q (Ross Ihaka): Did I see my desktop listed earlier?

A: No. So far the grid is only running on the servers in the basement and the desktops in the grad room. Desktops in staff offices and used by PhD candidates will have to opt in.

(Ross Ihaka) Offering your desktop to run as an execution host increases the total speed of the grid, but your desktop may run slower at times. It is a two way street.

Is there job migration?

A: It's crude, and depends on your job. If something goes wrong (eg, the server crashes, or the power goes out) your job can be restarted on another host. When queue instances become unavailable (eg, we're upgrading paikea) they can send a signal to your job, telling it to save its work and quit, and it can then be restarted on another host.

Migration to faster hosts

Q (Chris Wild): What happens if a faster host becomes available while my job is running?

A: Nothing. Your job will continue running on the host it is on until it ends. If a host is overloaded, and not due to the grid's fault, some jobs can be suspended until load decreases . The grid isn't migrating jobs. The best method is to break your job down into smaller jobs, so that when the next part of the job is started it gets put onto what is currently the best available host.

Over sufficient jobs it will become apparent that the faster host is processing more jobs than a slower host.

Desktops and calendars

Q (Stephane Guindon): What about when I'm not at my desktop? Can I have my machine be on the grid then, and when I get to the desktop the jobs are migrated?

A: Yes, we can set up calendars so that at certain times no new jobs will be started on your machine. Jobs that are already running will continue until they end. (Disabling the queue.) Since some jobs run for days this can appear to have no influence on how many jobs are running. Alternatively jobs can be paused, which frees up the CPU, but leaves the job sitting almost in limbo. (Suspending the queue.) Remember the grid isn't doing migration. It can stop your job and run it elsewhere (if you're using the -notify option on submission and handling the USR1 signal).

Jobs under the grid

Q (Sharon Browning): How can I tell if a job is running under the grid's control? It doesn't show this under top .

A: Try ps auxf . You will see the job taking a lot of CPU time, the parent script, and above that the grid (sge_shepherd and sge_execd).

Talk for Department Seminar

On September 11 I gave another talk to the Department.

Download slides with extensive notes: Supercomputing and You (PDF 3MiB)

A range of good questions followed.

Summary

In summary, I heartily recommend the Sun Grid Engine. After a few days of installing, configuring, and messing around, I am very impressed with what can be done with it.

Try it today.

[Aug 17, 2018] Rocks 7.0 Manzanita (CentOS 7.4)

Aug 17, 2018 | www.rocksclusters.org

Operating System Base

Rocks 7.0 (Manzanita) x86_64 is based upon CentOS 7.4 with all updates available as of 1 Dec 2017.

Building a bare-bones compute cluster

Building a more complex cluster

In addition to the above, select the following rolls:

  • area51
  • fingerprint
  • ganglia
  • kvm (used for virtualization)
  • hpc
  • htcondor (used independently or in conjunction with sge)
  • perl
  • python
  • sge
  • zfs-linux (used to build reliable storage systems)
Building Custom Clusters

If you wish to build a custom cluster, you must choose from our a la carte selection, but make sure to download the required base, kernel and both CentOS rolls. The CentOS rolls include CentOS 7.4 w/updates pre-applied. Most users will want the full updated OS so that other software can be added.

MD5 Checksums

Please double check the MD5 checksums for all the rolls you download.

Downloads

All ISOs are available for download from here. Individual links are listed below.

  • kernel - Rocks Bootable Kernel Roll (required)
  • base - Rocks Base Roll (required)
  • core - Core Roll (required)
  • CentOS - CentOS Roll (required)
  • Updates-CentOS - CentOS Updates Roll (required)
  • kvm - Support for building KVM VMs on cluster nodes
  • ganglia - Cluster monitoring system from UCB
  • area51 - System security related services and utilities
  • zfs-linux - ZFS On Linux Roll. Build and manage multi-terabyte file systems.
  • fingerprint - Fingerprint application dependencies
  • hpc - Rocks HPC Roll
  • htcondor - HTCondor High Throughput Computing (version 8.2.8)
  • sge - Sun Grid Engine (Open Grid Scheduler) job queueing system
  • perl - Support for newer version of Perl
  • python - Python 2.7 and Python 3.x
  • openvswitch - Rocks integration of Open vSwitch

[Aug 17, 2018] Installation of Son of Grid Engine(SGE) on CentOS7 by byeon iksu

Oct 15, 2017 | biohpc.blogspot.com

Installation of Son of Grid Engine(SGE) on CentOS7

SGE Master installation

master# hostnamectl set-hostname qmaster.local

master# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

master# mkdir -p /BiO/src
master# yum -y install epel-release
master# yum -y install jemalloc-devel openssl-devel ncurses-devel pam-devel libXmu-devel hwloc-devel hwloc hwloc-libs java-devel javacc ant-junit libdb-devel motif-devel csh ksh xterm db4-utils perl-XML-Simple perl-Env xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi
master# groupadd -g 490 sgeadmin
master# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
master# visudo
%sgeadmin ALL=(ALL) NOPASSWD: ALL
master# cd /BiO/src
master# wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
master# tar zxvfp sge-8.1.9.tar.gz
master# cd sge-8.1.9/source/
master# sh scripts/bootstrap.sh && ./aimk && ./aimk -man
master# export SGE_ROOT=/BiO/gridengine && mkdir $SGE_ROOT
master# echo Y | ./scripts/distinst -local -allall -libs -noexit
master# chown -R sgeadmin.sgeadmin /BiO/gridengine

master# cd $SGE_ROOT
master# ./install_qmaster
press enter at the intro screen
press "y" and then specify sgeadmin as the user id
leave the install dir as /BiO/gridengine
You will now be asked about port configuration for the master, normally you would choose the default (2) which uses the /etc/services file
accept the sge_qmaster info
You will now be asked about port configuration for the execution daemon; normally you would choose the default (2), which uses the /etc/services file
accept the sge_execd info
leave the cell name as "default"
Enter an appropriate cluster name when requested
leave the spool dir as is
press "n" for no windows hosts!
press "y" (permissions are set correctly)
press "y" for all hosts in one domain
If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point!
press enter to accept the directory creation notification
enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
press enter to accept the next notice
enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
accept the default spool dir or specify a different folder (for example if you wish to use a shared or local folder outside of SGE_ROOT)
enter an email address that will be sent problem reports
press "n" to refuse to change the parameters you have just configured
press enter to accept the next notice
press "y" to install the startup scripts
press enter twice to confirm the following messages
press "n" for a file with a list of hosts
enter the names of your hosts who will be able to administer and submit jobs (enter alone to finish adding hosts)
skip shadow hosts for now (press "n")
choose "1" for normal configuration and agree with "y"
press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer

master# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
master# qconf -ah compute01.local
compute01.local added to administrative host list

master# yum -y install nfs-utils
master# vi /etc/exports
/BiO 192.168.56.0/24(rw,no_root_squash)

master# systemctl start rpcbind nfs-server
master# systemctl enable rpcbind nfs-server

SGE Client installation

compute01# yum -y install hwloc-devel
compute01# hostnamectl set-hostname compute01.local
compute01# vi /etc/hosts
192.168.56.101 qmaster.local qmaster
192.168.56.102 compute01.local compute01

compute01# groupadd -g 490 sgeadmin
compute01# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin

compute01# yum -y install nfs-utils
compute01# systemctl start rpcbind
compute01# systemctl enable rpcbind
compute01# mkdir /BiO
compute01# mount -t nfs 192.168.56.101:/BiO /BiO
compute01# vi /etc/fstab
192.168.56.101:/BiO /BiO nfs defaults 0 0

compute01# export SGE_ROOT=/BiO/gridengine
compute01# export SGE_CELL=default
compute01# cd $SGE_ROOT
compute01# ./install_execd
compute01# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
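
After install_execd finishes, a quick sanity check along these lines confirms that the new execution host has registered and can run jobs (a minimal sketch; it assumes compute01 was also made a submit host, e.g. with qconf -as compute01.local on the master, and that a default queue instance was added during install_execd):

compute01# source /etc/profile.d/settings.sh
compute01# qhost                          # compute01.local should show load and memory values
compute01# echo 'hostname; date' | qsub -cwd -N hello
compute01# qstat                          # the job should move from qw to r and then finish
compute01# cat hello.o*                   # output lands in the submission directory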

[Apr 25, 2018] GridEngine cannot be installed on CentOS7

Apr 25, 2018 | github.com

nicoulaj commented on Dec 1 2016

FYI, I got a working version with SGE on CentOS 7 on my linked branch.

This is quick and dirty because I need it working right now; there are several issues.

[Apr 24, 2018] SGE Installation on Centos 7

Apr 24, 2018 | liv.ac.uk

From: JuanEsteban.Jimenez at mdc-berlin.de [mailto: JuanEsteban.Jimenez at mdc-berlin.de ]
Sent: 27 April 2017 03:54 PM
To: yasir at orionsolutions.co.in ; 'Maximilian Friedersdorff'; sge-discuss at liverpool.ac.uk
Subject: Re: [SGE-discuss] SGE Installation on Centos 7

I am running SGE on nodes with both 7.1 and 7.3. Works fine on both.

Just make sure that if you are using Active Directory/Kerberos for authentication and authorization, your DC's are capable of handling a lot of traffic/requests. If not, things like DRMAA will uncover any shortcomings.

Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800

====================

I installed SGE on Centos 7 back in January this year. If my recollection is correct, the procedure was analogous to the instructions for Centos 6. There were some issues with the firewalld service (make sure that it is not blocking SGE), as well as some issues with SSL.
Check out these threads for reference:

http://arc.liv.ac.uk/pipermail/sge-discuss/2017-January/001047.html
http://arc.liv.ac.uk/pipermail/sge-discuss/2017-January/001050.html

[May 08, 2017] Sample SGE scripts

May 08, 2017 | ctbp.ucsd.edu
  1. An example of a simple APBS serial job.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N serial_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    
    /soft/linux/pkg/apbs/bin/apbs inputfile >& outputfile
    
    
  2. An example script for running the executable a.out in parallel on 8 CPUs. (Note: for your executable to run in parallel it must be compiled with a parallel library like MPICH, LAM/MPI, PVM, etc.) This script shows file staging, i.e., using the fast local filesystem /scratch on the compute node in order to eliminate speed bottlenecks.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N parallel_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 8
    # requesting 10hrs wall clock time
    #$ -l h_rt=10:00:00
    #
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    set orig_dir=`pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    # copy input and support files to a temporary directory on compute node
    set temp_dir=/scratch/`whoami`.$$
    mkdir $temp_dir
    cp input_file support_file $temp_dir
    cd $temp_dir
    
    /opt/mpich/intel/bin/mpirun -v -machinefile $TMPDIR/machines \
               -np $NSLOTS $HOME/a.out ./input_file >& output_file
    
    # copy files back and clean up
    cp * $orig_dir
    rm -rf $temp_dir
    
    
  3. An example of SGE script for Amber users (parallel run, 4 CPUs, with input file generated on the fly):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N amber_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 4
    # requesting 6hrs wall clock time
    #$ -l h_rt=6:00:00
    #
    setenv MPI_MAX_CLUSTER_SIZE 2
    
    # export all environment variables to SGE 
    #$ -V
    
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    set in=./mdin
    set out=./mdout
    set crd=./inpcrd.equil
    
    cat <<eof > $in
     short md, nve ensemble
     &cntrl
       ntx=7, irest=1,
       ntc=2, ntf=2, tol=0.0000001,
       nstlim=1000,
       ntpr=10, ntwr=10000,
       dt=0.001, vlimit=10.0,
       cut=9.,
       ntt=0, temp0=300.,
     &end
     &ewald
      a=62.23, b=62.23, c=62.23,
      nfft1=64,nfft2=64,nfft3=64,
      skinnb=2.,
     &end
    eof
    
    set sander=/soft/linux/pkg/amber8/exe.parallel/sander
    set mpirun=/opt/mpich/intel/bin/mpirun
    
    # needs prmtop and inpcrd.equil files
    
    $mpirun -v -machinefile $TMPDIR/machines -np $NSLOTS \
       $sander -O -i $in -c $crd -o $out < /dev/null
    
    /bin/rm -f $in restrt
    
    

    Please note that if you are running parallel amber8 you must include the following in your .cshrc :
    # Set P4_GLOBMEMSIZE environment variable used to reserve memory in bytes
    # for communication with shared memory on dual nodes
    # (optimum/minimum size may need experimentation)
    setenv P4_GLOBMEMSIZE 32000000
    
  4. An example of an SGE script for an APBS job (parallel run, 8 CPUs), running the example input file included in the APBS distribution (/soft/linux/src/apbs-0.3.1/examples/actin-dimer):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N apbs-PARALLEL
    #$ -e apbs-PARALLEL.errout
    #$ -o apbs-PARALLEL.errout
    #
    # requesting 8 processors
    #$ -pe mpi 8
    
    echo -n "Running on: "
    hostname
    
    setenv APBSBIN_PARALLEL /soft/linux/pkg/apbs/bin/apbs-icc-parallel
    setenv MPIRUN /opt/mpich/intel/bin/mpirun
    
    echo "Starting apbs-PARALLEL calculation ..."  
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np 8 \
        $APBSBIN_PARALLEL apbs-PARALLEL.in >& apbs-PARALLEL.out
    
    echo "Done."
    
    
  5. An example of SGE script for parallel CHARMM job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N charmm-test
    #$ -e charmm-test.errout
    #$ -o charmm-test.errout
    #
    # requesting 4 processors
    #$ -pe mpi 4
    # requesting 2hrs wall clock time
    #$ -l h_rt=2:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    setenv CHARMM /soft/linux/pkg/c31a1/bin/charmm.parallel.092204
    setenv MPIRUN /soft/linux/pkg/mpich-1.2.6/intel/bin/mpirun
    
    echo "Starting CHARMM calculation (using $NSLOTS processors)"
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
        $CHARMM < mbcodyn.inp > mbcodyn.out
    
    echo "Done."
    
    
  6. An example of SGE script for parallel NAMD job (8 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N namd-job
    #$ -e namd-job.errout
    #$ -o namd-job.out
    #
    # requesting 8 processors
    #$ -pe mpi 8
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    /soft/linux/pkg/NAMD/namd2.sh namd_input_file > namd2.log
    
    echo "Done."
    
    
  7. An example of SGE script for parallel Gromacs job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N gromacs-job
    #$ -e gromacs-job.errout
    #$ -o gromacs-job.out
    #
    # requesting 4 processors
    #$ -pe mpich 4
    # requesting 8hrs wall clock time
    #$ -l h_rt=8:00:00
    #
    
    echo -n "Running on: "
    cat $TMPDIR/machines
    
    setenv MDRUN /soft/linux/pkg/gromacs/bin/mdrun-mpi
    setenv MPIRUN /soft/linux/pkg/mpich/intel/bin/mpirun
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
     $MDRUN -v -nice 0 -np $NSLOTS -s topol.tpr -o traj.trr \
      -c confout.gro -e ener.edr -g md.log
    
    echo "Done."
    

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status using the qstat command, the GUI interface QMON, or email notification.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster; some commonly useful invocations are shown below.

You can refer to the man pages for a complete description of all the options of the qstat command.
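
For example:

qstat                     # your pending and running jobs
qstat -u '*'              # jobs of all users
qstat -f                  # full listing, one line per queue instance
qstat -j <job_id>         # details for one job, including scheduling messages
qstat -f -explain E       # explain queue instances that are in an error state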

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent and the -M option to specify the address it should be sent to. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events for which you want to receive email: the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s), as in the sample script lines above.

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh

How do I control my jobs

Based on the status of the job displayed, you can control the job with the actions shown below.
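
Typical controls include:

qdel <job_id>             # delete (cancel) the job
qhold <job_id>            # hold a pending job back from being scheduled
qrls <job_id>             # release a held job
qmod -sj <job_id>         # suspend a running job
qmod -usj <job_id>        # unsuspend it
qalter <options> <job_id> # change the attributes of a pending job
qmod -cj <job_id>         # clear the job's error state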

Monitoring and controlling with QMON

You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs; the buttons are self-explanatory.


For further information, see the SGE User's Guide ( PDF, HTML).


[May 07, 2017] Why Won't My Job Run Correctly? ( aka How To Troubleshoot/Diagnose Problems)

May 07, 2017 | biowiki.org

Does your job show the "Eqw" or "qw" state when you run qstat, and just sit there refusing to run? Get more info on what's wrong with it using:

$ qstat -j <job number>

Does your job actually get dispatched and run (that is, qstat no longer shows it - because it was sent to an exec host, ran, and exited), but something else isn't working right? Get more info on what's wrong with it using:

$ qacct -j <job number> (especially see the lines "failed" and "exit_status")

If any of the above have an "access denied" message in them, it's probably a permissions problem. Your user account does not have the privileges to read from/write to where you told it (this often happens with the -e and -o options to qsub). So, check to make sure you do. Try, for example, to SSH into the node on which the job is trying to run (or just any node) and make sure that you can actually read from/write to the desired directories from there. While you're at it, just run the job manually from that node and see if it runs - maybe there's some library it needs that the particular node is missing.

To avoid permissions problems, cd into the directory on the NFS where you want your job to run, and submit from there using qsub -cwd to make sure it runs in that same directory on all the nodes.

Not a permissions problem? Well, maybe the nodes or the queues are unreachable. Check with:

qstat -f

or, for even more detail:

qstat -F

If the "state" column in qstat -f has a big E , that host or queue is in an error state due to... well, something. Sometimes an error just occurs and marks the whole queue as "bad", which blocks all jobs from running in that queue, even though there is nothing otherwise wrong with it. Use qmod -c <queue list> to clear the error state for a queue.

Maybe that's not the problem, though. Maybe there is some network problem preventing the SGE master from communicating with the exec hosts, such as routing problems or a firewall misconfiguration. You can troubleshoot these things with qping , which will test whether the SGE processes on the master node and the exec nodes can communicate.

N.B.: remember, the execd process on the exec node is responsible for establishing a TCP/IP connection to the qmaster process on the master node , not the other way around. The execd processes basically "phone home". So you have to run qping from the exec nodes , not the master node!

Syntax example (I am running this on an exec node, and sheridan is the SGE master):

$ qping sheridan 536 qmaster 1

where 536 is the port that qmaster is listening on, and 1 simply means that I am trying to reach a daemon. Can't reach it? Make sure your firewall has a hole on that port, that the routing is correct, that you can ping using the good old ping command, that the qmaster process is actually up, and so on.

Of course, you could ping the exec nodes from the master node, too, e.g. I can see if I can reach exec node kosh like this:

$ qping kosh 537 execd 1

but why would you do such a crazy thing? execd is responsible for reaching qmaster , not the other way around.

If the above checks out, check the messages log in /var/log/sge_messages on the submit and/or master node (on our Babylon Cluster , they're both the node sheridan ):

$ tail /var/log/sge_messages

Personally, I like running:

$ tail -f /var/log/sge_messages

before I submit the job, and then submit a job in a different window. The -f option will update the tail of the file as it grows, so you can see the message log change "live" as your job executes and see what's happening as things take place.

(Note that the above is actually a symbolic link I put in to the messages log in the qmaster spool directory, i.e. /opt/sge/default/spool/qmaster/messages .)

One thing that commonly goes wrong is permissions. Make sure that the user that submitted the job using qsub actually has the permissions to write error, output, and other files to the paths you specified.

For even more precise troubleshooting... maybe the problem is unique only to some nodes(s) or some queue(s)? To pin it down, try to run the job only on some specific node or queue:

$ qsub -l hostname=<node/host name> <other job params>

$ qsub -l qname=<queue name> <other job params>

Maybe you should also try to SSH into the problem nodes directly and run the job locally from there, as your own user, and see if you can get any more detail on why it fails.

If all else fails...

Sometimes, the SGE master host will become so FUBARed that we have to resort to brute, traumatizing force to fix it. The following solution is equivalent to fixing a wristwatch with a bulldozer, but seems to do more good than harm (although I can't guarantee that it doesn't cause long-term harm in favor of a short-term solution).

Basically, you wipe the database that keeps track of SGE jobs on the master host, taking any problem "stuck" jobs with it. (At least that's what I think this does...)

I've found this useful when jobs were irretrievably stuck and nothing else would clear them.

The solution:

ssh sheridan                        # the SGE master host
su -
service sgemaster stop              # stop the qmaster
cd /opt/sge/default/
mv spooldb spooldb.fubared          # set the old spool database aside
mkdir spooldb
cp spooldb.fubared/sge spooldb/     # keep the base database file, leaving the job data behind
chown -R sgeadmin:sgeadmin spooldb
service sgemaster start             # restart the qmaster with the fresh spool

Wipe spooldb.fubared when you are confident that you won't need its contents again.

[Feb 08, 2017] Sge,Torque, Pbs WhatS The Best Choise For A Ngs Dedicated Cluster

Feb 08, 2017 | www.biostars.org
Question (asked by abihouee): SGE, Torque, PBS: what's the best choice for an NGS-dedicated cluster?

Sorry, it may be off topics...

We plan to install a scheduler on our cluster (DELL blade cluster over Infiniband storage on Linux CentOS 6.3). This cluster is dedicated to do NGS data analysis.

It seems to me that the most current is SGE, but since Oracle bought the stuff, there are several alternative developments (OpenGridEngine, Son of Grid Engine, Univa Grid Engine...).

Another possible scheduler is Torque/PBS.

I'm a little bit lost in this scheduler forest! Does anyone have experience with this, or know of an existing benchmark?

Thanks a lot. Audrey


I worked with SGE for years at a genome center in Vancouver. Seemed to work quite well. Now I'm at a different genome center and we are using LSF but considering switching to SGE, which is ironic because we are trying to transition from Oracle DB to PostGres to get away from Oracle... SGE and LSF seemed to offer similar functionality and performance as far as I can tell. Both clusters have several 1000 cpus.

-- Malachi Griffith

openlava (source code) is an open-source fork of LSF that, while lacking some features, does work fairly well.

-- Malachi Griffith

Torque is fine, and very well tested; either of the SGE forks is widely used in this sort of environment, and they have qmake, which some people are very fond of. SLURM is another good possibility.

-- Jonathan Dursi

matted wrote:

I can only offer my personal experiences, with the caveat that we didn't do a ton of testing and so others may have differing opinions.

We use SGE, which installs relatively nicely on Ubuntu with the standard package manager (the gridengine-* packages). I'm not sure what the situation is on CentOS.

We previously used Torque/PBS, but the scheduler performance seemed poor and it bogged down with lots of jobs in the queue. When we switched to SGE, we didn't have any problems. This might be a configuration error on our part, though.

When I last tried out Condor (several years ago), installation was quite painful and I gave up. I believe it claims to work in a cross-platform environment, which might be interesting if for example you want to send jobs to Windows workstations.

LSF is another option, but I believe the licenses cost a lot.

My overall impression is that once you get a system running in your environment, they're mostly interchangeable (once you adapt your submission scripts a bit). The ease with which you can set them up does vary, however. If your situation calls for "advanced" usage (MPI integration, Kerberos authentication, strange network storage, job checkpointing, programmatic job submission with DRMAA, etc. etc.), you should check to see which packages seem to support your world the best.

-- matted

Recent versions of torque have improved a great deal for large numbers of jobs, but yes, that was a real problem.

I also agree that all are more or less fine once they're up and working, and the main way to decide which to use would be to either (a) just pick something future users are familiar with, or (b) pick some very specific things you want to be able to accomplish with the resource manager/scheduler and start finding out which best support those features/workflows.

-- Jonathan Dursi

Jeremy Leipzig wrote:

Unlike PBS, SGE has qrsh, a command that actually runs jobs in the foreground, allowing you to easily inform a script when a job is done. What will they think of next?

This is one area where I think the support you pay for going commercial might be worthwhile. At least you'll have someone to field your complaints.

-- Jeremy Leipzig

EDIT: Some versions of PBS also have qsub -W block=true, which works in a very similar way to SGE's qrsh.

-- Sean Davis

you must have a newer version than me

>qsub -W block=true dothis.sh 
qsub: Undefined attribute  MSG=detected presence of an unknown attribute
>qsub --version
version: 2.4.11

-- Jeremy Leipzig

For Torque and perhaps versions of PBS without -W block=true, you can use the following to switches. The behaviour is similar but when called, any embedded options to qsub will be ignored. Also, stderr/stdout is sent to the shell.

qsub -I -x dothis.sh
-- matt.demaere

My answer should be updated to say that any DRMAA-compatible cluster engine is fine, though running jobs through DRMAA (e.g. Snakemake --drmaa ) instead of with a batch scheduler may anger your sysadmin, especially if they are not familiar with scientific computing standards.

Using qsub -I just to get an exit code is not OK.

-- Jeremy Leipzig

Torque definitely allows interactive jobs -

qsub -I

As for Condor, I've never seen it used within a cluster; it was designed back in the day for farming out jobs between diverse resources (e.g., workstations after hours) and would have a lot of overhead for working within a homogeneous cluster. Scheduling jobs between clusters, maybe?

-- Jonathan Dursi

Ashutosh Pandey wrote:

We use Rocks Cluster Distribution that comes with SGE.

http://en.wikipedia.org/wiki/Rocks_Cluster_Distribution

-- Ashutosh Pandey

+1 Rocks - If you're setting up a dedicated cluster, it will save you a lot of time and pain.

-- mike.thon

I'm not a huge rocks fan personally, but one huge advantage, especially (but not only) if you have researchers who use XSEDE compute resources in the US, is that you can use the XSEDE campus bridging rocks rolls which bundle up a large number of relevant software packages as well as the cluster management stuff. That also means that you can directly use XSEDEs extensive training materials to help get the cluster's new users up to speed.

-- Jonathan Dursi

samsara wrote:

It has been more than a year that I have been using SGE for processing NGS data. I have not experienced any problems with it; I am happy with it. I have not used any other scheduler except Slurm, a few times.

-- samsara

richard.deborja wrote:

Used SGE at my old institute, currently using PBS, and I really wish we had SGE on the new cluster. Things I miss the most: qmake and the "-sync y" qsub option. These two were complete pipeline savers. I also appreciated the integration of MPI with SGE. Not sure how well it works with PBS as we currently don't have it installed.

-- richard.deborja

joe.cornish826 wrote:

NIH's Biowulf system uses PBS, but most of my gripes about PBS are more about the typical user load. PBS always looks for the next smallest job, so your 30 node run that will take an hour can get stuck behind hundreds (and thousands) of single node jobs that take a few hours each. Other than that it seems to work well enough.

In my undergrad our cluster (UMBC Tara) uses SLURM, didn't have as many problems there but usage there was different, more nodes per user (82 nodes with ~100 users) and more MPI/etc based jobs. However, a grad student in my old lab did manage to crash the head nodes because we were rushing to rerun a ton of jobs two days before a conference. I think it was likely a result of the head node hardware and not SLURM. Made for a few good laughs.

-- joe.cornish826

"PBS always looks for the next smallest job" -- just so people know, that's not something inherent to PBS. That's a configurable choice the scheduler (probably maui in this case) makes, but you can easily configure the scheduler so that bigger jobs so that they don't get starved out by little jobs that get "backfilled" into temporarily open slots.

-- Jonathan Dursi

Part of it is because Biowulf looks for the next smallest job but also prioritizes by how much CPU time a user has been consuming. If I've run 5 jobs with 30x 24-core nodes, each taking 2 hours of wall time, I've used roughly 7,200 CPU hours. If someone is using a single core on each node (simply because of memory requirements), they're basically at a 1:1 ratio between wall and CPU time. It will take a while for their CPU hours to catch up to mine.

It is a pain, but unlike math/physics/etc there are fewer programs in bioinformatics that make use of message passing (and when they do, they don't always need low-latency ICs), so it makes more sense to have PBS work for the generic case. This behavior is mostly seen on the ethernet IC nodes, there's a much smaller (245 nodes) system set up with infiniband for jobs that really need it (e.g. MrBayes, structural stuff).

Still I wish they'd try and strike a better balance. I'm guilty of it but it stinks when the queue gets clogged with memory intensive python/perl/R scripts that probably wouldn't need so much memory if they were written in C/C++/etc.

[Mar 02, 2016] Son of Grid Engine version 8.1.9 is available

Mar 02, 2016 | liv.ac.uk

README

This is Son of Grid Engine version v8.1.9.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.

The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.

* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig:  Source tarball and PGP signature

* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI
  installer and Hadoop support:

  * gridengine-8.1.9-1.el5.src.rpm:  Source RPM for RHEL, Fedora

  * gridengine-*8.1.9-1.el6.x86_64.rpm:  RPMs for RHEL 6 (and
    CentOS, SL)

  See < https://copr.fedorainfracloud.org/coprs/loveshack/SGE/ > for
  hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.

* Debian packages, installing into /opt/sge, not providing the GUI
  installer or Hadoop support:

  * sge_8.1.9.dsc, sge_8.1.9.tar.gz:  Source packaging.  See
    <http://wiki.debian.org/BuildingAPackage> , and see
    < http://arc.liv.ac.uk/downloads/SGE/support/  > if you need (a more
    recent) hwloc.

  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb,
    sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages
    built on Debian Jessie.

* debian-8.1.9.tar.gz:  Alternative Debian packaging, for installing
  into /usr.

* arco-8.1.6.tar.gz:  ARCo source (unchanged from previous version)

* dbwriter-8.1.6.tar.gz:  compiled dbwriter component of ARCo
  (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at < http://copr.fedoraproject.org/coprs/loveshack/SGE/ >.

[Apr 16, 2015] Undocumented Feature of load sensors

[gridengine users] Undocumented Feature of load sensors
Fritz Ferstl fferstl at univa.com
Thu Apr 16 15:15:36 UTC 2015

It is certainly an intended feature, William. It always has been, ever since load sensors were introduced in the late 90s.

The thought behind it was that you might have central system management
services which maintain host level information. You can then put the
load sensor on the system management server instead of having 1000s of
hosts query it. But you can use it for other stuff as well, of course.

Cheers,

Fritz

William Hay schrieb:
> It appears that you can have load sensors report values for individual hosts other than the one on which it runs. I've tested this by having a load sensor run on one host report different values for two different hosts and used qhost -F to verify that gridengine reports them.
>
> The possibility of doing this is implied by the format of load sensor reports but I've never seen it explicitly documented as possible or used elsewhere.
>
> Being able to use this would simplify certain aspects of the configuration of our production cluster so it would be useful to know if this is intended behavior
> and therefore something I can rely on or an implementation quirk.
>
> Opinions?
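
A minimal load-sensor sketch of the kind discussed above might look like the following; the host names and the cells_free complex are illustrative, and the begin / host:complex:value / end report format is the one sge_execd expects from any load sensor:

#!/bin/sh
# central load sensor reporting values for hosts other than the one it runs on
while read request; do
    # sge_execd writes "quit" when the sensor should terminate
    if [ "$request" = "quit" ]; then
        exit 0
    fi
    echo "begin"
    echo "nodeA.example.com:cells_free:17"
    echo "nodeB.example.com:cells_free:42"
    echo "end"
done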

[Apr 9, 2015] Anyone have scripts for detecting users who bypass grid engine

Reuti reuti at staff.uni-marburg.de
Thu Apr 9 21:19:12 UTC 2015
Am 09.04.2015 um 23:09 schrieb Feng Zhang:

> I know that some people use ssh as rsh_command, which may have similar problem?

Not when you have a tight integration of `ssh` in SGE:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html section "SSH TIGHT INTEGRATION"

Then `ssh` can't spawn any process which escapes from SGE.

-- Reuti


> On Thu, Apr 9, 2015 at 3:46 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Am 09.04.2015 um 21:23 schrieb Chris Dagdigian:
>> 
>>> 
>>> I'm one of the people who has been arguing for years that technological methods for stopping abuse of GE systems never work in the long term because motivated users always have more time and interest than overworked admins so it's kind of embarrassing to ask this but ...
>>> 
>>> Does anyone have a script that runs on a node and prints out all the userland processes that are not explicitly a child of an sge_shepherd daemon?
>>> 
>>> I'm basically looking for a light way to scan a node just to see if there are users/tasks running that are outside the awareness of the SGE qmaster.  Back in the day when we talked about this it seemed that one easy method was just looking for user stuff that was not a child process of a SGE daemon process.
>>> 
>>> The funny thing is that it's not the HPC end users who do this. As the grid(s) get closer and closer to the enterprise I'm starting to see software developers and others trying to play games and then plead ignorance when asked "why did you SSH to a compute node and start a tomcat service out of your home directory?". heh.
>> 
>> Why allow `ssh` to a node at all? In my installations only the admins can do this. If users want to peek around on a node I have an interactive queue with a h_cpu limit of 60 seconds for this. So even login in to a node is controlled by SGE.
>> 
>> -- Reuti
>> 
>> 
>>> 
>>> -chris
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> Best,
> 
> Feng
> 
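
No ready-made script is posted in the thread, but a rough sketch of what Chris describes might look like this; it is hypothetical, intended to be run as root on a compute node, and the UID cutoff and process names are site-specific assumptions:

#!/bin/bash
# List processes owned by ordinary users whose ancestry does not include an
# SGE shepherd, i.e. work that was started outside the grid's control.
for pid in $(ps -eo pid=); do
    uid=$(ps -o uid= -p "$pid" 2>/dev/null | tr -d ' ')
    [ -z "$uid" ] && continue
    [ "$uid" -lt 1000 ] && continue          # skip root and system accounts
    p=$pid
    under_sge=no
    while [ -n "$p" ] && [ "$p" -gt 1 ]; do  # walk up the parent chain
        comm=$(ps -o comm= -p "$p" 2>/dev/null | tr -d ' ')
        case "$comm" in
            sge_shepherd*|sge_execd*) under_sge=yes; break ;;
        esac
        p=$(ps -o ppid= -p "$p" 2>/dev/null | tr -d ' ')
    done
    if [ "$under_sge" = no ]; then
        ps -o user,pid,ppid,etime,args -p "$pid"
    fi
done

Legitimate interactive logins (admins, or users on an interactive queue) will also show up, so the output is a starting point for questions rather than proof of abuse.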

[Mar 30, 2015] Consumable configuration best practices question for hundreds of resources for specific group of nodes

[gridengine users] Consumable configuration best practices question for hundreds of resources for specific group of nodes
William Hay w.hay at ucl.ac.uk
Mon Mar 30 08:41:10 UTC 2015

On Sun, 29 Mar 2015 08:50:15 +0000
Yuri Burmachenko <yuribu at mellanox.com> wrote:


>
> Users will care about which cells they are using.

Could you confirm that my understanding below is correct:
The users of this system care which cells they need to use for reasons other than avoiding oversubscription of the cell.
Cell 25 is fundamentally different from cell 39 even when both are free.
The users want to be able to tell the scheduler which cells to use rather than being able to write a job script that can read a list of cells
to use from a file or similar.

If all the above is true then your 300 different complex_values are probably unavoidable but it won't be pretty.

>
> Our partial solution should allow the users to control/monitor/request/free these cells.
>
>
>
> I looked into the links https://arc.liv.ac.uk/trac/SGE/ticket/1426 and http://gridengine.eu/grid-engine-internals/102-univa-grid-engine-810-features-part-2-better-resource-management-with-the-rsmap-complex-2012-05-25 - I see that many consumable resources can be attached on host basis with RSMAP.
>
Not entirely, AIUI (and we're not Univa customers): RSMAP resources can be associated with queues or the global host as well. Also, you request the number of resources you want, but UGE assigns the specific resources (cells in your case) that your job will use. If I'm understanding you correctly, that won't work for you.

> We need to be able to attach these 300 consumable resources as shared between 4 nodes – is it possible? Maybe a separate queue for these 4 particular hosts with list of complex consumable resources?

That doesn't work because resources defined on a cluster queue exist for each queue instance.

Grid Engine doesn't have a simple way to associate a resource with a group of hosts other than the cluster as a whole. What you can do is define resource availability on the global pseudo host then add a restriction by some means to prevent usage other than on the hosts in question:

*You could define your queue configuration so that all queues on all other nodes have 0 of the resource available while the nodes with access say nothing about availability and therefore have access to the full resources defined on the global host.
*You could define the resources as having 0 availability on hosts other than the ones in question.
*You could probably also do the same with resource quotas.

The first of the above is probably simplest/least work assuming your existing queue configuration is simple.

> All cells are different and users will need to know which one they need to request. At this stage they all should be distinct.

OK. If users request a lot of different cells for individual jobs this will probably lead to long delays before jobs start. Said users will almost certainly want to request
a dynamic reservation for their jobs (-R y).

[Sep 20, 2014] README for Son of Grid Engine version v8.1.7

arc.liv.ac.uk

This is Son of Grid Engine version v8.1.7.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.

The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9. For some reason the el5 signatures won't verify on RHEL5, but they can be verified by transferring the rpms to an RHEL6 system.

More (S)RPMS may be available at http://jur-linux.org/rpms/el-updates/, thanks to Florian La Roche.

[Sep 20, 2014] Son of Grid Engine

Contents
  1. News
  2. Repositories/Source
  3. Building
  4. Bug reporting, patches, and mail lists
  5. History
  6. Copyright and Naming
  7. Related
  8. Other Resources
  9. Contact

The Son of Grid Engine is a community project to continue Sun's old gridengine free software project that used to live at http://gridengine.sunsource.net after Oracle shut down the site and stopped contributing code. (Univa now own the copyright - see below.) It will maintain copies of as much as possible/useful from the old site.

The idea is to encourage sharing, in the spirit of the original project, informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

This effort precedes Univa taking over gridengine maintenance and subsequently apparently making it entirely proprietary, rather than the originally-promised 'open core'. What's here was originally based on Univa's free code and was intended to be fed into that.

See also the gridengine.org site, in particular the mail lists hosted there. The gridengine.org users list is probably the best one to use for general gridengine discussions and questions which aren't specific to this project.

Currently most information you find for the gridengine v6.2u5 release will apply to this effort, but the non-free documentation that used to be available from Oracle has been expurgated and no-one has the time/interest to replace it. See also Other Resources, particularly extra information locally, and the download area.

This wiki isn't currently generally editable, but will be when spam protection is in place; yes it needs reorganizing and expanding. If you're a known past contributor to gridengine and would like to help, please get in touch for access or to make any other contributions.

[Dec 30, 2013] gridengine-6.2u5-10.el6.4.x86_64.rpm CentOS 6 / RHEL 6 ...

Download gridengine-6.2u5-10.el6.4.x86_64.rpm for CentOS 6 / RHEL 6 from the EPEL repository.


Univa Grid Engine Truth

What is Grid Engine?

Grid Engine is a job scheduler that has been around for years and it's FREE!! If you are already using it under an open source license you certainly don't need to buy it. Grid Engine started out as a Sun Microsystems product known as Sun Grid Engine (SGE). After Oracle purchased Sun it became Oracle Grid Engine.

Why is another company trying to sell Grid Engine if it is Free?

A small company called Univa has essentially taken away some of the Grid Engine development staff from Oracle and is selling support bundled with what they feel is upgraded source code that is no longer Open Source. This company wants to sell you Grid Engine support instead of you going to Oracle and buying it for essentially the same price. You can even get free Grid Engine support here with the open source community, and here with the Oracle community.

And you can get the Oracle version here for free which is being developed just like the small company version is but WITH the blessing of Oracle who actually bought this product from Sun.

If you are looking at buying the univa company version of Grid Engine you might ask yourself what you are buying. Is there a free product that is the same? Yes, from Oracle and SourceForge. Is there another more reputable version of the same product? Yes, from Oracle. Are there other schedulers out there that are more robust that you can buy? Yes, Platform Computing has an excellent product called LSF that can often be purchased for much less than Univa Grid Engine. PBSWorks offers a scheduler that is very good, as does RTDA. There is even a new company, Scalable Logic, that is developing the free Grid Engine source code as well as the core and is actively supporting the free community with support and upgrades. They have even now come out with an upgraded free version of Grid Engine, as Univa has attempted to, but this version from Scalable Logic is free and totally open source. It has support for many operating systems, including even Windows.

Are there risks in going with this version of Grid Engine from Univa?

It's possible that univa may tell you that you could be risking violation of software licensing agreements with Oracle or other parties by using certain versions of Grid Engine for free. They may try to use fear, uncertainty, and doubt (FUD) to scare you into buying with them in thinking that it will protect you from Oracle. It may, but before you buy you may want to check that out with Oracle and the open source community and find out for yourself because that may not be the real risk you face. What you may face with this small company is potentially more operational than legal.

If you think about it, they are essentially trying to make money off of a free open source product. This is not the most lucrative idea in the software world and makes the prospect of making money as a company doing this very difficult if not impossible. You might ask yourself if you think they are going to make it. They have carved out a software product and a team from one of the largest software companies in the world, trying to make money on a free product that Oracle bought with the Sun acquisition. If they do not make it and fail as a company, where will you be with your paid software subscription and product through them? If they do make it and then happen to gain the attention of Oracle and its Lawyers, where will you be if Oracle decides to take legal action against them, or just decides to shut them down? Do you really think that a small company with possibly faulty management and financials would have the resources to remain, let alone still be concerned with your support contract? Would your company be protected or could that liability extend to you as well? These might all be questions you would want to pose to Oracle or at least another party besides Univa if you decided on purchasing Grid Engine.

Either way, univa and its pay version of Grid Engine could be in a tough spot. No matter which way they go, they may have a good chance of ending up insolvent or worse. If this happens, where would your support contract with them be? Or worse still, what position would you be in with Oracle at that point? Again, a very good question to ask Oracle. With all these risks it might be better to again look at the free version, which even Oracle is offering, as they themselves are showing commitment to Grid Engine and the enhancement of the free version.

[Oct 18, 2013] Oracle Grid Engine EOL

Effective October 22, 2013, Univa, a leader in Grid Engine technology, will assume product support for Oracle Grid Engine customers for the remaining term of their existing Oracle Grid Engine support contracts.

For continued access to Grid Engine product support from Univa, customers with an active support contract should visit support.univa.com, or contact Univa Support at [email protected] or 800.370.5320.

For more details on this announcement or future sales and support inquiries for Grid Engine, please visit www.univa.com/oracle or contact [email protected].

[Mar 03, 2013] Son of Grid Engine 8.1.3

Son of Grid Engine is a highly-scalable and versatile distributed resource manager for scheduling batch or interactive jobs on clusters or desktop farms. It is a community project to continue Sun's Grid Engine.

It is competitive against proprietary systems and provides better scheduling features and scalability than other free DRMs like Torque, SLURM, Condor, and Lava.

[Jun 12, 2012] Son of Grid Engine 8.1.0 available

SGE 8.1.0 is available from

http://arc.liv.ac.uk/downloads/SGE/releases/8.1.0

It corrects a few problems with the previous version, takes an overdue opportunity to adopt a more logical numbering now that tracking the Univa repo is irrelevant, and improves the RPM packaging.

The RPMs now include the GUI installer and the "herd" Hadoop integration built against a more recent Cloudera distribution. (The GUI installer was previously separate as the original IzPack packaging was non-distributable.)

Generally this distribution has hundreds of improvements not (freely?) available in others, including security fixes, maintained documentation, and easy building at least on recent GNU/Linux.

Univa Announces Grid Engine 8.1 to Further Evolve the Popular Software

Yahoo!

Univa, the Data Center Automation Company, announced today the release of Univa Grid Engine Version 8.1, the most widely deployed, distributed resource management software platform used by enterprises and research organizations across the globe. Univa Grid Engine is the industry-leading choice for workload management and integrating Big Data solutions while saving time and money through increased uptime and reduced total cost of ownership. Corporations in the industries of Oil and Energy, Life Sciences and Biology, and Semiconductors rely on Univa Grid Engine when they need mission-critical computing capacity to model and solve complex problems.

Key features include:

Jeppesen has implemented Univa Grid Engine to support their Crew & Fleet management products for distributing optimization jobs, RAVE compilations and Studio sessions. "Jeppesen has selected Univa Grid Engine as this was the most appealing alternative looking at both cost and Univa's ability to make future enhancements to the product," said Pete Catherall, Business Operations Manager, Jeppesen. "This is another example of that."

[May 07, 2012] The memories of a Product Manager The True Story of the Grid Engine Dream

April 25, 2012

After Wolfgang left Sun - many fine people at Sun had to leave at that time - it was frustrating to see how our efforts to have two Sun Grid Engine products (one available by subscription and one available as free Open Source) failed because of a management veto. On one hand we were under pressure to be profitable as a unit; on the other hand, our customers appeared to have no reason to pay even one cent for a subscription or license.

Oracle still has IP control of Grid Engine. Both Univa and Oracle have decided to make no more contributions to the open source code base. While Oracle's open source policies are clear, Univa, a champion of open source for many years, has surprised the community. This has created an agitated thread on the Grid Engine discussion group.

[May 07, 2012] Sun-Oracle Grid Engine 6.2 installation on Windows Nirmal's Haven

Sun Grid Engine 6.2 Update 2 introduced support for Windows operating systems running as worker nodes. Sun Grid Engine, or Oracle Grid Engine as it is being relabeled now, is a distributed resource manager primarily used in HPC environments, but its use is becoming more widespread with all the new features introduced as part of Update 5.

Here I'm going to detail a quick how-to for getting Grid Engine installed and running on Windows hosts. This is most applicable to Windows XP and Windows Server 2003; some of the additional prerequisites required on those Windows hosts are now standard in Windows Server 2008 and Windows 7.

[Dec 05, 2011] Son of Grid Engine 8.0.0d

Son of Grid Engine is a highly scalable and versatile distributed resource manager for scheduling batch or interactive jobs on server farms or desktop farms. It is a community project to continue Sun's Grid Engine.

[Jun 25, 2011] Server Farm Tricks: Grid Engine License Juggling - Bio-IT World

...NIBR had already chosen Sun Grid Engine Enterprise Edition (SGEEE) to run on the server farm. The BioTeam was asked to deploy SGEEE and integrate several FLEXlm-licensed scientific applications. Acceptance tests for determining success were rigorous. The server farm had to withstand test cases developed by the researchers while automatically detecting and correcting license-related job errors without human intervention.

The core problem turned out to be the most straightforward to solve. To prevent the NIBR server farm from running jobs when no licenses were available, the Grid Engine scheduler needed to become license aware. This was accomplished via a combination of "load sensor" scripts and specially configured Grid Engine "resources."

· Load sensor scripts give Grid Engine operators the ability to collect additional system measurements to help make scheduling or resource allocation decisions.

· Resources are a Grid Engine concept used primarily by users who need a particular condition to be met in order for a job to complete successfully. A user-requested resource could be dynamic ("run this job only on a system with at least 2 GB of free memory") or static ("run this job on the machine with a laser printer attached").

The NIBR plan involved creating custom resource attributes within Grid Engine so that scientists could submit jobs with the requirement "only run this job if a license is available." If licenses were available, the jobs would be dispatched immediately; if not, the jobs would be held until licenses were available.
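To make the mechanism concrete, here is a minimal sketch, not taken from the NIBR project: the complex name my_app_lic, the FLEXlm feature name and the license server address below are all placeholders. It shows the two pieces involved: a consumable complex attribute and a load sensor script that polls the FLEXlm server.

# Hypothetical complex attribute added with "qconf -mc" (one row of the complex table):
#
#   name        shortcut  type  relop  requestable  consumable  default  urgency
#   my_app_lic  mal       INT   <=     YES          YES         0        0

$ cat flexlm_sensor.sh
#! /bin/sh
# Sketch of an SGE load sensor. The daemon writes a line to the sensor's
# stdin at every load interval: "quit" means exit, anything else means
# "emit one report" framed by begin/end lines.
LICSERVER="27000@flexlm-host"        # placeholder license server address
FEATURE="my_app"                     # placeholder FLEXlm feature name

while read request; do
    [ "$request" = "quit" ] && exit 0
    # lmstat prints e.g. "Users of my_app: (Total of 20 licenses issued; Total of 3 licenses in use)"
    free=`lmutil lmstat -c "$LICSERVER" -f "$FEATURE" 2>/dev/null | \
          awk '/Users of/ { gsub(/[^0-9 ]/, ""); print $1 - $2; exit }'`
    [ -z "$free" ] && free=0
    echo "begin"
    echo "global:my_app_lic:$free"
    echo "end"
done

With the sensor registered as a load sensor in the cluster configuration (qconf -mconf) and my_app_lic also listed in the complex_values of the global host, users request a license with qsub -l my_app_lic=1; once the free count reaches zero, further jobs simply wait. Because the attribute is both a consumable and a reported load value, SGE works with the more restrictive of the two numbers, which is the "license aware" scheduling described above.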

To this point, the project was easy. Much more difficult - and more interesting - were efforts to meet NIBR acceptance tests.

The first minor headache resulted from trying to accurately automate querying of the FLEXlm license servers. One FLEXlm license server was an older version that only revealed the number of currently in-use licenses. This meant that the total number of available licenses (equally important) needed to be hard-coded into Grid Engine. NIBR researchers felt strongly that this could create server farm management problems, so the license server was upgraded to a version that allowed the load sensor script to learn how many licenses were available.

The next problem was figuring out how to automatically detect jobs that still managed to fail with license-related errors. The root cause of these failures is the loose integration between the FLEXlm license servers and Grid Engine. Race conditions may occur when Grid Engine launches server farm jobs that do not immediately check out their licenses from the FLEXlm server. Delays can cause Grid Engine's internal license values to get slightly out of sync with the real values held by the license server.

Nasty race conditions between license servers and server farm resource management systems such as Grid Engine are mostly unavoidable at present. The solution everyone is hoping for is FLEXlm support of an API (application programming interface) for advance license reservation and checkout. Applications such as Grid Engine could then directly hook into the FLEXlm system rather than rely on external polling methods. Until this occurs, we are left with half-measures and workarounds.

[Jun 25, 2011] Grid Engine for Users BioTeam Blog

Mar 10, 2011

Back in the day …

Way back in 2009 I placed an aging copy of my Grid Engine Administration training materials online. Response has been fantastic and it's still one of the more popular links on this blog.

Today

Well it's probably past time I did something similar aimed at people actually interested in using Grid Engine rather than just administering it.

It's not comprehensive or all that interesting but I am going to post a bunch of slides cherry picked from the collection of things I build custom training presentations from. Think of them as covering an intro-level view of Grid Engine use and usage.

Intro to Grid Engine Usage & Simple Workflows

There are two slide decks, both of which are mostly stripped of information that is unique to a particular client, customer or Grid Engine server farm.

The first presentation is a short and dry introduction aimed at a user audience – it explains what Grid Engine is, what it does and what the expectations of the users are. It then dives into commands and examples.

The second presentation is also aimed at a basic user audience but talks a bit more about workflows, pipelines and simple SGE features that make life a bit easier for people who need to do more than a few simple 'qsub' actions.

[Jun 23, 2011] Oracle Grid Engine data sheet (ds-gridengine-167114)

By abstracting end users from the specific machines processing the workload, machine failures can be taken in stride. When a machine fails, the workload it was processing can be requeued and rescheduled. While the machine remains down, new workload is scheduled around that machine, preventing end users from ever noticing the machine failure.

In addition to the Oracle Grid Engine product's rich scheduling and workload management capabilities, it also has the ability to share resources among fixed services, such as between two Oracle Grid Engine server farms, resulting in even higher overall data center utilization. Included in this capability is the ability to reach out to a private or public cloud service provider to lease additional resources when needed. During peak workload periods, additional virtual machines can be leased from a cloud service provider to augment the on-site resources. When the workload subsides, the leased cloud resources are released back to the cloud, minimizing the costs. Such cloud bursting capabilities allow an enterprise to handle regular and unpredictable peak workloads without resorting to purchasing excess capacity.

Son of Grid Engine

The Son of Grid Engine is a community project to continue Sun's old grid engine free software project that used to live at http://gridengine.sunsource.net, now that Oracle have shut down the site and are not contributing code. It will maintain copies of as much as possible/useful from the old site. Currently we do not have the old mail archives online, though articles from the old gridengine-users list from the last five years or so will be available soon, and Oracle have promised to donate artifacts from the old site, so we should be able to get complete copies of everything.

The idea is to encourage sharing, in the spirit of the original project, and informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

Currently any information you find for the Grid Engine v6.2u5 release will apply to this effort, including the v6.2u5 wiki documentation and the pointers from it, such as the gridengine.info site and its wiki. There may eventually also be useful info at the Oracle Grid Engine Forum. You should note its terms of use before posting there; they include even relinquishing moral rights.

This wiki isn't currently generally editable, but will be when spam protection is in place. If you're a known past contributor to grid engine and would like to help, please get in touch for access.

[Dec 23, 2010 ] Oracle Grid Engine Changes for a Bright Future at Oracle

Dec 23, 2010 | DanT's Grid Blog

For the past decade, Oracle Grid Engine has been helping thousands of customers marshal the enterprise technical computing processes at the heart of bringing their products to market. Many customers have achieved outstanding results with it via higher data center utilization and improved performance. The latest release of the product provides best-in-class capabilities for resource management including: Hadoop integration, topology-aware scheduling, and on-demand connectivity to the cloud.

Oracle Grid Engine has a rich history, from helping BMW Oracle Racing prepare for the America's Cup to helping isolate and identify the genes associated with obesity; from analyzing and predicting the world's financial markets to producing the digital effects for the popular Harry Potter series of films. Since 2001, the Grid Engine open source project has made Oracle Grid Engine functionality available for free to open source users. The Grid Engine open source community has grown from a handful of users in 2001 into the strong, self-sustaining community that it is now.

Today, we are entering a new chapter in Oracle Grid Engine's life. Oracle has been working with key members of the open source community to pass on the torch for maintaining the open source code base to the Open Grid Scheduler project hosted on SourceForge. This transition will allow the Oracle Grid Engine engineering team to focus their efforts more directly on enhancing the product. In a matter of days, we will take definitive steps in order to roll out this transition. To ensure on-going communication with the open source community, we will provide the following services:

Oracle is committed to enhancing Oracle Grid Engine as a commercial product and has an exciting road map planned. In addition to developing new features and functionality to continue to improve the customer experience, we also plan to release game-changing integrations with several other Oracle products, including Oracle Enterprise Manager and Oracle Coherence. Also, as Oracle's cloud strategy unfolds, we expect that the Oracle Grid Engine product's role in the overall strategy will continue to grow. To discuss our general plans for the product, we would like to invite you to join us for a live webcast on Oracle Grid Engine's new road map. Click here to register.



[Jan 19, 2011] Oracle Grid Engine Creators Move to Univa by Chris Preimesberger

2011-01-19 | eWeek.com

As a result, Univa will offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of the DRM by March.

Univa revealed Jan. 19 that the principal engineers from the Sun/Oracle Grid Engine team, including Grid Engine founder and original project owner Fritz Ferstl, have left Oracle and are joining the company.

As a result, Univa will now offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of Grid Engine before the end of the first quarter of 2011.

Oracle Grid Engine software is a distributed resource management (DRM) system that manages the distribution of users' workloads to the best available compute resources within the system. While compute resources in a typical data center have utilization rates that average only 10 percent to 25 percent, the Oracle Grid Engine can help a company increase utilization to 80, 90 or even 95 percent, Oracle said.

This significant improvement comes from the intelligent distribution of workload to the most appropriate available resources.

When users submit their work to Oracle Grid Engine as jobs, the software monitors the current state of all resources in the server farm and is able to assign these jobs to the best-suited resources. Oracle Grid Engine gives administrators both the flexibility to accurately model their computing environments as resources and to translate business rules into policies that govern the use of those resources, Oracle said.

"Combining the Grid Engine and Univa technology offerings was a once-in-a-lifetime opportunity that the new Univa EMEA team and I just couldn't miss," Ferstl said. "Now we'll be able to interact with and serve users worldwide investigating and understanding their data center optimization needs."

Lisle, Ill.-based Univa will concentrate on improving the Grid Engine for technical computing and HPC use cases in addition to promoting the continuity of the Grid Engine open-source community, Univa said.

[Nov 30, 2009] Sun Grid Engine for Dummies

Nov 30, 2009 | DanT's Grid Blog
Servers tend to be used for one of two purposes: running services or processing workloads. Services tend to be long-running and don't tend to move around much. Workloads, however, such as running calculations, are usually done in a more "on demand" fashion. When a user needs something, he tells the server, and the server does it. When it's done, it's done. For the most part it doesn't matter on which particular machine the calculations are run. All that matters is that the user can get the results. This kind of work is often called batch, offline, or interactive work. Sometimes batch work is called a job. Typical jobs include processing of accounting files, rendering images or movies, running simulations, processing input data, modeling chemical or mechanical interactions, and data mining. Many organizations have hundreds, thousands, or even tens of thousands of machines devoted to running jobs.

Now, the interesting thing about jobs is that (for the most part) if you can run one job on one machine, you can run 10 jobs on 10 machines or 100 jobs on 100 machines. In fact, with today's multi-core chips, it's often the case that you can run 4, 8, or even 16 jobs on a single machine. Obviously, the more jobs you can run in parallel, the faster you can get your work done. If one job takes 10 minutes on one machine, 100 jobs still take only ten minutes when run on 100 machines. That's much better than 1000 minutes to run those 100 jobs on a single machine. But there's a problem. It's easy for one person to run one job on one machine. It's still pretty easy to run 10 jobs on 10 machines. Running 1600 jobs on 100 machines is a tremendous amount of work. Now imagine that you have 1000 machines and 100 users all trying to run 1600 jobs each. Chaos and unhappiness would ensue.

To solve the problem of organizing a large number of jobs on a set of machines, distributed resource managers (DRMs) were created. (A DRM is also sometimes called a workload manager. I will stick with the term DRM.) The role of a DRM is to take a list of jobs to be executed and distribute them across the available machines. The DRM makes life easier for the users because they don't have to track all their jobs themselves, and it makes life easier for the administrators because they don't have to manage users' use of the machines directly. It's also better for the organization in general because a DRM will usually do a much better job of keeping the machines busy than users would on their own, resulting in much higher utilization of the machines. Higher utilization effectively means more compute power from the same set of machines, which makes everyone happy.
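As a concrete illustration of the kind of submission a DRM handles (this example is ours, not from the original post), the 100-jobs-on-100-machines case maps naturally onto a Grid Engine array job: a single qsub call creates many tasks, and the scheduler spreads them over whatever slots are free.

$ cat chunks.sh
#! /bin/sh
#$ -S /bin/sh
#$ -cwd
# Hypothetical per-task script; SGE sets SGE_TASK_ID to the task number.
echo "processing chunk $SGE_TASK_ID on `hostname`"
sleep 60

$ qsub -t 1-100 chunks.sh    # one array job with 100 independent tasks
$ qstat -u $USER             # watch tasks move from qw (waiting) to r (running)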

Here's a bit more terminology, just to make sure we're all on the same page. A cluster is a group of machines cooperating to do some work. A DRM and the machines it manages compose a cluster. A cluster is also often called a grid. There has historically been some debate about what exactly a grid is, but for most purposes grid can be used interchangeably with cluster. Cloud computing is a hot topic that builds on concepts from grid/cluster computing. One of the defining characteristics of a cloud is the ability to "pay as you go." Sun Grid Engine offers an accounting module that can track and report on fine-grained usage of the system. Beyond that, Sun Grid Engine now offers deep integration to other technologies commonly being used in the cloud, such as Apache Hadoop.
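The accounting module mentioned above is exposed through the qacct command, which summarizes the accounting records written as jobs finish. A couple of typical queries (the user name and job id below are placeholders):

$ qacct -o jsmith -d 30      # per-user CPU, memory and wallclock totals for the last 30 days
$ qacct -j 1234              # the full accounting record of a single finished job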

One of the best ways to show Sun Grid Engine's flexibility is to take a look at some unusual use cases. These are by no means exhaustive, but they should serve to give you an idea of what can be done with the Sun Grid Engine software.

Further Reading

For more information about Sun Grid Engine, here are some useful links:

Chi Hung Chan: SGE Grid Job Dependency

It is possible to describe SGE (Sun Grid Engine) job dependency (or that of any other grid engine) in DAG (Directed Acyclic Graph) form. By taking advantage of the open source Graphviz toolkit, it is very easy to document this dependency in the DOT language. Below is a sample DOT file:
$ cat job-dep.dot
digraph jobs101 {
        job_1 -> job_11;
        job_1 -> job_12;
        job_1 -> job_13;
        job_11 -> job_111;
        job_12 -> job_111;
        job_2 -> job_13;
        job_2 -> job_21;
        job_3 -> job_21;
        job_3 -> job_31;
}

With this DOT file, one can generate the graphical representation:

$ dot -Tpng -o job-dep.png job-dep.dot

It is also possible to derive the corresponding SGE commands with the following Tcl script.

$ cat ./dot2sge.tcl
#! /usr/local/bin/tclsh


if { $argc != 1 } {
        puts stderr "Usage: $argv0 <dot-file>"
        exit 1
}
set dotfile [lindex $argv 0]
if { [file exists $dotfile] == 0 } {
        puts stderr "Error. $dotfile does not exist"
        exit 2
}


# assume simple directed graph a -> b

set fp [open $dotfile r]
set data [read $fp]
close $fp


set sge_jobs {}
foreach i [split [lindex $data 2] {;}] {
        if { [regexp {(\w+)\s*->\s*(\w+)} $i x parent child] != 0 } {
                lappend sge_jobs $parent
                lappend sge_jobs $child

                lappend sge_job_rel($parent) $child
        }
}


# submit unique jobs, and hold
set queue all.q
set sge_unique_jobs [lsort -unique $sge_jobs]
foreach i $sge_unique_jobs {
        puts "qsub -h -q $queue -N $i job-submit.sh"
}


# register the job dependency: an edge "parent -> child" in the DOT file
# means the parent job is held (-hold_jid) until the child job completes
foreach i $sge_unique_jobs {
        if { [info exists sge_job_rel($i)] } {
                # with dependency
                puts "qalter -hold_jid [join $sge_job_rel($i) {,}] $i"
        }
}

# release the user holds only after all the hold relationships are established
foreach i $sge_unique_jobs {
        puts "qalter -h U $i"
}

Run this Tcl script to generate the SGE submission commands and the qalter commands that register the job dependencies:

$ ./dot2sge.tcl job-dep.dot
qsub -h -q all.q -N job_1 job-submit.sh
qsub -h -q all.q -N job_11 job-submit.sh
qsub -h -q all.q -N job_111 job-submit.sh
qsub -h -q all.q -N job_12 job-submit.sh
qsub -h -q all.q -N job_13 job-submit.sh
qsub -h -q all.q -N job_2 job-submit.sh
qsub -h -q all.q -N job_21 job-submit.sh
qsub -h -q all.q -N job_3 job-submit.sh
qsub -h -q all.q -N job_31 job-submit.sh
qalter -hold_jid job_11,job_12,job_13 job_1
qalter -hold_jid job_111 job_11
qalter -hold_jid job_111 job_12
qalter -hold_jid job_13,job_21 job_2
qalter -hold_jid job_21,job_31 job_3
qalter -h U job_1
qalter -h U job_11
qalter -h U job_111
qalter -h U job_12
qalter -h U job_13
qalter -h U job_2
qalter -h U job_21
qalter -h U job_3
qalter -h U job_31

The proof-of-concept is shown in action below. So sit back....

#
# ----------below is a very simple script
#
$ cat job-submit.sh
#! /bin/sh
#$ -S /bin/sh

date
sleep 10


#
# ----------run all the qsub to submit jobs, but put them on hold
#
$ qsub -h -q all.q -N job_1 job-submit.sh
Your job 333 ("job_1") has been submitted.
$ qsub -h -q all.q -N job_11 job-submit.sh
Your job 334 ("job_11") has been submitted.
$ qsub -h -q all.q -N job_111 job-submit.sh
Your job 335 ("job_111") has been submitted.
$ qsub -h -q all.q -N job_12 job-submit.sh
Your job 336 ("job_12") has been submitted.
$ qsub -h -q all.q -N job_13 job-submit.sh
Your job 337 ("job_13") has been submitted.
$ qsub -h -q all.q -N job_2 job-submit.sh
Your job 338 ("job_2") has been submitted.
$ qsub -h -q all.q -N job_21 job-submit.sh
Your job 339 ("job_21") has been submitted.
$ qsub -h -q all.q -N job_3 job-submit.sh
Your job 340 ("job_3") has been submitted.
$ qsub -h -q all.q -N job_31 job-submit.sh
Your job 341 ("job_31") has been submitted.


#
# ----------show the status, all jobs are in hold position (hqw)
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      hqw   07/19/2007 21:04:34     1


#
# ----------register the job dependency
#
$ qalter -hold_jid job_11,job_12,job_13 job_1
modified job id hold list of job 333
   blocking jobs: 334,336,337
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_11
modified job id hold list of job 334
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_12
modified job id hold list of job 336
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_13,job_21 job_2
modified job id hold list of job 338
   blocking jobs: 337,339
   exited jobs:   NONE
$ qalter -hold_jid job_21,job_31 job_3
modified job id hold list of job 340
   blocking jobs: 339,341
   exited jobs:   NONE


#
# ----------release all the holds and let SGE to sort itself out
#
$ qalter -h U job_1
modified hold of job 333
$ qalter -h U job_11
modified hold of job 334
$ qalter -h U job_111
modified hold of job 335
$ qalter -h U job_12
modified hold of job 336
$ qalter -h U job_13
modified hold of job 337
$ qalter -h U job_2
modified hold of job 338
$ qalter -h U job_21
modified hold of job 339
$ qalter -h U job_3
modified hold of job 340
$ qalter -h U job_31
modified hold of job 341


#
# ----------query SGE stats
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      qw    07/19/2007 21:04:34     1


#
# ----------some jobs started to run
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


#
# ----------output of all jobs; you can see jobs job_1/2/3 finished last
#
$ grep 2007 job_*.o*
job_111.o335:Thu Jul 19 21:05:36 SGT 2007
job_11.o334:Thu Jul 19 21:05:51 SGT 2007
job_12.o336:Thu Jul 19 21:05:51 SGT 2007
job_13.o337:Thu Jul 19 21:05:36 SGT 2007
job_1.o333:Thu Jul 19 21:06:06 SGT 2007
job_21.o339:Thu Jul 19 21:05:36 SGT 2007
job_2.o338:Thu Jul 19 21:05:51 SGT 2007
job_31.o341:Thu Jul 19 21:05:37 SGT 2007
job_3.o340:Thu Jul 19 21:05:52 SGT 2007

Another successful proof-of-concept. :-)


Recommended Links

Sites

Oracle Grid Engine documentation

Oracle Grid Engine - Wikipedia, the free encyclopedia

Guide to Using the Grid Engine - The Particle Beam Physics Laboratory at the UCLA Department of Physics and Astronomy

How To Use Sun Grid Engine - Biowiki

SUN Grid Engine - UU/Department of Information Technology

BeocatDocs-SunGridEngine - CIS Support

Sun

Oracle

Univa

Grid Engine in 2012 & Beyond

What the heck is going on with Grid Engine in 2012 and beyond? If you've found this page and have managed to keep reading, you are probably interested in Grid Engine and what it may look like in the future. This post will attempt to summarize what is currently available.

History of this site

This website was thrown together very quickly in early 2011 when Oracle announced it was taking Oracle Grid Engine in a new "closed source" direction. Very soon after the announcement, the open source SGE codebase was forked by multiple groups. Oracle had also been hosting the popular gridengine.sunsource.net site where documentation, HowTo's and a very active mailing list had become the default support channel for many SGE users and administrators.

This website was seen as a gathering point and central public portal for the various grid engine fork efforts. It was also a natural place to host a new "[email protected]" mailing list in an attempt to recreate the atmosphere found in the old "[email protected]" listserv community.

The new mailing list was a success but efforts to build a "Steering Committee" that would drive some sort of coordinated effort stalled throughout most of 2011. Truth be told, we probably don't need a central site or even a steering committee - the maintainers of the various forks all know each other and can easily trade patches, advice and collaborative efforts among themselves.

It's best simply to recast the gridengine.org site as a convenient place for information broadly of interest to all Grid Engine users, administrators and maintainers – mailing lists, news and pointers to information, software & resources.

Available Grid Engine Options

Open Source

"Son of Grid Engine"

URL: https://arc.liv.ac.uk/trac/SGE
News & Announcements: http://arc.liv.ac.uk/repos/darcs/sge/NEWS
Description: Baseline code comes from the Univa public repo with additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. Future releases may start to diverge from Univa as Univa pursues an "open core" development model. Maintainers have made efforts to make building binaries from source easier and the latest release offers RedHat Linux SRPMS and RPM files ready for download.
Support: Supported via the maintainers and the users mailing list.

"Open Grid Scheduler"

URL: http://gridscheduler.sourceforge.net/
Description: Baseline code comes from the last Oracle open source release with significant additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. No pre-compiled "courtesy binaries" available at the SourceForge site (just source code and instructions on how to build Grid Engine locally). In November 2011 a new company ScalableLogic announced plans to offer commercial support options for users of Open Grid Scheduler.
Support: Supported via the maintainers and the users mailing list. Commercial support from ScalableLogic.

Commercial

"Univa Grid Engine"

URL: http://www.univa.com/products/grid-engine
Description: Commercial company selling Grid Engine, support and layered products that add features and functionality. Several original SGE developers are now employed by Univa. Evaluation versions and "48 cores for free" are available from the website.
Support: Univa supports their own products.

"Oracle Grid Engine"

URL: http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
Description: Continuation of "Sun Grid Engine" after Oracle purchased Sun. This is the current commercial version of Oracle Grid Engine after Oracle discontinued the open source version of their product and went 100% closed-source.
Support: Oracle supports their own products, a web support forum for Oracle customers can be found at https://forums.oracle.com/forums/forum.jspa?forumID=859

Univa Grid Engine - Wikipedia, the free encyclopedia

Univa Grid Engine - Daniel's Blog about Grid Engine





Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links because it develops like a living tree...

You can use PayPal to buy a cup of coffee for the authors of this site.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Last modified: February 27, 2021