Jobcontrol

Jobcontrol allows tasks to run asynchronously and provides support for starting tasks on different machines.

For example, you may launch a task from a laptop (running Maestro) to a compute node, so that the task runs on several cores. Jobcontrol takes care of transferring input files from your laptop to the cluster and collecting results and log files once the job is complete.

Launching a job means running a command with the -HOST <host entry> argument. Host entries are defined in schrodinger.hosts files.

Example:

$SCHRODINGER/ligprep -imae in.mae -omae out.mae

Running the command with no -HOST argument runs the job on localhost. Adding -HOST bolt_cpu would submit the job to bolt.
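
For example, assuming a bolt_cpu entry exists in your schrodinger.hosts file:

$SCHRODINGER/ligprep -imae in.mae -omae out.mae -HOST bolt_cpu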

The jobcontrol module contains four major sections:

  1. Job data interaction - Deals with getting information about existing jobs.

  2. Job launching - Deals with starting a subjob.

  3. Job backend interaction - Provides utilities for a Python script running as a job.

  4. Job hosts - Deals with getting information about available hosts.
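
The first two areas can be exercised directly from Python. Below is a minimal sketch, assuming the launch_job() helper in schrodinger.job.jobcontrol and the job record attributes named in the comments:

from schrodinger.job import jobcontrol

# Job launching: start a command under job control.
job = jobcontrol.launch_job(["ligprep", "-imae", "in.mae", "-omae", "out.mae"])

# Job data interaction: the returned Job object exposes the job record.
print(job.JobId)   # attribute name assumed; see the module documentation
job.wait()         # block until the job completes
print(job.Status)  # attribute name assumed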

Job Model

From the command-line perspective, a job starts with a short script that takes care of submitting the job and returns with output of the form: JobId: <jobid>

If the command returns a zero exit status and a JobId, the job was started successfully. This should take a few seconds for a small job, or however long it takes to negotiate the start with the remote host. The job then runs in the background.
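
For example, a successful launch prints the job id and returns control to the shell while the job keeps running:

$SCHRODINGER/ligprep -imae in.mae -omae out.mae -HOST bolt_cpu
JobId: <jobid>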

Running code under jobcontrol

Python scripts that run locally can be adapted to run remotely. jobcontrol will use launchapi if the script defines a function get_job_spec_from_args at the top level. $SCHRODINGER/run will use the information returned from that function when a -HOST option is used. For example:

$SCHRODINGER/run script.py -HOST localhost will execute the main function under jobcontrol on the localhost by using the information returned from get_job_spec_from_args.

Ordinary script

For a script that executes normally (myscript.py), you only need to make sure that your script is importable as a module. In this example, myscript simply prints the hostname of the machine it runs on, to show that the script produces different output on different machines.

import socket

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

$SCHRODINGER/run myscript.py will print out your local hostname.

Add jobcontrol API

If we want to execute our script under jobcontrol, locally or remotely, we need to add a top-level function that returns a job specification. This function must be named get_job_spec_from_args. Here, we’re registering stderr and stdout so that we can see the output of the script.

import socket
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    """
    Return a JobSpecification necessary to run this script on a remote
    machine (e.g. under job control with the launch.py script).

    :type argv: list(str)
    :param argv: The list of command line arguments, including the script name
    at [0], matching $SCHRODINGER/run __file__ sys.argv
    """
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    job_builder.setStderr("myscript.log")
    job_builder.setStdout("myscript.log")
    return job_builder.getJobSpec()

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

Assuming that myscript.py is in the distribution on your local and remote computers:

$SCHRODINGER/run myscript.py will print out your local hostname.

$SCHRODINGER/run myscript.py -HOST bolt_cpu will log the hostname of a bolt compute node.

Register input and output files

Files that are transferred from the launch machine to the compute machine need to be registered by job control. In this example, we have an input maestro file and an output maestro file.

import os
import sys
from schrodinger import structure
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    mae_file = argv[1]
    output_mae_file = os.path.splitext(os.path.basename(mae_file))[0] + "_processed.mae"
    job_builder.setInputFile(mae_file)
    job_builder.setOutputFile(output_mae_file)
    job_builder.setStderr("myscript.log")
    job_builder.setStdout("myscript.log")
    return job_builder.getJobSpec()

def main():
    output_file = os.path.splitext(os.path.basename(sys.argv[1]))[0] + "_processed.mae"
    with structure.StructureReader(sys.argv[1]) as reader:
        with structure.StructureWriter(output_file) as writer:
            for ct in reader:
                ct.title = ct.title + " processed"
                writer.append(ct)

if __name__ == "__main__":
    main()

Execute using: $SCHRODINGER/run myscript.py foo.mae -HOST localhost

Using a jobname

Some jobs use the concept of a jobname, which is specified on the command line or through Maestro and determines the names of the log files for the job.

import socket
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, use_jobname_log=True)
    return job_builder.getJobSpec()

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

Execute using: $SCHRODINGER/run myscript.py -JOBNAME foo -HOST localhost

Maestro Incorporation

A single Maestro file from a job can be marked for incorporation into Maestro, meaning that its structures will show up in the project table.

from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    job_builder.setOutputFile("foo.mae", incorporate=True)
    return job_builder.getJobSpec()

Using $SCHRODINGER/run -FROM <product>

Some scripts require $SCHRODINGER/run -FROM <product> to run. In this case, we mark this when we create the JobSpecification:

from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, schrodinger_product="scisol")
    return job_builder.getJobSpec()

Integration into af2

af2 is the framework that Schrodinger uses to write GUIs. Implement getJobSpec() in your panel class to create a job specification. Here we assume we want to execute the myscript.py we wrote above:

def getJobSpec(self):
    # "driver" is the module that defines get_job_spec_from_args;
    # here, the myscript module written above, imported by this panel.
    driver_path = 'myscript.py'
    cmd = [driver_path, self.input_selector.structFile()]
    return driver.get_job_spec_from_args(cmd)

Integration with an Argument Parser

An argument parser is useful when we want to document, validate, and access command line arguments within a script. It is easy to integrate an argument parser into a script that uses jobcontrol.

import argparse
import os
import sys

from schrodinger import structure
from schrodinger.job import launchapi
from schrodinger.utils import cmdline

def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("inputfile", help="maestro file input")
    args = parser.parse_args(argv)
    return args

def get_job_spec_from_args(argv):
    # first argument is this script
    args_namespace = parse_args(argv[1:])
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, use_jobname_log=True)
    job_builder.setInputFile(args_namespace.inputfile)
    jobname = os.path.splitext(os.path.basename(args_namespace.inputfile))[0]
    job_builder.setJobname(jobname)
    return job_builder.getJobSpec()

def main(*argv):
    args = parse_args(argv)
    with structure.StructureReader(args.inputfile) as reader:
        for ct in reader:
            print(f"ct title={ct.title})

if __name__ == '__main__':
    cmdline.main_wrapper(main, *sys.argv[1:])

See the in-code documentation for the full set of options.

Introduction to JobDJ

The JobDJ class is used to write driver scripts for “distributed jobs”, which involve one or more subjobs independently carrying out different parts of a larger computation in parallel. JobDJ can submit individual jobs to a queueing system (like SLURM or UGE) or an explicit list of compute machines.

JobDJ is a workflow tool that makes it possible to run multiple, potentially simultaneous jobs. It manages launching and state of all subjobs. It also provides a mechanism to enforce dependencies between jobs.
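
For example, here is a minimal sketch of the dependency mechanism, assuming the JobControlJob class and its addPrereq() method; the scripts named here are hypothetical:

from schrodinger.job import queue

jobdj = queue.JobDJ()
prep = queue.JobControlJob(["run", "prepare.py", "in.mae"])
score = queue.JobControlJob(["run", "score.py", "prepared.mae"])
score.addPrereq(prep)  # score will not start until prep has finished
jobdj.addJob(prep)
jobdj.addJob(score)
jobdj.run()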

This document will only describe the most common use case for JobDJ, which is to run a number of independent subjobs under job control.

Usage

Logically, there are three steps to running a distributed job with JobDJ:

  1. specify the list of hosts on which subjobs will be run (normally handled automatically),

  2. define the subjobs, by specifying the command to use to start each one, and,

  3. let JobDJ run the jobs.

The Host List

The host list is defined when you create a JobDJ instance. If your script is running under job control, JobDJ will automatically use the host list specified on the command line, unless you override it with an explicit host list, specified as a list of (host, max_jobs) tuples, where max_jobs is the maximum number of jobs to run at a time on each host.
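
For example, a minimal sketch of overriding the host list; the host names here are assumptions:

from schrodinger.job import queue

# Allow up to 4 simultaneous subjobs on bolt_cpu and 1 on localhost.
jobdj = queue.JobDJ(hosts=[("bolt_cpu", 4), ("localhost", 1)])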

The Subjobs

Add jobs to the JobDJ instance using the addJob() method. Jobs are defined as instances of a BaseJob subclass, such as JobControlJob. If you’re running job control jobs, you can just specify the command to start the job and a JobControlJob instance will be created for you.
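
For example, the two calls below add equivalent subjobs; the first passes a bare command list, which JobDJ wraps in a JobControlJob for you:

from schrodinger.job import queue

jobdj = queue.JobDJ()

# A bare command list is wrapped in a JobControlJob automatically...
jobdj.addJob(["run", "myscript.py", "in.mae"])

# ...which is equivalent to constructing the job object explicitly.
jobdj.addJob(queue.JobControlJob(["run", "myscript.py", "in.mae"]))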

Running the Subjobs

Run all jobs defined on a JobDJ instance using its run() method. This call blocks until all subjobs have completed. If you want to take some action whenever a subjob’s status changes, you can pass a status_change_callback to the run() method.
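
For example, a minimal sketch of a status callback; we assume here that the callback is invoked with the subjob whose status changed:

def print_status(job):
    # job is assumed to be the subjob whose status just changed.
    print(job, job.state)

jobdj.run(status_change_callback=print_status)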

Examples

Basic Usage

In the simplest case, a driver script running under jobcontrol just needs to define one or more subjobs and call the JobDJ object’s run() method. JobDJ will run the jobs on the hosts specified on the command line. For example, you might write a driver script driver.py for running a set of Jaguar jobs like this:

import sys

from schrodinger.job import queue

# The Jaguar input files to run; here, taken from the command line.
inputfiles = sys.argv[1:]

jobdj = queue.JobDJ()
for infile in inputfiles:
    cmd = ['jaguar', 'run', infile]
    jobdj.addJob(cmd)
jobdj.run()
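
If the driver itself is launched under job control, e.g. $SCHRODINGER/run driver.py -HOST bolt_cpu:10, JobDJ picks up the host list from that command line (here, up to 10 simultaneous subjobs on bolt_cpu).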