Jobcontrol

Jobcontrol lets tasks run asynchronously and supports launching a task from one machine while it executes on another.

For example, you may launch a task from a laptop (running Maestro) to a compute node, so that the task runs on several cores. Jobcontrol takes care of transferring input files from your laptop to the cluster and collecting results and log files once the job is complete.

How to launch a job

Launching a job means running a command with -HOST <host entry argument>. A host entry is currently defined in schrodinger.hosts files.

Example:

$SCHRODINGER/ligprep -imae in.mae -omae out.mae

Run without a -HOST argument, as above, the job runs on localhost. Adding -HOST bolt_cpu submits the job to bolt.

Job Model

From the command line perspective, a job consists of a short script that submits the job and then returns, printing output of the form: JobId: <jobid>

If the command exits with a zero status and prints a JobId, the job was started successfully. This should take seconds for a small job, or as long as it takes to negotiate the start with the remote host. The job then runs in the background.
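The launch step described above can be scripted. The following is a minimal sketch that assumes only what this section states: the launch command exits with status zero and prints a line of the form JobId: <jobid>. The helper names parse_job_id and launch are hypothetical and are not part of any Schrodinger API.

```python
import re
import subprocess

# Matches the "JobId: <jobid>" line described above. Treating the job id as
# the first run of non-whitespace characters after the colon is an assumption.
_JOBID_RE = re.compile(r"^JobId:\s*(\S+)", re.MULTILINE)


def parse_job_id(output):
    """Return the job id found in launcher output, or None if absent."""
    match = _JOBID_RE.search(output)
    return match.group(1) if match else None


def launch(cmd):
    """Run a launch command (a list of strings) and return its job id.

    Raises RuntimeError if the command exits non-zero or prints no JobId.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    job_id = parse_job_id(result.stdout)
    if result.returncode != 0 or job_id is None:
        raise RuntimeError("job was not started: " + result.stderr.strip())
    return job_id
```

Under these assumptions, launch could be given the ligprep command shown earlier, with -HOST bolt_cpu appended, and would return as soon as the job has been submitted rather than when it finishes.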

Running code under jobcontrol

Python scripts that run locally can be adapted to run remotely. jobcontrol will use launchapi if the script defines a function get_job_spec_from_args at the top level. $SCHRODINGER/run will use the information returned from that function when a -HOST option is used. For example:

$SCHRODINGER/run script.py -HOST localhost will execute the main function under jobcontrol on the localhost by using the information returned from get_job_spec_from_args.

For the full set of options, see the in-code documentation.
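To illustrate what "defines a function get_job_spec_from_args at the top level" means, here is a small sketch that inspects a script's source with the standard ast module. It is not a description of how $SCHRODINGER/run performs its check, which this document does not specify; the helper name defines_job_spec_hook is hypothetical.

```python
import ast

HOOK_NAME = "get_job_spec_from_args"


def defines_job_spec_hook(source):
    """Return True if the source defines HOOK_NAME as a module-level function."""
    tree = ast.parse(source)
    # Only direct children of the module count as "top level"; functions
    # nested inside classes or other functions are ignored.
    return any(
        isinstance(node, ast.FunctionDef) and node.name == HOOK_NAME
        for node in tree.body
    )
```

A function of the same name defined as a method of a class would not be found by this check, which matches the requirement that the function live at the top level of the script.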

Ordinary script

For a script that executes normally (myscript.py), you only need to make sure that the script is importable as a module. In this example, myscript.py simply prints the hostname of the machine it runs on, so that the script produces different output on different machines.

import socket

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

$SCHRODINGER/run myscript.py will print out your local hostname.

Add jobcontrol API

If we want to execute our script under jobcontrol, locally or remotely, we need to add a function at the top level that jobcontrol can use as a job specification. This function must be called get_job_spec_from_args. Here, we’re registering stderr and stdout so that we can see the output of the script:

import socket
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    """
    Return a JobSpecification necessary to run this script on a remote
    machine (e.g. under job control with the launch.py script).

    :type argv: list(str)
    :param argv: The list of command line arguments, including the script name
    at [0], matching $SCHRODINGER/run __file__ sys.argv
    """
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    job_builder.setStderr("myscript.log")
    job_builder.setStdout("myscript.log")
    return job_builder.getJobSpec()

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

Assuming that myscript.py is in the distribution on your local and remote computers:

$SCHRODINGER/run myscript.py will print out your local hostname.

$SCHRODINGER/run myscript.py -HOST bolt_cpu will log the hostname of the bolt compute node to myscript.log.

Register input and output files

Files that are transferred between the launch machine and the compute machine need to be registered with job control. In this example, we have an input Maestro file and an output Maestro file.

import os
import sys
from schrodinger import structure
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    mae_file = argv[1]
    output_mae_file = os.path.basename(mae_file) + "processed.mae"
    job_builder.setInputFile(mae_file)
    job_builder.setOutputFile(output_mae_file)
    job_builder.setStderr("myscript.log")
    job_builder.setStdout("myscript.log")
    return job_builder.getJobSpec()

def main():
    output_file = os.path.basename(sys.argv[1]) + "processed.mae"
    with structure.StructureReader(sys.argv[1]) as reader:
        with structure.StructureWriter(output_file) as writer:
            for ct in reader:
                ct.title = ct.title + "processed"
                writer.append(ct)

if __name__ == "__main__":
    main()

Execute using: $SCHRODINGER/run myscript.py foo.mae -HOST localhost
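In the example above, get_job_spec_from_args and main each compute the output file name separately, and the two should agree so that the file registered as output is the file the script actually writes. One way to keep them in sync is a shared helper, sketched here. The name output_name is hypothetical; the expression is copied from the example.

```python
import os


def output_name(input_path):
    """Name of the processed file, computed exactly as in the example above."""
    return os.path.basename(input_path) + "processed.mae"


# Both get_job_spec_from_args() and main() would call the same helper, so the
# registered output file and the file actually written cannot drift apart.
print(output_name("/home/user/foo.mae"))  # foo.maeprocessed.mae
```

With this naming, running on foo.mae writes foo.maeprocessed.mae. If a name such as foo_processed.mae is preferred, os.path.splitext can be used inside the helper to drop the extension first.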

Using a jobname

Some jobs use the concept of a jobname, which is specified on the command line or in Maestro and determines the names of log files for the job.

import socket
from schrodinger.job import launchapi

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, use_jobname_log=True)
    return job_builder.getJobSpec()

def main():
    print(socket.gethostname())

if __name__ == "__main__":
    main()

Execute using: $SCHRODINGER/run myscript.py -JOBNAME foo -HOST localhost

Maestro Incorporation

A single Maestro file from a job can be marked for incorporation into Maestro, meaning that its structures will show up in the project table.

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv)
    job_builder.setOutputFile("foo.mae", incorporate=True)
    return job_builder.getJobSpec()

Using $SCHRODINGER/run -FROM <product>

Some scripts require $SCHRODINGER/run -FROM <product> to run. In this case, we mark this when we create a JobSpecification:

def get_job_spec_from_args(argv):
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, schrodinger_product="scisol")
    return job_builder.getJobSpec()

Integration into af2

af2 is the framework that Schrodinger uses to write GUIs. Implement getJobSpec() in the panel to create a job spec. We assume we want to execute the myscript.py that we wrote above:

def getJobSpec(self):
    # Assumes the panel module imports the script, e.g.
    # "import myscript as driver".
    driver_path = 'myscript.py'
    cmd = [driver_path, self.input_selector.structFile()]
    return driver.get_job_spec_from_args(cmd)

Integration with an Argument Parser

An argument parser is useful when we want to document, validate, and access command line arguments within a script. It is easy to integrate an argument parser into a script that uses jobcontrol.

import argparse
import os
import sys

from schrodinger import structure
from schrodinger.job import launchapi
from schrodinger.utils import cmdline

def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("inputfile", help="maestro file input")
    args = parser.parse_args(argv)
    return args

def get_job_spec_from_args(argv):
    # first argument is this script
    args_namespace = parse_args(argv[1:])
    job_builder = launchapi.JobSpecificationArgsBuilder(argv, use_jobname_log=True)
    job_builder.setInputFile(args_namespace.inputfile)
    jobname = os.path.splitext(os.path.basename(args_namespace.inputfile))[0]
    job_builder.setJobname(jobname)
    return job_builder.getJobSpec()

def main(*argv):
    args = parse_args(argv)
    with structure.StructureReader(args.inputfile) as reader:
        for ct in reader:
            print("ct title={}".format(ct.title))

if __name__ == '__main__':
    cmdline.main_wrapper(main, *sys.argv[1:])
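The argument handling in this example can be exercised without a Schrodinger installation. The sketch below repeats parse_args from above and isolates the jobname derivation in a hypothetical helper, jobname_from_input, so both can be checked with plain Python. The input path is made up for illustration.

```python
import argparse
import os


def parse_args(argv):
    """Parse the script arguments (without the script name)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("inputfile", help="maestro file input")
    return parser.parse_args(argv)


def jobname_from_input(inputfile):
    """Derive a jobname from the input file: its basename minus extension."""
    return os.path.splitext(os.path.basename(inputfile))[0]


# argv as passed to get_job_spec_from_args includes the script name at [0],
# so it is sliced off before parsing.
argv = ["myscript.py", "structures/foo.mae"]
args = parse_args(argv[1:])
print(args.inputfile)                      # structures/foo.mae
print(jobname_from_input(args.inputfile))  # foo
```

Sharing one parse_args between get_job_spec_from_args and main, as the example does, keeps the job specification and the script body from interpreting the command line differently.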

For the full set of options, see the in-code documentation.