schrodinger.job.jobcontrol module

Core job control for Python.

This module currently has four major sections: “Job database,” “Job launching,” “Job backend,” and “Job hosts.” The job database section deals with getting info about existing jobs, the job launching section deals with starting up a subjob, the job backend section provides utilities for a Python script running as a job, and the job hosts section deals with host entries from the schrodinger.hosts file.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.job.jobcontrol.DisplayStatus

Bases: enum.Enum

An enumeration.

WAITING = 'Waiting'
RUNNING = 'Running'
CANCELED = 'Canceled'
STOPPED = 'Stopped'
FAILED = 'Failed'
COMPLETED = 'Completed'
schrodinger.job.jobcontrol.jobhub()
schrodinger.job.jobcontrol.timestamp(msg)
exception schrodinger.job.jobcontrol.JobcontrolException

Bases: Exception

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception schrodinger.job.jobcontrol.JobLaunchFailure

Bases: schrodinger.job.jobcontrol.JobcontrolException, RuntimeError

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception schrodinger.job.jobcontrol.MissingFrontendException

Bases: schrodinger.job.jobcontrol.JobcontrolException

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception schrodinger.job.jobcontrol.MissingHostsFileException

Bases: schrodinger.job.jobcontrol.JobcontrolException

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception schrodinger.job.jobcontrol.UnreadableHostsFileException

Bases: schrodinger.job.jobcontrol.JobcontrolException

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class schrodinger.job.jobcontrol.Job(job_id, cpp_job=None)

Bases: object

A class to access a specific record in the job database.

A Job instance is always a snapshot of the job record at a specific point in time. It is only updated when the readAgain method is explicitly invoked.

__init__(job_id, cpp_job=None)

Initialize a read-only Job object.

Parameters:job_id (str) – Unique identifier for a job
readAgain()

Reread the database. Calling this routine is necessary to get fresh values.

isComplete()

Returns True if the job is complete.

This method uses native mmjob logic to determine whether the job is complete.

isQueued()

Returns True if the job is a batch queue job.

succeeded()

Returns False if the job was killed, died or fizzled. Returns True if ExitStatus is finished.

Raises an exception if the job isn’t completed, so use isComplete() before calling.

setStatusIncorporated()

Set the status of the job to “incorporated” if the job has completed.

isStatusIncorporated()

Check if job status is incorporated or not. We need to retrieve job status via jobhub because if JOB_SERVER feature flag is ON, then job’s incorporation status is not stored in the job record, but it is stored in the project.

Returns:True if the job is already incorporated
Return type:bool

isIncorporatable()

Check if the job is incorporatable or not. We need to retrieve job status via jobhub because if the JOB_SERVER feature flag is ON, then the job’s incorporation status is not stored in the job record, but in the project. A job that is already incorporated is not incorporatable. A job whose disposition setting is ignore is also not incorporatable.

wait_before_kill()
kill()

Kill the job if it is running.

kill_for_smart_distribution()

Kill the job for smart distribution if it is running.

wait(max_interval=60, throw_on_failure=False)

Wait for the job to complete, sleeping up to ‘max_interval’ seconds between database checks. (The interval increases gradually from 2 seconds up to the maximum.)

NOTE: Do not use if your program is running in Maestro, as this will make Maestro unresponsive while the job is running.

Parameters:throw_on_failure (bool) – whether to raise an exception if not succeeded
Raises:RuntimeError – if the job did not succeed. The error message will contain the last 20 lines of the job’s logfile (if available).
download()

Download the output of the job into job’s launch directory. No-op in legacy jobcontrol.

get(attr, default=None)

This function will always raise an error, but is provided to guide users to a new syntax.

summary()

Return a string summarizing all current Job attributes.

getDuration()

Returns the wallclock running time of the job if it is complete. This does not include time spent in submission status. Returns time in seconds. If the job is not complete, returns None.

Return type:int or None
BatchId

Return the batch id, if running on an HPC queueing system. Otherwise return None.

Return type:str or None
Dir

Return the absolute path of the launch directory.

Return type:str
ExitCode
Host

Return the hostname of the host which launched this job.

Return type:str
HostEntry

Return the name of the host entry this job was launched to.

Return type:str
LaunchTime

Return a string timestamp for the time that the job was launched. This will be before the job starts running, as soon as it is registered with jobcontrol as a job to be run.

Return type:str
JobId

Return an identifier for a job.

Return type:str
Name

Returns a string representing -JOBNAME that was specified on launch. This may be an empty string.

Return type:str
ParentJobId

Return the jobid of a parent job. If the job does not have a parent, raise an AttributeError.

Processors

For a batch job, returns the number of queue slots attached to this job. For a local job, return the number of CPU cores allowed to be used.

Program

Return descriptive text for the name of the program running this job, e.g. Jaguar. This field is optional and may return an empty string.

Return type:str
Project

Return the job’s project name field. This will be an empty string if no project is set.

Return type:str
QueueHost

Return the hostname of the submission node of an HPC queueing system.

Return type:str
StructureOutputFile

Return the name of the file returned by the job that will get incorporated into a project of Maestro. Returns an empty string if no file is specified.

Return type:str
DisplayStatus

Return a user-focused status that indicates the current state of the job.

Return type:DisplayStatus
StatusChangeReason

Returns a human-readable reason that a job entered its current state, such as “job canceled by the user.” If the reason was not recorded or is not particularly interesting (e.g. normal transition from waiting to running) it may be the empty string.

Return type:str
Status

Get the Status of the job.

Return type:str
StartTime

Return a string for the starting time of the job. If the job has not started yet, for example if it is on the queue, an AttributeError is raised.

Return type:str
StopTime

Return a string for the completion time of a job. If the job is not finished yet, an AttributeError is raised.

Return type:str
Viewname

Return a representation of the name used to filter jobs in Maestro. May be empty.

Return type:str
ExitStatus

Get the ExitStatus of the job.

JobDir

Return the directory where the job is run. This will be an empty string if the job has not yet started.

Return type:str
JobHost

Return the hostname where the job is run. This will be an empty string if the job has not yet started.

Return type:str
JobSchrodinger

Get the JobSchrodinger of the job.

Envs

Return a list of environment variables that are set by the job, in addition to the default environment on a machine, in the format [“SPECIAL_VAR=0”, “SPECIAL_VAR2=yes”].

Return type:list(str)

Errors

Return possible error messages associated with a job. This will only return values in legacy jobcontrol.

Return type:list(str)
LogFiles

Get the list of log files associated with a job. May be an empty list.

Return type:list(str)
SubJobs

Return list of subjob job ids.

Return type:list(str)
Commandline
User

Return the username of user who launched the job.

Return type:str
getApplicationHeaderFields(default=None)
Returns:An OrderedDict of essential jobcontrol keyword:value pairs used to standardize application log files.
Parameters:default (any) – Value assigned to a keyword if the corresponding attribute is not defined.

Keywords include: ‘JobId’, ‘Name’, ‘Program’, ‘Host’, ‘Dir’, ‘HostEntry’, ‘Queue’, ‘JobHost’, ‘JobDir’, ‘JobSchrodinger’, ‘Commandline’, and ‘StartTime’.

getApplicationHeaderString(field_sep=' : ')
Returns:A string of essential jobcontrol parameters, in a preferred order, with simple formatting.
Parameters:field_sep (str) – String that delimits the keyword and value.
Note:‘Queue’ only appears in the header string if it is defined in the jobrecord.

Example:

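import schrodinger.job.jobcontrol

# get_backend() returns None when not running under job control.
backend = schrodinger.job.jobcontrol.get_backend()
if backend:
    print(backend.getJob().getApplicationHeaderString())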
getInputFiles()
InputFiles

Return list of files that will be transferred to the local job directory on launch.

Return type:list(str)
JobDB
OrigLaunchDir

Return the launch directory of the oldest ancestor of this job.

OrigLaunchHost

Return the hostname of the oldest ancestor of this job.

getOutputFiles()
OutputFiles

Return a list of possible filenames that are generated as output for this job. This list can grow while the backend is running, as output files are registered by the backend.

Return type:list(str)
getProgressAsPercentage()

Get the value of backend job progress in terms of percentage. Return 0.0 when a job is not yet in running state.

getProgressAsSteps()

Get the value of backend job progress in terms of steps and totalsteps. Return (0,1) when a job is not yet in ‘running’ state.

getProgressAsString()

Get the value of backend job progress in terms of descriptive text. Return “The job has not yet started.” when a job is not yet in running state.

purgeRecord()

Purge the job record for the job from the database.

schrodinger.job.jobcontrol.get_active_jobs()

Returns list of jobs that are not completed.

Returns:list of Job objects
schrodinger.job.jobcontrol.get_jobs_by_program(program_name)

Find jobs with a specific Program attribute.

Parameters:program_name (str) – program name
Return type:list(Job)
Returns:The list of jobs in the jobdb that are associated with the given program. Each item of the list is a Job object.
schrodinger.job.jobcontrol.launch_job(cmd, print_output=False, expandvars=True, launch_dir=None, timeout=None)

Run a process under job control and return a Job object. For a process to be under job control, it must print a valid JobId: line to stdout. If such a line isn’t printed, a RuntimeError will be raised.

The cmd argument should be a list of command arguments (including the executable) as expected by the subprocess module.

If the executable is present in $SCHRODINGER or $SCHRODINGER/utilities, an absolute path does not need to be specified.

NOTE: UI events will be processed while jlaunch is executing.

Parameters:
  • print_output (bool) – Determines if the output from jlaunch is printed to the terminal or not. Output will be logged (to stderr by default) if Python or JobControl debugging is turned on or if there is a launch failure, even if ‘print_output’ is False.
  • expandvars (bool) – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
  • launch_dir – Launch the job from the specified directory.
  • timeout (float or None) – Timeout (in seconds) to be applied while waiting for the job control launch process to start or finish. This allows launch_job() to return sooner if the job is unable to launch. If None, the process will run without a timeout.
Raises:
  • RuntimeError – If there is a problem launching the job (e.g., no JobId gets printed). If running within Maestro, an error dialog will first be shown to the user.
  • OSError – If launch_dir doesn’t exist.
schrodinger.job.jobcontrol.prepend_schrodinger_run(cmd)

Check if a command executes a Python script and prepend $SCHRODINGER/run to the command if it does not already begin with it.

Parameters:cmd (list(str)) – Command to prepend $SCHRODINGER/run to.
schrodinger.job.jobcontrol.fix_cmd(cmd, expandvars=True)

A function to clean up the command passed to launch_job.

Parameters:
  • cmd (list of strings) – A command in a form that can be passed to subprocess.Popen.
  • expandvars (bool) – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
Returns:

The command to be launched
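The expandvars step can be illustrated with the standard library alone. This sketch covers only the variable expansion that the expandvars parameter refers to, not any other cleanup fix_cmd may perform; expand_command and the DEMO_JOBNAME variable are hypothetical names used for the demonstration.

```python
import os


def expand_command(cmd):
    """Expand $var and ${var} in every argument of a command list."""
    return [os.path.expandvars(arg) for arg in cmd]


# Hypothetical environment variable set only for this demonstration.
os.environ["DEMO_JOBNAME"] = "dock1"
print(expand_command(["run", "-JOBNAME", "$DEMO_JOBNAME", "${DEMO_JOBNAME}.mae"]))
```

Undefined variables are left unchanged by os.path.expandvars, so a literal "$" in an argument survives unless it names a defined variable.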

schrodinger.job.jobcontrol.list2jmonitorcmdline(cmdlist)

Turn a command in list form to a single string that can be executed by jmonitor.

schrodinger.job.jobcontrol.input_file_arguments(job_spec, launch_parameters, write_output)

Return a set of file arguments (a list of (option, value) tuples) corresponding to the input files of a given job. If any of the input files are missing, raises an error.

schrodinger.job.jobcontrol.file_arguments_for_launch_command(file_args)

Given a set of “raw” file arguments, return the set of those to be used on an actual command line. If the given set is too long, the arguments will be written to an argfile. (It is the responsibility of the caller to remove that file after use.)

schrodinger.job.jobcontrol.total_file_arguments_length(args)

Determine the total length of the given set of file arguments (which is a list of 2-tuples) as they would be represented on the command line.

schrodinger.job.jobcontrol.write_argfile(file_args)

Write a set of file arguments to a temporary “argfile” (one option-value pair per line) and return the name of that file. (The caller is responsible for removing it.)

Parameters:file_args – A list of (option, value) tuples
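The argfile contract (one option-value pair per line, caller removes the file) might look roughly like the following. This is a sketch under stated assumptions rather than the module's code: write_argfile_sketch is a hypothetical name, the "-in" option and file names are made up, and the single-space separator and ".args" suffix are guesses, since the source does not specify the on-disk format.

```python
import os
import tempfile


def write_argfile_sketch(file_args):
    """Write (option, value) pairs one per line; return the file name."""
    fd, path = tempfile.mkstemp(suffix=".args", text=True)
    with os.fdopen(fd, "w") as handle:
        for option, value in file_args:
            # ASSUMPTION: option and value are separated by a single space.
            handle.write(f"{option} {value}\n")
    return path


path = write_argfile_sketch([("-in", "a.mae"), ("-in", "b.mae")])
try:
    with open(path) as handle:
        lines = handle.read().splitlines()
finally:
    os.remove(path)  # removing the argfile is the caller's responsibility
print(lines)
```

The try/finally mirrors the documented requirement that the caller clean up the temporary file.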
schrodinger.job.jobcontrol.launch_from_job_spec(job_spec, launch_parameters, display_commandline=None)

Launch a job based on its specification.

Parameters:
Returns:

A schrodinger.job.jobcontrol.Job object.

schrodinger.job.jobcontrol.get_backend()

A convenience function to see if we’re running under job control. If so, return a _Backend object. Otherwise, return None.

schrodinger.job.jobcontrol.get_runtime_path(pathname)

Return the runtime path for the input file ‘pathname’.

If the pathname is of a type that job control will not copy to the job directory or no runtime file can be found, returns the original path name.

schrodinger.job.jobcontrol.under_job_control()

Returns True if this process is running under job control; False otherwise.

class schrodinger.job.jobcontrol.Host(name)

Bases: object

A class to encapsulate host info from the schrodinger.hosts file.

Use the module level functions get_host or get_hosts to create Host instances.

Variables:
  • name – Label for the Host.
  • user – Username by which to run jobs.
  • processors – Number of processors for the host/cluster.
  • tmpdir – Temporary/scratch directory to use for jobs. List.
  • schrodinger – $SCHRODINGER installation to use for jobs.
  • env – Variables to set in the job environment. List.
  • gpgpu – GPGPU entries. List.
  • queue – Queue entries only. Queue type (e.g., SGE, PBS).
  • qargs – Queue entries only. Optional arguments passed to the queue submission command.
__init__(name)

Create a named Host object. The various host attributes must be set after object instantiation.

Only host-entry fields can be public attributes of a Host object. Attributes introduced to capture other information about the entry must be private (named with a leading underscore.)

to_hostentry()

Return a string representation of the Host object suitable for including in a hosts file.

getHost()

Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.

setHost(host)

Store host as _host to allow us to use a property for the ‘host’ attr.

host

Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.

isQueue()

Check to see whether the host represents a batch queue. Returns True if the host is a traditional queue or a grid host.

schrodinger.job.jobcontrol.get_hostfile()

Return the name of the schrodinger.hosts file last used by get_hosts(). The file is found using the standard search path ($SCHRODINGER_HOSTS, local dir, $HOME/.schrodinger, $SCHRODINGER).

schrodinger.job.jobcontrol.hostfile_is_empty(host_filepath)

Return whether the hosts file at host_filepath is empty, meaning it contains only the localhost entry. If host_filepath is an empty string or an invalid path, this function raises an IOError.

Parameters:host_filepath (str) – schrodinger.hosts file to use.
Returns:bool
schrodinger.job.jobcontrol.get_installed_hostfiles(root_dir='')

Return the pathname for the schrodinger.hosts file installed in the most recent previous installation directory we can find.

If a root pathname is passed in, previous installations are searched for there. Otherwise, we look in the standard install locations.

schrodinger.job.jobcontrol.get_hosts()

Return a list of all Hosts in the schrodinger.hosts file. After this is called, get_hostfile() will return the pathname for the schrodinger.hosts file that was used. Raises UnreadableHostsFileException or MissingHostsFileException on error.
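A common use of get_hosts() is to pick out the queue entries. The sketch below relies only on the documented name attribute and isQueue() method; queue_host_names, load_queue_host_names, and _StubHost are hypothetical names, and returning an empty list on a missing or unreadable hosts file is a design choice made for the example.

```python
def queue_host_names(hosts):
    """Return the names of Host-like objects that represent batch queues."""
    return [host.name for host in hosts if host.isQueue()]


def load_queue_host_names():
    """Read schrodinger.hosts via get_hosts(); needs a Schrodinger installation."""
    from schrodinger.job import jobcontrol
    try:
        return queue_host_names(jobcontrol.get_hosts())
    except (jobcontrol.MissingHostsFileException,
            jobcontrol.UnreadableHostsFileException):
        return []  # assumption: treat a missing or unreadable file as no queues


class _StubHost:
    """Hypothetical stand-in exposing only `name` and `isQueue()`."""

    def __init__(self, name, queue=None):
        self.name = name
        self.queue = queue

    def isQueue(self):
        return self.queue is not None


demo_hosts = [_StubHost("localhost"), _StubHost("cluster_a", queue="SGE")]
print(queue_host_names(demo_hosts))
```

Only queue_host_names is exercised here; load_queue_host_names shows where the two documented exceptions would be handled.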

schrodinger.job.jobcontrol.hostfile_is_valid(fname)
Parameters:fname (str) – The full path of the host file to validate
Returns:a (bool, str) tuple indicating whether the host file is valid
Return type:tuple
schrodinger.job.jobcontrol.get_host(name)

Return a Host object for the named host. If the host is not found, we return a Host object with the provided name and details that match localhost. This matches behavior that jobcontrol uses. Raises UnreadableHostsFileException or MissingHostsFileException on error.

schrodinger.job.jobcontrol.host_str_to_list(hosts_str)

Convert a hosts string (Ex: “galina:1 monica:4”) to a list of tuples. The first value of each tuple is the host; the second value is the number of CPUs.

schrodinger.job.jobcontrol.host_list_to_str(host_list)

Converts a hosts list (Ex: [ (‘host1’,1), (‘host2’, 10) ] ) to a string. Output example: “host1:1,host2:10”
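The two conversions can be illustrated with a small round trip. This is a re-implementation sketch, not the module's code: parse_hosts and format_hosts are hypothetical names, and two behaviors are assumptions not stated in the source, namely accepting commas as well as spaces between entries and using None when an entry has no CPU count.

```python
def parse_hosts(hosts_str):
    """Turn 'galina:1 monica:4' into [('galina', 1), ('monica', 4)]."""
    hosts = []
    # ASSUMPTION: entries may be separated by spaces or commas.
    for entry in hosts_str.replace(",", " ").split():
        name, _, count = entry.partition(":")
        # ASSUMPTION: a missing CPU count is represented as None.
        hosts.append((name, int(count) if count else None))
    return hosts


def format_hosts(host_list):
    """Turn [('host1', 1), ('host2', 10)] into 'host1:1,host2:10'."""
    return ",".join(
        name if ncpu is None else f"{name}:{ncpu}" for name, ncpu in host_list
    )


parsed = parse_hosts("galina:1 monica:4")
print(parsed)
print(format_hosts(parsed))
```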

schrodinger.job.jobcontrol.get_command_line_host_list()

Return a list of (host, ncpu) tuples corresponding to the host list that is specified on the command line.

This function is meant to be called by scripts that are running under a toplevel job control script but are not running under jlaunch.

The host list is determined from the following sources:
  1. SCHRODINGER_NODELIST
  2. JOBHOST (if only a single host is specified)
  3. “localhost” (if no host is specified)

If no SCHRODINGER_NODELIST is present in the environment, None is returned.

schrodinger.job.jobcontrol.get_backend_host_list()

Return a list of (host, ncpu) tuples corresponding to the host list as determined from the SCHRODINGER_NODEFILE.

This function is meant to be called from scripts that are running under jlaunch (i.e. backend scripts).

Returns None if SCHRODINGER_NODEFILE is not present in the environment.

schrodinger.job.jobcontrol.calculate_njobs(host_list=None)

Derive the number of jobs from the specified host list. This function is useful for determining the number of subjobs if the user didn’t specify the ‘-NJOBS’ option.

Parameters:host_list (str or list(tuple)) – Either a string of hosts with an optional number of subjobs, as passed to -HOST (e.g., my_cluster:20), or a list of host tuples, typically with one element (e.g., [(my_cluster, 20)])

If host list is not specified then it uses get_command_line_host_list() to determine njobs, else uses the user provided host list.

schrodinger.job.jobcontrol.is_valid_hostname(hostname)

Checks if the hostname is valid.

Parameters:hostname (str) – host name
schrodinger.job.jobcontrol.get_jobname(filename=None)

Figure out the jobname from the first available source: 1) the SCHRODINGER_JOBNAME environment variable (comes from -JOBNAME during startup); 2) the job control backend; 3) the basename of a given filename.

Parameters:filename (str) – if provided, and the jobname can’t otherwise be determined, (e.g., running outside job control with no -FILENAME argument), construct a jobname from its basename.
Returns:jobname (may be None if filename was not provided)
Return type:str
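The lookup order described above can be sketched as follows. This is an illustration rather than the module's code: resolve_jobname and its backend and environ parameters are hypothetical, the backend is assumed to expose getJob().Name as in the Job documentation, and stripping the file extension from the basename is an assumption the source does not confirm.

```python
import os


def resolve_jobname(filename=None, backend=None, environ=None):
    """Return a jobname from the environment, the backend, or a filename."""
    environ = os.environ if environ is None else environ
    # 1) SCHRODINGER_JOBNAME, which comes from -JOBNAME during startup.
    name = environ.get("SCHRODINGER_JOBNAME")
    if name:
        return name
    # 2) The job control backend, if one was supplied.
    if backend is not None:
        name = backend.getJob().Name
        if name:
            return name
    # 3) The basename of the given file.
    if filename:
        # ASSUMPTION: the extension is dropped when building the jobname.
        return os.path.splitext(os.path.basename(filename))[0]
    return None


print(resolve_jobname("/tmp/ligands.mae", environ={}))
print(resolve_jobname("x.mae", environ={"SCHRODINGER_JOBNAME": "dock"}))
```

Passing environ explicitly keeps the example independent of the caller's real environment.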
schrodinger.job.jobcontrol.register_job_output(job)

Registers the output and log files associated with the given job to the backend if running under jobcontrol.

Parameters:job (jobcontrol.Job) – job from which to extract output/log files