schrodinger.job.jobcontrol module¶
Core job control for Python.
This module currently has four major sections: “Job database,” “Job launching,” “Job backend,” and “Job hosts.” The job database section deals with getting information about existing jobs, the job launching section deals with starting a subjob, the job backend section provides utilities for a Python script running as a job, and the job hosts section deals with the host entries in the schrodinger.hosts file.
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.job.jobcontrol.
DisplayStatus
¶ Bases:
enum.Enum
An enumeration.
-
WAITING
= 'Waiting'¶
-
RUNNING
= 'Running'¶
-
CANCELED
= 'Canceled'¶
-
STOPPED
= 'Stopped'¶
-
FAILED
= 'Failed'¶
-
COMPLETED
= 'Completed'¶
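The members listed above behave like any other enum.Enum values. The following sketch uses a local stand-in class that only mirrors the documented names and values (it is not the real schrodinger class), and the helper parse_display_status is a hypothetical name introduced for illustration.

```python
from enum import Enum


class DisplayStatus(Enum):
    """Local stand-in that mirrors the documented members."""
    WAITING = 'Waiting'
    RUNNING = 'Running'
    CANCELED = 'Canceled'
    STOPPED = 'Stopped'
    FAILED = 'Failed'
    COMPLETED = 'Completed'


def parse_display_status(text):
    """Return the member whose value equals text, or None if there is none."""
    try:
        return DisplayStatus(text)  # lookup by value is case-sensitive
    except ValueError:
        return None
```

Comparing members with "is" (for example, status is DisplayStatus.RUNNING) avoids string comparisons against the display text.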
-
-
schrodinger.job.jobcontrol.
jobhub
()¶
-
schrodinger.job.jobcontrol.
timestamp
(msg)¶
-
exception
schrodinger.job.jobcontrol.
JobcontrolException
¶ Bases:
Exception
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.job.jobcontrol.
JobLaunchFailure
¶ Bases:
schrodinger.job.jobcontrol.JobcontrolException
,RuntimeError
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.job.jobcontrol.
MissingFrontendException
¶ Bases:
schrodinger.job.jobcontrol.JobcontrolException
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.job.jobcontrol.
MissingHostsFileException
¶ Bases:
schrodinger.job.jobcontrol.JobcontrolException
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.job.jobcontrol.
UnreadableHostsFileException
¶ Bases:
schrodinger.job.jobcontrol.JobcontrolException
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
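The Bases entries above imply that a single except clause for JobcontrolException catches all of these exceptions, and that JobLaunchFailure can also be caught as a RuntimeError. The sketch below reproduces only the documented inheritance with local stand-in classes; the classify helper and its labels are hypothetical.

```python
class JobcontrolException(Exception):
    """Stand-in for the documented base exception."""


class JobLaunchFailure(JobcontrolException, RuntimeError):
    """Stand-in: documented bases are JobcontrolException and RuntimeError."""


class MissingHostsFileException(JobcontrolException):
    """Stand-in for the documented exception."""


class UnreadableHostsFileException(JobcontrolException):
    """Stand-in for the documented exception."""


def classify(exc):
    """Map an exception instance to a short label (labels are illustrative)."""
    if isinstance(exc, JobLaunchFailure):
        return "launch failure"
    if isinstance(exc, (MissingHostsFileException,
                        UnreadableHostsFileException)):
        return "hosts file problem"
    if isinstance(exc, JobcontrolException):
        return "other job control error"
    return "unrelated"
```

Checking the most specific classes first matters here, because every class in the sketch is also a JobcontrolException.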
-
-
class
schrodinger.job.jobcontrol.
Job
(job_id, cpp_job=None)¶ Bases:
object
A class to access a specific record in the job database.
A Job instance is always a snapshot of the job record at a specific point in time. It is only updated when the readAgain method is explicitly invoked.
-
__init__
(job_id, cpp_job=None)¶ Initialize a read-only Job object.
Parameters: job_id (str) – Unique identifier for a job
-
readAgain
()¶ Reread the database. Calling this routine is necessary to get fresh values.
-
isComplete
()¶ Returns True if the job is complete.
This method uses native mmjob logic to determine whether the job is complete.
-
isQueued
()¶ Returns True if the job is a batch queue job.
-
succeeded
()¶ Returns False if the job was killed, died or fizzled. Returns True if ExitStatus is finished.
Raises an exception if the job isn’t completed, so use isComplete() before calling.
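Because a Job is a snapshot and succeeded() raises for unfinished jobs, a safe calling order is readAgain(), then isComplete(), then succeeded(). The sketch below applies that order to any object exposing those three documented methods; job_outcome and FakeJob are hypothetical names, and FakeJob is only a test double, not the real class.

```python
def job_outcome(job):
    """Return 'running', 'succeeded' or 'failed' for a Job-like object."""
    job.readAgain()           # refresh the snapshot before inspecting it
    if not job.isComplete():  # succeeded() would raise on an unfinished job
        return "running"
    return "succeeded" if job.succeeded() else "failed"


class FakeJob:
    """Hypothetical test double exposing the three documented methods."""

    def __init__(self, complete, ok):
        self._complete = complete
        self._ok = ok
        self.reads = 0

    def readAgain(self):
        self.reads += 1

    def isComplete(self):
        return self._complete

    def succeeded(self):
        if not self._complete:
            raise RuntimeError("job is not complete")
        return self._ok
```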
-
setStatusIncorporated
()¶ Set the status of the job to “incorporated” if the job has completed.
-
isStatusIncorporated
()¶ Check whether the job status is incorporated. The status must be retrieved via jobhub because, when the JOB_SERVER feature flag is on, the job’s incorporation status is stored in the project rather than in the job record.
Returns: True if the job is already incorporated Return type: bool
-
isIncorporatable
()¶ Check whether the job is incorporatable. The status must be retrieved via jobhub because, when the JOB_SERVER feature flag is on, the job’s incorporation status is stored in the project rather than in the job record. A job that is already incorporated is not incorporatable, and neither is a job whose disposition setting is ignore.
-
wait_before_kill
()¶
-
kill
()¶ Kill the job if it is running.
-
kill_for_smart_distribution
()¶ Kill the job for smart distribution if it is running.
-
wait
(max_interval=60, throw_on_failure=False)¶ Wait for the job to complete, sleeping up to ‘max_interval’ seconds between database checks. (The interval increases gradually from 2 seconds up to the maximum.)
NOTE: Do not use if your program is running in Maestro, as this will make Maestro unresponsive while the job is running.
Parameters: throw_on_failure (bool) – whether to raise an exception if the job did not succeed Raises: RuntimeError – if the job did not succeed. The error message will contain the last 20 lines of the job’s logfile (if available).
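The polling behavior described for wait() can be sketched as a loop whose sleep interval grows from 2 seconds up to max_interval. The growth schedule is not documented, so doubling is an assumption here; poll_intervals and wait_until are hypothetical names, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time


def poll_intervals(max_interval=60, start=2.0, factor=2.0):
    """Yield sleep intervals growing from start up to max_interval.

    Doubling (factor=2.0) is an assumption; only the 2 second start and
    the maximum come from the documentation.
    """
    interval = min(start, max_interval)
    while True:
        yield interval
        interval = min(interval * factor, max_interval)


def wait_until(is_done, max_interval=60, sleep=time.sleep):
    """Poll is_done() until it returns True; return the intervals slept."""
    slept = []
    for interval in poll_intervals(max_interval):
        if is_done():
            return slept
        sleep(interval)
        slept.append(interval)
```

With a real Job, is_done could be a function that calls readAgain() and then returns isComplete().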
-
download
()¶ Download the output of the job into job’s launch directory. No-op in legacy jobcontrol.
-
get
(attr, default=None)¶ This function will always raise an error, but is provided to guide users to a new syntax.
-
summary
()¶ Return a string summarizing all current Job attributes.
-
getDuration
()¶ Returns the wallclock running time of the job, in seconds, if it is complete. This does not include time spent in submission status. If the job is not complete, returns None.
Return type: int or None
-
BatchId
¶ Return the batch id, if running on an HPC queueing system. Otherwise return None.
Return type: str or None
-
Dir
¶ Return the absolute path of the launch directory.
Return type: str
-
ExitCode
¶
-
Host
¶ Return the hostname of the host which launched this job.
Return type: str
-
HostEntry
¶ Return the name of the host entry this job was launched to.
Return type: str
-
LaunchTime
¶ Return a string timestamp for the time that the job was launched. This will be before the job starts running, as soon as it is registered with jobcontrol as a job to be run.
Return type: str
-
JobId
¶ Return an identifier for a job.
Return type: str
-
Name
¶ Returns a string representing -JOBNAME that was specified on launch. This may be an empty string.
Return type: str
-
ParentJobId
¶ Return the jobid of a parent job. If the job does not have a parent, raise an AttributeError.
-
Processors
¶ For a batch job, returns the number of queue slots attached to this job. For a local job, return the number of CPU cores allowed to be used.
-
Program
¶ Return descriptive text for the name of the program running this job, e.g. Jaguar. This field is optional and may return an empty string.
Return type: str
-
Project
¶ Return the job’s project name field. This will be an empty string if no project is set.
Return type: str
-
QueueHost
¶ Return the hostname of the submission node of a HPC queueing system
Return type: str
-
StructureOutputFile
¶ Return the name of the file returned by the job that will get incorporated into a project of Maestro. Returns an empty string if no file is specified.
Return type: str
-
DisplayStatus
¶ Return a user-focused status that indicates the current state of the job.
Return type: DisplayStatus
-
StatusChangeReason
¶ Returns a human-readable reason that a job entered its current state, such as “job canceled by the user.” If the reason was not recorded or is not particularly interesting (e.g. normal transition from waiting to running) it may be the empty string.
Return type: str
-
Status
¶ Get the Status of the job.
Return type: str
-
StartTime
¶ Return a string for the starting time of the job. If the job has not started yet (for example, if it is still in the queue), accessing this attribute raises an AttributeError.
Return type: str
-
StopTime
¶ Return a string for the completion time of a job. If the job has not finished yet, accessing this attribute raises an AttributeError.
Return type: str
-
Viewname
¶ Return a representation of name used to filter jobs in maestro. May be empty.
Return type: str
-
ExitStatus
¶ Get the ExitStatus of the job.
-
JobDir
¶ Return the directory where the job is run. This will be an empty string if the job has not yet started.
Return type: str
-
JobHost
¶ Return the hostname where the job is run. This will be an empty string if the job has not yet started.
Return type: str
-
JobSchrodinger
¶ Get the JobSchrodinger of the job.
-
Envs
¶ Return a list of environment variables that are set by the job, in addition to the default environment on a machine, in the format [“SPECIAL_VAR=0”, “SPECIAL_VAR2=yes”].
Return type: list(str)
-
Errors
¶ Return possible error messages associated with a job. This will only return values in legacy jobcontrol.
Return type: list(str)
-
LogFiles
¶ Get the list of log files associated with a job. May be an empty list.
Return type: list(str)
-
SubJobs
¶ Return list of subjob job ids.
Return type: list(str)
-
Commandline
¶
-
User
¶ Return the username of user who launched the job.
Return type: str
-
getApplicationHeaderFields
(default=None)¶ Returns: An OrderedDict of essential jobcontrol keyword:value pairs used to standardize application log files. Parameters: default (any) – Value assigned to a keyword if the corresponding attribute is not defined. Keywords include: ‘JobId’, ‘Name’, ‘Program’, ‘Host’, ‘Dir’, ‘HostEntry’, ‘Queue’, ‘JobHost’, ‘JobDir’, ‘JobSchrodinger’, ‘Commandline’, and ‘StartTime’.
-
getApplicationHeaderString
(field_sep=' : ')¶ Returns: A string of essential jobcontrol parameters, in a preferred order, with simple formatting. Parameters: field_sep (str) – String that delimits the keyword and value. Note: ‘Queue’ only appears in the header string if it is defined in the jobrecord. Example:
import schrodinger.job.jobcontrol
backend = schrodinger.job.jobcontrol.get_backend()
if backend:
    print(backend.getJob().getApplicationHeaderString())
-
getInputFiles
()¶
-
InputFiles
¶ Return list of files that will be transferred to the local job directory on launch.
Return type: list(str)
-
JobDB
¶
-
OrigLaunchDir
¶ Return the launch directory of the oldest ancestor of this job.
-
OrigLaunchHost
¶ Return the hostname of the oldest ancestor of this job.
-
getOutputFiles
()¶
-
OutputFiles
¶ Return a list of possible filenames that are generated as output for this job. This list can grow while the backend is running, as output files are registered by the backend.
Return type: list(str)
-
getProgressAsPercentage
()¶ Get the value of backend job progress in terms of percentage. Return 0.0 when a job is not yet in running state.
-
getProgressAsSteps
()¶ Get the value of backend job progress in terms of steps and totalsteps. Return (0,1) when a job is not yet in ‘running’ state.
-
getProgressAsString
()¶ Get the value of backend job progress in terms of descriptive text. Return “The job has not yet started.” when a job is not yet in running state.
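The (0, 1) pair returned by getProgressAsSteps() for a job that is not yet running keeps a steps-to-percentage conversion free of division by zero. The sketch below shows one such conversion; steps_to_percentage is a hypothetical helper, and whether getProgressAsPercentage() is computed this way internally is an assumption.

```python
def steps_to_percentage(steps, total_steps):
    """Convert a (steps, total_steps) pair to a percentage in [0, 100]."""
    if total_steps <= 0:
        # Defensive guard; the documented default pair is (0, 1).
        return 0.0
    return min(100.0, 100.0 * steps / total_steps)
```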
-
purgeRecord
()¶ Purge the job record for the job from the database.
-
-
schrodinger.job.jobcontrol.
get_active_jobs
()¶ Returns list of jobs that are not completed.
Returns: list of Job objects
-
schrodinger.job.jobcontrol.
get_jobs_by_program
(program_name)¶ Find jobs with a specific Program attribute.
Parameters: program_name (str) – program name Return type: list(Job) Returns: The list of jobs in the jobdb that are associated with the given program. Each item of the list is a Job
object.
-
schrodinger.job.jobcontrol.
launch_job
(cmd, print_output=False, expandvars=True, launch_dir=None, timeout=None)¶ Run a process under job control and return a Job object. For a process to be under job control, it must print a valid JobId: line to stdout. If such a line isn’t printed, a RuntimeError will be raised.
The cmd argument should be a list of command arguments (including the executable) as expected by the subprocess module.
If the executable is present in $SCHRODINGER or $SCHRODINGER/utilities, an absolute path does not need to be specified.
NOTE: UI events will be processed while jlaunch is executing.
Parameters: - print_output (bool) – Determines if the output from jlaunch is printed to the terminal or not. Output will be logged (to stderr by default) if Python or JobControl debugging is turned on or if there is a launch failure, even if ‘print_output’ is False.
- expandvars (bool) – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
- launch_dir – Launch the job from the specified directory.
- timeout (float or None) – Timeout (in seconds) to be applied while waiting for the job control launch process to start or finish. This allows launch_job() to return sooner if the job is unable to launch. If None, the process will run without a timeout.
Raises: - RuntimeError – If there is a problem launching the job (e.g., no JobId gets printed). If running within Maestro, an error dialog will first be shown to the user.
- OSError – If launch_dir doesn’t exist.
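The requirement that a launched process print a JobId: line can be illustrated by scanning captured stdout for such a line and raising RuntimeError when none is found. This is only a sketch: the exact line format accepted by launch_job is not documented here, so the pattern below (the literal "JobId:" followed by one whitespace-free token) is an assumption, and extract_job_id is a hypothetical helper.

```python
import re

# Assumed format: "JobId: <token>" on a line of its own.
JOBID_LINE = re.compile(r"^JobId:\s*(\S+)\s*$", re.MULTILINE)


def extract_job_id(stdout_text):
    """Return the job id from launcher output or raise RuntimeError."""
    match = JOBID_LINE.search(stdout_text)
    if match is None:
        raise RuntimeError("no JobId: line found in launcher output")
    return match.group(1)
```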
-
schrodinger.job.jobcontrol.
prepend_schrodinger_run
(cmd)¶ Check if a command executes a Python script and prepend $SCHRODINGER/run to the command if it does not already begin with it.
Parameters: cmd (list(str)) – Command to prepend $SCHRODINGER/run to.
-
schrodinger.job.jobcontrol.
fix_cmd
(cmd, expandvars=True)¶ A function to clean up the command passed to launch_job.
Parameters: - cmd (list of strings) – A command in a form that can be passed to subprocess.Popen.
- expandvars (bool) – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
Returns: The command to be launched
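The expandvars option relies on os.path.expandvars, which substitutes $var and ${var} and leaves references to undefined variables unchanged. A minimal sketch of applying it to a command list follows; expand_command is a hypothetical helper that covers only this one aspect of fix_cmd, not its full behavior.

```python
import os


def expand_command(cmd, expandvars=True):
    """Return a copy of cmd with environment variables expanded per item."""
    if not expandvars:
        return list(cmd)
    return [os.path.expandvars(arg) for arg in cmd]
```

Expanding each list item separately keeps the argument boundaries intact, which matters when a value contains spaces.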
-
schrodinger.job.jobcontrol.
list2jmonitorcmdline
(cmdlist)¶ Turn a command in list form to a single string that can be executed by jmonitor.
-
schrodinger.job.jobcontrol.
input_file_arguments
(job_spec, launch_parameters, write_output)¶ Return a set of file arguments (a list of (option, value) tuples) corresponding to the input files of a given job. If any of the input files are missing, raises an error.
-
schrodinger.job.jobcontrol.
file_arguments_for_launch_command
(file_args)¶ Given a set of “raw” file arguments, return the set of those to be used on an actual command line. If the given set is too long, the arguments will be written to an argfile. (It is the responsibility of the caller to remove that file after use.)
-
schrodinger.job.jobcontrol.
total_file_arguments_length
(args)¶ Determine the total length of the given set of file arguments (which is a list of 2-tuples) as they would be represented on the command line.
-
schrodinger.job.jobcontrol.
write_argfile
(file_args)¶ Write a set of file arguments to a temporary “argfile” (one option-value pair per line) and return the name of that file. (The caller is responsible for removing it.)
Parameters: file_args – A list of (option, value) tuples
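The three helpers above describe one pattern: measure the (option, value) pairs as they would appear on a command line, and fall back to an argfile when they are too long. The sketch below illustrates that pattern with stdlib calls only. The length limit, the "-argfile" option name, the space-separated line format, and all function names are assumptions made for illustration, not the module's actual values.

```python
import os
import tempfile


def total_length(file_args):
    """Characters needed for all pairs, one trailing space per token."""
    return sum(len(opt) + 1 + len(val) + 1 for opt, val in file_args)


def write_argfile_sketch(file_args):
    """Write one 'option value' pair per line and return the file name."""
    fd, path = tempfile.mkstemp(suffix=".args", text=True)
    with os.fdopen(fd, "w") as handle:
        for opt, val in file_args:
            handle.write(f"{opt} {val}\n")
    return path  # the caller is responsible for removing this file


def args_for_command_line(file_args, limit=1000):
    """Return (tokens, argfile_path); the path is None if the args fit."""
    if total_length(file_args) <= limit:
        return [tok for pair in file_args for tok in pair], None
    path = write_argfile_sketch(file_args)
    return ["-argfile", path], path  # hypothetical option name
```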
-
schrodinger.job.jobcontrol.
launch_from_job_spec
(job_spec, launch_parameters, display_commandline=None)¶ Launch a job based on its specification.
Parameters: - job_spec (schrodinger.job.launchapi.JobSpecification) – Data defining the job.
- launch_parameters (schrodinger.job.launchparams.LaunchParameters) – Data defining how the job is run
- display_commandline (str) – commandline attribute of the resulting job. Most cases will require this value to be specified; it is optional only to make it easier to refactor out in the future.
Returns: A schrodinger.job.jobcontrol.Job object.
-
schrodinger.job.jobcontrol.
get_backend
()¶ A convenience function to see if we’re running under job control. If so, return a _Backend object. Otherwise, return None.
-
schrodinger.job.jobcontrol.
get_runtime_path
(pathname)¶ Return the runtime path for the input file ‘pathname’.
If the pathname is of a type that job control will not copy to the job directory or no runtime file can be found, returns the original path name.
-
schrodinger.job.jobcontrol.
under_job_control
()¶ Returns True if this process is running under job control; False otherwise.
-
class
schrodinger.job.jobcontrol.
Host
(name)¶ Bases:
object
A class to encapsulate host info from the schrodinger.hosts file.
Use the module level functions get_host or get_hosts to create Host instances.
Variables: - name – Label for the Host.
- user – Username by which to run jobs.
- processors – Number of processors for the host/cluster.
- tmpdir – Temporary/scratch directory to use for jobs. List.
- schrodinger – $SCHRODINGER installation to use for jobs.
- env – Variables to set in the job environment. List.
- gpgpu – GPGPU entries. List.
- queue – Queue entries only. Queue type (e.g., SGE, PBS).
- qargs – Queue entries only. Optional arguments passed to the queue submission command.
-
__init__
(name)¶ Create a named Host object. The various host attributes must be set after object instantiation.
Only host-entry fields can be public attributes of a Host object. Attributes introduced to capture other information about the entry must be private (named with a leading underscore).
-
to_hostentry
()¶ Return a string representation of the Host object suitable for including in a hosts file.
-
getHost
()¶ Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.
-
setHost
(host)¶ Store host as _host to allow us to use a property for the ‘host’ attr.
-
host
¶ Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.
-
isQueue
()¶ Check to see whether the host represents a batch queue. Returns True if the host is a traditional queue or a grid host.
-
schrodinger.job.jobcontrol.
get_hostfile
()¶ Return the name of the schrodinger.hosts file last used by get_hosts(). The file is found using the standard search path ($SCHRODINGER_HOSTS, local dir, $HOME/.schrodinger, $SCHRODINGER).
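The standard search path can be sketched as an ordered list of candidate locations, returning the first one that exists. This is an illustration only: treating $SCHRODINGER_HOSTS as a full file path (rather than a directory) is an assumption, and find_hosts_file is a hypothetical helper, not part of the module.

```python
import os


def find_hosts_file(env=None, cwd=None, filename="schrodinger.hosts"):
    """Return the first existing hosts file on the search path, or None."""
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd
    candidates = []
    if env.get("SCHRODINGER_HOSTS"):
        # Assumed to name the hosts file itself.
        candidates.append(env["SCHRODINGER_HOSTS"])
    candidates.append(os.path.join(cwd, filename))
    if env.get("HOME"):
        candidates.append(os.path.join(env["HOME"], ".schrodinger", filename))
    if env.get("SCHRODINGER"):
        candidates.append(os.path.join(env["SCHRODINGER"], filename))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```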
-
schrodinger.job.jobcontrol.
hostfile_is_empty
(host_filepath)¶ Return whether the given hosts file is empty, meaning it contains only the localhost entry. If host_filepath is an empty string or an invalid path, this function raises an IOError.
Parameters: host_filepath (str) – schrodinger.hosts file to use. Returns: bool
-
schrodinger.job.jobcontrol.
get_installed_hostfiles
(root_dir='')¶ Return the pathname for the schrodinger.hosts file installed in the most recent previous installation directory we can find.
If a root pathname is passed in, previous installations are searched for there. Otherwise, we look in the standard install locations.
-
schrodinger.job.jobcontrol.
get_hosts
()¶ Return a list of all Hosts in the schrodinger.hosts file. After this is called, get_hostfile() will return the pathname for the schrodinger.hosts file that was used. Raises UnreadableHostsFileException or MissingHostsFileException on error.
-
schrodinger.job.jobcontrol.
hostfile_is_valid
(fname)¶ Parameters: fname (str) – The full path of the host file to validate Returns: a (bool, str) tuple indicating whether the host file is valid Return type: tuple
-
schrodinger.job.jobcontrol.
get_host
(name)¶ Return a Host object for the named host. If the host is not found, we return a Host object with the provided name and details that match localhost. This matches behavior that jobcontrol uses. Raises UnreadableHostsFileException or MissingHostsFileException on error.
-
schrodinger.job.jobcontrol.
host_str_to_list
(hosts_str)¶ Convert a hosts string (Ex: “galina:1 monica:4”) to a list of tuples. First value of each tuple is the host, second value is # of cpus.
-
schrodinger.job.jobcontrol.
host_list_to_str
(host_list)¶ Converts a hosts list (Ex: [ (‘host1’,1), (‘host2’, 10) ] ) to a string. Output example: “host1:1,host2:10”
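The two conversions can be illustrated with stand-alone functions that reproduce the documented examples. These are sketches, not the module's implementation: accepting both whitespace and commas as separators, and using None for a host given without a count, are assumptions, and the "_sketch" function names are hypothetical.

```python
import re


def host_str_to_list_sketch(hosts_str):
    """Parse 'galina:1 monica:4' into [('galina', 1), ('monica', 4)]."""
    hosts = []
    for token in re.split(r"[\s,]+", hosts_str.strip()):
        if not token:
            continue
        name, _sep, count = token.partition(":")
        # None for a missing count is an assumption.
        hosts.append((name, int(count) if count else None))
    return hosts


def host_list_to_str_sketch(host_list):
    """Format [('host1', 1), ('host2', 10)] as 'host1:1,host2:10'."""
    return ",".join(
        name if ncpu is None else f"{name}:{ncpu}"
        for name, ncpu in host_list
    )
```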
-
schrodinger.job.jobcontrol.
get_command_line_host_list
()¶ Return a list of (host, ncpu) tuples corresponding to the host list that is specified on the command line.
This function is meant to be called by scripts that are running under a toplevel job control script but are not running under jlaunch.
- The host list is determined from the following sources:
- SCHRODINGER_NODELIST
- JOBHOST (if only a single host is specified)
- “localhost” (if no host is specified)
If no SCHRODINGER_NODELIST is present in the environment, None is returned.
-
schrodinger.job.jobcontrol.
get_backend_host_list
()¶ Return a list of (host, ncpu) tuples corresponding to the host list as determined from the SCHRODINGER_NODEFILE.
This function is meant to be called from scripts that are running under jlaunch (i.e. backend scripts).
Returns None if SCHRODINGER_NODEFILE is not present in the environment.
-
schrodinger.job.jobcontrol.
calculate_njobs
(host_list=None)¶ Derive the number of jobs from the specified host list. This function is useful for determining the number of subjobs if the user didn’t specify the ‘-NJOBS’ option.
Parameters: host_list (str or list(tuple)) – Either a string of hosts with an optional number of subjobs, as passed to -HOST (e.g. my_cluster:20), or a list of tuples of hosts, typically with one element (e.g. [(my_cluster, 20)]). If no host list is specified, get_command_line_host_list() is used to determine njobs; otherwise the user-provided host list is used.
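One plausible reading of "derive the number of jobs from the host list" is to sum the per-host counts. The sketch below does exactly that for a list of (host, count) tuples; summing, and counting a host without an explicit count as 1, are both assumptions, and njobs_from_hosts is a hypothetical helper.

```python
def njobs_from_hosts(host_list):
    """Sum the per-host counts, treating a missing count (None) as 1."""
    return sum(1 if ncpu is None else ncpu for _host, ncpu in host_list)
```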
-
schrodinger.job.jobcontrol.
is_valid_hostname
(hostname)¶ Checks if the hostname is valid.
Parameters: - hostname – host name
- type – string
-
schrodinger.job.jobcontrol.
get_jobname
(filename=None)¶ Figure out the jobname from the first available source: 1) the SCHRODINGER_JOBNAME environment variable (comes from -JOBNAME during startup); 2) the job control backend; 3) the basename of a given filename.
Parameters: filename (str) – if provided, and the jobname can’t otherwise be determined, (e.g., running outside job control with no -FILENAME argument), construct a jobname from its basename. Returns: jobname (may be None if filename was not provided) Return type: str
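The documented precedence (environment variable, then backend, then file basename) can be sketched as a simple fall-through. In this illustration the backend's jobname is passed in as a plain argument rather than queried from job control, and stripping the file extension from the basename is an assumption; resolve_jobname is a hypothetical helper.

```python
import os


def resolve_jobname(filename=None, backend_jobname=None, env=None):
    """Return the first available jobname, or None if no source has one."""
    env = os.environ if env is None else env
    name = env.get("SCHRODINGER_JOBNAME")
    if name:
        return name
    if backend_jobname:
        return backend_jobname
    if filename:
        # Dropping the extension is an assumption about "basename".
        stem, _ext = os.path.splitext(os.path.basename(filename))
        return stem
    return None
```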
-
schrodinger.job.jobcontrol.
register_job_output
(job)¶ Registers the output and log files associated with the given job to the backend if running under jobcontrol.
Parameters: job (jobcontrol.Job) – job from which to extract output/log files