schrodinger.test.stu.run module¶
Contains the TestJCJob, TestSPJob, TestQueue, and Runner classes.
TestJCJob is a test utility-specific subclass of queue.JobControlJob. TestSPJob is a test utility-specific subclass of queue.SubprocessJob. These classes allow the test utility to track information that we care about as a job executes (for instance, job duration and test_id). TestQueue is a subclass of queue.JobDJ that allows further control over reporting.
Runner controls all job running parameters. It is also responsible for actually running the jobs and requesting their workups. The meat is in Runner.__call__.
@copyright: Schrodinger, Inc. All rights reserved.
@todo: Merge TestQueue and Runner (QA-604)
-
exception
schrodinger.test.stu.run.
JobDJError
[source]¶ Bases:
RuntimeError
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.test.stu.run.
TestJob
[source]¶ Bases:
object
Interface for the TestJCJob and TestSPJob classes
-
property
duration
¶
-
property
exit_status
¶ Returns an exit status from the list of available Jobcontrol exit statuses.
For TestJCJobs this comes from job.ExitStatus, which returns a Jobcontrol exit status (from Job.pm in MMSHARE_EXEC) if a job object (i.e. schrodinger.job.jobcontrol.job) is found. If a job object is not found, it could be because the job has not started or because it failed to launch. These special cases are handled by setting the initial exit status to “Job not started” and resetting it to “Failed to launch” if the job fails to launch (see TestJCJob.doCommand).
For TestSPJobs we limit the available exit statuses to “finished”, “died”, and “killed” because there is no job object (i.e. schrodinger.job.jobcontrol.job) associated with TestSPJobs. As with TestJCJobs, the initial exit status is set to “Job not started” and is reset to “Failed to launch” if the job fails to launch (see TestSPJob.doCommand).
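The exit-status lifecycle described above can be illustrated with a minimal, self-contained sketch. The class ExitStatusTracker and its launch method are hypothetical stand-ins and are not part of schrodinger.test.stu.run; only the two status strings come from the text above.

```python
class ExitStatusTracker:
    """Hypothetical stand-in for the exit-status bookkeeping described above."""

    NOT_STARTED = "Job not started"
    FAILED_TO_LAUNCH = "Failed to launch"

    def __init__(self):
        # Every job begins in the "not started" state.
        self._exit_status = self.NOT_STARTED

    def launch(self, launcher):
        """Call `launcher`; record a launch failure instead of raising.

        Assumption: `launcher` returns the final exit status string.
        """
        try:
            self._exit_status = launcher()
        except Exception:
            self._exit_status = self.FAILED_TO_LAUNCH

    @property
    def exit_status(self):
        return self._exit_status


def broken_launcher():
    # Simulates a job that cannot be launched at all.
    raise OSError("cannot launch")


not_started = ExitStatusTracker()

failed = ExitStatusTracker()
failed.launch(broken_launcher)

finished = ExitStatusTracker()
finished.launch(lambda: "finished")
```

Swallowing the launch error and recording it as a status mirrors the "ignore failures to launch" behavior the test job classes describe, so that one broken test does not stop the whole run.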
-
property
test_id
¶
-
property
-
class
schrodinger.test.stu.run.
TestJCJob
(command, command_dir, test_id=None, test=None, timeout=None, runs_locally=False, **kwargs)[source]¶ Bases:
schrodinger.job.queue.JobControlJob
,schrodinger.test.stu.run.TestJob
- Like a normal JobControlJob, but:
Ignore failures to launch.
Be aware of scriptID.
Easily access job duration and exit status.
-
__init__
(command, command_dir, test_id=None, test=None, timeout=None, runs_locally=False, **kwargs)[source]¶ Overridden to add the test_id.
- Parameters
command (list) – Command to be run.
command_dir (str) – Directory to run in.
test_id (int or str) – Unique identifier of script.
test (TestScript object.) – The representation of all test data for the job that is being run.
timeout (int or None) – Duration in seconds after which to kill the job. None is never.
runs_locally (bool) – Should this job be launched on localhost (never remote hosts)?
-
doCommand
(host='localhost', *args, **kwargs)[source]¶ Overridden to ignore errors. Executes the command described by self._command. The parent class has two required arguments, but has the call signature
doCommand(*args, **kwargs)
, hence its usage here.
-
property
duration
¶ Duration of the job, according to the job record. Implemented as a property to provide a consistent interface with TestSPJob. Also gives the duration for RUNNING jobs, so as to be consistent with TestSPJob.
- Return type
int or None
-
property
exit_status
¶ Exit status of the job, according to the job record. Implemented as a property to provide a consistent interface with TestSPJob.
- Return type
str
-
addFinalizer
(function: Callable[[BaseJob], None], run_dir: str = None)¶ Add a function to be invoked when the job completes successfully.
See also the add_multi_job_finalizer function.
-
addGroupPrereq
(job: schrodinger.job.queue.BaseJob)¶ Make all jobs connected to
job
prerequisites of all jobs connected to this Job.
-
addPrereq
(job: schrodinger.job.queue.BaseJob)¶ Add a job that is an immediate prerequisite for this one.
-
cancel
()¶ Send kill request to jobcontrol managed job. This method will eventually deprecate JobControlJob.kill
-
cancelSubmitted
() → bool¶ If the job is still in the ‘submitted’ state, cancel it, purge the jobrecord and set the job handle to None.
Return True if this was successful, False otherwise.
-
debugStatus
(status)¶
-
finalize
()¶ Clean up after a job successfully runs.
-
genAllJobs
(seen: Set[BaseJob] = None) → Generator[schrodinger.job.queue.BaseJob, None, None]¶ A generator that yields all jobs connected to this one.
-
genAllPrereqs
(seen=None) → Generator[schrodinger.job.queue.BaseJob, None, None]¶ A generator that yields all jobs that are prerequisites of this one.
-
getCommand
() → List[str]¶ Return the command used to run this job.
-
getCommandDir
() → str¶ Return the launch/command directory name. If None is returned, the job will be launched in the current directory.
-
getDuration
() → Optional[int]¶ Return the duration of the Job as recorded by job server. The duration does not include queue wait time.
If the job is running or has not launched, returns None.
Note that this method makes a blocking call to the job server.
-
getJob
() → Optional[schrodinger.job.jobcontrol.Job]¶ Return the job record as a schrodinger.job.jobcontrol.Job instance.
Returns None if the job hasn’t been launched.
-
getJobDJ
() → schrodinger.job.queue.JobDJ¶ Return the JobDJ instance that this job has been added to.
-
getPrereqs
()¶ Return a set of all immediate prerequisites for this job.
-
getStatusStrings
() → Tuple[str, str, str]¶ Return a tuple of status strings for printing by JobDJ.
The strings returned are (status, jobid, host).
-
hasExited
() → bool¶ Returns True if this job finished, successfully or not.
-
hasStarted
() → bool¶ Returns True if this job has started (not waiting)
-
infoStatus
(status)¶
-
init_count
= 0¶
-
isComplete
() → bool¶ Returns True if this job finished successfully
-
kill
()¶ Send kill request to jobcontrol managed job
-
maxFailuresReached
(msg: str)¶ Print an error summary, including the last 20 lines from each log file in the LogFiles list of the job record.
-
postCommand
()¶ A method to restore things to the pre-command state.
-
preCommand
()¶ A method to make pre-command changes, like cd’ing to the correct directory to run the command in.
-
retryFailure
(max_retries: int = 0) → bool¶ This method will be called when the job has failed, and JobDJ needs to know whether the job should be retried or not.
JobDJ’s value for the max_retries parameter is passed in, to be used when the job doesn’t have its own max_retries value.
Return True if this job should be retried, otherwise False.
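As a rough illustration of the fallback described above, the following sketch picks the job's own retry limit when it has one and otherwise uses the value passed in by JobDJ. The function should_retry, its num_failures argument, and the way failures are counted are all assumptions made for illustration, not the actual retryFailure implementation.

```python
def should_retry(num_failures, job_max_retries=None, queue_max_retries=0):
    """Decide whether a failed job should be retried.

    Hypothetical helper: `job_max_retries` is the job's own limit (None if
    unset) and `queue_max_retries` is the value supplied by the queue.
    """
    # Prefer the job's own limit; fall back to the queue-wide value.
    limit = queue_max_retries if job_max_retries is None else job_max_retries
    # Assumption: a job that has failed N times has used N - 1 retries.
    return num_failures <= limit


uses_queue_default = should_retry(1, job_max_retries=None, queue_max_retries=2)
job_limit_wins = should_retry(1, job_max_retries=0, queue_max_retries=2)
limit_exhausted = should_retry(3, job_max_retries=None, queue_max_retries=2)
```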
-
run
(*args, **kwargs)¶ Run the job.
- The steps taken are as follows:
Execute the preCommand method for things like changing the working directory.
Call the doCommand to do the actual work of computation or job launching.
Call the postCommand method to undo the changes from the preCommand that need to be undone.
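The sequence of steps listed above can be sketched as a template method. SketchJob is a hypothetical class, not the real BaseJob; calling setup from run and wrapping the command in try/finally are assumptions made for this illustration.

```python
class SketchJob:
    """Hypothetical job that records the order in which its hooks run."""

    def __init__(self):
        self.calls = []

    def preCommand(self):
        # E.g. change into the directory the command should run in.
        self.calls.append("preCommand")

    def setup(self):
        # Documented as running after preCommand, just before doCommand.
        self.calls.append("setup")

    def doCommand(self):
        # The actual computation or job launch.
        self.calls.append("doCommand")

    def postCommand(self):
        # Undo whatever preCommand changed.
        self.calls.append("postCommand")

    def run(self):
        self.preCommand()
        try:
            self.setup()
            self.doCommand()
        finally:
            # Assumption: cleanup runs even if the command raises.
            self.postCommand()


job = SketchJob()
job.run()
```

Subclasses such as the test job classes then only need to override the individual hooks (for example doCommand) while the overall ordering stays in one place.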
-
setup
()¶ A method to do initial setup; executed after preCommand, just before doCommand.
-
setupTestEnvironment
(host='localhost')¶ Set environment variables used during testing.
Set SCHRODINGER_LICENSE_CHECKOUTS to a file in the directory. (SHARED-2727)
Set SCHRODINGER_STU_TEST_ID to the current test. (SHARED-3352)
Remove TOPLEVEL_HOST_ARGS for Jaguar. (SHARED-4089)
-
property
state
¶ Return the current state of the job.
Note that this method can be overridden by subclasses that wish to provide for restartability at a higher level than unpickling
BaseJob
instances. For example, by examining some external condition (e.g. presence of output files) the state JobState.DONE could be returned immediately and the job would not run.
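A minimal sketch of the restartability idea described above: a state property that reports the job as done when an expected output file already exists. SketchState and RestartableJob are hypothetical names used for illustration only and do not reproduce the real JobState or BaseJob classes.

```python
import enum
import os
import tempfile


class SketchState(enum.Enum):
    """Hypothetical stand-in for the real job state enumeration."""
    WAITING = "waiting"
    DONE = "done"


class RestartableJob:
    """Hypothetical job whose state depends on an external condition."""

    def __init__(self, output_file):
        self.output_file = output_file
        self._state = SketchState.WAITING

    @property
    def state(self):
        # If the output already exists, report DONE so the job is skipped.
        if os.path.exists(self.output_file):
            return SketchState.DONE
        return self._state


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.out")
    job = RestartableJob(path)
    state_before = job.state

    # Simulate output left behind by an earlier run.
    with open(path, "w") as handle:
        handle.write("done\n")
    state_after = job.state
```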
-
property
test_id
¶
-
update
()¶ Checks for changes in job status, and updates the object appropriately (marks for restart, etc).
- Raises
RuntimeError – if an unknown Job Status or ExitStatus is encountered.
-
usesJobServer
() → bool¶ Detect, by looking at the jobId, whether this job uses a job server. Since the jobId is only set once, cache the answer (_uses_job_server) once it is established.
-
warnStatus
(status)¶
-
class
schrodinger.test.stu.run.
TestSPJob
(command, command_dir=None, test_id=None, test=None, timeout=None, **kwargs)[source]¶ Bases:
schrodinger.job.queue.SubprocessJob
,schrodinger.test.stu.run.TestJob
- Like a normal SubprocessJob job, but:
Ignore failures to launch.
Kill subjobs when killing this job.
Be aware of scriptID.
Access job duration and status.
-
__init__
(command, command_dir=None, test_id=None, test=None, timeout=None, **kwargs)[source]¶ Overridden to add the test_id.
- Parameters
command (list) – Command to be run.
command_dir (str) – Directory to run in.
test_id (int or str) – Unique identifier of script.
-
preCommand
(*args, **kwargs)[source]¶ Overridden to open standard files for recording standard error and standard out. Also marks the start time of the job.
- Return type
None
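The general pattern of opening log files for standard out and standard error and recording a start time can be sketched with the standard library alone. The function run_with_logs and the ".out"/".err" file naming are assumptions for illustration; they are not taken from TestSPJob itself.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_with_logs(command, log_prefix):
    """Run `command`, capturing stdout/stderr to files, and time it.

    Hypothetical helper: returns (returncode, wall-clock seconds).
    """
    start = time.monotonic()  # mark the start time before launching
    with open(log_prefix + ".out", "w") as out, \
            open(log_prefix + ".err", "w") as err:
        returncode = subprocess.call(command, stdout=out, stderr=err)
    return returncode, time.monotonic() - start


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "job")
    returncode, duration = run_with_logs(
        [sys.executable, "-c", "print('hi')"], prefix)
    with open(prefix + ".out") as handle:
        captured = handle.read().strip()
```

Note that this sketch measures wall-clock time, whereas the inherited getDuration for subprocess jobs is documented as returning CPU time.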
-
doCommand
(*args, **kwargs)[source]¶ Overridden to ignore errors. Executes the command described by self._command.
-
getStatusStrings
()[source]¶ Return a tuple of status strings for printing by JobDJ.
- Return type
tuple
- Returns
(status, jobid, host)
-
property
exit_status
¶ Exit status of the job, according to subprocess. Implemented as a property to provide a consistent interface with TestJCJob.
- Return type
str
-
addFinalizer
(function: Callable[[BaseJob], None], run_dir: str = None)¶ Add a function to be invoked when the job completes successfully.
See also the add_multi_job_finalizer function.
-
addGroupPrereq
(job: schrodinger.job.queue.BaseJob)¶ Make all jobs connected to
job
prerequisites of all jobs connected to this Job.
-
addPrereq
(job: schrodinger.job.queue.BaseJob)¶ Add a job that is an immediate prerequisite for this one.
-
cancel
()¶ Send termination request to subprocess managed job. This method will eventually deprecate SubprocessJob.kill
-
debugStatus
(status)¶
-
property
duration
¶
-
finalize
()¶ Clean up after a job successfully runs.
-
genAllJobs
(seen: Set[BaseJob] = None) → Generator[schrodinger.job.queue.BaseJob, None, None]¶ A generator that yields all jobs connected to this one.
-
genAllPrereqs
(seen=None) → Generator[schrodinger.job.queue.BaseJob, None, None]¶ A generator that yields all jobs that are prerequisites of this one.
-
getCommand
() → List[str]¶ Return the command used to run this job.
-
getCommandDir
() → str¶ Return the launch/command directory name. If None is returned, the job will be launched in the current directory.
-
getDuration
() → Optional[int]¶ Return the CPU time of the job in seconds.
If the job is still running, returns None.
-
getJobDJ
() → schrodinger.job.queue.JobDJ¶ Return the JobDJ instance that this job has been added to.
-
getPrereqs
()¶ Return a set of all immediate prerequisites for this job.
-
hasExited
() → bool¶ Returns True if this job finished, successfully or not.
-
hasStarted
() → bool¶ Returns True if this job has started (not waiting)
-
infoStatus
(status)¶
-
init_count
= 0¶
-
isComplete
() → bool¶ Returns True if this job finished successfully
-
maxFailuresReached
(msg: str)¶ This is a method that will be called after the job has failed and the maximum number of failures per JobDJ run has been reached. After invoking this method, JobDJ will raise a RuntimeError and the process will exit.
-
postCommand
()¶ A method to restore things to the pre-command state.
-
run
(*args, **kwargs)¶ Run the job.
- The steps taken are as follows:
Execute the preCommand method for things like changing the working directory.
Call the doCommand to do the actual work of computation or job launching.
Call the postCommand method to undo the changes from the preCommand that need to be undone.
-
runsLocally
() → bool¶ Return True if the job runs on the JobDJ control host, False if not. Jobs that run locally don’t need hosts.
There is no limit on the number of locally run jobs.
-
setup
()¶ A method to do initial setup; executed after preCommand, just before doCommand.
-
setupTestEnvironment
(host='localhost')¶ Set environment variables used during testing.
Set SCHRODINGER_LICENSE_CHECKOUTS to a file in the directory. (SHARED-2727)
Set SCHRODINGER_STU_TEST_ID to the current test. (SHARED-3352)
Remove TOPLEVEL_HOST_ARGS for Jaguar. (SHARED-4089)
-
property
state
¶ Return the current state of the job.
Note that this method can be overridden by subclasses that wish to provide for restartability at a higher level than unpickling
BaseJob
instances. For example, by examining some external condition (e.g. presence of output files) the state JobState.DONE could be returned immediately and the job would not run.
-
property
test_id
¶
-
warnStatus
(status)¶
-
class
schrodinger.test.stu.run.
TestQueue
(hosts=None, verbosity='quiet', timeout=None)[source]¶ Bases:
schrodinger.job.queue.JobDJ
Like a normal JobDJ, but:
Print the script ID at status points
Run workups
@todo: Workup should be run in here.
@todo: Move Runner.addJob to TestQueue.addJob
-
printStatus
(job=None, action=None)[source]¶ Prints the status of JobDJ and the action/status for the job.
If no job is specified, prints the status header.
If no action is specified, the status_string attribute of the job is used.
-
formatStatus
(job=None, action=None)[source]¶ Override to print script ID.
- Parameters
job (queue.BaseJob) – queue.BaseJob object of interest.
action (str) – Status to be printed.
- Return type
None
-
addJob
(job, add_connected=True, timeout=None, **kwargs)[source]¶ Add a job to run. If job is not a BaseJob instance, a BaseJob instance is constructed with job as the first argument. The default BaseJob class for the JobDJ instance can be specified in the constructor for JobDJ.
Additional keyword arguments are passed on to the job constructor.
All job prerequisites and dependencies need to be specified before adding a job to JobDJ.
- Parameters
add_connected – If True, for jobs with dependencies only one job per connected group should be added and all connected jobs will be discovered and added automatically. If False, it is the user’s responsibility to make sure that any prerequisites of a job are also added.
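How connected jobs might be discovered from a single starting job can be sketched as a graph traversal with a shared seen set. The Node class below is a hypothetical, simplified model; only the method names addPrereq and genAllJobs are borrowed from the documented interface, and the traversal is an assumption about the general approach rather than the actual implementation.

```python
class Node:
    """Hypothetical job node with prerequisite and dependent links."""

    def __init__(self, name):
        self.name = name
        self.prereqs = set()
        self.dependents = set()

    def addPrereq(self, other):
        # Record the link in both directions so traversal can go either way.
        self.prereqs.add(other)
        other.dependents.add(self)

    def genAllJobs(self, seen=None):
        """Yield every job connected to this one, each exactly once."""
        if seen is None:
            seen = set()
        if self in seen:
            return
        seen.add(self)
        yield self
        for other in self.prereqs | self.dependents:
            yield from other.genAllJobs(seen)


a, b, c = Node("a"), Node("b"), Node("c")
b.addPrereq(a)  # b needs a
c.addPrereq(b)  # c needs b
isolated = Node("isolated")

connected_names = sorted(job.name for job in a.genAllJobs())
isolated_names = [job.name for job in isolated.genAllJobs()]
```

With this kind of discovery, adding any one job of a connected group is enough to reach the rest, which is why only one job per group needs to be added when add_connected is True.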
-
property
active_jobs
¶
-
property
all_jobs
¶
-
disableSmartDistribution
()¶ Disable smart distribution of jobs.
Smart distribution allows subjobs to run on the machine that JobDJ is running on when JobDJ itself is running under a queuing system. This is usually desirable since the JobDJ process doesn’t generally consume significant computational resources and you don’t want to leave a queue slot mostly idle.
-
property
done_jobs
¶ Successfully completed jobs, sorted into the order they were marked as completed by JobDJ.
-
dump
(filename: pathlib.Path)¶ Pickle the
JobDJ
instance to the specified file name.
-
property
failed_jobs
¶
-
getActiveProcCounts
() → Dict[str, int]¶ Return a dictionary containing the number of active jobs on each host.
-
hasStarted
() → bool¶ Returns True if JobDJ has started already
-
isComplete
() → bool¶ Returns True if JobDJ has completed, False otherwise.
-
killJobs
()¶ Kill all active jobs
-
markForRestart
(job: schrodinger.job.queue.BaseJob, action: str)¶ Mark a job as dead, but make sure that it gets restarted.
- Parameters
action – Describes the reason the job is being restarted.
-
run
(*, status_change_callback: Optional[Callable[[schrodinger.job.queue.BaseJob], None]] = None, periodic_callback: Optional[Callable[[schrodinger.job.queue.BaseJob], None]] = None, callback_interval: int = 300, restart_failed: bool = True)¶ Call this method to run all jobs that have been added. The method will return control when all jobs have completed.
- Parameters
status_change_callback – A function to call every time a job status changes. For example, JobState.RUNNING->JobState.DONE. This function takes a single argument of a schrodinger.job.queue.BaseJob object.
periodic_callback – A command to call periodically, regardless of whether job status has changed or not. The function will be called without any arguments.
callback_interval – The interval at which the periodic callback will be called. This time is only approximately enforced and will depend on the timing delay settings (e.g. MONITOR_DELAY).
restart_failed – True (default) if previously failed jobs should be restarted, False if not.
-
setHostList
(host_list: List[Tuple[str, int]])¶ Define compute hosts to run subjobs on.
Active jobs are not affected by a change in the host list.
- Parameters
host_list – A list of (<host_entry_name>, <maximum_concurrent_subjobs>) tuples, where <host_entry_name> is a string and <maximum_concurrent_subjobs> is an integer.
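To make the expected shape of host_list concrete, the sketch below builds a list of (host entry name, maximum concurrent subjobs) tuples from a whitespace-separated string. The helper parse_host_list, the "name:count" input format, and the default of one subjob are assumptions for illustration and are not part of the documented API.

```python
def parse_host_list(spec):
    """Turn 'localhost:4 cluster' into [('localhost', 4), ('cluster', 1)].

    Hypothetical helper; entries without a count default to 1 subjob.
    """
    hosts = []
    for entry in spec.split():
        name, _, count = entry.partition(":")
        hosts.append((name, int(count) if count else 1))
    return hosts


host_list = parse_host_list("localhost:4 cluster")
```

A list in this shape is what the host_list parameter describes: each tuple pairs a string with an integer.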
-
property
total_active
¶ The number of jobs currently running.
-
property
total_added
¶ The number of individual jobs that have been added to the JobDJ instance.
-
property
total_failed
¶ The number of jobs that have failed.
-
property
total_finished
¶ The number of jobs that have finished successfully.
-
property
waiting_jobs
¶ Jobs waiting to be started.
-
class
schrodinger.test.stu.run.
Runner
(ui)[source]¶ Bases:
object
Runner controls all job running parameters within the backend test utility code. It is also responsible for actually running the jobs and requesting their workups. The meat is in Runner.addScript and Runner.__call__.
-
__init__
(ui)[source]¶ Initialize Runner Class
- Parameters
ui (
interface.UserInterface
) – Contains information about the user interface (i.e. the command line arguments)
-
tests
¶ Keyed by test_id; values are testscripts.TestScript objects.
-
addScript
(test)[source]¶ Add test to be executed by self.__call__. Adds the test information to self.job_runner.
- Parameters
test (testscripts.TestScript) – Test to be executed.
test_id (str or int or NoneType) – Name of the test. Typically its test number. If None, Test ID is determined directly from the test.
- Return type
-
-
schrodinger.test.stu.run.
get_xvfb_cmd
()[source]¶ xvfb-run needs the -a (auto server number) option, except on CentOS 7 (and possibly other OSes) where that option is superseded by the -d (auto display) option. As a bonus, the long option --auto-display is shown in the help on CentOS 7 but does not actually work. Furthermore, xvfb-run does not have a --version flag which could be used to reason about supported options. To determine what flags are supported, try to run a no-op command with xvfb-run -d, and if that fails, use -a.
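The probing strategy described above can be sketched as follows. The function pick_xvfb_flag, its injectable probe argument, and the use of "true" as the no-op command are assumptions for illustration; this is not the actual body of get_xvfb_cmd.

```python
import subprocess


def pick_xvfb_flag(probe=None):
    """Return "-d" if `xvfb-run -d` can run a no-op command, else "-a".

    `probe` is a hypothetical hook that takes a command list and returns
    True on success; it exists so the logic can be exercised without
    xvfb-run being installed.
    """
    if probe is None:
        def probe(cmd):
            try:
                return subprocess.call(
                    cmd,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                ) == 0
            except OSError:
                # xvfb-run (or the no-op command) is not available.
                return False
    # Assumption: "true" serves as the no-op command.
    return "-d" if probe(["xvfb-run", "-d", "true"]) else "-a"


flag_when_d_works = pick_xvfb_flag(lambda cmd: True)
flag_when_d_fails = pick_xvfb_flag(lambda cmd: False)
```

Probing by actually running the command sidesteps the lack of a version flag and the misleading help text, at the cost of one extra process launch.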