Package schrodinger :: Package pipeline :: Module stage :: Class Stage

Class Stage

    UserDict.UserDict --+    
                        |    
UserDict.IterableUserDict --+
                            |
                           Stage

Instance Methods
 
__init__(self, stagename, specs=None, allow_extra_keywords=False, cleanup=True, inpipeline=False)
This is the Stage class.
 
validateValues(self, preserve_errors=False)
Validates the stored keywords.
 
getVerbosity(self)
Return the verbosity of this stage (for JobDJ)
 
mainProduct(self)
If a stage has a main product associated with it, the stage should override this method with one that returns the product string.
 
setMainProduct(self, product)
Specify which product this stage is part of.
 
addExpectedInput(self, position, type, required=True)
A stage can accept one or more pipeio input objects.
 
addExpectedOutput(self, position, type, always=True)
A stage can return one or more pipeio objects.
 
__getitem__(self, keyname)
Returns the value for specified keyword, or default value.
 
getStageDirectory(self)
Return the directory in which the stage is running
 
requiredProduct(self, product, min_release=None)
Specify a product that is required for this stage to run, optionally with a minimum version.
 
_minVersionPresent(self, product, min_release=None)
Internal: whether minimum version of product is installed.
 
checkProducts(self)
Raises RuntimeError if any of the required products are not installed or the installed version is less than the minimum required version.
 
getRuntimePath(self, filename)
Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.
 
requiredProductRuntime(self, product, min_release=None)
Similar to requiredProduct() but can be used to specify required products at runtime.
 
outputRequested(self, position)
Returns True if the user requested optional output at <position>
 
setOutput(self, position, obj)
Use this method at the end of operate() to set the output.
 
getOutput(self, position)
Returns the output IO object of the stage at specified position.
 
_inputCheck(self, position, input)
Makes sure that position is valid, that the specified input object is OK for the position.
 
__getstate__(self)
This method is called by the pickle module to return the state to be serialized by dump().
 
dump(self)
This method dumps all the variables of the Stage to a restart file.
 
checkFile(self, file, error='File does not exist:')
Raise an exception if the specified file does not exist.
 
checkFiles(self, files, error='File does not exist')
Raise an exception if any file does not exist.
 
log(self, *args)
Prints specified objects to the stage log file.
 
lognoret(self, *args)
Prints specified objects to the stage log file.
 
info(self, text)
Print an info line to the log file
 
debug(self, text)
Print a debug line to the log file
 
warning(self, text)
Print a warning line to the log file
 
error(self, text)
Print an error line to the log file
 
exit(self, text='')
Print an error line to the log file and exit with code 1
 
reportParameters(self, fh=None)
Print the value of each keyword for this stage to the stream specified as <fh>.
 
checkParameters(self)
OVERWRITE: Make sure that all parameters are valid.
 
hasStarted(self)
Returns True if this stage has started.
 
hasCompleted(self)
Returns True if this stage's operate() exited successfully.
 
checkInputs(self)
OVERWRITE: Return False if something is wrong with the input files or the parameters, otherwise return True.
 
getName(self)
Return stagename (jobname of the stage)
 
getInputNames(self)
Return a dictionary of the variable names of the inputs at each position.
 
setInput(self, position, name=None, obj=None)
Specify an input to use for this stage.
 
getInput(self, position)
Use in operate() to get the input object for specified position.
 
iterInputs(self)
Iterate through input objects: (position, obj)
 
setOutputName(self, position, varname)
Is called by Pipeline when starting the stage.
 
getOutputNames(self)
Return a dictionary of output names, keyed by position.
 
getOutputName(self, position)
Return the output name for specified position
 
setJobOptions(self, subjob_hosts=None, njobs=None, adjust=None, force=None, subjob_local=None, max_retries=None, cleanup=None)
Tell this stage how to run the subjobs. None for njobs means determine automatically.
 
getHostList(self)
Returns a list of hosts to run the subjobs on.
 
getHostStr(self)
Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.
 
getLocal(self)
Whether subjobs should be run with -LOCAL.
 
getNJobs(self)
Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.
 
_calcStsPerJob(self, total_mol, njobs)
Determine the number of structures that should go into each subjob
 
_adjustNJobs(self, total_mol, min_job_size, max_job_size)
Adjusts the specified number of jobs so that the job size ranges between the specified min and max.
 
getAdjustedNJobs(self, total_mol, min_job_size, max_job_size)
Returns the desired number of subjobs, adjusted for the specified min and max job sizes if the user specified the ADJUST option.
 
JobDJOptions(self)
Returns a dictionary of options to pass to JobDJ: hosts, local, max_retries, default_max_retries, verbosity
 
setJobDJOptions(self, jobdj)
Use this method to adjust the specified queue.JobDJ instance to the VSW settings.
 
getJobDJ(self, **kwargs)
Returns a pre-set JobDJ instance for the stage to use.
 
getMaxRetries(self)
Return the number of max restarts to use.
 
getNCpus(self)
Returns the total number of processors specified in the host string.
 
getCleanupRequested(self)
Stages should clean up after themselves if this returns True
 
_readMessageFromDriver(self)
Read the LAST message that the pipeline sent to this stage.
 
updateJobdj(self, jobdj)
Gets called periodically in order to update JobDJ's hosts.
 
runJobDJ(self, jobdj)
 
run(self, idle_function=None, restart_file=None, verbosity=None, logfh=None)
Run the stage.
 
addOutputFile(self, filename)
Adds the specified file to the stage's job control record.
 
operate(self)
OVERWRITE: Perform an operation on the input Objects.
 
genFileName(self, extension=None, filenum=None, start=None, end=None)
Generate a file name to be used by the stage.
 
genOutputFileName(self, position, extension='', filenum=None, start=None, end=None)
Generate a file name to be used by the stage when writing files for the output position <position>.
 
waitForFileStatus(self, file, sleeptime=0.2, timeout=300000)
If the file does not exist, returns False; otherwise waits for the file to finish writing and returns True.
 
waitForFiles(self, files, sleeptime=0.2, timeout=1000000)
Waits until the files are fully written.

Inherited from UserDict.IterableUserDict: __iter__

Inherited from UserDict.UserDict: __cmp__, __contains__, __delitem__, __len__, __repr__, __setitem__, clear, copy, get, has_key, items, iteritems, iterkeys, itervalues, keys, pop, popitem, setdefault, update, values

Class Methods

Inherited from UserDict.UserDict: fromkeys

Class Variables

Inherited from UserDict.UserDict: __hash__

Method Details

__init__(self, stagename, specs=None, allow_extra_keywords=False, cleanup=True, inpipeline=False)
(Constructor)

 

This is the Stage class. Derive your own class from it.
  stagename - full name for this stage (<jobname>-<stagename>)
  specs - ConfigObj specification for the supported keywords
  allow_extra_keywords - Whether to allow keywords that are not in the specification.
  cleanup - Whether to remove intermediate files
  inpipeline - Whether the stage is running within a Python Pipeline.
               If the stage is manually created, do NOT set this flag.
               Python Pipeline will set it as needed.

Overrides: UserDict.UserDict.__init__

validateValues(self, preserve_errors=False)

 

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

mainProduct(self)

 

If a stage has a main product associated with it, the stage should override this method with one that returns the product string. For example, LigPrepStage.mainProduct() returns "ligprep". Used by Pipeline.

setMainProduct(self, product)

 

Specify which product this stage is part of. Will determine which host the subjobs are run on.

addExpectedInput(self, position, type, required=True)

 

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

  position - an integer starting at 1
  type - structures/grids/etc.
  required - whether this input always needs to be specified

addExpectedOutput(self, position, type, always=True)

 

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

  position - an integer starting at 1
  type - structures/grids/etc.
  always - whether this output is always produced

__getitem__(self, keyname)
(Indexing operator)

 

Returns the value for specified keyword, or default value.
Use as follows:
  precision_value = stageobj['PRECISION']

Get values for keywords this way from stage's operate() to determine
how to run. If the user did not specify a value for the keyword,
default value is returned.

Raises KeyError if a keyword is not supported by the stage.

Overrides: UserDict.UserDict.__getitem__
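The lookup rule above (the user's value first, then the default, and KeyError for unsupported keywords) can be sketched with a plain dict subclass. This is a hypothetical stand-in, not the Stage implementation: `KeywordStore` and its `defaults` argument are illustrative names, with `defaults` playing the role of the defaults in the ConfigObj specification. `allow_extra_keywords` is not modeled.

```python
class KeywordStore(dict):
    """Hypothetical stand-in for the Stage keyword lookup (not the real class)."""

    def __init__(self, defaults, **user_values):
        super().__init__(**user_values)
        # Assumption: defaults stand in for the ConfigObj spec defaults.
        self._defaults = dict(defaults)

    def __getitem__(self, keyname):
        # Keywords that are not in the specification are unsupported.
        if keyname not in self._defaults:
            raise KeyError(f"Keyword not supported by this stage: {keyname}")
        # A value the user specified wins; otherwise fall back to the default.
        if keyname in self:
            return super().__getitem__(keyname)
        return self._defaults[keyname]


# Illustrative keyword values only.
stage = KeywordStore({"PRECISION": "SP", "NJOBS": 1}, PRECISION="XP")
precision_value = stage["PRECISION"]  # user value
njobs = stage["NJOBS"]                # default value
```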

requiredProduct(self, product, min_release=None)

 

Specify a product that is required for this stage to run, optionally with a minimum version. Example: product="mmshare", min_release=16103. Overwrites any previous min_release for this product.

checkProducts(self)

 

Raises RuntimeError if any of the required products are not installed or the installed version is less than the minimum required version. It is possible to override this method; see ligprep.py for an example.
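A minimal sketch of the bookkeeping that requiredProduct() and checkProducts() describe, assuming the installed releases are available as a simple {product: release} mapping (how the real Stage discovers them is not shown here). `ProductRequirements` is a hypothetical class, and the release numbers are illustrative.

```python
class ProductRequirements:
    """Hypothetical sketch of requiredProduct()/checkProducts() bookkeeping."""

    def __init__(self, installed):
        # Assumption: product name -> installed release number.
        self._installed = dict(installed)
        self._required = {}  # product -> min_release (or None)

    def requiredProduct(self, product, min_release=None):
        # Overwrites any previous min_release for this product, as documented.
        self._required[product] = min_release

    def checkProducts(self):
        for product, min_release in self._required.items():
            if product not in self._installed:
                raise RuntimeError(f"Required product not installed: {product}")
            release = self._installed[product]
            if min_release is not None and release < min_release:
                raise RuntimeError(
                    f"{product} release {release} is older than required {min_release}"
                )


reqs = ProductRequirements({"mmshare": 16105})
reqs.requiredProduct("mmshare", min_release=16103)
reqs.checkProducts()  # passes: 16105 >= 16103
```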

requiredProductRuntime(self, product, min_release=None)

 

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn't know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

getOutput(self, position)

 

Returns the output IO object of the stage at the specified position. Use this method after running the stage to get its output objects.

_inputCheck(self, position, input)

 

Makes sure that the position is valid and that the specified input object is OK for the position. If input is required for this position, input.check() is run.

__getstate__(self)

 

This method is called by the pickle module to return the state to be serialized by dump().

Since Job objects and callable functions cannot be pickled, it removes them from a copy of the instance __dict__ attribute. _backend needs to be removed so that mmjobbe_terminate() is not called more than once when the job exits.
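The pattern described above (copy `__dict__`, drop callables and `_backend`, pickle the rest) looks roughly like this. `Restartable` is a hypothetical class; the real method also removes Job objects, which are not modeled here, and the `_backend` value is just a placeholder.

```python
import pickle


class Restartable:
    """Hypothetical object illustrating the __getstate__ pattern above."""

    def __init__(self):
        self.name = "demo-stage"
        self.outputs = {1: "demo-stage-out.mae"}
        self.idle_function = lambda: None  # callables cannot be pickled
        self._backend = object()           # placeholder for the job-control backend

    def __getstate__(self):
        # Work on a copy so the live instance keeps all of its attributes.
        state = self.__dict__.copy()
        for key, value in list(state.items()):
            if callable(value):
                del state[key]
        # Drop the backend so the restored copy never terminates it a second time.
        state.pop("_backend", None)
        return state


original = Restartable()
restored = pickle.loads(pickle.dumps(original))
```

Without `__getstate__`, `pickle.dumps` would fail on the lambda; with it, the restored object carries only the plain data attributes.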

dump(self)

 

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

checkFile(self, file, error='File does not exist:')

 

Raise an exception if the specified file does not exist. The message that is printed can be specified.

log(self, *args)

 

Prints specified objects to the stage log file. Obsolete. Use stage.info/debug/warning/error instead.

lognoret(self, *args)

 

Prints specified objects to the stage log file, without a trailing newline.

reportParameters(self, fh=None)

 

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline.

getInputNames(self)

 

Return a dictionary of the variable names of the inputs at each position (key: position, value: name).

setInput(self, position, name=None, obj=None)

 

Specify an input to use for this stage.
  position - the position this input is for
  name - variable name of this IO object
  obj - the IO object

This method is called by Pipeline.

getInput(self, position)

 

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

setOutputName(self, position, varname)

 

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setJobOptions(self, subjob_hosts=None, njobs=None, adjust=None, force=None, subjob_local=None, max_retries=None, cleanup=None)

 

Tell this stage how to run the subjobs.
None for njobs means determine automatically.

subjob_hosts - list of hosts to run subjobs on
njobs - number of subjobs to generate
adjust - whether to adjust njobs such that job size is within limits
force - whether to continue with job if subjobs fail
max_retries - number of times to attempt to restart a subjob
              If not specified, use SCHRODINGER_MAX_RETRIES or 2.
cleanup - whether to delete intermediate files

getHostList(self)

 

Returns a list of hosts to run the subjobs on; localhost:1 may be in the list as well. Ideally, pass the output to JobDJ.
Format: [ (host1, ncpus), (host2, ncpus) ]
NOTE: Whether or not to run with -LOCAL should be up to the stage. Pass this value to JobDJ.
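To turn the documented `[(host, ncpus), ...]` format into a -HOST value, as getHostStr() does, a helper along these lines would work. `host_list_to_str` is hypothetical, and the space-separated `host:ncpus` syntax is an assumption; check the job-control documentation for your release before relying on it.

```python
def host_list_to_str(hosts):
    """Hypothetical helper: turn [(host, ncpus), ...] into a -HOST value.

    Assumes 'host:ncpus' entries separated by spaces; an ncpus of None
    leaves the count off.
    """
    parts = []
    for host, ncpus in hosts:
        parts.append(host if ncpus is None else f"{host}:{ncpus}")
    return " ".join(parts)


host_str = host_list_to_str([("localhost", 1), ("cluster", 8)])
```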

getLocal(self)

 

Whether subjobs should be run with -LOCAL. Pass this value to JobDJ.

getNJobs(self)

 

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the number of CPUs or 10 is returned, whichever is smaller.

Used by Glide DockingStage and _adjustNJobs()

_adjustNJobs(self, total_mol, min_job_size, max_job_size)

 

Adjusts the specified number of jobs so that the job size ranges between the specified min and max. Also calculates the number of ligands that will be in each subjob.

getAdjustedNJobs(self, total_mol, min_job_size, max_job_size)

 

Returns the desired number of subjobs, adjusted for the specified min and max job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally corresponding to job lengths of 1 minute and 24 hours).
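The job-count rules above can be sketched as a single function. `adjusted_njobs` is hypothetical: the default of min(CPUs, 10) comes from getNJobs(), but the exact clamping arithmetic is an assumption that follows the prose, not the library's code.

```python
import math


def adjusted_njobs(total_mol, min_job_size, max_job_size,
                   requested=None, ncpus=1, adjust=True):
    """Hypothetical sketch of getNJobs()/getAdjustedNJobs()/_calcStsPerJob().

    Returns (njobs, structures_per_job).
    """
    # Documented default: the number of CPUs or 10, whichever is smaller.
    njobs = requested if requested is not None else min(ncpus, 10)
    njobs = max(1, njobs)
    if adjust and total_mol > 0:
        if total_mol / njobs > max_job_size:
            # Jobs too large: add jobs until each fits under the maximum.
            njobs = math.ceil(total_mol / max_job_size)
        elif total_mol / njobs < min_job_size:
            # Jobs too small: merge jobs, but keep at least one.
            njobs = max(1, int(total_mol // min_job_size))
    sts_per_job = math.ceil(total_mol / njobs) if total_mol else 0
    return njobs, sts_per_job


large = adjusted_njobs(1000, 50, 200, ncpus=4)   # 4 jobs of 250 exceed the max
small = adjusted_njobs(100, 50, 200, ncpus=8)    # 8 jobs of 12.5 are below the min
```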

getJobDJ(self, **kwargs)

 

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, local, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries(self)

 

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.
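The documented precedence for getMaxRetries() (explicit value, then SCHRODINGER_MAX_RETRIES, then 2) in sketch form. `max_retries` is a hypothetical helper; passing `environ` explicitly only makes it easy to test.

```python
import os


def max_retries(explicit=None, environ=None, default=2):
    """Hypothetical sketch of the getMaxRetries() precedence.

    explicit: the -max_retries / setJobOptions(max_retries=...) value, if any.
    environ: mapping to read SCHRODINGER_MAX_RETRIES from (os.environ by default).
    """
    if explicit is not None:
        return int(explicit)
    env = os.environ if environ is None else environ
    value = env.get("SCHRODINGER_MAX_RETRIES")
    if value:
        return int(value)
    return default
```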

getNCpus(self)

 

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.
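A sketch of the CPU-counting rule, working on the `[(host, ncpus), ...]` list that getHostList() returns. `count_cpus` and its `queued_hosts` argument are hypothetical, and counting an unqualified non-queued host as one CPU is an assumption the documentation does not state.

```python
def count_cpus(hosts, queued_hosts=frozenset(), queued_default=10):
    """Hypothetical sketch of the getNCpus() rule.

    hosts: [(host, ncpus or None), ...]
    queued_hosts: names of queued hosts (assumed to be known in advance).
    """
    total = 0
    for host, ncpus in hosts:
        if ncpus is not None:
            total += ncpus
        elif host in queued_hosts:
            total += queued_default  # documented: 10 for a queued host with no CPU count
        else:
            total += 1  # assumption: an unqualified non-queued host counts as one CPU
    return total


ncpus = count_cpus([("localhost", 2), ("pbs", None)], queued_hosts={"pbs"})
```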

_readMessageFromDriver(self)

 

Read the LAST message that the pipeline sent to this stage. This message will tell this stage how many processors to use.

updateJobdj(self, jobdj)

 

Gets called periodically in order to update JobDJ's hosts. Will ask Pipeline for CPUs when needed, and will tell Pipeline when they are no longer needed.

run(self, idle_function=None, restart_file=None, verbosity=None, logfh=None)

 

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: "quiet", "normal", and "verbose"
    "quiet" -   only warnings and errors are printed
    "normal" -  stage progress is printed - default
    "verbose" - additional debugging info is printed

logfh - where to send the logging output
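One way to model the three verbosity levels with the standard `logging` module is shown below. The level mapping and `make_stage_logger` are assumptions for illustration; the Stage class has its own info()/debug()/warning()/error() methods and does not necessarily use `logging` internally.

```python
import logging

# Assumed mapping from the documented verbosity names to logging levels.
VERBOSITY_LEVELS = {
    "quiet": logging.WARNING,   # only warnings and errors
    "normal": logging.INFO,     # stage progress (default)
    "verbose": logging.DEBUG,   # additional debugging info
}


def make_stage_logger(stagename, verbosity="normal", stream=None):
    """Hypothetical helper: a logger honoring the three verbosity levels."""
    logger = logging.getLogger(stagename)
    logger.setLevel(VERBOSITY_LEVELS[verbosity])
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # None means sys.stderr
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```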

addOutputFile(self, filename)

 

Adds the specified file to the stage's job control record. File must be specified as local (not absolute) path.

operate(self)

 

OVERWRITE: Perform an operation on the input objects. Use self.setOutput(position, obj) to set output objects.
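A derived stage typically declares its inputs and outputs in `__init__` and does its work in operate(). The sketch below uses a hypothetical `MiniStage` stand-in with only the documented methods it needs, so it runs without a Schrodinger installation. In real code you would derive from Stage instead, and the inputs would be pipeio objects rather than plain lists; `UppercaseStage` is also hypothetical.

```python
class MiniStage:
    """Hypothetical stand-in for Stage with only the methods this sketch uses."""

    def __init__(self, stagename):
        self.stagename = stagename
        self._expected_inputs = {}
        self._inputs = {}
        self._outputs = {}

    def addExpectedInput(self, position, type, required=True):
        self._expected_inputs[position] = (type, required)

    def addExpectedOutput(self, position, type, always=True):
        pass  # output bookkeeping omitted in this stand-in

    def setInput(self, position, name=None, obj=None):
        self._inputs[position] = obj

    def getInput(self, position):
        return self._inputs.get(position)  # None for an invalid position

    def setOutput(self, position, obj):
        self._outputs[position] = obj

    def getOutput(self, position):
        return self._outputs.get(position)


class UppercaseStage(MiniStage):
    """Hypothetical derived stage whose input is a list of strings."""

    def __init__(self, stagename):
        super().__init__(stagename)
        self.addExpectedInput(1, "structures", required=True)
        self.addExpectedOutput(1, "structures", always=True)

    def operate(self):
        items = self.getInput(1)
        # Do the work, then register the result with setOutput().
        self.setOutput(1, [item.upper() for item in items])
```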

genFileName(self, extension=None, filenum=None, start=None, end=None)

 

Generate a file name to be used by the stage.
Returns string: "<full-stagename>-<filenum><extension>"
            or: "<full-stagename>-<start>_<end><extension>"
            or: "<full-stagename><extension>"
            or: "<full-stagename>", etc.
The form depends on which options are given.
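The naming patterns above can be reproduced with a small function. `gen_file_name` is a hypothetical re-implementation; it assumes the extension includes its leading dot and that filenum takes precedence over start/end when both are given.

```python
def gen_file_name(stagename, extension=None, filenum=None, start=None, end=None):
    """Hypothetical re-implementation of the documented genFileName() patterns."""
    name = stagename
    if filenum is not None:
        name += f"-{filenum}"              # <full-stagename>-<filenum>
    elif start is not None and end is not None:
        name += f"-{start}_{end}"          # <full-stagename>-<start>_<end>
    if extension:
        name += extension                  # extension assumed to include the dot
    return name
```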

genOutputFileName(self, position, extension='', filenum=None, start=None, end=None)

 

Generate a file name to be used by the stage when writing
files for the output position <position>.
Returns string: "<full-varname>-<filenum><extension>"
            or: "<full-varname>-<start>_<end><extension>"
            or: "<full-varname><extension>"
            or: "<full-varname>", etc.
The form depends on which options are given.

waitForFileStatus(self, file, sleeptime=0.2, timeout=300000)

 

If the file does not exist, returns False; otherwise waits for the file to finish writing and returns True. Obsolete.

waitForFiles(self, files, sleeptime=0.2, timeout=1000000)

 

Waits until the files are fully written. Default timeout 11 days. Obsolete.