schrodinger.pipeline.stages.filtering module

Stages related to filtering structures.

Main stage uses $SCHRODINGER/utilities/ligfilter.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.pipeline.stages.filtering.LigFilterStage(*args, **kwargs)

Bases: schrodinger.pipeline.stage.Stage

Stage interface for the LigFilter utility.

The keywords specific to this stage are:

FILTER_FILE - File name for a Ligfilter criteria file.

CONDITIONS - String list of Ligfilter criteria. Ignored if a FILTER_FILE is specified.

The stage takes one set of input structure files and generates one set of corresponding output files.
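The precedence between the two keywords can be sketched in plain Python. This is only an illustration under stated assumptions: resolve_filter_source is a hypothetical helper, not part of the schrodinger API, and a plain dictionary stands in for the stage's stored keywords. The RuntimeError mirrors what setupJobs() is documented to raise when neither keyword is given.

```python
def resolve_filter_source(keywords):
    """Pick the criteria source following the documented precedence.

    Hypothetical helper (not a schrodinger function): FILTER_FILE wins over
    CONDITIONS, and supplying neither is treated as an error.
    """
    filter_file = keywords.get("FILTER_FILE")
    conditions = keywords.get("CONDITIONS")
    if filter_file:
        # CONDITIONS is ignored whenever a criteria file is given.
        return ("file", filter_file)
    if conditions:
        return ("conditions", list(conditions))
    raise RuntimeError("Neither FILTER_FILE nor CONDITIONS was specified")


# Placeholder values; these are not real Ligfilter criteria.
source = resolve_filter_source(
    {"FILTER_FILE": "criteria.lff", "CONDITIONS": ["criterion_a"]}
)
```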

__init__(*args, **kwargs)

See class docstring.

setupJobs()

Sets up LigFilter jobs, which are distributed via JobDJ. There will be one Ligfilter job for each input file. This method clears the working directory of previous output and log files (if any). Raises a RuntimeError if there is no FILTER_FILE and no CONDITIONS.

processJobOutputs()

Reports the number of structures that passed the stage. Raises a RuntimeError if any Ligfilter .log file is missing the summary information.

operate()

Perform an operation on the input files. There are setup, running, and post-processing steps, and the stage records its current status so that it can be restarted in that step if there is a failure. Raises a RuntimeError if the JobDJ run() method fails, or if the stage finishes with an improper status.
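The restart behaviour described here can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: RestartableTask, the JSON status file, and the step names are hypothetical and do not reflect how the Stage class actually stores its state.

```python
import json
import os

# Step names are illustrative; "completed" marks a proper final status.
STEPS = ["setup", "running", "postprocessing", "completed"]


class RestartableTask:
    """Toy model of a task that records its current step on disk."""

    def __init__(self, status_path):
        self.status_path = status_path
        self.status = "setup"
        self.log = []
        # Resume from the recorded step if a previous run left one behind.
        if os.path.exists(status_path):
            with open(status_path) as fh:
                self.status = json.load(fh)["status"]

    def _record(self, status):
        self.status = status
        with open(self.status_path, "w") as fh:
            json.dump({"status": status}, fh)

    def operate(self, fail_at=None):
        """Run the remaining steps; fail_at simulates a failure in one step."""
        while self.status != "completed":
            step = self.status
            if step == fail_at:
                raise RuntimeError("step %s failed" % step)
            self.log.append(step)
            # Record the next step only after the current one succeeded.
            self._record(STEPS[STEPS.index(step) + 1])
```

A second RestartableTask created with the same status file picks up at the step that failed rather than repeating the steps that already succeeded.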

JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1.
type - structures/grids/etc.
required - whether this input always needs to be specified.

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1.
type - structures/grids/etc.
always - whether this output is always produced.

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any of the files does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products is not installed or the installed version is lower than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.
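The naming patterns listed above can be reproduced with a short stand-alone function. This is a sketch, not the library's implementation: gen_file_name is a hypothetical name, the stage name is passed in explicitly, and the choice to let filenum take precedence over start/end is an assumption.

```python
def gen_file_name(stagename, extension=None, filenum=None, start=None, end=None):
    """Build a file name following the documented patterns (illustrative only)."""
    name = stagename
    if filenum is not None:
        # "<full-stagename>-<filenum><extension>"
        name += "-%s" % filenum
    elif start is not None and end is not None:
        # "<full-stagename>-<start>_<end><extension>"
        name += "-%s_%s" % (start, end)
    if extension:
        name += extension
    return name


numbered = gen_file_name("job-filter", ".maegz", filenum=3)
ranged = gen_file_name("job-filter", ".maegz", start=1, end=100)
```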

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, adjusted for the specified minimum and maximum job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally corresponding to job lengths of 1 minute and 24 hours).
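One plausible reading of the adjustment is a clamp on the subjob count. The sketch below assumes that job sizes are expressed as ligands per subjob and that the requested count is passed in explicitly; adjusted_njobs is a hypothetical function and the real method may compute the result differently.

```python
import math


def adjusted_njobs(total_mol, njobs, min_job_size, max_job_size, adjust=True):
    """Clamp njobs so each subjob holds between min and max ligands (sketch)."""
    if not adjust or total_mol <= 0:
        return max(1, njobs)
    # Fewest subjobs that keep every subjob at or below max_job_size.
    lower = math.ceil(total_mol / max_job_size)
    # Most subjobs that keep every subjob at or above min_job_size.
    upper = max(1, total_mol // min_job_size)
    return max(lower, min(njobs, upper))


# 1000 ligands, 100 requested subjobs, 50 to 200 ligands per subjob.
clamped = adjusted_njobs(1000, 100, 50, 200)
```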

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of the variable names of the inputs at each position (key: position, value: name).

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.
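The lookup order described above amounts to a three-level fallback. The following sketch shows that order in isolation; resolve_max_retries is a hypothetical helper, and passing the command-line value and the environment as arguments is a simplification made so the example is self-contained.

```python
import os


def resolve_max_retries(cli_value=None, environ=None, default=2):
    """Return the retry count: explicit value, then environment, then default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        # An explicit -max_retries value always wins.
        return int(cli_value)
    env_value = environ.get("SCHRODINGER_MAX_RETRIES")
    if env_value is not None:
        return int(env_value)
    return default


from_env = resolve_max_retries(None, {"SCHRODINGER_MAX_RETRIES": "3"})
```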

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()
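The CPU count and the default subjob count described for getNCpus() and getNJobs() can be sketched together. The helper names are hypothetical, and representing a queued host without a CPU specification as None in the host list is an assumption made for this example.

```python
def total_cpus(host_list):
    """Sum the CPUs in [(host, ncpus), ...]; hosts with ncpus=None count as 10."""
    total = 0
    for _host, ncpus in host_list:
        total += 10 if ncpus is None else ncpus
    return total


def default_njobs(host_list, requested=None):
    """Use the requested count if given, else the smaller of total CPUs and 10."""
    if requested is not None:
        return requested
    return min(total_cpus(host_list), 10)


hosts = [("localhost", 1), ("cluster", None)]
cpus = total_cpus(hosts)
njobs = default_njobs(hosts)
```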

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a dictionary of the output names for each position.

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return the verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints the specified objects to the stage log file without a trailing end-of-line character.

mainProduct()

If a stage has a main product associated with it, the stage should override this method with a method that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product="mmshare"

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed
“normal” - stage progress is printed (default)
“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position
name - variable name of this IO object
obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file

class schrodinger.pipeline.stages.filtering.SubSetStage(*args, **kwargs)

Bases: schrodinger.pipeline.stage.Stage

Stage for making a subset of the input files based on a list of titles or other property.

The keywords specific to this stage are:

PROPERTY - Property to filter on (default s_m_title).

VALUE_FILE - File containing a list of values to keep (one per line).

The stage takes one set of input structure files and generates one set of corresponding output files.

NOTE: SMILES format files can only be filtered on the title.
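The subsetting idea can be shown without any Schrodinger imports. In this sketch, plain dictionaries stand in for structures and a list of strings stands in for the contents of VALUE_FILE; read_value_list and filter_records are hypothetical helpers and are not the stage's readValueList() or filterFile() methods.

```python
def read_value_list(lines):
    """Collect the non-empty values (one per line) into a set for fast lookup."""
    return {line.strip() for line in lines if line.strip()}


def filter_records(records, values, prop="s_m_title"):
    """Keep only the records whose property value appears in the value list."""
    return [rec for rec in records if rec.get(prop) in values]


values = read_value_list(["lig1\n", "\n", "lig3\n"])
records = [{"s_m_title": "lig1"}, {"s_m_title": "lig2"}, {"s_m_title": "lig3"}]
kept = filter_records(records, values)
```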

__init__(*args, **kwargs)

See class docstring.

filterSmilesFile(ligfile, outfile)

Filter the file <ligfile> by title based on VALUE_FILE and add the output file to self.output_files list

filterFile(ligfile, outfile)

Filter the file <ligfile> based on PROPERTY & VALUE_FILE and add the output file to self.output_files list

readValueList()
operate()

Perform the filtering operation on the input files.

JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1.
type - structures/grids/etc.
required - whether this input always needs to be specified.

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1.
type - structures/grids/etc.
always - whether this output is always produced.

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any of the files does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products is not installed or the installed version is lower than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, adjusted for the specified minimum and maximum job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally corresponding to job lengths of 1 minute and 24 hours).

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of the variable names of the inputs at each position (key: position, value: name).

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a dictionary of the output names for each position.

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return the verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints the specified objects to the stage log file without a trailing end-of-line character.

mainProduct()

If a stage has a main product associated with it, the stage should override this method with a method that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product="mmshare"

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed
“normal” - stage progress is printed (default)
“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position
name - variable name of this IO object
obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file

class schrodinger.pipeline.stages.filtering.DrugLikeSplitStage(*args, **kwargs)

Bases: schrodinger.pipeline.stage.Stage

Stage for splitting the input ligand structures into 3 groups:

Drug-like ligands: -dl.maegz
Coarse ligands: -co.maegz
Left-over ligands: -lo.maegz

Based on the specified criteria.

If at least one variant of a root matches all criteria for a drug-like ligand, then all variants of that root are included in the dl.maegz output file.

If none match, then if at least one variant matches all criteria for coarse ligand, then all variants are included in the co.maegz file.

Otherwise all variants are included in the lo.maegz file.

All variants for the same root must be listed in blocks (next to each other) in the input file and have the same title.

Also labels variants by setting the s_vsw_variant field to <title>-#, where # is a variant number (1 to n).
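The per-root rule described above (the best classification among a root's variants decides where all of them go) can be sketched with itertools.groupby. The sketch makes several assumptions: variants are (title, payload) tuples rather than structures, the qualify callback stands in for the stage's qualify(st) method, and the string constants are hypothetical stand-ins for the module's classification constants.

```python
from itertools import groupby

# Hypothetical stand-ins for the module constants, ordered best to worst.
DRUG_LIKE, COARSE, LEFT_OVER = "dl", "co", "lo"
_RANK = {DRUG_LIKE: 0, COARSE: 1, LEFT_OVER: 2}


def split_by_root(variants, qualify):
    """Assign every variant of a root to the best class any of them reaches.

    variants: list of (title, payload) with each root's variants consecutive.
    qualify: callable mapping a payload to DRUG_LIKE, COARSE, or LEFT_OVER.
    """
    groups = {DRUG_LIKE: [], COARSE: [], LEFT_OVER: []}
    for title, block in groupby(variants, key=lambda v: v[0]):
        block = list(block)
        best = min((qualify(payload) for _t, payload in block), key=_RANK.get)
        # Label each variant <title>-<n>, numbering from 1 within the root.
        for number, (_t, payload) in enumerate(block, start=1):
            groups[best].append(("%s-%d" % (title, number), payload))
    return groups
```

Because groupby only merges adjacent items, this sketch depends on the same ordering requirement the stage states: variants of one root must sit next to each other in the input.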

__init__(*args, **kwargs)

This is the Stage class. Derive your own class from it.

Parameters
  • stagename – full name for this stage (<jobname>-<stagename>)

  • specs – ConfigObj specification for the supported keywords

  • allow_extra_keywords – Whether to allow keywords that are not in the specification.

  • cleanup – Whether to remove intermediate files

  • inpipeline – Whether the stage is running within a Python Pipeline. If the stage is manually created, do NOT set this flag. Python Pipeline will set it as needed.

  • driver_dir – Directory in which the driver is running.

operate()

Perform an operation on the input files.

qualify(st)

Returns module constant classification for the structure: Drug-like, coarse, etc.

writeRoot(root_sts, current_root, current_root_qualification)
JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1.
type - structures/grids/etc.
required - whether this input always needs to be specified.

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1.
type - structures/grids/etc.
always - whether this output is always produced.

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any of the files does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products is not installed or the installed version is lower than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, adjusted for the specified minimum and maximum job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally corresponding to job lengths of 1 minute and 24 hours).

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of the variable names of the inputs at each position (key: position, value: name).

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a dictionary of the output names for each position.

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return the verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints the specified objects to the stage log file without a trailing end-of-line character.

mainProduct()

If a stage has a main product associated with it, the stage should override this method with a method that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product="mmshare"

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed
“normal” - stage progress is printed (default)
“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position
name - variable name of this IO object
obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file

class schrodinger.pipeline.stages.filtering.MergeDuplicatesStage(*args, **kwargs)[source]

Bases: schrodinger.pipeline.stage.Stage

Stage for removing duplicate variants within each root (compound) based on unique SMILES strings.

NOTE: All variants of each root must be in consecutive order in the input.
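The consecutive-order requirement can be illustrated with a minimal sketch. It is not the stage's implementation: the function name merge_duplicates, the (root, SMILES) tuples, and the choice to keep the first variant seen for each SMILES are all assumptions made for illustration.

```python
from itertools import groupby


def merge_duplicates(variants):
    """Keep one variant per unique SMILES within each root (hypothetical helper).

    `variants` is an iterable of (root_id, smiles) pairs in which all
    variants of a root are adjacent.
    """
    kept = []
    # groupby only merges adjacent items with equal keys, so variants of a
    # root that are separated in the input would form separate groups.
    for root, group in groupby(variants, key=lambda v: v[0]):
        seen = set()
        for _, smiles in group:
            if smiles not in seen:
                seen.add(smiles)
                kept.append((root, smiles))
    return kept


records = [
    ("cmpd1", "CCO"),
    ("cmpd1", "CCO"),      # duplicate within cmpd1, dropped
    ("cmpd1", "CC[O-]"),
    ("cmpd2", "CCO"),      # same SMILES but a different root, kept
]
result = merge_duplicates(records)
```

If the variants of one root were interleaved with another root, each run would be deduplicated on its own and duplicates could survive, which is why the input order matters.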

__init__(*args, **kwargs)[source]

This is the Stage class. Derive your own class from it.

Parameters
  • stagename – full name for this stage (<jobname>-<stagename>)

  • specs – ConfigObj specification for the supported keywords

  • allow_extra_keywords – Whether to allow keywords that are not in the specification.

  • cleanup – Whether to remove intermediate files

  • inpipeline – Whether the stage is running within a Python Pipeline. If the stage is manually created, do NOT set this flag. Python Pipeline will set it as needed.

  • driver_dir – Directory in which the driver is running.

operate()[source]

Perform an operation on the input files.

JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1

type - structures/grids/etc.

required - whether this input always needs to be specified

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1

type - structures/grids/etc.

always - whether this output is always produced

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any file does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products are not installed or the installed version is less than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.
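The name patterns listed above can be sketched as follows. This is an illustration only: the standalone function gen_file_name, its explicit stagename argument, and the rule that filenum takes precedence over start/end are assumptions, since the source does not state how conflicting options are resolved.

```python
def gen_file_name(stagename, extension=None, filenum=None, start=None, end=None):
    """Build a stage file name from the optional parts (hypothetical helper)."""
    name = stagename
    if filenum is not None:
        # "<full-stagename>-<filenum>"
        name += f"-{filenum}"
    elif start is not None and end is not None:
        # "<full-stagename>-<start>_<end>"
        name += f"-{start}_{end}"
    if extension:
        name += extension
    return name


numbered = gen_file_name("job-filter", ".mae", filenum=3)
ranged = gen_file_name("job-filter", ".mae", start=1, end=100)
plain = gen_file_name("job-filter")
```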

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, and adjusts it for the specified min & max job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally job lengths of 1 minute & 24 hours).
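One way to picture the adjustment is as clamping the subjob count so that the number of ligands per subjob stays between the two limits. The sketch below is a guess at that idea, not the method's actual algorithm: the function adjusted_njobs, the explicit njobs argument, and the assumption that job sizes are counted in ligands per subjob are all hypothetical.

```python
import math


def adjusted_njobs(total_mol, njobs, min_job_size, max_job_size):
    """Clamp njobs so each subjob holds between min and max ligands (sketch)."""
    # Fewest subjobs that keep every subjob at or under max_job_size.
    lower = math.ceil(total_mol / max_job_size)
    # Most subjobs that keep every subjob at or above min_job_size;
    # always allow at least one subjob.
    upper = max(1, total_mol // min_job_size)
    return max(lower, min(njobs, upper))


unchanged = adjusted_njobs(1000, 10, min_job_size=50, max_job_size=500)
reduced = adjusted_njobs(100, 10, min_job_size=50, max_job_size=500)
increased = adjusted_njobs(100000, 4, min_job_size=50, max_job_size=5000)
```

In this sketch the upper limit on job size wins when the two limits conflict, so no subjob is ever larger than max_job_size.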

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.
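The relationship between the two return values can be sketched as below. The helper host_list_to_str is hypothetical, and the space-separated host:ncpus form is an assumption about the -HOST syntax, which this page does not spell out.

```python
def host_list_to_str(host_list):
    """Join (host, ncpus) pairs into a single host string (assumed format)."""
    return " ".join(f"{host}:{ncpus}" for host, ncpus in host_list)


host_str = host_list_to_str([("localhost", 1), ("cluster", 8)])
```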

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of variable name of the inputs at each position. Key:position, value:name

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.
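The precedence described above can be sketched as a small standalone function. The name resolve_max_retries and its arguments are hypothetical; only the order of the three sources and the default of 2 come from the description.

```python
import os


def resolve_max_retries(explicit=None, environ=None):
    """Pick the retry count: explicit value, then environment, then 2 (sketch)."""
    env = os.environ if environ is None else environ
    if explicit is not None:
        # A value given via -max_retries wins over everything else.
        return int(explicit)
    value = env.get("SCHRODINGER_MAX_RETRIES")
    if value:
        return int(value)
    # Documented default when nothing is specified.
    return 2


from_option = resolve_max_retries(5, environ={"SCHRODINGER_MAX_RETRIES": "3"})
from_env = resolve_max_retries(environ={"SCHRODINGER_MAX_RETRIES": "3"})
fallback = resolve_max_retries(environ={})
```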

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()
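The CPU counting in getNCpus() and the default in getNJobs() can be sketched together. The helpers total_cpus and default_njobs are hypothetical, and representing a queued host without a CPU count as None is an assumption made for this illustration.

```python
QUEUE_DEFAULT_CPUS = 10   # added for queued hosts with no CPU count
MAX_DEFAULT_NJOBS = 10    # cap used when -NJOBS was not specified


def total_cpus(host_list):
    """Sum CPUs over (host, ncpus) pairs; None means no count was given."""
    return sum(QUEUE_DEFAULT_CPUS if ncpus is None else ncpus
               for _, ncpus in host_list)


def default_njobs(host_list, requested=None):
    """Return the requested subjob count, or min(#CPUs, 10) if none was given."""
    if requested is not None:
        return requested
    return min(total_cpus(host_list), MAX_DEFAULT_NJOBS)


cpus = total_cpus([("localhost", 1), ("queue_host", None)])
small = default_njobs([("localhost", 4)])
capped = default_njobs([("a", 8), ("b", 8)])
```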

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a list of output names for each position (dict)

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints specified objects to the stage log file. No EOF return

mainProduct()

If a stage has a main product associated with it, the stage should override this method with one that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product=”mmshare”

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed

“normal” - stage progress is printed (default)

“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position

name - Variable name of this IO object

obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file

class schrodinger.pipeline.stages.filtering.GeneratePropsStage(*args, **kwargs)[source]

Bases: schrodinger.pipeline.stage.Stage

Ev:85070

For each structure in the input set, runs: $SCHRODINGER/run generate_ligfilter_properties.py, which generates properties specified via the ligfilter.predefined_function_dict.

__init__(*args, **kwargs)[source]

See class docstring.

setupJobs()[source]

Sets up subjobs, which are distributed via JobDJ. There will be one subjob for each input file.

processJobOutputs()[source]

Analyzes the output files from the subjobs. Raises a RuntimeError if there are any errors with the subjobs.

operate()[source]

Perform an operation on the input files. There are setup, running, and post-processing steps, and the stage records its current status so that it can be restarted in that step if there is a failure. Raises a RuntimeError if the JobDJ run() method fails, or if the stage finishes with an improper status.
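The restart behaviour described above can be pictured as a small state machine that records which step it is in. The class below is a self-contained sketch, not the Stage implementation: the class name, the step names, and the fail_at argument used to simulate a failure are all hypothetical.

```python
class RestartableStage:
    """Toy stage that remembers its current step so it can resume there."""

    STEPS = ("setup", "running", "processing", "completed")

    def __init__(self):
        self.status = "setup"
        self.log = []

    def operate(self, fail_at=None):
        while self.status != "completed":
            step = self.status
            if step == fail_at:
                # The status is left unchanged, so a later call resumes here.
                raise RuntimeError(f"step {step!r} failed")
            self.log.append(step)
            self.status = self.STEPS[self.STEPS.index(step) + 1]


stage = RestartableStage()
try:
    stage.operate(fail_at="running")   # setup succeeds, running fails
except RuntimeError:
    pass
status_after_failure = stage.status
stage.operate()                        # resumes at "running", not "setup"
```

A real stage would also persist its state to the restart file (see dump()) after each step so that the recorded status survives a crash of the driver process.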

JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1

type - structures/grids/etc.

required - whether this input always needs to be specified

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1

type - structures/grids/etc.

always - whether this output is always produced

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any file does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products are not installed or the installed version is less than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, and adjusts it for the specified min & max job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally job lengths of 1 minute & 24 hours).

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of variable name of the inputs at each position. Key:position, value:name

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a list of output names for each position (dict)

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints specified objects to the stage log file. No EOF return

mainProduct()

If a stage has a main product associated with it, the stage should override this method with one that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product=”mmshare”

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed

“normal” - stage progress is printed (default)

“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position

name - Variable name of this IO object

obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file

class schrodinger.pipeline.stages.filtering.ChargeFilterStage(*args, **kwargs)[source]

Bases: schrodinger.pipeline.stage.Stage

Stage-based class for filtering a set of structure files by total charge.

MIN_CHARGE and MAX_CHARGE are the two keywords specific to this stage. If a structure has a total charge within [MIN_CHARGE,MAX_CHARGE] (inclusive), it is retained; otherwise, the structure is filtered out.

The stage takes one input structure file set and generates one set of corresponding output structure files.

__init__(*args, **kwargs)[source]

See class docstring.

operate()[source]

Read all the structures in the input files. If a structure’s total charge is between MIN_CHARGE and MAX_CHARGE (inclusive), write it to the corresponding output file. Raises an IOError if there is a problem reading an input file or writing an output file, and raises a SystemExit if there are no output structures.
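The inclusive range test can be sketched without any Schrodinger imports. The function filter_by_charge and the (title, charge) tuples standing in for structures are hypothetical; the inclusive bounds and the SystemExit on empty output follow the description above.

```python
def filter_by_charge(structures, min_charge, max_charge):
    """Keep (title, total_charge) pairs whose charge lies in the closed range."""
    kept = [(title, charge) for title, charge in structures
            if min_charge <= charge <= max_charge]
    if not kept:
        # Mirrors the documented behaviour when nothing passes the filter.
        raise SystemExit("No structures passed the charge filter")
    return kept


ligands = [("a", -2), ("b", -1), ("c", 0), ("d", 1), ("e", 2)]
passed = filter_by_charge(ligands, min_charge=-1, max_charge=1)
```

Structures with a total charge exactly equal to either bound are retained, since the range is inclusive.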

JobDJOptions()

Returns a dictionary of options to pass to JobDJ: hosts, max_retries, default_max_retries, verbosity

__contains__(key)
__len__()
addExpectedInput(position, type, required=True)

A stage can accept one or more pipeio input objects. Use this method to specify the type of input object that is expected at each position.

position - an integer starting at 1

type - structures/grids/etc.

required - whether this input always needs to be specified

addExpectedOutput(position, type, always=True)

A stage can return one or more pipeio objects. Use this method to specify the type of object that will be returned and whether or not it will always be produced by the stage.

position - an integer starting at 1

type - structures/grids/etc.

always - whether this output is always produced

addOutputFile(filename)

Adds the specified file to the stage’s job control record. File must be specified as local (not absolute) path.

checkFile(file, error='File does not exist:')

Raise exception if specified file does not exist. The message that is printed can be specified.

checkFiles(files, error='File does not exist')

Raise an exception if any file does not exist.

checkInputs()

OVERWRITE: Return False if something is wrong with the input files or the parameter, otherwise return True.

checkParameters()

OVERWRITE: Make sure that all parameters are valid.

checkProducts()

Raises RuntimeError if any of the required products are not installed or the installed version is less than the minimum required version. It is possible to override this method. See ligprep.py for an example.

clear() → None. Remove all items from D.
copy()
debug(text)

Print a debug line to the log file

dump()

This method dumps all the variables of the Stage to a restart file. Run it every time an important step is performed.

error(text)

Print an error line to the log file

exit(text='')

Print an error line to the log file and exit with code 1

classmethod fromkeys(iterable, value=None)
genFileName(extension=None, filenum=None, start=None, end=None)

Generate a file name to be used by the stage. Returns string:

"<full-stagename>-<filenum><extension>"
"<full-stagename>-<start>_<end><extension>"
"<full-stagename><extension>"
"<full-stagename>", etc.

Depending on given options.

genOutputFileName(position, extension='', filenum=None, start=None, end=None)

Generate a file name to be used by the stage when writing files for the output position <position>. Returns strings:

"<full-varname>-<filenum><extension>"
"<full-varname>-<start>_<end><extension>"
"<full-varname><extension>"
"<full-varname>", etc.

Depending on given options.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getAdjustedNJobs(total_mol, min_job_size, max_job_size)

Returns the desired number of subjobs, and adjusts it for the specified min & max job sizes if the user specified the ADJUST option. If the number of desired jobs was not specified by the user, the number of available CPUs or 10 is used, whichever is smaller. Specify the number of input ligands and the smallest and largest desired job sizes (generally job lengths of 1 minute & 24 hours).

getCleanupRequested()

Stages should clean up after themselves if this returns True

getHostList()

Returns a list of hosts to run the subjobs on. localhost:1 may be in the list as well. Format: [ (host1,ncpus), (host2,ncpus) ]. Pass this value to JobDJ.

getHostStr()

Just like getHostList() but instead of returning a list, returns a host string to be passed to the -HOST argument.

getInput(position)

Use in operate() to get the input object for specified position. Returns None if invalid position is specified.

getInputNames()

Return a dictionary of variable name of the inputs at each position. Key:position, value:name

getJobDJ(**kwargs)

Returns a pre-set JobDJ instance for the stage to use. It already has its hosts, max_retries, max_failures, default_max_retries, and verbosity set.

getMaxRetries()

Return the number of max restarts to use. If -max_retries is specified, returns that value; otherwise if SCHRODINGER_MAX_RETRIES is defined, returns that value; otherwise returns default of 2. Pass this value to JobDJ.

getNCpus()

Returns the total number of processors specified in the host string. For queued hosts with no CPU# specification, 10 is added.

getNJobs()

Returns the requested target number of subjobs, and whether or not to adjust that number if it is unreasonable.

If -NJOBS was not specified, the # of CPUs or 10 is returned (whichever is smaller).

Used by Glide DockingStage and _adjustNJobs()

getName()

Return stagename (jobname of the stage)

getOutput(position)

Returns the output IO object of the stage at specified position. Use this method after running the stage to get its output objects

getOutputName(position)

Return the output name for specified position

getOutputNames()

Return a list of output names for each position (dict)

getRuntimePath(filename)

Return the runtime path of a file that the user specified. Prints an error and exits if the file does not exist.

getStageDirectory()

Return the directory in which the stage is running

getVerbosity()

Return verbosity of this stage (for JobDJ)

hasCompleted()

Returns True if this stage’s operate() exited successfully.

hasStarted()

Returns True if this stage has started.

info(text)

Print an info line to the log file

initNonPersistent(pipeline)

Gets called after the Stage joins pipeline. Meant to be used to initialize non-persistent context.

items() → a set-like object providing a view on D’s items
iterInputs()

Iterate through input objects: (position, obj)

keys() → a set-like object providing a view on D’s keys
lognoret(*args)

Prints specified objects to the stage log file. No EOF return

mainProduct()

If a stage has a main product associated with it, the stage should override this method with one that returns the product string. For example, LigPrepStage.mainProduct() will return “ligprep”. Used by Pipeline.

outputRequested(position)

Returns True if the user requested optional output at <position>

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

reportParameters(fh=None)

Print the value of each keyword for this stage to the stream specified as <fh>. Used by Pipeline

requiredProduct(product)

Specify a product that is required for this stage to run; optionally minimum version.

Example: product=”mmshare”

requiredProductRuntime(product)

Similar to requiredProduct() but can be used to specify required products at runtime. For example, ConvertStage doesn’t know what products are required for conversion until runtime. Raises RuntimeError if product is not installed.

run(idle_function=None, restart_file=None, verbosity=None, logfh=None)

Run the stage.

idle_function - function to call when idle

restart_file - file to periodically dump this instance to

verbosity - there are three verbosity levels: “quiet”, “normal”, and “verbose”

“quiet” - only warnings and errors are printed

“normal” - stage progress is printed (default)

“verbose” - additional debugging info is printed

logfh - where to send the logging output

runJobDJ(jobdj)
setInput(position, name=None, obj=None)

Specify an input to use for this stage.

position - input specified is for this position

name - Variable name of this IO object

obj - the IO object

This method is called by Pipeline.

setJobDJOptions(jobdj)

Use this method to adjust the specified queue.JobDJ instance to the VSW settings.

setJobOptions(subjob_hosts=None, njobs=None, adjust=None, force=None, max_retries=None, cleanup=None)

Tell this stage how to run the subjobs

Parameters
  • subjob_hosts – list of hosts to run subjobs on

  • njobs – number of subjobs to generate. None means determine automatically.

  • adjust – whether to adjust njobs such that job size is within limits

  • force – whether to continue with job if subjobs fail

  • max_retries – number of times to attempt to restart a subjob. If not specified, use SCHRODINGER_MAX_RETRIES or 2.

  • cleanup – whether to delete intermediate files

setMainProduct(product)

Specify which product this stage is part of. Will determine which host the subjobs are run on.

setOutput(position, obj)

Use this method at the end of operate() to set the output.

setOutputName(position, varname)

Is called by Pipeline when starting the stage. Tell the stage what name to save each output under.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

updateJobdj(jobdj)

Gets called periodically in order to update JobDJ’s hosts. Will ask Pipeline for CPUS when needed, and will tell Pipeline when they are no longer needed.

validateValues(preserve_errors=False)

Validates the stored keywords. This is done by converting <self> to a ConfigObj instance, and calling validate() on it. The validated keywords are then updated back to <self>. This is done as part of Ev:87429

values() → an object providing a view on D’s values
warning(text)

Print a warning line to the log file