Package schrodinger :: Package pipeline :: Package stages :: Module pull :: Class PullStage

Class PullStage

    UserDict.UserDict --+        
                        |        
UserDict.IterableUserDict --+    
                            |    
                  stage.Stage --+
                                |
                               PullStage


Stage for extracting a subset of ligands/compounds (and their variants)
from a second set of ligand files, given the number or percent to keep
from a first set of ligand files.  The first set must be ordered; i.e.,
the ligands to keep appear earliest in the set.  Compounds in the second
set that don't appear in the first set are ignored.

The keywords specific to this stage are:

    UNIQUEFIELD        The ligand property that identifies a compound.
                       Multiple ligands with the same UNIQUEFIELD value
                       are considered variants of the compound.  The
                       default is the structure title ('s_m_title').

    NUM_TO_KEEP        The number of compounds from the first set to
                       extract from the second set.  The compounds kept
                       are those that appear earliest in the first set.

    PERCENT_TO_KEEP    The percentage of compounds from the first set to
                       extract from the second set.  The compounds kept
                       are those that appear earliest in the first set.
                       Ignored if NUM_TO_KEEP is used.

    KEEP_CHARGES       If set to True, then after the ligand is pulled from
                       the original file, the partial charges are set to
                       what they were in the second file (for QPLD).
                       NOTE: If set, only ONE ligand with the specified
                       field should be in the first file set.
                       NOTE: The solvation_charge property is assumed to
                       be equal to the partial_charge.

    CHARGE_PROPERTY    If KEEP_CHARGES is True, the charges are taken from
                       this atom-level property.  Default is 'r_m_charge1'.
                       These charges from the first set are placed in the
                       'r_m_charge1' (partial_charge) and 'r_m_charge2'
                       (solvation_charge) properties of the pulled set, not
                       the CHARGE_PROPERTY in the pulled set.
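
The charge transfer described for KEEP_CHARGES and CHARGE_PROPERTY can be
sketched in plain Python. This is an illustration under stated assumptions,
not the stage's actual implementation: ligands are modeled as dictionaries
with a hypothetical 'atoms' list of per-atom property dictionaries,
copy_charges is a hypothetical helper name, and atom order is assumed to
match between the two ligands.

```python
def copy_charges(source_ligand, pulled_ligand, charge_property="r_m_charge1"):
    """Copy per-atom charges from source_ligand onto pulled_ligand.

    Hypothetical helper: ligands are dicts with an 'atoms' list of
    per-atom property dicts, and atom order is assumed to match.
    """
    src_atoms = source_ligand["atoms"]
    dst_atoms = pulled_ligand["atoms"]
    if len(src_atoms) != len(dst_atoms):
        raise ValueError("atom counts differ; cannot transfer charges")
    for src, dst in zip(src_atoms, dst_atoms):
        charge = src[charge_property]
        # Both the partial and the solvation charge receive the same value,
        # regardless of which property the charge was read from.
        dst["r_m_charge1"] = charge
        dst["r_m_charge2"] = charge
    return pulled_ligand
```

Note that the property named by CHARGE_PROPERTY is only read from the source
ligand; on the pulled ligand it is left untouched.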

If neither NUM_TO_KEEP nor PERCENT_TO_KEEP is present, all compounds in
the first set are extracted from the second set.  In all cases, every
variant of a kept compound from the second set is extracted.
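
The selection rules above can be sketched as follows. This is a minimal
illustration, not the stage's code: ligands are modeled as plain
dictionaries keyed by property name, select_compounds is a hypothetical
function, and rounding the percentage up is an assumption, since the
documentation does not say how fractional counts are handled.

```python
import math

def select_compounds(first_set, uniquefield="s_m_title",
                     num_to_keep=None, percent_to_keep=None):
    """Return the compound IDs to keep, in first-set order."""
    # Collect compound IDs in order of first appearance; variants that
    # share a UNIQUEFIELD value count as one compound.
    ordered_ids = list(dict.fromkeys(lig[uniquefield] for lig in first_set))

    if num_to_keep is not None:
        # NUM_TO_KEEP takes precedence over PERCENT_TO_KEEP.
        count = num_to_keep
    elif percent_to_keep is not None:
        # Assumption: round up so a nonzero percentage keeps at least one.
        count = math.ceil(len(ordered_ids) * percent_to_keep / 100.0)
    else:
        # Neither keyword given: keep every compound in the first set.
        count = len(ordered_ids)
    return ordered_ids[:count]
```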

This stage takes two input structure sets: an ordered set that identifies
the ligands to extract, and a second set from which those ligands are
extracted.  It generates one output structure file set, with each file
containing about 100,000 structures.
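
Splitting the output into files of bounded size can be sketched with a
small generator. This is only an illustration: chunk_structures is a
hypothetical helper, and the exact rule the stage uses to arrive at "about
100,000" structures per file is not documented here.

```python
from itertools import islice

def chunk_structures(structures, chunk_size=100000):
    """Yield lists of at most chunk_size structures, preserving order."""
    iterator = iter(structures)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk would then be written to its own file in the output set.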

Issues:
Keeping the first N compounds may not choose compounds from Glide results
in the proper manner. If the variants of a compound are chemically
distinct, it's appropriate to choose the top compounds based on GlideScore.
However, if the variants differ only in their conformations (such as when
saving multiple poses per ligand), Emodel (not GlideScore) should be used
to determine the representative variant of the compound for GlideScore
comparison with other compounds. Hopefully, this doesn't come up
often, because it would be very tricky to reflect the proper ordering
in the input files if there are both LigPrep-type variants and
conformational variants.
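
For the conformations-only case, the ordering described above can be
sketched as follows. This is an illustration, not part of the stage:
rank_compounds is a hypothetical helper, poses are modeled as dictionaries,
the property names 'r_i_glide_emodel' and 'r_i_glide_gscore' are assumed
defaults, and lower (more negative) values are treated as better for both
scores.

```python
def rank_compounds(poses, uniquefield="s_m_title",
                   emodel="r_i_glide_emodel", gscore="r_i_glide_gscore"):
    """Return compound IDs ordered by the GlideScore of their best-Emodel pose."""
    best = {}
    for pose in poses:
        cid = pose[uniquefield]
        # Lower (more negative) Emodel is treated as the better pose.
        if cid not in best or pose[emodel] < best[cid][emodel]:
            best[cid] = pose
    # Rank compounds by the GlideScore of the representative pose only.
    return sorted(best, key=lambda cid: best[cid][gscore])
```

A first set ordered this way could differ from one sorted on GlideScore
alone, because a compound's best-GlideScore pose is not necessarily its
best-Emodel pose.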

Instance Methods
 
__init__(self, *args, **kwargs)
See class docstring.
 
operate(self)
Extract a subset of the first set compounds from the second set of ligand input structure files.
 
pullCompounds(self)

Inherited from stage.Stage: JobDJOptions, __getitem__, __getstate__, addExpectedInput, addExpectedOutput, addOutputFile, checkFile, checkFiles, checkInputs, checkParameters, checkProducts, debug, dump, error, exit, genFileName, genOutputFileName, getAdjustedNJobs, getCleanupRequested, getHostList, getHostStr, getInput, getInputNames, getJobDJ, getLocal, getMaxRetries, getNCpus, getNJobs, getName, getOutput, getOutputName, getOutputNames, getRuntimePath, getStageDirectory, getVerbosity, hasCompleted, hasStarted, info, iterInputs, log, lognoret, mainProduct, outputRequested, reportParameters, requiredProduct, requiredProductRuntime, run, runJobDJ, setInput, setJobDJOptions, setJobOptions, setMainProduct, setOutput, setOutputName, updateJobdj, validateValues, waitForFileStatus, waitForFiles, warning

Inherited from UserDict.IterableUserDict: __iter__

Inherited from UserDict.UserDict: __cmp__, __contains__, __delitem__, __len__, __repr__, __setitem__, clear, copy, get, has_key, items, iteritems, iterkeys, itervalues, keys, pop, popitem, setdefault, update, values

Class Methods

Inherited from UserDict.UserDict: fromkeys

Class Variables

Inherited from UserDict.UserDict: __hash__

Method Details

__init__(self, *args, **kwargs)
(Constructor)

 

See class docstring.

Overrides: UserDict.UserDict.__init__

operate(self)

 

Extract a subset of the first set compounds from the second set of ligand input structure files. Makes a list of compounds (identified by UNIQUEFIELD) from the first set, pares down the list according to the NUM_TO_KEEP or PERCENT_TO_KEEP setting, and extracts all variants of those compounds from the second set. Raises a RuntimeError if any input file cannot be read, if there is a problem accessing the UNIQUEFIELD property of any ligand, if there is a problem writing any output file, or if no ligands are extracted.

Overrides: stage.Stage.operate
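
The extraction step and two of the documented failure modes can be sketched
as follows. This is a simplified illustration, not the body of operate():
pull_variants is a hypothetical function, ligands are modeled as
dictionaries keyed by property name, and file reading and writing are
omitted.

```python
def pull_variants(second_set, keep_ids, uniquefield="s_m_title"):
    """Return every ligand in second_set whose compound ID is in keep_ids."""
    keep = set(keep_ids)
    pulled = []
    for index, ligand in enumerate(second_set, start=1):
        try:
            compound_id = ligand[uniquefield]
        except KeyError:
            # Mirrors the documented failure when UNIQUEFIELD is inaccessible.
            raise RuntimeError(
                "ligand %d has no %s property" % (index, uniquefield))
        # Compounds absent from the first set are ignored.
        if compound_id in keep:
            pulled.append(ligand)
    if not pulled:
        # Mirrors the documented failure when nothing is extracted.
        raise RuntimeError("no ligands were extracted")
    return pulled
```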