Package schrodinger :: Package pipeline :: Package stages :: Module combine :: Class DataFusionMergeStage

Class DataFusionMergeStage

    UserDict.UserDict --+        
                        |        
UserDict.IterableUserDict --+    
                            |    
                  stage.Stage --+
                                |
                               DataFusionMergeStage


This stage takes in results from a Glide docking job, a Phase Shape job,
and a Canvas 2D fingerprint job, and combines them based on the Z-score
that each compound receives from each method.

This stage is used by the Data Fusion workflow (data_fusion_backend.py).

Z-score = the number of standard deviations above or below the mean of the
          distribution of scores for that method.
Final score = For each compound, the sum of z-scores for the 3 methods.
              Compounds will be sorted by the final score in the output.
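
The definitions above can be illustrated with a minimal sketch. It is not the
stage's actual code: the function name merge_by_z_score is hypothetical, the
inputs are assumed to be plain dicts of z-scores keyed by compound ID, and
compounds missing from any of the three methods are assumed to be dropped.

```python
def merge_by_z_score(glide_z, shape_z, sim_z):
    """Sum per-method z-scores and rank compounds, best first.

    Hypothetical helper for illustration only. Each argument is a dict
    mapping compound ID -> z-score, already oriented so that a higher
    z-score means a better compound.
    """
    # Assumption: only compounds scored by all three methods are kept.
    common = set(glide_z) & set(shape_z) & set(sim_z)

    # Final score = sum of the three z-scores for each compound.
    final = {cid: glide_z[cid] + shape_z[cid] + sim_z[cid] for cid in common}

    # Sort by final score, highest first.
    return sorted(final.items(), key=lambda item: item[1], reverse=True)
```

When every compound has all three z-scores, dividing the sum by three gives
the average and leaves the ranking unchanged.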

STAGE Merge Results
    Input 1: Docking PV file
    Input 2: Phase Shape output file
    Input 3: Canvas 2D Fingerprints output file
    Output: Merged Maestro file of compounds sorted by the final score.

Properties included in the output:
1. Glide score
2. Phase Shape score
3. Similarity score
4. Consensus/final score (average of z-scores)

WARNING: This stage assumes that each UNIQUEFIELD value occurs only once
in each input set.

Instance Methods
 
__init__(self, *args, **kwargs)
Creates the stage instance and passes <args> and <kwargs> to the stage.Stage constructor.
 
operate(self)
The only overridden & required method in this class.
 
calcZScores(self, scores_by_compound, more_negative_is_better=False)
Takes in a dictionary where keys are compound IDs, and values are scores, and returns a dict of z-scores (keys are compound IDs also).
 
calcConsensusScore(self, scores, top_n)
Given a list of scores, select the best <top_n> of them, and calculate their average.

Inherited from stage.Stage: JobDJOptions, __getitem__, __getstate__, addExpectedInput, addExpectedOutput, addOutputFile, checkFile, checkFiles, checkInputs, checkParameters, checkProducts, debug, dump, error, exit, genFileName, genOutputFileName, getAdjustedNJobs, getCleanupRequested, getHostList, getHostStr, getInput, getInputNames, getJobDJ, getLocal, getMaxRetries, getNCpus, getNJobs, getName, getOutput, getOutputName, getOutputNames, getRuntimePath, getStageDirectory, getVerbosity, hasCompleted, hasStarted, info, iterInputs, log, lognoret, mainProduct, outputRequested, reportParameters, requiredProduct, requiredProductRuntime, run, runJobDJ, setInput, setJobDJOptions, setJobOptions, setMainProduct, setOutput, setOutputName, updateJobdj, validateValues, waitForFileStatus, waitForFiles, warning

Inherited from UserDict.IterableUserDict: __iter__

Inherited from UserDict.UserDict: __cmp__, __contains__, __delitem__, __len__, __repr__, __setitem__, clear, copy, get, has_key, items, iteritems, iterkeys, itervalues, keys, pop, popitem, setdefault, update, values
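
The summary of calcConsensusScore above (select the best <top_n> scores and
average them) can be sketched as follows. This is an illustration under
stated assumptions, not the method's actual body: the name
calc_consensus_score is hypothetical, "best" is assumed to mean highest,
None entries are assumed to be skipped, and None is returned when nothing
can be averaged.

```python
def calc_consensus_score(scores, top_n):
    """Average the best top_n values in scores (hypothetical sketch)."""
    # Assumption: missing z-scores are represented by None and ignored.
    valid = [s for s in scores if s is not None]
    if not valid or top_n < 1:
        return None

    # Assumption: higher is better, so take the top_n largest values.
    best = sorted(valid, reverse=True)[:top_n]

    # If fewer than top_n values are available, average what is there.
    return sum(best) / len(best)
```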

Class Methods

Inherited from UserDict.UserDict: fromkeys

Class Variables

Inherited from UserDict.UserDict: __hash__

Method Details

__init__(self, *args, **kwargs)
(Constructor)

 

Creates the stage instance and passes <args> and <kwargs> to the stage.Stage constructor.

Overrides: UserDict.UserDict.__init__

operate(self)

 

The only overridden & required method in this class. Called by the Pipeline to run this stage's main code.

Overrides: stage.Stage.operate

calcZScores(self, scores_by_compound, more_negative_is_better=False)

 

Takes in a dictionary where keys are compound IDs, and values are
scores, and returns a dict of z-scores (keys are compound IDs also).
The z-score is calculated as:
    z-score = (score - average) / standard deviation

If the number of compounds is 1, then the z-score will be <None>.

@param scores_by_compound: Dictionary of scores (e.g. Glide score)
    keyed by compound ID.
@type scores_by_compound: dict

@param more_negative_is_better: If set to True, more negative scores
    are considered to be better, and will result in higher Z-scores.
@type more_negative_is_better: bool

@return: Z-Scores, in a dictionary keyed by compound ID.
@rtype: dict
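
The calculation described for calcZScores can be sketched as a standalone
function. This is a minimal illustration, not the method's actual code: the
name calc_z_scores is hypothetical, the sample standard deviation (n - 1
denominator) is an assumption because the documentation does not say which
form is used, and returning 0.0 for every compound when all scores are
identical is also an assumption.

```python
import math


def calc_z_scores(scores_by_compound, more_negative_is_better=False):
    """Return a dict of z-scores keyed by compound ID (hypothetical sketch)."""
    n = len(scores_by_compound)

    # With a single compound (or none) there is no spread to measure,
    # so the z-score is reported as None, as the documentation states.
    if n < 2:
        return {cid: None for cid in scores_by_compound}

    values = list(scores_by_compound.values())
    mean = sum(values) / n

    # Assumption: sample standard deviation with an (n - 1) denominator.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

    # Assumption: identical scores give a z-score of 0.0 for everyone.
    if std == 0:
        return {cid: 0.0 for cid in scores_by_compound}

    # Flip the sign when more negative raw scores are better (for example
    # docking scores), so that better compounds get higher z-scores.
    sign = -1.0 if more_negative_is_better else 1.0
    return {
        cid: sign * (score - mean) / std
        for cid, score in scores_by_compound.items()
    }
```

For example, scores of -8.0, -6.0 and -4.0 have a mean of -6.0 and a sample
standard deviation of 2.0, so with more_negative_is_better=True the
compounds receive z-scores of 1.0, 0.0 and -1.0 respectively.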