schrodinger.pipeline.stages.combine module

Core stages for Pipeline mechanism.

CombineStage
Stage for combining multiple structure PipeIO sets into one PipeIO set and labelling each ligand by the input set from which it came.
DataFusionMergeStage
Stage for merging results in a Data Fusion workflow.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.pipeline.stages.combine.CombineStage(*args, **kwargs)

Bases: schrodinger.pipeline.stage.Stage

Stage for combining multiple Structures objects into one. The source of each structure can be recorded by supplying the LABELFIELD and LABELS keywords.

The keywords specific to this stage are…

LABELFIELD An optional property added to each output structure
that holds the label for the input set whence it came.

LABELS List of labels for the input sets.

The stage takes up to ten input structure file sets and generates one output structure file set.

operate()

Combine all the input files from all input sets into one set, optionally labelling each structure according to the set from which it originated. Raises a RuntimeError if there is a problem reading an input file or writing an output file.
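The labelling behaviour described above can be illustrated with a minimal sketch in plain Python. It does not use the Schrodinger API: structures are stood in for by dictionaries, and the function name combine_sets and the property name s_user_source are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence


def combine_sets(
    input_sets: Sequence[Sequence[Dict[str, str]]],
    labels: Optional[Sequence[str]] = None,
    label_field: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Concatenate input sets, optionally tagging each record with its set's label."""
    if labels is not None and len(labels) != len(input_sets):
        raise ValueError("need exactly one label per input set")
    combined = []
    for index, records in enumerate(input_sets):
        for record in records:
            out = dict(record)  # copy so the inputs are not mutated
            if label_field is not None and labels is not None:
                out[label_field] = labels[index]
            combined.append(out)
    return combined


# Hypothetical stand-ins for two input structure sets.
docked = [{"title": "lig1"}, {"title": "lig2"}]
shape = [{"title": "lig3"}]
merged = combine_sets(
    [docked, shape],
    labels=["glide", "shape"],
    label_field="s_user_source",  # hypothetical property name
)
```

Here each record in merged carries the label of the set it came from, mirroring the roles of LABELS (one label per input set) and LABELFIELD (the property that stores it).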

class schrodinger.pipeline.stages.combine.DataFusionMergeStage(*args, **kwargs)

Bases: schrodinger.pipeline.stage.Stage

This stage takes in results from a Glide docking job, Phase Shape job, and a Canvas 2D fingerprint job and combines them based on the Z-score for each method.

This stage is used by the Data Fusion workflow (data_fusion_backend.py).

Z-score = the number of standard deviations above or below the mean of the
distribution of scores for that method.
Final score = For each compound, the sum of z-scores for the 3 methods.
Compounds will be sorted by the final score in the output.

STAGE Merge Results

Input 1: Docking PV file
Input 2: Phase Shape output file
Input 3: Canvas 2D Fingerprints output file
Output: Merged Maestro file of compounds sorted by the final score.

Properties included in the output:
1. Glide score
2. Phase Shape score
3. Similarity score
4. Consensus/final score (average of z-scores)

WARNING: This stage assumes that only one ligand with the same UNIQUEFIELD exists in each input set.
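As a rough illustration of the merge described above, the following sketch combines three precomputed per-method z-score dictionaries into a ranked list. It is not the stage's actual code: the function name merge_by_final_score is hypothetical, the final score follows the "sum of z-scores" description, only compounds present in all three inputs are kept (an assumption), and a higher final score is assumed to rank first.

```python
from typing import Dict, List, Tuple


def merge_by_final_score(
    glide_z: Dict[str, float],
    shape_z: Dict[str, float],
    sim_z: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Sum per-method z-scores per compound and rank, best (highest) first."""
    # Assumption: keep only compounds scored by all three methods.
    common = set(glide_z) & set(shape_z) & set(sim_z)
    final = {cid: glide_z[cid] + shape_z[cid] + sim_z[cid] for cid in common}
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical z-scores keyed by compound ID (one entry per compound per method).
glide_z = {"c1": 1.0, "c2": -1.0}
shape_z = {"c1": 0.5, "c2": -0.5, "c3": 2.0}
sim_z = {"c1": -0.25, "c2": 0.25}
ranked = merge_by_final_score(glide_z, shape_z, sim_z)
```

When every compound has all three scores, ranking by the sum and ranking by the average give the same order, since the average is the sum divided by a constant.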

operate()

The only overridden & required method in this class. Called by the Pipeline to run this stage’s main code.

calcZScores(scores_by_compound, more_negative_is_better=False)

Takes in a dictionary where keys are compound IDs, and values are scores, and returns a dict of z-scores (keys are compound IDs also). Z-score is calculated by:

z-score = (score - average) / std-deviation

If the number of compounds is 1, then the z-score will be <None>.

Parameters:
  • scores_by_compound (dict) – Dictionary of scores (e.g. Glide score) keyed by compound ID.
  • more_negative_is_better (bool) – If set to True, more negative scores are considered to be better, and will result in higher Z-scores.
Returns:

Z-Scores, in a dictionary keyed by compound ID.

Return type:

dict
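The calculation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the method's actual implementation: the function name calc_z_scores is hypothetical, the population standard deviation is used (the documentation does not say which form applies), the sign is flipped when more_negative_is_better is True, and None is also returned when all scores are identical.

```python
import statistics
from typing import Dict, Optional


def calc_z_scores(
    scores_by_compound: Dict[str, float],
    more_negative_is_better: bool = False,
) -> Dict[str, Optional[float]]:
    """Return z-scores keyed by compound ID; None when they cannot be computed."""
    if len(scores_by_compound) < 2:
        # A single compound has no distribution to compare against.
        return {cid: None for cid in scores_by_compound}
    values = list(scores_by_compound.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # assumption: population standard deviation
    if stdev == 0:
        # Assumption: identical scores give no meaningful z-score.
        return {cid: None for cid in scores_by_compound}
    # Flip the sign so that "better" always maps to a higher z-score.
    sign = -1.0 if more_negative_is_better else 1.0
    return {
        cid: sign * (score - mean) / stdev
        for cid, score in scores_by_compound.items()
    }


# Hypothetical docking scores, where more negative is better.
z = calc_z_scores({"a": -8.0, "b": -6.0, "c": -4.0}, more_negative_is_better=True)
single = calc_z_scores({"x": 1.0})
```

With more_negative_is_better=True, compound "a" (the most negative score) receives the highest z-score, and the single-compound call returns None for its only entry.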

calcConsensusScore(scores, top_n)

Given a list of scores, select the best <top_n> of them, and calculate their average.
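A minimal sketch of this selection and averaging step is shown below. The function name calc_consensus_score is hypothetical, and "best" is assumed to mean highest, in line with z-scores where higher is better; returning None for an empty selection is also an assumption.

```python
from typing import Optional, Sequence


def calc_consensus_score(scores: Sequence[float], top_n: int) -> Optional[float]:
    """Average the top_n highest scores; None if there is nothing to average."""
    best = sorted(scores, reverse=True)[:top_n]  # assumption: higher is better
    if not best:
        return None
    return sum(best) / len(best)


# Hypothetical z-scores for one compound from three methods.
consensus = calc_consensus_score([1.5, -0.5, 0.5], top_n=2)
```

In this example the two highest scores, 1.5 and 0.5, are selected and averaged.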