schrodinger.analysis.enrichment.calculator module

This module contains the class for generating the default enrichment report.

Example metrics from two different screens:

The enrichment metrics from example_A are generally more favorable than those from example_B.

Enrichment Report

Actives file: example_A_actives.txt
Results: example_A_dock_pv.rept
Total actives: 117
Total ligands(actives+decoys): 1117
Number of ranked actives: 117

BEDROC(alpha=160.9, alpha*Ra=16.8534): 1.000
BEDROC(alpha=20.0, alpha*Ra=2.0949): 0.914
BEDROC(alpha=8.0, alpha*Ra=0.8380): 0.868
ROC: 0.92
RIE: 7.65
Area under accumulation curve: 0.87
Ave. Number of outranking decoys: 82

Count and percentage of actives in top N% of decoy results.
# Actives (1%|2%|5%|10%|20%): 90| 90| 92| 94| 97
% Actives (1%|2%|5%|10%|20%): 76.9| 76.9| 78.6| 80.3| 82.9

Enrichment Factors with respect to N% sample size.
EF (1%|2%|5%|10%|20%): 9.5| 9.5| 9.4| 7.7| 4.1
EF*(1%|2%|5%|10%|20%): 77| 38| 16| 8| 4.1
EF’(1%|2%|5%|10%|20%): 2.9e+02|1.7e+02| 54| 23| 9.9
Eff(1%|2%|5%|10%|20%): 0.974| 0.949| 0.88| 0.779| 0.611

Enrichment Factors with respect to N% actives recovered.
EF (40%|50%|60%|70%|80%|90%|100%): 9.3| 9.4| 9.4| 9.2| 5.7| 2| 1.3
EF*(40%|50%|60%|70%|80%|90%|100%): 4e+02|5e+02|6e+02|2.3e+02| 13| 2.2| 1.4
EF’(40%|50%|60%|70%|80%|90%|100%): 3.8e+02|4.3e+02|4.7e+02|4.3e+02| 38| 4.7| 2.7
FOD(40%|50%|60%|70%|80%|90%|100%): 9e-05|0.0003|0.0004|0.0006|0.003| 0.03| 0.08

Enrichment Report

Actives file: example_B_actives.txt
Results: example_B_dock_pv.rept
Total actives: 62
Total ligands(actives+decoys): 1062
Number of ranked actives: 62

BEDROC(alpha=160.9, alpha*Ra=9.3934): 0.703
BEDROC(alpha=20.0, alpha*Ra=1.1676): 0.256
BEDROC(alpha=8.0, alpha*Ra=0.4670): 0.323
ROC: 0.72
RIE: 3.02
Area under accumulation curve: 0.71
Ave. Number of outranking decoys: 281

Count and percentage of actives in top N% of decoy results.
# Actives (1%|2%|5%|10%|20%): 8| 8| 9| 13| 23
% Actives (1%|2%|5%|10%|20%): 12.9| 12.9| 14.5| 21.0| 37.1

Enrichment Factors with respect to N% sample size.
EF (1%|2%|5%|10%|20%): 12| 6.5| 2.9| 2.1| 1.6
EF*(1%|2%|5%|10%|20%): 13| 6.5| 2.9| 2.1| 1.9
EF’(1%|2%|5%|10%|20%): 23| 12| 5.3| 3.4| 2.3
Eff(1%|2%|5%|10%|20%): 0.856| 0.732| 0.488| 0.354| 0.299

Enrichment Factors with respect to N% actives recovered.
EF (40%|50%|60%|70%|80%|90%|100%): 1.8| 2| 1.9| 2| 1.6| 1.6| 1
EF*(40%|50%|60%|70%|80%|90%|100%): 1.9| 2.1| 2| 2.1| 1.6| 1.6| 1.1
EF’(40%|50%|60%|70%|80%|90%|100%): 2.3| 2.2| 2.2| 2.2| 2.1| 2| 1.6
FOD(40%|50%|60%|70%|80%|90%|100%): 0.1| 0.1| 0.1| 0.2| 0.2| 0.2| 0.3
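Some of the derived values in the two reports can be checked by hand from the header counts. The sketch below is not part of the module. It assumes that Ra is the ratio of total actives to total ligands and that each "% Actives" entry is the matching "# Actives" count divided by the total number of actives. Both assumptions reproduce the printed values, but neither is stated in the reports themselves. The helper names are hypothetical.

```python
def ratio_of_actives(total_actives, total_ligands):
    """Assumed definition of Ra: fraction of all ligands that are active."""
    return total_actives / total_ligands


def percent_actives(counts, total_actives):
    """Assumed definition of '% Actives': count / total actives, in percent."""
    return [round(100.0 * count / total_actives, 1) for count in counts]


ALPHAS = (160.9, 20.0, 8.0)

# Header counts copied from the two example reports.
ra_a = ratio_of_actives(117, 1117)
ra_b = ratio_of_actives(62, 1062)

alpha_ra_a = [round(alpha * ra_a, 4) for alpha in ALPHAS]
alpha_ra_b = [round(alpha * ra_b, 4) for alpha in ALPHAS]

pct_a = percent_actives([90, 90, 92, 94, 97], 117)
pct_b = percent_actives([8, 8, 9, 13, 23], 62)

print("example_A alpha*Ra:", alpha_ra_a)
print("example_B alpha*Ra:", alpha_ra_b)
print("example_A % actives:", pct_a)
print("example_B % actives:", pct_b)
```

The rounded results agree with the alpha*Ra values in the BEDROC lines and with the "% Actives" rows of both reports.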

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.analysis.enrichment.calculator.Calculator(actives, results, total_decoys=0)

Bases: object

A class to report the default set of enrichment terms for a screen. By default, a report containing a suite of metrics is written to standard output.

Note:

This is not the preferred way to obtain enrichment metrics. If possible, use the parser and metric functions in enrichment_input.py and metrics.py directly.

Variables:
  • ef_precision (int) – Number of decimals when reporting EF values. Default = 2
  • efp_precision (int) – Number of decimals when reporting EF’ values. Default = 2
  • efs_precision (int) – Number of decimals when reporting EF* values. Default = 2
  • eff_precision (int) – Number of decimals when reporting Eff values. Default = 3
  • fod_precision (int) – Number of decimals when reporting FOD values. Default = 1
ef_precision = 2
efs_precision = 2
efp_precision = 2
eff_precision = 3
fod_precision = 1
__init__(actives, results, total_decoys=0)
Parameters:
  • actives (str or list(str)) – File name or a list of strings containing all active titles. If a file name is provided, the input should be a valid csv or structure file; a raw text file containing one title per line is also acceptable. Duplicate titles are discarded; only the first occurrence is recorded.
  • results (str or list(str) or list(structure.Structure)) – File name, a list of strings, or a list of structure.Structure containing the virtual screening results ordered by the scoring metric. If a file name is provided, the input should be a valid csv file or structure file. Duplicate titles are discarded; only the first occurrence is recorded.
  • total_decoys (int) – The total number of decoys. If specified, the total number of ligands is taken to be the number of distinct active titles from actives plus total_decoys. This enables the calculation of the correction term in calc_AUAC when the total number of ligands does not equal the total number of ranked titles in results.
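The duplicate handling and the ligand count described for these parameters can be illustrated in plain Python. This is only a sketch of the documented behavior, not the module's implementation. The helper names distinct_titles and total_ligands and the ligand titles are hypothetical, and details such as whitespace or case handling are not covered.

```python
def distinct_titles(titles):
    """Keep only the first occurrence of each title, preserving order."""
    seen = set()
    ordered = []
    for title in titles:
        if title not in seen:
            seen.add(title)
            ordered.append(title)
    return ordered


def total_ligands(active_titles, total_decoys):
    """Distinct active titles plus the stated number of decoys."""
    return len(distinct_titles(active_titles)) + total_decoys


# Hypothetical inputs: 'lig_1' appears twice in both lists.
actives = ["lig_1", "lig_2", "lig_1", "lig_3"]
ranked_results = ["lig_2", "decoy_7", "lig_1", "lig_2", "decoy_9", "lig_1"]

unique_actives = distinct_titles(actives)
unique_ranked = distinct_titles(ranked_results)
n_ligands = total_ligands(actives, total_decoys=1000)

print(unique_actives)
print(unique_ranked)
print(n_ligands)
```

In this sketch 'lig_3' is active but never ranked, so the number of ranked titles (4) differs from the total number of ligands (1003). That is the situation in which the parameter description says the correction term applies.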
calcEF(n_sampled_set, min_actives=None)
calcEFStar(n_sampled_decoy_set, min_actives=None)
calcEFP(n_sampled_decoy_set, min_actives=None)
calcFOD(fraction_of_actives)
calcEFF(fraction_of_decoys)
calcActivesInN(n_sampled_set)
calcActivesInNStar(n_sampled_set)
calcAveNumberOutrankingDecoys()
calcBEDROC(alpha=20.0)
calcRIE(alpha=20.0)
calcAUAC()
calcROC()
calcMWUROC(alpha=0.05)
calcDEF(n_sampled_set, min_actives=None)
calcDEFStar(n_sampled_decoy_set, min_actives=None)
calcDEFP(n_sampled_decoy_set, min_actives=None)
calculateSensitivity(rank)
calculateSpecificity(rank)
getPercentScreenCurvePoints()
getActiveRankCsvRows()
getROCCurvePoints()
getROCAreaRomberg(lower_limit=0.0, upper_limit=1.0)
savePlot(png_file='plot.png', title='Screen Results', xlabel='1-Specificity', ylabel='Sensitivity')
static format(value, precision=2)
Parameters:
  • value (float or None) – Float value to format as string.
  • precision (int) – Number of digits after the decimal.
Returns:

A string representation of the passed value. If the value is None, the returned string is 'n/a'. Uses the %g formatting idiom, so large values are returned in exponential notation.

Return type:

str
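The formatting behavior described above can be approximated with a short stand-in. This is a sketch, not the library's code, and format_value is a hypothetical name. It assumes that precision is passed straight through as the %g precision, which Python treats as a count of significant digits. That assumption is consistent with entries such as 2.9e+02, 0.88 and 9e-05 in the example reports.

```python
def format_value(value, precision=2):
    """Hypothetical stand-in: 'n/a' for None, otherwise %g-style formatting."""
    if value is None:
        return "n/a"
    # The 'g' presentation type switches to exponential notation when the
    # exponent is >= precision or < -4, and drops trailing zeros.
    return f"{value:.{precision}g}"


print(format_value(None))          # missing value
print(format_value(290.0))         # large value, two significant digits
print(format_value(77.0))          # fits in two significant digits
print(format_value(0.88, 3))       # trailing zero is dropped
print(format_value(0.00009, 1))    # small value, one significant digit
```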

report(file_handle=sys.stdout, header='', footer='')

Prints a text summary of the results to file_handle.

Parameters:
  • header (str) – Header for the report.
  • footer (str) – Footer for the report.
  • file_handle (file) – File handle-like object, default is sys.stdout.
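Because file_handle only needs to be file-like, a report can presumably be captured in memory instead of printed. The sketch below shows that pattern with a hypothetical stand-in function, write_report, that mirrors the documented parameters. It does not call the real report method, and how the real method lays out the header and footer is an assumption.

```python
import io
import sys


def write_report(lines, file_handle=sys.stdout, header="", footer=""):
    """Hypothetical stand-in: write header, body lines, footer to a handle."""
    if header:
        print(header, file=file_handle)
    for line in lines:
        print(line, file=file_handle)
    if footer:
        print(footer, file=file_handle)


# Capture the text in memory rather than sending it to standard output.
buffer = io.StringIO()
write_report(
    ["ROC: 0.92", "RIE: 7.65"],
    file_handle=buffer,
    header="Screen A",
    footer="End of report",
)
text = buffer.getvalue()
print(text, end="")
```

An open text file could be passed in the same way as the StringIO buffer.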
getCsvRows()

Return a list of two lists: the first inner list contains all metric names, and the second contains the corresponding metric values.

Returns:

a list of header and enrichment value tuples.

Return type:

list
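A two-row structure of this kind maps directly onto a CSV file or a name-to-value dictionary. The sketch below uses a hand-written rows list in place of a real getCsvRows() result. The metric names are hypothetical placeholders, and the values are taken from the example_A report.

```python
import csv
import io

# Stand-in for a getCsvRows() result: [metric names, metric values].
rows = [["ROC", "RIE", "AUAC"], [0.92, 7.65, 0.87]]

# Pair each name with its value for lookup by metric name.
names, values = rows
metrics = dict(zip(names, values))

# Write the header row and the value row as CSV text.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()

print(metrics["ROC"])
print(csv_text, end="")
```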