Package schrodinger :: Package analysis :: Module enrichment :: Class Calculator

Class Calculator

object --+
         |
        Calculator

A class to calculate enrichment terms for a screen.

API examples
------------

# Ex. 1)  Reading screen result data from file.
efcalc = enrichment.Calculator(
    actives_file_name = "my_actives.txt",  # Active titles, one per line.
    results = "screen_results.rept", # Glide report file.
    total_decoys = 1000
)
efcalc.calculateMetrics() # Calculate a default suite of terms.
efcalc.report() # Print default report to standard out.
efcalc.savePlot() # Create default graph png.
print(efcalc.calcBEDROC(alpha=20)) # Print the BEDROC result (a tuple, see calcBEDROC).

# Ex. 2) Using a structure sequence as screen result data.
results = []
for st in structure.StructureReader('iglur_dock_pv.maegz', 2):
    results.append(st)

efcalc = enrichment.Calculator(
    actives_file_name = "my_actives.txt", # Active titles, one per line.
    results = results, # Iterable sequence of structure.Structure objects.
    total_decoys = 1000
)
efcalc.calculateMetrics()
efcalc.report()


Class data
----------
table_sep (string)
    Token used to separate column fields.  Default is None,
    i.e. whitespace.

rept_file_ext (list)
    List of parsable Glide report file extensions.

csv_file_ext (list)
    List of parsable csv file extensions.

table_file_ext (list)
    List of parsable table file extensions.

ef_precision (int)
    Number of decimals when reporting EF values.  Default = 2

efp_precision (int)
    Number of decimals when reporting EF' values.  Default = 2

efs_precision (int)
    Number of decimals when reporting EF* values.  Default = 2

eff_precision (int)
    Number of decimals when reporting Eff values.  Default = 3

fod_precision (int)
    Number of decimals when reporting FOD values.  Default = 1


Instance data
-------------
total_actives (int)
    The number of all active ligands in the screen, ranked
    and unranked.

total_ligands (int)
    The total number of ligands (actives and unknowns/decoys)
    used in the screen.

active_ranks (list)
    List of *unadjusted* integer ranks for the actives found in the
    screen.  For example, a screen result that placed three actives
    in the first three ranks has an active_ranks list of [1, 2, 3].

adjusted_active_ranks (list)
    Modified active ranks; each rank is improved by the number of
    preceding actives.  For example, a screen result that placed
    three actives as the first three ranks, [1, 2, 3], has adjusted
    ranks of [1, 1, 1].  In this way, actives are not penalized by
    being outranked by other actives.

active_titles (list)
    List of strings; the titles of the known actives in the screen.

missing_active_titles (list)
    List of strings; the titles of known actives not found in the
    screen results.

title_ranks (dict)
    Maps each *unadjusted* integer rank (key) to its title.  Not available
    for table inputs, or other screen results that don't list the title.

active_fingerprint (dict)
    Maps each title (key) to its fingerprint.  Not available for screen results
    that don't include title and structure information.
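
The relationship between active_ranks and adjusted_active_ranks described above can be sketched in a few lines. This is an illustration only, not the module's implementation; the helper name adjust_active_ranks is hypothetical.

```python
def adjust_active_ranks(active_ranks):
    """Return each rank reduced by the number of actives ranked ahead of it."""
    ordered = sorted(active_ranks)
    # The i-th active (0-based) in rank order has i actives ahead of it.
    return [rank - i for i, rank in enumerate(ordered)]


print(adjust_active_ranks([1, 2, 3]))  # [1, 1, 1]
print(adjust_active_ranks([2, 5, 9]))  # [2, 4, 7]
```

An adjusted rank minus one is therefore the number of decoys that outrank that active.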

Instance Methods
 
__init__(self, actives_file_name, results, total_decoys, legend_label=None)
x.__init__(...) initializes x; see help(type(x)) for signature
 
__repr__(self)
Returns: the string equivalent of the instance type with constructor arguments.
 
_setMaxEfValue(self, max_ef_value)
 
_getMaxEfValue(self)
 
parseInput(self)
Sets instance data members from parsed input actives and results files.
 
_parseActiveTitles(self)
Assigns active_titles and total_actives by parsing a structure file or text file.
 
_parseTableResults(self)
Assigns active_titles and total_actives by parsing a table file.
 
_parseGlideReptResults(self)
Sets the active_ranks data member with ranks from a Glide report file.
 
_parseStructureFileResults(self)
Sets the active_ranks data member with ranks from a structure file.
 
_parseCsvResults(self)
Sets the active_ranks data member with ranks from a csv file.
 
_parseStructureResults(self)
Sets the active_ranks data member with ranks from a sequence of structures.
 
_setMinTcTotalActives(self)
Sets min_Tc_total_actives, a float representing the lowest Tc, Tanimoto coefficient, of all the active similarity pairs.
 
_getDiversityFactor(self, active_ranks)
@return: a float representation of the diversity factor.
 
calculateMetrics(self)
Sets a suite of enrichment factor terms as instance data members.
 
calcEF(self, n_sampled_set, min_actives=None)
@return: the Enrichment factor (EF) for the given sample size of the screen results.
 
calcEFStar(self, n_sampled_decoy_set, min_actives=None)
@return: the Enrichment factor* (EF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands.
 
calcEFP(self, n_sampled_decoy_set, min_actives=None)
@return: the Enrichment Factor prime (EF') for a given sample size.
 
calcDEF(self, n_sampled_set, min_actives=None)
@return: Diverse Enrichment factor (DEF) for the given sample size of the screen results.
 
calcDEFStar(self, n_sampled_decoy_set, min_actives=None)
@return: Diverse Enrichment factor (DEF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands.
 
calcDEFP(self, n_sampled_decoy_set, min_actives=None)
@return: Diverse Enrichment Factor prime (DEF') for a given sample size.
 
calcFOD(self, fraction_of_actives)
@return: the average fraction of decoys outranking the given fraction, provided as a float, of known active ligands.
 
calcEFF(self, fraction_of_decoys)
@return: a float for the active recovery Efficiency (EFF) at a particular sample set size.
 
calcActivesInN(self, n_sampled_set)
Returns: the number of the known active ligands found in a given sample size.
 
calcActivesInNStar(self, n_sampled_set)
Returns: the number of the known active ligands found in a given sample size.
 
calcAveNumberOutrankingDecoys(self)
@return: the average number of decoys that outranked the actives.
 
calcBEDROC(self, alpha=20.0)
Returns: a tuple of two floats; the first is the area under the curve for the Boltzmann-enhanced discrimination of ROC (BEDROC) analysis, the second is the alpha*Ra term.
 
calcRIE(self, alpha=20.0)
Returns: a float for the Robust Initial Enhancement (RIE).
 
calculateSensitivity(self, rank)
Calculates sensitivity at a particular rank, defined as: Se(rank) = found_actives / total_actives
 
calculateSpecificity(self, rank)
Calculates specificity at a particular rank, defined as: Sp(rank) = discarded_decoys / total_decoys
 
calcAUAC(self)
Returns: A float representation of the Area Under the Accumulation Curve.
 
calcROC(self)
@return: A float representation of the Receiver Operator Characteristic area underneath the curve.
 
calcMWUROC(self, alpha=0.05)
@return: tuple of ROC AUC, the standard error, and estimated confidence interval (lower and upper bounds).
 
getPercentScreenCurvePoints(self)
Returns: List of (%Screen, %Actives Found) tuples for the active ranks.
 
getActiveRankCsvRows(self)
Returns: a list of active Title, Rank, Sensitivity, Specificity, %Actives Found, %Screen tuples.
 
getROCCurvePoints(self)
Calculates set of points in ROC curve along each active rank.
 
getROCAreaRomberg(self, lower_limit=0.0, upper_limit=1.0)
Returns: Receiver Operator Characteristic area under the curve as defined by a Romberg integration between arbitrary points along 1-Sp (domain: 0-1).
 
_getSe(self, fraction_of_screen)
 
savePlot(self, png_file='plot.png', title='Screen Results', xlabel='1-Specificity', ylabel='Sensitivity')
Returns: None.
 
report(self, file_handle=sys.stdout, header='', footer='')
Returns: None.
 
getCsvRows(self)
Returns: a list of header and enrichment value tuples.
 
_getActiveSampleSizeStar(self, fraction_of_actives)
Returns: The size of the decoy sample set required to recover the specified fraction of actives.
 
_getActiveSampleSize(self, fraction_of_actives)
Returns: the size of the sample set required to recover the specified fraction of actives.
 
_getDecoySampleSize(self, fraction_of_decoys)
Returns: the size of the sample set required to recover the specified fraction of decoys.
 
format(self, value, precision=2)
Returns: a string representation of the passed value.
 
_logZeroDivisionError(self, warn_message='')
Logs a common mathematical error at the info level.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods
 
parseCanvasCsv(cls, input_csv)
Returns: A list of csv subfiles generated by parsing the input csv file as a Canvas Similarity Matrix.
 
_splitCanvasCsv(cls, input_csv)
Returns: A list of file names for the generated csv sub-files.
 
_sortCanvasCsv(cls, input_csv, sort_option='descending')
Class Variables
  table_sep = None
  rept_file_ext = ['.rept']
  csv_file_ext = ['.csv', '.CSV']
  table_file_ext = ['.tbl', '.txt']
  ef_precision = 2
  efs_precision = 2
  efp_precision = 2
  eff_precision = 3
  fod_precision = 1
  title_rank_re = re.compile(r'^\s*(\d+)\s*(.+)\s+\d+')
Properties
  max_ef_value

Inherited from object: __class__

Method Details

__init__(self, actives_file_name, results, total_decoys, legend_label=None)
(Constructor)

 

x.__init__(...) initializes x; see help(type(x)) for signature

Parameters:
  • actives_file_name (string) - Path to the file of active titles. May be a structure file or a text file with one title per line.
  • results (string or sequence of structure.Structures) - Path to the file of screen results, or an ordered sequence of structure.Structure objects. File can be a structure file, CSV file, report file, or table file. See cmdline_doc string for the list of supported input screen results file formats.
  • total_decoys (integer) - The number of the unknowns, aka decoys, used in the screen.
  • legend_label (str) - label used for plot legends
Overrides: object.__init__

__repr__(self)
(Representation operator)

 

repr(x)

Returns:
the string equivalent of the instance type with constructor arguments.
Overrides: object.__repr__

_parseActiveTitles(self)

 

Assigns active_titles and total_actives by parsing a structure file or text file. The file must be formatted such that there is one ligand title per line.

_parseTableResults(self)

 

Assigns active_titles and total_actives by parsing a table file. (Deprecated)

A table file is formatted such that the first column is the rank of each retrieved active, the second column is the cumulative count of actives found, and the last row contains the total number of ligands screened and the total count of the actives possible to find in the screen.

_parseGlideReptResults(self)

 

Sets the active_ranks data member with ranks from a Glide report file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank.

_parseStructureFileResults(self)

 

Sets the active_ranks data member with ranks from a structure file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank.

_parseCsvResults(self)

 

Sets the active_ranks data member with ranks from a csv file. Rows are presumed to be in order, and a 'Title' field is required.
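
As a rough illustration of the CSV convention described above (rows in rank order, a required 'Title' field), the sketch below derives ranks for known actives from CSV text. It is not the module's parser: the function name and the sample data are hypothetical, and keeping the earliest rank for a repeated title is an assumption borrowed from the report-file and structure-file parsers.

```python
import csv
import io


def active_ranks_from_csv(csv_text, active_titles):
    """Return {title: rank} for actives, using 1-based row order as the rank."""
    wanted = set(active_titles)
    ranks = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for rank, row in enumerate(reader, start=1):
        title = row["Title"]  # a missing 'Title' column raises KeyError
        # Assumption: a repeated title keeps its earliest (best) rank.
        if title in wanted and title not in ranks:
            ranks[title] = rank
    return ranks


example = "Title,Score\nlig_a,-9.1\ndecoy_1,-8.7\nlig_b,-8.2\nlig_a,-7.9\n"
print(active_ranks_from_csv(example, ["lig_a", "lig_b", "lig_c"]))
```

Here lig_c never appears in the results, so in the class's terms it would be reported among the missing active titles.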

parseCanvasCsv(cls, input_csv)
Class Method

 
Parameters:
  • input_csv (string) - Path to the file to parse. The first column contains the titles for the hits; the second and subsequent columns are the probes (active compounds).
Returns: list
A list of csv subfiles generated by parsing the input csv file as a Canvas Similarity Matrix. The output sub-file names have the form <basename>.<index>.<title>.csv. They are sorted by descending values (1.0->0.0).

_splitCanvasCsv(cls, input_csv)
Class Method

 
Returns: list
A list of file names for the generated csv sub-files. Parses the csv and generates one csv file per query.

_sortCanvasCsv(cls, input_csv, sort_option='descending')
Class Method

 
Parameters:
  • input_csv (string) - Path to the csv file to sort. The csv is expected to have two columns with the values in the second.
  • sort_option (module constant) - Order the column of values ascending or descending. The default order is descending.

_parseStructureResults(self)

 

Sets the active_ranks data member with ranks from a sequence of structures. Also assigns active_fingerprint and missing_active_titles data members.

_setMinTcTotalActives(self)

 

Sets min_Tc_total_actives, a float representing the lowest Tc, Tanimoto coefficient, of all the active similarity pairs. Tc is scaled from 0.0-1.0, where a 1.0 indicates a high degree of similarity.

_getDiversityFactor(self, active_ranks)

 

@return:
    a float representation of the diversity factor.
@rtype:
    float

df = (1- min_Tc_actives)/(1 - min_Tc_total_actives)
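
A minimal sketch of the diversity factor expression above, assuming both inputs are Tanimoto coefficients in the range 0.0-1.0. The function name is hypothetical, and returning None when the denominator is zero is an assumption rather than documented behavior.

```python
def diversity_factor(min_tc_actives, min_tc_total_actives):
    """df = (1 - min_Tc_actives) / (1 - min_Tc_total_actives)."""
    denominator = 1.0 - min_tc_total_actives
    if denominator == 0:
        # Assumption: all actives identical (Tc = 1.0), so df is undefined.
        return None
    return (1.0 - min_tc_actives) / denominator


# The sampled actives are less diverse (min Tc 0.4) than the full set (min Tc 0.2).
print(diversity_factor(0.4, 0.2))
```

Because the sampled actives are a subset of all actives, their lowest pairwise Tc cannot be below the lowest Tc of the full set, so the factor lies between 0 and 1.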

calculateMetrics(self)

 

Sets a suite of enrichment factor terms as instance data members. See cmdline_doc for description of metrics.

The standard suite includes the attributes:

    self.ave_num_outranking_decoys
    self.bedroc20 (alpha=20.0)
    self.bedroc160_9 (alpha=160.9)
    self.bedroc8_0 (alpha=8.0)
    self.roc
    self.rie
    self.auac

    self.ef_40 (EF 40% of actives)
    self.ef_50 (EF 50% of actives)
    self.ef_60 (EF 60% of actives)
    self.ef_70 (EF 70% of actives)
    self.ef_80 (EF 80% of actives)
    self.ef_90 (EF 90% of actives)
    self.ef_100 (EF 100% of actives)
    self.ef_1pct (EF top 1% of total ligands)
    self.ef_2pct (EF top 2% of total ligands)
    self.ef_5pct (EF top 5% of total ligands)
    self.ef_10pct (EF top 10% of total ligands)
    self.ef_20pct (EF top 20% of total ligands)

    self.efs_40 (EF* 40% of actives)
    self.efs_50 (EF* 50% of actives)
    self.efs_60 (EF* 60% of actives)
    self.efs_70 (EF* 70% of actives)
    self.efs_80 (EF* 80% of actives)
    self.efs_90 (EF* 90% of actives)
    self.efs_100 (EF* 100% of actives)
    self.efs_1pct (EF* top 1% of total decoys)
    self.efs_2pct (EF* top 2% of total decoys)
    self.efs_5pct (EF* top 5% of total decoys)
    self.efs_10pct (EF* top 10% of total decoys)
    self.efs_20pct (EF* top 20% of total decoys)

    self.efp_40 (EF' 40% of actives)
    self.efp_50 (EF' 50% of actives)
    self.efp_60 (EF' 60% of actives)
    self.efp_70 (EF' 70% of actives)
    self.efp_80 (EF' 80% of actives)
    self.efp_90 (EF' 90% of actives)
    self.efp_100 (EF' 100% of actives)
    self.efp_1pct (EF' top 1% of total decoys)
    self.efp_2pct (EF' top 2% of total decoys)
    self.efp_5pct (EF' top 5% of total decoys)
    self.efp_10pct (EF' top 10% of total decoys)
    self.efp_20pct (EF' top 20% of total decoys)

    self.fod_40 (FOD 40% of actives)
    self.fod_50 (FOD 50% of actives)
    self.fod_60 (FOD 60% of actives)
    self.fod_70 (FOD 70% of actives)
    self.fod_80 (FOD 80% of actives)
    self.fod_90 (FOD 90% of actives)
    self.fod_100 (FOD 100% of actives)

    self.eff_1pct (Eff top 1% of total decoys)
    self.eff_2pct (Eff top 2% of total decoys)
    self.eff_5pct (Eff top 5% of total decoys)
    self.eff_10pct (Eff top 10% of total decoys)
    self.eff_20pct (Eff top 20% of total decoys)

    self.actives_in_top_1_pct (of total ligands)
    self.actives_in_top_2_pct (of total ligands)
    self.actives_in_top_5_pct (of total ligands)
    self.actives_in_top_10_pct (of total ligands)
    self.actives_in_top_20_pct (of total ligands)
    self.pct_actives_in_top_1_pct (of total ligands)
    self.pct_actives_in_top_2_pct (of total ligands)
    self.pct_actives_in_top_5_pct (of total ligands)
    self.pct_actives_in_top_10_pct (of total ligands)
    self.pct_actives_in_top_20_pct (of total ligands)

    self.actives_in_top_1_pct_star (of total decoys)
    self.actives_in_top_2_pct_star (of total decoys)
    self.actives_in_top_5_pct_star (of total decoys)
    self.actives_in_top_10_pct_star (of total decoys)
    self.actives_in_top_20_pct_star (of total decoys)
    self.pct_actives_in_top_1_pct_star (of total decoys)
    self.pct_actives_in_top_2_pct_star (of total decoys)
    self.pct_actives_in_top_5_pct_star (of total decoys)
    self.pct_actives_in_top_10_pct_star (of total decoys)
    self.pct_actives_in_top_20_pct_star (of total decoys)

    self.def_1pct (DEF top 1% of actives)
    self.def_2pct (DEF top 2% of actives)
    self.def_5pct (DEF top 5% of actives)
    self.def_10pct (DEF top 10% of actives)
    self.def_20pct (DEF top 20% of actives)

    self.defs_1pct (DEF* top 1% of actives)
    self.defs_2pct (DEF* top 2% of actives)
    self.defs_5pct (DEF* top 5% of actives)
    self.defs_10pct (DEF* top 10% of actives)
    self.defs_20pct (DEF* top 20% of actives)

    self.defp_1pct (DEF' top 1% of total decoys)
    self.defp_2pct (DEF' top 2% of total decoys)
    self.defp_5pct (DEF' top 5% of total decoys)
    self.defp_10pct (DEF' top 10% of total decoys)
    self.defp_20pct (DEF' top 20% of total decoys)

calcEF(self, n_sampled_set, min_actives=None)

 

@return:
    the Enrichment factor (EF) for the given sample size of the
    screen results.  If fewer than min_actives are found
    in the set, or the calculation raises a ZeroDivisionError,
    the returned value is None.

@param n_sampled_set:
    The number of ranked results for which to calculate
    the enrichment factor.
@type n_sampled_set:
    integer

@param min_actives:
    The number of actives that must be within the n_sampled_set,
    otherwise the returned EF value is None.
@type min_actives:
    integer


EF is defined as::

        n_actives_in_sampled_set / n_sampled_set
  EF =  ----------------------------------------
             total_actives / total_ligands

where 'n_sampled_set' is the number of *all* ranks in which
to search for actives.
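
The EF definition above can be sketched as a standalone function. This is a simplified illustration, not the class's code: the function name and argument list are hypothetical, and the inputs correspond to the active_ranks, total_actives and total_ligands instance data.

```python
def enrichment_factor(active_ranks, n_sampled_set, total_actives,
                      total_ligands, min_actives=None):
    """EF = (actives in sample / sample size) / (total actives / total ligands)."""
    found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    if min_actives is not None and found < min_actives:
        return None
    try:
        sampled_rate = found / float(n_sampled_set)
        overall_rate = total_actives / float(total_ligands)
        return sampled_rate / overall_rate
    except ZeroDivisionError:
        return None


# 10 actives among 1000 ligands; 3 of them fall in the top 10 ranks.
ranks = [1, 2, 3, 50, 120, 300, 410, 555, 700, 990]
print(enrichment_factor(ranks, 10, 10, 1000))  # approximately 30
```

An EF of about 30 means the top 10 ranks hold actives at roughly 30 times the rate expected from random selection.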

calcEFStar(self, n_sampled_decoy_set, min_actives=None)

 

@return:
    the Enrichment factor* (EF*) for the given sample size of
    the screen results, calculated with respect to the total
    decoys instead of the more traditional total ligands.
    If fewer than min_actives are found in the set the
    returned value is None.

@param n_sampled_decoy_set:
    The number of ranked decoys for which to calculate the
    enrichment factor.
@type n_sampled_decoy_set:
    integer

@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned EF value is None.
@type min_actives:
    integer

Here, EF* is defined as::

         n_actives_in_sampled_set / n_sampled_decoy_set
  EF* =  ----------------------------------------------
              total_actives / total_decoys

where 'n_sampled_decoy_set' is the number of *decoy* ranks in
which to search for actives.
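
A sketch of the EF* expression above. How actives are counted against decoy ranks is not spelled out here, so the sketch assumes an active is inside the sample when its adjusted rank (its rank minus the number of actives ahead of it) is at most n_sampled_decoy_set. The function name is hypothetical.

```python
def enrichment_factor_star(active_ranks, n_sampled_decoy_set, total_actives,
                           total_decoys, min_actives=None):
    """EF* = (actives in sample / decoy sample size) / (total actives / total decoys)."""
    ordered = sorted(active_ranks)
    adjusted = [rank - i for i, rank in enumerate(ordered)]
    # Assumption: an active is "in" the sample if its adjusted rank fits in it.
    found = sum(1 for rank in adjusted if rank <= n_sampled_decoy_set)
    if min_actives is not None and found < min_actives:
        return None
    try:
        sampled_rate = found / float(n_sampled_decoy_set)
        overall_rate = total_actives / float(total_decoys)
        return sampled_rate / overall_rate
    except ZeroDivisionError:
        return None


ranks = [1, 2, 3, 50, 120, 300, 410, 555, 700, 990]
print(enrichment_factor_star(ranks, 10, 10, 990))  # approximately 29.7
```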

calcEFP(self, n_sampled_decoy_set, min_actives=None)

 

@return:
    the Enrichment Factor prime (EF') for a given sample size.
    If fewer than min_actives are found in the set the
    returned value is None.

@param n_sampled_decoy_set:
    The number of ranked decoy results for which to calculate
    the enrichment factor.
@type n_sampled_decoy_set:
    integer

@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned EF' value is None.
@type min_actives:
    integer


EF' is defined as::

                   n_actives_sampled_set
   EF' = -------------------------------------------
         cumulative_sum(frac. decoys/frac. actives)

calcDEF(self, n_sampled_set, min_actives=None)

 

@return:
    Diverse Enrichment factor (DEF) for the given sample size of
    the screen results.  If fewer than min_actives are found
    in the set, or the calculation raises a ZeroDivisionError,
    the returned value is None.

@param n_sampled_set:
    The number of ranked results for which to calculate
    the enrichment factor.
@type n_sampled_set:
    integer

@param min_actives:
    The number of actives that must be within the n_sampled_set,
    otherwise the returned DEF value is None.
@type min_actives:
    integer


DEF is defined as::

              1 - (min_similarity_among_actives_in_sampled_set)
  DEF = EF * --------------------------------------------------
              1 - (min_similarity_among_all_actives)

where 'n_sampled_set' is the number of *all* ranks in which
to search for actives.

calcDEFStar(self, n_sampled_decoy_set, min_actives=None)

 

@return:
    Diverse Enrichment factor (DEF*) for the given sample size
    of the screen results, calculated with respect to the total
    decoys instead of the more traditional total ligands.
    If fewer than min_actives are found in the set the
    returned value is None.

@param n_sampled_decoy_set:
    The number of ranked decoys for which to calculate the
    enrichment factor.
@type n_sampled_decoy_set:
    integer

@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned DEF* value is None.
@type min_actives:
    integer


Here, DEF* is defined as::

                1 - (min_similarity_among_actives_in_sampled_set)
  DEF* = EF* * --------------------------------------------------
                     1 - (min_similarity_among_all_actives)

where 'n_sampled_decoy_set' is the number of *decoy* ranks in
which to search for actives.

calcDEFP(self, n_sampled_decoy_set, min_actives=None)

 

@return:
    Diverse Enrichment Factor prime (DEF') for a given sample
    size.  If fewer than min_actives are found in the set
    the returned value is None.

@param n_sampled_decoy_set:
    The number of ranked decoy results for which to calculate
    the enrichment factor.
@type n_sampled_decoy_set:
    integer

@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned DEF' value is None.
@type min_actives:
    integer


DEF' is defined as::

               1 - (min_similarity_among_actives_in_sampled_set)
  DEF' = EF' * --------------------------------------------------
                    1 - (min_similarity_among_all_actives)

calcFOD(self, fraction_of_actives)

 

@return:
    the average fraction of decoys outranking the given
    fraction, provided as a float, of known active ligands.
    The returned value is None if a) the calculation raises a
    ZeroDivisionError, or b) fraction_of_actives requires
    more actives than are ranked, or c) the fraction_of_actives
    is greater than 1.0

@param fraction_of_actives:
    Decimal notation of the fraction of sampled actives, used
    to set the sampled set size.
@type fraction_of_actives:
    float

FOD is defined as::

                       __
             1         \    number_outranking_decoys_in_sampled_set
  FOD = -------------  /   ---------------------------------------
         num_actives   --         total_decoys
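
One way to read the FOD sum above is sketched below, taking the number of outranking decoys for an active to be its adjusted rank minus one. The function name is hypothetical, and rounding the requested number of actives up with math.ceil is an assumption.

```python
import math


def fraction_outranking_decoys(adjusted_active_ranks, fraction_of_actives,
                               total_actives, total_decoys):
    """Average, over the sampled actives, of (outranking decoys / total decoys)."""
    if fraction_of_actives > 1.0:
        return None
    # Assumption: the number of sampled actives is rounded up.
    n_sampled = int(math.ceil(fraction_of_actives * total_actives))
    ordered = sorted(adjusted_active_ranks)
    if n_sampled == 0 or n_sampled > len(ordered) or total_decoys == 0:
        return None
    fractions = [(rank - 1) / float(total_decoys) for rank in ordered[:n_sampled]]
    return sum(fractions) / n_sampled


adjusted = [1, 1, 3, 11]  # 0, 0, 2 and 10 decoys outrank these four actives
print(fraction_outranking_decoys(adjusted, 0.5, 4, 100))  # 0.0
print(fraction_outranking_decoys(adjusted, 1.0, 4, 100))  # approximately 0.03
```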

calcEFF(self, fraction_of_decoys)

 

@return:
    a float for the active recovery Efficiency (EFF) at a
    particular sample set size.  The returned value is None if
    the calculation raises a ZeroDivisionError.

@param fraction_of_decoys:
    The size of the set is in terms of the number of decoys
    in the screen.  For example, given 1000 decoys and
    fraction_of_decoys=0.20, actives that appear within the
    first 200 ranks are counted.
@type fraction_of_decoys:
    float


EFF is defined as::

                     frac. actives in sample
  EFF = (2* -----------------------------------------------) - 1
            frac actives in sample + frac. decoys in sample
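
The EFF expression above, written as a small function of the two fractions. The function name is hypothetical; returning None on a zero denominator follows the @return description.

```python
def active_recovery_efficiency(frac_actives, frac_decoys):
    """EFF = 2 * fa / (fa + fd) - 1, where fa and fd are fractions in the sample."""
    try:
        return 2.0 * frac_actives / (frac_actives + frac_decoys) - 1.0
    except ZeroDivisionError:
        return None


print(active_recovery_efficiency(0.6, 0.2))  # approximately 0.5
print(active_recovery_efficiency(0.2, 0.2))  # 0.0
print(active_recovery_efficiency(0.0, 0.2))  # -1.0
```

The value runs from -1 (only decoys recovered) through 0 (actives and decoys recovered at the same rate) to +1 (only actives recovered).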

calcActivesInN(self, n_sampled_set)

 
Parameters:
  • n_sampled_set (integer) - The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.
Returns:
the number of the known active ligands found in a given sample size.

calcActivesInNStar(self, n_sampled_set)

 
Parameters:
  • n_sampled_set (integer) - The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.
Returns:
the number of the known active ligands found in a given sample size.

calcAveNumberOutrankingDecoys(self)

 

@return:
    the average number of decoys that outranked the actives.

The rank of each active is adjusted by the number of outranking
actives.  The number of outranking decoys is then defined as the
adjusted rank of that active minus one.  The number of outranking
decoys is calculated for each docked active and averaged.
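
The procedure described above amounts to the following sketch. The function name is hypothetical, and returning None for an empty rank list is an assumption.

```python
def average_outranking_decoys(active_ranks):
    """Mean number of decoys ranked ahead of each ranked active."""
    ordered = sorted(active_ranks)
    if not ordered:
        return None  # assumption: undefined when no actives were ranked
    # Adjusted rank = rank - (actives ahead); outranking decoys = adjusted rank - 1.
    outranking = [(rank - i) - 1 for i, rank in enumerate(ordered)]
    return sum(outranking) / float(len(outranking))


print(average_outranking_decoys([1, 2, 3]))  # 0.0
print(average_outranking_decoys([2, 5, 9]))  # (1 + 3 + 6) / 3, about 3.33
```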

calcBEDROC(self, alpha=20.0)

 
Parameters:
  • alpha (float) - Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha=20 weights the first ~8% of the screen, alpha=10 weights the first ~10% of the screen, alpha=50 weights the first ~3% of the screen results.
Returns:
a tuple of two floats, the first represents the area under the curve for the Boltzmann-enhanced discrimination of ROC (BEDROC) analysis, the second is the alpha*Ra term.
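
To get a feel for how alpha shifts the emphasis toward early ranks, the sketch below finds the leading fraction of the ranked list that carries a given share of the exponential weight exp(-alpha*x) over the interval 0-1. Reading "weights the first ~8%" as "about 80% of the weight falls in the first ~8%" is an assumption, and the function name is hypothetical.

```python
import math


def early_fraction(alpha, weight_share=0.8):
    """Leading fraction of the list holding `weight_share` of the exp(-alpha*x) weight."""
    total = 1.0 - math.exp(-alpha)  # integral of alpha*exp(-alpha*x) over [0, 1]
    return -math.log(1.0 - weight_share * total) / alpha


# The three alpha values used by calculateMetrics for the bedroc* attributes.
for alpha in (8.0, 20.0, 160.9):
    print(alpha, round(early_fraction(alpha), 3))
```

Under that assumption alpha=20 corresponds to roughly the first 8% of the screen, alpha=160.9 to roughly the first 1%, and alpha=8.0 to roughly the first 20%.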

calcRIE(self, alpha=20.0)

 
Parameters:
  • alpha (float) - Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha=20 weights the first ~8% of the screen, alpha=10 weights the first ~10% of the screen, alpha=50 weights the first ~3% of the screen results.
Returns: float
a float for the Robust Initial Enhancement (RIE).
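
For orientation, RIE is commonly defined as the mean exponential weight of the active ranks divided by the value expected for uniformly random ranks. The sketch below follows that general definition; it is not confirmed to match this method's implementation (for example, in how unranked actives are handled), and the function name is hypothetical.

```python
import math


def robust_initial_enhancement(active_ranks, total_ligands, alpha=20.0):
    """Observed mean of exp(-alpha * rank / N) over the value expected at random."""
    n_actives = len(active_ranks)
    if n_actives == 0:
        return None  # assumption: undefined without ranked actives
    n_total = float(total_ligands)
    observed = sum(math.exp(-alpha * rank / n_total) for rank in active_ranks) / n_actives
    # Mean of exp(-alpha * k / N) over all ranks k = 1..N (a geometric series).
    expected = (1.0 - math.exp(-alpha)) / (n_total * (math.exp(alpha / n_total) - 1.0))
    return observed / expected


print(robust_initial_enhancement([1, 2, 3], 100))     # well above 1: early recovery
print(robust_initial_enhancement([98, 99, 100], 100)) # well below 1: late recovery
```

A value near 1 indicates recovery no better than random ordering.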

calculateSensitivity(self, rank)

 

Calculates sensitivity at a particular rank, defined as:
    Se(rank) = found_actives / total_actives

@param rank: active rank at which to calculate the sensitivity
@type rank: int

@return: sensitivity of the screen at a given rank
@rtype: float

calculateSpecificity(self, rank)

 

Calculates specificity at a particular rank, defined as:
    Sp(rank) = discarded_decoys / total_decoys

@param rank: active rank at which to calculate the specificity
@type rank: int

@return: specificity of the screen at a given rank
@rtype: float
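
The Se(rank) and Sp(rank) definitions can be sketched together. The function names are hypothetical, and the sketch assumes that every result ranked at or above the given rank that is not a known active is a decoy that was selected rather than discarded.

```python
def sensitivity(active_ranks, rank, total_actives):
    """Se(rank) = found_actives / total_actives."""
    found = sum(1 for r in active_ranks if r <= rank)
    return found / float(total_actives)


def specificity(active_ranks, rank, total_decoys):
    """Sp(rank) = discarded_decoys / total_decoys."""
    found = sum(1 for r in active_ranks if r <= rank)
    selected_decoys = rank - found  # non-actives within the top `rank` results
    return (total_decoys - selected_decoys) / float(total_decoys)


ranks = [1, 3, 6]  # 3 of 4 actives were ranked; the screen used 6 decoys
print(sensitivity(ranks, 3, total_actives=4))  # 0.5
print(specificity(ranks, 3, total_decoys=6))   # 5/6, about 0.833
```

The ROC curve plots sensitivity against 1 - specificity at each active rank.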

calcAUAC(self)

 
Returns: float
A float representation of the Area Under the Accumulation Curve.

calcROC(self)

 

@return:
    A float representation of the Receiver Operator Characteristic
    area underneath the curve.  Typically interpreted as the
    probability an active will appear before an inactive.
    A value of 1.0 reflects ideal performance, a value of 0.5
    reflects a performance on par with random selection.

Classically ROC area is defined as:

         AUAC     Ra
  ROC = ------ - -----
          Ri      2Ri

    Where AUAC is the area under the accumulation curve, Ri is
    the ratio of inactives, Ra is the ratio of actives.

A different method is used here in order to account for unranked
actives - see PYTHON-3055 & PYTHON-3106
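
For reference, the classical relation quoted above can be written as a short function. This illustrates only the classical formula, not the method calcROC actually uses; the function name is hypothetical, and Ri is taken as 1 - Ra on the assumption that every ligand is either an active or a decoy.

```python
def roc_from_auac(auac, total_actives, total_ligands):
    """Classical relation: ROC = AUAC / Ri - Ra / (2 * Ri)."""
    ra = total_actives / float(total_ligands)  # ratio of actives
    ri = 1.0 - ra                              # ratio of inactives (assumption)
    return auac / ri - ra / (2.0 * ri)


# With 10% actives: a random screen (AUAC = 0.5) maps to ROC = 0.5, and an
# ideal screen (AUAC = 1 - Ra/2 in the continuous limit) maps to ROC = 1.0.
print(roc_from_auac(0.5, 10, 100))
print(roc_from_auac(0.95, 10, 100))
```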

calcMWUROC(self, alpha=0.05)

 

@return:
    tuple of ROC AUC, the standard error, and estimated confidence
    interval (lower and upper bounds).

@param alpha:
    the significance level.  Default is 0.05 (95% confidence interval)
@type alpha:
    float


Here, the ROC AUC is based on the Mann-Whitney-Wilcoxon U.
The U value is calculated directly::

  U = R - ((n_a(n_a+1))/2)

where n_a is the number of actives and R is the sum of their ranks::

  ROC AUC = ((n_a*n_i) - U)/(n_a*n_i)

n_a is the number of actives, n_i is the number of decoys::

  SE = sqrt((A(1-A) + (n_a-1)(Q - A^2) + (n_i-1)(q - A^2))/(n_a*n_i))

  CI = SE * scipy.stats.t.ppf((1+(1-alpha))/2.0, ((n_a+n_i)-1))
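
The U and ROC AUC expressions above can be checked with a small sketch. It covers only those two expressions; the standard error and confidence interval are left out because Q and q are not defined here. The function name is hypothetical, and every active is assumed to be ranked.

```python
def mwu_roc_auc(active_ranks, total_decoys):
    """ROC AUC from the Mann-Whitney-Wilcoxon U of the active ranks."""
    n_a = len(active_ranks)
    n_i = total_decoys
    if n_a == 0 or n_i == 0:
        return None  # assumption: undefined without both classes
    rank_sum = sum(active_ranks)               # R
    u = rank_sum - n_a * (n_a + 1) / 2.0       # decoy-ahead-of-active pairs
    return (n_a * n_i - u) / float(n_a * n_i)


print(mwu_roc_auc([1, 2, 3], 7))    # 1.0: every active ahead of every decoy
print(mwu_roc_auc([8, 9, 10], 7))   # 0.0: every decoy ahead of every active
print(mwu_roc_auc([2, 5, 9], 7))    # 11/21, about 0.524
```

With rank 1 as the best rank, U counts the (decoy, active) pairs in which the decoy outranks the active, so the AUC is the fraction of pairs ordered correctly.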

getPercentScreenCurvePoints(self)

 
Returns:
List of (%Screen, %Actives Found) tuples for the active ranks.
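
A sketch of how such points could be derived from the instance data. The function name is hypothetical, and the exact percentages the method reports are an assumption: here %Screen is the active's rank over total_ligands and %Actives Found is the running count of actives over total_actives.

```python
def percent_screen_curve_points(active_ranks, total_actives, total_ligands):
    """Return (%Screen, %Actives Found) tuples, one per ranked active."""
    points = []
    for n_found, rank in enumerate(sorted(active_ranks), start=1):
        pct_screen = 100.0 * rank / total_ligands
        pct_actives = 100.0 * n_found / total_actives
        points.append((pct_screen, pct_actives))
    return points


print(percent_screen_curve_points([1, 3, 6], total_actives=4, total_ligands=10))
# [(10.0, 25.0), (30.0, 50.0), (60.0, 75.0)]
```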

getActiveRankCsvRows(self)

 
Returns: list
a list of active Title, Rank, Sensitivity, Specificity, %Actives Found, %Screen tuples.

Note: this list may grow, but the relative order of the columns should remain fixed.

getROCCurvePoints(self)

 

Calculates set of points in ROC curve along each active rank.

Returns: list of tuples
list of (1 - specificity, sensitivity, rank) tuples

getROCAreaRomberg(self, lower_limit=0.0, upper_limit=1.0)

 
Returns: float
Receiver Operator Characteristic area under the curve as defined by a Romberg integration between arbitrary points along 1-Sp (domain: 0-1).

savePlot(self, png_file='plot.png', title='Screen Results', xlabel='1-Specificity', ylabel='Sensitivity')

 
Parameters:
  • png_file (string) - Path to output file, default is 'plot.png'.
  • title (string) - Plot title, default is 'Screen Results'.
  • xlabel (string) - x-axis label, default is '1-Specificity'.
  • ylabel (string) - y-axis label, default is 'Sensitivity'.
Returns:
None. Saves an image of the ROC plot, Sensitivity v 1-Specificity, to a png file.

report(self, file_handle=sys.stdout, header='', footer='')

 
Parameters:
  • file_handle (file) - File handle-like object, default is sys.stdout.
Returns:
None. Prints text summary of results to the file_handle.

getCsvRows(self)

 
Returns: list
a list of header and enrichment value tuples.

_getActiveSampleSizeStar(self, fraction_of_actives)

 
Parameters:
  • fraction_of_actives (float) - Decimal notation for the fraction of sampled actives, used to determine the sample set size.
Returns:
The size of the decoy sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.

_getActiveSampleSize(self, fraction_of_actives)

 
Parameters:
  • fraction_of_actives (float) - Decimal notation for the fraction of sampled actives, used to determine the sample set size.
Returns:
the size of the sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.

_getDecoySampleSize(self, fraction_of_decoys)

 
Parameters:
  • fraction_of_decoys (float) - Decimal notation for the fraction of sampled decoys, used to determine the sample set size.
Returns:
the size of the sample set required to recover the specified fraction of decoys. If there are fewer decoys than the requested fraction of all decoys then the number of total_decoys is returned.

format(self, value, precision=2)

 
Parameters:
  • value (float or None) - Float value to format as string.
  • precision (integer) - Number of digits after the decimal.
Returns:
a string representation of the passed value. If the value is None then the returned string is 'n/a'. Uses %g formatting idiom so large values are returned as exponentials.

Property Details

max_ef_value

Upper limit of Enrichment metric values.  Default is 'inf'.  The max_ef_value is assigned in cases where a ZeroDivisionError is caught but there are known actives in the sampled set.  In contrast, a value of None is assigned where a ZeroDivisionError is caught but there are no known actives in the sampled set.

Get Method:
_getMaxEfValue(self)
Set Method:
_setMaxEfValue(self, max_ef_value)