object --+
         |
        Calculator
A class to calculate enrichment terms for a screen.
API examples
------------
# Ex. 1)  Reading screen result data from file.
efcalc = enrichment.Calculator(
    actives_file_name = "my_actives.txt",  # Active titles, one per line.
    results = "screen_results.rept", # Glide report file.
    total_decoys = 1000
)
efcalc.calculateMetrics() # Calculate a default suite of terms.
efcalc.report() # Print default report to standard out.
efcalc.savePlot() # Create default graph png.
print(efcalc.calcBEDROC(alpha=20)) # Print the BEDROC metric value.
# Ex. 2) Using a structure sequence as screen result data.
results = []
for st in structure.StructureReader('iglur_dock_pv.maegz', 2):
    results.append(st)
efcalc = enrichment.Calculator(
    actives_file_name = "my_actives.txt", # Active titles, one per line.
    results = results, # Iterable sequence of structure.Structure objects.
    total_decoys = 1000
)
efcalc.calculateMetrics()
efcalc.report()
Class data
----------
table_sep (string)
    Token used to separate column fields.  Default is None,
    i.e. whitespace.
rept_file_ext (list)
    List of parsable Glide report file extensions.
csv_file_ext (list)
    List of parsable csv file extensions.
table_file_ext (list)
    List of parsable table file extensions.
ef_precision (int)
    Number of decimals when reporting EF values.  Default = 2
efp_precision (int)
    Number of decimals when reporting EF' values.  Default = 2
efs_precision (int)
    Number of decimals when reporting EF* values.  Default = 2
eff_precision (int)
    Number of decimals when reporting Eff values.  Default = 3
fod_precision (int)
    Number of decimals when reporting FOD values.  Default = 1
Instance data
-------------
total_actives (int)
    The number of all active ligands in the screen, ranked
    and unranked.
total_ligands (int)
    The total number of ligands (actives and unknowns/decoys)
    used in the screen.
active_ranks (list)
    List of *unadjusted* integer ranks for the actives found in the
    screen.  For example, a screen result that placed three actives
    as the first three ranks has an active_ranks list of [1, 2, 3].
adjusted_active_ranks (list)
    Modified active ranks; each rank is improved by the number of
    preceding actives.  For example, a screen result that placed
    three actives as the first three ranks, [1, 2, 3], has adjusted
    ranks of [1, 1, 1].  In this way, actives are not penalized by
    being outranked by other actives.
active_titles (list)
    List of strings; the titles of the known actives in the screen.
missing_active_titles (list)
    List of strings; the titles of known actives not found in the
    screen results.
title_ranks (dict)
    Maps *unadjusted* integer ranks (keys) to titles (values).  Not available for
    table inputs, or other screen results that don't list the title.
active_fingerprint (dict)
    Maps titles (keys) to fingerprints (values).  Not available for screen results
    that don't include title and structure information.
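
The relationship between active_ranks and adjusted_active_ranks described above can be sketched as follows. This is a minimal illustration, not the class's own code; the helper name adjust_active_ranks is hypothetical.

```python
def adjust_active_ranks(active_ranks):
    """Improve each rank by the number of actives ranked ahead of it.

    Hypothetical helper for illustration only; not part of the
    Calculator API.
    """
    ordered = sorted(active_ranks)
    # The i-th active (0-based) has exactly i actives ranked ahead of it.
    return [rank - i for i, rank in enumerate(ordered)]


print(adjust_active_ranks([1, 2, 3]))
print(adjust_active_ranks([2, 5, 9]))
```

With three actives in the first three ranks, every adjusted rank collapses to 1, matching the example in the adjusted_active_ranks description.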
| max_ef_value | |||
| 
 x.__init__(...) initializes x; see help(type(x)) for signature 
 | 
| 
 repr(x) 
 | 
| 
 Assigns active_titles and total_actives by parsing a structure file or text file. The file must be formatted such that there is one ligand title per line. | 
| 
 Assigns active_titles and total_actives by parsing a table file. A table file is formatted such that the first column is the rank of each retrieved active, the second column is the cumulative count of actives found, and the last row contains the total number of ligands screened and the total count of the actives possible to find in the screen. | 
| 
 Sets the active_ranks data member with ranks from a Glide report file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank. | 
| 
 Sets the active_ranks data member with ranks from a structure file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank. | 
| 
 Sets the active_ranks data member with ranks from a csv file. Rows are presumed to be in order, and a 'Title' field is required. | 
| 
 Sets the active_ranks data member with ranks from a sequence of structures. Also assigns active_fingerprint and missing_active_titles data members. | 
| 
 Sets min_Tc_total_actives, a float representing the lowest Tanimoto coefficient (Tc) among all pairs of actives. Tc ranges from 0.0 to 1.0, where 1.0 indicates maximal similarity. | 
| 
 
@return:
    a float representation of the diversity factor.
@rtype:
    float
The diversity factor (df) is defined as::
  df = (1 - min_Tc_actives) / (1 - min_Tc_total_actives)
 | 
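
The diversity factor formula can be illustrated with a short sketch. The function name and the choice to return None when the denominator is zero are assumptions made for this example; the source does not say how that case is handled.

```python
def diversity_factor(min_tc_actives, min_tc_total_actives):
    """Return (1 - min_Tc_actives) / (1 - min_Tc_total_actives).

    Hypothetical helper.  Returning None for a zero denominator is an
    assumption, not documented behavior.
    """
    denominator = 1.0 - min_tc_total_actives
    if denominator == 0:
        return None
    return (1.0 - min_tc_actives) / denominator


# Recovered actives with a lowest pairwise Tc of 0.5, against a lowest
# pairwise Tc of 0.2 over all actives.
print(diversity_factor(0.5, 0.2))
```

A value near 1.0 suggests the recovered actives are about as diverse as the full set of actives; smaller values suggest a more similar subset.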
| 
 Sets a suite of enrichment factor terms as instance data members. See cmdline_doc for description of metrics. The standard suite includes the attributes:
    self.ave_num_outranking_decoys
    self.bedroc20 (alpha=20.0)
    self.bedroc160_9 (alpha=160.9)
    self.bedroc8_0 (alpha=8.0)
    self.roc
    self.rie
    self.auac
    self.ef_40 (EF 40% of actives)
    self.ef_50 (EF 50% of actives)
    self.ef_60 (EF 60% of actives)
    self.ef_70 (EF 70% of actives)
    self.ef_80 (EF 80% of actives)
    self.ef_90 (EF 90% of actives)
    self.ef_100 (EF 100% of actives)
    self.ef_1pct (EF top 1% of total ligands)
    self.ef_2pct (EF top 2% of total ligands)
    self.ef_5pct (EF top 5% of total ligands)
    self.ef_10pct (EF top 10% of total ligands)
    self.ef_20pct (EF top 20% of total ligands)
    self.efs_40 (EF* 40% of actives)
    self.efs_50 (EF* 50% of actives)
    self.efs_60 (EF* 60% of actives)
    self.efs_70 (EF* 70% of actives)
    self.efs_80 (EF* 80% of actives)
    self.efs_90 (EF* 90% of actives)
    self.efs_100 (EF* 100% of actives)
    self.efs_1pct (EF* top 1% of total decoys)
    self.efs_2pct (EF* top 2% of total decoys)
    self.efs_5pct (EF* top 5% of total decoys)
    self.efs_10pct (EF* top 10% of total decoys)
    self.efs_20pct (EF* top 20% of total decoys)
    self.efp_40 (EF' 40% of actives)
    self.efp_50 (EF' 50% of actives)
    self.efp_60 (EF' 60% of actives)
    self.efp_70 (EF' 70% of actives)
    self.efp_80 (EF' 80% of actives)
    self.efp_90 (EF' 90% of actives)
    self.efp_100 (EF' 100% of actives)
    self.efp_1pct (EF' top 1% of total decoys)
    self.efp_2pct (EF' top 2% of total decoys)
    self.efp_5pct (EF' top 5% of total decoys)
    self.efp_10pct (EF' top 10% of total decoys)
    self.efp_20pct (EF' top 20% of total decoys)
    self.fod_40 (FOD 40% of actives)
    self.fod_50 (FOD 50% of actives)
    self.fod_60 (FOD 60% of actives)
    self.fod_70 (FOD 70% of actives)
    self.fod_80 (FOD 80% of actives)
    self.fod_90 (FOD 90% of actives)
    self.fod_100 (FOD 100% of actives)
    self.eff_1pct (Eff top 1% of total decoys)
    self.eff_2pct (Eff top 2% of total decoys)
    self.eff_5pct (Eff top 5% of total decoys)
    self.eff_10pct (Eff top 10% of total decoys)
    self.eff_20pct (Eff top 20% of total decoys)
    self.actives_in_top_1_pct (of total ligands)
    self.actives_in_top_2_pct (of total ligands)
    self.actives_in_top_5_pct (of total ligands)
    self.actives_in_top_10_pct (of total ligands)
    self.actives_in_top_20_pct (of total ligands)
    self.pct_actives_in_top_1_pct (of total ligands)
    self.pct_actives_in_top_2_pct (of total ligands)
    self.pct_actives_in_top_5_pct (of total ligands)
    self.pct_actives_in_top_10_pct (of total ligands)
    self.pct_actives_in_top_20_pct (of total ligands)
    self.actives_in_top_1_pct_star (of total decoys)
    self.actives_in_top_2_pct_star (of total decoys)
    self.actives_in_top_5_pct_star (of total decoys)
    self.actives_in_top_10_pct_star (of total decoys)
    self.actives_in_top_20_pct_star (of total decoys)
    self.pct_actives_in_top_1_pct_star (of total decoys)
    self.pct_actives_in_top_2_pct_star (of total decoys)
    self.pct_actives_in_top_5_pct_star (of total decoys)
    self.pct_actives_in_top_10_pct_star (of total decoys)
    self.pct_actives_in_top_20_pct_star (of total decoys)
    self.def_1pct (DEF top 1% of actives)
    self.def_2pct (DEF top 2% of actives)
    self.def_5pct (DEF top 5% of actives)
    self.def_10pct (DEF top 10% of actives)
    self.def_20pct (DEF top 20% of actives)
    self.defs_1pct (DEF* top 1% of actives)
    self.defs_2pct (DEF* top 2% of actives)
    self.defs_5pct (DEF* top 5% of actives)
    self.defs_10pct (DEF* top 10% of actives)
    self.defs_20pct (DEF* top 20% of actives)
    self.defp_1pct (DEF' top 1% of total decoys)
    self.defp_2pct (DEF' top 2% of total decoys)
    self.defp_5pct (DEF' top 5% of total decoys)
    self.defp_10pct (DEF' top 10% of total decoys)
    self.defp_20pct (DEF' top 20% of total decoys)
 | 
| 
 
@return:
    the Enrichment factor (EF) for the given sample size of the
    screen results.  If fewer than min_actives are found
    in the set, or the calculation raises a ZeroDivisionError,
    the returned value is None.
@param n_sampled_set:
    The number of ranked results for which to calculate
    the enrichment factor.
@type n_sampled_set:
    integer
@param min_actives:
    The number of actives that must be within the n_sampled_set,
    otherwise the returned EF value is None.
@type min_actives:
    integer
EF is defined as::
        n_actives_in_sampled_set / n_sampled_set
  EF =  ----------------------------------------
             total_actives / total_ligands
where 'n_sampled_set' is the number of *all* ranks in which
to search for actives.
 | 
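
The EF definition above can be worked through with a small standalone sketch. The function name, argument order, and the min_actives default of 1 are assumptions made for illustration; they are not taken from the Calculator class.

```python
def enrichment_factor(active_ranks, total_actives, total_ligands,
                      n_sampled_set, min_actives=1):
    """Compute EF over the first n_sampled_set ranks.

    Hypothetical helper; the min_actives default is an assumption.
    """
    # Count actives whose unadjusted rank falls inside the sampled set.
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    if n_found < min_actives:
        return None
    try:
        sampled_rate = n_found / float(n_sampled_set)
        overall_rate = total_actives / float(total_ligands)
        return sampled_rate / overall_rate
    except ZeroDivisionError:
        return None


# 3 of 4 actives in the top 10 of 100 ligands: (3/10) / (4/100)
print(enrichment_factor([1, 2, 3, 50], 4, 100, 10))
```

Here the top 10 ranks hold actives at 7.5 times the rate expected from random selection.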
| 
 
@return:
    the Enrichment factor* (EF*) for the given sample size of
    the screen results, calculated with respect to the total
    decoys instead of the more traditional total ligands.
    If fewer than min_actives are found in the set, the
    returned value is None.
@param n_sampled_decoy_set:
    The number of ranked decoys for which to calculate the
    enrichment factor.
@type n_sampled_decoy_set:
    integer
@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned EF* value is None.
@type min_actives:
    integer
Here, EF* is defined as::
         n_actives_in_sampled_set / n_sampled_decoy_set
  EF* =  ----------------------------------------------
              total_actives / total_decoys
where 'n_sampled_decoy_set' is the number of *decoy* ranks in
which to search for actives.
 | 
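
A sketch of the EF* definition follows. It assumes one plausible reading of "decoy ranks": an active is counted when fewer than n_sampled_decoy_set decoys outrank it, i.e. when its adjusted rank is at most n_sampled_decoy_set. The function name and the min_actives default are also assumptions.

```python
def enrichment_factor_star(active_ranks, total_actives, total_decoys,
                           n_sampled_decoy_set, min_actives=1):
    """Compute EF* with the sample size expressed in decoys.

    Hypothetical helper.  Counting an active when its adjusted rank is
    <= n_sampled_decoy_set is an assumed interpretation.
    """
    ordered = sorted(active_ranks)
    # Adjusted rank = unadjusted rank minus the number of preceding actives.
    adjusted = [rank - i for i, rank in enumerate(ordered)]
    n_found = sum(1 for rank in adjusted if rank <= n_sampled_decoy_set)
    if n_found < min_actives:
        return None
    try:
        sampled_rate = n_found / float(n_sampled_decoy_set)
        overall_rate = total_actives / float(total_decoys)
        return sampled_rate / overall_rate
    except ZeroDivisionError:
        return None


# 4 actives, 96 decoys, sample size of 10 decoy ranks: (3/10) / (4/96)
print(enrichment_factor_star([1, 2, 3, 50], 4, 96, 10))
```

Because the sample is sized in decoys, actives that outrank one another do not consume slots in the sampled set.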
| 
 
@return:
    the Enrichment Factor prime (EF') for a given sample size.
    If fewer than min_actives are found in the set, the
    returned value is None.
@param n_sampled_decoy_set:
    The number of ranked decoy results for which to calculate
    the enrichment factor.
@type n_sampled_decoy_set:
    integer
@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned EF' value is None.
@type min_actives:
    integer
EF' is defined as::
                   n_actives_sampled_set
   EF' = -------------------------------------------
         cumulative_sum(frac. decoys/frac. actives)
 | 
| 
 
@return:
    Diverse Enrichment factor (DEF) for the given sample size of
    the screen results.  If fewer than min_actives are found
    in the set, or the calculation raises a ZeroDivisionError,
    the returned value is None.
@param n_sampled_set:
    The number of ranked results for which to calculate
    the enrichment factor.
@type n_sampled_set:
    integer
@param min_actives:
    The number of actives that must be within the n_sampled_set,
    otherwise the returned DEF value is None.
@type min_actives:
    integer
DEF is defined as::
              1 - (min_similarity_among_actives_in_sampled_set)
  DEF = EF * --------------------------------------------------
              1 - (min_similarity_among_all_actives)
where 'n_sampled_set' is the number of *all* ranks in which
to search for actives.
 | 
| 
 
@return:
    Diverse Enrichment factor* (DEF*) for the given sample size
    of the screen results, calculated with respect to the total
    decoys instead of the more traditional total ligands.
    If fewer than min_actives are found in the set, the
    returned value is None.
@param n_sampled_decoy_set:
    The number of ranked decoys for which to calculate the
    enrichment factor.
@type n_sampled_decoy_set:
    integer
@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned DEF* value is None.
@type min_actives:
    integer
Here, DEF* is defined as::
                1 - (min_similarity_among_actives_in_sampled_set)
  DEF* = EF* * --------------------------------------------------
                      1 - (min_similarity_among_all_actives)
where 'n_sampled_decoy_set' is the number of *decoy* ranks in
which to search for actives.
 | 
| 
 
@return:
    Diverse Enrichment Factor prime (DEF') for a given sample
    size.  If fewer than min_actives are found in the set,
    the returned value is None.
@param n_sampled_decoy_set:
    The number of ranked decoy results for which to calculate
    the enrichment factor.
@type n_sampled_decoy_set:
    integer
@param min_actives:
    The number of actives that must be within the
    n_sampled_decoy_set, otherwise the returned DEF' value is None.
@type min_actives:
    integer
DEF' is defined as::
               1 - (min_similarity_among_actives_in_sampled_set)
  DEF' = EF' * --------------------------------------------------
                    1 - (min_similarity_among_all_actives)
 | 
| 
 
@return:
    the average fraction of decoys outranking the given
    fraction, provided as a float, of known active ligands.
    The returned value is None if a) the calculation raises a
    ZeroDivisionError, or b) fraction_of_actives corresponds to
    more actives than are ranked, or c) fraction_of_actives
    is greater than 1.0.
@param fraction_of_actives:
    Decimal notation of the fraction of sampled actives, used
    to set the sampled set size.
@type fraction_of_actives:
    float
FOD is defined as::
                       __
             1         \    number_outranking_decoys_in_sampled_set
  FOD = -------------  /   ---------------------------------------
         num_actives   --         total_decoys
 | 
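
The FOD sum above can be sketched as below. The function name is hypothetical, and two details are assumptions: the number of sampled actives is obtained by rounding fraction_of_actives * total_actives, and the number of decoys outranking an active is taken as its adjusted rank minus one.

```python
def fraction_outranking_decoys(active_ranks, total_actives, total_decoys,
                               fraction_of_actives):
    """Average fraction of decoys outranking the first sampled actives.

    Hypothetical helper; the rounding rule is an assumption.
    """
    if fraction_of_actives > 1.0:
        return None
    num_actives = int(round(fraction_of_actives * total_actives))
    ordered = sorted(active_ranks)
    if num_actives > len(ordered):
        return None  # More actives requested than were ranked.
    # Decoys ahead of the i-th active (0-based): rank - 1 - i.
    outranking = [rank - 1 - i
                  for i, rank in enumerate(ordered[:num_actives])]
    try:
        total = sum(n / float(total_decoys) for n in outranking)
        return total / num_actives
    except ZeroDivisionError:
        return None


# First 2 of 4 actives (ranks 1 and 3) are outranked by 0 and 1 decoys.
print(fraction_outranking_decoys([1, 3, 6, 50], 4, 100, 0.5))
```

Lower values are better: a value of 0.0 means no decoy outranked any of the sampled actives.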
| 
 
@return:
    a float for the active recovery Efficiency (EFF) at a
    particular sample set size.  The returned value is None if
    the calculation raises a ZeroDivisionError.
@param fraction_of_decoys:
    The size of the set is in terms of the number of decoys
    in the screen.  For example, given 1000 decoys and
    fraction_of_decoys=0.20, actives that appear within the
    first 200 ranks are counted.
@type fraction_of_decoys:
    float
EFF is defined as::
                      frac. actives in sample
  EFF = (2 * ------------------------------------------------) - 1
             frac. actives in sample + frac. decoys in sample
 | 
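
The EFF formula maps the two sampled fractions onto the range -1 to 1. The sketch below takes the fractions directly as arguments, since the source does not spell out how they are counted; the function name is hypothetical.

```python
def efficiency(frac_actives, frac_decoys):
    """Return 2 * fa / (fa + fd) - 1, or None if both fractions are zero.

    Hypothetical helper that takes the two sampled fractions directly.
    """
    try:
        return 2.0 * frac_actives / (frac_actives + frac_decoys) - 1.0
    except ZeroDivisionError:
        return None


print(efficiency(0.6, 0.2))  # more actives than decoys recovered
print(efficiency(0.2, 0.2))  # equal fractions
print(efficiency(0.0, 0.2))  # no actives recovered
```

Positive values indicate that actives are recovered at a higher rate than decoys, zero indicates parity, and negative values indicate that decoys dominate the sample.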
| 
 
@return:
    the average number of decoys that outranked the actives.
The rank of each active is adjusted by the number of outranking
actives.  The number of outranking decoys is then defined as the
adjusted rank of that active minus one.  The number of outranking
decoys is calculated for each docked active and averaged.
 | 
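
The averaging procedure described above (adjust each rank, subtract one, then average) can be sketched as follows. The function name is hypothetical, and returning None for an empty list is an assumption.

```python
def average_outranking_decoys(active_ranks):
    """Mean number of decoys ranked ahead of each docked active.

    Hypothetical helper; the None result for no actives is an assumption.
    """
    ordered = sorted(active_ranks)
    if not ordered:
        return None
    # Adjusted rank is rank - i; outranking decoys is adjusted rank - 1.
    outranking = [(rank - i) - 1 for i, rank in enumerate(ordered)]
    return sum(outranking) / float(len(outranking))


# Actives at ranks 2, 5 and 9 are outranked by 1, 3 and 6 decoys.
print(average_outranking_decoys([2, 5, 9]))
```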
| 
 
@return:
    a float representation of the sensitivity of the screen.
@param rank:
    Active rank at which to calculate the sensitivity.
@type rank:
    integer
Sensitivity, at a particular rank, is defined as::
          n_ranked_actives
  Se =    -----------------
            total_actives
 | 
| 
 
@return:
    the specificity of the screen evaluated at a particular
    active's rank.
@rtype:
    float
@param rank:
    Active rank at which to calculate the specificity.
@type rank:
    integer
Specificity, at a particular level, is defined as::
               discarded_decoys
  Sp(rank) =  ------------------
                 total_decoys
 | 
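
Sensitivity and specificity at a given rank can be sketched together. The function names are hypothetical. The sketch assumes that "discarded" decoys are those ranked below the given rank, so the decoys selected at or above the rank are the rank minus the number of actives found there.

```python
def sensitivity(active_ranks, total_actives, rank):
    """Se = actives ranked at or above `rank` / total_actives."""
    n_ranked_actives = sum(1 for r in active_ranks if r <= rank)
    return n_ranked_actives / float(total_actives)


def specificity(active_ranks, total_decoys, rank):
    """Sp = decoys ranked below `rank` / total_decoys.

    Treating decoys below the cutoff as "discarded" is an assumed
    interpretation of the definition.
    """
    n_ranked_actives = sum(1 for r in active_ranks if r <= rank)
    selected_decoys = rank - n_ranked_actives
    return (total_decoys - selected_decoys) / float(total_decoys)


ranks = [1, 2, 5]  # unadjusted ranks of the actives found
print(sensitivity(ranks, total_actives=4, rank=5))
print(specificity(ranks, total_decoys=96, rank=5))
```

At rank 5 this example has recovered 3 of 4 actives while selecting only 2 of 96 decoys.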
| 
 
@return:
    A float representing the area under the Receiver Operating
    Characteristic (ROC) curve.  Typically interpreted as the
    probability that an active will appear before an inactive.
    A value of 1.0 reflects ideal performance; a value of 0.5
    reflects performance on par with random selection.
ROC area is defined as::
         AUAC     Ra
  ROC = ------ - -----
          Ri      2Ri
where AUAC is the area under the accumulation curve, Ri is
the ratio of inactives to total ligands, and Ra is the ratio
of actives to total ligands.
 | 
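
The conversion from AUAC to ROC area can be checked numerically. The sketch below takes a precomputed AUAC value and assumes Ra and Ri are the fractions of actives and inactives among all ligands; the function name is hypothetical.

```python
def roc_from_auac(auac, total_actives, total_decoys):
    """Return AUAC / Ri - Ra / (2 * Ri).

    Hypothetical helper.  Ra and Ri are taken as fractions of the total
    ligand count, which is an assumption about the definition.
    """
    total_ligands = float(total_actives + total_decoys)
    ra = total_actives / total_ligands
    ri = total_decoys / total_ligands
    return auac / ri - ra / (2.0 * ri)


# 10 actives and 90 decoys.
print(roc_from_auac(0.5, 10, 90))   # random-like accumulation curve
print(roc_from_auac(0.95, 10, 90))  # AUAC of 1 - Ra/2
```

An AUAC of 0.5 maps to a ROC area of 0.5, while an AUAC of 1 - Ra/2, the value for an ideal ranking in the continuous limit, maps to 1.0.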
| 
 
@return:
    tuple of ROC AUC, the standard error, and estimated confidence
    interval (lower and upper bounds).
@param alpha:
    the significance level.  Default is 0.05 (95% confidence interval)
@type alpha:
    float
Here, the ROC AUC is based on the Mann-Whitney-Wilcoxon U.
The U value is calculated directly::
  U = R - ((n_a * (n_a + 1)) / 2)
where n_a is the number of actives and R is the sum of their ranks.
The ROC AUC (A), standard error, and confidence interval follow as::
  ROC AUC = ((n_a * n_i) - U) / (n_a * n_i)
  SE = sqrt((A*(1 - A) + (n_a - 1)*(Q - A^2) + (n_i - 1)*(q - A^2)) / (n_a * n_i))
  CI = SE * scipy.stats.t.ppf((1 + (1 - alpha)) / 2.0, ((n_a + n_i) - 1))
where n_a is the number of actives and n_i is the number of decoys.
 | 
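
The U-based ROC AUC and its standard error can be sketched without the class. The function name is hypothetical. Q and q are not defined in the text above; this sketch assumes the Hanley-McNeil approximations Q = A/(2 - A) and q = 2A^2/(1 + A). The confidence interval step is omitted so the example needs only the standard library.

```python
import math


def roc_auc_with_se(active_ranks, n_decoys):
    """Return (AUC, SE) from unadjusted active ranks.

    Hypothetical helper.  Q and q use the Hanley-McNeil approximations,
    which is an assumption; the source leaves them undefined.
    """
    n_a = len(active_ranks)
    n_i = n_decoys
    rank_sum = sum(active_ranks)               # R
    u = rank_sum - (n_a * (n_a + 1)) / 2.0     # U = R - n_a(n_a + 1)/2
    auc = ((n_a * n_i) - u) / float(n_a * n_i)
    q_upper = auc / (2.0 - auc)                # assumed Q
    q_lower = 2.0 * auc ** 2 / (1.0 + auc)     # assumed q
    variance = (auc * (1.0 - auc)
                + (n_a - 1) * (q_upper - auc ** 2)
                + (n_i - 1) * (q_lower - auc ** 2)) / float(n_a * n_i)
    return auc, math.sqrt(variance)


# Actives at ranks 2 and 4 among 4 ligands (decoys at ranks 1 and 3).
print(roc_auc_with_se([2, 4], n_decoys=2))
```

In the small example only one of the four active/decoy pairs has the active ranked first, so the AUC is 0.25.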
| 
 
 Note: this list may grow, but the relative order of the columns should remain fixed. | 
| 
 | |||
| max_ef_value
 | 
| Generated by Epydoc 3.0.1 on Wed Aug 3 07:59:22 2016 | http://epydoc.sourceforge.net |