object --+ | Calculator
A class to calculate enrichment terms for a screen.

API examples
------------

    # Ex. 1) Reading screen result data from file.
    efcalc = enrichment.Calculator(
        actives_file_name = "my_actives.txt", # Active titles, one per line.
        results = "screen_results.rept",      # Glide report file.
        total_decoys = 1000
    )
    efcalc.calculateMetrics() # Calculate a default suite of terms.
    efcalc.report()           # Print default report to standard out.
    efcalc.savePlot()         # Create default graph png.
    print(efcalc.calcBEDROC(alpha=20)) # Print the BEDROC metric value.

    # Ex. 2) Using a structure sequence as screen result data.
    results = []
    for st in structure.StructureReader('iglur_dock_pv.maegz', 2):
        results.append(st)
    efcalc = enrichment.Calculator(
        actives_file_name = "my_actives.txt", # Active titles, one per line.
        results = results, # Iterable sequence of structure.Structure objects.
        total_decoys = 1000
    )
    efcalc.calculateMetrics()
    efcalc.report()

Class data
----------

table_sep (string)
    Token used to separate column fields. Default is None, i.e. whitespace.

rept_file_ext (list)
    List of parsable Glide report file extensions.

csv_file_ext (list)
    List of parsable csv file extensions.

table_file_ext (list)
    List of parsable table file extensions.

ef_precision (int)
    Number of decimals when reporting EF values. Default = 2

efp_precision (int)
    Number of decimals when reporting EF' values. Default = 2

efs_precision (int)
    Number of decimals when reporting EF* values. Default = 2

eff_precision (int)
    Number of decimals when reporting Eff values. Default = 3

fod_precision (int)
    Number of decimals when reporting FOD values. Default = 1

Instance data
-------------

total_actives (int)
    The number of all active ligands in the screen, ranked and unranked.

total_ligands (int)
    The total number of ligands (actives and unknowns/decoys) used in the screen.

active_ranks (list)
    List of *unadjusted* integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].

adjusted_active_ranks (list)
    Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized for being outranked by other actives.

active_titles (list)
    List of strings; the titles of the known actives in the screen.

missing_active_titles (list)
    List of strings; the titles of the known actives not found in the screen results.

title_ranks (dict)
    Maps each title to its *unadjusted* integer rank. Not available for table inputs, or other screen results that don't list the title.

active_fingerprint (dict)
    Maps each title to its fingerprint. Not available for screen results that don't include title and structure information.
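The adjusted-rank rule described for adjusted_active_ranks can be sketched in a few lines. This is an illustrative helper only, not part of the Calculator API; the function name adjust_active_ranks is hypothetical.

```python
def adjust_active_ranks(active_ranks):
    """Improve each active's rank by the number of actives ranked ahead of it.

    Hypothetical helper for illustration; not a Calculator method.
    """
    ordered = sorted(active_ranks)
    # The i-th active (0-based) in rank order has exactly i actives ahead of it.
    return [rank - n_preceding for n_preceding, rank in enumerate(ordered)]


print(adjust_active_ranks([1, 2, 3]))  # [1, 1, 1]
print(adjust_active_ranks([2, 5, 9]))  # [2, 4, 7]
```

An adjusted rank of k therefore means that k - 1 decoys outrank that active.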
max_ef_value

x.__init__(...) initializes x; see help(type(x)) for signature

repr(x)

Assigns active_titles and total_actives by parsing a structure file or text file. The file must be formatted such that there is one ligand title per line.

Assigns active_titles and total_actives by parsing a table file. (Deprecated) A table file is formatted such that the first column is the rank of each retrieved active, the second column is the cumulative count of actives found, and the last row contains the total number of ligands screened and the total count of the actives possible to find in the screen.

Sets the active_ranks data member with ranks from a Glide report file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank.

Sets the active_ranks data member with ranks from a structure file. Structures are assumed to be in rank order. Duplicate titles are assigned the earliest rank.

Sets the active_ranks data member with ranks from a csv file. Rows are presumed to be in rank order, and a 'Title' field is required.
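The rank-assignment rule stated above (results in rank order, duplicate titles given the earliest rank) might look like the following sketch. The function name rank_actives is hypothetical, and the sketch assumes every row occupies one rank, including duplicate rows; the actual parsers may differ.

```python
def rank_actives(ranked_titles, active_titles):
    """Return (active_ranks, missing_active_titles) for titles given in rank order.

    Hypothetical helper; assumes every row occupies one rank position.
    """
    title_ranks = {}
    for rank, title in enumerate(ranked_titles, start=1):
        # setdefault keeps the first (earliest) rank seen for a duplicate title.
        title_ranks.setdefault(title, rank)

    actives = set(active_titles)
    active_ranks = sorted(r for t, r in title_ranks.items() if t in actives)
    missing = sorted(actives - set(title_ranks))
    return active_ranks, missing


ranks, missing = rank_actives(
    ["a1", "d1", "a1", "a2", "d2"],  # screen results, best first
    ["a1", "a2", "a3"],              # known actives
)
print(ranks)    # [1, 4]
print(missing)  # ['a3']
```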
Sets the active_ranks data member with ranks from a sequence of structures. Also assigns active_fingerprint and missing_active_titles data members.

Sets min_Tc_total_actives, a float representing the lowest Tanimoto coefficient (Tc) among all pairs of actives. Tc ranges from 0.0 to 1.0, where 1.0 indicates the highest degree of similarity.

@return: a float representation of the diversity factor.
@rtype: float

    df = (1 - min_Tc_actives)/(1 - min_Tc_total_actives)
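As a rough illustration of the two quantities above, the sketch below computes a minimum pairwise Tanimoto coefficient and the diversity factor. It assumes fingerprints are plain Python sets of "on" bit indices, which is a simplification and not the fingerprint type the class stores; all function names are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def min_tc(fingerprints):
    """Lowest Tc over all fingerprint pairs (needs at least two fingerprints)."""
    return min(tanimoto(a, b) for a, b in combinations(fingerprints, 2))


def diversity_factor(min_tc_actives, min_tc_total_actives):
    """df = (1 - min_Tc_actives) / (1 - min_Tc_total_actives), or None if undefined."""
    denominator = 1.0 - min_tc_total_actives
    if denominator == 0:
        return None
    return (1.0 - min_tc_actives) / denominator


all_actives = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
sampled_actives = all_actives[:2]  # actives recovered in the sampled set

print(min_tc(all_actives))      # 0.0
print(min_tc(sampled_actives))  # 0.5
print(diversity_factor(min_tc(sampled_actives), min_tc(all_actives)))  # 0.5
```

A factor below 1.0 indicates that the recovered actives are less diverse than the full set of actives.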
Sets a suite of enrichment factor terms as instance data members. See cmdline_doc for description of metrics.

The standard suite includes the attributes:

    self.ave_num_outranking_decoys
    self.bedroc20     (alpha=20.0)
    self.bedroc160_9  (alpha=160.9)
    self.bedroc8_0    (alpha=8.0)
    self.roc
    self.rie
    self.auac
    self.ef_40        (EF 40% of actives)
    self.ef_50        (EF 50% of actives)
    self.ef_60        (EF 60% of actives)
    self.ef_70        (EF 70% of actives)
    self.ef_80        (EF 80% of actives)
    self.ef_90        (EF 90% of actives)
    self.ef_100       (EF 100% of actives)
    self.ef_1pct      (EF top 1% of total ligands)
    self.ef_2pct      (EF top 2% of total ligands)
    self.ef_5pct      (EF top 5% of total ligands)
    self.ef_10pct     (EF top 10% of total ligands)
    self.ef_20pct     (EF top 20% of total ligands)
    self.efs_40       (EF* 40% of actives)
    self.efs_50       (EF* 50% of actives)
    self.efs_60       (EF* 60% of actives)
    self.efs_70       (EF* 70% of actives)
    self.efs_80       (EF* 80% of actives)
    self.efs_90       (EF* 90% of actives)
    self.efs_100      (EF* 100% of actives)
    self.efs_1pct     (EF* top 1% of total decoys)
    self.efs_2pct     (EF* top 2% of total decoys)
    self.efs_5pct     (EF* top 5% of total decoys)
    self.efs_10pct    (EF* top 10% of total decoys)
    self.efs_20pct    (EF* top 20% of total decoys)
    self.efp_40       (EF' 40% of actives)
    self.efp_50       (EF' 50% of actives)
    self.efp_60       (EF' 60% of actives)
    self.efp_70       (EF' 70% of actives)
    self.efp_80       (EF' 80% of actives)
    self.efp_90       (EF' 90% of actives)
    self.efp_100      (EF' 100% of actives)
    self.efp_1pct     (EF' top 1% of total decoys)
    self.efp_2pct     (EF' top 2% of total decoys)
    self.efp_5pct     (EF' top 5% of total decoys)
    self.efp_10pct    (EF' top 10% of total decoys)
    self.efp_20pct    (EF' top 20% of total decoys)
    self.fod_40       (FOD 40% of actives)
    self.fod_50       (FOD 50% of actives)
    self.fod_60       (FOD 60% of actives)
    self.fod_70       (FOD 70% of actives)
    self.fod_80       (FOD 80% of actives)
    self.fod_90       (FOD 90% of actives)
    self.fod_100      (FOD 100% of actives)
    self.eff_1pct     (Eff top 1% of total decoys)
    self.eff_2pct     (Eff top 2% of total decoys)
    self.eff_5pct     (Eff top 5% of total decoys)
    self.eff_10pct    (Eff top 10% of total decoys)
    self.eff_20pct    (Eff top 20% of total decoys)
    self.actives_in_top_1_pct            (of total ligands)
    self.actives_in_top_2_pct            (of total ligands)
    self.actives_in_top_5_pct            (of total ligands)
    self.actives_in_top_10_pct           (of total ligands)
    self.actives_in_top_20_pct           (of total ligands)
    self.pct_actives_in_top_1_pct        (of total ligands)
    self.pct_actives_in_top_2_pct        (of total ligands)
    self.pct_actives_in_top_5_pct        (of total ligands)
    self.pct_actives_in_top_10_pct       (of total ligands)
    self.pct_actives_in_top_20_pct       (of total ligands)
    self.actives_in_top_1_pct_star       (of total decoys)
    self.actives_in_top_2_pct_star       (of total decoys)
    self.actives_in_top_5_pct_star       (of total decoys)
    self.actives_in_top_10_pct_star      (of total decoys)
    self.actives_in_top_20_pct_star      (of total decoys)
    self.pct_actives_in_top_1_pct_star   (of total decoys)
    self.pct_actives_in_top_2_pct_star   (of total decoys)
    self.pct_actives_in_top_5_pct_star   (of total decoys)
    self.pct_actives_in_top_10_pct_star  (of total decoys)
    self.pct_actives_in_top_20_pct_star  (of total decoys)
    self.def_1pct     (DEF top 1% of actives)
    self.def_2pct     (DEF top 2% of actives)
    self.def_5pct     (DEF top 5% of actives)
    self.def_10pct    (DEF top 10% of actives)
    self.def_20pct    (DEF top 20% of actives)
    self.defs_1pct    (DEF* top 1% of actives)
    self.defs_2pct    (DEF* top 2% of actives)
    self.defs_5pct    (DEF* top 5% of actives)
    self.defs_10pct   (DEF* top 10% of actives)
    self.defs_20pct   (DEF* top 20% of actives)
    self.defp_1pct    (DEF' top 1% of total decoys)
    self.defp_2pct    (DEF' top 2% of total decoys)
    self.defp_5pct    (DEF' top 5% of total decoys)
    self.defp_10pct   (DEF' top 10% of total decoys)
    self.defp_20pct   (DEF' top 20% of total decoys)

@return: the Enrichment Factor (EF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.
@param n_sampled_set: The number of ranked results for which to calculate the enrichment factor.
@type n_sampled_set: integer
@param min_actives: The number of actives that must be within the n_sampled_set, otherwise the returned EF value is None.
@type min_actives: integer

EF is defined as::

         n_actives_in_sampled_set / n_sampled_set
    EF = ----------------------------------------
              total_actives / total_ligands

where 'n_sampled_set' is the number of *all* ranks in which to search for actives.

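The EF definition above translates directly into code. The following is a minimal sketch under stated assumptions, not the library's implementation: the function name calc_ef, its argument order, and the min_actives default of 1 are all hypothetical.

```python
def calc_ef(active_ranks, total_actives, total_ligands, n_sampled_set,
            min_actives=1):
    """Enrichment factor over the top n_sampled_set ranks, or None.

    Hypothetical stand-alone function; the default min_actives is an assumption.
    """
    # Count actives whose unadjusted rank falls inside the sampled set.
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    if n_found < min_actives:
        return None
    try:
        return (n_found / n_sampled_set) / (total_actives / total_ligands)
    except ZeroDivisionError:
        return None


# 4 known actives (one never ranked) among 64 ligands; look at the top 8 ranks.
print(calc_ef([1, 5, 20], total_actives=4, total_ligands=64, n_sampled_set=8))
# 4.0
```

Here two of the four actives fall in the top 8 of 64 ligands, so the hit rate in the sample (2/8) is four times the hit rate expected from random selection (4/64).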
@return: the Enrichment Factor* (EF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set, the returned value is None.
@param n_sampled_decoy_set: The number of ranked decoys for which to calculate the enrichment factor.
@type n_sampled_decoy_set: integer
@param min_actives: The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF* value is None.
@type min_actives: integer

Here, EF* is defined as::

          n_actives_in_sampled_set / n_sampled_decoy_set
    EF* = ----------------------------------------------
                   total_actives / total_decoys

where 'n_sampled_decoy_set' is the number of *decoy* ranks in which to search for actives.

@return: the Enrichment Factor prime (EF') for a given sample size. If fewer than min_actives are found in the set, the returned value is None.
@param n_sampled_decoy_set: The number of ranked decoy results for which to calculate the enrichment factor.
@type n_sampled_decoy_set: integer
@param min_actives: The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF' value is None.
@type min_actives: integer

EF' is defined as::

                     n_actives_sampled_set
    EF' = ------------------------------------------
          cumulative_sum(frac. decoys/frac. actives)

@return: the Diverse Enrichment Factor (DEF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.
@param n_sampled_set: The number of ranked results for which to calculate the enrichment factor.
@type n_sampled_set: integer
@param min_actives: The number of actives that must be within the n_sampled_set, otherwise the returned DEF value is None.
@type min_actives: integer

DEF is defined as::

               1 - (min_similarity_among_actives_in_sampled_set)
    DEF = EF * --------------------------------------------------
                     1 - (min_similarity_among_all_actives)

where 'n_sampled_set' is the number of *all* ranks in which to search for actives.

@return: the Diverse Enrichment Factor* (DEF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set, the returned value is None.
@param n_sampled_decoy_set: The number of ranked decoys for which to calculate the enrichment factor.
@type n_sampled_decoy_set: integer
@param min_actives: The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF* value is None.
@type min_actives: integer

Here, DEF* is defined as::

                 1 - (min_similarity_among_actives_in_sampled_set)
    DEF* = EF* * --------------------------------------------------
                       1 - (min_similarity_among_all_actives)

where 'n_sampled_decoy_set' is the number of *decoy* ranks in which to search for actives.

@return: the Diverse Enrichment Factor prime (DEF') for a given sample size. If fewer than min_actives are found in the set, the returned value is None.
@param n_sampled_decoy_set: The number of ranked decoy results for which to calculate the enrichment factor.
@type n_sampled_decoy_set: integer
@param min_actives: The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF' value is None.
@type min_actives: integer

DEF' is defined as::

                 1 - (min_similarity_among_actives_in_sampled_set)
    DEF' = EF' * --------------------------------------------------
                       1 - (min_similarity_among_all_actives)

@return: the average fraction of decoys outranking the given fraction, provided as a float, of known active ligands. The returned value is None if a) the calculation raises a ZeroDivisionError, or b) fraction_of_actives yields more actives than are ranked, or c) fraction_of_actives is greater than 1.0.
@param fraction_of_actives: Decimal notation of the fraction of sampled actives, used to set the sampled set size.
@type fraction_of_actives: float

FOD is defined as::

                          __
                1         \    number_outranking_decoys_in_sampled_set
    FOD = -------------   /    ---------------------------------------
           num_actives    --                total_decoys

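A possible reading of the FOD formula is sketched below. Two points are assumptions rather than documented behavior: num_actives is taken as fraction_of_actives * total_actives rounded to the nearest integer, and the sum runs over the best-ranked num_actives actives. The function name calc_fod is hypothetical.

```python
def calc_fod(active_ranks, total_actives, total_decoys, fraction_of_actives):
    """Average fraction of decoys outranking the sampled actives, or None.

    Hypothetical stand-alone function; the rounding rule is an assumption.
    """
    if fraction_of_actives > 1.0:
        return None
    num_actives = int(round(fraction_of_actives * total_actives))
    ordered = sorted(active_ranks)
    if num_actives > len(ordered):
        return None  # more actives requested than were ranked
    try:
        # The i-th active (0-based) has i actives ahead of it, so the number
        # of decoys ahead of it is rank - 1 - i.
        fractions = [(rank - 1 - i) / total_decoys
                     for i, rank in enumerate(ordered[:num_actives])]
        return sum(fractions) / num_actives
    except ZeroDivisionError:
        return None


# Half of 4 actives -> 2 actives, outranked by 0 and 1 of the 8 decoys.
print(calc_fod([1, 3, 6], total_actives=4, total_decoys=8,
               fraction_of_actives=0.5))  # 0.0625
```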
@return: a float for the active recovery Efficiency (EFF) at a particular sample set size. The returned value is None if the calculation raises a ZeroDivisionError.
@param fraction_of_decoys: The size of the set is in terms of the number of decoys in the screen. For example, given 1000 decoys and fraction_of_decoys=0.20, actives that appear within the first 200 ranks are counted.
@type fraction_of_decoys: float

EFF is defined as::

                           frac. actives in sample
    EFF = (2 * ------------------------------------------------) - 1
               frac. actives in sample + frac. decoys in sample

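The EFF expression can be sketched as follows. The sketch assumes that "frac. decoys in sample" equals fraction_of_decoys itself and that actives are counted by unadjusted rank, following the 1000-decoy example above; the function name calc_eff is hypothetical.

```python
def calc_eff(active_ranks, total_actives, total_decoys, fraction_of_decoys):
    """Active recovery efficiency at a decoy-based cutoff, or None.

    Hypothetical stand-alone function; see the stated assumptions.
    """
    # E.g. 1000 decoys and fraction_of_decoys=0.20 -> first 200 ranks.
    cutoff = fraction_of_decoys * total_decoys
    n_found = sum(1 for rank in active_ranks if rank <= cutoff)
    try:
        frac_actives = n_found / total_actives
        return 2.0 * frac_actives / (frac_actives + fraction_of_decoys) - 1.0
    except ZeroDivisionError:
        return None


eff = calc_eff([1, 2, 300], total_actives=4, total_decoys=1000,
               fraction_of_decoys=0.25)
print(round(eff, 3))  # 0.333
```

By construction the value lies between -1 (no actives recovered) and just under +1 (all actives recovered within a very small fraction of the decoys).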
@return: the average number of decoys that outranked the actives. The rank of each active is adjusted by the number of outranking actives. The number of outranking decoys is then defined as the adjusted rank of that active minus one. The number of outranking decoys is calculated for each docked active and averaged.
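The averaging described above can be illustrated with a short sketch. The function name average_outranking_decoys is hypothetical, and returning None for an empty rank list is an assumption.

```python
def average_outranking_decoys(active_ranks):
    """Mean number of decoys ranked ahead of each docked active, or None.

    Hypothetical stand-alone function for illustration.
    """
    ordered = sorted(active_ranks)
    if not ordered:
        return None
    # Adjusted rank: unadjusted rank minus the number of actives ahead of it.
    adjusted = [rank - i for i, rank in enumerate(ordered)]
    # Outranking decoys for an active = adjusted rank - 1.
    return sum(rank - 1 for rank in adjusted) / len(adjusted)


# Adjusted ranks are [1, 2, 4, 5], so outranking decoys are [0, 1, 3, 4].
print(average_outranking_decoys([1, 3, 6, 8]))  # 2.0
```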

Calculates sensitivity at a particular rank, defined as:

    Se(rank) = found_actives / total_actives

@param rank: active rank at which to calculate the sensitivity
@type rank: int
@return: sensitivity of the screen at a given rank
@rtype: float

Calculates specificity at a particular rank, defined as:

    Sp(rank) = discarded_decoys / total_decoys

@param rank: active rank at which to calculate the specificity
@type rank: int
@return: specificity of the screen at a given rank
@rtype: float
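Both definitions can be sketched together. The sketch assumes that "discarded decoys" means the decoys ranked after the cutoff (or never ranked), i.e. total_decoys minus the decoys inside the top `rank` positions; the function names are hypothetical.

```python
def sensitivity(active_ranks, total_actives, rank):
    """Se(rank) = found_actives / total_actives."""
    found = sum(1 for r in active_ranks if r <= rank)
    return found / total_actives


def specificity(active_ranks, total_decoys, rank):
    """Sp(rank) = discarded_decoys / total_decoys.

    Assumes discarded decoys are those not within the top `rank` positions.
    """
    found = sum(1 for r in active_ranks if r <= rank)
    selected_decoys = rank - found  # non-actives inside the cutoff
    return (total_decoys - selected_decoys) / total_decoys


active_ranks = [1, 3, 6]  # 4 known actives, one never ranked; 8 decoys
print(sensitivity(active_ranks, total_actives=4, rank=4))  # 0.5
print(specificity(active_ranks, total_decoys=8, rank=4))   # 0.75
```

Plotting Se against 1 - Sp at each active rank gives the points of a ROC curve.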

@return: A float representation of the area under the Receiver Operating Characteristic (ROC) curve. Typically interpreted as the probability that an active will appear before an inactive. A value of 1.0 reflects ideal performance; a value of 0.5 reflects performance on par with random selection.

Classically, ROC area is defined as::

          AUAC    Ra
    ROC = ---- - ---
           Ri    2Ri

where AUAC is the area under the accumulation curve, Ri is the ratio of inactives, and Ra is the ratio of actives. A different method is used here in order to account for unranked actives - see PYTHON-3055.

@return: tuple of ROC AUC, the standard error, and estimated confidence interval (lower and upper bounds).
@param alpha: the significance level. Default is 0.05 (95% confidence interval)
@type alpha: float

Here, the ROC AUC is based on the Mann-Whitney-Wilcoxon U. The U value is calculated directly::

    U = R - ((n_a(n_a+1))/2)

where n_a is the number of actives and R is the sum of their ranks::

    ROC AUC = ((n_a*n_i) - U)/(n_a*n_i)

where n_a is the number of actives and n_i is the number of decoys::

    SE = sqrt((A(1-A) + (n_a-1)(Q - A^2) + (n_i -1)(q - A^2))/(n_a*n_i))

    CI = SE * scipy.stats.t.ppf((1+(1-alpha))/2.0, ((n_a+n_i)-1))

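The U, AUC and SE expressions above can be sketched with the standard library alone. The text does not define A, Q and q; this sketch assumes A is the ROC AUC and takes Q = A/(2-A) and q = 2A^2/(1+A), the Hanley-McNeil approximations, which may not match the library's choice. It also assumes every active is ranked and omits the confidence interval, which needs scipy.stats.t.ppf. The function name roc_auc_with_se is hypothetical.

```python
import math


def roc_auc_with_se(active_ranks, n_decoys):
    """Return (ROC AUC, standard error) from the rank sum of the actives.

    Hypothetical stand-alone function. Q and q follow the Hanley-McNeil
    approximations, which is an assumption about the formula in the text.
    """
    n_a = len(active_ranks)
    n_i = n_decoys
    if n_a == 0 or n_i == 0:
        return None

    rank_sum = sum(active_ranks)                  # R
    u = rank_sum - n_a * (n_a + 1) / 2.0          # U = R - n_a(n_a+1)/2
    auc = (n_a * n_i - u) / (n_a * n_i)           # A

    q_upper = auc / (2.0 - auc)                   # Q (assumed)
    q_lower = 2.0 * auc ** 2 / (1.0 + auc)        # q (assumed)
    variance = (auc * (1.0 - auc)
                + (n_a - 1) * (q_upper - auc ** 2)
                + (n_i - 1) * (q_lower - auc ** 2)) / (n_a * n_i)
    # Guard against tiny negative values from floating-point rounding.
    return auc, math.sqrt(max(variance, 0.0))


auc, se = roc_auc_with_se([1, 2, 4], n_decoys=3)
print(round(auc, 3))  # 0.889
```

In this example the actives sit at ranks 1, 2 and 4 of six ligands, so eight of the nine active-decoy pairs are ordered correctly and the AUC is 8/9.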

Note: this list may grow, but the relative order of the columns should remain fixed.

Calculates the set of points on the ROC curve at each active rank.
Generated by Epydoc 3.0.1 on Wed Oct 26 00:59:34 2016 | http://epydoc.sourceforge.net |