schrodinger.analysis.enrichment.enrichment_input module

Input file parser for enrichment module.

For most virtual screen result input formats, titles are used to identify the ligands. The input is expected to be in rank order already. If it is not, set the optional sort_header (CSV) or sort_property (structure) parameter of the parser functions to the correct score header or property; note that these parameters are currently marked as not implemented. If the file contains duplicate titles, only the first occurrence of each title is ranked.
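Because the sort parameters are marked as not implemented, unordered results can be sorted before they are handed to a parser. The sketch below is a hypothetical helper (sort_by_score is not part of enrichment_input). It assumes each result is a dict and that lower scores are better by default, as with many docking scores.

```python
def sort_by_score(records, score_key, higher_is_better=False):
    """Return records ordered best-first by record[score_key].

    Hypothetical helper: assumes lower scores are better unless
    higher_is_better=True. Python's sort is stable, so records with
    equal scores keep their original file order.
    """
    return sorted(records, key=lambda rec: float(rec[score_key]),
                  reverse=higher_is_better)
```

The sorted titles, for example `[rec["Title"] for rec in sort_by_score(records, "score")]`, can then be passed to extract_ranks_from_list.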

Input file formats:

<actives_file>
    Text file.
        Raw text, one title per line.
    Structure file.
        A file containing structures with meaningful titles.
    CSV file.
        A comma-separated values file.
    List(str).
        A list of active string titles.
<results_file>
    Structure file, e.g. 'foo_pv.mae'
        A file containing ordered structures.
    CSV file.
        A comma-separated values file containing ranked titles ordered by
        virtual screen scoring metric.
    List(str) or List(structure).
        A list of ranked titles or structures ordered by virtual screen
        scoring metric.
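The actives input can take several forms, and each form has its own parser function below. As a sketch of how a caller might pick one, the hypothetical helper choose_actives_parser returns the name of a matching parser. The extension-to-parser mapping is an assumption for illustration, not behavior documented for this module.

```python
import os


def choose_actives_parser(actives):
    """Return the name of the enrichment_input parser assumed to fit `actives`.

    Hypothetical dispatch: lists go to the list parser, .csv to the CSV
    parser, Maestro files (.mae, .maegz, .mae.gz) to the Maestro parser,
    and anything else is treated as raw text with one title per line.
    """
    if isinstance(actives, (list, tuple)):
        return "extract_active_titles_from_list"
    name = actives.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # inspect the extension under the compression suffix
    ext = os.path.splitext(name)[1]
    if ext == ".csv":
        return "extract_active_titles_from_csv"
    if ext in (".mae", ".maegz"):
        return "extract_active_titles_from_mae"
    return "extract_active_titles_from_txt"
```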

API examples:

from schrodinger.analysis.enrichment import enrichment_input, metrics, reporter

# Ex. 1) Calculate BEDROC
active_titles = enrichment_input.extract_active_titles_from_txt("my_actives.txt")
(total_actives, total_ligands, active_ranks, adjusted_active_ranks,
 total_ranked, title_ranks) = enrichment_input.extract_ranks_from_mae(
    mae_file_name="screen_results.maegz",
    active_titles=active_titles,
    num_decoy=1000)
bedroc, bedroc_ra = metrics.calcBEDROC(total_actives, total_ligands,
                                       active_ranks, 20.0)

# Ex. 2) Use the reporter class to calculate the default set of metrics.
#        Note that this is not a good practice.
r = reporter.EnrichmentReporter(
    actives_file="my_actives.txt",
    results_file="screen_results.maegz",
    num_decoy=1000)
r.report()

class schrodinger.analysis.enrichment.enrichment_input.FingerprintComponent(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives)

Bases: object

Data class holding the objects required by all fingerprint-related metric functions (calc_DEF, calc_DEFStar and calc_DEFP).

Variables

fp_gen (CanvasFingerprintGenerator) – Object used to generate a fingerprint for each active title.
fp_sim (CanvasFingerprintSimilarity) – Object used to compute the fingerprint similarity of each pair of actives.
active_fingerprint (dict) – Maps each active title to its fingerprint. Not available for screen results that don’t include title and structure information.
min_Tc_total_actives (float) – The lowest Tanimoto coefficient (Tc) among all pairs of actives.
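To make min_Tc_total_actives concrete, the sketch below computes the lowest pairwise Tanimoto coefficient over all actives. It stands in plain Python sets of on-bit indices for the Canvas fingerprint objects; tanimoto and min_tc_among_actives are hypothetical names, and treating two empty fingerprints as identical (Tc = 1.0) is an assumed convention.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    # Assumed convention: two empty fingerprints count as identical.
    return len(fp_a & fp_b) / union if union else 1.0


def min_tc_among_actives(active_fingerprint):
    """Lowest Tc over all unordered pairs of distinct actives.

    active_fingerprint maps title -> set of on bits (a stand-in for the
    Canvas fingerprints). Returns None when fewer than two actives exist.
    """
    fps = list(active_fingerprint.values())
    if len(fps) < 2:
        return None
    return min(tanimoto(a, b) for a, b in combinations(fps, 2))
```

For three actives with bits {1, 2, 3}, {2, 3, 4} and {7, 8}, the last active shares no bits with the others, so the minimum is 0.0.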

__init__(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives): Initialize self. See help(type(self)) for accurate signature.

schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_csv(actives_file)

Parse actives_file as a CSV file and return the distinct active titles. Repeated titles are ignored.

Parameters: actives_file (str) – A CSV file containing all active titles.
Returns: Distinct active titles from the actives file.
Return type: set(str)
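The column layout that extract_active_titles_from_csv expects is not stated here. As a minimal sketch only, the hypothetical read_active_titles_csv below assumes the titles sit in a single column (the first by default) and deduplicates them with a set, matching the documented "repeated titles are ignored" behavior.

```python
import csv


def read_active_titles_csv(path, column=0, skip_header=False):
    """Return the distinct, non-empty titles found in one CSV column.

    Hypothetical helper: the column index and optional header row are
    assumptions, not the documented file layout.
    """
    titles = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)  # discard the header row, if any
        for row in reader:
            if len(row) > column and row[column].strip():
                titles.add(row[column].strip())
    return titles
```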

schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_mae(actives_file)

Parse actives_file as a Maestro file and return the distinct active titles. Repeated titles are ignored.

Parameters: actives_file (str) – A Maestro file containing all active titles.
Returns: Distinct active titles from the actives file.
Return type: set(str)

schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_txt(actives_file)

Parse actives_file as a raw text file with one title per line and return the distinct active titles. Repeated titles are ignored.

Parameters: actives_file (str) – Raw text file containing one title per line.
Returns: Distinct active titles from the actives file.
Return type: set(str)
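A plain-Python sketch of the one-title-per-line format follows. read_active_titles_txt is a hypothetical name, and stripping surrounding whitespace and skipping blank lines are assumed behaviors rather than documented ones.

```python
def read_active_titles_txt(path):
    """Return the distinct titles from a text file with one title per line.

    Assumed behavior: surrounding whitespace is stripped and blank lines
    are skipped. Using a set means repeated titles are ignored.
    """
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}
```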

schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_list(actives)

Parse actives from a list of strings and return the distinct active titles. Repeated titles are ignored.

Parameters: actives (list(str)) – A list of strings containing all active titles.
Returns: Distinct active titles from the list.
Return type: set(str)

schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_list(titles_iter, active_titles, num_decoy=0)

Compute and return rank and count related terms from a list of ligand titles pre-sorted by virtual screen scoring metric.

Parameters

titles_iter (list(str)) – A list of title strings, pre-sorted by virtual screen scoring metric.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles in the actives file plus num_decoy. This enables the calculation of the correction term in calc_AUAC when the total number of ligands differs from the total number of ranked titles in results_file.

Returns

A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary mapping each active title to its rank.

Return type

tuple(int, int, list(int), list(int), int, dict(str, int))
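The sketch below rebuilds the documented six-item return tuple from a pre-sorted title list, to show how the counts relate. ranks_from_titles is a hypothetical name, not the module's implementation. Three points are assumptions: ranks are 1-based positions among unique titles, an active's adjusted rank is its rank minus the number of actives ranked above it, and total_ligands falls back to the number of ranked titles when num_decoy is 0.

```python
def ranks_from_titles(titles_iter, active_titles, num_decoy=0):
    """Hypothetical re-creation of the documented return tuple."""
    seen = set()
    active_ranks, adjusted_active_ranks = [], []
    title_ranks = {}
    rank = 0
    for title in titles_iter:
        if title in seen:
            continue  # only the first occurrence of a title is ranked
        seen.add(title)
        rank += 1
        if title in active_titles:
            # Assumed definition: discount the actives already ranked above.
            adjusted_active_ranks.append(rank - len(active_ranks))
            active_ranks.append(rank)
            title_ranks[title] = rank
    total_actives = len(active_titles)
    total_ranked = rank
    # Documented: actives + num_decoy when num_decoy is given; fallback assumed.
    total_ligands = total_actives + num_decoy if num_decoy else total_ranked
    return (total_actives, total_ligands, active_ranks,
            adjusted_active_ranks, total_ranked, title_ranks)
```

With titles `["d1", "a1", "d2", "a1", "a2", "d3"]` and actives `{"a1", "a2", "a3"}`, the duplicate "a1" is skipped, so a1 and a2 receive ranks 2 and 4 among five ranked titles, while a3 is never ranked.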

schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_csv(csv_file_name, active_titles, num_decoy=0, id_header='Title', sort_header=None)

Compute and return rank and count related terms from a csv file.

Parameters

csv_file_name (str) – File name of the csv file that contains the virtual screening result.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles in the actives file plus num_decoy. This enables the calculation of the correction term in calc_AUAC when the total number of ligands differs from the total number of ranked titles in results_file.
id_header (str) – Name of the compound-identifying header.
sort_header (str) – Name of the virtual screen scoring metric header to sort on (not implemented).

Returns

A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary mapping each active title to its rank.

Return type

tuple(int, int, list(int), list(int), int, dict(str, int))
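For a CSV results file, the ranking input is the id_header column read in file order. The hypothetical helper below shows that step with the standard csv.DictReader. It assumes the file has a header row and is already sorted by score. Duplicates are kept here, because skipping repeated titles belongs to the ranking step.

```python
import csv


def ordered_titles_from_csv(csv_file_name, id_header="Title"):
    """Return the id_header column in file order, duplicates included.

    Hypothetical helper: assumes a header row and a file pre-sorted by
    the virtual screen scoring metric.
    """
    with open(csv_file_name, newline="") as handle:
        return [row[id_header] for row in csv.DictReader(handle)]
```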

schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_structures(structure_iter, active_titles, num_decoy=0, id_property='s_m_title', sort_property=None)

Compute and return rank and count related terms from a list of structures.

Parameters

structure_iter (list(structure.Structure)) – A list of structure.Structure objects, pre-sorted by virtual screen scoring metric.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles in the actives file plus num_decoy. This enables the calculation of the correction term in calc_AUAC when the total number of ligands differs from the total number of ranked titles in results_file.
id_property (str) – Name of the compound-identifying property.
sort_property (str) – Name of the virtual screen scoring metric property to sort on (not implemented).

Returns

A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary mapping each active title to its rank.

Return type

tuple(int, int, list(int), list(int), int, dict(str, int))
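For structure input, each structure's id_property value supplies the title used for ranking. The sketch below runs without the Schrodinger suite by using StandInStructure, a hypothetical class that exposes a dict-like property attribute. We assume structure.Structure exposes its properties the same way. ordered_titles_from_structures is also a hypothetical name.

```python
class StandInStructure:
    """Hypothetical stand-in for structure.Structure with a dict-like
    `property` attribute keyed by property names such as 's_m_title'."""

    def __init__(self, **props):
        self.property = dict(props)


def ordered_titles_from_structures(structure_iter, id_property="s_m_title"):
    """Collect each structure's id_property value in iteration order."""
    return [st.property[id_property] for st in structure_iter]
```

The resulting list keeps the input order, so it can be ranked in the same way as a list of titles.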

schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_mae(mae_file_name, active_titles, num_decoy=0, id_property='s_m_title', sort_property=None)

Compute and return rank and count related terms from a structure file.

Parameters

mae_file_name (str) – Name of the structure file that contains the virtual screening result.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles in the actives file plus num_decoy. This enables the calculation of the correction term in calc_AUAC when the total number of ligands differs from the total number of ranked titles in results_file.
id_property (str) – Name of the compound-identifying property.
sort_property (str) – Name of the virtual screen scoring metric property to sort on (not implemented).

Returns

A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary mapping each active title to its rank.

Return type

tuple(int, int, list(int), list(int), int, dict(str, int))

schrodinger.analysis.enrichment.enrichment_input.get_fingerprint_components(structure_file, active_titles, id_property='s_m_title')

Initialize and return a data class object needed for fingerprint-related calculations.

Parameters

structure_file (str or list(str)) – Structure file or a list of structures.
active_titles (set(str)) – Distinct active titles from the actives file.
id_property (str) – Name of compound-identifying property.

Returns

The initialized enrichment_input.FingerprintComponent object.

Return type

enrichment_input.FingerprintComponent
