schrodinger.analysis.enrichment.enrichment_input module
Input file parser for enrichment module.
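For most virtual screen result input formats, ligands are identified by their titles. The input is expected to be already ordered by the virtual screen scoring metric. If it is not, set the optional sort parameter of the parser function (sort_header for CSV input, sort_property for structure input) to the correct score header or property. Note that the parameter descriptions on this page mark both as not implemented, so pre-sorting the input is the safer choice. If the file contains duplicate titles, only the first occurrence of each title is ranked.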
- Input file formats:
  - <actives_file>
    - Text file: raw text, one title per line.
    - Structure file: a file containing structures with a meaningful title.
    - CSV file: a comma-separated values file.
    - List(str): a list of active string titles.
  - <results_file>
    - Structure file, e.g. 'foo_pv.mae': a file containing ordered structures.
    - CSV file: a comma-separated values file containing ranked titles ordered by virtual screen scoring metric.
    - List(str) or List(structure): a list of ranked titles ordered by virtual screen scoring metric.
- API examples:

    # Ex. 1) Calculate BEDROC
    active_titles = extract_active_titles_from_txt(actives_file)
    (total_actives, total_ligands, active_ranks, adjusted_active_ranks,
     total_ranked, title_ranks) = extract_ranks_from_mae(
        mae_file_name="screen_results.maegz",
        active_titles=active_titles,
        num_decoy=1000)
    bedroc, bedroc_ra = metrics.calcBEDROC(total_actives, total_ligands,
                                           active_ranks, 20.0)

    # Ex. 2) Using the reporter class to calculate the default set of metrics.
    # Note that this is not a good practice.
    r = reporter.EnrichmentReporter(
        actives_file="my_actives.txt",
        results_file="screen_results.maegz",
        num_decoy=1000)
    r.report()
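To make the inputs of Ex. 1 concrete, here is a minimal, library-independent sketch of the textbook BEDROC definition (Truchon and Bayly), written in terms of the same quantities: total actives, total ligands, 1-based active ranks, and alpha. The function name `bedroc_sketch` is hypothetical, and whether `metrics.calcBEDROC` uses exactly this formulation is an assumption. The second value returned in Ex. 1 (`bedroc_ra`) is not reproduced here.

```python
import math


def bedroc_sketch(total_actives, total_ligands, active_ranks, alpha=20.0):
    """Textbook BEDROC from 1-based active ranks (hypothetical helper)."""
    n, big_n = total_actives, total_ligands
    if n <= 0 or big_n <= n:
        raise ValueError("need at least one active and one decoy")
    ratio = n / big_n
    # Robust initial enhancement (RIE): observed exponentially weighted sum
    # divided by its expectation for uniformly random ranks.
    observed = sum(math.exp(-alpha * rank / big_n) for rank in active_ranks)
    expected = ratio * (1.0 - math.exp(-alpha)) / (math.exp(alpha / big_n) - 1.0)
    rie = observed / expected
    # Bounds of RIE when all actives are ranked first (max) or last (min).
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Made-up rank lists for 5 actives among 1000 ligands.
best = bedroc_sketch(5, 1000, [1, 2, 3, 4, 5])
worst = bedroc_sketch(5, 1000, [996, 997, 998, 999, 1000])
mixed = bedroc_sketch(5, 1000, [1, 10, 50, 400, 900])
```

In this formulation the score is normalized between the all-actives-first and all-actives-last cases, which is why the total number of ligands (and therefore `num_decoy`) matters as much as the ranks themselves.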
Copyright Schrodinger, LLC. All rights reserved.
-
class schrodinger.analysis.enrichment.enrichment_input.FingerprintComponent(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives)
Bases: object
Data class that contains critical objects that all fingerprint-related metrics functions (calc_DEF, calc_DEFStar and calc_DEFP) need.
Variables:
- fp_gen (CanvasFingerprintGenerator) – Object needed to generate a fingerprint for each active title.
- fp_sim (CanvasFingerprintSimilarity) – Object needed to compare the fingerprint similarity of each active pair.
- active_fingerprint (dict) – Fingerprints keyed by active title. Not available for screen results that don't include title and structure information.
- min_Tc_total_actives (float) – The lowest Tanimoto coefficient (Tc) among all active similarity pairs.
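As a rough illustration of what active_fingerprint and min_Tc_total_actives hold, the sketch below uses plain Python sets of on-bits in place of Canvas fingerprints. The `tanimoto` helper, the titles, and the bit sets are all made up for the example; the real class relies on the Canvas generator and similarity objects listed above.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


# Stand-in for active_fingerprint: title -> fingerprint (made-up bit sets).
active_fingerprint = {
    "active_1": {1, 2, 3, 4},
    "active_2": {3, 4, 5, 6},
    "active_3": {1, 2, 3, 5},
}

# Stand-in for min_Tc_total_actives: lowest Tc over all distinct active pairs.
min_tc = min(
    tanimoto(active_fingerprint[a], active_fingerprint[b])
    for a, b in combinations(sorted(active_fingerprint), 2)
)
```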
-
__init__(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives)
Initialize self. See help(type(self)) for accurate signature.
-
schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_csv(actives_file)
Parse actives_file as a CSV file and return the distinct active titles. Repeated active titles are ignored.
Parameters: actives_file (str) – A CSV file containing all active titles.
Returns: Distinct active titles from the actives file.
Return type: set(str)
-
schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_mae(actives_file)
Parse actives_file as a Maestro file and return the distinct active titles. Repeated active titles are ignored.
Parameters: actives_file (str) – A Maestro file containing all active titles.
Returns: Distinct active titles from the actives file.
Return type: set(str)
-
schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_txt(actives_file)
Parse actives_file as a raw text file with one title per line and return the distinct active titles. Repeated active titles are ignored.
Parameters: actives_file (str) – Raw text file containing one title per line.
Returns: Distinct active titles from the actives file.
Return type: set(str)
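The text format is simple enough to sketch with the standard library. The helper below, `read_titles_sketch`, is hypothetical and is not the library's implementation; stripping surrounding whitespace and skipping blank lines are assumptions made for the example.

```python
import os
import tempfile


def read_titles_sketch(path):
    """Return distinct, non-empty titles from a one-title-per-line file."""
    with open(path, encoding="utf-8") as handle:
        # Assumption: surrounding whitespace is stripped, blank lines skipped.
        return {line.strip() for line in handle if line.strip()}


# Write a small demonstration file; "lig_A" is repeated on purpose.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "my_actives.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("lig_A\nlig_B\nlig_A\n\nlig_C\n")
    titles = read_titles_sketch(demo_path)
```

Returning a set mirrors the documented return type and is what makes repeated titles collapse into a single entry.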
-
schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_list(actives)
Parse actives from a list of strings and return the distinct active titles. Repeated active titles are ignored.
Parameters: actives (list(str)) – A list of strings containing all active titles.
Returns: Distinct active titles from the list.
Return type: set(str)
-
schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_list(titles_iter, active_titles, num_decoy=0)
Compute and return rank- and count-related terms from a list of ligand titles pre-sorted by virtual screen scoring metric.
Parameters:
- titles_iter (list(str)) – A list of title strings, pre-sorted by virtual screen scoring metric.
- active_titles (set(str)) – Distinct active titles from the actives file.
- num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles from the actives file + num_decoy. This enables the calculation of the correction term in calc_AUAC, should the total number of ligands not equal the total number of ranked titles in results_file.
Returns: A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary storing active titles as keys and their ranks as values.
Return type: int, int, list(int), list(int), int, dict(str, int)
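The bookkeeping described above can be sketched in plain Python. This is a hedged illustration, not the library's code: the function name `ranks_sketch` is hypothetical, ranks are assumed to be 1-based, duplicates are assumed not to consume a rank position, and the total number of ligands is assumed to fall back to the number of ranked titles when num_decoy is 0. Adjusted active ranks are left out because this page does not define them.

```python
def ranks_sketch(titles, active_titles, num_decoy=0):
    """Rank the first occurrence of each title and pick out the actives."""
    title_ranks = {}
    total_ranked = 0
    seen = set()
    for title in titles:
        if title in seen:
            continue  # duplicates: only the first occurrence is ranked
        seen.add(title)
        total_ranked += 1
        if title in active_titles:
            title_ranks[title] = total_ranked  # assumed 1-based rank
    active_ranks = sorted(title_ranks.values())
    total_actives = len(active_titles)
    # Assumption: without num_decoy, every ranked title counts as a ligand.
    total_ligands = total_actives + num_decoy if num_decoy else total_ranked
    return total_actives, total_ligands, active_ranks, total_ranked, title_ranks


# "a1" appears twice; "a3" is a known active that was never ranked.
result = ranks_sketch(
    ["a1", "d1", "a1", "d2", "a2"], {"a1", "a2", "a3"}, num_decoy=10)
```

In this example the total number of ligands (3 actives + 10 decoys) differs from the number of ranked titles, which is the situation the num_decoy description refers to.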
-
schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_csv(csv_file_name, active_titles, num_decoy=0, id_header='Title', sort_header=None)
Compute and return rank- and count-related terms from a CSV file.
Parameters:
- csv_file_name (str) – File name of the CSV file that contains the virtual screening result.
- active_titles (set(str)) – Distinct active titles from the actives file.
- num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles from the actives file + num_decoy. This enables the calculation of the correction term in calc_AUAC, should the total number of ligands not equal the total number of ranked titles in results_file.
- id_header (str) – Name of the compound-identifying header.
- sort_header (str) – Name of the virtual screen scoring metric header to sort on. (not implemented)
Returns: A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary storing active titles as keys and their ranks as values.
Return type: int, int, list(int), list(int), int, dict(str, int)
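Because sort_header is marked as not implemented, one option is to order the CSV rows yourself before ranking. The following is a minimal sketch using only the standard library; the column name "docking_score", the data, and the choice of ascending order (lower score is better) are assumptions made for the example.

```python
import csv
import io

# Made-up screen results; the "docking_score" header is only an example.
csv_text = """Title,docking_score
d1,-7.1
a1,-9.4
d2,-6.0
a2,-8.2
"""

id_header = "Title"
sort_header = "docking_score"

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Assumption: lower (more negative) scores are better, so sort ascending.
rows.sort(key=lambda row: float(row[sort_header]))
ordered_titles = [row[id_header] for row in rows]
```

The resulting list of titles has the pre-sorted list(str) shape that extract_ranks_from_list documents for titles_iter.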
-
schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_structures(structure_iter, active_titles, num_decoy=0, id_property='s_m_title', sort_property=None)
Compute and return rank- and count-related terms from a list of structures.
Parameters:
- structure_iter (list(structure.Structure)) – A list of structure.Structure.
- active_titles (set(str)) – Distinct active titles from the actives file.
- num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles from the actives file + num_decoy. This enables the calculation of the correction term in calc_AUAC, should the total number of ligands not equal the total number of ranked titles in results_file.
- id_property (str) – Name of the compound-identifying property.
- sort_property (str) – Name of the virtual screen scoring metric property to sort on. (not implemented)
Returns: A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary storing active titles as keys and their ranks as values.
Return type: int, int, list(int), list(int), int, dict(str, int)
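The role of id_property can be illustrated without a Schrödinger installation by using a stand-in object. `FakeStructure` below is hypothetical and only mimics a dictionary-like `.property` attribute; the score property name "r_i_docking_score" and the ascending sort order are likewise assumptions for the example.

```python
class FakeStructure:
    """Hypothetical stand-in exposing a Structure-like .property mapping."""

    def __init__(self, **props):
        self.property = dict(props)


structures = [
    FakeStructure(s_m_title="d1", r_i_docking_score=-7.1),
    FakeStructure(s_m_title="a1", r_i_docking_score=-9.4),
    FakeStructure(s_m_title="a2", r_i_docking_score=-8.2),
]

id_property = "s_m_title"
sort_property = "r_i_docking_score"  # example property name, an assumption

# Pre-sort by score (assumed lower is better), then read each identifier.
structures.sort(key=lambda st: st.property[sort_property])
ordered_titles = [st.property[id_property] for st in structures]
```

Setting id_property to a different property name would rank ligands by that identifier instead of the title.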
-
schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_mae(mae_file_name, active_titles, num_decoy=0, id_property='s_m_title', sort_property=None)
Compute and return rank- and count-related terms from a structure file.
Parameters:
- mae_file_name (str) – A structure file that contains the virtual screening result.
- active_titles (set(str)) – Distinct active titles from the actives file.
- num_decoy (int) – The total number of decoys. If specified, the total number of ligands is the number of distinct active titles from the actives file + num_decoy. This enables the calculation of the correction term in calc_AUAC, should the total number of ligands not equal the total number of ranked titles in results_file.
- id_property (str) – Name of the compound-identifying property.
- sort_property (str) – Name of the virtual screen scoring metric property to sort on. (not implemented)
Returns: A tuple containing the total number of active titles, the total number of ligand titles, the active ranks, the adjusted active ranks, the total number of ranked titles, and a dictionary storing active titles as keys and their ranks as values.
Return type: int, int, list(int), list(int), int, dict(str, int)
-
schrodinger.analysis.enrichment.enrichment_input.get_fingerprint_components(structure_file, active_titles, id_property='s_m_title')
Initialize and return a data class object needed for fingerprint-related calculations.
Parameters:
- structure_file (str or list(str)) – Structure file or a list of structures.
- active_titles (set(str)) – Distinct active titles from the actives file.
- id_property (str) – Name of the compound-identifying property.
Returns: The initialized enrichment_input.FingerprintComponent object.
Return type: FingerprintComponent