A module for sorting structure files by Structure-level property values. The module supports multi-key sorting, 'block' sorting, and file merging.
'sort_criteria' and 'intra_block_sort_criteria' are lists of tuples, where each tuple holds a ct-level property dataname and an ascending/descending directive for that dataname. If a structure does not have a particular property, it is assigned a None value. Python 2 natively sorts None before any other value, which is the opposite of common table sort behaviors such as those of Excel or Maestro's Project Table. The module overrides this behavior with the NONE_IS_LAST constant: if NONE_IS_LAST evaluates as True, None values appear after defined values when sorted in ascending order.
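The None handling described above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the records are dicts standing in for structures, `none_aware_key` and `sort_by_property` are hypothetical names, and treating a descending sort as the exact reverse of the ascending order (so that None moves to the front) is an assumption of this sketch.

```python
ASCENDING = 1
DESCENDING = -1
NONE_IS_LAST = True  # stands in for the module constant of the same name


def none_aware_key(value, none_is_last=NONE_IS_LAST):
    """Build a sort key that keeps None away from direct comparison."""
    is_none = value is None
    # The flag is compared first; the second element is only compared
    # when both flags are equal, so None is never compared to a number.
    flag = is_none if none_is_last else not is_none
    return (flag, 0 if is_none else value)


def sort_by_property(records, dataname, direction=ASCENDING):
    """Sort dict records (hypothetical stand-ins for structures) by one property."""
    return sorted(
        records,
        key=lambda r: none_aware_key(r.get(dataname)),
        reverse=(direction == DESCENDING),
    )


records = [
    {"s_m_title": "a", "r_i_glide_docking_score": -5.0},
    {"s_m_title": "b"},  # property missing, treated as None
    {"s_m_title": "c", "r_i_glide_docking_score": -7.5},
]
ascending = sort_by_property(records, "r_i_glide_docking_score")
print([r["s_m_title"] for r in ascending])
```

Wrapping each value in a tuple also avoids the TypeError that Python 3 raises when None is compared with a number.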
'Block sorting' is possible by using the auxiliary 'intra_block_sort_criteria' sort keys. Block sorting organizes structures into groups by the 'intra_block_sort_criteria' set of keys, then orders those groups by their leading member's 'sort_criteria'. Put another way, 'intra_block_sort_criteria' specifies how to organize structures *within* a block, and 'sort_criteria' specifies how to organize the blocks. If 'intra_block_sort_criteria' is None, then a simple multi-key sort is performed using the 'sort_criteria'. For example, if you have a pose file with multiple poses for each ligand-title, a useful global order is to have all poses with the same title in a contiguous block ordered by Emodel values, and title-blocks ordered by the Glide score of the first member in each title-block.
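The pose-file example above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `block_sort` is a hypothetical helper, records are plain dicts, every record defines all three properties, all orders are ascending, and the datanames `r_i_glide_emodel` and `r_i_glide_gscore` are used only as plausible labels for the Emodel and Glide score values.

```python
from itertools import groupby


def block_sort(records, group_name, intra_name, block_name):
    """Group records into contiguous blocks, then order the blocks.

    group_name: property whose equal values define a block (e.g. title)
    intra_name: property that orders members inside a block
    block_name: property of each block's first member that orders the blocks
    """
    # Sorting by (group, intra) makes each group contiguous and ordered.
    ordered = sorted(records, key=lambda r: (r[group_name], r[intra_name]))
    blocks = [
        list(members)
        for _, members in groupby(ordered, key=lambda r: r[group_name])
    ]
    # Blocks are ranked by the value carried by their leading member.
    blocks.sort(key=lambda block: block[0][block_name])
    return [record for block in blocks for record in block]


poses = [
    {"s_m_title": "ligA", "r_i_glide_emodel": -40.0, "r_i_glide_gscore": -6.0},
    {"s_m_title": "ligB", "r_i_glide_emodel": -55.0, "r_i_glide_gscore": -8.0},
    {"s_m_title": "ligA", "r_i_glide_emodel": -50.0, "r_i_glide_gscore": -5.5},
    {"s_m_title": "ligB", "r_i_glide_emodel": -45.0, "r_i_glide_gscore": -9.0},
]
for pose in block_sort(poses, "s_m_title", "r_i_glide_emodel", "r_i_glide_gscore"):
    print(pose["s_m_title"], pose["r_i_glide_emodel"])
```

Note that the ligB block is ranked by the Glide score of its first member (-8.0), not by the best score found anywhere in the block (-9.0).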
Copyright Schrodinger, LLC. All rights reserved
StructureFileSorter: A class to sort structure files by ct-level property values.
DsuList: A class to sort a list with special behaviors.
_version =
ASCENDING = 1
DESCENDING = -1
CHUNK_SIZE = 2500000
MKTMPSUFFIX =
NONE_IS_LAST = True
GLIDE_SP_KEY_1 =
GLIDE_SP_KEY_2 =
GLIDE_XP_KEY_1 =
GLIDE_XP_KEY_2 =
GLIDE_HTVS_KEY_1 =
GLIDE_HTVS_KEY_2 =
logger = log.get_output_logger(__file__)
__package__ =
|
Returns True if file_name is small enough to sort in memory, otherwise False. The test is based on the size, in bytes, of the first 100 structures in file_name and on the number of structures in file_name. If the first 100 structures occupy fewer than chunk_size bytes, the file is assumed to contain ligand-sized structures; otherwise it is assumed to contain receptor-sized structures. The structure type sets a hardwired limit on the number of structures that can be sorted in memory: 1x10^3 receptor-sized structures or 1x10^4 ligand-sized structures. If the file contains fewer structures than the limit, it should be sortable in memory.
Parameters:
    file_name (string) - Path to the structure file on which to operate.
    chunk_size (int) - The size, in bytes, used to estimate the scale of structures in the file. Default is the module constant CHUNK_SIZE.
|
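The decision rule described above can be sketched as a pure function. This is only an illustration: `fits_in_memory`, `LIGAND_LIMIT` and `RECEPTOR_LIMIT` are hypothetical names, and the sketch assumes the caller has already measured the byte size of the first 100 structures and counted the structures in the file.

```python
CHUNK_SIZE = 2500000    # value of the module constant listed above
LIGAND_LIMIT = 10000    # 1x10^4 ligand-sized structures (hypothetical name)
RECEPTOR_LIMIT = 1000   # 1x10^3 receptor-sized structures (hypothetical name)


def fits_in_memory(sample_bytes, structure_count, chunk_size=CHUNK_SIZE):
    """Apply the size heuristic to measurements taken from a file.

    sample_bytes: size in bytes of the first 100 structures
    structure_count: total number of structures in the file
    """
    # A small sample suggests ligand-sized structures, so the higher limit applies.
    limit = LIGAND_LIMIT if sample_bytes < chunk_size else RECEPTOR_LIMIT
    return structure_count < limit


print(fits_in_memory(sample_bytes=500000, structure_count=5000))
print(fits_in_memory(sample_bytes=3000000, structure_count=5000))
```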
Sorts a structure file by the values of ct-level properties within the file. This is the central API; it chooses a trade-off between disk I/O and memory use based on the size of the file.
Parameters:
    file_name (string) - Path to the file upon which to operate.
    sort_criteria (list of tuples) - List of (m2io dataname, module constant) tuples. These are the primary, secondary, ..., keys for sorting the structures, *or* the blocks if intra_block_sort_criteria is defined, with optional ascending/descending constants, e.g.: [('s_m_title', sort.ASCENDING), ('r_i_glide_docking_score', sort.ASCENDING)]
    out_file_name (string) - Output structure file containing the sorted structures. If out_file_name is None, the input file is overwritten with the results of the sort. Default is to replace the input file_name with the sorted results.
    intra_block_sort_criteria (list of tuples) - Optional list of (m2io dataname, module constant) tuples for block sorting. These are the primary, secondary, ..., keys for sorting the structures *within* blocks, with optional ascending/descending order constants. Default is None, which disables block sorting.
    no_split (bool) - Deprecated option. This option is currently ignored.
|
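How a sort_criteria list with mixed directions can drive a multi-key sort is sketched below. This is not the module's implementation: `multi_key_sort` is a hypothetical helper that works on dicts, it assumes every record defines every key, and it relies on the stability of Python's sort by applying the keys from least to most significant.

```python
ASCENDING = 1
DESCENDING = -1


def multi_key_sort(records, sort_criteria):
    """Sort dict records by a list of (dataname, direction) tuples."""
    ordered = list(records)
    # Sort by the least significant key first; each later (more significant)
    # pass is stable, so it preserves the order set by earlier passes on ties.
    for dataname, direction in reversed(sort_criteria):
        ordered.sort(
            key=lambda r: r[dataname],
            reverse=(direction == DESCENDING),
        )
    return ordered


records = [
    {"s_m_title": "b", "r_i_glide_docking_score": -7.0},
    {"s_m_title": "a", "r_i_glide_docking_score": -5.0},
    {"s_m_title": "a", "r_i_glide_docking_score": -9.0},
]
criteria = [("s_m_title", ASCENDING), ("r_i_glide_docking_score", DESCENDING)]
for record in multi_key_sort(records, criteria):
    print(record["s_m_title"], record["r_i_glide_docking_score"])
```

Separate passes are used instead of a single tuple key because a tuple key cannot mix ascending and descending directions for non-numeric values such as titles.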
Returns a list of file names generated by splitting the structures in file_name into smaller files.
Parameters:
    file_name (string) - Path to the structure file upon which to operate.
    max_count (int) - Maximum number of structures per sub-file.
    dir (string) - Path to the directory where the sub-files are written. The default is the current working directory at runtime. The directory needs enough free space to hold what is effectively a copy of file_name. For very large files, /tmp is not a good location on most hosts.
|
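The max_count partitioning described above can be sketched without any file I/O. `split_into_chunks` is a hypothetical helper that operates on any iterable; writing each chunk to a sub-file is left out because it depends on the structure I/O layer, which is not shown here.

```python
from itertools import islice


def split_into_chunks(items, max_count):
    """Yield successive lists of at most max_count items from an iterable."""
    iterator = iter(items)
    while True:
        # islice consumes up to max_count items without loading the rest.
        chunk = list(islice(iterator, max_count))
        if not chunk:
            return
        yield chunk


for chunk in split_into_chunks(range(7), 3):
    print(chunk)
```

Because only one chunk is held at a time, memory use is bounded by max_count rather than by the size of the input.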
Combines pre-ordered structure files by their property values. Input files are assumed to be sorted by default. Optionally the files can be sorted by the sort_criteria prior to merging by setting sort_file_list=True.
Note:
This function is not suited for handling pose viewer files because
all receptors will be included in the output. See
|
Combines pre-ordered pose viewer structure files by their property values. Input files are assumed to be ordered. Only the receptor from the first pose viewer file is retained.
Parameters:
    file_list (list) - List of paths for the pose viewer files that will be merged.
    sort_criteria (list of tuples) - List of (m2io dataname, module constant) tuples, which are the primary keys for sorting the ligand structures.
    out_file_name (string) - Path to the structure output file containing all the merged structures.
|
Combines pre-ordered structure iterators by their property values.
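Merging pre-ordered iterators can be sketched with the standard library's heapq.merge, which yields items lazily and assumes each input is already sorted. This swaps in a generic technique for illustration and is not taken from the module: `merge_sorted` is a hypothetical helper, records are dicts, and only a single ascending key with defined values is handled.

```python
import heapq


def merge_sorted(iterables, dataname):
    """Lazily merge iterables that are each already sorted by dataname."""
    # heapq.merge keeps only one pending item per input in memory.
    return heapq.merge(*iterables, key=lambda r: r[dataname])


first = iter([{"r_i_glide_docking_score": -9.0}, {"r_i_glide_docking_score": -5.0}])
second = iter([{"r_i_glide_docking_score": -8.0}, {"r_i_glide_docking_score": -6.0}])
for record in merge_sorted([first, second], "r_i_glide_docking_score"):
    print(record["r_i_glide_docking_score"])
```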
|
Orders the structures in file_name, keeping structures in memory during the sort operation.
Parameters:
    file_name (string) - Path to the file upon which to operate.
    sort_criteria (list of tuples) - List of (m2io dataname, module constant) tuples, which are the primary keys for sorting the structures, with optional sort order constants.
    out_file_name (string) - Output structure file containing the sorted structures. If out_file_name is None, the input file_name is overwritten with the sorted results.
    intra_block_sort_criteria (list of tuples) - List of (m2io dataname, module constant) tuples, which are the properties for sorting the structures within groups, with optional sort order constants.
|
Returns the path to a new temporary file that is safe to append structures to.
Parameters:
    dir (string) - Path to a directory with write permissions, where temporary files can be created. Default is None, which uses the tempfile module default (typically /tmp on Unix-like hosts).
    suffix (string) - Optional suffix for temporary files. Default is the module constant MKTMPSUFFIX.
|
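One way to obtain such a path with the standard library is tempfile.mkstemp, sketched below. `make_temp_path` is a hypothetical helper, and the ".tmp" suffix is a placeholder because the value of MKTMPSUFFIX is not shown in this page.

```python
import os
import tempfile


def make_temp_path(dir=None, suffix=".tmp"):
    """Create an empty temporary file and return its path."""
    # mkstemp creates the file atomically, so the name cannot collide
    # with a file made by another process between creation and first use.
    handle, path = tempfile.mkstemp(suffix=suffix, dir=dir)
    os.close(handle)  # only the path is needed; callers reopen it to append
    return path


path = make_temp_path()
print(os.path.exists(path))
os.remove(path)  # the caller is responsible for cleanup
```

With dir=None, tempfile consults the TMPDIR, TEMP and TMP environment variables before falling back to platform defaults, so the location can be redirected away from /tmp for large files.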
|
Generated by Epydoc 3.0.1 on Wed Oct 26 00:59:32 2016 | http://epydoc.sourceforge.net |