Package schrodinger :: Package structutils :: Module sort :: Class StructureFileSorter

Class StructureFileSorter


A class to sort structure files by ct-level property values.

API Example
-----------

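from schrodinger.structutils import sort

# Read foo_pv.mae starting at file index 2 and sort it with the
# default sort criteria.
glide_sp_pv_sorter = sort.StructureFileSorter(
    file_name='foo_pv.mae',
    file_index=2
)
glide_sp_pv_sorter.sort()
# Write the first two structures from each block.
glide_sp_pv_sorter.writeTopNFromBlock('bar_lib.mae', 2)

# Sort with custom criteria: r_prop_one ascending, i_prop_two descending.
st_sorter = sort.StructureFileSorter(
    file_name='baz.mae',
    sort_criteria=[
        ('r_prop_one', sort.ASCENDING),
        ('i_prop_two', sort.DESCENDING)
    ]
)
st_sorter.sort()
# Write every structure in sorted order.
st_sorter.write('baz-sorted.mae')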


Class Attributes
----------------

None


Instance Attributes
-------------------
structure_index_order (list) 
    Sorted structure index order.  A list of the original file
    indexes, in the order they appear when sorted by sort_criteria
    and intra_block_sort_criteria.

structure_dict (dict)
    Maps each file index to a dictionary of that structure's
    ct-level properties.

structure_block_order (list)
    Block_ids sorted by 'sort_criteria' keys. 

structure_count (int)
    The number of structures in the file.

read_forward_quota (int)
    Sort in batches of this size instead of using random access.
    If the value evaluates to True, the input file is read
    forward-only, in small chunks that are sorted in memory.
    Default is 0 (use random access).


An instance is primarily a data structure where the original file
positions are keys for the dictionary of properties.  It has auxiliary
data structures for tracking the sorted order of the original
file positions, and methods to write output files with that order.
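
The layout described above can be illustrated with plain Python
containers.  This is only a sketch of the idea, not the class's actual
implementation: the property values are invented, and the names merely
mirror the instance attributes documented here.

```python
# Hypothetical ct-level properties, keyed by 1-based file index.
structure_dict = {
    1: {"s_m_title": "lig_a", "r_i_docking_score": -7.2},
    2: {"s_m_title": "lig_b", "r_i_docking_score": -9.1},
    3: {"s_m_title": "lig_c", "r_i_docking_score": -8.4},
}

# Sorting reorders only the keys; the property dictionaries stay put.
structure_index_order = sorted(
    structure_dict,
    key=lambda index: structure_dict[index]["r_i_docking_score"],
)

print(structure_index_order)  # [2, 3, 1]
```

Because only the file indexes are reordered, an output file can be
written by revisiting the original file positions in that order.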

Using random access to re-read the structures in the proper
order is typically faster than re-reading in batches.  However,
the read_forward_quota attribute can be set to a positive integer
to force batch re-reading and writing.
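
One way to picture batch re-reading is sketched below.  It is an
assumption about the general technique, not the module's code: the
function batched_reorder and its arguments are hypothetical, and a
plain list stands in for the structure file.  Each batch makes one
forward-only pass and holds at most quota records in memory.

```python
def batched_reorder(records, index_order, quota):
    """Yield records in index_order using forward-only passes.

    records is a re-iterable sequence standing in for a structure
    file; positions are 1-based.  At most `quota` records are held
    in memory at any time.
    """
    for start in range(0, len(index_order), quota):
        wanted = index_order[start:start + quota]
        needed = set(wanted)
        held = {}
        # One forward-only pass: keep only what this batch needs.
        for position, record in enumerate(records, start=1):
            if position in needed:
                held[position] = record
                if len(held) == len(needed):
                    break
        # Emit the batch in the requested (sorted) order.
        for position in wanted:
            yield held[position]


records = ["a", "b", "c", "d", "e"]
index_order = [4, 1, 5, 3, 2]
reordered = list(batched_reorder(records, index_order, quota=2))
print(reordered)  # ['d', 'a', 'e', 'c', 'b']
```

The trade-off is several sequential passes over the input in exchange
for never seeking backwards, which is why random access is usually the
faster choice when it is available.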

Instance Methods
 
__init__(self, file_name=None, file_index=1, sort_criteria=[('b_glide_receptor', 1), ('r_i_docking_score', 1)], intra_block_sort_criteria=[('s_m_title', 1), ('r_i_glide_emodel', 1)], keep_structures=False)
Loads only the structure properties used to sort the file into a dictionary (keyed by file index), but does not do any sorting.
 
sort(self)
Organizes the data structure by self.sort_criteria, and self.intra_block_sort_criteria if it is not None.
 
write(self, out_file_name, index_list=None, dir=None)
Writes structures to disk, no return value.
 
writeTopNFromBlock(self, out_file_name='', max_per_block=1, max_num_block=None)
Write the first max_per_block structures from each block to the output file.

Method Details

__init__(self, file_name=None, file_index=1, sort_criteria=[('b_glide_receptor', 1), ('r_i_docking_score', 1)], intra_block_sort_criteria=[('s_m_title', 1), ('r_i_glide_emodel', 1)], keep_structures=False)
(Constructor)

 

Loads only the structure properties used to sort the file into
a dictionary (keyed by file index), but does not do any sorting.

file_name (string)
    Path to the structure file on which to operate.  Default is
    None.

file_index (integer)
    File position at which to start reading file_name.
    Default is 1.

sort_criteria (list of tuples)
    List of m2io datanames and module constant tuples that
    identify the values for sorting and the sort order.
    Default is GLIDE_SP_KEY_1.

intra_block_sort_criteria (list of tuples)
    List of m2io datanames and module constant tuples that
    identify the values for group sorting, and the sort order.
    Default is GLIDE_SP_KEY_2.  If 'intra_block_sort_criteria' is None,
    then a simple multi-key sort is performed using the 'sort_criteria'.

keep_structures (bool)
    If True, a reference to each structure is kept, keyed by
    '_structure'.  The default is False (do not keep references
    to the structures).
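
The simple multi-key sort mentioned for sort_criteria can be sketched
in plain Python.  This is an illustration under stated assumptions,
not the module's implementation: the helper multi_key_order is
hypothetical, and the values given to ASCENDING and DESCENDING are
placeholders that may differ from the module's constants.

```python
# Placeholder stand-ins for the module's sort-order constants; the
# real values are not confirmed here.
ASCENDING = 1
DESCENDING = -1


def multi_key_order(structure_dict, sort_criteria):
    """Return file indexes ordered by each (dataname, direction) pair."""
    order = sorted(structure_dict)  # start from file order
    # Python's sort is stable, so sorting by the least significant
    # key first and the most significant key last gives a correct
    # multi-key sort, even with mixed directions.
    for dataname, direction in reversed(sort_criteria):
        order.sort(
            key=lambda index: structure_dict[index][dataname],
            reverse=(direction == DESCENDING),
        )
    return order


structure_dict = {
    1: {"r_prop_one": 2.0, "i_prop_two": 5},
    2: {"r_prop_one": 1.0, "i_prop_two": 3},
    3: {"r_prop_one": 2.0, "i_prop_two": 9},
    4: {"r_prop_one": 1.0, "i_prop_two": 7},
}
order = multi_key_order(
    structure_dict,
    [("r_prop_one", ASCENDING), ("i_prop_two", DESCENDING)],
)
print(order)  # [4, 2, 3, 1]
```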

sort(self)

 

Organizes the data structure by self.sort_criteria, and self.intra_block_sort_criteria if it is not None. Assigns attributes for the correct sorted order of the original file positions.

write(self, out_file_name, index_list=None, dir=None)

 

Writes structures to disk, no return value.

out_file_name (str)
    Path to the output structure file.

index_list (list)
    List of file indexes to write, in the order that they
    should appear in the output file (typically a slice
    of self.structure_index_order).  If None, then all of
    self.structure_index_order is written.

dir (string)
    Path to the directory where the intermediate file is written.
    The default is the runtime current working directory.
    The directory needs enough free space to hold what is
    effectively a copy of file_name.  For very large files, /tmp
    is a poor choice on most hosts.

writeTopNFromBlock(self, out_file_name='', max_per_block=1, max_num_block=None)

 

Write the first max_per_block structures from each block to the
output file.

out_file_name (string)
    Name of structure file to write.

max_per_block (int)
    Number of leading members, from each block, to write to
    out_file_name.  Default is 1.

max_num_block (int)
    Number of blocks from which to draw leading members.  If the
    value is None, the top max_per_block structures are pulled from
    every block.  Otherwise, the top max_per_block structures
    from just the top max_num_block blocks are written.
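
The selection rule for max_per_block and max_num_block can be sketched
as follows.  This is a minimal illustration, not the method's code:
the helper top_n_from_blocks and the block_members mapping are
hypothetical, and the block ids and file indexes are invented.

```python
def top_n_from_blocks(block_order, block_members, max_per_block=1,
                      max_num_block=None):
    """Return file indexes of the leading members of leading blocks.

    block_order: block ids, best block first.
    block_members: block id -> file indexes, best member first.
    """
    selected = []
    # Slicing with None keeps every block.
    for block_id in block_order[:max_num_block]:
        selected.extend(block_members[block_id][:max_per_block])
    return selected


block_order = ["lig_b", "lig_c", "lig_a"]
block_members = {
    "lig_a": [1, 4],
    "lig_b": [2, 6, 7],
    "lig_c": [3, 5],
}
top_two = top_n_from_blocks(block_order, block_members, max_per_block=2)
top_one_of_two = top_n_from_blocks(block_order, block_members, 1, 2)
print(top_two)         # [2, 6, 3, 5, 1, 4]
print(top_one_of_two)  # [2, 3]
```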