A class to sort structure files by ct-level property values.

API Example
-----------

    from schrodinger.structutils import sort

    # In a Glide pose-viewer (_pv) file the receptor is the first
    # structure, so start reading the ligand poses at position 2.
    glide_sp_pv_sorter = sort.StructureFileSorter(
        file_name='foo_pv.mae',
        file_index=2,
    )
    glide_sp_pv_sorter.sort()
    # Write the top 2 structures from each block.
    glide_sp_pv_sorter.writeTopNFromBlock('bar_lib.mae', 2)

    # Plain multi-key sort: ascending on r_prop_one, ties broken by
    # descending i_prop_two.
    st_sorter = sort.StructureFileSorter(
        file_name="baz.mae",
        sort_criteria=[
            ('r_prop_one', sort.ASCENDING),
            ('i_prop_two', sort.DESCENDING),
        ],
    )
    st_sorter.sort()
    st_sorter.write('baz-sorted.mae')

Class Attributes
----------------

None

Instance Attributes
-------------------

structure_index_order (list)
    Sorted structure index order: a list of the original file indexes, in the order they appear when sorted by sort_criteria and intra_block_sort_criteria.

structure_dict (dict)
    Ct-level property dictionaries, keyed by original file index.

structure_block_order (list)
    Block IDs, sorted by the 'sort_criteria' keys.

structure_count (int)
    The number of structures in the file.

read_forward_quota (int)
    Chunk size for sorting in batches instead of with random access. If the value evaluates as True, the input file is read forward-only in small chunks that are sorted in memory. Default is 0 (use random access).

An instance is primarily a data structure in which the original file positions are the keys of a dictionary of properties. It has auxiliary data structures that track the sorted order of the original file positions, and methods that write output files in that order.

Using random access to re-read the structures in sorted order is typically faster than re-reading them in batches. However, the read_forward_quota attribute can be set to a positive integer to force batch re-reading and writing.
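The random-access strategy described above can be illustrated in plain Python: one forward pass records where each record starts, and the records are then copied out in sorted order by seeking directly to each one. This is a minimal sketch only, assuming a hypothetical text file with one newline-terminated record per line as a stand-in for a Maestro file; build_offset_index and write_in_order are hypothetical names, not part of this module.

```python
def build_offset_index(path):
    """Map 1-based record index -> byte offset where that record starts.

    Hypothetical stand-in: assumes one newline-terminated record per line
    instead of real structures in a Maestro file.
    """
    offsets = {}
    with open(path, "rb") as fh:
        index = 1
        while True:
            pos = fh.tell()
            line = fh.readline()
            if not line:
                break
            offsets[index] = pos
            index += 1
    return offsets


def write_in_order(in_path, out_path, index_order):
    """Copy records to out_path in index_order using random access.

    index_order plays the role of structure_index_order: a list of the
    original (1-based) file indexes in sorted order.
    """
    offsets = build_offset_index(in_path)
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for index in index_order:
            # Jump straight to the record instead of re-scanning the file.
            src.seek(offsets[index])
            dst.write(src.readline())
```

For a three-record file whose scores sort ascending as records 2, 3, 1, calling write_in_order(src, dst, [2, 3, 1]) copies the records in that order. A batch (forward-only) strategy would trade these seeks for repeated sequential reads.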
__init__
--------

Loads only the structure properties used to sort the file into a dictionary (keyed by file index), but does not do any sorting.

file_name (string)
    Path to the structure file on which to operate. Default is None.

file_index (integer)
    File position at which to start reading file_name. Default is 1.

sort_criteria (list of tuples)
    List of (m2io dataname, module sort-order constant) tuples that identify the values used for sorting and the sort order. Default is GLIDE_SP_KEY_1.

intra_block_sort_criteria (list of tuples)
    List of (m2io dataname, module sort-order constant) tuples that identify the values used for sorting within each group (block), and the sort order. Default is GLIDE_SP_KEY_2. If intra_block_sort_criteria is None, a simple multi-key sort is performed using sort_criteria.

keep_structures (bool)
    If True, a reference to each structure is kept, keyed by '_structure'. Default is False (references to the structures are not kept).
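When intra_block_sort_criteria is None, sort_criteria drive a plain multi-key sort in which each key has its own direction. A minimal sketch of such a sort over a structure_dict-style mapping (file index to property dictionary) follows. ASCENDING, DESCENDING and multi_key_order are hypothetical stand-ins, not this module's actual constants or implementation.

```python
ASCENDING = "ascending"    # hypothetical stand-ins for the module constants
DESCENDING = "descending"


def multi_key_order(structure_dict, sort_criteria):
    """Return file indexes ordered by several (dataname, direction) keys.

    list.sort() is stable (also with reverse=True), so sorting once per
    key, from the least significant key to the most significant, gives a
    correct multi-key order even when keys mix directions or value types.
    """
    order = sorted(structure_dict)  # start from the original file order
    for dataname, direction in reversed(sort_criteria):
        order.sort(key=lambda i: structure_dict[i][dataname],
                   reverse=(direction == DESCENDING))
    return order
```

Sorting in several stable passes avoids negating values to get a descending order, which would not work for string properties such as titles.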
sort
----

Sorts the data structure by self.sort_criteria and, if it is not None, by self.intra_block_sort_criteria. Sets the attributes that record the sorted order of the original file positions.
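A two-level (block) sort can be sketched as follows. The source does not say how blocks are formed, so this sketch assumes that structures sharing the value of a hypothetical block_key property form one block, that members are ordered within a block by intra_block_sort_criteria, and that blocks are ordered by the sort_criteria values of their leading member. block_sort, _multi_key_sort and the datanames used below (s_m_title, r_i_docking_score) are illustrative only, not the contents of GLIDE_SP_KEY_1 or GLIDE_SP_KEY_2.

```python
ASCENDING = "ascending"    # hypothetical stand-ins for the module constants
DESCENDING = "descending"


def _multi_key_sort(items, key_of, criteria):
    """Stable-sort items in place by (dataname, direction) criteria."""
    for dataname, direction in reversed(criteria):
        items.sort(key=lambda item: key_of(item, dataname),
                   reverse=(direction == DESCENDING))
    return items


def block_sort(props, block_key, sort_criteria, intra_block_sort_criteria):
    """Return (block_order, index_order) for a hypothetical two-level sort."""
    # Group file indexes into blocks that share the same block_key value.
    blocks = {}
    for index in sorted(props):
        blocks.setdefault(props[index][block_key], []).append(index)

    # Order the members of each block by the intra-block criteria.
    for members in blocks.values():
        _multi_key_sort(members, lambda i, name: props[i][name],
                        intra_block_sort_criteria)

    # Order the blocks by the sort_criteria values of their leading member.
    block_order = list(blocks)
    _multi_key_sort(block_order,
                    lambda b, name: props[blocks[b][0]][name],
                    sort_criteria)

    # Flatten: all members of the first block, then the second, and so on.
    index_order = [i for b in block_order for i in blocks[b]]
    return block_order, index_order
```

Under these assumptions, block_order corresponds to structure_block_order and index_order to structure_index_order.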
write
-----

Writes structures to disk. No return value.

out_file_name (string)
    Path to the output structure file.

index_list (list)
    List of file indexes to write, in the order in which they should appear in the output file (typically a slice of self.structure_index_order). If None, all of self.structure_index_order is written.

dir (string)
    Path to the directory where the intermediate file is written. Default is the current working directory at run time. The directory needs enough free space to hold what is effectively a copy of file_name; for very large files, /tmp is a poor choice on most hosts.
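The index_list default and the role of an intermediate directory can be sketched with the standard library. This is a hypothetical illustration, not a description of write(): records maps file index to the text of one record (a stand-in for structures), output is staged in a temporary file under dir and then moved into place, and write_indexes is a made-up name. The parameter is called dir only to mirror the documented argument.

```python
import os
import shutil
import tempfile


def write_indexes(records, structure_index_order, out_file_name,
                  index_list=None, dir=None):
    """Write records in the requested order via an intermediate file."""
    # None means "write everything, in sorted order".
    if index_list is None:
        index_list = structure_index_order
    # Default intermediate location: the current working directory.
    if dir is None:
        dir = os.getcwd()
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=dir)
    try:
        with os.fdopen(fd, "w") as tmp:
            for index in index_list:
                tmp.write(records[index] + "\n")
        # shutil.move also works when dir is on a different filesystem.
        shutil.move(tmp_path, out_file_name)
    finally:
        # Clean up the intermediate file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Passing a slice such as structure_index_order[:10] as index_list writes only the ten best-ranked records, in sorted order.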
writeTopNFromBlock
------------------

Writes the first max_per_block structures from each block to the output file.

out_file_name (string)
    Name of the structure file to write.

max_per_block (int)
    Number of leading members from each block to write to out_file_name. Default is 1.

max_num_block (int)
    Number of blocks from which to draw leading members. If None, the first max_per_block structures are taken from every block. Otherwise, the first max_per_block structures from only the top max_num_block blocks are written.
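The selection rule for max_per_block and max_num_block reduces to two list slices. A minimal sketch, assuming the blocks are already sorted and each block's members are already in intra-block order; top_n_from_blocks, block_order and block_members are hypothetical names used only for illustration.

```python
def top_n_from_blocks(block_order, block_members, max_per_block=1,
                      max_num_block=None):
    """Return the file indexes of the leading members of the leading blocks.

    block_order: block IDs, best block first.
    block_members: block ID -> file indexes, best member first.
    """
    # None means "draw from every block"; otherwise keep only the top blocks.
    if max_num_block is not None:
        block_order = block_order[:max_num_block]
    selected = []
    for block_id in block_order:
        # Slicing copes with blocks smaller than max_per_block.
        selected.extend(block_members[block_id][:max_per_block])
    return selected
```

With three blocks, max_per_block=2 and max_num_block=2, only the best two members of each of the first two blocks are selected.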
Generated by Epydoc 3.0.1 on Wed Oct 26 00:59:59 2016 (http://epydoc.sourceforge.net)