Class EnsembleSelection (subclass of object)
Objects of this class select the "best"(*) ensembles of a specified size, given as input the results of an exhaustive cross-docking calculation on a set of N complexes (i.e., the NxN matrix of GlideScores). The input can come from a CSV file or can be provided as a dict. The file-reading methods are provided as a convenience and are useful for testing, but they have limitations, so it is recommended that the data be provided as dicts when possible.

(*) Two definitions of "best" are available:
1) RMSD vs. experimental DeltaG (see the 'best_ensembles_by_rmsd' method);
2) the number of ligands that can be docked "properly" by at least one receptor in the ensemble (see 'best_ensembles_by_count'). "Properly" means that the score is lower than 'tol' plus the self-docking score for the ligand.

A couple of object attributes are available (they should be considered read-only outside this class):
* titles
* N = len(titles)
The constructor optionally takes a few parameters that determine the behavior of the selection algorithm or that provide the input data. The input data may be provided either as a dict or as a filename to be parsed.
* initial_seed: random seed used whenever the random sampling method is used. It may be set to None if non-reproducible results are desired.
* max_exhaustive: the maximum number of combinations for a systematic (exhaustive) search of the available combinations. When the number of combinations exceeds this number, random sampling is used instead.
* n_random_comb: the number of iterations for random sampling.
* tol: the tolerance used to determine whether a ligand is docked "properly".
* structures: a dict of Structure objects to be passed on to the set_structures method.
* gscores: a dict of GlideScores to be passed on to the set_gscores method.
* fname: the filename of a CSV file to be passed on to the read_csv method.
* exp_dg: a dict of experimental DeltaGs to be passed on to the set_exp_dg method.
* exp_dg_fname: a filename to be passed on to the read_exp method.
* docking_failure_penalty: the assumed deviation between experimental DeltaG and GlideScore when the ligand fails to dock into the ensemble under consideration. Used by the rmsd_comb method.
* bad_self_docking_penalty: added to the RMSD of a receptor combination for each receptor that self-docks "badly".
* self_docking_tolerance: a self-docking is considered bad if its score > exp_dg + self_docking_tolerance.
* score_property: only used when 'structures' is provided; the property used as the score for ensemble selection.
* fixed_titles: titles of receptors that must always be included in the ensemble.
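Return the total penalty due to bad self-docking scores in the given combination 'comb': 'bad_self_docking_penalty' is added for each receptor in 'comb' whose self-docking score exceeds exp_dg + self_docking_tolerance. See __init__ for details.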
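A minimal sketch of the penalty rule above, written as a standalone function rather than as the class's own method. The function name, the dict-of-dicts layout gscores[prot][lig] (with None or a missing key for a failure), reading the self-docking score as gscores[title][title], and counting a failed self-docking as "bad" are all assumptions for illustration.

```python
from typing import Dict, Iterable, Optional

# Assumed layout: gscores[prot][lig] -> GlideScore, None (or missing) on failure.
GScores = Dict[str, Dict[str, Optional[float]]]


def self_docking_penalty(
    comb: Iterable[str],
    gscores: GScores,
    exp_dg: Dict[str, float],
    bad_self_docking_penalty: float,
    self_docking_tolerance: float,
) -> float:
    """Sum the penalty over receptors in 'comb' that self-dock badly (sketch)."""
    total = 0.0
    for title in comb:
        # Self-docking: the receptor's own ligand docked back into it.
        self_score = gscores.get(title, {}).get(title)
        # Assumption: a failed self-docking is also counted as bad.
        if self_score is None or self_score > exp_dg[title] + self_docking_tolerance:
            total += bad_self_docking_penalty
    return total
```

Whether the real class treats a failed self-docking as bad is not stated above; dropping the `self_score is None` check would skip such receptors instead.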
Calculate the RMS deviation between the experimental DeltaG and the lowest GlideScore obtained for each ligand using a given combination 'comb' of receptors. 'comb' is a tuple of titles. NOTE: a ligand that does not dock into any of the receptors in 'comb' counts as a deviation of 'docking_failure_penalty', as given to the constructor (10.0 kcal/mol by default).
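The calculation just described can be sketched as follows. This is an illustration under stated assumptions, not the class's rmsd_comb: the name `rmsd_for_combination` is hypothetical, the ligands considered are assumed to be exactly those with an entry in exp_dg, and gscores uses the assumed gscores[prot][lig] layout with None for failures. The 10.0 default mirrors the documented default of docking_failure_penalty.

```python
import math
from typing import Dict, Iterable, Optional

GScores = Dict[str, Dict[str, Optional[float]]]


def rmsd_for_combination(
    comb: Iterable[str],
    gscores: GScores,
    exp_dg: Dict[str, float],
    docking_failure_penalty: float = 10.0,
) -> float:
    """RMS deviation between best GlideScore and experimental DeltaG (sketch)."""
    comb = tuple(comb)
    deviations = []
    for lig, dg in exp_dg.items():
        # Successful scores of this ligand across the receptors in the ensemble.
        scores = [gscores.get(r, {}).get(lig) for r in comb]
        scores = [s for s in scores if s is not None]
        if scores:
            deviations.append(min(scores) - dg)
        else:
            # Ligand did not dock into any receptor: use the fixed penalty.
            deviations.append(docking_failure_penalty)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))
```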
Return the score for ligand 'lig' within ensemble 'comb'. The default implementation returns the lowest docking score for 'lig' over all the receptors in the ensemble. If the ligand failed to dock, or a score can't be computed for some other reason, None is returned.
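The default rule above (lowest score over the ensemble, None on failure) amounts to a filtered minimum. A minimal sketch, assuming the hypothetical gscores[prot][lig] dict-of-dicts layout with None or a missing key marking a failed docking; the function name is illustrative only.

```python
from typing import Dict, Iterable, Optional

GScores = Dict[str, Dict[str, Optional[float]]]


def ligand_score(lig: str, comb: Iterable[str], gscores: GScores) -> Optional[float]:
    """Lowest docking score of 'lig' over the receptors in 'comb', or None (sketch)."""
    scores = [gscores.get(prot, {}).get(lig) for prot in comb]
    valid = [s for s in scores if s is not None]
    # No receptor produced a score: report the failure as None.
    return min(valid) if valid else None
```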
Return the count of ligands that can be docked "properly" by at least one receptor in the given combination 'comb' of receptors. 'comb' is a tuple of titles.
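A sketch of the count, using the definition of "properly" given in the class description (score lower than 'tol' plus the ligand's self-docking score). It assumes the hypothetical gscores[prot][lig] layout, that the ligand titles are the dict's keys (the same set as the receptor titles), and that a ligand with no self-docking score can never be docked "properly"; the function name is illustrative.

```python
from typing import Dict, Iterable, Optional

GScores = Dict[str, Dict[str, Optional[float]]]


def count_properly_docked(comb: Iterable[str], gscores: GScores, tol: float) -> int:
    """Count ligands docked "properly" by at least one receptor in 'comb' (sketch)."""
    comb = tuple(comb)
    count = 0
    # Ligand titles are assumed to be the same as the receptor titles (the keys).
    for lig in gscores:
        self_score = gscores[lig].get(lig)
        if self_score is None:
            continue  # assumption: no reference score, so never "proper"
        threshold = self_score + tol
        for prot in comb:
            score = gscores.get(prot, {}).get(lig)
            if score is not None and score < threshold:
                count += 1
                break  # one receptor is enough for this ligand
    return count
```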
Return an iterator that produces a random sample of n-element combinations from the list of receptors contained in self. The iterator tries to produce self.n_random_comb combinations, but it skips duplicates, so the actual number of combinations returned is likely to be smaller.
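One way to get the duplicate-skipping behavior described above is a generator that draws n_random_comb samples and remembers what it has already yielded. This is a sketch, not the class's sample_combinations: the function name and the seed argument (standing in for initial_seed) are assumptions.

```python
import random
from typing import Iterator, Optional, Sequence, Set, Tuple


def sample_unique_combinations(
    titles: Sequence[str], n: int, n_random_comb: int, seed: Optional[int] = None
) -> Iterator[Tuple[str, ...]]:
    """Yield up to n_random_comb distinct random n-element combinations (sketch)."""
    rng = random.Random(seed)  # seed=None gives non-reproducible results
    seen: Set[Tuple[str, ...]] = set()
    for _ in range(n_random_comb):
        # Sorting makes the tuple order-independent, so duplicates are detectable.
        comb = tuple(sorted(rng.sample(list(titles), n)))
        if comb in seen:
            continue  # skipped, so fewer than n_random_comb may be produced
        seen.add(comb)
        yield comb
```

Using a private `random.Random` instance keeps the sampling reproducible for a fixed seed without touching the global random state.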
Return an iterator that produces combinations containing between n1 and n2 elements (both inclusive) from the list of receptors. This is basically a loop over self.combinations() from n1 to n2.
Return an iterator that produces n-element combinations from the list of receptors contained in self. If the number of combinations is less than self.max_exhaustive, a complete list of combinations is produced. If it is larger, a random sample is produced by deferring to the sample_combinations method.
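The exhaustive-versus-sampling decision can be sketched with `math.comb` and `itertools.combinations`. The function name is hypothetical, and the sampling branch is an inline stand-in for sample_combinations. The text above leaves the case "exactly max_exhaustive combinations" open; this sketch uses `<=`, following the constructor's wording that random sampling is used only when the count exceeds max_exhaustive.

```python
import math
import random
from itertools import combinations
from typing import Iterator, Optional, Sequence, Tuple


def iter_combinations(
    titles: Sequence[str],
    n: int,
    max_exhaustive: int,
    n_random_comb: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[str, ...]]:
    """Exhaustive enumeration when small enough, random sampling otherwise (sketch)."""
    ordered = sorted(titles)  # sorted so both branches yield tuples in the same order
    if math.comb(len(ordered), n) <= max_exhaustive:
        yield from combinations(ordered, n)
        return
    # Too many combinations: fall back to random sampling, skipping duplicates.
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_random_comb):
        comb = tuple(sorted(rng.sample(ordered, n)))
        if comb not in seen:
            seen.add(comb)
            yield comb
```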
Return the 'nmax' best n-member ensembles by the return value of a caller-specified scoring function 'func', which takes a tuple of ensemble members (title strings) and returns a sortable value. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned. The return value of the function is stored as the 'score' property of each Ensemble object returned.
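A sketch of selecting the best ensembles with a caller-supplied scoring function. The `Ensemble` dataclass here is a hypothetical stand-in for the documented Ensemble objects (only a 'score' attribute is documented), the enumeration is exhaustive rather than sampled, and "best" is assumed to mean lowest score, as it would for an RMSD.

```python
import heapq
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Ensemble:
    """Hypothetical stand-in for the documented Ensemble objects."""
    titles: Tuple[str, ...]
    score: Any


def best_ensembles(
    titles: Sequence[str],
    func: Callable[[Tuple[str, ...]], Any],
    n: int,
    nmax: int,
    n2: Optional[int] = None,
) -> List[Ensemble]:
    """Return the nmax lowest-scoring ensembles of size n..n2 (sketch)."""
    last = n if n2 is None else n2
    candidates = (
        Ensemble(comb, func(comb))  # score stored on each returned object
        for size in range(n, last + 1)
        for comb in combinations(titles, size)
    )
    # Assumption: lower scores are better (ascending order), as for an RMSD.
    return heapq.nsmallest(nmax, candidates, key=lambda e: e.score)
```

`heapq.nsmallest` keeps only nmax items in memory while scanning all candidates. For a larger-is-better criterion such as a ligand count, negating the value returned by func gives the same ordering.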
Return the 'nmax' best n-member ensembles by count of "properly docked ligands" as a list of Ensemble objects. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned.
Return the 'nmax' best n-member ensembles by RMSD of computed GlideScore vs. experimental DeltaG as a list of Ensemble objects. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned. When sorting ensembles, a penalty of rmsd_size_penalty*n is added in order to favor smaller ensembles when possible.
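The size penalty only changes the sort key. A minimal sketch, assuming the RMSDs have already been computed into a hypothetical dict keyed by title tuples; whether the class reports the raw or the penalized value is not stated above, and this sketch returns the raw RMSD.

```python
from typing import Dict, List, Tuple


def rank_by_penalized_rmsd(
    rmsd_by_comb: Dict[Tuple[str, ...], float],
    rmsd_size_penalty: float,
    nmax: int,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Sort ensembles by RMSD + rmsd_size_penalty * n and keep the best nmax (sketch)."""
    ranked = sorted(
        rmsd_by_comb.items(),
        # In this sketch the penalty only affects ordering; the RMSD is returned as-is.
        key=lambda item: item[1] + rmsd_size_penalty * len(item[0]),
    )
    return ranked[:nmax]
```

With a penalty of 0.5, an ensemble of one receptor with RMSD 1.2 (key 1.7) outranks one of three receptors with RMSD 0.9 (key 2.4), which is how the penalty favors smaller ensembles.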
Set the GlideScore matrix. 'gscores' must be a dict of dicts, where each value gscores[prot][lig] = gs is the GlideScore from docking lig into prot. Both lig and prot are titles. Sets the following public attributes:
* titles
* N = len(titles)
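To make the dict-of-dicts shape concrete, here is a sketch that derives titles and N from such a matrix and checks that every ligand title is also a receptor title. The function name, the sorting of titles, and the None value for a failed docking are assumptions; the class may order titles differently.

```python
from typing import Dict, List, Optional, Tuple

GScores = Dict[str, Dict[str, Optional[float]]]


def titles_from_gscores(gscores: GScores) -> Tuple[List[str], int]:
    """Derive (titles, N) from a dict-of-dicts GlideScore matrix (sketch)."""
    titles = sorted(gscores)  # receptor titles; sorting is an assumption
    known = set(titles)
    for prot, row in gscores.items():
        # Ligand titles must come from the same set as the receptor titles.
        unknown = set(row) - known
        if unknown:
            raise ValueError(f"receptor {prot!r} has unknown ligands {sorted(unknown)}")
    return titles, len(titles)


# gscores[prot][lig]: score from docking ligand 'lig' into receptor 'prot'.
example = {"B": {"A": None, "B": -8.3}, "A": {"A": -9.1, "B": -7.4}}
```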
Set the experimental data used for computing RMSDs. 'exp_dg' should be a dict where the key is a ligand title and the value is the DeltaG.
Import data from a CSV file, which is expected to contain an N*N matrix of GlideScores, plus the titles in the header and in the first column. Each column corresponds to a receptor and each row corresponds to a ligand. The set of ligand titles must be the same as the set of receptor titles. Ligands that failed to dock can be represented in the file either as a blank field or by the string "NA".
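A sketch of parsing this layout with the standard `csv` module into the gscores[prot][lig] form. The function name is hypothetical, as are the choices to treat the first header cell as an unused corner label and to store failures as None; it is not the class's read_csv.

```python
import csv
from typing import Dict, Iterable, Optional

GScores = Dict[str, Dict[str, Optional[float]]]


def read_gscore_csv(lines: Iterable[str]) -> GScores:
    """Parse an N*N GlideScore CSV into gscores[prot][lig] (sketch)."""
    reader = csv.reader(lines)
    header = next(reader)
    receptors = [h.strip() for h in header[1:]]  # header[0] is the corner cell
    gscores: GScores = {prot: {} for prot in receptors}
    ligands = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        lig = row[0].strip()
        ligands.append(lig)
        for prot, cell in zip(receptors, row[1:]):
            cell = cell.strip()
            # A blank field or "NA" marks a docking failure, stored here as None.
            gscores[prot][lig] = None if cell in ("", "NA") else float(cell)
    if set(ligands) != set(receptors):
        raise ValueError("ligand titles must match receptor titles")
    return gscores
```

A file object opened with `newline=""` can be passed directly, since `csv.reader` accepts any iterable of lines.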
Read a file with the experimental data. The text file has two whitespace-separated columns: 1) title, 2) DeltaG.
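The two-column format can be parsed with `str.split()`. A minimal sketch with a hypothetical function name, assuming titles contain no whitespace and tolerating blank lines; it is not the class's read_exp.

```python
from typing import Dict, Iterable


def read_exp_dg(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'title DeltaG' lines into a dict (sketch)."""
    exp_dg: Dict[str, float] = {}
    for line in lines:
        fields = line.split()  # any run of whitespace separates the columns
        if not fields:
            continue  # tolerate blank lines
        title, dg = fields  # exactly two columns expected; titles without spaces
        exp_dg[title] = float(dg)
    return exp_dg
```

The resulting dict has the shape expected for 'exp_dg' (ligand title to DeltaG).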
Generated by Epydoc 3.0.1 on Wed Oct 26 00:59:43 2016 (http://epydoc.sourceforge.net)