Module schrodinger.application.glide.ensemble_selection

Class EnsembleSelection

object --+
         |
        EnsembleSelection

Objects of this class select the "best"(*) ensembles of a specified size
when given as input the results from an exhaustive cross-docking
calculation on a set of N complexes (i.e., the NxN matrix of GlideScores).

The input can come from a CSV file or can be provided as a dict. The file
reading methods are provided as a convenience and are useful for testing, but
they have their limitations. Hence it is recommended that the data be
provided as dicts when possible.

(*) Two definitions of "best" are available: 1) RMSD vs. experimental
DeltaG (see the 'best_ensembles_by_rmsd' method); 2) number of ligands that
can be docked "properly" by at least one receptor in the ensemble (see
'best_ensembles_by_count'). "Properly" means that the score is
lower than 'tol' plus the self-docking score for the ligand.
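The "properly docked" criterion can be illustrated with a minimal stand-alone sketch. It is not the schrodinger implementation: the function name `is_docked_properly` is hypothetical, the `gscores[prot][lig]` layout is taken from the set_gscores description, and using None for a failed docking is an assumption.

```python
# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Layout assumed from set_gscores: gscores[prot][lig] is the GlideScore for
# docking lig into prot; None (an assumption) marks a failed docking.

def is_docked_properly(gscores, prot, lig, tol=0.5):
    """True if lig docks into prot with a score lower than its self-docking
    score (gscores[lig][lig]) plus tol."""
    score = gscores[prot].get(lig)
    self_score = gscores[lig].get(lig)
    if score is None or self_score is None:
        return False  # assumption: a failed docking never counts as proper
    return score < self_score + tol


gscores = {
    "A": {"A": -9.0, "B": -7.0},
    "B": {"A": -8.8, "B": -8.0},
}
print(is_docked_properly(gscores, "B", "A"))  # -8.8 < -9.0 + 0.5, so True
print(is_docked_properly(gscores, "A", "B"))  # -7.0 < -8.0 + 0.5 is false, so False
```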

A couple of object attributes are available (should be considered read-only
outside this class):
    * titles
    * N = len(titles)

Instance Methods


__init__(self, gscores=None, exp_dg=None, fname=None, exp_dg_fname=None, structures=None, initial_seed=42, max_exhaustive=1000000, n_random_comb=100000, tol=0.5, docking_failure_penalty=10.0, bad_self_docking_penalty=0.1, self_docking_tolerance=2.0, fixed_titles=None, score_property='r_i_docking_score')
Constructor optionally takes a few parameters that determine the behavior of the selection algorithm or that provide the input data.

self_docking_penalty(self, comb)
Return the total penalty due to bad self-docking scores in the present combination 'comb'.

rmsd_comb(self, comb)
Calculate the RMS deviation between experimental DeltaG and the lowest GScore obtained for each ligand using a given combination 'comb' of receptors.

score(self, lig, comb)
Return the score for ligand 'lig' within ensemble 'comb'.

count_good_ligs(self, comb)
Return the count of ligands that can be docked "properly" by at least one receptor in the given combination 'comb' of receptors.

sample_combinations(self, n)
Return an iterator that produces a random sample of n-element combinations from the list of receptors contained in self.

check_ensemble_size(self, n)

combinations_range(self, n1, n2)
Return an iterator that produces combinations containing between n1 and n2 elements (both inclusive) from the list of receptors.

combinations(self, n)
Return an iterator that produces n-element combinations from the list of receptors contained in self.

best_ensembles(self, n, n2=None, nmax=15, func=None)
Return the 'nmax' best n-member ensembles by the return value of a caller-specified scoring function 'func', which takes a tuple of ensemble members (title strings) and returns a sortable value.

best_ensembles_by_count(self, n, n2=None, nmax=15)
Return the 'nmax' best n-member ensembles by count of "properly docked ligands" as a list of Ensemble objects.

best_ensembles_by_rmsd(self, n, n2=None, nmax=15, rmsd_size_penalty=0.1)
Return the 'nmax' best n-member ensembles by RMSD of computed gscore vs exp DeltaG as a list of Ensemble objects.

self_docking_rmsd(self)
Return the self-docking RMS deviation of GScore vs. experimental DeltaG.

count_combinations(self, n)
Return the number of n-member combinations out of the list of N receptors held by the object (i.e., N!/(n! * (N-n)!)).

count_singletons(self)
Return the number of ligands that get a "good" score with only one receptor.

set_tol(self, tol)
Set the tolerance used for determining whether a ligand is docked "properly" into a receptor.

_compute_ssets(self)

set_gscores(self, gscores, nocopy=False, fixed_titles=None)
Set the GlideScore matrix.

set_structures(self, structures, score_property='r_i_docking_score', fixed_titles=None)
Set the self.structures property and use it to update other properties such as gscore, titles, and N.

set_exp_dg(self, exp_dg)
Set the experimental data used for computing RMSDs.

read_csv(self, fname, fixed_titles=None)
Import data from a CSV file, which is expected to contain an N*N matrix of GlideScores, plus the titles on the header and on the first column.

read_exp_dg(self, fname)
Read a file with the experimental data.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties


Inherited from object: __class__

Method Details


__init__(self, gscores=None, exp_dg=None, fname=None, exp_dg_fname=None, structures=None, initial_seed=42, max_exhaustive=1000000, n_random_comb=100000, tol=0.5, docking_failure_penalty=10.0, bad_self_docking_penalty=0.1, self_docking_tolerance=2.0, fixed_titles=None, score_property='r_i_docking_score')
(Constructor)

Constructor optionally takes a few parameters that determine the
behavior of the selection algorithm or that provide the input data.
The input data may be provided either as a dict or as a filename to
be parsed.
    * initial_seed: random seed used whenever the random sampling
        method is used. It may be set to None if non-reproducible
        results are desired.
    * max_exhaustive: the maximum number of combinations for a
        systematic (exhaustive) search of the available combinations.
        When the number of combinations exceeds this number, random
        sampling is used instead.
    * n_random_comb: the number of iterations for random sampling.
    * tol: the tolerance used to determine if a ligand is docked
        "properly".
    * structures: a dict of Structure objects to be passed on to the
        set_structures method.
    * gscores: a dict of GlideScores to be passed on to the
        set_gscores method.
    * fname: a filename for a csv file to be passed to the read_csv
        method.
    * exp_dg: a dict of experimental DeltaGs to be passed on to the
        set_exp_dg method.
    * exp_dg_fname: a filename to be passed on to the read_exp_dg method.
    * docking_failure_penalty: the assumed deviation between
        experimental DeltaG and GlideScore when the ligand fails to
        dock into the ensemble under consideration. Used by the
        rmsd_comb method.
    * bad_self_docking_penalty: added to the RMSD of a receptor
        combination for each receptor that self-docks "badly".
    * self_docking_tolerance: a self-docking is considered bad if
        it has score > exp_dg + self_docking_tolerance.
    * score_property: only used when 'structures' is provided. It is
        the property used as the score for ensemble selection.
    * fixed_titles: titles of receptors that must always be included
        in the ensemble.

Overrides: object.__init__

self_docking_penalty(self, comb)

Return the total penalty due to bad self-docking scores in the present combination 'comb'. See __init__ for details.
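A minimal stand-alone sketch of the penalty rule described under __init__ (bad_self_docking_penalty added for each receptor whose self-docking score exceeds exp_dg + self_docking_tolerance). It is not the library code: the free-function signature is hypothetical, and skipping receptors with a missing self-docking score or missing experimental value is an assumption.

```python
# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: gscores[prot][lig] layout from set_gscores, exp_dg keyed by
# title, and receptors with a missing self-docking score are not penalized.

def self_docking_penalty(comb, gscores, exp_dg,
                         bad_self_docking_penalty=0.1,
                         self_docking_tolerance=2.0):
    """Add bad_self_docking_penalty for every receptor in comb whose
    self-docking score exceeds exp_dg + self_docking_tolerance."""
    penalty = 0.0
    for title in comb:
        self_score = gscores[title].get(title)
        if self_score is None or title not in exp_dg:
            continue
        if self_score > exp_dg[title] + self_docking_tolerance:
            penalty += bad_self_docking_penalty
    return penalty


gscores = {"A": {"A": -9.0}, "B": {"B": -5.0}}
exp_dg = {"A": -10.0, "B": -9.0}
# Only B self-docks badly: -5.0 > -9.0 + 2.0, while -9.0 > -10.0 + 2.0 is false.
print(self_docking_penalty(("A", "B"), gscores, exp_dg))  # 0.1
```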

rmsd_comb(self, comb)

Calculate the RMS deviation between experimental DeltaG and the lowest GScore obtained for each ligand using a given combination 'comb' of receptors. 'comb' is a tuple of titles. NOTE: a ligand that fails to dock into every receptor in 'comb' counts as a deviation of 'docking_failure_penalty', as given to the constructor (10.0 kcal/mol by default).
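The RMSD described above can be sketched as follows. This is a hypothetical stand-alone function rather than the class method; it assumes the RMSD runs over the ligands present in exp_dg and that a failed docking is stored as None.

```python
import math

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: the RMSD runs over the ligands present in exp_dg, and a
# failed docking is stored as None in gscores[prot][lig].

def rmsd_comb(comb, gscores, exp_dg, docking_failure_penalty=10.0):
    """RMS deviation between exp DeltaG and the lowest GScore over comb."""
    if not exp_dg:
        raise ValueError("no experimental data")
    total = 0.0
    for lig, dg in exp_dg.items():
        scores = [gscores[prot].get(lig) for prot in comb]
        scores = [s for s in scores if s is not None]
        # A ligand that fails to dock into every receptor gets a fixed deviation.
        dev = min(scores) - dg if scores else docking_failure_penalty
        total += dev * dev
    return math.sqrt(total / len(exp_dg))


gscores = {
    "A": {"A": -9.0, "B": -7.0},
    "B": {"A": -8.0, "B": None},
}
exp_dg = {"A": -10.0, "B": -8.0}
print(rmsd_comb(("A", "B"), gscores, exp_dg))  # 1.0
print(rmsd_comb(("B",), gscores, exp_dg))      # sqrt((2**2 + 10**2) / 2)
```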

score(self, lig, comb)

Return the score for ligand 'lig' within ensemble 'comb'. The default implementation is to return the lowest docking score for lig over all the receptors in the ensemble. If the ligand failed to dock or a score can't be computed for some reason, returns None.
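The default scoring rule (lowest docking score over the ensemble, or None) fits in a few lines. This is a hypothetical stand-alone version; representing a failed docking as None or as a missing key is an assumption.

```python
# Hypothetical stand-alone sketch of the default behaviour described above;
# assumes gscores[prot][lig] with None (or a missing key) for a failed docking.

def score(lig, comb, gscores):
    """Lowest docking score for lig over the receptors in comb, or None."""
    scores = [gscores[prot].get(lig) for prot in comb]
    scores = [s for s in scores if s is not None]
    return min(scores) if scores else None


gscores = {
    "A": {"A": -9.0, "B": -7.0},
    "B": {"A": -8.0, "B": None},
}
print(score("A", ("A", "B"), gscores))  # -9.0
print(score("B", ("B",), gscores))      # None
```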

count_good_ligs(self, comb)

Return the count of ligands that can be docked "properly" by at least one receptor in the given combination 'comb' of receptors. 'comb' is a tuple of titles.
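A minimal sketch of the counting rule, assuming the N x N layout gscores[prot][lig] where ligand titles equal receptor titles. The stand-alone signature is hypothetical, and skipping ligands without a self-docking score is an assumption.

```python
# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumes the N x N layout gscores[prot][lig] (ligand titles == receptor
# titles) and None for a failed docking.

def count_good_ligs(comb, gscores, tol=0.5):
    """Count ligands docked "properly" by at least one receptor in comb."""
    count = 0
    for lig in gscores:
        self_score = gscores[lig].get(lig)
        if self_score is None:
            continue  # assumption: no self-docking reference, never "proper"
        for prot in comb:
            s = gscores[prot].get(lig)
            if s is not None and s < self_score + tol:
                count += 1
                break  # one good receptor is enough
    return count


gscores = {
    "A": {"A": -9.0, "B": -7.0, "C": -6.0},
    "B": {"A": -8.8, "B": -8.0, "C": None},
    "C": {"A": -7.0, "B": -7.8, "C": -7.5},
}
print(count_good_ligs(("A",), gscores))      # 1
print(count_good_ligs(("A", "C"), gscores))  # 3
```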

sample_combinations(self, n)

Return an iterator that produces a random sample of n-element combinations from the list of receptors contained in self. The iterator will try to produce self.n_random_comb combinations, but it will skip duplicates so the actual number of combinations returned is likely to be smaller.
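The sampling-with-deduplication behaviour can be sketched with the standard library `random.Random`. The function is hypothetical; canonicalizing each combination as a sorted tuple so duplicates can be detected with a set is an assumption about how duplicates are recognized.

```python
import random

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumption: a combination is canonicalized as a sorted tuple of titles so
# that duplicates can be detected with a set.

def sample_combinations(titles, n, n_random_comb=100000, seed=42):
    """Yield up to n_random_comb distinct random n-element combinations."""
    rng = random.Random(seed)  # seed=None gives non-reproducible results
    seen = set()
    for _ in range(n_random_comb):
        comb = tuple(sorted(rng.sample(titles, n)))
        if comb in seen:
            continue  # duplicates are skipped, so fewer may be produced
        seen.add(comb)
        yield comb


titles = ["A", "B", "C", "D"]
combs = list(sample_combinations(titles, 2, n_random_comb=50))
print(len(combs))  # at most 6 (= 4! / (2! * 2!))
```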

combinations_range(self, n1, n2)

Return an iterator that produces combinations containing between n1 and n2 elements (both inclusive) from the list of receptors. This is basically a loop over self.combinations() from n1 to n2.
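The loop over sizes can be sketched as below. For simplicity this hypothetical version enumerates each size exhaustively with `itertools.combinations`, whereas the class method loops over self.combinations(), which may fall back to random sampling.

```python
import itertools

# Hypothetical stand-alone sketch. Each size is enumerated exhaustively with
# itertools.combinations; the class method instead loops over
# self.combinations(), which may fall back to random sampling.

def combinations_range(titles, n1, n2):
    """Yield combinations of n1 to n2 elements (both inclusive)."""
    for n in range(n1, n2 + 1):
        yield from itertools.combinations(titles, n)


print(list(combinations_range(["A", "B", "C"], 1, 2)))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
```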

combinations(self, n)

Return an iterator that produces n-element combinations from the list of receptors contained in self. If the number of combinations is less than self.max_exhaustive, a complete list of combinations is produced. If it is larger, a random sample is produced by deferring to the sample_combinations method.
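The switch between exhaustive enumeration and random sampling can be sketched with `math.comb`, which computes N!/(n! * (N-n)!). The function names are hypothetical, and treating a count exactly equal to max_exhaustive as still exhaustive is an assumption.

```python
import itertools
import math
import random

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumption: a count equal to max_exhaustive is still enumerated exhaustively.

def _sample_combinations(titles, n, n_random_comb, seed):
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_random_comb):
        comb = tuple(sorted(rng.sample(titles, n)))
        if comb not in seen:
            seen.add(comb)
            yield comb


def combinations(titles, n, max_exhaustive=1000000, n_random_comb=100000, seed=42):
    """Exhaustive enumeration when feasible, otherwise a random sample."""
    total = math.comb(len(titles), n)  # N! / (n! * (N - n)!)
    if total <= max_exhaustive:
        return itertools.combinations(titles, n)
    return _sample_combinations(titles, n, n_random_comb, seed)


titles = [f"R{i}" for i in range(20)]
print(math.comb(20, 10))  # 184756 combinations of 10 receptors
sampled = list(combinations(titles, 10, max_exhaustive=1000, n_random_comb=200))
print(len(sampled))  # at most 200, since 184756 > max_exhaustive
```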

best_ensembles(self, n, n2=None, nmax=15, func=None)

Return the 'nmax' best n-member ensembles by the return value of a caller-specified scoring function 'func', which takes a tuple of ensemble members (title strings) and returns a sortable value. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned. The return value for the function is stored as the 'score' property of each Ensemble object returned.
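A possible shape for the generic selection, using `heapq.nsmallest` so that only nmax candidates are kept in memory. This is a hypothetical stand-alone sketch: it assumes a lower func value is better, enumerates combinations exhaustively, and returns (score, comb) pairs in place of Ensemble objects; `toy_score` is a made-up scoring function.

```python
import heapq
import itertools

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: a lower func value is better, combinations are enumerated
# exhaustively, and plain (score, comb) pairs stand in for Ensemble objects.

def best_ensembles(titles, n, func, n2=None, nmax=15):
    """Return the nmax best ensembles of size n (or n..n2) ranked by func."""
    if n2 is None:
        n2 = n
    candidates = (comb for size in range(n, n2 + 1)
                  for comb in itertools.combinations(titles, size))
    scored = ((func(comb), comb) for comb in candidates)
    # nsmallest keeps only nmax items in memory while scanning the candidates.
    return heapq.nsmallest(nmax, scored, key=lambda pair: pair[0])


def toy_score(comb):
    # Hypothetical scoring function: sum of character codes of the titles.
    return sum(ord(title) for title in comb)


print(best_ensembles(["A", "B", "C"], 1, toy_score, n2=2, nmax=2))
# [(65, ('A',)), (66, ('B',))]
```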

best_ensembles_by_count(self, n, n2=None, nmax=15)

Return the 'nmax' best n-member ensembles by count of "properly docked ligands" as a list of Ensemble objects. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned.
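Ranking by count reverses the direction, since more properly docked ligands is better. The sketch below is hypothetical and self-contained: it repeats a compact counting helper, assumes the gscores[prot][lig] layout with None for failures, and returns (count, comb) pairs instead of Ensemble objects.

```python
import heapq
import itertools

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumes the gscores[prot][lig] layout (None = failed docking) and returns
# (count, comb) pairs in place of Ensemble objects.

def count_good_ligs(comb, gscores, tol):
    good = 0
    for lig in gscores:
        ref = gscores[lig].get(lig)  # self-docking score
        if ref is not None and any(
                gscores[p].get(lig) is not None and gscores[p][lig] < ref + tol
                for p in comb):
            good += 1
    return good


def best_ensembles_by_count(gscores, n, n2=None, nmax=15, tol=0.5):
    """Return the nmax ensembles that dock the most ligands "properly"."""
    n2 = n if n2 is None else n2
    titles = list(gscores)
    candidates = (c for size in range(n, n2 + 1)
                  for c in itertools.combinations(titles, size))
    scored = ((count_good_ligs(c, gscores, tol), c) for c in candidates)
    return heapq.nlargest(nmax, scored, key=lambda pair: pair[0])  # higher is better


gscores = {
    "A": {"A": -9.0, "B": -7.0, "C": -6.0},
    "B": {"A": -8.8, "B": -8.0, "C": None},
    "C": {"A": -7.0, "B": -7.8, "C": -7.5},
}
print(best_ensembles_by_count(gscores, 2, nmax=1))  # [(3, ('A', 'C'))]
```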

best_ensembles_by_rmsd(self, n, n2=None, nmax=15, rmsd_size_penalty=0.1)

Return the 'nmax' best n-member ensembles by RMSD of computed gscore vs exp DeltaG as a list of Ensemble objects. If 'n2' is defined, then ensembles with sizes ranging from n to n2 are returned. When sorting ensembles, a penalty of rmsd_size_penalty*n is added in order to favor smaller ensembles when possible.
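The effect of rmsd_size_penalty can be seen in a small hypothetical sketch: when two ensembles have the same RMSD, the smaller one wins because its penalty term is lower. The RMSD helper, the None-for-failure convention, and the (rmsd, comb) return pairs are assumptions, not the library API.

```python
import itertools
import math

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: gscores[prot][lig] layout, None for a failed docking, the RMSD
# runs over the ligands in exp_dg, and (rmsd, comb) pairs stand in for
# Ensemble objects. A lower penalized RMSD is better.

def rmsd_comb(comb, gscores, exp_dg, docking_failure_penalty=10.0):
    total = 0.0
    for lig, dg in exp_dg.items():
        scores = [gscores[p][lig] for p in comb if gscores[p].get(lig) is not None]
        dev = min(scores) - dg if scores else docking_failure_penalty
        total += dev * dev
    return math.sqrt(total / len(exp_dg))


def best_ensembles_by_rmsd(gscores, exp_dg, n, n2=None, nmax=15,
                           rmsd_size_penalty=0.1):
    n2 = n if n2 is None else n2
    results = []
    for size in range(n, n2 + 1):
        for comb in itertools.combinations(list(gscores), size):
            results.append((rmsd_comb(comb, gscores, exp_dg), comb))
    # Sort on RMSD plus rmsd_size_penalty * size to favor smaller ensembles.
    results.sort(key=lambda pair: pair[0] + rmsd_size_penalty * len(pair[1]))
    return results[:nmax]


gscores = {
    "A": {"A": -9.0, "B": -7.0},
    "B": {"A": -8.0, "B": None},
}
exp_dg = {"A": -10.0, "B": -8.0}
for rmsd, comb in best_ensembles_by_rmsd(gscores, exp_dg, 1, n2=2):
    print(comb, round(rmsd, 3))
# ('A',) and ('A', 'B') both have RMSD 1.0; ('A',) ranks first (1.1 vs 1.2).
```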

set_gscores(self, gscores, nocopy=False, fixed_titles=None)

Set the GlideScore matrix. 'gscores' must be a dict of dicts, where each value gscores[prot][lig] = gs is the GlideScore from docking lig into prot. Both lig and prot are titles.

Sets the following public attributes:
    * titles
    * N = len(titles)
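For illustration, a gscores dict with three receptors might look like the one below. The titles are made up, storing a failed docking as None is an assumption, and the way titles and N are derived (sorted keys plus a squareness check) is a hypothetical sketch rather than what set_gscores actually does.

```python
# Hypothetical example of the dict-of-dicts layout; the titles are made up,
# and storing a failed docking as None is an assumption.
gscores = {
    "rec1": {"rec1": -9.1, "rec2": -6.4, "rec3": None},
    "rec2": {"rec1": -7.2, "rec2": -8.7, "rec3": -5.9},
    "rec3": {"rec1": -6.8, "rec2": -7.0, "rec3": -8.2},
}

# gscores[prot][lig]: the score for docking ligand "rec3" into receptor "rec2".
print(gscores["rec2"]["rec3"])  # -5.9

# Sketch of how titles and N could be derived and checked (assumption: the
# matrix must be square, i.e. every row is keyed by the receptor titles).
titles = sorted(gscores)
N = len(titles)
for prot, row in gscores.items():
    if set(row) != set(titles):
        raise ValueError(f"row {prot!r} is not keyed by the receptor titles")
print(titles, N)  # ['rec1', 'rec2', 'rec3'] 3
```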

set_exp_dg(self, exp_dg)

Set the experimental data used for computing RMSDs. 'exp_dg' should be a dict where the key is a ligand title and the value is the DeltaG.

read_csv(self, fname, fixed_titles=None)

Import data from a CSV file, which is expected to contain an N*N matrix of GlideScores, plus the titles on the header and on the first column. Each column corresponds to a receptor and each row corresponds to a ligand. The set of ligand titles must be the same as the set of receptor titles. Ligands that failed to dock can be represented in the file either as a blank field or by the string "NA".
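The CSV layout described above (receptors as columns, ligands as rows, blank or "NA" for failures) can be parsed with the standard `csv` module. This hypothetical sketch assumes the first header cell is ignored and produces the gscores[prot][lig] layout with None for a failed docking; it is not the read_csv implementation.

```python
import csv
import io

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: the first header cell is ignored, and the result uses the
# gscores[prot][lig] layout with None for a failed docking.

def read_gscores_csv(f):
    reader = csv.reader(f)
    header = next(reader)
    receptors = [t.strip() for t in header[1:]]  # one column per receptor
    gscores = {prot: {} for prot in receptors}
    ligands = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        lig = row[0].strip()  # one row per ligand
        ligands.append(lig)
        for prot, field in zip(receptors, row[1:]):
            field = field.strip()
            gscores[prot][lig] = None if field in ("", "NA") else float(field)
    if set(ligands) != set(receptors):
        raise ValueError("ligand titles must match receptor titles")
    return gscores


data = io.StringIO(",A,B\nA,-9.0,NA\nB,,-8.0\n")
print(read_gscores_csv(data))
# {'A': {'A': -9.0, 'B': None}, 'B': {'A': None, 'B': -8.0}}
```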

read_exp_dg(self, fname)

Read a file with the experimental data. The text file has two
whitespace-separated columns:
    1) title
    2) DeltaG
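A two-column whitespace-separated file of this kind can be read with `str.split()`. The sketch is hypothetical; skipping blank lines and requiring titles without embedded whitespace are assumptions.

```python
import io

# Hypothetical stand-alone sketch, not the schrodinger implementation.
# Assumptions: blank lines are skipped and titles contain no whitespace.

def read_exp_dg(f):
    exp_dg = {}
    for line in f:
        fields = line.split()
        if not fields:
            continue
        title, dg = fields  # raises ValueError unless there are exactly 2 columns
        exp_dg[title] = float(dg)
    return exp_dg


print(read_exp_dg(io.StringIO("A -10.2\nB\t-8.5\n\n")))
# {'A': -10.2, 'B': -8.5}
```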