Package schrodinger :: Package application :: Package canvas :: Module r_group :: Class RGroupFinder

Class RGroupFinder

This class finds optimal core alignments and determines r-groups for input structures. It is a base class that uses the 'original' algorithm, which tries to minimize the number of r-groups.

Instance Methods
 
__init__(self, input_file, smarts, temp_dir=None, thread=None, use_mm=1, use_fp_sim=False, sa_seed=None, t_factor=None, tmax_mult=None)
This function creates an instance of the RGroupFinder class.
 
_resetData(self)
This function resets data objects used during calculation to their initial values.
 
calculate(self)
This function is called to compute optimal molecule alignment and determine attached r-groups.
 
_findCoreMatches(self)
This function loops over all molecules and finds all possible core matches.
 
_checkAndTallyCoreMatches(self, i, ct, core_matches, bond_matches)
Verify that a valid match was found by checking that the core atoms contain any heavy atoms, that the current molecule is not a duplicate, etc.
 
_checkCoreMatches(self, ct, core_matches, bond_matches)
Verifies that a valid match was found by checking that the core atoms contain any heavy atoms, that the current molecule is not a duplicate, etc.
 
_checkCoreHeavyAtoms(self, ct, core_matches, bond_matches)
This function verifies that the core match has heavy atoms.
 
_generateMessage(self)
This function checks calculation results and generates appropriate message.
list
_findAttachments(self, ct, core)
Find attachments for a given core.
 
_optimizeAlignment(self)
This function finds best core matches for molecules that have multiple matches.
list, int
_calculateStatesTally(self)
This function calculates number of 'states' (matches) for each multiply matched molecule as well as the grand total of all states.
list
_findBestMatches(self)
Find best matching core for each multiply matched molecule.
DEE_EnergyMatrix
_calculateEnergyMatrix(self)
This function computes 'energy' matrix that is used in the optimization procedure that finds best matching cores for each molecule.
dict
_calculateRGroupSimilarity(self)
This function calculates Tanimoto fingerprint similarities for every attachment pair defined using SMILES.
dict
_findUniqueRGroups(self)
This function iterates over core 'states'.
dict
_calculatePairSimilarity(self, r_group_bitset, sim_func)
This function calculates similarities for each pair of unique r-group attachments.
float
_calculatePairSimilarityFP(self, key1, key2, r_group_bitset)
This function calculates Tanimoto FP similarities for a pair of unique r-group attachments.
float
_calculatePairSimilarityGeneric(self, key1, key2, r_group_bitset)
This function calculates similarities for a pair of unique r-group attachments using a simple scheme.
double
_calculateStatePairScore(self, a1, a2, r_group_sim)
This function calculates pairwise score for a pair of core states.
 
_calculateSimilarityMatrix(self, ns, r_group_sim)
This function computes similarity matrix, which contains pairwise scores between core states of molecules that have multiple core matches.
 
_finalizeCalculation(self)
This function is used to add r-group data into Data() object after best matching core has been found.
 
_saveBestMatchData(self)
This function iterates over input structures and stores core and bond data for 'best matches' in the Data() structure.
 
_savePropertyAndAttachmentData(self)
Iterate over input structures and save property and attachment groups data in the main Data() structure.
Method Details

__init__(self, input_file, smarts, temp_dir=None, thread=None, use_mm=1, use_fp_sim=False, sa_seed=None, t_factor=None, tmax_mult=None)
(Constructor)

 

This function creates an instance of the RGroupFinder class.

Parameters:
  • input_file (str) - name of the input structure file
  • smarts (list) - list of cores SMARTS patterns
  • temp_dir (str) - name of temporary directory
  • thread (calculation_thread) - calculation thread (optional)
  • use_mm (int) - type of multiple matching algorithm (1 = SA, 2 = DEE)
  • use_fp_sim (bool) - True to use fingerprint similarity between pairs of 'states'
  • sa_seed (int) - SimulatedAnnealing random generator seed
  • t_factor - SimulatedAnnealing temperature factor
  • tmax_mult (float) - SimulatedAnnealing parameter to set starting temperature

calculate(self)

 

This function is called to compute the optimal molecule alignment and determine the attached r-groups. It defines the order of calls that need to be made, as in the 'Template Method' design pattern. When the calculation is done, the self.data object can be retrieved to get the results.
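
The Template Method arrangement described for calculate() can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the class names SketchFinder and SketchFinderVariant, the step bodies, and the exact order of calls are assumptions. Only the method names are taken from this page.

```python
class SketchFinder:
    """Hypothetical stand-in for RGroupFinder; step bodies are placeholders."""

    def __init__(self):
        self.data = None
        self.calls = []  # records the order in which the steps run

    def calculate(self):
        # The template method fixes the order of the steps. Subclasses
        # override individual steps rather than calculate() itself.
        # The order shown here is an assumption.
        self._resetData()
        self._findCoreMatches()
        self._optimizeAlignment()
        self._finalizeCalculation()

    def _resetData(self):
        self.calls.append("reset")
        self.data = {}

    def _findCoreMatches(self):
        self.calls.append("find")

    def _optimizeAlignment(self):
        self.calls.append("optimize")

    def _finalizeCalculation(self):
        self.calls.append("finalize")
        self.data["done"] = True


class SketchFinderVariant(SketchFinder):
    """Overrides one step only; the call order is inherited."""

    def _optimizeAlignment(self):
        self.calls.append("optimize-variant")


finder = SketchFinderVariant()
finder.calculate()
```

After calculate() returns, finder.data holds the (placeholder) results, mirroring how self.data is retrieved once the real calculation is done.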

_checkAndTallyCoreMatches(self, i, ct, core_matches, bond_matches)

 

Verify that a valid match was found by checking that the core atoms contain any heavy atoms, that the current molecule is not a duplicate, etc. Also update the class attributes to reflect the results of the verification.

Parameters:
  • i (int) - index of current molecule
  • ct (schrodinger.structure.Structure) - current molecule ct
  • core_matches (list) - list of lists of core atom indices for each match found
  • bond_matches (list) - list of lists of core bonds for each match found
Returns: tuple
A tuple of:
  - A bool indicating whether this is a 'good' match
  - The updated core_matches list
  - The updated bond_matches list

_checkCoreMatches(self, ct, core_matches, bond_matches)

 

Verifies that a valid match was found by checking that the core atoms contain any heavy atoms, that the current molecule is not a duplicate, etc.

Parameters:
  • ct (schrodinger.structure.Structure) - current molecule ct
  • core_matches (list) - list of lists of core atom indices for each match found
  • bond_matches (list) - list of lists of core bonds for each match found
Returns: tuple
A tuple of:
  - A match state constant that is one of GOOD_MATCH, NO_MATCHES, DUPLICATE_MATCH, or NO_HEAVY_ATOM_MATCH
  - The updated core_matches list
  - The updated bond_matches list
  - The SMILES string for the input molecule

_checkCoreHeavyAtoms(self, ct, core_matches, bond_matches)

 

This function verifies that the core match has heavy atoms.

Parameters:
  • ct (schrodinger.structure.Structure) - current molecule ct
  • core_matches (list) - list of lists of core atom indices for each match found
  • bond_matches (list) - list of lists of core bonds for each match found
Returns: tuple
A tuple of:
  - A match state constant that is one of GOOD_MATCH, NO_MATCHES, DUPLICATE_MATCH, or NO_HEAVY_ATOM_MATCH
  - The updated core_matches list
  - The updated bond_matches list

_generateMessage(self)

 

This function checks calculation results and generates appropriate message.

Raises:
  • RGroupException - raises this exception if only one structure matched the core definition or no matches were found.
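
The error condition above can be illustrated with a small sketch. Everything here is hypothetical: RGroupException is redefined locally so the example is self-contained, and the function name generate_message, its argument, and the message texts are assumptions rather than the library's own.

```python
class RGroupException(Exception):
    """Local stand-in; the real exception class is defined by the library."""


def generate_message(num_matched):
    # Assumption: the check is driven by the number of structures
    # that matched the core definition.
    if num_matched == 0:
        raise RGroupException("No structures matched the core definition.")
    if num_matched == 1:
        raise RGroupException("Only one structure matched the core definition.")
    return "%d structures matched the core definition." % num_matched


message = generate_message(3)

try:
    generate_message(1)
    raised = False
except RGroupException:
    raised = True
```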

_findAttachments(self, ct, core)

 

Find attachments for a given core.

Parameters:
Returns: list
list of attached groups ('hydrogen', 'null' or SMILES)

_optimizeAlignment(self)

 

This function finds the best core matches for molecules that have multiple matches. Indices of the best core matches are appended to the self.core_states data.

_calculateStatesTally(self)

 

This function calculates number of 'states' (matches) for each multiply matched molecule as well as the grand total of all states.

Returns: list, int
list consisting of number of states for each molecule and a total number of states
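
A minimal sketch of such a tally, assuming the matches are available as one list per multiply matched molecule. The function name calculate_states_tally and the input layout are hypothetical; the real method reads its input from instance attributes.

```python
def calculate_states_tally(matches_per_molecule):
    # matches_per_molecule: hypothetical list in which element k holds
    # all core matches found for the k-th multiply matched molecule.
    num_states = [len(matches) for matches in matches_per_molecule]
    return num_states, sum(num_states)


num_states, total = calculate_states_tally([
    [(1, 2, 3), (3, 2, 1)],             # molecule with two matches
    [(4, 5, 6), (6, 5, 4), (5, 4, 6)],  # molecule with three matches
])
```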

_findBestMatches(self)

 

Find best matching core for each multiply matched molecule.

Returns: list
list that contains indices of 'best matching' cores for each molecule
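
When use_mm selects SA, the search for best matches is presumably driven by simulated annealing. The sketch below is a generic textbook simulated annealing loop, not the library's implementation. The function name find_best_states, the score callable, the geometric cooling schedule, and the idea that the total pairwise score is maximized are all assumptions. The seed and t_factor arguments only loosely mirror the constructor's sa_seed and t_factor.

```python
import math
import random


def find_best_states(num_states, score, seed=None, t_start=1.0,
                     t_factor=0.9, steps_per_t=50, t_min=1e-3):
    """Pick one state index per molecule, maximizing the summed pair score.

    num_states[k] is the number of states of molecule k. score(k, i, m, j)
    is a hypothetical callable giving the score between state i of
    molecule k and state j of molecule m.
    """
    rng = random.Random(seed)
    n = len(num_states)

    def total(choice):
        return sum(score(k, choice[k], m, choice[m])
                   for k in range(n) for m in range(k + 1, n))

    current = [rng.randrange(s) for s in num_states]
    current_score = total(current)
    best, best_score = list(current), current_score
    t = t_start
    while t > t_min:
        for _ in range(steps_per_t):
            # Propose a new state for one randomly chosen molecule.
            k = rng.randrange(n)
            trial = list(current)
            trial[k] = rng.randrange(num_states[k])
            trial_score = total(trial)
            delta = trial_score - current_score
            # Always accept improvements; accept worse moves with
            # Boltzmann probability so the search can leave local optima.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = trial, trial_score
                if current_score > best_score:
                    best, best_score = list(current), current_score
        t *= t_factor  # geometric cooling
    return best


def same_index_score(k, i, m, j):
    # Toy score: reward molecules that pick the same state index.
    return 1.0 if i == j else 0.0


best = find_best_states([2, 2, 2], same_index_score, seed=7)
```

With the toy score, the best assignment gives every molecule the same state index, and the returned list has one index per molecule, as in the return value described above.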

_calculateEnergyMatrix(self)

 

This function computes 'energy' matrix that is used in the optimization procedure that finds best matching cores for each molecule.

Returns: DEE_EnergyMatrix
energy matrix

_calculateRGroupSimilarity(self)

 

This function calculates Tanimoto fingerprint similarities for every attachment pair defined using SMILES.

Returns: dict
dictionary of similarity scores keyed on the pair of SMILES strings

_findUniqueRGroups(self)

 

This function iterates over core 'states'. For each attached group that is defined by a SMILES string, it calculates the group's fingerprint (FP). The SMILES string is used as the key, so that FPs are stored only for unique attachments.

Returns: dict
dictionary, where SMILES is key and FP is value
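
The de-duplication by SMILES key can be sketched as below. This is an illustration under stated assumptions: toy_fingerprint is a hypothetical stand-in for a real fingerprint and is not chemically meaningful, the nested-list layout of states is assumed, and skipping 'null' and 'hydrogen' entries is an assumption based on the statement that only SMILES-defined groups get fingerprints.

```python
def toy_fingerprint(smiles):
    # Hypothetical stand-in for a real fingerprint: the set of character
    # bigrams in the SMILES string.
    return frozenset(smiles[i:i + 2] for i in range(len(smiles) - 1))


def find_unique_r_groups(states):
    # states: hypothetical list of core states. Each state is a list of
    # attachment points, and each point is a list of group labels.
    r_group_bitset = {}
    for state in states:
        for groups in state:
            for group in groups:
                if group in ("null", "hydrogen"):
                    continue  # assumption: only SMILES-defined groups get FPs
                if group not in r_group_bitset:
                    # Compute each fingerprint once per unique SMILES.
                    r_group_bitset[group] = toy_fingerprint(group)
    return r_group_bitset


states = [
    [["CCO"], ["hydrogen"]],
    [["CCO"], ["CCN", "null"]],
]
fps = find_unique_r_groups(states)
```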

_calculatePairSimilarity(self, r_group_bitset, sim_func)

 

This function calculates similarities for each pair of unique r-group attachments.

Parameters:
  • r_group_bitset (dict) - dictionary of FPs for r-groups keyed on SMILES
  • sim_func (function) - function used to compute similarity between pair of attachments. This function can be either _calculatePairSimilarityFP or _calculatePairSimilarityGeneric.
Returns: dict
dictionary of pair similarities keyed on a pair of SMILES
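
The role of sim_func as a pluggable strategy can be sketched as follows. The names calculate_pair_similarity and exact_match_similarity are hypothetical, and storing each pair in both key orders is an assumption made for convenient lookup. The only detail taken from this page is that the similarity function receives key1, key2 and r_group_bitset.

```python
from itertools import combinations_with_replacement


def calculate_pair_similarity(r_group_bitset, sim_func):
    # Evaluate sim_func once for every unordered pair of unique keys
    # (including a key paired with itself) and store both key orders.
    sims = {}
    keys = sorted(r_group_bitset)
    for key1, key2 in combinations_with_replacement(keys, 2):
        value = sim_func(key1, key2, r_group_bitset)
        sims[(key1, key2)] = value
        sims[(key2, key1)] = value
    return sims


def exact_match_similarity(key1, key2, r_group_bitset):
    # Hypothetical simple scheme: identical keys score 1.0, others 0.0.
    # r_group_bitset is accepted only to keep the signatures compatible.
    return 1.0 if key1 == key2 else 0.0


sims = calculate_pair_similarity({"CCO": None, "CCN": None},
                                 exact_match_similarity)
```

Because both similarity functions share one signature, the caller can switch between a fingerprint-based and a generic scheme without changing the loop.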

_calculatePairSimilarityFP(self, key1, key2, r_group_bitset)

 

This function calculates Tanimoto FP similarities for a pair of unique r-group attachments.

Parameters:
  • key1 (str) - first r-group key, which can be SMILES or 'hydrogen'
  • key2 (str) - second r-group key, which can be SMILES or 'hydrogen'
  • r_group_bitset (dict) - dict of FPs for r-groups.
Returns: float
similarity between two groups
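
For reference, the Tanimoto similarity of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a Python set of on-bit positions, which is an assumption; the library's fingerprint objects and the handling of 'hydrogen' keys are not shown.

```python
def tanimoto(bits1, bits2):
    # bits1, bits2: sets of on-bit positions.
    union = len(bits1 | bits2)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits1 & bits2) / union


a = {1, 2, 3, 4}
b = {3, 4, 5}
sim = tanimoto(a, b)  # 2 shared bits out of 5 distinct bits
```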

_calculatePairSimilarityGeneric(self, key1, key2, r_group_bitset)

 

This function calculates similarities for a pair of unique r-group attachments using a simple scheme.

Parameters:
  • key1 (str) - first r-group key, which can be SMILES or 'hydrogen'
  • key2 (str) - second r-group key, which can be SMILES or 'hydrogen'
  • r_group_bitset (dict) - dict of FPs for r-groups. This variable is not used by this function, but we need it for compatibility.
Returns: float
similarity between two groups

_calculateStatePairScore(self, a1, a2, r_group_sim)

 

This function calculates pairwise score for a pair of core states. For each state we have lists of groups at each attachment point. These groups can be 'null', 'hydrogen' or be identified by a SMILES string. It is possible to have multiple groups attached to the same point.

Parameters:
  • a1 (list) - list of attachment groups for 1st state
  • a2 (list) - list of attachment groups for 2nd state
  • r_group_sim (dict) - dictionary of Tanimoto similarities
Returns: double
pairwise score

_calculateSimilarityMatrix(self, ns, r_group_sim)

 

This function computes similarity matrix, which contains pairwise scores between core states of molecules that have multiple core matches.

Parameters:
  • ns (int) - total number of core states (matrix dimensions)
  • r_group_sim (dict) - pairwise Tanimoto similarity between pairs of attachments
Returns:
ns * ns similarity matrix
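
Filling a symmetric ns * ns matrix from pairwise state scores can be sketched as below. The signature differs from the real method: state_attachments and pair_score are hypothetical arguments, toy_pair_score is an invented scoring rule, and leaving the diagonal at zero is an assumption.

```python
def calculate_similarity_matrix(ns, state_attachments, pair_score):
    # Start from an ns x ns matrix of zeros and fill the upper triangle,
    # mirroring each value so the matrix stays symmetric.
    matrix = [[0.0] * ns for _ in range(ns)]
    for i in range(ns):
        for j in range(i + 1, ns):
            s = pair_score(state_attachments[i], state_attachments[j])
            matrix[i][j] = s
            matrix[j][i] = s
    return matrix


def toy_pair_score(a1, a2):
    # Hypothetical score: number of attachment points whose group
    # lists are identical in both states.
    return float(sum(1 for g1, g2 in zip(a1, a2) if g1 == g2))


states = [
    [["CCO"], ["hydrogen"]],
    [["CCO"], ["CCN"]],
    [["hydrogen"], ["CCN"]],
]
matrix = calculate_similarity_matrix(3, states, toy_pair_score)
```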

_finalizeCalculation(self)

 

This function is used to add r-group data into the Data() object after the best matching core has been found. This includes adding core bonds, adding properties and adding information about attached groups.