Module schrodinger.application.canvas.r_group

Class RGroupFinder

This class finds optimal core alignments and determines the r-groups for input structures. It is a base class that implements the 'original' algorithm, which tries to minimize the number of r-groups.

Instance Methods


__init__(self, input_file, smarts, temp_dir=None, thread=None, use_mm=1, use_fp_sim=False, sa_seed=None, t_factor=None, tmax_mult=None)
This function creates an instance of the RGroupFinder class.

_resetData(self)
This function resets data objects used during calculation to their initial values.

calculate(self)
This function is called to compute optimal molecule alignment and determine attached r-groups.

_findCoreMatches(self)
This function loops over all molecules and finds all possible core matches.

_checkAndTallyCoreMatches(self, i, ct, core_matches, bond_matches)
Verify that a valid match was found by checking that the core atoms contain at least one heavy atom, that the current molecule is not a duplicate, etc.

_checkCoreMatches(self, ct, core_matches, bond_matches)
Verifies that a valid match was found by checking that the core atoms contain at least one heavy atom, that the current molecule is not a duplicate, etc.

_checkCoreHeavyAtoms(self, ct, core_matches, bond_matches)
This function verifies that the core match contains heavy atoms.

_generateMessage(self)
This function checks the calculation results and generates an appropriate message.

_findAttachments(self, ct, core)
Find attachments for a given core.
Returns: list

_optimizeAlignment(self)
This function finds best core matches for molecules that have multiple matches.

_calculateStatesTally(self)
This function calculates the number of 'states' (matches) for each multiply matched molecule, as well as the grand total of all states.
Returns: list, int

_findBestMatches(self)
Find the best matching core for each multiply matched molecule.
Returns: list

_calculateEnergyMatrix(self)
This function computes the 'energy' matrix used by the optimization procedure that finds the best matching core for each molecule.
Returns: DEE_EnergyMatrix

_calculateRGroupSimilarity(self)
This function calculates Tanimoto fingerprint similarities for every pair of attachments defined by SMILES.
Returns: dict

_findUniqueRGroups(self)
This function iterates over core 'states' and calculates a fingerprint for each unique attached group.
Returns: dict

_calculatePairSimilarity(self, r_group_bitset, sim_func)
This function calculates similarities for each pair of unique r-group attachments.
Returns: dict

_calculatePairSimilarityFP(self, key1, key2, r_group_bitset)
This function calculates the Tanimoto FP similarity for a pair of unique r-group attachments.
Returns: float

_calculatePairSimilarityGeneric(self, key1, key2, r_group_bitset)
This function calculates the similarity for a pair of unique r-group attachments using a simple scheme.
Returns: float

_calculateStatePairScore(self, a1, a2, r_group_sim)
This function calculates the pairwise score for a pair of core states.
Returns: float

_calculateSimilarityMatrix(self, ns, r_group_sim)
This function computes similarity matrix, which contains pairwise scores between core states of molecules that have multiple core matches.

_finalizeCalculation(self)
This function adds r-group data to the Data() object after the best matching core has been found.

_saveBestMatchData(self)
This function iterates over input structures and stores core and bond data for 'best matches' in the Data() structure.

_savePropertyAndAttachmentData(self)
Iterate over input structures and save property and attachment groups data in the main Data() structure.

Method Details


__init__(self, input_file, smarts, temp_dir=None, thread=None, use_mm=1, use_fp_sim=False, sa_seed=None, t_factor=None, tmax_mult=None)
(Constructor)

This function creates an instance of the RGroupFinder class.

Parameters:

input_file (str) - name of the input structure file
smarts (list) - list of core SMARTS patterns
temp_dir (str) - name of the temporary directory
thread (calculation_thread) - calculation thread (optional)
use_mm (int) - multiple-matching algorithm to use (1 = simulated annealing (SA), 2 = dead-end elimination (DEE))
use_fp_sim (bool) - True to use fingerprint similarity between pairs of 'states'
sa_seed (int) - SimulatedAnnealing random number generator seed
t_factor - SimulatedAnnealing temperature factor
tmax_mult (float) - SimulatedAnnealing parameter used to set the starting temperature

calculate(self)

This function computes the optimal molecule alignment and determines the attached r-groups. It defines the order in which the other methods are called, as in the Template Method design pattern. Once the calculation is done, the results can be retrieved from the self.data object.
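The Template Method arrangement can be sketched as follows. This is a minimal illustration, not the RGroupFinder source: the class names, the hook names, and the order of steps are all hypothetical, chosen only to echo the method summary. The point it shows is that calculate() fixes the sequence, while a subclass overrides individual steps.

```python
class FinderBase:
    """Hypothetical base class: calculate() fixes the order of the steps."""

    def __init__(self):
        self.log = []      # records which steps ran, in order
        self.data = None   # results are available here after calculate()

    def calculate(self):
        # The template method: subclasses override steps, never the sequence.
        self._reset_data()
        self._find_core_matches()
        self._optimize_alignment()
        self._finalize_calculation()
        return self.data

    def _reset_data(self):
        self.log.append("reset")
        self.data = {}

    def _find_core_matches(self):
        self.log.append("find")

    def _optimize_alignment(self):
        # Default ("original") optimization strategy.
        self.log.append("optimize:original")

    def _finalize_calculation(self):
        self.log.append("finalize")
        self.data["steps"] = list(self.log)


class AlternativeFinder(FinderBase):
    """Hypothetical subclass that swaps in a different optimization step."""

    def _optimize_alignment(self):
        self.log.append("optimize:alternative")


print(FinderBase().calculate()["steps"])
# ['reset', 'find', 'optimize:original', 'finalize']
print(AlternativeFinder().calculate()["steps"])
# ['reset', 'find', 'optimize:alternative', 'finalize']
```

Because the sequence lives in one place, a subclass implementing a different alignment algorithm only has to override the step that differs.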

_checkAndTallyCoreMatches(self, i, ct, core_matches, bond_matches)

Verify that a valid match was found by checking that the core atoms contain at least one heavy atom, that the current molecule is not a duplicate, etc. Also update the class attributes to reflect the results of the verification.

Parameters:

i (int) - index of the current molecule
ct (schrodinger.structure.Structure) - current molecule ct
core_matches (list) - list of lists of core atom indices, one per match found
bond_matches (list) - list of lists of core bonds, one per match found

Returns: tuple

A tuple of:
- a bool indicating whether this is a 'good' match
- the updated core_matches list
- the updated bond_matches list

_checkCoreMatches(self, ct, core_matches, bond_matches)

Verifies that a valid match was found by checking that the core atoms contain at least one heavy atom, that the current molecule is not a duplicate, etc.

Parameters:

ct (schrodinger.structure.Structure) - current molecule ct
core_matches (list) - list of lists of core atom indices, one per match found
bond_matches (list) - list of lists of core bonds, one per match found

Returns: tuple

A tuple of:
- a match state constant: one of GOOD_MATCH, NO_MATCHES, DUPLICATE_MATCH, or NO_HEAVY_ATOM_MATCH
- the updated core_matches list
- the updated bond_matches list
- the SMILES string for the input molecule
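The classification into the match states listed above can be sketched as below. This is a hedged illustration, not RGroupFinder's implementation: the constants are modeled as plain strings, atoms are given as element symbols indexed from 1, the molecule's SMILES and a set of already-seen SMILES are supplied by the caller, and the order of the checks is an assumption.

```python
GOOD_MATCH = "GOOD_MATCH"
NO_MATCHES = "NO_MATCHES"
DUPLICATE_MATCH = "DUPLICATE_MATCH"
NO_HEAVY_ATOM_MATCH = "NO_HEAVY_ATOM_MATCH"


def check_core_matches(elements, smiles, core_matches, bond_matches, seen_smiles):
    """Classify the core matches of one molecule (hypothetical helper).

    elements     - element symbol per atom; atom i is elements[i - 1]
    smiles       - SMILES string of the input molecule
    core_matches - list of lists of core atom indices, one per match
    bond_matches - list of lists of core bonds, parallel to core_matches
    seen_smiles  - set of SMILES already accepted (updated in place)
    """
    if not core_matches:
        return NO_MATCHES, core_matches, bond_matches, smiles
    if smiles in seen_smiles:
        return DUPLICATE_MATCH, core_matches, bond_matches, smiles

    # Keep only matches whose core contains at least one heavy (non-H) atom,
    # dropping the corresponding bond list too.
    kept = [(atoms, bonds) for atoms, bonds in zip(core_matches, bond_matches)
            if any(elements[i - 1] != "H" for i in atoms)]
    if not kept:
        return NO_HEAVY_ATOM_MATCH, [], [], smiles

    seen_smiles.add(smiles)
    core_matches = [atoms for atoms, _ in kept]
    bond_matches = [bonds for _, bonds in kept]
    return GOOD_MATCH, core_matches, bond_matches, smiles


seen = set()
elements = ["C", "C", "O", "H"]
state, cores, bonds, _ = check_core_matches(
    elements, "CCO", [[1, 2, 3], [4]], [[(1, 2), (2, 3)], []], seen)
print(state, cores)  # GOOD_MATCH [[1, 2, 3]]
print(check_core_matches(elements, "CCO", [[1, 2]], [[(1, 2)]], seen)[0])  # DUPLICATE_MATCH
```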

_checkCoreHeavyAtoms(self, ct, core_matches, bond_matches)

This function verifies that the core match contains heavy atoms.

Parameters:

ct (schrodinger.structure.Structure) - current molecule ct
core_matches (list) - list of lists of core atom indices, one per match found
bond_matches (list) - list of lists of core bonds, one per match found

Returns: tuple

A tuple of:
- a match state constant: one of GOOD_MATCH, NO_MATCHES, DUPLICATE_MATCH, or NO_HEAVY_ATOM_MATCH
- the updated core_matches list
- the updated bond_matches list

_generateMessage(self)

This function checks the calculation results and generates an appropriate message.

Raises:

RGroupException - if only one structure matched the core definition or no matches were found.

_findAttachments(self, ct, core)

Find attachments for a given core.

Parameters:

ct (schrodinger.structure.Structure) - current molecule ct
core (list) - core atom indices

Returns: list

list of attached groups ('hydrogen', 'null' or SMILES)
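One way to find the attachments of a core is to inspect, for each core atom, its neighbors outside the core. The sketch below is hypothetical: it uses a plain adjacency dictionary and an element dictionary instead of a schrodinger.structure.Structure, and, because generating SMILES needs a cheminformatics toolkit, it reports each heavy-atom substituent as a sorted tuple of its atom indices where the real method reports a SMILES string.

```python
from collections import deque


def find_attachments(adjacency, elements, core):
    """Return, per core atom, a list of groups: 'null', 'hydrogen' or fragment atoms.

    adjacency - dict mapping atom index -> list of bonded atom indices
    elements  - dict mapping atom index -> element symbol
    core      - list of core atom indices
    """
    core_set = set(core)
    attachments = []
    for atom in core:
        outside = [n for n in adjacency[atom] if n not in core_set]
        if not outside:
            attachments.append(["null"])  # nothing attached at this point
            continue
        groups = []
        for start in outside:
            if elements[start] == "H":
                groups.append("hydrogen")
                continue
            # Breadth-first search over the substituent, never re-entering the core.
            seen = {start}
            queue = deque([start])
            while queue:
                current = queue.popleft()
                for n in adjacency[current]:
                    if n not in core_set and n not in seen:
                        seen.add(n)
                        queue.append(n)
            groups.append(tuple(sorted(seen)))
        attachments.append(groups)
    return attachments


# Ethanol-like toy graph: C1-C2 core, O3 on C2 (with H4), H5 on C1.
adjacency = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]}
elements = {1: "C", 2: "C", 3: "O", 4: "H", 5: "H"}
print(find_attachments(adjacency, elements, [1, 2]))  # [['hydrogen'], [(3, 4)]]
```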

_optimizeAlignment(self)

This function finds the best core matches for molecules that have multiple matches. The indices of the best core matches are appended to self.core_states.

_calculateStatesTally(self)

This function calculates the number of 'states' (matches) for each multiply matched molecule, as well as the grand total of all states.

Returns: list, int: a list with the number of states for each molecule, and the total number of states
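The tally amounts to a per-molecule count plus a sum. A minimal sketch, assuming the core matches are available as one list per molecule and that only molecules with more than one match are counted (the function name is hypothetical):

```python
def states_tally(matches_per_molecule):
    """Count states (core matches) for each multiply matched molecule.

    matches_per_molecule - list holding one list of core matches per molecule
    Returns (list of state counts, total number of states).
    """
    counts = [len(m) for m in matches_per_molecule if len(m) > 1]
    return counts, sum(counts)


print(states_tally([[[1, 2]], [[1, 2], [3, 4]], [[1], [2], [3]]]))  # ([2, 3], 5)
```

The total is what sizes the square state-by-state matrices used later in the optimization.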

_findBestMatches(self)

Find the best matching core for each multiply matched molecule.

Returns: list: list that contains indices of 'best matching' cores for each molecule
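With use_mm=1 the constructor selects simulated annealing. The sketch below shows the general technique for this selection problem: choose one state per molecule so that the summed pairwise score between the chosen states is maximal. It is not RGroupFinder's implementation. The function name and the parameters seed, t_start, cooling and steps are hypothetical stand-ins; in particular, how sa_seed, t_factor and tmax_mult map onto them is not documented here.

```python
import math
import random


def find_best_matches(counts, sim, seed=0, t_start=1.0, cooling=0.95, steps=2000):
    """Pick one state per molecule, maximizing the summed pairwise score.

    counts - number of states per molecule
    sim    - ns x ns matrix of pairwise state scores; states are numbered
             molecule by molecule
    Returns the chosen (per-molecule) state index for each molecule.
    """
    if not counts:
        return []
    rng = random.Random(seed)
    offsets = [sum(counts[:i]) for i in range(len(counts))]

    def score(choice):
        total = 0.0
        for a in range(len(counts)):
            for b in range(a + 1, len(counts)):
                total += sim[offsets[a] + choice[a]][offsets[b] + choice[b]]
        return total

    current = [rng.randrange(c) for c in counts]
    current_score = score(current)
    best, best_score = list(current), current_score
    temperature = t_start
    for _ in range(steps):
        # Propose changing the state of one randomly chosen molecule.
        mol = rng.randrange(len(counts))
        trial = list(current)
        trial[mol] = rng.randrange(counts[mol])
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept non-worsening moves; accept worse ones with Metropolis probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temperature = max(temperature * cooling, 1e-6)  # geometric cooling
    return best


# Three molecules with two states each; states 0, 2 and 4 are mutually similar.
keep = {0, 2, 4}
sim = [[1.0 if i in keep and j in keep and i != j else 0.0 for j in range(6)]
       for i in range(6)]
print(find_best_matches([2, 2, 2], sim, seed=1))  # [0, 0, 0]
```

Tracking the best configuration seen, rather than returning the final one, keeps a late uphill move from discarding a good solution.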

_calculateEnergyMatrix(self)

This function computes the 'energy' matrix used by the optimization procedure that finds the best matching core for each molecule.

Returns: DEE_EnergyMatrix: energy matrix

_calculateRGroupSimilarity(self)

This function calculates Tanimoto fingerprint similarities for every attachment pair defined using SMILES.

Returns: dict: dictionary of similarity scores keyed on the pair of SMILES strings

_findUniqueRGroups(self)

This function iterates over core 'states'. For each attached group, which is defined by a SMILES string, it calculates the group's fingerprint. The SMILES string is used as the key, so fingerprints are stored only for unique attachments.

Returns: dict: dictionary, where SMILES is key and FP is value
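Keying on SMILES means each distinct substituent is fingerprinted only once, however many states it appears in. A minimal sketch, assuming each state is a list of attachment points and each point a list of groups; because a real chemical fingerprint needs a cheminformatics toolkit, a hypothetical toy fingerprint (the set of character bigrams of the SMILES) stands in for it.

```python
def toy_fingerprint(smiles):
    """Hypothetical stand-in fingerprint: the set of character bigrams."""
    padded = f"^{smiles}$"
    return frozenset(padded[i:i + 2] for i in range(len(padded) - 1))


def find_unique_r_groups(states, fingerprint=toy_fingerprint):
    """Map each unique attached SMILES to its fingerprint.

    states - list of core states; each state is a list of attachment points,
             and each point is a list of groups ('null', 'hydrogen' or SMILES)
    """
    bitsets = {}
    for state in states:
        for point in state:
            for group in point:
                if group in ("null", "hydrogen") or group in bitsets:
                    continue  # no fingerprint needed, or already computed
                bitsets[group] = fingerprint(group)
    return bitsets


states = [[["hydrogen"], ["CO"]], [["CO"], ["null"]], [["Cl"]]]
print(sorted(find_unique_r_groups(states)))  # ['CO', 'Cl']
```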

_calculatePairSimilarity(self, r_group_bitset, sim_func)

This function calculates similarities for each pair of unique r-group attachments.

Parameters:

r_group_bitset (dict) - dictionary of FPs for r-groups keyed on SMILES
sim_func (function) - function used to compute the similarity between a pair of attachments; either _calculatePairSimilarityFP or _calculatePairSimilarityGeneric.

Returns: dict

dictionary of pair similarities keyed on a pair of SMILES
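Since both similarity functions share one signature, the pair loop can take either as a parameter. A hedged sketch: the function name is hypothetical, 'hydrogen' is added to the keys because key1 and key2 may be 'hydrogen' while the fingerprint dictionary is keyed on SMILES (an assumption), and each score is stored under both key orders so lookups need no sorting.

```python
from itertools import combinations_with_replacement


def calculate_pair_similarity(r_group_bitset, sim_func):
    """Return {(key1, key2): similarity} for every pair of r-group keys.

    r_group_bitset - dict of fingerprints keyed on SMILES
    sim_func       - callable(key1, key2, r_group_bitset) -> float
    """
    keys = sorted(set(r_group_bitset) | {"hydrogen"})
    similarities = {}
    for key1, key2 in combinations_with_replacement(keys, 2):
        score = sim_func(key1, key2, r_group_bitset)
        similarities[(key1, key2)] = score
        similarities[(key2, key1)] = score
    return similarities


identity = lambda a, b, _bitset: 1.0 if a == b else 0.0
sims = calculate_pair_similarity({"CO": None, "Cl": None}, identity)
print(len(sims), sims[("Cl", "hydrogen")])  # 9 0.0
```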

_calculatePairSimilarityFP(self, key1, key2, r_group_bitset)

This function calculates Tanimoto FP similarities for a pair of unique r-group attachments.

Parameters:

key1 (str) - first r-group key, which can be SMILES or 'hydrogen'
key2 (str) - second r-group key, which can be SMILES or 'hydrogen'
r_group_bitset (dict) - dict of FPs for r-groups.

Returns: float

similarity between two groups
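The Tanimoto coefficient of two fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below stores fingerprints as Python sets of on-bit positions rather than the bitset objects the method receives. How 'hydrogen', which has no fingerprint, is scored is not documented here; treating it as identical only to itself is an assumption of this sketch, and the function names are hypothetical.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient |A & B| / |A | B| for sets of on-bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def pair_similarity_fp(key1, key2, r_group_bitset):
    """Tanimoto similarity between two r-group keys.

    Assumption: 'hydrogen' scores 1.0 against itself and 0.0 against any
    SMILES group.
    """
    if key1 == "hydrogen" or key2 == "hydrogen":
        return 1.0 if key1 == key2 else 0.0
    return tanimoto(r_group_bitset[key1], r_group_bitset[key2])


bitset = {"a": {1, 2, 3}, "b": {2, 3, 4}}
print(pair_similarity_fp("a", "b", bitset))  # 0.5 (2 shared bits out of 4)
```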

_calculatePairSimilarityGeneric(self, key1, key2, r_group_bitset)

This function calculates the similarity for a pair of unique r-group attachments using a simple scheme.

Parameters:

key1 (str) - first r-group key, which can be SMILES or 'hydrogen'
key2 (str) - second r-group key, which can be SMILES or 'hydrogen'
r_group_bitset (dict) - dict of FPs for r-groups. This function does not use it; it is accepted so that the signature matches _calculatePairSimilarityFP.

Returns: float

similarity between two groups
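The "simple scheme" is not spelled out above. Purely as a hypothetical illustration, an exact-match rule fits the same signature: identical groups score 1.0 and all others 0.0. The unused r_group_bitset argument is kept so this function and the fingerprint variant are interchangeable as sim_func.

```python
def pair_similarity_generic(key1, key2, r_group_bitset=None):
    """Hypothetical simple scheme: identical groups score 1.0, others 0.0.

    r_group_bitset is accepted but ignored, for signature compatibility.
    """
    return 1.0 if key1 == key2 else 0.0


print(pair_similarity_generic("CO", "CO"), pair_similarity_generic("CO", "hydrogen"))  # 1.0 0.0
```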

_calculateStatePairScore(self, a1, a2, r_group_sim)

This function calculates the pairwise score for a pair of core states. Each state has a list of groups at each attachment point; a group is 'null', 'hydrogen', or a SMILES string. Multiple groups can be attached to the same point.

Parameters:

a1 (list) - list of attachment groups for 1st state
a2 (list) - list of attachment groups for 2nd state
r_group_sim (dict) - dictionary of Tanimoto similarities

Returns: float

pairwise score
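The scoring rule itself is not given above, so the following is a hypothetical scheme showing how the inputs fit together: attachment points are compared position by position, each point contributes the best similarity among its group pairs (which handles several groups at one point), and 'null' matches only 'null'. The function name and the handling of 'null' are assumptions.

```python
def state_pair_score(a1, a2, r_group_sim):
    """Score two core states by comparing their groups point by point.

    a1, a2      - per-attachment-point lists of groups for each state
    r_group_sim - dict of similarities keyed on (group1, group2)
    """
    total = 0.0
    for groups1, groups2 in zip(a1, a2):
        best = 0.0
        for g1 in groups1:
            for g2 in groups2:
                if "null" in (g1, g2):
                    sim = 1.0 if g1 == g2 else 0.0  # assumption: null matches only null
                else:
                    sim = r_group_sim.get((g1, g2), 0.0)
                best = max(best, sim)
        total += best
    return total


r_group_sim = {("CO", "CO"): 1.0, ("CO", "Cl"): 0.25, ("Cl", "CO"): 0.25,
               ("hydrogen", "hydrogen"): 1.0}
a1 = [["hydrogen"], ["CO"], ["null"]]
a2 = [["hydrogen"], ["Cl"], ["null"]]
print(state_pair_score(a1, a2, r_group_sim))  # 2.25
```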

_calculateSimilarityMatrix(self, ns, r_group_sim)

This function computes similarity matrix, which contains pairwise scores between core states of molecules that have multiple core matches.

Parameters:

ns (int) - total number of core states (matrix dimensions)
r_group_sim (dict) - pairwise Tanimoto similarity between pairs of attachments

Returns:

ns * ns similarity matrix
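Filling the matrix means evaluating the state pair score once per unordered pair and mirroring it, since the score is symmetric. A hedged sketch with a hypothetical signature: it takes the list of states, the owning molecule of each state, and a scoring callable. Leaving entries between states of the same molecule at 0.0 is an assumption, motivated by the fact that only one state per molecule is ever selected.

```python
def similarity_matrix(states, owners, pair_score):
    """Build the ns x ns matrix of pairwise scores between core states.

    states     - list of ns core states
    owners     - owners[i] is the molecule that state i belongs to
    pair_score - callable(a1, a2) -> float
    """
    ns = len(states)
    matrix = [[0.0] * ns for _ in range(ns)]
    for i in range(ns):
        for j in range(i + 1, ns):
            if owners[i] != owners[j]:
                score = pair_score(states[i], states[j])
                matrix[i][j] = matrix[j][i] = score  # symmetric
    return matrix


score = lambda x, y: 1.0 if x == y else 0.5
print(similarity_matrix(["a", "b", "a"], [0, 0, 1], score))
# [[0.0, 0.0, 1.0], [0.0, 0.0, 0.5], [1.0, 0.5, 0.0]]
```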

_finalizeCalculation(self)

This function adds r-group data to the Data() object after the best matching core has been found. This includes adding core bonds, properties, and information about attached groups.
