schrodinger.application.combinatorial_screen.combinatorial_screener module

This module contains the CombinatorialScreener class, which employs a heuristic approach to identify subsets of combinatorial reactants that are most likely to yield enumerated products with the highest dendritic fingerprint similarities to a query.

Basic Algorithm

For the example reaction A + B + C --> ABC, the reactants in each of the 3 groups are ranked by decreasing Tversky similarity to the query, where the Tversky weight for the reactant R is 1 and the Tversky weight for the query Q is 0. In other words,

rank_score(R, Q) = ON(R & Q) / ON(R)

Where:

ON(R & Q) = Number of ‘on’ bits shared by reactant and query

ON(R) = Number of ‘on’ bits in reactant

This quantity is maximized when R is a substructure of Q.
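
The ranking score can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a Python set of 'on' bit indices, which is a simplification for illustration and not the module's own fingerprint type. The function name rank_score mirrors the formula above, and returning 0.0 for an empty reactant fingerprint is an assumption made here to avoid division by zero.

```python
def rank_score(reactant_bits, query_bits):
    """Tversky similarity with weight 1 on the reactant and 0 on the query.

    Both arguments are iterables of 'on' bit indices (an assumed,
    simplified fingerprint representation).
    """
    reactant_bits = set(reactant_bits)
    if not reactant_bits:
        return 0.0  # assumption: score an empty fingerprint as 0
    shared = reactant_bits & set(query_bits)
    return len(shared) / len(reactant_bits)


query = {1, 4, 7, 9, 12}

# Every 'on' bit of the reactant is also set in the query.
print(rank_score({1, 4, 7}, query))       # 1.0

# Only half of the reactant's 'on' bits are set in the query.
print(rank_score({1, 4, 20, 21}, query))  # 0.5
```

The score reaches 1.0 whenever all of the reactant's bits are contained in the query, regardless of how many additional bits the query has.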

Once the reactants have been ranked, limits NA, NB, NC on the ranked lists are assigned to yield the subset S(NA, NB, NC) = A[0:NA] x B[0:NB] x C[0:NC], where [0:N] is a Python slice.

If the minimum number of enumerated products desired is min_products, the limits must be chosen such that NA * NB * NC >= min_products.
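
As a small illustration of the subset definition and the size constraint, the sketch below builds S(NA, NB, NC) from ranked lists using slices and itertools.product. The reactant labels and the helper name subset are hypothetical and exist only for this example.

```python
from itertools import product
from math import prod


def subset(ranked_groups, limits):
    """Return the combinations in S(N1, N2, ...) for the given limits.

    ranked_groups: list of lists, each already sorted by decreasing rank score.
    limits: one integer limit per group.
    """
    sliced = [group[:n] for group, n in zip(ranked_groups, limits)]
    return list(product(*sliced))


# Hypothetical ranked reactant labels for groups A, B and C.
A = ["a0", "a1", "a2"]
B = ["b0", "b1", "b2", "b3"]
C = ["c0", "c1"]

limits = (2, 3, 1)
combos = subset([A, B, C], limits)

# The number of combinations equals NA * NB * NC.
min_products = 5
print(len(combos), prod(limits) >= min_products)
```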

NA, NB, NC are arrived at by setting them to 1 and then systematically exploring larger values, with the goal of identifying combinations of reactants whose logical OR fingerprints yield the highest similarities to the query. In the case of dendritic fingerprints, the logical OR similarities correlate strongly with similarities computed from the enumerated products, so this is a good approximation that avoids enumeration of S(NA, NB, NC) as the limits are varied.

A rough outline of the procedure is as follows:

  1. Set NA = NB = NC = 1

  2. if NA * NB * NC >= min_products, we are done

  3. sim_best = 0, R_best = None

  4. for R in (A, B, C):

       NR += 1  # Temporarily add new reactant R_new

       for each (a, b, c) in S(NA, NB, NC), where R_new is in (a, b, c):

         FP_abc = FP(a) | FP(b) | FP(c)  # Logical OR fingerprint

         sim = Tanimoto(FP_abc, FP_query)

         if sim > sim_best:

           sim_best = sim

           R_best = R

       NR -= 1  # Remove new reactant

  5. NR_best += 1 # Expand limit to include best new reactant

  6. Go to step 2
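
The outline above can be sketched in plain Python as follows. This is an illustrative sketch and not the module's implementation: fingerprints are assumed to be sets of 'on' bit indices, the names tanimoto and choose_limits are hypothetical, and two details are assumptions added here so the loop always terminates (sim_best starts at -1.0 rather than 0, and groups with no remaining reactants are skipped).

```python
from itertools import product
from math import prod


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 0.0


def choose_limits(ranked_groups, query_fp, min_products):
    """Greedily grow per-group limits until their product reaches min_products.

    ranked_groups: list of lists of fingerprints (sets), each list already
    sorted by decreasing rank score.
    """
    limits = [1] * len(ranked_groups)                    # step 1
    while prod(limits) < min_products:                   # step 2
        sim_best, g_best = -1.0, None                    # step 3
        for g, group in enumerate(ranked_groups):        # step 4
            if limits[g] >= len(group):
                continue  # assumption: skip a group that is exhausted
            new_fp = group[limits[g]]  # the reactant that NR += 1 would add
            # Only combinations containing the new reactant are scored.
            others = [grp[:limits[i]]
                      for i, grp in enumerate(ranked_groups) if i != g]
            for combo in product(*others):
                fp_or = new_fp.union(*combo)  # logical OR fingerprint
                sim = tanimoto(fp_or, query_fp)
                if sim > sim_best:
                    sim_best, g_best = sim, g
        if g_best is None:
            break  # every group is exhausted; min_products is unreachable
        limits[g_best] += 1                              # step 5
    return tuple(limits)


# Toy data: two reactant groups and a query.
A = [{1, 2}, {1, 9}]
B = [{3}, {4}, {8}]
query = {1, 2, 3, 4}

print(choose_limits([A, B], query, min_products=2))  # (1, 2)
```

In the toy data, extending group B is chosen first because adding {4} to the top reactant of A gives an OR fingerprint closer to the query than adding {1, 9} from A would.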

This approach is superior to assigning equal limits, such as (10, 10, 10) if 1000 products are desired. In many cases, the algorithm finds limits that are quite ragged, such as (2, 50, 10), and the enumerated compound with the highest similarity to the query is found at some non-obvious position, such as (1, 45, 7).

Copyright Schrodinger LLC, All Rights Reserved.

class schrodinger.application.combinatorial_screen.combinatorial_screener.CombinatorialScreener(reactant_fp_files, query_smiles, max_reactants=None)

Bases: object

Identifies subsets of combinatorial reactants that are most likely to yield products with the highest dendritic Tanimoto similarities to a query.

__init__(reactant_fp_files, query_smiles, max_reactants=None)

Constructor taking the names of dendritic fingerprint files for one or more sets of reactants, the SMILES string for a query, and an optional cap on the reactant subset size.

Each fingerprint file must contain a single extra data column that holds the SMILES of the reactants. If the same fingerprint file is supplied more than once, each instance is treated as a separate reactant set, but the file is read only once, and all screens will yield an identical reactant subset size for all instances of that reactant.

Parameters:
  • reactant_fp_files (list(str)) – List of reactant fingerprint files.
  • query_smiles (str) – SMILES string for query.
  • max_reactants (int) – Maximum allowed size of each reactant subset when a query is screened. The default is MAX_COMBOS ** (1/N), where N is the number of reactant groups.
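
Reading the default as the N-th root of MAX_COMBOS, the arithmetic can be sketched as below. The value assigned to MAX_COMBOS here is hypothetical (the actual constant is not stated in this documentation), and the rounding to an integer and the helper name default_max_reactants are assumptions made for illustration.

```python
MAX_COMBOS = 10**9  # hypothetical value, for illustration only


def default_max_reactants(num_groups, max_combos=MAX_COMBOS):
    """N-th root of max_combos, rounded to the nearest integer (assumed)."""
    return int(round(max_combos ** (1 / num_groups)))


# With three reactant groups, each subset is capped so that the full
# product of the caps does not exceed roughly MAX_COMBOS combinations.
print(default_max_reactants(3))  # 1000

# Note the parentheses: without them, Python evaluates ** before /.
print(MAX_COMBOS ** 1 / 3 == MAX_COMBOS / 3)  # True
```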

screen(min_products)

Performs a similarity screen against the query to determine the number of reactants in each sorted group that are required to make the minimum number of enumerated products. These reactant counts are stored in self.reactant_limits.

Parameters:
  • min_products (int) – Minimum number of theoretical products.