schrodinger.application.scaffold_enumeration.cxsmiles module

Functions to parse “repeating units” and “position variant bonds” from CX SMILES “features” text. The parsers are simple and not particularly robust, but probably good enough for machine-generated CX SMILES.

class schrodinger.application.scaffold_enumeration.cxsmiles.MCG(atoms, center)

Bases: tuple

__contains__

Return key in self.

__init__

Initialize self. See help(type(self)) for accurate signature.

__len__

Return len(self).

atoms

List of atom indices ([int]).

center

Central atom index (int).

count(value) → integer

Return number of occurrences of value.

index(value[, start[, stop]]) → integer

Return first index of value. Raises ValueError if the value is not present.

class schrodinger.application.scaffold_enumeration.cxsmiles.SRU(atoms, subscript, superscript)

Bases: tuple

__contains__

Return key in self.

__init__

Initialize self. See help(type(self)) for accurate signature.

__len__

Return len(self).

atoms

List of atom indices ([int]).

count(value) → integer

Return number of occurrences of value.

index(value[, start[, stop]]) → integer

Return first index of value. Raises ValueError if the value is not present.

subscript

SRU’s subscript (str).

superscript

SRU’s superscript (str).
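
Both classes derive from tuple and expose named fields, so they behave like named tuples. The stand-ins below are a sketch built with collections.namedtuple, assuming the field order shown in the class signatures; they are not the module's own definitions.

```python
from collections import namedtuple

# Stand-ins that mirror the documented signatures (assumption: the real
# classes are named tuples with exactly these fields, in this order).
MCG = namedtuple("MCG", ["atoms", "center"])
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])

ring = MCG(atoms=[7, 6, 5, 4, 3], center=0)
unit = SRU(atoms=[0, 1, 2], subscript="3-6", superscript="eu")

# Fields are reachable by name or by tuple unpacking.
atoms, center = ring
print(ring.center, len(ring), unit.count("eu"))  # prints: 0 2 1
```

The inherited tuple methods (__len__, count, index) operate on the fields, not on the atom list: len(ring) is the number of fields.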

schrodinger.application.scaffold_enumeration.cxsmiles.parse_mcg(text, pos, accum)

Parses “multi-center SGroup” data from CX SMILES “features”.

<quote>

The multicenter atom indexes written after “m:” followed by a colon character and the indexes of the atoms which forms the given SGroup separated by “.”. The SGroups are separated by commas.

Example: “m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1”

</quote>

Parameters:
  • text (str) – CX SMILES “features” string.
  • pos (int) – Index of the character in text right after “m:”.
  • accum (list) – List to which the “SGroups” are to be appended.
Returns:

Index of the first unconsumed character in text.

Return type:

int
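
The grammar quoted above can be illustrated with a minimal sketch. The names parse_mcg_sketch and _GROUP and the stand-in MCG tuple are hypothetical; this is not the module's implementation, and the real parse_mcg may handle malformed input and the trailing separator differently.

```python
import re
from collections import namedtuple

# Stand-in for the documented MCG tuple (assumption, not the module's class).
MCG = namedtuple("MCG", ["atoms", "center"])

# One group: "<center>:<atom>.<atom>..." with integer indices.
_GROUP = re.compile(r"(\d+):(\d+(?:\.\d+)*)")


def parse_mcg_sketch(text, pos, accum):
    """Append MCGs found at text[pos:] to accum; return the next position."""
    while True:
        match = _GROUP.match(text, pos)
        if match is None:
            break  # whatever starts here belongs to another feature
        center = int(match.group(1))
        atoms = [int(a) for a in match.group(2).split(".")]
        accum.append(MCG(atoms=atoms, center=center))
        pos = match.end()
        if pos < len(text) and text[pos] == ",":
            pos += 1  # skip the separator and try the next group
        else:
            break
    return pos


features = "m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1"
groups = []
end = parse_mcg_sketch(features, len("m:"), groups)
print(groups)
print(features[end:])  # prints: C:0.0,2.1
```

In this sketch the comma after the last group is consumed, so the returned index points at the start of the next feature (“C:” in the quoted example).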

schrodinger.application.scaffold_enumeration.cxsmiles.parse_sru(text, pos, accum)

Parses “SRU” data from CX SMILES “features”.

<quote>

Polymer Sgroups

Each Sgroup exported after “Sg:” in fields separated by a colon. Fields are:

  1. Sgroup type keyword. Valid keywords are:

     +---------+-------------+
     | Keyword | Sgroup Type |
     +---------+-------------+
     | n       | SRU         |
     +---------+-------------+

  2. Atom indexes separated with commas.
  3. Subscript of the Sgroup. If the subscript equals the keyword of the Sgroup this field can be empty. Escaped field.
  4. Superscript of the Sgroup. In the superscript only connectivity and flip information is allowed. This field can be empty. Escaped field.
  5. Head crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.
  6. Tail crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.
  7. If the c export option is present then bracket orientation, bracket type followed by the coordinates (4 pair, separated with commas). Bracket orientation can be s or d (single or double), bracket type can be b,c,r,s for braces, chevrons, round and square, respectively. The brackets are written between parentheses and separated with semicolons.

A colon is needed after the last non-empty field.

If one needs to retain not only the chemically relevant information, but the whole structure (as drawn), then the c export option should be used.

Examples:

CCCC |Sg:gen:0,1,2:|

CCCC |Sg:n:0,1,2:3-6:eu|

*CC(*)C(*)N* |Sg:n:6,1,2,4::hh&#44;f:6,0,:4,2,|

</quote>

In addition:

<quote>

Escaping

In some places special characters are escaped to ‘&#code’ where code is the ASCII code of the special character.

Not escaped characters in fields of Sgroups and DataSgroups: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.

Not escaped characters in atom property keys and values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.

Not escaped characters in atom labels and atom values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#%()[]./?-+*^_~=,:’ and the space character.

</quote>

This subroutine recognizes only the atoms (2), subscript (3), and superscript (4) fields.

Parameters:
  • text (str) – CX SMILES “features” string.
  • pos (int) – Index of the character in text right after “Sg:n:”.
  • accum (list) – List to which the “SGroups” are to be appended.
Returns:

Index of the first unconsumed character in text.

Return type:

int
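
As an illustration of the field layout and escaping rules quoted above, the sketch below builds an SRU from the text of a single Sgroup that follows “Sg:n:”. The helpers unescape and sru_from_fields and the stand-in SRU tuple are hypothetical; unlike parse_sru, the sketch does not locate the end of the Sgroup within a longer “features” string, and substituting “n” for an empty subscript is this sketch's reading of the quoted rule, not documented behavior of the module.

```python
import re
from collections import namedtuple

# Stand-in for the documented SRU tuple (assumption, not the module's class).
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])


def unescape(field):
    """Replace '&#code' (optionally followed by ';') with chr(code)."""
    return re.sub(r"&#(\d+);?", lambda m: chr(int(m.group(1))), field)


def sru_from_fields(body):
    """Build an SRU from fields 2 to 4 of one Sgroup; ignore later fields."""
    fields = body.split(":")
    fields += [""] * (3 - len(fields))  # pad missing trailing fields
    atoms = [int(a) for a in fields[0].split(",") if a]
    subscript = unescape(fields[1]) or "n"  # empty means "same as keyword"
    superscript = unescape(fields[2])
    return SRU(atoms, subscript, superscript)


print(sru_from_fields("0,1,2:3-6:eu"))
print(sru_from_fields("6,1,2,4::hh&#44;f:6,0,:4,2,"))
```

The second call exercises both an empty subscript and an escaped comma (“&#44;”) in the superscript; the head and tail crossing bond fields are split off but discarded.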

schrodinger.application.scaffold_enumeration.cxsmiles.parse_cx_extensions(text)

Parses: (a) multi-center groups and (b) SRUs.

Parameters:
  • text (str) – CX extensions to be parsed.
Returns:

Tuple of lists that hold the MCGs and SRUs.

Return type:

(list(MCG), list(SRU))

schrodinger.application.scaffold_enumeration.cxsmiles.mol_from_cxsmiles(text, parseName=True)

Attempts to instantiate rdkit.Chem.Mol from text, assuming that text is CX SMILES.

Parameters:
  • text (str) – CX SMILES string.
  • parseName (bool) – Whether to parse the molecule title.
Returns:

Molecule or None

Return type:

rdkit.Chem.Mol or NoneType
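
For orientation, a CX SMILES line is commonly laid out as a SMILES string, an optional “features” block between “|” characters, and an optional title. The sketch below splits such a line into those three parts; split_cxsmiles is a hypothetical helper, and how mol_from_cxsmiles actually separates them is not specified here.

```python
def split_cxsmiles(line):
    """Split 'SMILES |features| title' into (smiles, features, title)."""
    smiles, bar, rest = line.partition("|")
    if not bar:
        # No extensions: anything after the first space is taken as the title.
        smiles, _, title = line.strip().partition(" ")
        return smiles, "", title.strip()
    # Assumption: the features block itself contains no literal '|'.
    features, _, title = rest.partition("|")
    return smiles.strip(), features, title.strip()


print(split_cxsmiles("CCCC |Sg:n:0,1,2:3-6:eu| polymer-1"))
print(split_cxsmiles("CCO ethanol"))
```

The middle element of the first result is the kind of “features” string that the parsers in this module consume.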