schrodinger.application.scaffold_enumeration.cxsmiles module
The functions in this module parse “repeating units” and “position variant bonds” from CX SMILES “features” text. They are fairly naive, but probably good enough for machine-generated CX SMILES.
-
class schrodinger.application.scaffold_enumeration.cxsmiles.MCG(atoms, center)
Bases: tuple
- __contains__ – Return key in self.
- __init__ – Initialize self. See help(type(self)) for accurate signature.
- __len__ – Return len(self).
- atoms – List of atom indices ([int]).
- center – Central atom index (int).
- count(value) → integer – Return number of occurrences of value.
- index(value[, start[, stop]]) → integer – Return first index of value. Raises ValueError if the value is not present.
-
-
class schrodinger.application.scaffold_enumeration.cxsmiles.SRU(atoms, subscript, superscript)
Bases: tuple
- __contains__ – Return key in self.
- __init__ – Initialize self. See help(type(self)) for accurate signature.
- __len__ – Return len(self).
- atoms – List of atom indices ([int]).
- count(value) → integer – Return number of occurrences of value.
- index(value[, start[, stop]]) → integer – Return first index of value. Raises ValueError if the value is not present.
- subscript – SRU’s subscript (str).
- superscript – SRU’s superscript (str).
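Both classes are tuple subclasses with named fields, so they presumably behave like named tuples. The sketch below uses hypothetical stand-ins built with collections.namedtuple (not the module's actual definitions) to show how such records can be created and read, with the field order taken from the signatures above.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; the real classes are
# defined in schrodinger.application.scaffold_enumeration.cxsmiles.
MCG = namedtuple("MCG", ["atoms", "center"])
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])

ring = MCG(atoms=[7, 6, 5, 4, 3], center=0)
unit = SRU(atoms=[0, 1, 2], subscript="3-6", superscript="eu")

# Fields can be read by name ...
center = ring.center
# ... or unpacked by position, like any tuple.
atoms, subscript, superscript = unit
```

The inherited tuple methods (count, index, len) operate on the fields themselves, not on the atom indices stored inside atoms.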
-
-
schrodinger.application.scaffold_enumeration.cxsmiles.parse_mcg(text, pos, accum)
Parses “multi-center SGroup” data from CX SMILES “features”.
<quote>
The multicenter atom indexes written after “m:” followed by a colon character and the indexes of the atoms which forms the given SGroup separated by “.”. The SGroups are separated by commas.
Example: “m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1”
</quote>
Parameters:
- text (str) – CX SMILES “features” string.
- pos (int) – Index of the character in text right after “m:”.
- accum (list) – List to which the “SGroups” are to be appended.
Returns: Index of the first unconsumed character in text.
Return type: int
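The quoted format can be illustrated with a minimal sketch. This is not the module's implementation: parse_mcg_sketch and the MCG stand-in are hypothetical names, and the choice to consume the comma that follows each group is an assumption about what "first unconsumed character" means.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the module's MCG class.
MCG = namedtuple("MCG", ["atoms", "center"])

# One group: "<center>:<atom>.<atom>...", e.g. "0:7.6.5.4.3".
_GROUP = re.compile(r"(\d+):(\d+(?:\.\d+)*)")


def parse_mcg_sketch(text, pos, accum):
    """Append MCG groups found at text[pos:] to accum; return the new position."""
    while True:
        match = _GROUP.match(text, pos)
        if match is None:
            break  # whatever follows is not a multi-center group
        center = int(match.group(1))
        atoms = [int(a) for a in match.group(2).split(".")]
        accum.append(MCG(atoms=atoms, center=center))
        pos = match.end()
        # Assumption: a comma separates groups and is consumed here.
        if pos < len(text) and text[pos] == ",":
            pos += 1
        else:
            break
    return pos


features = "m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1"
groups = []
end = parse_mcg_sketch(features, len("m:"), groups)
```

In this sketch, parsing stops at the "C:" feature because it does not start with a numeric center index.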
-
schrodinger.application.scaffold_enumeration.cxsmiles.parse_sru(text, pos, accum)
Parses “SRU” data from CX SMILES “features”.
<quote>
Polymer Sgroups Each Sgroup exported after “Sg:” in fields separated by a colon. Fields are:
1. Sgroup type keyword. Valid keywords are:

   +---------+-------------+
   | Keyword | Sgroup Type |
   +---------+-------------+
   | n       | SRU         |
   +---------+-------------+

2. Atom indexes separated with commas.
3. Subscript of the Sgroup. If the supscript equals the keyword of the Sgroup this field can be empty. Escaped field.
4. Superscript of the Sgroup. In the superscript only connectivity and flip information is allowed. This field can be empty. Escaped field.
5. Head crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.
6. Tail crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.
7. If the c export option is present then bracket orientation, bracket type followed by the coordinates (4 pair, separated with commas). Bracket orientation can be s or d (single or double), bracket type can be b,c,r,s for braces, chevrons, round and square, respectively. The brackets are written between parentheses and separated with semicolons.
A colon is needed after the last non-empty field.
If one needs to retain not only the chemically relevant information, but the whole structure (as drawn), then the c export option should be used.
Examples:
CCCC |Sg:gen:0,1,2:|
CCCC |Sg:n:0,1,2:3-6:eu|
CC()C(*)N* |Sg:n:6,1,2,4::hh,f:6,0,:4,2,|
</quote>
In addition:
<quote>
Escaping
In some places special characters are escaped to ‘&#code’ where code is the ASCII code of the special character.
Not escaped characters in fields of Sgroups and DataSgroups: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.
Not escaped characters in atom property keys and values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.
Not escaped characters in atom labels and atom values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#%()[]./?-+*^_~=,:’ and the space character.
</quote>
This subroutine recognizes only fields 2 (atoms), 3 (subscript), and 4 (superscript).
Parameters:
- text (str) – CX SMILES “features” string.
- pos (int) – Index of the character in text right after “Sg:n:”.
- accum (list) – List to which the “SGroups” are to be appended.
Returns: Index of the first unconsumed character in text.
Return type: int
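A rough sketch of reading the three recognized fields is shown here. It is an illustration under assumptions, not the module's code: parse_sru_sketch, unescape and the SRU stand-in are hypothetical names, escapes are assumed to look like "&#44" with an optional trailing semicolon, and the head and tail crossing bond fields are left unconsumed.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the module's SRU class.
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])

_ATOMS = re.compile(r"\d+(?:,\d+)*")   # field 2: comma-separated atom indexes
_FIELD = re.compile(r"[^:,|]*")        # fields 3 and 4: possibly empty
_ESCAPE = re.compile(r"&#(\d+);?")     # assumption: trailing ";" is optional


def unescape(field):
    """Replace each '&#code' escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), field)


def parse_sru_sketch(text, pos, accum):
    """Append one SRU read from text[pos:] to accum; return the new position."""
    match = _ATOMS.match(text, pos)
    if match is None:
        raise ValueError("expected atom indexes at position %d" % pos)
    atoms = [int(a) for a in match.group(0).split(",")]
    pos = match.end()
    fields = []
    for _ in range(2):  # subscript, then superscript
        if pos < len(text) and text[pos] == ":":
            field = _FIELD.match(text, pos + 1)
            fields.append(unescape(field.group(0)))
            pos = field.end()
        else:
            fields.append("")  # field omitted
    accum.append(SRU(atoms, fields[0], fields[1]))
    return pos


units = []
end = parse_sru_sketch("Sg:n:0,1,2:3-6:eu", len("Sg:n:"), units)
```

Because commas inside the subscript and superscript are escaped, an unescaped comma or colon can safely be treated as the end of a field in this sketch.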
-
schrodinger.application.scaffold_enumeration.cxsmiles.parse_cx_extensions(text)
Parses: (a) multi-center groups and (b) SRUs.
Parameters:
- text (str) – CX extensions to be parsed.
Returns: Tuple of lists that hold the MCGs and SRUs.
Return type: (list(MCG), list(SRU))
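To make the shape of the return value concrete, here is a self-contained sketch that collects both kinds of groups with regular expressions. The function name, the stand-in classes, and the made-up features string are all hypothetical; the real function may scan the text differently, and this sketch leaves escaped characters undecoded.

```python
import re
from collections import namedtuple

# Hypothetical stand-ins for the module's classes.
MCG = namedtuple("MCG", ["atoms", "center"])
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])

# An "m:" feature holds one or more "center:atom.atom..." groups.
_MCG_GROUP = r"\d+:\d+(?:\.\d+)*"
_MCG_FEATURE = re.compile(r"(?:^|,)m:(%s(?:,%s)*)" % (_MCG_GROUP, _MCG_GROUP))

# An "Sg:n:" feature holds atoms, then optional subscript and superscript.
_SRU_FEATURE = re.compile(
    r"(?:^|,)Sg:n:(\d+(?:,\d+)*)(?::([^:,|]*))?(?::([^:,|]*))?"
)


def parse_cx_extensions_sketch(text):
    """Return (list of MCG, list of SRU) found in a CX extensions string."""
    mcgs = []
    for feature in _MCG_FEATURE.finditer(text):
        for group in feature.group(1).split(","):
            center, atoms = group.split(":")
            mcgs.append(MCG([int(a) for a in atoms.split(".")], int(center)))
    srus = []
    for feature in _SRU_FEATURE.finditer(text):
        atoms = [int(a) for a in feature.group(1).split(",")]
        srus.append(SRU(atoms, feature.group(2) or "", feature.group(3) or ""))
    return mcgs, srus


# Made-up features string combining both feature kinds.
mcgs, srus = parse_cx_extensions_sketch(
    "m:0:7.6.5.4.3,2:12.11.10.9.8,Sg:n:0,1,2:3-6:eu"
)
```

Either list is simply empty when the corresponding feature is absent.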
-
schrodinger.application.scaffold_enumeration.cxsmiles.mol_from_cxsmiles(text, parseName=True)
Tries to instantiate rdkit.Chem.Mol from text, assuming that the latter is CX SMILES.
Parameters:
- text (str) – CX SMILES string.
- parseName (bool) – Parse molecule title?
Returns: Molecule or None
Return type: rdkit.Chem.Mol or NoneType