Canvas 1.4 Python API
Introduction
This section describes the Python application programming interface that accompanies Canvas version 1.5, which is part of the Schrödinger Suite 2012 release.
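The Canvas Python API consists of a rich library of object-oriented tools for developing custom cheminformatics applications. The functionality documented here includes chemical structure storage and I/O, substructure matching, bitset manipulation and comparison, and fingerprint generation.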
Each Python application you create must import the canvas module:
import schrodinger.application.canvas.base as canvas
If you wish to use Structure objects (sometimes
also referred to as “mmct” objects, for the underlying C library
implementation) and convert back and forth between Canvas ChmMol
objects, you will need additional imports:
from schrodinger import structure
from schrodinger.infra.canvas import ChmMmctAdaptor
Furthermore, you must have a valid CANVAS_FULL license, and the application must successfully check out that license before the Canvas APIs can be used. The following checkout procedure is recommended:
import sys

try:
    lic = canvas.ChmLicenseFull()
except Exception:
    sys.stderr.write("Unable to checkout CANVAS_FULL license\n")
    sys.exit(1)
if not lic.isValid():
    sys.stderr.write("CANVAS_FULL license is not valid\n")
    sys.exit(1)
To ensure that all environment variables and paths are set correctly, a Canvas Python application should be invoked as follows:
$SCHRODINGER/run <app_name> [options]
where $SCHRODINGER points to the directory in which the Schrödinger software is installed.
Constructors and Static Methods
Most Canvas objects are created automatically as a return value from a member function invoked on another object. For example, if mol is a ChmMol object, mol.getAtom(i) returns a ChmAtom object.
However, there are certain objects that you will create explicitly through a
constructor. For example:
query = canvas.ChmQuery("c1nc[c,n]cc1")
bitset = canvas.ChmBitset(1024)
There are also cases where you will create an object through a static method of that class, such as:
mol = canvas.ChmMol.fromSMILES("Nc1ccccc1")
query = canvas.ChmQuery.fromMDLFile("query.mol")
In the material that follows, any applicable constructors are documented before other methods of the class, and all static methods are noted as such.
Iterable Lists
Lists returned by Canvas APIs behave much like ordinary Python lists, but they are not entirely equivalent, and you cannot supply an ordinary Python list as an argument to an API that calls for a Canvas-style list. Another important difference is that after iterating through a Canvas list, you must rewind it before you can iterate through it again. For example:
atoms = mol.getAtoms()
print "First pass..."
for atom in atoms:
    print "Label =", atom.getLabel()
print "Second pass..."
atoms.rewind()
for atom in atoms:
    print "Label =", atom.getLabel()
# Note that each "for" loop above is equivalent to the following
# (the list must be rewound first, because it has already been traversed):
atoms.rewind()
while atoms.hasNext():
    atom = atoms.next()
    print "Label =", atom.getLabel()
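The rewind behavior can be tried without a Canvas installation. The sketch below is a hypothetical pure-Python stand-in (the class name RewindableList is not part of Canvas) that assumes iteration simply continues from the current position, as the description above implies. It only illustrates why a second pass yields nothing until rewind() is called.

```python
class RewindableList(object):
    """Hypothetical stand-in mimicking the Canvas list protocol (not a Canvas class)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        # True while unvisited items remain.
        return self._pos < len(self._items)

    def next(self):
        # Return the current item and advance the cursor.
        item = self._items[self._pos]
        self._pos += 1
        return item

    def rewind(self):
        # Move the cursor back to the first item.
        self._pos = 0

    def size(self):
        return len(self._items)

    def __iter__(self):
        # Iteration resumes from the cursor; it does not rewind on its own.
        while self.hasNext():
            yield self.next()


labels = RewindableList(["C0", "C1", "N2"])
first_pass = [label for label in labels]
exhausted_pass = [label for label in labels]  # empty: the list was not rewound
labels.rewind()
second_pass = [label for label in labels]
```

Here exhausted_pass is empty, while second_pass matches first_pass because rewind() was called in between.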
Iterator Methods
class schrodinger.application.canvas.base.ChmIterator

- __getitem__(i): The ith item in the list, where i runs from 0 to size()-1. The legality of i is not checked, so if the supplied value is out of range, a segmentation fault can occur.
- at(i): The ith item in the list, where i runs from 0 to size()-1. If an illegal value is supplied, an exception is thrown.
- hasNext(): True if more items remain in the list.
- next(): The next item in the list.
- rewind(): Rewinds to the beginning of the list.
- size(): The total number of items in the list.
Input/Output Streams
Certain Canvas APIs operate on an input or output stream that is attached to a file. If the input and output file names are inFile and outFile, respectively, the input/output stream objects may be created as follows:
inStream = canvas.ifstream(inFile)
outStream = canvas.ofstream(outFile)
Exceptions
When an error occurs within a Canvas constructor or method, a RuntimeError exception with an informative error message is normally thrown. To handle the exception and display the error message, the following approach may be used:
import sys

try:
    pass  # call some Canvas function here
except Exception as canvasErr:
    sys.stderr.write(str(canvasErr) + "\n")
Chemical Structure APIs
The classes and methods in this section support chemical structure storage and I/O.
Atoms
class schrodinger.application.canvas.base.ChmAtom

Describes a single atom in a ChmMol.

- findNeighbor(otherAtom): Loops over the bonds to the current ChmAtom and returns the position (0, 1,…) at which the bond to the supplied ChmAtom is found.
- firstNeighbor(): The first ChmAtom bonded to the current ChmAtom.
- getAtomicNumber(): Atomic number.
- getBond(otherAtom): The ChmBond between the current ChmAtom and the supplied ChmAtom.
- getBond(i): The ChmBond between the current ChmAtom and the ith ChmAtom to which it is bonded. All atoms (explicit + implicit) are considered, so i ranges from 0 to getBondCount(True)-1.
- getBondCount(wantImplicitHydrogens=False): Number of atoms bonded to the current ChmAtom. The Boolean wantImplicitHydrogens is False by default and determines whether bonds to implicit hydrogens should be counted.
- getBonds(wantImplicitHydrogens=False): Iterable list of bonds (as ChmBond objects) made by the current ChmAtom. The Boolean wantImplicitHydrogens is False by default and determines whether bonds to implicit hydrogens should be included.
- getFormalCharge(): Integer formal charge.
- getHeavyBondCount(): Number of heavy atoms bonded to the current ChmAtom.
- getHeavyBonds(): Iterable list of bonds (as ChmBond objects) made to heavy atoms.
- getHeavyNeighbors(): Iterable list of heavy atoms (as ChmAtom objects) bonded to the current ChmAtom.
- getHybridization(): Hybridization as an integer: 0 → unknown, 1 → sp, 2 → sp2, 3 → sp3, 4 → sp3d, 5 → sp3d2. The return values are also available as symbolic constants in schrodinger.application.canvas.base: HybridizationUnknown → 0, Sp → 1, Sp2 → 2, Sp3 → 3, Sp3d → 4, and Sp3d2 → 5.
- getHydrogenCount(): Number of hydrogens (explicit + implicit) bonded to the current ChmAtom.
- getImplicitHydrogenCount(): Number of implicit hydrogens bonded to the current ChmAtom.
- getImplicitNeighbors(): Iterable list of implicit hydrogens (as ChmAtom objects) bonded to the current ChmAtom.
- getLabel(): A string that contains a label of the form <symbol><number>, where <symbol> is the elemental symbol, and <number> is the zero-based atom number (0, 1,…).
- getMolIndex(): Index (0, 1,…) of the current ChmAtom in its parent ChmMol.
- getNeighbors(wantImplicitHydrogens=False, wantExplicitHydrogens=True): Iterable list of atoms (as ChmAtom objects) bonded to the current ChmAtom. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be included. The Boolean wantExplicitHydrogens is True by default and determines whether explicit hydrogens should be included.
- getRingSize(): Size of the smallest ring that contains the current ChmAtom.
- getSymbol(): Elemental symbol.
- getTotalValence(): Total valence, which is the sum of the bond orders over all bonds made by the current ChmAtom. For this computation, single → 1, double → 2, triple → 3, aromatic → 1.5.
- getUserData(): An integer-valued piece of data associated with the atom.
- getX(): X coordinate.
- getY(): Y coordinate.
- getZ(): Z coordinate.
- hasCoords(): True if the ChmAtom has coordinates.
- isAromatic(): True if in an aromatic ring.
- isBondedTo(otherAtom): True if bonded to the supplied ChmAtom.
- isCarbon(): True if carbon.
- isHalogen(): True if F, Cl, Br or I.
- isHeavy(): True if not hydrogen.
- isHydrogen(): True if hydrogen.
- isInRing(): True if in a ring.
- isNitrogen(): True if nitrogen.
- isOxygen(): True if oxygen.
- isSulfur(): True if sulfur.
- isTerminal(): True if terminal.
- setPartialCharge(q): Sets the partial charge.
- setUserData(value): Sets an integer-valued piece of data to associate with the atom.
- setX(x): Sets the X coordinate.
- setY(y): Sets the Y coordinate.
- setZ(z): Sets the Z coordinate.
Bonds
class schrodinger.application.canvas.base.ChmBond

Describes a bond between two ChmAtom objects.

- atom1(): First ChmAtom in the ChmBond.
- atom2(): Second ChmAtom in the ChmBond.
- getBondLength(): Bond length in angstroms.
- getMolIndex(): Index (0, 1,…) of the current ChmBond in its parent ChmMol.
- getOrder(): Bond order. 0 → unknown, 1 → single, 2 → double, 3 → triple, 4 → aromatic, 5 → Kekulized single, 6 → Kekulized double. The return values are also available as symbolic constants in schrodinger.application.canvas.base: OrderUnknown → 0, Single → 1, Double → 2, Triple → 3, Aromatic → 4, KekulizedSingle → 5, and KekulizedDouble → 6.
- getRingSize(): Size of the smallest ring that contains the current ChmBond.
- getUserCode(): A non-negative integer-valued code associated with the bond.
- isAromatic(): True if in an aromatic ring.
- isInRing(): True if in a ring.
- isTerminal(): True if terminal.
- mate(atom): Given one ChmAtom in the current ChmBond, this function returns the other ChmAtom.
- setUserCode(code): Sets a non-negative integer-valued code to associate with the bond.
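The total valence rule stated for getTotalValence (single → 1, double → 2, triple → 3, aromatic → 1.5) combines naturally with the integer codes documented for getOrder. The following is a minimal pure-Python sketch of that arithmetic. The function name total_valence and the lookup table are illustrative and not part of Canvas, and the Kekulized codes (5 and 6) are left out because their contribution is not stated above.

```python
# Bond-order codes as documented for ChmBond.getOrder().
SINGLE, DOUBLE, TRIPLE, AROMATIC = 1, 2, 3, 4

# Contribution of each bond-order code to the total valence,
# following the rule quoted for ChmAtom.getTotalValence().
VALENCE_CONTRIBUTION = {SINGLE: 1.0, DOUBLE: 2.0, TRIPLE: 3.0, AROMATIC: 1.5}


def total_valence(bond_orders):
    """Sum the valence contributions of the bond-order codes around one atom."""
    return sum(VALENCE_CONTRIBUTION[order] for order in bond_orders)


# An aromatic CH carbon: two aromatic ring bonds plus one bond to hydrogen.
aromatic_carbon = total_valence([AROMATIC, AROMATIC, SINGLE])
# A carbonyl carbon: one double bond and two single bonds.
carbonyl_carbon = total_valence([DOUBLE, SINGLE, SINGLE])
```

Both example atoms come out at a total valence of 4.0, as expected for carbon.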
Molecule Conversion
class schrodinger.application.canvas.base.ChmMmctAdaptor

Converts between ChmMol and mmct (Structure) objects.

- create(ctHandle[, stereoTreatment[, allHydrogensAsImplicit[, wantProperties]]]): Converts an mmct handle to a ChmMol, returning a new ChmMol object. The ctHandle argument can be accessed via the handle attribute of a Structure object.
  Parameters:
  - stereoTreatment (int) – Controls how stereochemistry is transferred. Default is ChmMmctAdaptor.NoStereo. See Stereochemistry Constants for other legal values.
  - allHydrogensAsImplicit (bool) – Determines whether all hydrogens should be stored as implicit hydrogens. Default is False.
  - wantProperties (bool) – Determines whether properties should be copied. Default is True.
- create(mol[, wantProperties]): Converts a ChmMol to an mmct handle. Returns an integer handle, which can be passed to the Structure constructor to create a Structure object.
  Parameters:
  - wantProperties (bool) – Determines whether properties should be copied. Default is True.

Stereochemistry Constants
- ChmMmctAdaptor.NoStereo: Ignore stereochemistry (0).
- ChmMmctAdaptor.StereoFromGeometry: Use the mmstereo library to assign stereochemistry from the 3D structure (1).
- ChmMmctAdaptor.StereoFromGeometry_Safe: As above, but ignore mmstereo assignments that Canvas doesn’t agree with (2).
- ChmMmctAdaptor.StereoFromAnnotation: Use existing mmstereo annotations (3).
- ChmMmctAdaptor.StereoFromAnnotation_Safe: As above, but ignore annotations that Canvas doesn’t agree with (4).
- ChmMmctAdaptor.StereoFromAnnotationAndGeometry: Use existing mmstereo annotations, with assignment from the 3D structure as a backup for unspecified stereochemistry (5).
- ChmMmctAdaptor.StereoFromAnnotationAndGeometry_Safe: As above, but ignore annotations/assignments that Canvas doesn’t agree with (6).
Molecules
class schrodinger.application.canvas.base.ChmMol

Describes a chemical structure.

- static fromMDL(molString[, autoName]): Static method that creates a ChmMol from a MOL file string or a single-structure SD file string.
  Parameters:
  - autoName (bool) – Determines whether a name is generated automatically with each call to this function (“Molecule1”, “Molecule2”, etc.). Default is False.
- static fromSMILES(smiles[, autoName]): Static method that creates a ChmMol from a SMILES string.
  Parameters:
  - autoName (bool) – Determines whether a name is generated automatically with each call to this function (“Molecule1”, “Molecule2”, etc.). Default is False.
- getAtom(i): The ith ChmAtom in the ChmMol. All atoms (explicit and implicit) are considered, so i ranges from 0 to getAtomCount(True)-1.
- getAtomCount(wantImplicitHydrogens=False): Number of atoms in the ChmMol. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be counted.
- getAtoms(wantImplicitHydrogens=False): Iterable list of atoms (as ChmAtom objects) in the ChmMol. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be included.
- getAtoms(mask): Iterable list of a subset of atoms in the ChmMol. The argument mask is a ChmBitset object, with “on” bits for the subset of atoms in question. The length of mask should be getAtomCount(False).
- getBond(i): The ith ChmBond in the ChmMol. All atoms (explicit and implicit) are considered, so i ranges from 0 to mol.getBondCount(True)-1.
- getBondCount(wantImplicitHydrogens=False): Number of bonds in the ChmMol. The Boolean wantImplicitHydrogens is False by default and determines whether bonds to implicit hydrogens should be counted.
- getBonds(wantImplicitHydrogens=False): Iterable list of bonds (as ChmBond objects) in the ChmMol. The Boolean wantImplicitHydrogens is False by default and determines whether bonds to implicit hydrogens should be included.
- getBonds(bondMask): Iterable list of a subset of bonds in the ChmMol. The argument bondMask is a ChmBitset object, with “on” bits for the subset of bonds in question.
- getBondsForAtoms(atomMask): Iterable list of bonds that connect a subset of atoms. The argument atomMask is a ChmBitset with “on” bits for a subset of atoms in the structure. The atoms need not form a contiguous substructure.
- getConnectionTable(wantImplicitHydrogens=False): Connection table: 1 → bonded, 0 → not bonded. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be included. This method returns a ChmSymmetricMatrix. Use matrix.getItem(i, j) on the returned matrix to get the connectivity between atoms i and j. Atom numbers i and j run from 0 to mol.getAtomCount(wantImplicitHydrogens)-1.
- getConnectionTableWithBondOrders(wantImplicitHydrogens=False): Analogous to getConnectionTable, except that matrix.getItem(i, j) returns the bond order if atoms i and j are bonded.
- getDistanceMatrix(wantImplicitHydrogens=False): Shortest path distance matrix. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be included. This method returns a ChmSymmetricMatrix. Use matrix.getItem(i, j) on the returned matrix to get the distance between atoms i and j. Atom numbers i and j run from 0 to mol.getAtomCount(wantImplicitHydrogens)-1.
- getDistanceMatrix3D(wantImplicitHydrogens=False): Three-dimensional distance matrix. The Boolean wantImplicitHydrogens is False by default and determines whether implicit hydrogens should be included. This method returns a ChmSymmetricMatrix. Use matrix.getItem(i, j) on the returned matrix to get the distance between atoms i and j. Atom numbers i and j run from 0 to mol.getAtomCount(wantImplicitHydrogens)-1.
- getHeavyAtomCount(): Number of heavy atoms.
- getMW(): Molecular weight.
- getName(): Structure name.
- getNetCharge(): Net charge on the structure.
- getProperties(): Name → value mappings of all stored properties. May be coerced into an ordinary Python dictionary as follows:
    props = dict(mol.getProperties())
- getProperty(propName): The value of a stored property, cast as a string.
- getPropertyCount(): The number of stored properties.
- has2DCoords(): True if the structure has X and Y coordinates only.
- has3DCoords(): True if the structure has X, Y and Z coordinates.
- hasCoords(): True if the structure has 2D or 3D coordinates.
- hasProperty(propertyName): True if propertyName is one of the stored properties.
- setName(name): Sets the structure name.
- setProperty(propertyName, propertyValue): Sets/adds a property, where propertyValue is a string.
- toCanonicalSMILES(): Creates a canonical SMILES string from the structure.
- toMDL([withHydrogens[, withProps[, withCoords[, name]]]]): Creates an SD file string from the structure.
  Parameters:
  - withHydrogens (int) – Determines whether hydrogens should be included. Choices are 0 (None), 1 (AsWritten), 2 (Polar), 3 (All), 4 (Chiral). Default is 0.
  - withProps (bool) – Determines whether stored properties should be output. Default is True.
  - withCoords (bool) – Determines whether the existing coordinates should be retained (zeros are written if False). Default is True.
  - name (string) – Used to override automatic name assignment (“Molecule1”, “Molecule2”, etc.).
- toSMILES(wantKekulized=False): Creates an ordinary SMILES string from the structure. The Boolean wantKekulized is False by default and determines whether aromatic rings should be represented using alternating single/double bonds.
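To make the relationship between the connection table and the shortest path distance matrix concrete, the sketch below derives one from the other with a breadth-first search over plain nested lists. It is an illustration only: the function name shortest_path_matrix is hypothetical, nested lists stand in for ChmSymmetricMatrix, and the use of -1 for unreachable atom pairs is an assumption, since the value Canvas reports for disconnected atoms is not stated here.

```python
from collections import deque


def shortest_path_matrix(connection_table):
    """Topological (bond-count) distances from a 0/1 connection table."""
    n = len(connection_table)
    # -1 marks pairs with no connecting path (an assumption of this sketch).
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if connection_table[i][j] == 1 and dist[start][j] < 0:
                    dist[start][j] = dist[start][i] + 1
                    queue.append(j)
    return dist


# Heavy-atom connection table for a four-atom chain, C0-C1-C2-O3
# (1 = bonded, 0 = not bonded), e.g. the skeleton of propan-1-ol.
table = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
distances = shortest_path_matrix(table)
```

For this chain, distances[0][3] is 3 because three bonds separate the terminal carbon from the oxygen.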
File I/O
File I/O is covered by two classes, ChmSDReader and ChmSmilesFileReader. For additional file I/O, see the Structure-class-based I/O documentation.

class schrodinger.application.canvas.base.ChmSDReader

Iterable SD file reader. Note that the following two methods for reading a file are equivalent:
    # 1
    sdr = canvas.ChmSDReader("file.sdf")
    for mol in sdr:
        print "Structure name =", mol.getName()
    # 2
    sdr = canvas.ChmSDReader("file.sdf")
    while sdr.hasNext():
        mol = sdr._next()
        print "Structure name =", mol.getName()

- __init__(sdFile): Creates an SD file reader object and opens it on the specified file.
- _next(): The next ChmMol in the file.
- getError(): Description of the most recent error.
- hasError(): True if the most recent attempt to read produced an error.
- hasNext(): True if more structures remain in the file.

class schrodinger.application.canvas.base.ChmSmilesFileReader

Reads a file containing SMILES strings, one per line. If a SMILES string is followed by a space and a second field, that field will be stored as the structure name. A typical workflow to read a file is as follows:
    smi = canvas.ChmSmilesFileReader("file.smi")
    while smi.hasNextMol():
        mol = smi.nextMol()
        print "Structure name =", mol.getName()

- __init__(smilesFile): Creates a SMILES file reader object and opens it on the specified file.
- getError(): Description of the most recent error.
- getFailureCount(): Total number of errors.
- getLastName(): The most recent structure name. If no name was found in the file, it is generated automatically as MoleculeN, where N is the structure count (1, 2,…).
- getLastSMILES(): The most recent SMILES string read.
- getSuccessCount(): Total number of SMILES strings successfully read.
- hasError(): True if the most recent attempt to read produced an error.
- hasNextMol(): True if more SMILES strings remain in the file.
- nextMol(): ChmMol generated from the next SMILES string in the file.
- skip(): Skips over the next SMILES string.
Substructure Matching
The classes in this section support the creation and execution of 2D chemical queries. The ChmQueryMatcher matches a ChmQuery to a ChmMol, and returns an iterable list of ChmMatch objects.

class schrodinger.application.canvas.base.ChmMatch

Describes a 2D match.

- getMatchedAtoms(): Iterable list of the atoms (as ChmAtom objects) matched by the query.

class schrodinger.application.canvas.base.ChmQuery

A 2D chemical query. Used in conjunction with ChmQueryMatcher.

- __init__(smarts): Creates a ChmQuery from a SMARTS string.
- __init__(mol): Creates a ChmQuery from a ChmMol.
- getAtomCount(): The number of atoms in the ChmQuery.
- getSource(): The SMARTS representation of the query. This is a valid method even if the ChmQuery was created from a ChmMol.
- static fromMDL(molString): Static method that creates a ChmQuery from a MOL file string or a single-structure SD file string.
- static fromMDLFile(fileName): Static method that creates a ChmQuery from a MOL file or a single-structure SD file.

class schrodinger.application.canvas.base.ChmQueryMatcher

Matches a ChmQuery to a ChmMol.

- __init__(uniqueFilter=True): The Boolean uniqueFilter is True by default and determines whether only a single mapping should be returned when the query can be mapped to a set of atoms in more than one way (e.g., the query “c1ccccc1” can be mapped to a phenyl ring in 12 ways).
- getUniqueFilter(): True if only one mapping per match should be returned.
- hasExactMatch(query, mol): True if the supplied ChmQuery produces an exact match to the supplied ChmMol.
- hasMatch(query, mol): True if the supplied ChmQuery produces at least one match to the supplied ChmMol.
- matchMask(query, mol): Matches a ChmQuery to a ChmMol and returns an iterable list of matches as ChmBitset objects.
- matchMaskSubset(query, mol, bitset): Matches a ChmQuery to a subset of atoms in a ChmMol. The argument bitset is a ChmBitset object, with “on” bits for the subset of atoms in question. The length of bitset must be mol.getAtomCount(False) or mol.getAtomCount(True).
- matchSubset(query, mol, atoms): Matches a ChmQuery to a subset of atoms in a ChmMol. The argument atoms is a list of ChmAtom objects returned by another Canvas API, such as mol.getAtoms(bitset). It cannot be an ordinary Python list.
- setUniqueFilter(uniqueFilter): Turns unique filtering on/off according to the value of the Boolean uniqueFilter.
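The figure of 12 mappings quoted for “c1ccccc1” can be checked by brute force, which also shows what the unique filter suppresses. The sketch below is independent of Canvas: it treats the six ring atoms as equivalent and counts every assignment of query atoms to ring atoms that preserves all ring bonds. The function name count_ring_mappings is illustrative only.

```python
from itertools import permutations


def count_ring_mappings(n):
    """Count the atom mappings of an n-membered ring query onto an n-membered ring."""
    bonds = [(i, (i + 1) % n) for i in range(n)]
    bond_set = set(frozenset(bond) for bond in bonds)
    count = 0
    for perm in permutations(range(n)):
        # perm[i] is the ring atom assigned to query atom i; the mapping is
        # valid only if every query bond lands on an existing ring bond.
        if all(frozenset((perm[i], perm[j])) in bond_set for i, j in bonds):
            count += 1
    return count


phenyl_mappings = count_ring_mappings(6)
```

The 12 mappings correspond to 6 rotations of the ring in each of 2 directions. All of them cover the same set of six atoms, so with unique filtering on they collapse to a single match.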
Bitset APIs
The classes and methods in this section support the storage, manipulation and comparison of fingerprint data. See Fingerprint APIs for the actual creation of fingerprints from chemical structures.

class schrodinger.application.canvas.base.ChmBitset

Subclass of ChmBitComparable.

A fixed-length bit string with explicit on/off values (compare ChmSparseBitset). For the sake of clarity in describing certain APIs, the example bitset “1,0,0,1,1” will be used. This bitset has a length of 5 with “on” bits at positions 0, 3, and 4.

See the ChmBitComparable base class for methods common to this class and ChmSparseBitset, including distance and similarity measures.

- __init__(n): Creates a bitset of length n with all bits off. The length automatically increases if bits higher than n - 1 are turned on.
- __init__(onBits, n): Creates a bitset of length n with a specific collection of bits turned on. The argument onBits is an ordinary Python list or tuple, containing the positions of the on bits. The example bitset “1,0,0,1,1” could be created using either of the following statements:
    bitset = canvas.ChmBitset([0, 3, 4], 5)
    bitset = canvas.ChmBitset((0, 3, 4), 5)
  The length automatically increases if bits higher than n - 1 are turned on, including bits in onBits.
- all(): True if all bits are on.
- any(): True if any bits are on.
- clear(): Turns off all bits.
- clear(i): Turns off bit i. i runs from 0 to size()-1.
- compare(bitset): Compares the current ChmBitset to the supplied one in a bitwise numeric fashion. Returns 1, 0, or -1, which correspond, respectively, to the current bitset being greater than, equal to, or less than the supplied bitset.
- contains(bitset): True if all the on bits in the supplied ChmBitset are also on in the current ChmBitset. The two bitsets must have the same length.
- empty(): True if all bits are off.
- flip(): Flips all bits.
- flip(i): Flips bit i. If i is greater than size()-1, the length of the bitset is automatically increased.
- get(i): True if bit i is on. Returns False if i is larger than size()-1.
- hasDifference(bitset): True if any bit in the current ChmBitset differs from the corresponding bit in the supplied ChmBitset. The two bitsets must have the same length.
- intersects(bitset): True if the current ChmBitset and the supplied ChmBitset share at least one on bit. The two bitsets must have the same length.
- isSubsetOf(bitset): True if all the on bits in the current ChmBitset are also on in the supplied ChmBitset. The two bitsets must have the same length.
- resize(n): Changes the length of the bitset. If shorter, bits are removed from the end; if longer, off bits are appended.
- set(): Turns on all bits.
- set(i): Turns on bit i. If i is greater than size()-1, the length of the bitset is automatically increased.
- size(): The length of the bitset.
- to_bitstring(): A comma-delimited string that represents all bit values. The example bitset would return “1,0,0,1,1”.
- toString(): A comma-delimited string that contains the on bit positions. The example bitset would return “0,3,4”.
- toVector(): An ordinary Python tuple of the on bit positions. The example bitset would return (0, 3, 4).
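The three textual and tuple representations described for the example bitset can be reproduced in plain Python, which may help when checking output from to_bitstring(), toString() and toVector(). The helper functions below are hypothetical and operate on an ordinary list of on-bit positions, not on a ChmBitset.

```python
def bit_values_string(on_bits, length):
    """All bit values, comma-delimited (the form described for to_bitstring())."""
    on = set(on_bits)
    return ",".join("1" if i in on else "0" for i in range(length))


def on_positions_string(on_bits):
    """On-bit positions, comma-delimited (the form described for toString())."""
    return ",".join(str(i) for i in sorted(on_bits))


def on_positions_tuple(on_bits):
    """On-bit positions as a tuple (the form described for toVector())."""
    return tuple(sorted(on_bits))


def is_subset_of(on_bits, other_on_bits):
    """True if every on bit in the first collection is also on in the second."""
    return set(on_bits) <= set(other_on_bits)


example_on_bits = [0, 3, 4]  # the example bitset "1,0,0,1,1" of length 5
values = bit_values_string(example_on_bits, 5)
positions = on_positions_string(example_on_bits)
vector = on_positions_tuple(example_on_bits)
```

With these inputs, values is "1,0,0,1,1", positions is "0,3,4", and vector is (0, 3, 4), matching the examples given in the method descriptions.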
class schrodinger.application.canvas.base.ChmSparseBitset

Subclass of ChmBitComparable.

A bitset of theoretical length 2^{32}, which stores only the positions of the “on” bits.

See the ChmBitComparable base class for methods common to this class and ChmBitset, including distance and similarity measures.

- __init__(): Creates a sparse bitset with no on bits.
- __init__(onBits): Creates a sparse bitset with a specific collection of bits turned on. The argument onBits is an ordinary Python list or tuple, containing the positions of the on bits. The example bitset above could be created using either of the following statements:
    bitset = canvas.ChmSparseBitset([0, 3, 4])
    bitset = canvas.ChmSparseBitset((0, 3, 4))
- contains(bitset): True if all the on bits in the supplied ChmSparseBitset are also on in the current ChmSparseBitset.
- density(): The fraction of bits that are on, i.e., count()/size().
- empty(): True if no bits are on.
- hasDifference(bitset): True if any bit in the current ChmSparseBitset differs from the corresponding bit in the supplied ChmSparseBitset.
- isSubsetOf(bitset): True if all the on bits in the current ChmSparseBitset are also on in the supplied ChmSparseBitset.
- intersects(bitset): True if the current ChmSparseBitset and the supplied ChmSparseBitset share at least one on bit.
- positionOf(i): If bit i is on, its relative position in the list of on bits is returned. If bit i is off, -1 is returned. Note that legal values of i lie on the interval [-2147483648, 2147483647], so the following correction should be made before calling this function:
    if i > 2147483647:
        i = i - 4294967296
- reduceToBitset(n): Returns a ChmBitset of length n, which approximates the current ChmSparseBitset. The reduction is done by dividing the coordinate range [0, 2^{32} - 1] into n ranges of equal width, and mapping all sparse bits in a given range to a single bit in the ChmBitset object. For example, if n = 1024, the width of each range is 2^{32}/1024 = 4194304, so sparse bits on the interval [k \cdot 4194304, (k+1) \cdot 4194304 - 1] are mapped to bit k in the returned ChmBitset. Thus if any sparse bit in that range is on, bit k will be on in the ChmBitset.
- size(): The length of the bitset, i.e., 2^{32}.
- toVector(): Returns a wrapped std::vector of the on bit positions. Note that the return type differs from ChmBitset.toVector (which returns a standard Python tuple), but the behavior should be largely equivalent.
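The signed-index correction for positionOf and the range mapping used by reduceToBitset are both simple integer arithmetic, sketched below in plain Python. The function names are illustrative and not Canvas APIs, and the reduction helper assumes that n divides 2^32 evenly (as in the n = 1024 example) and that bit positions are given in unsigned form.

```python
def to_signed_32(i):
    """Shift an unsigned 32-bit position into the signed range expected by positionOf."""
    if i > 2147483647:
        i = i - 4294967296
    return i


def reduced_bit(sparse_bit, n):
    """Index k of the reduced bit whose range contains the unsigned sparse bit."""
    width = 2 ** 32 // n  # width of each of the n equal ranges
    return sparse_bit // width


def reduce_on_bits(sparse_on_bits, n):
    """On-bit positions of the reduced bitset of length n."""
    return sorted(set(reduced_bit(bit, n) for bit in sparse_on_bits))


highest_bit_signed = to_signed_32(4294967295)
reduced = reduce_on_bits([0, 5, 4194304, 4194305], 1024)
```

Here the highest unsigned position becomes -1 after correction, and the four sparse bits collapse onto reduced bits 0 and 1, because the first two fall in [0, 4194303] and the last two in [4194304, 8388607].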
class schrodinger.application.canvas.base.ChmBitComparable

The ChmBitComparable class is a base class of ChmBitset and ChmSparseBitset, but is not directly instantiable.

Methods provided by the base class that can be called with any other ChmBitComparable are documented here. Methods that require bitsets of a specific type are documented with the appropriate subclass.

See Similarity/Distance Measures for definitions of the distance and similarity measures.

- count(): The number of on bits.
- countCommonOff(bitset): The number of off bits shared by the current ChmBitComparable and the supplied ChmBitComparable. The two bitsets must have the same length.
- countCommonOn(bitset): The number of on bits shared by the current ChmBitComparable and the supplied ChmBitComparable. The two bitsets must have the same length.
- countDifference(bitset): The number of bit differences between the current ChmBitComparable and the supplied ChmBitComparable. The two bitsets must have the same length.
- distDixon(bitset): Dixon distance between the current ChmBitComparable and the supplied one.
- distEuclidean(bitset): Euclidean distance.
- distHamming(bitset): Hamming distance.
- distPatternDifference(bitset): Pattern difference distance.
- distShape(bitset): Shape distance.
- distSize(bitset): Size distance.
- distSoergel(bitset): Soergel distance.
- distSquaredEuclidean(bitset): Squared Euclidean distance.
- distVariance(bitset): Variance distance.
- simBuser(bitset): Buser similarity.
- simCosine(bitset): Cosine similarity.
- simDice(bitset): Dice similarity.
- simHamann(bitset): Hamann similarity.
- simKulczynski(bitset): Kulczynski similarity.
- simMatching(bitset): Matching similarity.
- simMcConnaughey(bitset): McConnaughey similarity.
- simMinMax(bitset): MinMax similarity.
- simModifiedTanimoto(bitset): Modified Tanimoto similarity.
- simPearson(bitset): Pearson similarity.
- simPetke(bitset): Petke similarity.
- simRogersTanimoto(bitset): Rogers Tanimoto similarity.
- simSimpson(bitset): Simpson similarity.
- simTanimoto(bitset): Tanimoto similarity.
- simTversky(bitset, alpha, beta): Tversky similarity.
- simYule(bitset): Yule similarity.
Similarity/Distance Measures
A large number of methods are available for computing the similarity or distance between two ChmBitComparable objects bitset1 and bitset2. Some of these methods will not be familiar to most users, so precise definitions are supplied here.

Let:
- a \equiv count of “on” bits in bitset1.
- b \equiv count of “on” bits in bitset2.
- c \equiv count of bits that are “on” in both bitset1 and bitset2.
- d \equiv count of bits that are “off” in both bitset1 and bitset2.
- A \equiv a - c = count of bits that are “on” exclusively in bitset1.
- B \equiv b - c = count of bits that are “on” exclusively in bitset2.
- s \equiv length of bitset.

Definitions:
- Buser similarity: (\sqrt{c d} + c)/(\sqrt{c d} + a + b - c)
- Cosine similarity: c/\sqrt{a b}
- Dice similarity: c/(0.5(a+b))
- Dixon distance: (1.0 - \mathsf{Tanimoto}) \cdot \mathsf{Hamming}
- Euclidean distance: \sqrt{A+B}
- Hamann similarity: (c+d-A-B)/(A+B+c+d)
- Hamming distance: A+B
- Kulczynski similarity: 0.5 (c/a + c/b)
- Matching similarity: (c+d)/s
- McConnaughey similarity: (c^2 - (a-c)(b-c))/(ab)
- MinMax similarity: \sum_i \min[\mathsf{bitset1}(i), \mathsf{bitset2}(i)] / \sum_i \max[\mathsf{bitset1}(i), \mathsf{bitset2}(i)]
- Modified Tanimoto similarity: \alpha \mathsf{Tanimoto} + (1-\alpha) T_0, where \alpha \equiv 2/3 - (a+b)/[6 \cdot \min(d, 10000)] and T_0 \equiv d/(a+b-2c+d) = Tanimoto of “off” bits
- Pattern difference distance: (AB)/(A+B+c+d)^2
- Pearson similarity: (cd - AB)/\sqrt{ab(A+d)(B+d)}
- Petke similarity: c/\max(a,b)
- Rogers Tanimoto similarity: (c+d)/(2(a+b) - 3c+d)
- Shape distance: (A+B)/(A+B+c+d) - ((A-B)/(A+B+c+d))^2
- Simpson similarity: c/\min(a,b)
- Size distance: (A-B)^2/(A+B+c+d)^2
- Soergel distance: (A+B)/(A+B+c)
- Tanimoto similarity: c/(a+b-c)
- Tversky (\alpha,\beta) similarity: c/(\alpha(a-c) + \beta(b-c)+c)
- Variance distance: (A+B)/(4(A+B+c+d))
- Yule similarity: (cd-AB)/(cd+AB)
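A few of these definitions are worked through below in plain Python, as a way to sanity-check values returned by the sim* and dist* methods. This is a sketch only: the function names are illustrative, bitsets are represented as lists of on-bit positions, and no guard is included for degenerate cases such as two empty bitsets (division by zero).

```python
import math


def bit_counts(on_bits1, on_bits2, length):
    """Return (a, b, c, d) as defined above for two bitsets of the given length."""
    set1, set2 = set(on_bits1), set(on_bits2)
    a = len(set1)
    b = len(set2)
    c = len(set1 & set2)           # on in both
    d = length - len(set1 | set2)  # off in both
    return a, b, c, d


def tanimoto(a, b, c):
    return c / float(a + b - c)


def dice(a, b, c):
    return c / (0.5 * (a + b))


def cosine(a, b, c):
    return c / math.sqrt(a * b)


def hamming(a, b, c):
    return (a - c) + (b - c)


def soergel(a, b, c):
    return hamming(a, b, c) / float(hamming(a, b, c) + c)


def tversky(a, b, c, alpha, beta):
    return c / float(alpha * (a - c) + beta * (b - c) + c)


# bitset1 = "1,0,0,1,1" and bitset2 = "1,1,0,0,1", both of length 5.
a, b, c, d = bit_counts([0, 3, 4], [0, 1, 4], 5)
sim_tanimoto = tanimoto(a, b, c)
dist_soergel = soergel(a, b, c)
```

For this pair, a = b = 3, c = 2 and d = 1, giving a Tanimoto similarity of 0.5. The formulas also imply that the Soergel distance equals 1 - Tanimoto, that Tversky with \alpha = \beta = 1 reduces to Tanimoto, and that Tversky with \alpha = \beta = 0.5 reduces to Dice.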
Fingerprint APIs
The classes and methods in this section support the creation and I/O of binary fingerprints from chemical structures.
Atom Typing Schemes
A Canvas fingerprint is generated by applying a set of rules that decompose a structure into a set of fragments/features, each of which is hashed to a 32-bit integer code that turns on a bit in an underlying ChmSparseBitset object. The integer code that is generated depends upon the way in which atoms and bonds are distinguished, and the available schemes are summarized below.
Scheme | Description |
---|---|
1 | All atoms and bonds are equivalent. |
2 | Atoms are distinguished by whether they are hydrogen bond (HB) acceptors or donors; all bonds are equivalent. |
3 | Atoms are distinguished by hybridization state; all bonds are equivalent. |
4 | Atoms are distinguished by functional type: {H}, {C}, {F,Cl}, {Br,I}, {N,O}, {S}, {other}. All bonds are equivalent. |
5 | Sybyl Mol2 atom types; all bonds are equivalent. |
6 | Atoms are distinguished by whether they are terminal, halogen, HB acceptor/donor; all bonds are equivalent. |
7 | Atomic number and bond order. |
8 | Atoms are distinguished by ring size, aromaticity, HB acceptor/donor, whether terminal, whether halogen; bonds are distinguished by bond order. |
9 | Carhart atom types (atom-pairs approach); all bonds are equivalent. |
10 | Daylight invariant atom types; bonds are distinguished by bond order. |
11 | Same as 7, but distinguishing aromatic from non-aromatic. |
12 | Same as 10, but distinguishing cyclic aliphatic from acyclic aliphatic. |
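To make the role of the atom typing scheme more concrete, here is a minimal sketch of hashing fragment descriptions to 32-bit codes and collecting them as a sparse set of on bits. The hash function Canvas uses is not documented here, so zlib.crc32 is only a stand-in, and the fragment strings and function names are hypothetical.

```python
import zlib


def fragment_code(fragment):
    """Hash a fragment description to an unsigned 32-bit integer code."""
    return zlib.crc32(fragment.encode("utf-8")) & 0xFFFFFFFF


def fingerprint(fragments):
    """Return the set of on-bit positions for an iterable of fragments."""
    return {fragment_code(f) for f in fragments}


# Hypothetical fragment strings. A coarser atom typing scheme makes more
# fragments look identical, so fewer distinct bits are turned on.
by_element = fingerprint(["C-C", "C-N", "C-O"])
all_equivalent = fingerprint(["*-*", "*-*", "*-*"])
print(len(by_element), len(all_equivalent))
```

The same three bonds set three bits when atoms are distinguished by element and a single bit when all atoms and bonds are treated as equivalent (as in scheme 1).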
Base Fingerprint Classes¶
All output fingerprint classes listed below are
subclasses of ChmFPOut32
(which can’t be directly instantiated) and are
responsible for both the creation of fingerprints and writing them to files.
-
class
schrodinger.application.canvas.base.
ChmFPOut32
¶ -
close
() Closes the binary file on which the object is opened.
-
close
(minOnFreq, maxOffFreq, maxBits[, reduceBits]) Filters the bits according to their statistics across all fingerprints in the file, then closes the file. A bit will be eliminated unless the fraction of the time it is on is at least minOnFreq and the fraction of the time it is off is no more than maxOffFreq. These two parameters work as described only when maxBits is 0 and reduceBits is 0 or omitted. To keep only the maxBits most statistically significant bits, set minOnFreq to 0.0, set maxOffFreq to 1.0, set maxBits to the desired value, and set reduceBits to 0, or omit it. To reduce the length of the fingerprint from 2^32 to 2^(32-reduceBits), set minOnFreq to 0.0, set maxOffFreq to 1.0, set maxBits to 0, and set reduceBits to the desired value.
-
generate
(mol) Generates a fingerprint from a
ChmMol
and returns it as a ChmSparseBitset
.
-
generateMap
(mol) Generates a fingerprint from a
ChmMol
, maps the bits back to the atoms and bonds in the structure, and returns the mappings as a ChmFPMap32
.
-
getCurrentRowCount
() The number of fingerprints written to the binary file on which the object is opened.
-
getTypeInfo
() A textual summary of the fingerprint settings.
-
open
(fileName) Opens a binary file to which fingerprints may be written.
-
write
(mol[, extraData]) Writes the fingerprint for the provided molecule to a previously opened file.
Parameters: extraData (dict) – Allows recording of extra information with a given fingerprint. This must be a dictionary with string keys and values.
-
write
(bitset, id) Deprecated since version 2011: Use write(mol) or the
ChmCustomOut32
write(sparse_bitset, id) method instead.
-
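The frequency-based filtering described for close(minOnFreq, maxOffFreq, ...) can be pictured with the following sketch, which takes the wording of that description literally. It operates on plain Python sets of on-bit positions rather than on a fingerprint file, the function name is hypothetical, and the maxBits and reduceBits behaviors are not modeled because their exact criteria are not spelled out here.

```python
from collections import Counter


def filter_bits(fingerprints, min_on_freq, max_off_freq):
    """Drop bits whose on/off frequencies fall outside the given limits.

    fingerprints is a list of sets of on-bit positions.
    """
    n = len(fingerprints)
    on_counts = Counter(bit for fp in fingerprints for bit in fp)
    keep = set()
    for bit, count in on_counts.items():
        on_freq = count / n          # fraction of fingerprints with the bit on
        off_freq = 1.0 - on_freq     # fraction of fingerprints with the bit off
        if on_freq >= min_on_freq and off_freq <= max_off_freq:
            keep.add(bit)
    return [fp & keep for fp in fingerprints]


fps = [{1, 2}, {1, 3}, {1, 2, 4}]
print(filter_bits(fps, 0.5, 1.0))
```

With minOnFreq = 0.0 and maxOffFreq = 1.0 every bit passes, which matches the settings the description recommends when only maxBits or reduceBits is wanted.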
Fingerprint/Output Classes¶
-
class
schrodinger.application.canvas.base.
ChmCustomOut32
¶ Subclass of
ChmFPOut32
. This class writes custom (user-defined) fingerprints to a file.
-
__init__
(type_info) The type_info argument is a string describing the type.
-
write
(sparse_bitset, id, extraData=None) Write the fingerprint provided in sparse_bitset to the fingerprint output file. (The file must first be opened with the open method.)
The id argument must be a string. If provided, extraData should be a dictionary with string keys and values.
-
-
class
schrodinger.application.canvas.base.
ChmDendriticOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a dendritic fingerprint of theoretical length 2^32. Dendritic fingerprints encode linear and branched paths, where a branched path consists of two or more intersecting linear paths.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 10. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
getMaxPath
() The maximum number of bonds in linear paths.
-
getMinPath
() The minimum number of bonds in linear paths.
-
setMaxPath
(max) Sets the maximum number of bonds in linear paths. The default is 5.
-
setMinPath
(min) Sets the minimum number of bonds in linear paths. The default is 0.
-
-
class
schrodinger.application.canvas.base.
ChmLinearOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a linear fingerprint of theoretical length 2^32. Linear fingerprints encode linear paths and ring closure.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 10. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
getHalfStep
() True if fragments can end with a bond.
-
getMaxPath
() The maximum number of bonds in linear paths.
-
getMaxRingPath
() The maximum number of bonds that will be traversed in order to close a ring.
-
getMinPath
() The minimum number of bonds in linear paths.
-
setHalfStep
(doHalf) Controls whether linear fragments can start/end at a bond. If False, all linear fragments will start and end at an atom, which is the default treatment.
-
setMaxPath
(max) Sets the maximum number of bonds in linear paths. The default is 7.
-
setMaxRingPath
(max) Sets the maximum number of bonds that will be traversed in order to close a ring. The default is 14. If 0, all paths encoded in the fingerprint will be self-avoiding.
-
setMinPath
(min) Sets the minimum number of bonds in linear paths. The default is 0.
-
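The effect of the minimum and maximum path settings can be sketched by enumerating self-avoiding linear paths on a small molecular graph. This is only an illustration: the graph representation and function name are hypothetical, atoms are labeled by element alone, and bond typing, half steps, and ring closure are not modeled.

```python
def linear_paths(atom_types, bonds, min_path=0, max_path=7):
    """Enumerate unique self-avoiding linear paths as atom-type sequences.

    Path length is counted in bonds, so a single atom is a path of length 0.
    """
    neighbors = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    fragments = set()

    def extend(path):
        n_bonds = len(path) - 1
        if n_bonds >= min_path:
            types = tuple(atom_types[k] for k in path)
            # Store a direction-independent key for the path.
            fragments.add(min(types, types[::-1]))
        if n_bonds == max_path:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atom_types)):
        extend([start])
    return fragments


# Ethanol heavy atoms: C-C-O
print(sorted(linear_paths(["C", "C", "O"], [(0, 1), (1, 2)])))
```

In a real fingerprint each of these fragments would then be hashed to a 32-bit code. Raising the minimum path length to 1 drops the single-atom fragments.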
-
class
schrodinger.application.canvas.base.
ChmMolprint2D32
¶ Subclass of
ChmFPOut32
. Creates and outputs MOLPRINT 2D fingerprints of theoretical length 2^32. MOLPRINT 2D fingerprints encode atom environments using lists of atom types located at topological distances of 0 to 2 bonds from each heavy atom.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 5. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
-
class
schrodinger.application.canvas.base.
ChmPairwiseOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a pairwise fingerprint of theoretical length 2^32. Pairwise fingerprints encode pairs of atoms, differentiated by type, and the distance separating them: Type_i-Type_j-d_ij.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 9. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
set3D
(use3D) Controls whether 3D distances will be used. If False, all distances will be 2D, which is the default treatment. If True, setBinWidth must also be called.
-
setBinWidth
(dbin) Sets the bin width for 2D/3D distances. The default for 2D distances is 1. There is no default for 3D distances, so this function must be called if using 3D distances.
-
setFuzzyDistance
(dfuzz) If a 2D or 3D distance is within dfuzz of a boundary, turn on the bit corresponding to the neighboring bin.
-
setMaxDistance
(dmax) Sets the maximum 2D/3D distance to consider. The default is no limit.
-
setMinDistance
(dmin) Sets the minimum 2D/3D distance to consider. The default is 0.
-
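The interplay of setBinWidth and setFuzzyDistance can be pictured as follows. The sketch assumes that bin k covers distances from k*dbin up to (but not including) (k+1)*dbin, which is a guess at the convention rather than documented behavior, and the function name is hypothetical.

```python
import math


def distance_bins(d, dbin, dfuzz=0.0):
    """Return the bin indices switched on for a distance d.

    Assumption: bin k covers the interval [k*dbin, (k+1)*dbin).
    """
    k = math.floor(d / dbin)
    bins = {k}
    # Close to the lower boundary: also turn on the bin below (if any).
    if k > 0 and d - k * dbin < dfuzz:
        bins.add(k - 1)
    # Close to the upper boundary: also turn on the bin above.
    if (k + 1) * dbin - d < dfuzz:
        bins.add(k + 1)
    return bins


print(distance_bins(2.9, 1.0, dfuzz=0.25))
```

A distance in the middle of a bin turns on only that bin, while a distance near a boundary also turns on the neighboring bin, which softens the effect of small geometric differences on 3D fingerprints.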
-
class
schrodinger.application.canvas.base.
ChmRadialOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a radial fingerprint of theoretical length 2^32. Radial fingerprints (also known as extended connectivity fingerprints) encode fragments that grow radially from each heavy atom.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 4. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
getIterations
() The number of iterations of radial growth, which is also the radial size of the largest fragments.
-
getMinSize
() The minimum radial size of fragments that will be retained.
-
setIterations
(iter) Sets the number of iterations of radial growth, which is also the radial size of the largest fragments in the fingerprint. The default is 4.
-
setMinSize
(min) Sets the minimum radial size of fragments to retain. All fragments generated prior to iteration min will be discarded. The default is 0.
-
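The iteration and minimum-size settings can be illustrated with a simplified extended-connectivity scheme, in which each atom's identifier is repeatedly rehashed together with the identifiers of its neighbors. This is a generic sketch under stated assumptions, not the Canvas algorithm: zlib.crc32 stands in for the real hash, bond orders and duplicate-fragment removal are ignored, and all names are hypothetical.

```python
import zlib


def _hash32(value):
    """Deterministic 32-bit hash of a value with a stable repr."""
    return zlib.crc32(repr(value).encode("utf-8")) & 0xFFFFFFFF


def radial_codes(atom_types, bonds, iterations=4, min_size=0):
    """Collect codes for fragments grown radially around every atom."""
    neighbors = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    ids = [_hash32(t) for t in atom_types]  # radius-0 identifiers
    codes = set(ids) if min_size == 0 else set()
    for radius in range(1, iterations + 1):
        # New identifier: own identifier plus the sorted neighbor identifiers.
        ids = [
            _hash32((ids[i], sorted(ids[j] for j in neighbors[i])))
            for i in range(len(ids))
        ]
        if radius >= min_size:
            codes.update(ids)
    return codes


types, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
print(len(radial_codes(types, bonds, iterations=0)))
```

Each additional iteration can only add codes, and a nonzero minimum size discards the codes produced at smaller radii.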
-
class
schrodinger.application.canvas.base.
ChmSMARTSOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a custom fingerprint whose bits are defined by a series of SMARTS patterns in a text file:
SMARTS1 bitName1
SMARTS2 bitName2
etc.
Each SMARTS pattern and bit name must be separated by one or more spaces.
-
__init__
(smartsFile, wantCSV=False) The Boolean wantCSV is False by default and determines whether the output fingerprint file should be a CSV file.
-
-
class
schrodinger.application.canvas.base.
ChmToplogicalTorsionOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a topological torsion fingerprint of theoretical length 2^32. Topological torsion fingerprints encode linear paths of 4 atoms, differentiated by type: Type_i-Type_j-Type_k-Type_l.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 10. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
-
class
schrodinger.application.canvas.base.
ChmTripletOut32
¶ Subclass of
ChmFPOut32
. Creates and outputs a triplet fingerprint of theoretical length 2^32. Triplet fingerprints encode triplets of atoms, differentiated by type, and the distances separating them: Type_i-d_ij-Type_j-d_jk-Type_k-d_ki.
-
__init__
([atomTypingScheme]) Parameters: atomTypingScheme (int) – Default is 10. Value must be in the range 1-12. See Atom Typing Schemes for details.
-
set3D
(use3D) Controls whether 3D distances will be used. If False, all distances will be 2D, which is the default treatment. If True, setBinWidth must also be called.
-
setBinWidth
(dbin) Sets the bin width for 2D/3D distances. There is no default for 3D distances, so this function must be called if using 3D distances.
-
setFuzzyDistance
(dfuzz) If a 2D or 3D distance is within dfuzz of a boundary, turn on the bit corresponding to the neighboring bin.
-
setMaxDistance
(dmax) Sets the maximum 2D/3D distance to consider. The default is no limit.
-
setMinDistance
(dmin) Sets the minimum 2D/3D distance to consider. The default is 0.
-
Input¶
-
class
schrodinger.application.canvas.base.
ChmFPIn32
¶ Reads a binary fingerprint file created by
ChmFPOut32
subclasses, and extracts each fingerprint as a ChmSparseBitset. A typical workflow to read a file is as follows:
fpIn = canvas.ChmFPIn32("linear.fp")
while fpIn.hasNext():
    fp = fpIn.next()  # A ChmSparseBitset
-
__init__
(fileName) Creates the object and opens a binary fingerprint file.
-
__iter__
() The ChmFPIn32 object can be used as an iterator. Values yielded are the next fingerprint in the file, as a
ChmSparseBitset
of length 2^32.
-
extraIterator
() The extraIterator yields a tuple of (bitset, id, extraData) (as opposed to the bitset only behavior of the default iterator). The id is a string, and extraData is a dictionary of extra data as written to the file with
ChmFPOut32.write()
.
-
filter
(minOnFreq, maxOffFreq, maxBits, reduceBits) Filters the bits according to their statistics across all fingerprints in the file, then rewrites the file. A bit will be eliminated unless the fraction of the time it is on is at least minOnFreq and the fraction of the time it is off is no more than maxOffFreq. These two parameters work as described only when maxBits is 0 and reduceBits is 0. To keep only the maxBits most statistically significant bits, set minOnFreq to 0.0, set maxOffFreq to 1.0, set maxBits to the desired value, and set reduceBits to 0. To reduce the length of the fingerprint from 2^32 to 2^(32-reduceBits), set minOnFreq to 0.0, set maxOffFreq to 1.0, set maxBits to 0, and set reduceBits to the desired value. If reduceBits is applied to the same file repeatedly, the fingerprint will become progressively shorter, until it contains only one bit.
-
getFileName
() The name of the binary fingerprint file on which the object is opened.
-
getOnBits
() A
ChmSparseBitset
that contains the union of on bits in the fingerprint file.
-
getPos
() The number of fingerprints that have been read from the file on which the object is opened.
-
getRowCount
() The number of fingerprints contained in the file on which the object is opened.
-
getTypeInfo
() A textual summary of the fingerprint settings.
-
getUnionCount
() Equivalent to getOnBits().count().
-
hasNext
() True if more fingerprints remain in the file.
-
next
() The next fingerprint in the file, as a
ChmSparseBitset
of length 2^32.
-
nextReduced
() The next fingerprint in the file, as a
ChmBitset
of length getUnionCount().
-
rewind
() Rewinds the file to the beginning.
-
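The relationship between getOnBits, getUnionCount, and nextReduced can be sketched with plain Python sets. The example assumes that the reduced bitset orders its positions by ascending bit position in the union, which is not stated in the description; the function name is hypothetical.

```python
def reduce_fingerprints(fingerprints):
    """Map sparse fingerprints onto the positions that are on at least once.

    Returns the sorted union of on bits and one dense 0/1 row per fingerprint.
    """
    union = sorted(set().union(*fingerprints))
    column = {bit: k for k, bit in enumerate(union)}
    reduced = []
    for fp in fingerprints:
        row = [0] * len(union)
        for bit in fp:
            row[column[bit]] = 1
        reduced.append(row)
    return union, reduced


union, rows = reduce_fingerprints([{5, 4000000000}, {5, 17}])
print(len(union), rows)
```

The dense rows have the length of the union count rather than 2^32, which is what makes them practical to hold as ordinary bitsets.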
Bit Mapping¶
-
class
schrodinger.application.canvas.base.
ChmFPMap32
¶ Provides a mapping of each fingerprint bit back to the atoms and bonds that are responsible for setting that bit. A
ChmFPMap32
object is obtained by calling the generateMap method of a ChmFPOut32
subclass.
-
getCode
(i) The 32-bit integer code for the ith on bit, where i runs from 0 to getCodeCount()-1. This is the position of the on bit in an underlying
ChmSparseBitset
object. Note that if the underlying unsigned value is larger than 2147483647, a negative value will be returned. In other words, if code > 2147483647, code is mapped to code - 4294967296. The returned value can be safely used in all subsequent APIs that accept a code.
-
getCodeCount
() The number of unique codes in the fingerprint, i.e., the number of on bits.
-
getAtomsMaskForCode
(code) Iterable list of
ChmBitset
objects, each of which encodes a set of atoms that are associated with the bit. If the length of the list is greater than one, the bit maps to more than one set of atoms.
-
getBondsMaskForCode
(code) Iterable list of
ChmBitset
objects, each of which encodes a set of bonds that are associated with the bit. If the length of the list is greater than one, the bit maps to more than one set of bonds.
-
getOnBits
() Returns the underlying
ChmSparseBitset
object that holds the fingerprint.
-
hasCode
(code) True if code is the position of an on bit.
-
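The signed/unsigned mapping described for getCode is ordinary 32-bit two's-complement arithmetic. The helper names below are hypothetical and are only meant to show how a returned code relates to the bit position between 0 and 2^32 - 1.

```python
def to_signed32(code):
    """Map an unsigned 32-bit value to the signed value described for getCode."""
    return code - 4294967296 if code > 2147483647 else code


def to_unsigned32(code):
    """Recover the bit position (0 to 2**32 - 1) from a signed code."""
    return code & 0xFFFFFFFF


print(to_signed32(4294967295), to_unsigned32(-1))
```

Since the description states that returned codes can be passed directly to the other methods that accept a code, a conversion like to_unsigned32 should only be needed when the true bit position is required, for example for display.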
Pairwise I/O¶
-
class
schrodinger.application.canvas.base.
ChmPairwiseFPIn
¶ Reads and stores a matrix of similarities supplied in binary or CSV format.
-
__init__
(inFile, matrixFormat[, forceSquare]) Reads a similarity/distance matrix from the file inFile. The integer matrixFormat must be one of the matrix file format constants.
Parameters: forceSquare (int) – Controls whether the matrix must be square. Default is False.
-
__init__
(inStream, matrixFormat[, forceSquare]) Reads a similarity/distance matrix from an input stream. Use
canvas.ifstream(inFile)
to create an input stream on a file named inFile.
Parameters: forceSquare (int) – Controls whether the matrix must be square. Default is False.
-
cols
() The number of columns in the matrix.
-
getColumnName
(i) The ith column name. i runs from 0 to cols()-1.
-
getRowName
(i) The ith row name. i runs from 0 to rows()-1.
-
isDistanceMatrix
() True if the matrix is square and the diagonals are 0.
-
isSimilarityMatrix
() True if the matrix is square and the diagonals are 1.
-
isSquare
() True if the matrix is square.
-
isSymmetric
() True if the matrix is symmetric.
-
isValid
() True if the matrix is square and the diagonals are 1 or 0.
-
rows
() The number of rows in the matrix.
-
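The matrix checks listed above (isSquare, isSymmetric, isDistanceMatrix, isSimilarityMatrix) amount to simple tests on the matrix entries. The sketch below applies them to a list of lists; the tolerance argument and function names are assumptions made for the illustration.

```python
def is_square(m):
    return all(len(row) == len(m) for row in m)


def is_symmetric(m, tol=1e-9):
    n = len(m)
    return is_square(m) and all(
        abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(i)
    )


def diagonal_is(m, value, tol=1e-9):
    """True if m is square and every diagonal element equals value."""
    return is_square(m) and all(abs(m[i][i] - value) <= tol for i in range(len(m)))


def classify(m):
    """Return 'distance', 'similarity', or None, based on the diagonal."""
    if diagonal_is(m, 0.0):
        return "distance"
    if diagonal_is(m, 1.0):
        return "similarity"
    return None


sim = [[1.0, 0.4], [0.4, 1.0]]
print(classify(sim), is_symmetric(sim))
```

A rectangular matrix, such as one produced from two different fingerprint files, fails all of these checks except that it can still be read when forceSquare is False.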
-
class
schrodinger.application.canvas.base.
ChmPairwiseFPOut32
¶ Given one or two fingerprint files created by a
ChmFPOut32
subclass, this class computes and outputs a similarity or distance matrix in binary or CSV format. Not to be confused with ChmPairwiseOut32
, which computes and outputs a pairwise fingerprint.
-
__init__
(outFile, forceBinary, alpha, beta, useUnionCount) The matrix file outFile is binary by default, but it may be changed to CSV by calling setHumanReadable(True). The Boolean forceBinary controls whether fingerprints with scaled bit values are treated as ordinary 0/1 values. This parameter is relevant only when processing fingerprints created by canvasFPGen with the –scaling option, and it should normally be set to True. The floating-point parameters alpha and beta are applied only when Tversky similarities are computed, and they may be set to 0 otherwise. The Boolean useUnionCount should generally be set to False. If True, the count of the union of on bits is used in computing the similarity metrics instead of the count of common off bits. (This helps with numerical accuracy when you are using a metric that considers common off bits, such as Buser, Hamann, matching, pattern difference, Pearson, Rogers Tanimoto, shape, size, variance, and Yule. However, it does mean that the scale of the values can be affected by the addition or subtraction of fingerprints from the set.)
-
getFlattening
() True if similarities will be flattened through an exponential transformation.
-
getHumanReadable
() True if the matrix file will be written in CSV format.
-
getMetric
() The integer-valued similarity/distance method.
-
process
(fpFile[, blockSize]) Computes and outputs a symmetric matrix of self-similarities/distances for the fingerprints in fpFile, which must be a binary file.
Parameters: blockSize (int) – Determines how many fingerprints will be held in memory at one time. Default is 100.
-
process
(fpFile, start, stop[, blockSize]) Computes and outputs a symmetric matrix of self-similarities/distances for a subset of the fingerprints in fpFile. The integers start and stop are zero-based indices. If stop is greater than or equal to the number of fingerprints in the file, the matrix will be padded with zeros.
-
process
(fpFile1, fpFile2[, blockSize]) Computes and outputs a rectangular matrix of similarities/distances for two fingerprint files. Rows are spanned by fpFile1 and columns are spanned by fpFile2. The two files must contain the same type of fingerprint.
-
process
(fpFile1, start1, stop1, fpFile2, start2, stop2[, blockSize]) Computes and outputs a rectangular matrix of similarities/distances for subsets of fingerprints in two files.
-
setFlattening
(coeff) Subject similarities to the following exponential transformation that drives smaller values toward zero: \mathrm{sim } \rightarrow \exp[-\mathrm{coeff}\cdot (1-\mathrm{ sim})^2]. A reasonable value of coeff is 25. Not applied when computing distances.
-
setHumanReadable
(writeCSV) The Boolean writeCSV controls whether similarities/distances will be written in CSV format. The default format is binary.
-
setMetric
(code) The integer code controls how similarities/distances will be computed and it must lie in the range 1-24. See Similarity/Distance Method Constants for definitions and symbolic constants. The default is 21 (Tanimoto similarity).
-
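The exponential transformation applied by setFlattening is easy to evaluate by hand. The sketch below simply implements the formula as written above; the function name is hypothetical.

```python
import math


def flatten(sim, coeff=25.0):
    """Apply sim -> exp[-coeff * (1 - sim)^2] to a similarity value."""
    return math.exp(-coeff * (1.0 - sim) ** 2)


for s in (1.0, 0.9, 0.5):
    print(s, round(flatten(s), 4))
```

With coeff = 25, a similarity of 1.0 is unchanged, 0.9 maps to roughly 0.78, and 0.5 maps to roughly 0.002, which shows how strongly the smaller values are driven toward zero.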
Matrix File Formats¶
Matrix file format constants are scoped at the canvas module level and are passed as the matrixFormat argument of the ChmPairwiseFPIn constructor. The available formats are Binary, CSV, CSV_NoColHeader, CSV_NoRowHeader, and CSV_NoRowOrColHeader.
-
schrodinger.application.canvas.base.
Binary
Binary format (0).
-
schrodinger.application.canvas.base.
CSV
Comma-separated values, with column and row headers (1).
-
schrodinger.application.canvas.base.
CSV_NoColHeader
CSV file does not contain column names on first line (2).
-
schrodinger.application.canvas.base.
CSV_NoRowHeader
CSV file does not contain row names in the first column (3).
-
schrodinger.application.canvas.base.
CSV_NoRowOrColHeader
CSV file does not contain row names or column names (4).
Similarity/Distance Method Constants¶
One of these integer codes must be supplied when calling the
ChmPairwiseFPOut32.setMetric
method.
-
schrodinger.application.canvas.base.
buser
Buser similarity (1)
-
schrodinger.application.canvas.base.
cosine
Cosine similarity (2)
-
schrodinger.application.canvas.base.
dice
Dice similarity (3)
-
schrodinger.application.canvas.base.
dixon
Dixon distance (4)
-
schrodinger.application.canvas.base.
euclidean
Euclidean distance (5)
-
schrodinger.application.canvas.base.
hamann
Hamann similarity (6)
-
schrodinger.application.canvas.base.
hamming
Hamming distance (7)
-
schrodinger.application.canvas.base.
kulczynski
Kulczynski similarity (8)
-
schrodinger.application.canvas.base.
matching
Matching similarity (9)
-
schrodinger.application.canvas.base.
mcConnaughey
McConnaughey similarity (10)
-
schrodinger.application.canvas.base.
minmax
MinMax (11)
-
schrodinger.application.canvas.base.
modifiedTanimoto
Modified Tanimoto (12)
-
schrodinger.application.canvas.base.
patternDifference
Pattern difference distance (13)
-
schrodinger.application.canvas.base.
pearson
Pearson similarity (14)
-
schrodinger.application.canvas.base.
petke
Petke similarity (15)
-
schrodinger.application.canvas.base.
rogersTanimoto
Rogers Tanimoto similarity (16)
-
schrodinger.application.canvas.base.
shape
Shape distance (17)
-
schrodinger.application.canvas.base.
simpson
Simpson similarity (18)
-
schrodinger.application.canvas.base.
size
Size distance (19)
-
schrodinger.application.canvas.base.
soergel
Soergel distance (20)
-
schrodinger.application.canvas.base.
tanimoto
Tanimoto similarity (21) (default)
-
schrodinger.application.canvas.base.
tversky
Tversky similarity (22)
-
schrodinger.application.canvas.base.
variance
Variance distance (23)
-
schrodinger.application.canvas.base.
yule
Yule similarity (24)
Hierarchical Clustering APIs¶
The classes and methods in this section support hierarchical agglomerative clustering on a matrix of similarities or distances.
-
class
schrodinger.application.canvas.base.
ChmHierarchicalClustering
¶ Performs a full hierarchical, agglomerative clustering procedure, merging n objects into n–1 clusters, n–2 clusters, …, 1 cluster.
The following linkage method constants determine how the distance between two clusters is computed.
-
SingleLinkage
Closest inter-cluster pair (1).
-
CompleteLinkage
Farthest inter-cluster pair (2).
-
GroupAverageLinkage
Average distance between all inter-cluster pairs (3).
-
UnweightedCentroidLinkage
Euclidean distance between cluster centroids (4).
-
WeightedAverageLinkage
Average distance to the two clusters merged in forming a given cluster (5).
-
WardsMinimumVarianceLinkage
Sum of squared distances to merged cluster centroid (6).
-
WeightedCentroidLinkage
Weighted center of mass distance, aka median (7).
-
FlexibleBetaLinkage
Weighted average intra-cluster and inter-cluster distances, aka Lance-Williams (8).
-
SchrodingerLinkage
Closest distance between terminal (R \leftrightarrow L) points in 1D cluster orderings (9).
The following pairwise selection constants indicate the preference for using the upper or lower triangle of the similarity/distance matrix M in case it’s not symmetric.
-
UseEither
No preference; matrix is symmetric (0).
-
UseMinAB
Use min{M(i, j), M(j, i)} (1).
-
UseMaxAB
Use max{M(i, j), M(j, i)} (2).
-
UseUpper
Use M(i, j), where i < j (3).
-
UseLower
Use M(i, j), where i > j (4).
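The pairwise selection constants can be read as rules for picking one value from the pair M(i, j) and M(j, i). The following sketch mirrors the integer values listed above using local names; it is an illustration on a list of lists and does not use the Canvas constants themselves.

```python
# Local stand-ins for the pairwise selection constants (values 0-4).
USE_EITHER, USE_MIN_AB, USE_MAX_AB, USE_UPPER, USE_LOWER = range(5)


def pair_value(m, i, j, selection):
    """Return the value used for the pair (i, j) of a possibly asymmetric matrix."""
    lo, hi = min(i, j), max(i, j)
    if selection == USE_MIN_AB:
        return min(m[i][j], m[j][i])
    if selection == USE_MAX_AB:
        return max(m[i][j], m[j][i])
    if selection == USE_LOWER:
        return m[hi][lo]  # lower triangle: row index > column index
    # USE_UPPER, or USE_EITHER when the matrix is symmetric.
    return m[lo][hi]


m = [[0.0, 1.0], [3.0, 0.0]]
print([pair_value(m, 1, 0, s) for s in (USE_MIN_AB, USE_MAX_AB, USE_UPPER, USE_LOWER)])
```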
-
__init__
([linkageMethod[, autoScale]]) Parameters: - linkageMethod (int) – Default is
ChmHierarchicalClustering.GroupAverageLinkage
. - autoScale (bool) – Affects only the stress value reported by the cluster methods, not how the actual clustering is done. Default is True.
-
cluster
(pairwiseIn, treeStreamOut, selectionType) Performs clustering on a
ChmPairwiseFPIn
object and returns a value that measures the level of stress in the 1D ordering. Information required to create a dendrogram is written to the output stream treeStreamOut, which can be created using canvas.ofstream(treeFile), where treeFile is the name of the destination file. treeFile can be supplied to the canvas utility canvasTreeDraw to create a PostScript dendrogram. selectionType must be one of the pairwise selection constants.
-
cluster
(pairwiseIn, treeStreamOut, baseStreamOut, selectionType) Writes a summary of the clustering process to the output stream baseStreamOut. See
ChmClusterGrouper
for subsequent usage of the baseFile associated with baseStreamOut. The return value and other parameters are as described previously.
-
cluster
(pairwiseIn, treeStreamOut, baseStreamOut, statsStreamOut, selectionType, allowTies) Writes statistics at each clustering level to the CSV output stream statsStreamOut. The Boolean allowTies controls whether more than one cluster member can be flagged as nearest to centroid or farthest from centroid if those members are equidistant to the centroid. This parameter has no effect on clustering and is actually relevant only when groupings are output (see next method). The return value and other parameters are as described previously.
-
cluster
(pairwiseIn, treeStreamOut, groupStreamOut, n, selectionType, noSingletons, allowTies) Writes groupings for the formation of n clusters to the CSV output stream groupStreamOut. If the Boolean noSingletons is True, groupings will be reported only if there are at least n non‑singleton clusters. Otherwise, groupStreamOut will contain only singletons (i.e., no groupings). The return value and other parameters are as described previously.
-
cluster
(pairwiseIn, treeStreamOut, groupStreamOut, d, selectionType, noSingletons, allowTies) Directly analogous to the previous method except that d is a threshold on the merging distance. All clusters formed at or below this distance will be written to groupStreamOut.
-
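For readers unfamiliar with agglomerative clustering, the following naive sketch shows how n objects are merged step by step and how three of the linkage methods (single, complete, and group average) differ. It is a textbook-style illustration on a symmetric distance matrix, not the Canvas implementation; the function name and the string linkage labels are hypothetical, and the other linkage methods are not covered.

```python
def agglomerate(dist, linkage="average"):
    """Merge n objects down to one cluster.

    Returns a list of (merging distance, sorted members of the new cluster).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def cluster_distance(c1, c2):
        pair_d = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(pair_d)            # closest inter-cluster pair
        if linkage == "complete":
            return max(pair_d)            # farthest inter-cluster pair
        return sum(pair_d) / len(pair_d)  # group average

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        d, p, q = min(
            (cluster_distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
        merges.append((d, sorted(merged)))
    return merges


# Three points on a line at positions 0, 1, and 5.
dist = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
print(agglomerate(dist, "single"))
```

The first merge is the same for all three linkages, but the distance at which the third point joins differs: 4 for single linkage, 5 for complete linkage, and 4.5 for group average. The sketch recomputes every cluster distance at each step, so it is suitable only for small examples.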
-
class
schrodinger.application.canvas.base.
ChmClusterGrouper
¶ Generates cluster groupings from the baseFile created by a
ChmHierarchicalClustering
object, or by the utilities canvasHC and canvasHCBuild.
-
__init__
() Instantiates a
ChmClusterGrouper
object.
-
group
(pairwiseIn, baseStreamIn, groupStreamOut, n, noSingletons, allowTies) Given a
ChmPairwiseFPIn
object pairwiseIn and an associated input stream baseStreamIn that’s opened on a baseFile, this function writes groupings for the formation of n clusters to the CSV output stream groupStreamOut. If the Boolean noSingletons is True, groupings will be reported only if there are at least n non‑singleton clusters. Otherwise, groupStreamOut will contain only singletons (i.e., no groupings). The Boolean allowTies controls whether more than one cluster member can be flagged as nearest to centroid or farthest from centroid if those members are equidistant to the centroid.
-
groupByThreshold
(pairwiseIn, baseStreamIn, groupStreamOut, d, allowTies) Analogous to the previous method except that d is a threshold on the merging distance. All clusters formed at or below this distance will be written to groupStreamOut.
-
getDforN
(pairwiseIn, baseStreamIn, n) Returns the merging distance for the formation of n clusters.
-
getNforD
(pairwiseIn, baseStreamIn, d) Returns the number of clusters formed at or below the merging distance d.
-
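The relationship between a number of clusters n and a merging distance d can be sketched from a list of merging distances alone. The example assumes the distances are sorted in ascending order, one per merge, which holds for linkages whose merging distance never decreases; the function names are hypothetical and no baseFile is read.

```python
from bisect import bisect_right


def n_for_d(merge_distances, d):
    """Number of clusters once every merge at or below distance d is applied."""
    n_objects = len(merge_distances) + 1
    return n_objects - bisect_right(merge_distances, d)


def d_for_n(merge_distances, n):
    """Merging distance at which exactly n clusters are first formed."""
    n_objects = len(merge_distances) + 1
    if not 1 <= n < n_objects:
        raise ValueError("n must be at least 1 and less than the number of objects")
    return merge_distances[n_objects - n - 1]


merges = [0.1, 0.3, 0.7]  # three merges, so four objects
print(n_for_d(merges, 0.3), d_for_n(merges, 2))
```

Each merge reduces the cluster count by one, so four objects with merges at 0.1, 0.3, and 0.7 form two clusters at a threshold of 0.3 and a single cluster at 0.7 or above.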