schrodinger.application.canvas.cluster module

Canvas clustering functionality.

There are classes to perform clustering and to support command-line and graphical interfaces to the clustering options.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.application.canvas.cluster.mktemp()

A simple wrapper around tempfile.mkstemp which closes the file and returns just the name of the file.
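
A minimal sketch of what such a wrapper might look like, using only the standard library. The name mktemp_sketch and its suffix argument are assumptions made for illustration; the module's actual implementation is not shown here and may differ.

```python
import os
import tempfile


def mktemp_sketch(suffix=""):
    """Create a temporary file, close it, and return only its path.

    Hypothetical stand-in for the module's mktemp(); the suffix
    argument is an assumption, not part of the documented signature.
    """
    # mkstemp returns an open OS-level file descriptor plus the path.
    fd, name = tempfile.mkstemp(suffix=suffix)
    # Close the descriptor; the (empty) file itself stays on disk.
    os.close(fd)
    return name
```

Because the file is left on disk, the caller is responsible for removing it (for example with os.remove) once it is no longer needed.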

class schrodinger.application.canvas.cluster.CanvasFingerprintCluster(logger)

Bases: object

A class which handles clustering of Canvas fingerprints. It maintains a list of the possible linkage types and keeps track of the currently specified linkage type.

LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty', 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']
__init__(logger)

Initialize the instance of the cluster class

getDescription()

Returns a string representing a summary of the current linkage settings

debug(output)

Wrapper for debug logging, just to simplify logging

setLinkage(linkage)

Set the current linkage based on the linkage name
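
How setLinkage reacts to an unrecognised name is not documented here, so a caller may want to check the name against LINKAGE_TYPES first. The helper below is a hypothetical sketch of such a check; check_linkage is not part of the module, and the exact, case-sensitive comparison is an assumption.

```python
# Linkage names as listed in LINKAGE_TYPES for CanvasFingerprintCluster.
LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty',
                 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']


def check_linkage(name):
    """Return name if it is a listed linkage type, else raise ValueError.

    Hypothetical helper; the comparison is exact and case-sensitive,
    which is an assumption about how names should be matched.
    """
    if name not in LINKAGE_TYPES:
        raise ValueError(
            "Unknown linkage %r; expected one of: %s"
            % (name, ", ".join(LINKAGE_TYPES)))
    return name
```

A call such as clusterer.setLinkage(check_linkage("Ward")) would then fail early with a readable message if the name were misspelled.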

getCurrentLinkage()

Returns the current linkage definition

clusterDM(dm_file_name)

Cluster the distance matrix file given in dm_file_name and return the cluster strain. dm_file_name should point to a CSV file containing the matrix.
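
The exact CSV layout expected by clusterDM is not specified here. As a rough illustration only, the sketch below assumes a plain square grid of numbers with no header row or label column, and checks that it looks like a distance matrix (symmetric, zero diagonal). Both function names are hypothetical.

```python
import csv
import io


def read_square_matrix(csv_text):
    """Parse CSV text into rows of floats and check the grid is square."""
    rows = [[float(cell) for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    n = len(rows)
    for row in rows:
        if len(row) != n:
            raise ValueError("matrix is not square")
    return rows


def is_distance_matrix(matrix, tol=1e-9):
    """Return True if the matrix is symmetric with a zero diagonal."""
    n = len(matrix)
    for i in range(n):
        if abs(matrix[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
    return True


# Made-up 3x3 example in the assumed layout.
example = "0.0,0.2,0.7\n0.2,0.0,0.5\n0.7,0.5,0.0\n"
matrix = read_square_matrix(example)
```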

generateDM(dm_file_name, fp_file, fp_gen, fp_sim)

Generate a distance matrix with the specified filename from the fingerprint file fp_file. The fp_gen and fp_sim objects encapsulate the current fingerprint and similarity settings.

clusterFP(fp_file, fp_gen, fp_sim)

Cluster the fingerprints contained in fp_file. The bit size is taken from the CanvasFingerprintGenerator object fp_gen, and the similarity metric from the CanvasFingerprintSimilarity object fp_sim. This function returns the ‘strain’ reported by the clustering.

group(num_clusters)

Perform a grouping operation based on an existing clustering run. If the clustering has not actually been performed yet then an exception will be raised.
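
The call order implied by the methods above (set a linkage, cluster, then group) can be sketched as follows. The function cluster_and_group is hypothetical, and FakeClusterer is a stand-in that only mimics the documented method names so the sketch runs without a Schrodinger installation. Its return values are made up, and RuntimeError is an assumption, since the real exception type is not documented here.

```python
def cluster_and_group(clusterer, fp_file, fp_gen, fp_sim, num_clusters,
                      linkage="Average"):
    """Run the documented steps in order and return (strain, map)."""
    clusterer.setLinkage(linkage)
    strain = clusterer.clusterFP(fp_file, fp_gen, fp_sim)
    clusterer.group(num_clusters)  # only valid after a clustering run
    return strain, clusterer.getClusteringMap()


class FakeClusterer:
    """Stand-in that mimics the call order only; not the real class."""

    def __init__(self):
        self.linkage = None
        self.clustered = False
        self.num_clusters = None

    def setLinkage(self, linkage):
        self.linkage = linkage

    def clusterFP(self, fp_file, fp_gen, fp_sim):
        self.clustered = True
        return 0.0  # placeholder strain value

    def group(self, num_clusters):
        if not self.clustered:
            # Exception type is an assumption for this sketch.
            raise RuntimeError("cluster before grouping")
        self.num_clusters = num_clusters

    def getClusteringMap(self):
        # Placeholder: three items spread over the requested clusters.
        return {i: (i - 1) % self.num_clusters + 1 for i in (1, 2, 3)}


# "example.fp" is a hypothetical file name; None replaces real settings.
strain, cluster_map = cluster_and_group(
    FakeClusterer(), "example.fp", None, None, num_clusters=2)
```

With a real CanvasFingerprintCluster instance, the same function would be called with actual fingerprint and similarity objects in place of None.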

getMatrixTime()

Returns the time required for distance matrix generation

getClusterTime()

Returns the time required for clustering

getGroupTime()

Returns the time required for group creation

getClusteringMap()

Once grouping has been done this method may be called to return a dictionary where the keys represent the original fingerprint IDs (usually the position of the structure in the file or the entry ID) and the values are the cluster this structure belongs to

getClusterContents()

Once grouping has been done this method may be called to return a dictionary where the keys represent the cluster number and the values are a list of IDs (usually positions in the file or entry IDs).
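
The dictionaries described for getClusteringMap and getClusterContents are two views of the same grouping. The sketch below, using made-up data, shows how one can be derived from the other. The helper name contents_from_map is hypothetical, and whether the real method orders its lists this way is not stated.

```python
from collections import defaultdict


def contents_from_map(clustering_map):
    """Invert {item_id: cluster} into {cluster: [item_id, ...]}."""
    contents = defaultdict(list)
    # Visit IDs in sorted order so each list comes out sorted.
    for item_id in sorted(clustering_map):
        contents[clustering_map[item_id]].append(item_id)
    return dict(contents)


# Made-up grouping of five items into three clusters.
clustering_map = {1: 1, 2: 2, 3: 1, 4: 3, 5: 2}
cluster_contents = contents_from_map(clustering_map)
```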

getDistanceToCentroid(item)

For a given item in the most recent cluster grouping return the distance to the centroid of the cluster which contains this item

getIsNearestToCentroid(item)

For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is nearest the centroid

getIsFarthestFromCentroid(item)

For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is farthest from the centroid

getMaxDistanceFromCentroid(item)

For a given item in the most recent cluster grouping return the maximum distance to the centroid for any item in the cluster

getAverageDistanceFromCentroid(item)

For a given item in the most recent cluster grouping return the average distance to the centroid over all items in the cluster
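
To make the centroid-related quantities concrete, the sketch below computes them for a single cluster from made-up per-item distances. It is an illustration under stated assumptions, not the library's implementation: centroid_summary is a hypothetical helper, and ties are resolved in favour of the first item encountered.

```python
def centroid_summary(distances):
    """Summarise one cluster given {item: distance_to_centroid}."""
    nearest = min(distances, key=distances.get)
    farthest = max(distances, key=distances.get)
    return {
        "nearest": nearest,            # item closest to the centroid
        "farthest": farthest,          # item farthest from the centroid
        "max": distances[farthest],    # maximum distance in the cluster
        "average": sum(distances.values()) / len(distances),
    }


# Made-up distances for a three-member cluster.
cluster = {"a": 0.10, "b": 0.40, "c": 0.25}
summary = centroid_summary(cluster)
```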

getClusterVariance(item)

For a given item return the variance of the cluster which that item belongs to.

getBestNumberOfClusters()

The cluster statistics file contains information about each clustering level. This function returns the number of clusters at which the Kelley function has a minimum
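
The selection described here can be sketched from two per-level lists such as those returned by getNumberOfClustersList and getKelleyPenaltyList. That the two lists are parallel (same length, same level order) is an assumption, the data are made up, and the helper name is hypothetical.

```python
def best_number_of_clusters(num_clusters, kelley_penalties):
    """Return the cluster count whose Kelley penalty is smallest."""
    if len(num_clusters) != len(kelley_penalties):
        raise ValueError("lists must have the same length")
    if not num_clusters:
        raise ValueError("lists must not be empty")
    # Index of the minimum penalty; the first minimum wins on ties.
    best_index = min(range(len(kelley_penalties)),
                     key=kelley_penalties.__getitem__)
    return num_clusters[best_index]


# Made-up statistics for five clustering levels.
levels = [1, 2, 3, 4, 5]
penalties = [5.0, 3.2, 2.1, 2.6, 4.0]
best = best_number_of_clusters(levels, penalties)
```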

getNumberOfClustersList()

Returns the number of clusters at each level

getRSquaredList()

Returns the R-squared value at each clustering level

getSemiPartialRSquaredList()

Returns the semi-partial R-squared value at each clustering level

getKelleyPenaltyList()

Returns the Kelley Penalty value at each clustering level

getMergeDistanceList()

Returns the merge distance value at each clustering level

getSeparationRatioList()

Returns the separation ratio - calculated from the merge distance of

getDendrogramData()

Returns a tuple with: 1) a list of line positions, each in the form [x1,x2][y1,y2] and each defining a line segment to be plotted in a dendrogram; 2) a list of x-axis tick positions; 3) a list of x-axis tick labels.
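
The sketch below shows one way the returned data might be consumed. It assumes that each line position is a pair ([x1, x2], [y1, y2]); the exact container types are not specified here, and the data and the helper name dendrogram_extent are made up for illustration.

```python
def dendrogram_extent(line_positions):
    """Return (x_min, x_max, y_min, y_max) over all line segments."""
    xs = [x for x_pair, _ in line_positions for x in x_pair]
    ys = [y for _, y_pair in line_positions for y in y_pair]
    return min(xs), max(xs), min(ys), max(ys)


# Made-up data in the assumed layout: two vertical stems and one bar.
lines = [([1, 1], [0.0, 0.3]), ([2, 2], [0.0, 0.3]), ([1, 2], [0.3, 0.3])]
tick_positions = [1, 2]
tick_labels = ["mol_1", "mol_2"]
extent = dendrogram_extent(lines)
```

With a plotting library such as matplotlib, each segment could then be drawn with ax.plot(x_pair, y_pair), and the ticks applied with ax.set_xticks(tick_positions) and ax.set_xticklabels(tick_labels).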

getDistanceMatrixFile()

Returns the name of the distance matrix file used in the most recent clustering

getClusterOrderMap(num_clusters)

Returns a dictionary where the keys are the item labels and each value is the index that item would have in an ordering which places the items in cluster order
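
Given a dictionary of this shape, the items can be listed in cluster order by sorting on the stored index. The data below are made up and the helper name is hypothetical.

```python
def items_in_cluster_order(order_map):
    """Return item labels sorted by their cluster-order index."""
    return sorted(order_map, key=order_map.get)


# Made-up order map: label -> position in cluster order.
order_map = {"mol_a": 2, "mol_b": 0, "mol_c": 1}
ordered = items_in_cluster_order(order_map)
```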

class schrodinger.application.canvas.cluster.CanvasFingerprintClusterCLI(logger)

Bases: schrodinger.application.canvas.cluster.CanvasFingerprintCluster

A subclass of the Canvas fingerprint cluster manager which is to be used from a program with a command-line interface. This class has methods for defining options in an option parser and for applying those options once they’ve been parsed. The idea is to provide a standard command-line interface for setting the clustering options.

__init__(logger)

Initialize the instance of the cluster class

addOptions(parser)

Add options for cluster linkage

parseOptions(options)

Examine the options and set the internal state to reflect them
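
The add-then-parse pattern described for this class can be sketched with the standard-library optparse module. Everything specific here is an assumption: that the parser is an optparse.OptionParser, the "--linkage" flag name, and the default linkage. LinkageOptionsSketch is a stand-in and not the real class.

```python
from optparse import OptionParser

# Linkage names as listed in LINKAGE_TYPES for this class.
LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty',
                 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']


class LinkageOptionsSketch:
    """Stand-in showing the addOptions/parseOptions pattern only."""

    def __init__(self):
        # Arbitrary default chosen for the sketch.
        self.linkage = LINKAGE_TYPES[0]

    def addOptions(self, parser):
        # "--linkage" is a hypothetical flag name.
        parser.add_option("--linkage", dest="linkage", type="choice",
                          choices=LINKAGE_TYPES, default=self.linkage,
                          help="cluster linkage method")

    def parseOptions(self, options):
        # Copy the parsed value into the object's internal state.
        self.linkage = options.linkage


parser = OptionParser()
handler = LinkageOptionsSketch()
handler.addOptions(parser)
options, _args = parser.parse_args(["--linkage", "Ward"])
handler.parseOptions(options)
```

Keeping option definition and option application in the same object lets several command-line programs share one consistent set of clustering flags.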

getLinkageDescription()

Return a string which contains a description of the linkage methods available for cluster linkage

LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty', 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']
clusterDM(dm_file_name)

Cluster the distance matrix file given in dm_file_name and return the cluster strain. dm_file_name should point to a CSV file containing the matrix.

clusterFP(fp_file, fp_gen, fp_sim)

Cluster the fingerprints contained in fp_file. The bit size is taken from the CanvasFingerprintGenerator object fp_gen, and the similarity metric from the CanvasFingerprintSimilarity object fp_sim. This function returns the ‘strain’ reported by the clustering.

debug(output)

Wrapper for debug logging, just to simplify logging

generateDM(dm_file_name, fp_file, fp_gen, fp_sim)

Generate a distance matrix with the specified filename from the fingerprint file fp_file. The fp_gen and fp_sim objects encapsulate the current fingerprint and similarity settings.

getAverageDistanceFromCentroid(item)

For a given item in the most recent cluster grouping return the average distance to the centroid over all items in the cluster

getBestNumberOfClusters()

The cluster statistics file contains information about each clustering level. This function returns the number of clusters at which the Kelley function has a minimum

getClusterContents()

Once grouping has been done this method may be called to return a dictionary where the keys represent the cluster number and the values are a list of IDs (usually positions in the file or entry IDs).

getClusterOrderMap(num_clusters)

Returns a dictionary where the keys are the item labels and each value is the index that item would have in an ordering which places the items in cluster order

getClusterTime()

Returns the time required for clustering

getClusterVariance(item)

For a given item return the variance of the cluster which that item belongs to.

getClusteringMap()

Once grouping has been done this method may be called to return a dictionary where the keys represent the original fingerprint IDs (usually the position of the structure in the file or the entry ID) and the values are the cluster this structure belongs to

getCurrentLinkage()

Returns the current linkage definition

getDendrogramData()

Returns a tuple with: 1) a list of line positions, each in the form [x1,x2][y1,y2] and each defining a line segment to be plotted in a dendrogram; 2) a list of x-axis tick positions; 3) a list of x-axis tick labels.

getDescription()

Returns a string representing a summary of the current linkage settings

getDistanceMatrixFile()

Returns the name of the distance matrix file used in the most recent clustering

getDistanceToCentroid(item)

For a given item in the most recent cluster grouping return the distance to the centroid of the cluster which contains this item

getGroupTime()

Returns the time required for group creation

getIsFarthestFromCentroid(item)

For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is farthest from the centroid

getIsNearestToCentroid(item)

For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is nearest the centroid

getKelleyPenaltyList()

Returns the Kelley Penalty value at each clustering level

getMatrixTime()

Returns the time required for distance matrix generation

getMaxDistanceFromCentroid(item)

For a given item in the most recent cluster grouping return the maximum distance to the centroid for any item in the cluster

getMergeDistanceList()

Returns the merge distance value at each clustering level

getNumberOfClustersList()

Returns the number of clusters at each level

getRSquaredList()

Returns the R-squared value at each clustering level

getSemiPartialRSquaredList()

Returns the semi-partial R-squared value at each clustering level

getSeparationRatioList()

Returns the separation ratio - calculated from the merge distance of

group(num_clusters)

Perform a grouping operation based on an existing clustering run. If the clustering has not actually been performed yet then an exception will be raised.

setLinkage(linkage)

Set the current linkage based on the linkage name