Package schrodinger :: Package application :: Package canvas :: Module cluster :: Class CanvasFingerprintCluster

Class CanvasFingerprintCluster

object --+
         |
        CanvasFingerprintCluster
Known Subclasses:

A class that handles clustering of Canvas fingerprints. It maintains a list of the possible linkage types and keeps track of the currently selected linkage type.

Instance Methods
 
__init__(self, logger)
Initialize the instance of the cluster class
 
__del__(self)
Destructor: clean up and delete the temporary files
 
_deleteIfExists(self, filename)
A utility function, used by the destructor, that checks whether a file exists and removes it if it does.
 
getDescription(self)
Returns a string representing a summary of the current linkage settings
 
debug(self, output)
Wrapper for debug logging, just to simplify logging
 
setLinkage(self, linkage)
Set the current linkage based on the linkage name
 
getCurrentLinkage(self)
Returns the current linkage definition
 
_createTempFile(self, temp_file_name)
If temp_file_name exists, remove it; then create and return a new temporary file name.
 
clusterDM(self, dm_file_name)
Cluster the distance matrix file given in dm_file_name, using the similarity settings encapsulated in fp_sim.
 
generateDM(self, dm_file_name, fp_file, fp_gen, fp_sim)
Generate a distance matrix file named dm_file_name from the fingerprint file fp_file.
 
clusterFP(self, fp_file, fp_gen, fp_sim)
Cluster the fingerprints contained in fp_file.
 
group(self, num_clusters)
Perform a grouping operation based on an existing clustering run.
 
getMatrixTime(self)
Returns the time required for distance matrix generation
 
getClusterTime(self)
Returns the time required for clustering
 
getGroupTime(self)
Returns the time required for group creation
 
getClusteringMap(self)
Once grouping has been done, returns a dictionary whose keys are the original fingerprint IDs (usually the position of the structure in the file, or the entry ID) and whose values are the clusters those structures belong to
 
getClusterContents(self)
Once grouping has been done, returns a dictionary whose keys are the cluster numbers and whose values are lists of IDs (usually positions in the file, or entry IDs)
 
_readGroupFile(self)
A private method which reads the group file and extracts the cluster membership and per-cluster statistics
 
getDistanceToCentroid(self, item)
For a given item in the most recent cluster grouping, return the distance to the centroid of the cluster that contains this item
 
getIsNearestToCentroid(self, item)
For a given item in the most recent cluster grouping, return a boolean value indicating whether the item is the one nearest to the centroid of its cluster
 
getIsFarthestFromCentroid(self, item)
For a given item in the most recent cluster grouping, return a boolean value indicating whether the item is the one farthest from the centroid of its cluster
 
getMaxDistanceFromCentroid(self, item)
For a given item in the most recent cluster grouping, return the maximum distance from the centroid over all items in the cluster that contains this item
 
getAverageDistanceFromCentroid(self, item)
For a given item in the most recent cluster grouping, return the average distance from the centroid over all items in the cluster that contains this item
 
getClusterVariance(self, item)
For a given item return the variance of the cluster which that item belongs to.
 
getBestNumberOfClusters(self)
Returns the number of clusters at which the Kelley penalty function has its minimum, based on the per-level information in the cluster statistics file.
 
_readClusterStatistics(self)
A private method that reads the cluster statistics from the stat file and populates the internal lists
 
getNumberOfClustersList(self)
Returns the number of clusters at each level
 
getRSquaredList(self)
Returns the R-squared value at each clustering level
 
getSemiPartialRSquaredList(self)
Returns the semi-partial R-squared value at each clustering level
 
getKelleyPenaltyList(self)
Returns the Kelley penalty value at each clustering level
 
getMergeDistanceList(self)
Returns the merge distance value at each clustering level
 
getSeparationRatioList(self)
Returns the separation ratio - calculated from the merge distance of
 
_getLine(self, x1, x2, y1, y2)
A private function which takes the input parameters and returns [[x1,x2],[y1,y2]].
 
getDendrogramData(self)
Returns a tuple containing: 1) a list of line segments, each in the form [[x1,x2],[y1,y2]], to be plotted in a dendrogram; 2) a list of x-axis tick positions; 3) a list of x-axis tick labels
 
getDistanceMatrixFile(self)
Returns the name of the distance matrix file used in the most recent clustering
 
getClusterOrderMap(self, num_clusters)
Returns a dictionary whose keys are the item labels and whose values are the index each item would have in the grouping that places the items in cluster order

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__
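getClusteringMap() and getClusterContents() describe the same grouping from opposite directions, and getClusterOrderMap() derives a position for each item from that grouping. The following is a minimal sketch on plain dictionaries, not the library's implementation. The helper names are hypothetical, and the ordering rule (clusters in ascending number, items in listed order, 0-based positions) is an assumption; the real method may order clusters differently.

```python
from collections import defaultdict


def contents_from_map(clustering_map):
    """Invert an ID -> cluster dict into a cluster -> [IDs] dict."""
    contents = defaultdict(list)
    for item_id, cluster in clustering_map.items():
        contents[cluster].append(item_id)
    return dict(contents)


def order_map_from_contents(contents):
    """Give each ID its position when clusters are laid out one after another.

    Assumption: clusters are visited in ascending cluster number, items keep
    their listed order within a cluster, and positions start at 0.
    """
    order = {}
    position = 0
    for cluster in sorted(contents):
        for item_id in contents[cluster]:
            order[item_id] = position
            position += 1
    return order


# Made-up data: keys are file positions, values are cluster numbers.
clustering_map = {1: 2, 2: 1, 3: 2, 4: 1, 5: 3}
contents = contents_from_map(clustering_map)
print(contents)                           # {2: [1, 3], 1: [2, 4], 3: [5]}
print(order_map_from_contents(contents))  # {2: 0, 4: 1, 1: 2, 3: 3, 5: 4}
```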

Class Variables
  LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', ...
Properties

Inherited from object: __class__

Method Details

__init__(self, logger)
(Constructor)

 

Initialize the instance of the cluster class

Overrides: object.__init__
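The constructor takes a logger, and debug(), setLinkage() and getCurrentLinkage() build on that state. A minimal stand-in might look like the following, assuming a standard library logging.Logger and assuming that setLinkage() rejects unknown names with ValueError (this page does not say what happens). ClusterSettingsSketch and KNOWN_LINKAGES are hypothetical names; the list holds only the LINKAGE_TYPES entries visible on this page, and the default linkage is also an assumption. Here getCurrentLinkage() returns just the name, whereas the real method returns a linkage definition.

```python
import logging

# Hypothetical, partial copy of LINKAGE_TYPES: only the visible entries.
KNOWN_LINKAGES = ['Single', 'Complete', 'Average', 'Centroid',
                  'McQuitty', 'Ward', 'Weighted Centroid', 'Flexible Beta']


class ClusterSettingsSketch:
    """Hypothetical stand-in illustrating the logger and linkage helpers."""

    def __init__(self, logger):
        self.logger = logger
        self.linkage = KNOWN_LINKAGES[0]  # assumed default

    def debug(self, output):
        # Thin wrapper so callers need not reach for self.logger directly.
        self.logger.debug(output)

    def setLinkage(self, linkage):
        # Assumed behavior: reject names that are not in the linkage list.
        if linkage not in KNOWN_LINKAGES:
            raise ValueError("Unknown linkage: %r" % linkage)
        self.linkage = linkage
        self.debug("Linkage set to %s" % linkage)

    def getCurrentLinkage(self):
        return self.linkage


logging.basicConfig(level=logging.DEBUG)
sketch = ClusterSettingsSketch(logging.getLogger("cluster_sketch"))
sketch.setLinkage("Ward")
print(sketch.getCurrentLinkage())  # Ward
```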

_createTempFile(self, temp_file_name)

 

If temp_file_name exists, remove it. Then create a new temporary file name and return it.
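The delete-then-recreate pattern of _createTempFile (together with the _deleteIfExists helper) can be sketched with the standard library. This is an illustration under assumptions, not the class's code: the function names and the suffix parameter are hypothetical, and the real method may name or place its files differently.

```python
import os
import tempfile


def delete_if_exists(filename):
    """Remove filename if it names an existing file (cf. _deleteIfExists)."""
    if filename and os.path.isfile(filename):
        os.remove(filename)


def create_temp_file(temp_file_name, suffix=".csv"):
    """Delete the previous temporary file, then return a fresh temp path."""
    delete_if_exists(temp_file_name)
    # mkstemp creates the file securely; close the descriptor so that
    # other code or external tools can open the path themselves.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    return path


first = create_temp_file(None)     # nothing to delete on the first call
second = create_temp_file(first)   # removes `first`, returns a new path
print(os.path.exists(first), os.path.exists(second))  # False True
delete_if_exists(second)           # what a destructor would do at the end
```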

clusterDM(self, dm_file_name)

 

Cluster the distance matrix file given in dm_file_name, using the similarity settings encapsulated in fp_sim. The value returned is the cluster strain. The dm_file_name should point to a CSV file containing the matrix.
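Since clusterDM expects a CSV distance matrix, it can help to check such a file before clustering. The sketch below assumes a header-free layout with one row per item, a zero diagonal and a symmetric matrix; the actual Canvas file format is not documented here and may differ (for example, it may carry row or column labels). read_distance_matrix is a hypothetical helper.

```python
import csv
import io


def read_distance_matrix(csv_text):
    """Parse and sanity-check a square, symmetric distance matrix.

    Assumed layout: no header, one comma-separated row per item.
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [[float(v) for v in row] for row in reader if row]
    n = len(rows)
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError("row %d has %d values, expected %d" % (i, len(row), n))
        if row[i] != 0.0:
            raise ValueError("diagonal entry %d is not zero" % i)
        for j in range(i):  # earlier rows were already length-checked
            if row[j] != rows[j][i]:
                raise ValueError("matrix is not symmetric at (%d, %d)" % (i, j))
    return rows


text = "0,0.4,0.9\n0.4,0,0.7\n0.9,0.7,0\n"
matrix = read_distance_matrix(text)
print(len(matrix))  # 3
```

For a file on disk, the text can be obtained with open(dm_file_name, newline="").read().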

generateDM(self, dm_file_name, fp_file, fp_gen, fp_sim)

 

Generate a distance matrix file named dm_file_name from the fingerprint file fp_file. The fp_gen and fp_sim objects encapsulate the current fingerprint and similarity settings.
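Conceptually, a distance matrix is built by comparing every pair of fingerprints and converting each similarity into a distance. The sketch below assumes Tanimoto similarity with distance = 1 - similarity, and represents fingerprints as Python sets of on-bit indices; the real metric is whatever fp_sim is configured with, and real Canvas fingerprints are not Python sets. Both function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def distance_matrix(fingerprints):
    """Symmetric matrix of 1 - Tanimoto distances, zero on the diagonal."""
    n = len(fingerprints)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # fill upper triangle, mirror to lower
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
            dm[i][j] = dm[j][i] = d
    return dm


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]
for row in distance_matrix(fps):
    print(row)
```

Only the upper triangle is computed because the metric is symmetric, which halves the number of comparisons.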

clusterFP(self, fp_file, fp_gen, fp_sim)

 

Cluster the fingerprints contained in fp_file. The bit size is taken from the CanvasFingerprintGenerator object fp_gen, and the similarity metric from the CanvasFingerprintSimilarity object fp_sim. This function returns the 'strain' reported by the clustering.

group(self, num_clusters)

 

Perform a grouping operation based on an existing clustering run. If clustering has not been performed yet, an exception is raised.
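Grouping after hierarchical clustering amounts to cutting the merge tree so that num_clusters groups remain. The real method works from files written by an earlier clustering run; the sketch below only illustrates the idea on an in-memory merge list using union-find. MergeHistorySketch, record_clustering, the RuntimeError type and the 1-based group labels are all hypothetical, and the sketch assumes each recorded merge joins two items from different clusters.

```python
class MergeHistorySketch:
    """Hypothetical stand-in: groups items from a recorded merge sequence."""

    def __init__(self, n_items):
        self.n_items = n_items
        self.merges = None  # stays None until a clustering run is recorded

    def record_clustering(self, merges):
        # merges: (i, j) item pairs, in the order their clusters were joined
        self.merges = list(merges)

    def group(self, num_clusters):
        if self.merges is None:
            raise RuntimeError("clustering has not been performed yet")
        if not 1 <= num_clusters <= self.n_items:
            raise ValueError("num_clusters out of range")
        parent = list(range(self.n_items))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Each merge joins two clusters, so replaying the first
        # n - k merges leaves exactly k clusters.
        for i, j in self.merges[: self.n_items - num_clusters]:
            parent[find(i)] = find(j)
        labels, result = {}, {}
        for item in range(self.n_items):
            root = find(item)
            labels.setdefault(root, len(labels) + 1)  # 1-based group labels
            result[item] = labels[root]
        return result


history = MergeHistorySketch(5)
history.record_clustering([(0, 1), (2, 3), (0, 2), (0, 4)])
print(history.group(2))  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 2}
```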

getBestNumberOfClusters(self)

 

The cluster statistics file contains information about each clustering level. This function returns the number of clusters at which the Kelley penalty function has its minimum.
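The selection rule can be expressed directly on the per-level lists. This sketch assumes that getNumberOfClustersList() and getKelleyPenaltyList() return parallel lists with one entry per clustering level, and that ties go to the first level encountered; best_number_of_clusters is a hypothetical helper and the numbers below are made up.

```python
def best_number_of_clusters(num_clusters_list, kelley_list):
    """Return the cluster count at the level with the smallest Kelley penalty.

    Assumes the two lists are parallel (one entry per clustering level).
    Ties resolve to the first such level, because min() keeps the first.
    """
    if not kelley_list or len(num_clusters_list) != len(kelley_list):
        raise ValueError("expected two non-empty lists of equal length")
    best_level = min(range(len(kelley_list)), key=kelley_list.__getitem__)
    return num_clusters_list[best_level]


levels = [5, 4, 3, 2, 1]
penalties = [4.1, 3.2, 2.7, 3.0, 3.9]  # illustrative values only
print(best_number_of_clusters(levels, penalties))  # 3
```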

_getLine(self, x1, x2, y1, y2)

 

A private function which takes the input parameters and returns [[x1,x2],[y1,y2]]. This is used in dendrogram generation.
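In a dendrogram, each merge is drawn as an upside-down U: two vertical segments rising from the merged subtrees and a horizontal bar at the merge height. The sketch below shows how segments in the [[x1,x2],[y1,y2]] form could be assembled for one merge; get_line mirrors _getLine's return shape, while merge_segments and its parameters are hypothetical and not taken from getDendrogramData().

```python
def get_line(x1, x2, y1, y2):
    # Same shape as _getLine: x coordinates first, then y coordinates.
    return [[x1, x2], [y1, y2]]


def merge_segments(x_left, h_left, x_right, h_right, h_merge):
    """Three segments drawing one dendrogram merge.

    x_left/x_right are the subtree positions on the x axis, h_left/h_right
    the heights at which those subtrees were formed, and h_merge the merge
    distance at which they are joined.
    """
    return [
        get_line(x_left, x_left, h_left, h_merge),     # left vertical
        get_line(x_right, x_right, h_right, h_merge),  # right vertical
        get_line(x_left, x_right, h_merge, h_merge),   # horizontal bar
    ]


segments = merge_segments(0, 0.0, 1, 0.0, 0.4)
print(segments)  # [[[0, 0], [0.0, 0.4]], [[1, 1], [0.0, 0.4]], [[0, 1], [0.4, 0.4]]]
```

Because each segment is an x-list followed by a y-list, it can be passed straight to matplotlib as pyplot.plot(seg[0], seg[1]).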


Class Variable Details

LINKAGE_TYPES

Value:
['Single',
 'Complete',
 'Average',
 'Centroid',
 'McQuitty',
 'Ward',
 'Weighted Centroid',
 'Flexible Beta',
...