A class which handles clustering of canvas fingerprints. It maintains
a list of the possible linkage types and keeps track of the currently
specified linkage type.

__init__(self, logger)
Initialize the instance of the cluster class.

__del__(self)
Destructor: clean up and delete the temporary files.

_deleteIfExists(self, filename)
A utility function, used by the destructor, that checks whether a
file exists and removes it if it does.

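A minimal sketch of the check-and-remove behaviour described for `_deleteIfExists`, written as a standalone function. The name `delete_if_exists` is hypothetical; the real method's internals are not shown in this reference.

```python
import os
import tempfile


def delete_if_exists(filename):
    """Remove `filename` if it exists; do nothing otherwise.

    Hypothetical standalone stand-in for the _deleteIfExists method.
    """
    try:
        os.remove(filename)
    except FileNotFoundError:
        # Nothing to clean up: the file was never created or is already gone.
        pass


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    delete_if_exists(path)        # removes the temporary file
    delete_if_exists(path)        # second call is a harmless no-op
    print(os.path.exists(path))   # False
```

Attempting the removal and catching `FileNotFoundError` avoids the small race window between a separate `os.path.exists` check and the `os.remove` call.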
getDescription(self)
Returns a string summarizing the current linkage settings.

debug(self, output)
Wrapper for debug logging, to simplify logging calls.

setLinkage(self, linkage)
Set the current linkage based on the linkage name.

getCurrentLinkage(self)
Returns the current linkage definition.

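The class description says it keeps a list of possible linkage types and tracks the current one. A sketch of that bookkeeping follows; the class name `LinkageSettings`, the set of linkage names, the default, and the case-insensitive lookup are all assumptions, not the real class's behaviour.

```python
class LinkageSettings:
    """Hypothetical sketch: a list of known linkage types plus the current one."""

    # Assumed linkage names; the real class may support a different set.
    LINKAGES = ("single", "complete", "average", "centroid", "median", "ward")

    def __init__(self, default="average"):
        self._current = None
        self.setLinkage(default)

    def setLinkage(self, linkage):
        """Select a linkage by name (case-insensitive here, an assumption)."""
        name = linkage.strip().lower()
        if name not in self.LINKAGES:
            raise ValueError(
                f"unknown linkage {linkage!r}; expected one of {self.LINKAGES}"
            )
        self._current = name

    def getCurrentLinkage(self):
        return self._current

    def getDescription(self):
        return f"Linkage: {self._current}"


settings = LinkageSettings()
settings.setLinkage("Ward")
print(settings.getDescription())  # Linkage: ward
```

Rejecting unknown names in `setLinkage` keeps the current linkage valid at all times, so `getCurrentLinkage` never has to handle an unset or misspelled value.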
_createTempFile(self, temp_file_name)
If temp_file_name exists, remove it.

clusterDM(self, dm_file_name)
Cluster the distance matrix file given in dm_file_name, using the
similarity settings encapsulated in dp_sim.

generateDM(self, dm_file_name, fp_file, fp_gen, fp_sim)
Generate a distance matrix file named dm_file_name from the
fingerprint file fp_file.

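To illustrate what a fingerprint distance matrix contains, here is a sketch that assumes each fingerprint is a set of on-bit indices and uses Tanimoto distance (1 minus Tanimoto similarity). The fp_gen and fp_sim objects of the real API are replaced by a plain function, and the output file layout written by `write_dm` is an assumption; the function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # treat two empty fingerprints as identical
    return 1.0 - len(a & b) / union


def generate_dm(fingerprints, distance=tanimoto_distance):
    """Return a symmetric n x n distance matrix as a list of lists."""
    n = len(fingerprints)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(fingerprints[i], fingerprints[j])
            dm[i][j] = dm[j][i] = d
    return dm


def write_dm(dm, dm_file_name):
    """Write one whitespace-separated row per line (assumed file format)."""
    with open(dm_file_name, "w") as fh:
        for row in dm:
            fh.write(" ".join(f"{d:.4f}" for d in row) + "\n")


fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
dm = generate_dm(fps)
print(dm[0])  # [0.0, 0.5, 1.0]
```

Only the upper triangle is computed and mirrored, since the distance is symmetric; the number of pairwise comparisons still grows quadratically with the number of fingerprints.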
clusterFP(self, fp_file, fp_gen, fp_sim)
Cluster the fingerprints contained in fp_file.

group(self, num_clusters)
Perform a grouping operation based on an existing clustering run.

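The split between a clustering run (clusterDM / clusterFP) and a later `group(num_clusters)` call matches hierarchical clustering: build the full merge history once, then cut it at any number of clusters. The sketch below is a deliberately naive pure-Python agglomerative clustering on a full distance matrix; it does not reproduce the class's actual backend. The functions `agglomerate` and `group`, the supported linkage names, and numbering clusters by their smallest member are assumptions.

```python
def agglomerate(dm, linkage="average"):
    """Naive agglomerative clustering on a full distance matrix.

    Returns merges as (cluster_a, cluster_b, distance), where each cluster
    is a frozenset of item indices. Only suitable for small inputs.
    """
    clusters = [frozenset([i]) for i in range(len(dm))]

    def dist(a, b):
        ds = [dm[i][j] for i in a for j in b]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        if linkage == "average":
            return sum(ds) / len(ds)
        raise ValueError(f"unsupported linkage {linkage!r}")

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters (first pair wins ties).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a | b)
    return merges


def group(n_items, merges, num_clusters):
    """Replay merges until num_clusters remain; return {item: cluster_number}."""
    if not 1 <= num_clusters <= n_items:
        raise ValueError("num_clusters must be between 1 and n_items")
    clusters = [frozenset([i]) for i in range(n_items)]
    for a, b, _ in merges[: n_items - num_clusters]:
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    # Number clusters 1..k by their smallest member for a stable order.
    clusters.sort(key=min)
    return {i: k for k, c in enumerate(clusters, start=1) for i in c}


points = [0.0, 1.0, 10.0, 11.0]
dm = [[abs(p - q) for q in points] for p in points]
merges = agglomerate(dm, linkage="average")
print(sorted(group(len(points), merges, 2).items()))  # [(0, 1), (1, 1), (2, 2), (3, 2)]
```

Because `group` only replays the stored merge list, asking for a different number of clusters does not require re-running the clustering step.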
getMatrixTime(self)
Returns the time required for distance matrix generation.

getClusterTime(self)
Returns the time required for clustering.

getGroupTime(self)
Returns the time required for group creation.

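One way the three timing getters could be backed is by recording wall-clock time around each stage. This is a sketch only: the `Timings` class and its `timed` context manager are hypothetical, and the use of seconds from `time.perf_counter` is an assumption, since the reference does not state the units the real getters return.

```python
import time
from contextlib import contextmanager


class Timings:
    """Hypothetical sketch of recording per-stage wall-clock times (seconds)."""

    def __init__(self):
        self._times = {"matrix": 0.0, "cluster": 0.0, "group": 0.0}

    @contextmanager
    def timed(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            self._times[stage] = time.perf_counter() - start

    def getMatrixTime(self):
        return self._times["matrix"]

    def getClusterTime(self):
        return self._times["cluster"]

    def getGroupTime(self):
        return self._times["group"]


timings = Timings()
with timings.timed("group"):
    time.sleep(0.01)  # stand-in for the grouping work
print(timings.getGroupTime())
```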
getClusteringMap(self)
Once grouping has been done, this method may be called to return a
dictionary where the keys are the original fingerprint IDs (usually
the position of the structure in the file or the entry ID) and the
values are the clusters to which those structures belong.

getClusterContents(self)
Once grouping has been done, this method may be called to return a
dictionary where the keys are the cluster numbers and the values are
lists of IDs (usually the position in the file or the entry IDs).

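The two dictionaries returned by getClusteringMap and getClusterContents carry the same information in opposite directions. A small sketch of inverting the first form into the second; the function name `contents_from_map` is hypothetical.

```python
from collections import defaultdict


def contents_from_map(clustering_map):
    """Invert {item_id: cluster} into {cluster: [item_id, ...]}."""
    contents = defaultdict(list)
    for item_id, cluster in clustering_map.items():
        contents[cluster].append(item_id)
    # Sort clusters and members so the result is independent of dict order.
    return {cluster: sorted(ids) for cluster, ids in sorted(contents.items())}


print(contents_from_map({1: 2, 2: 1, 3: 2}))  # {1: [2], 2: [1, 3]}
```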
_readGroupFile(self)
A private method which reads the group file and extracts the cluster
membership and per-cluster statistics.

getDistanceToCentroid(self, item)
For a given item in the most recent cluster grouping, return the
distance to the centroid of the cluster which contains this item.

getIsNearestToCentroid(self, item)
For a given item in the most recent cluster grouping, return a boolean
value which indicates whether the item is nearest to the centroid.

getIsFarthestFromCentroid(self, item)
For a given item in the most recent cluster grouping, return a boolean
value which indicates whether the item is farthest from the centroid.

getMaxDistanceFromCentroid(self, item)
For a given item in the most recent cluster grouping, return the
maximum distance to the centroid for any item in the cluster.

getAverageDistanceFromCentroid(self, item)
For a given item in the most recent cluster grouping, return the
average distance to the centroid over all items in the cluster.

getClusterVariance(self, item)
For a given item, return the variance of the cluster to which that
item belongs.

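The per-item centroid getters above all derive from the same few quantities. The sketch below computes them for one cluster, assuming each item has a coordinate vector, the centroid is the component-wise mean, distances are Euclidean, and variance is the mean squared distance to the centroid. These definitions are assumptions; the real class reads its statistics from a group file and may define them differently. The function name `centroid_statistics` is hypothetical.

```python
import math


def centroid_statistics(members):
    """Per-item centroid statistics for ONE cluster.

    members: {item: coordinate vector}. Returns {item: stats_dict}.
    """
    vectors = list(members.values())
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    dist = {item: math.dist(v, centroid) for item, v in members.items()}
    nearest = min(dist, key=dist.get)    # ties: first item in the dict wins
    farthest = max(dist, key=dist.get)
    max_d = max(dist.values())
    avg_d = sum(dist.values()) / n
    # Assumed definition: mean squared distance to the centroid.
    variance = sum(d * d for d in dist.values()) / n
    return {
        item: {
            "distance_to_centroid": d,
            "is_nearest": item == nearest,
            "is_farthest": item == farthest,
            "max_distance": max_d,
            "average_distance": avg_d,
            "variance": variance,
        }
        for item, d in dist.items()
    }


stats = centroid_statistics({"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 0.0]})
print(stats["b"]["is_nearest"], stats["c"]["is_farthest"])  # True True
```

The cluster-level values (maximum, average, variance) are repeated for every member, which mirrors how the getters accept any item and report figures for the cluster containing it.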
_readClusterStatistics(self)
A private method which reads the cluster statistics from the stat
file and populates the internal lists.

getNumberOfClustersList(self)
Returns the number of clusters at each level.

getRSquaredList(self)
Returns the R-squared value at each clustering level.

getSemiPartialRSquaredList(self)
Returns the semi-partial R-squared value at each clustering level.

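For readers unfamiliar with these statistics, here is a sketch of the standard definitions: R-squared is 1 minus the ratio of the total within-cluster sum of squares to the total sum of squares, and semi-partial R-squared is the drop in R-squared caused by one merge. It assumes items have coordinate vectors and that each partition differs from the previous one by a single merge; whether the class computes its lists exactly this way is not stated, and the Kelley penalty is not covered. All function names are hypothetical.

```python
def sum_sq(vectors):
    """Sum of squared Euclidean distances of vectors to their mean."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return sum(sum((x - m) ** 2 for x, m in zip(v, mean)) for v in vectors)


def r_squared(partition, points):
    """partition: list of lists of item indices; points: list of vectors."""
    total = sum_sq(points)
    if total == 0:
        raise ValueError("all points coincide; R-squared is undefined")
    within = sum(sum_sq([points[i] for i in cluster]) for cluster in partition)
    return 1.0 - within / total


def r_squared_by_level(partitions, points):
    """R-squared and semi-partial R-squared for successive partitions.

    Each partition is assumed to follow from the previous one by a single
    merge; the first level gets a semi-partial R-squared of 0.0.
    """
    r2 = [r_squared(p, points) for p in partitions]
    sprsq = [0.0] + [prev - cur for prev, cur in zip(r2, r2[1:])]
    return r2, sprsq


pts = [[0.0], [1.0], [10.0], [11.0]]
levels = [[[0], [1], [2], [3]], [[0, 1], [2], [3]], [[0, 1], [2, 3]], [[0, 1, 2, 3]]]
r2, sprsq = r_squared_by_level(levels, pts)
print(r2[0], r2[-1])  # 1.0 0.0
```

A large semi-partial R-squared at some level signals that the merge made there joined two well-separated clusters, which is why these lists are useful for choosing num_clusters.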
getKelleyPenaltyList(self)
Returns the Kelley penalty value at each clustering level.

getMergeDistanceList(self)
Returns the merge distance value at each clustering level.

getSeparationRatioList(self)
Returns the separation ratio, calculated from the merge distance of

_getLine(self, x1, x2, y1, y2)
A private method which takes the input coordinates and returns
[[x1,x2],[y1,y2]].

getDendrogramData(self)
Returns a tuple with 1) a list of line positions, each in the form
[[x1,x2],[y1,y2]], each of which defines a line segment to be plotted
in a dendrogram; 2) a list of x-axis tick positions; and 3) a list of
x-axis tick labels.

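A sketch of how dendrogram line segments in the `[[x1,x2],[y1,y2]]` form can be derived from a merge history. It assumes leaves have ids 0..n-1, that the k-th merge `(id_a, id_b, height)` creates cluster id n + k, and that leaves sit at consecutive integer x positions; this input format and the function names are assumptions, not the class's internal representation. Each merge contributes two vertical segments and one horizontal bar.

```python
def get_line(x1, x2, y1, y2):
    """Same shape as _getLine: [[x1, x2], [y1, y2]]."""
    return [[x1, x2], [y1, y2]]


def dendrogram_data(labels, merges):
    """labels: leaf labels; merges: list of (id_a, id_b, height).

    Returns (segments, tick_positions, tick_labels).
    """
    n = len(labels)
    if len(merges) != n - 1:
        raise ValueError("expected exactly n - 1 merges forming one tree")
    children = {n + k: (a, b) for k, (a, b, _) in enumerate(merges)}

    def leaves(node):
        # Depth-first leaf order; deep trees may hit the recursion limit.
        if node < n:
            return [node]
        a, b = children[node]
        return leaves(a) + leaves(b)

    order = leaves(n + len(merges) - 1)
    x = {leaf: float(pos) for pos, leaf in enumerate(order)}
    y = {leaf: 0.0 for leaf in range(n)}
    segments = []
    for k, (a, b, h) in enumerate(merges):
        segments.append(get_line(x[a], x[a], y[a], h))  # left vertical
        segments.append(get_line(x[b], x[b], y[b], h))  # right vertical
        segments.append(get_line(x[a], x[b], h, h))     # horizontal bar
        x[n + k] = (x[a] + x[b]) / 2.0
        y[n + k] = h
    return segments, [x[leaf] for leaf in order], [labels[leaf] for leaf in order]


segments, ticks, tick_labels = dendrogram_data(["a", "b", "c"], [(0, 1, 1.0), (2, 3, 2.5)])
print(tick_labels)   # ['c', 'a', 'b']
print(segments[-1])  # [[0.0, 1.5], [2.5, 2.5]]
```

Ordering the leaves by a depth-first walk from the root keeps every cluster's members adjacent on the x-axis, so no two segments cross.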
getDistanceMatrixFile(self)
Returns the name of the distance matrix file used in the most recent
clustering.

getClusterOrderMap(self, num_clusters)
Returns a dictionary where the keys are the item labels and the
values are the index each item would have in an ordering that places
the items in cluster order.

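A sketch of a cluster-order map built from an `{item: cluster}` mapping such as the one returned by getClusteringMap. The real method takes num_clusters and presumably performs the grouping itself; here the grouping is assumed to be done already, and ordering items within a cluster by ascending label is an assumption. The function name is hypothetical.

```python
def cluster_order_map(clustering_map):
    """Map each item label to its index when items are listed cluster by cluster.

    Within a cluster, items keep ascending label order (an assumption).
    """
    ordered = sorted(clustering_map, key=lambda item: (clustering_map[item], item))
    return {item: index for index, item in enumerate(ordered)}


print(cluster_order_map({1: 2, 2: 1, 3: 2, 4: 1}))  # {2: 0, 4: 1, 1: 2, 3: 3}
```

Such a map is handy for reordering rows and columns of a distance matrix so that members of the same cluster appear as contiguous blocks.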
Inherited from object: __delattr__, __format__, __getattribute__,
__hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__,
__sizeof__, __str__, __subclasshook__