BioloMICS logo
×
BioloMICS menu

Clustering algorithms

 
In agglomerative clustering, agglomeration of objects (records of the database or OTUs) starts with the most closely related pairs of objects and gradually links to less closely related objects. When objects are associated in one cluster, no further dislocation is possible. This is a disadvantage of this method, which is compensated by the widely used and appreciated graphical tree-like representation.
 
Several agglomerative clustering algorithms are available. The most commonly used are the Unweighted Pairgroup Method using Arithmetic mean (UPGMA), the Weighted Pairgroup Method using Arithmetic mean (WPGMA), the Complete Linkage and the Single Linkage methods. These methods differ in the way the distances are calculated when new members of a growing cluster are included. In single linkage, when an object or a group of objects i is to be linked with another cluster j, the smallest distance between a member of group i and a member of group j, is the distance at which both groups are connected. In complete linkage, it is the distance between the most distant members of both groups and in UPGMA it is the average distances between all members of both groups. In UPGMA, no weighting related to the relative size of the groups to be joined is performed, contrarily to WPGMA. Neighbor Joining (NJ) is another agglomerative clustering method aimed at the construction of phylogenetic trees. The NJ method tries to find the shortest possible tree. It is usually used as an alternative to parsimony methods, which can be slow and computationally intensive.
 
The following methods can be used in BioloMICS:
  • UPGMA
  • WPGMA
  • Single linkage
  • Complete linkage
  • Neighbor Joining
  • Ward's minimum variance
  • UPGMC
  • WPGMC
  • Lance & Williams flexible clustering
 
Type of methods
Methods
Pros & Cons
Use
Effect on space
Source
Hierarchical agglomeration: linkage clustering
 
 
Pairwise relationships among the objects are known.
*
 
 
Single linkage
Computation simple; contraction of space (chaining); combinatorial method.
Good complement to ordination
Contracting
*
 
Complete linkage
Dense nuclei of objects; space expansion: many objects cluster at low similarity: arbitrary rules to resolve conflicts; combinatorial method.
To increase the contrast among clusters.
*
 
Hierarchical agglomeration: average clustering
 
 
Preservation of reference space; pairwise relationships between objects are lost; combinatorial method.
*
 
 
Unweighted arithmetic average (UPGMA)
Fusion of clusters when the similarity reaches the mean inter-cluster similarity value.
For a collection of objects obtained by simple random or systematic sampling.
Conserving
*
 
Weighted arithmetic average (WPGMA)
Same, with adjustment for group sizes.
Preferable to the previous method in all other sampling situations.
Conserving
*
 
Unweighted centroid (UPGMC)
Fusion of clusters with closest centroids; may produce reversals.
For simple random or systematic samples of objects.
Conserving
*
 
Weighted centroid (WPGMC)
Same, with adjustment for group sizes; may produce reversals.
Preferable to the previous method in all other sampling situations.
Conserving
*
 
Ward’s minimum variance
Minimizes the within-group sum of squares.
When looking for hyperspherical clusters in space A.
Conserving
*
Hierarchical agglomeration: flexible clustering
Lance & Williams flexible clustering
The algorithm allows contraction, conservation, or dilation of space A; pairwise relationships between objects are lost; combinatorial method.
Contracting if Beta = 1 Conserving if Beta = -0.25 Dilating if Beta = -1
*
 
hierarchical agglomeration: Neighbor-joining clustering
Neighbor Joining
 
For trees based on DNA or protein sequence data
Neighbor Joining is based on the minimum evolution criterion for phylogenetic trees, i.e. the topology that gives the least total branch length is preferred at each step of the algorithm.
**
* Legendre, P. & L. Legendre. 1998. Numerical ecology. Second edition, Elsevier.
** Wikipedia
 
Agglomerative clustering methods are used for a wide range of applications, including the identification and the classification of objects (strains, species, etc.) or characters. It is important to note that agglomerative clustering may be increasingly misleading when the number of objects or characters to be grouped increases. Please pay attention to tree-like and other graphical representations since they represent 2 or 3-dimensional views and summaries of n-dimensional realities (i.e., the original matrix).