Clustering algorithms
In agglomerative clustering, agglomeration of objects (records of the database or OTUs) starts with the most closely related pairs of objects and gradually links to less closely related objects. When objects are associated in one cluster, no further dislocation is possible. This is a disadvantage of this method, which is compensated by the widely used and appreciated graphical tree-like representation.
Several agglomerative clustering algorithms are available. The most commonly used are the Unweighted Pairgroup Method using Arithmetic mean (UPGMA), the Weighted Pairgroup Method using Arithmetic mean (WPGMA), the Complete Linkage and the Single Linkage methods. These methods differ in the way the distances are calculated when new members of a growing cluster are included. In single linkage, when an object or a group of objects i is to be linked with another cluster j, the smallest distance between a member of group i and a member of group j, is the distance at which both groups are connected. In complete linkage, it is the distance between the most distant members of both groups and in UPGMA it is the average distances between all members of both groups. In UPGMA, no weighting related to the relative size of the groups to be joined is performed, contrarily to WPGMA. Neighbor Joining (NJ) is another agglomerative clustering method aimed at the construction of phylogenetic trees. The NJ method tries to find the shortest possible tree. It is usually used as an alternative to parsimony methods, which can be slow and computationally intensive.
The following methods can be used in BioloMICS:
-
UPGMA
-
WPGMA
-
Single linkage
-
Complete linkage
-
Neighbor Joining
-
Ward's minimum variance
-
UPGMC
-
WPGMC
-
Lance & Williams flexible clustering
|
* Legendre, P. & L. Legendre. 1998. Numerical ecology. Second edition, Elsevier.
** Wikipedia
Agglomerative clustering methods are used for a wide range of applications, including the identification and the classification of objects (strains, species, etc.) or characters. It is important to note that agglomerative clustering may be increasingly misleading when the number of objects or characters to be grouped increases. Please pay attention to tree-like and other graphical representations since they represent 2 or 3-dimensional views and summaries of n-dimensional realities (i.e., the original matrix).