HierarchicalClustering

Partitions the graph into clusters using hierarchical clustering.

Inheritance Hierarchy

HierarchicalClustering

Remarks

Hierarchical clustering creates a hierarchy of clusters in a bottom-to-top approach based on some distance metric and linkage.

The clustering is performed using the agglomerative strategy, i.e., a bottom-up approach in which each node initially forms its own cluster. At each step, pairs of clusters are merged while moving up the hierarchy. The dissimilarity between clusters is determined by the given linkage and the given node distance metric. The algorithm continues until all nodes belong to the same cluster.
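The agglomerative strategy described above can be sketched in plain TypeScript. This is a naive illustration and not the yFiles implementation: the types Position, DistanceFn, and MergeStep and the functions closestPairDistance and agglomerate are hypothetical names, items are plain 2D positions instead of INode instances, and single linkage with Euclidean distance is assumed.

```typescript
// Hypothetical types for illustration; not part of the yFiles API.
type Position = { x: number; y: number }
type DistanceFn = (a: Position, b: Position) => number
// One merge step: the two clusters that were joined and their dissimilarity.
type MergeStep = { left: number[]; right: number[]; dissimilarity: number }

const euclideanDistance: DistanceFn = (a, b) => {
  const dx = a.x - b.x
  const dy = a.y - b.y
  return Math.sqrt(dx * dx + dy * dy)
}

// Single linkage (assumed here): the smallest distance between any two members.
function closestPairDistance(
  a: number[],
  b: number[],
  positions: Position[],
  distance: DistanceFn
): number {
  let best = Infinity
  for (const i of a) {
    for (const j of b) {
      best = Math.min(best, distance(positions[i], positions[j]))
    }
  }
  return best
}

// Merges the two closest clusters until a single cluster remains.
function agglomerate(
  positions: Position[],
  distance: DistanceFn = euclideanDistance
): MergeStep[] {
  // initially, each item forms its own cluster (stored as item indices)
  let clusters: number[][] = positions.map((_, index) => [index])
  const steps: MergeStep[] = []
  while (clusters.length > 1) {
    let bestA = 0
    let bestB = 1
    let bestDistance = Infinity
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const d = closestPairDistance(clusters[a], clusters[b], positions, distance)
        if (d < bestDistance) {
          bestDistance = d
          bestA = a
          bestB = b
        }
      }
    }
    const left = clusters[bestA]
    const right = clusters[bestB]
    steps.push({ left, right, dissimilarity: bestDistance })
    // replace the two merged clusters with their union
    clusters = clusters.filter((_, index) => index !== bestA && index !== bestB)
    clusters.push([...left, ...right])
  }
  return steps
}
```

The returned merge steps, read in order, describe the hierarchy from the leaves up to the single root cluster.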

Predefined metrics are available as constants on HierarchicalClustering: EUCLIDEAN (the default), EUCLIDEAN_SQUARED, and MANHATTAN.

Internally, the HierarchicalClusteringResult stores a dendrogram, which represents the result of the clustering algorithm as a binary tree. Based on the dendrogram, the graph's nodes are partitioned into clusters according to either clusterCount or cutoff. Since both partitioning methods require only the dendrogram, the HierarchicalClusteringResult offers methods to create a new clustering based on either criterion, as long as neither the graph nor the subgraph specification has changed in the meantime (a change would require recomputing the dendrogram).
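To illustrate why both criteria need nothing but the dendrogram, the following sketch cuts a small binary tree either by a cut-off value or by a target cluster count. The Dendrogram type and the functions collectLeaves, cutByCutoff, and cutByCount are hypothetical and do not reflect the actual result class. The sketch also assumes that merge dissimilarities do not decrease toward the root.

```typescript
// Hypothetical dendrogram model for illustration; the actual result class may differ.
type Dendrogram =
  | { kind: 'leaf'; item: string }
  | { kind: 'merge'; dissimilarity: number; left: Dendrogram; right: Dendrogram }

// Collects all items below a dendrogram node.
function collectLeaves(node: Dendrogram): string[] {
  return node.kind === 'leaf'
    ? [node.item]
    : [...collectLeaves(node.left), ...collectLeaves(node.right)]
}

// Cut by cutoff: keep subtrees whose merge dissimilarity is less than the cutoff.
function cutByCutoff(root: Dendrogram, cutoff: number): string[][] {
  if (root.kind === 'leaf' || root.dissimilarity < cutoff) {
    return [collectLeaves(root)]
  }
  return [...cutByCutoff(root.left, cutoff), ...cutByCutoff(root.right, cutoff)]
}

// Cut by count: split the most dissimilar subtree until enough clusters exist.
function cutByCount(root: Dendrogram, clusterCount: number): string[][] {
  const parts: Dendrogram[] = [root]
  while (parts.length < clusterCount) {
    let best = -1
    let bestDissimilarity = -Infinity
    for (let i = 0; i < parts.length; i++) {
      const part = parts[i]
      if (part.kind === 'merge' && part.dissimilarity > bestDissimilarity) {
        best = i
        bestDissimilarity = part.dissimilarity
      }
    }
    if (best < 0) break // only leaves remain, nothing left to split
    const split = parts[best]
    if (split.kind === 'merge') {
      parts.splice(best, 1, split.left, split.right)
    }
  }
  return parts.map(collectLeaves)
}

// Example tree: a and b merge at dissimilarity 1, then c joins at 9.
const sampleDendrogram: Dendrogram = {
  kind: 'merge',
  dissimilarity: 9,
  left: {
    kind: 'merge',
    dissimilarity: 1,
    left: { kind: 'leaf', item: 'a' },
    right: { kind: 'leaf', item: 'b' },
  },
  right: { kind: 'leaf', item: 'c' },
}
```

Both functions only read the tree, so a different cut-off value or cluster count can be evaluated without repeating the pairwise distance computations.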

Other Clustering Algorithms

yFiles supports a number of other clustering algorithms:

KMeansClustering – partitions the graph into clusters based on the distance between nodes and the cluster midpoints.
EdgeBetweennessClustering – partitions the graph into clusters based on edge-betweenness centrality.
BiconnectedComponentClustering – partitions the graph into clusters based on its biconnected components.
LouvainModularityClustering – partitions the graph into clusters by applying the Louvain modularity method.
LabelPropagationClustering – partitions the graph into clusters by applying a label propagation algorithm.

Examples

Calculating hierarchical clusters of a graph

// prepare the hierarchical clustering algorithm
const algorithm = new HierarchicalClustering()
// run the algorithm
const result = algorithm.run(graph)

// highlight the nodes of the clusters with different styles
for (const node of graph.nodes) {
  const componentId = result.nodeClusterIds.get(node)!
  graph.setStyle(node, clusterStyles.get(componentId)!)
}

Type Details

yFiles module: view-layout-bridge

Constructors

HierarchicalClustering

()

Parameters (options overload)

options - Object
A map of options to pass to the method.

metric - function(INode, INode):number

A function which returns the distance between any two nodes. This option sets the metric property on the created object.

Signature Details

function(arg1: INode, arg2: INode) : number

Encapsulates a function that takes two parameters and returns a number.

Parameters

arg1 - INode: The first parameter of the method that this delegate encapsulates.
arg2 - INode: The second parameter of the method that this delegate encapsulates.

Returns

number: The return value of the method that this delegate encapsulates.

linkage - HierarchicalClusteringLinkage

The method for determining the distance between clusters. This option sets the linkage property on the created object.

clusterCount - number

The number of clusters. This option sets the clusterCount property on the created object.

cutoff - number

The cut-off value. This option sets the cutoff property on the created object.

subgraphNodes - ItemCollection<INode>

The collection of nodes which define a subset of the graph for the algorithms to work on. This option either sets the value directly or recursively sets properties to the instance of the subgraphNodes property on the created object.

subgraphEdges - ItemCollection<IEdge>

The collection of edges which define a subset of the graph for the algorithms to work on. This option either sets the value directly or recursively sets properties to the instance of the subgraphEdges property on the created object.

Properties

clusterCount

: number

Gets or sets the number of clusters.

Remarks

If set to 0 or a negative number, the cutoff value is used to calculate the clusters. If both values are negative, a cluster count of 1 is used (one cluster containing all nodes in the graph).

A new result for a different number of clusters can be obtained efficiently with changeClusterCount on a valid HierarchicalClusteringResult, as long as the graph, linkage, and metric are still the same.

cutoff

: number

Gets or sets the cut-off value.

Remarks

The clusters are calculated by cutting the hierarchical tree (the dendrogram) at a point such that the dissimilarity values of the nodes that remain in the dendrogram are less than this cut-off value.

This value will only be used if clusterCount is 0 or negative.

This value must be non-negative, otherwise the result will be a single cluster which contains all nodes of the graph.

A new result for a different cut-off value can be obtained efficiently with changeCutoff on a valid HierarchicalClusteringResult, as long as the graph, linkage, and metric are still the same.
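The precedence between clusterCount and cutoff described in the remarks can be summarized in a small sketch. The CutCriterion type and the resolveCriterion function are hypothetical helpers for illustration only. They restate the documented rules and are not part of the API.

```typescript
// Hypothetical helper for illustration; not part of the yFiles API.
type CutCriterion =
  | { kind: 'clusterCount'; value: number }
  | { kind: 'cutoff'; value: number }

// Decides which criterion determines the clusters, following the documented rules.
function resolveCriterion(clusterCount: number, cutoff: number): CutCriterion {
  if (clusterCount > 0) {
    // a positive cluster count takes precedence over the cut-off value
    return { kind: 'clusterCount', value: clusterCount }
  }
  if (cutoff >= 0) {
    // cluster count is 0 or negative: the cut-off value is used
    return { kind: 'cutoff', value: cutoff }
  }
  // neither value is usable: one cluster containing all nodes
  return { kind: 'clusterCount', value: 1 }
}
```

For instance, resolveCriterion(0, 2.5) selects the cut-off value, while resolveCriterion(-1, -1) falls back to a single cluster.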

linkage

: HierarchicalClusteringLinkage

Gets or sets the method for determining the distance between clusters.

Remarks

The default is SINGLE.
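As a sketch of what a linkage computes, the following functions derive a cluster-to-cluster distance from the pairwise item distances. Single linkage corresponds to the SINGLE default named above. Complete and average linkage are shown only as common textbook alternatives, and whether HierarchicalClusteringLinkage offers them is not stated in this section. All names in the block are hypothetical.

```typescript
// Hypothetical helpers for illustration; not the yFiles implementation.
type PairDistance<T> = (a: T, b: T) => number

// All pairwise distances between the members of two clusters.
function pairwiseDistances<T>(a: T[], b: T[], distance: PairDistance<T>): number[] {
  const result: number[] = []
  for (const x of a) {
    for (const y of b) {
      result.push(distance(x, y))
    }
  }
  return result
}

// Single linkage: the distance of the closest pair.
function singleLinkageDistance<T>(a: T[], b: T[], distance: PairDistance<T>): number {
  return Math.min(...pairwiseDistances(a, b, distance))
}

// Complete linkage (textbook alternative): the distance of the farthest pair.
function completeLinkageDistance<T>(a: T[], b: T[], distance: PairDistance<T>): number {
  return Math.max(...pairwiseDistances(a, b, distance))
}

// Average linkage (textbook alternative): the mean of all pairwise distances.
function averageLinkageDistance<T>(a: T[], b: T[], distance: PairDistance<T>): number {
  const all = pairwiseDistances(a, b, distance)
  return all.reduce((sum, d) => sum + d, 0) / all.length
}
```

Single linkage tends to chain nearby clusters together, because one close pair of items is enough to make two clusters count as close.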

metric

: function(INode, INode):number

Gets or sets a function which returns the distance between any two nodes.

Remarks

The result of the function must not be negative.

Predefined common metrics are available as constants on HierarchicalClustering:

EUCLIDEAN: Euclidean distance (the default)
EUCLIDEAN_SQUARED: squared Euclidean distance
MANHATTAN: Manhattan distance
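The three distance measures behind the predefined metrics can be written out as follows. This is a sketch on plain coordinates: the Point type and the function names are hypothetical, and it is an assumption that the predefined metrics are evaluated on node positions, since this section does not say which coordinates they use.

```typescript
// Hypothetical stand-in for a node position; the actual metrics take INode arguments.
type Point = { x: number; y: number }

// Euclidean distance: length of the straight line between the two points.
function euclidean(a: Point, b: Point): number {
  const dx = a.x - b.x
  const dy = a.y - b.y
  return Math.sqrt(dx * dx + dy * dy)
}

// Squared Euclidean distance: avoids the square root and weights large distances more.
function euclideanSquared(a: Point, b: Point): number {
  const dx = a.x - b.x
  const dy = a.y - b.y
  return dx * dx + dy * dy
}

// Manhattan distance: sum of the absolute coordinate differences.
function manhattan(a: Point, b: Point): number {
  return Math.abs(a.x - b.x) + Math.abs(a.y - b.y)
}
```

All three functions return non-negative values, which matches the requirement stated for custom metrics.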

Signature Details

function(arg1: INode, arg2: INode) : number

Encapsulates a function that takes two parameters and returns a number.

Parameters

arg1 - INode: The first parameter of the method that this delegate encapsulates.
arg2 - INode: The second parameter of the method that this delegate encapsulates.

Returns

number: The return value of the method that this delegate encapsulates.

subgraphEdges

: ItemCollection<IEdge>

Gets or sets the collection of edges which define a subset of the graph for the algorithms to work on.

Remarks

If nothing is set, all edges of the graph will be processed.

If only the excludes are set, all edges in the graph except those provided in the excludes are processed.

Note that edges which start or end at nodes that are not in subgraphNodes are automatically ignored by the algorithm.

ItemCollection<T> instances may be shared among algorithm instances and will be (re-)evaluated upon (re-)execution of the algorithm.
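The note about edges at nodes outside of subgraphNodes amounts to restricting the edge set to the selected nodes. A minimal sketch of that rule, using a hypothetical SimpleEdge type with string node identifiers instead of the actual IEdge interface:

```typescript
// Hypothetical edge model for illustration; not the yFiles IEdge interface.
type SimpleEdge = { source: string; target: string }

// Keeps only edges whose source and target are both among the selected nodes.
function edgesWithinSubgraph(edges: SimpleEdge[], selectedNodes: string[]): SimpleEdge[] {
  const selected = new Set<string>(selectedNodes)
  return edges.filter((edge) => selected.has(edge.source) && selected.has(edge.target))
}
```

An edge is dropped as soon as one of its end nodes is missing from the selection, even if the edge itself was not excluded explicitly.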

Examples

Calculating hierarchical clusters on a subset of the graph

// prepare the hierarchical clustering algorithm
const algorithm = new HierarchicalClustering({
  // Ignore edges without target arrow heads
  subgraphEdges: {
    excludes: (edge: IEdge): boolean =>
      edge.style instanceof PolylineEdgeStyle &&
      edge.style.targetArrow instanceof Arrow &&
      edge.style.targetArrow.type === ArrowType.NONE,
  },
})
// run the algorithm
const result = algorithm.run(graph)

// highlight the nodes of the clusters with different styles
for (const node of graph.nodes) {
  const componentId = result.nodeClusterIds.get(node)!
  graph.setStyle(node, clusterStyles.get(componentId)!)
}

The edges provided here must be part of the graph which is passed to the run method.

subgraphNodes

: ItemCollection<INode>

Gets or sets the collection of nodes which define a subset of the graph for the algorithms to work on.

Remarks

If nothing is set, all nodes of the graph will be processed.

If only the excludes are set, all nodes in the graph except those provided in the excludes are processed.

ItemCollection<T> instances may be shared among algorithm instances and will be (re-)evaluated upon (re-)execution of the algorithm.
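The includes and excludes rules above can be modeled with two optional predicates. The ItemFilter type and the selectItems function are a hypothetical simplification for illustration and do not mirror the actual ItemCollection<T> class. The sketch assumes that excludes override includes, in line with the example below in which a node is ignored regardless of its shape.

```typescript
// Hypothetical simplification of an item collection; not the yFiles ItemCollection<T> class.
type ItemFilter<T> = {
  includes?: (item: T) => boolean
  excludes?: (item: T) => boolean
}

// Returns the items to process: everything that is included and not excluded.
function selectItems<T>(items: T[], filter: ItemFilter<T> = {}): T[] {
  return items.filter((item) => {
    // without includes, every item is a candidate
    const included = filter.includes ? filter.includes(item) : true
    // without excludes, nothing is removed
    const excluded = filter.excludes ? filter.excludes(item) : false
    return included && !excluded
  })
}
```

Because the predicates are evaluated on every call, the same filter can be reused after the graph has changed, similar to the re-evaluation on re-execution mentioned above.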

Examples

Calculating hierarchical clusters on a subset of the graph

// prepare the hierarchical clustering algorithm
const algorithm = new HierarchicalClustering({
  subgraphNodes: {
    // only consider elliptical nodes in the graph
    includes: (node: INode): boolean =>
      node.style instanceof ShapeNodeStyle &&
      node.style.shape === ShapeNodeShape.ELLIPSE,
    // but ignore the first node, regardless of its shape
    excludes: graph.nodes.first()!,
  },
})
// run the algorithm
const result = algorithm.run(graph)

// highlight the nodes of the clusters with different styles
for (const node of graph.nodes) {
  const componentId = result.nodeClusterIds.get(node)!
  graph.setStyle(node, clusterStyles.get(componentId)!)
}

The nodes provided here must be part of the graph which is passed to the run method.

Methods

run

(graph: IGraph) : HierarchicalClusteringResult

Partitions the graph into clusters using hierarchical clustering.

Remarks

The returned HierarchicalClusteringResult can be used to efficiently create different clusterings of the same graph based on the metric and linkage, as long as the graph hasn't changed (a change would invalidate the computed dendrogram).

Complexity

O(|V|³)

Parameters

graph - IGraph: The input graph to run the algorithm on.

Returns

HierarchicalClusteringResult: A HierarchicalClusteringResult containing the computed clusters and dendrogram.

Throws

Exception({ name: 'InvalidOperationError' }): If the algorithm can't create a valid result due to an invalid graph structure or wrongly configured properties.

The result obtained from this algorithm is a snapshot which is no longer valid once the graph has changed, e.g. by adding or removing nodes or edges.

Constants

EUCLIDEAN

: function(INode, INode):number

A predefined metric for Euclidean distance.

Signature Details

function(arg1: INode, arg2: INode) : number

Encapsulates a function that takes two parameters and returns a number.

Parameters

arg1 - INode: The first parameter of the method that this delegate encapsulates.
arg2 - INode: The second parameter of the method that this delegate encapsulates.

Returns

number: The return value of the method that this delegate encapsulates.

EUCLIDEAN_SQUARED

: function(INode, INode):number

A predefined metric for squared Euclidean distance.

Signature Details

function(arg1: INode, arg2: INode) : number

Encapsulates a function that takes two parameters and returns a number.

Parameters

arg1 - INode: The first parameter of the method that this delegate encapsulates.
arg2 - INode: The second parameter of the method that this delegate encapsulates.

Returns

number: The return value of the method that this delegate encapsulates.

MANHATTAN

: function(INode, INode):number

A predefined metric for Manhattan distance.

Signature Details

function(arg1: INode, arg2: INode) : number

Encapsulates a function that takes two parameters and returns a number.

Parameters

arg1 - INode: The first parameter of the method that this delegate encapsulates.
arg2 - INode: The second parameter of the method that this delegate encapsulates.

Returns

number: The return value of the method that this delegate encapsulates.