GroupAlgorithm

This class provides methods for automatically partitioning nodes of a graph into groups.

Inheritance Hierarchy

GroupAlgorithm

Remarks

Note: Methods of this class work with instances of Graph. To partition an IGraph into clusters use one of the following classes instead:

Partitions can be defined using edge betweenness centrality, biconnectivity, k-means clustering, hierarchical clustering or community detection algorithms like label propagation or louvain modularity.

This class also contains methods that provide clustering-related information like the average clustering coefficient, the average weighted coefficient and the modularity of a graph network.

Definitions

Betweenness centrality is a measure for how often a node lies on a shortest path between each pair of nodes in the graph.
Biconnected graph is a graph that has no cut vertex or articulation point (i.e., a node whose removal disconnects the graph).
K-means clustering algorithm partitions the nodes of a graph into k-clusters based on their positions on the plane and a given distance metric.
Hierarchical clustering creates a hierarchy of clusters in a bottom-to-top approach based on some distance metric and linkage.
Label Propagation algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors.
Our implementation is based on the description in: "Near linear time algorithm to detect community structures in large-scale networks" by U. N. Raghavan, R. Albert, and S. Kumara in Phys. Rev. E 76, 036106, 2007
Louvain Modularity algorithm iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized.
Our implementation is based on the description in: "Fast unfolding of communities in large networks" by V.D. Blondel, J.L. Guillaume, R. Lambiotte and E. Lefebvre, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008 (12pp).
Average clustering coefficient is the average of all the local clustering coefficient values of a graph. The local clustering coefficient of a node depends on the number of edges between its neighbors. A value of 1 indicates that the node's neighbors form a clique (i.e., there is an edge between each pair of neighbors).
Average weighted clustering coefficient is the average of all the local clustering coefficient values of a graph where each local value of a node is weighted by the maximum possible number of edges between its neighbors.
Modularity is the fraction of edges that belong to a community minus the expected fraction if these edges were distributed at random, see https://en.wikipedia.org/wiki/Modularity_(networks).

Type Details

yfiles module: algorithms
yfiles-umd modules: All layout modules, view-layout-bridge
Legacy UMD name: yfiles.algorithms.Groups

Static Methods

biconnectedComponentGrouping

(graph: Graph, groupIDs: INodeMap) : number

This method partitions the graph by analyzing its biconnected components.

Remarks

Note: This method works with instances of Graph. To partition an IGraph into clusters by its biconnected components use BiconnectedComponentClustering instead.

Nodes will be grouped such that the nodes within each group are biconnected. Nodes that belong to multiple biconnected components will be assigned to exactly one of these components.

Complexity

O(graph.E() + graph.N())

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

groupIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs or `null` for isolated nodes with only self-loop edges or no edges at all

Returns

↪number: the resulting number of different groups

Biconnected components are defined for undirected graphs only. As a consequence, this algorithm ignores self-loops and isolated nodes with only self-loop edges or no edges at all are not assigned to any group, i.e. the groupID for such a node will be null.

edgeBetweennessClustering

(graph: Graph, clusterIDs: INodeMap, directed: boolean, minGroupCount: number, maxGroupCount: number, edgeCosts: IDataProvider) : number

Partitions the graph into groups using edge betweenness centrality.

Remarks

Note: This method works with instances of Graph. To partition an IGraph into clusters with edge-betweenness clustering use EdgeBetweennessClustering instead.

In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove. The clustering with the best quality reached during the process will be returned.

The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the overall computation time. The upper bound on the number of groups is graph.N(). Also, the number of returned groups is never smaller than the number of connected components of the graph.

Complexity

O(graph.E()) * O(edgeBetweenness)

In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount groups have been determined.

Preconditions

minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

clusterIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs

directed - boolean

true if the graph should be considered as directed, false otherwise

minGroupCount - number

the minimum number of groups that will be returned

maxGroupCount - number

the maximum number of groups that will be returned

edgeCosts - IDataProvider

the IDataProvider that holds a positive number cost or null if the edges of the graph are considered to be of equal cost

Domain	Edge
Values	number	a non-negative value that represents the cost for each edge or `null` if the edges of the graph are considered to be of equal cost

Returns

↪number: the resulting number of different groups

Throws

Exception({ name: 'ArgumentError' }): if minGroupCount > maxGroupCount or minGroupCount > graph.N() or maxGroupCount <= 0

edgeBetweennessClustering

(graph: Graph, clusterIDs: INodeMap, qualityTimeRatio: number, minGroupCount: number, maxGroupCount: number, refine: boolean) : number

Partitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman.

Remarks

Note: This method works with instances of Graph. To partition an IGraph into clusters with edge-betweenness clustering use EdgeBetweennessClustering instead.

In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove or if the requested maximum number of groups is found. The clustering with the best quality reached during the process is returned.

The algorithm includes several heuristic speed-up techniques available through the quality/time ratio. For the highest quality setting, it is used almost unmodified. The fast betweenness approximation of Brandes and Pich (Centrality Estimation in Large Networks) is employed for values around 0.5. Typically, this results in a tiny decrease in quality but a large speed-up and is the recommended setting. To achieve the lowest running time, a local betweenness calculation is used (Gregory: Local Betweenness for Finding Communities in Networks).

Complexity

O(graph.E()) * O(edgeBetweenness)

In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount groups have been determined.

Preconditions

minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

clusterIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs

qualityTimeRatio - number

a value between 0.0 (low quality, fast) and 1.0 (high quality, slow); the recommended value is 0.5

minGroupCount - number

the minimum number of groups that will be returned

maxGroupCount - number

the maximum number of groups that will be returned

refine - boolean

true if the algorithm refines the current grouping, false if the algorithm discards the current grouping

Returns

↪number: the resulting number of different groups

Throws

Exception({ name: 'ArgumentError' }): if minGroupCount > maxGroupCount or minGroupCount > graph.N() or maxGroupCount <= 0

getClusteringCoefficient

(graph: Graph, coefficientMap: INodeMap, directed: boolean) : number

Calculates the local clustering coefficient for each node and returns the average clustering coefficient.

Remarks

Note: This method works with instances of Graph. To measure the clustering coefficients in an IGraph use ClusteringCoefficient instead.

The clustering coefficient measures the degree to which the nodes of a network tend to cluster together, see https://en.wikipedia.org/wiki/Clustering_coefficient. More precisely, for a node n, the local clustering coefficient is the actual number of edges between the neighbors of n divided by the maximum possible number of such edges. Hence, it is always a number value between 0.0 and 1.0.

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

coefficientMap - INodeMap

a map that returns the clustering coefficient for each node

Domain	YNode
Values	number	a value between `0.0` and `1.0` that represents the local clustering coefficient

directed - boolean

whether or not the graph is directed

Returns

↪number: returns the average clustering coefficient

The algorithm ignores self-loops and parallel edges. Group nodes are handled like common nodes.

getModularity

(graph: Graph, communityIndex: IDataProvider, edgeWeight?: IDataProvider) : number

Computes the modularity of a given graph.

Remarks

The modularity value estimates the quality of the partition of the nodes into the given communities and is a value between [-1/2,1]. High modularity values denote that nodes with dense connections are placed in the same community, while nodes with sparse connections are assigned to different communities.

To compute the modularity, a IDataProvider communityIndex is needed that returns an integer value representing the community index for each node.

If no IDataProvider for the edge weights is given, the algorithm will do the calculation assuming that all edges have edge weight equals to 1.0. Note that in the case where, all edges have edge weights equal to 0.0, the modularity will be 0.0, too.

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

The input graph.

communityIndex - IDataProvider

The community index for each node (nodes with same index are associated with the same community).

Domain	YNode
Values	number	an value representing the index of the community of a node

edgeWeight - IDataProvider

The weight of the edges.

If no IDataProvider is given, an equal weight of 1.0 will be assumed for all edges.

Domain	Edge
Values	number	a value representing the weight of an edge

Returns

↪number: the modularity of a given graph

Throws

Exception({ name: 'ArgumentError' }): if no IDataProvider communityIndex for the community indices is specified

hierarchicalClustering

(graph: Graph, distances: INodeDistanceProvider, linkage: Linkage) : Dendrogram

Partitions the graph into clusters based on hierarchical clustering.

Remarks

Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances metric. The algorithm continues until all nodes belong to the same cluster.

The result is returned as a Dendrogram object which represents the result of the clustering algorithm as a binary tree structure. It can easily be traversed by starting from the root node and moving on to nodes of the next level via method getChildren.

Complexity

O(graph.N() ^ 3)

Parameters

options - Object
A map of options to pass to the method.

graph - Graph: the input graph
distances - INodeDistanceProvider: a given INodeDistanceProvider object that determines the distance between any two nodes
linkage - Linkage: one of the predefined linkage values

Returns

↪Dendrogram: a Dendrogram which represents the result of the clustering as a binary tree or null if the given graph is empty

Throws

Exception({ name: 'ArgumentError' }): if an unknown linkage is given

If the INodeDistanceProvider object returns a negative distance value for a pair of nodes, zero will be used instead.

hierarchicalClustering

(graph: Graph, clusterIDs: INodeMap, distances: INodeDistanceProvider, linkage: Linkage, cutOff: number) : number

Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given cut-off value.

Remarks

Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.

The result will be given based on the given cut-off value that is used for cutting the hierarchical tree at a point such that the dissimilarity values of the nodes that remain at the dendrogram are less than this value.

Complexity

O(graph.N() ^ 3)

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

clusterIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs

distances - INodeDistanceProvider

a given INodeDistanceProvider object that determines the distance between any two nodes

linkage - Linkage

one of the predefined linkage values

cutOff - number

the cut-off value that determines where to cut the hierarchic tree into clusters

Returns

↪number: the resulting number of clusters

Throws

Exception({ name: 'ArgumentError' }): if an unknown linkage is used

The cut-off value should be greater that zero. If a negative value is given, zero will be used instead.

If the INodeDistanceProvider object returns a negative distance value for a pair of nodes, zero will be used instead.

hierarchicalClustering

(graph: Graph, maxCluster: number, clusterIDs: INodeMap, distances: INodeDistanceProvider, linkage: Linkage) : number

Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given maximum number of clusters.

Remarks

Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.

The result will be given based on the given maximum number of clusters value that is used for cutting the hierarchical tree at a point such that the number of remaining clusters equals to this value.

The maximum number of clusters needs to be greater than zero and less than the number of the nodes of the graph.

Complexity

O(graph.N() ^ 3)

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

maxCluster - number

the maximum number of clusters that determines where to cut the hierarchic tree into clusters

clusterIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs

distances - INodeDistanceProvider

a given INodeDistanceProvider object that determines the distance between any two graph nodes

linkage - Linkage

one of the predefined linkage values

Returns

↪number: the resulting number of clusters

Throws

Exception({ name: 'ArgumentError' }): if an unknown linkage is given or if the maximum number of clusters is less than or equal to zero or greater than the number of nodes of the graph

If the INodeDistanceProvider object returns a negative distance value for a pair of nodes, zero will be used instead.

kMeansClustering

(graph: Graph, clusterIDs: INodeMap, nodePositions: IDataProvider, distanceMetric: DistanceMetric, k: number, iterations?: number, centroids?: YPoint[]) : number

Partitions the graph into clusters using k-means clustering algorithm.

Remarks

Note: This method works with instances of Graph. To partition an IGraph using k-means clustering use KMeansClustering instead.

The nodes of the graph will be partitioned in k clusters based on their positions such that their distance from the cluster's mean (centroid) is minimized.

The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.

Complexity

O(graph.N() * k * d * I) where k is the number of clusters, I the maximum number of iterations and d the dimension of the points

ParametersOptions Overload

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

clusterIDs - INodeMap

the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node

Domain	YNode
Values	number	a non-negative value that represents the ID of the cluster to which each node belongs

nodePositions - IDataProvider

the IDataProvider that holds a point representing the current position of each node in the graph

Domain	YNode
Values	YPoint	a point representing the current position of each node in the graph

distanceMetric - DistanceMetric

one of the predefined distance metrics

k - number

the number of clusters

iterations - number

the maximum number of iterations performed by the algorithm for convergence

centroids - YPoint[]

the initial centroids

Returns

↪number: the number of resulting (non-empty) clusters

Throws

Exception({ name: 'ArgumentError' }): if the given distance metric is not supported

If the given number of clusters is greater than the number of nodes of the graph, number k will be set equal to 2.

If the number of given centroids is smaller than k or if no centroids are given, random initial centroids will be assigned for all clusters.

labelPropagation

(graph: Graph, finalLabel: INodeMap, initialLabel?: IDataProvider, nodeWeight?: IDataProvider, edgeWeight?: IDataProvider, edgeDirectedness?: IDataProvider) : number

Detects the communities in the specified input graph by applying a label propagation algorithm.

Remarks

Note: This method works with instances of Graph. To partition an IGraph into clusters using label propagation use LabelPropagationClustering instead.

This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them. After applying the algorithm, all nodes with the same label (i.e. nodes for which INodeMap finalLabel returns the same integer value) belong to the same community.

For a given node and neighbor, the weight of the neighbors label is nodeWeight(neighbor) * edgeWeight((node, neighbor)). In addition, the overall weight of a specific label is the sum of such weights for all neighbors with matching label.

At the start of the algorithm, each node is associated with the label returned by method getInt of initialLabel. Negative values are ignored, i.e., at the start there may be nodes without label. This implies that there may be results, where some of the nodes are not associated with labels, too. For such nodes method getInt of finalLabel returns -1. If no initialLabel provider is given each node starts with a unique label value.

If IDataProvider edgeDirectedness is specified, the algorithm only considers a subset of the neighbors when determining the label of a node. More precisely, it only considers the neighbors connected by edges to a node n for which

the edge's directedness is greater than 0.0 and the edge is incoming (i.e. n is the edge's target),
the edge's directedness is smaller than 0.0 and the edge is outgoing (i.e. n is the edge's source),
or the edge's directedness is 0.0 (i.e. the edge is considered to be undirected).

ParametersOptions Overload

options - Object
A map of options to pass to the method.

graph - Graph

the undirected input graph

finalLabel - INodeMap

the INodeMap that returns the integer labels of each node after applying the algorithm where nodes without a label receive the value -1

Domain	YNode
Values	number	an value representing the label after applying the algorithm (nodes without a label receive the value `-1`)

initialLabel - IDataProvider

IDataProvider that stores the initial integer labels of each node (negative values indicate that a node is not associated with an initial label).

If no provider is specified, each node starts with a unique label.

Domain	YNode
Values	number	a positive value representing the original label (use negative values if a node should not be associated with an initial label)

nodeWeight - IDataProvider

IDataProvider that stores the weights of the nodes.

If no provider is specified, the algorithm assumes that all nodes have the weight 1.0.

Domain	YNode
Values	number	the weight of a node

edgeWeight - IDataProvider

IDataProvider that stores the weights of the edges.

If no provider is specified, the algorithm assumes that all edges have the weight 1.0.

Domain	Edge
Values	number	the weight of an edge

edgeDirectedness - IDataProvider

IDataProvider that stores the directedness of the edges.

If no provider is specified, the algorithm assumes that all edges are undirected (i.e. have directedness 0.0).

Domain	Edge
Values	number	the directedness of an edge

Returns

↪number: the number of resulting communities

A node with a self-loop is considered to be a neighbor of itself. Hence, the node's label is also included when calculating the most frequent label among the neighbors. If you want to prevent this, you can specify suitable edge weights. Setting the weight of self-loops to 0.0 results in ignoring them.

If none of the neighbors of a node is associated with a label with positive weight, the node keeps its current label.

If no edgeDirectedness provider is specified, the algorithm assumes that all edges are undirected (i.e. have directedness 0.0).

louvainModularity

(graph: Graph, communityIndex: INodeMap, edgeWeight?: IDataProvider) : number

Detects the communities in the specified input graph by applying the louvain modularity method.

Remarks

Note: This method works with instances of Graph. To partition an IGraph into clusters using louvain modularity use LouvainModularityClustering instead.

The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.

The community index of a node corresponds to the index of the associated community. If there are nodes that are not associated with a community, their index is -1.

If no IDataProvider for the edge weights is given, the algorithm will assume that all edges have edge weights equal to 1.0.

Parameters

options - Object
A map of options to pass to the method.

graph - Graph

the input graph

communityIndex - INodeMap

INodeMap that stores, for each node, the integer community index of the node.

Domain	YNode
Values	number	an value representing the index of the community of a node

edgeWeight - IDataProvider

IDataProvider that stores the weights of the edges.

If nothing is given, the algorithm will assume that all edges have edge weights equal to 1.0.

Domain	Edge
Values	number	the weight of an edge

Returns

↪number: the number of resulting communities

GroupAlgorithm

Remarks

Definitions

Type Details

Static Methods

biconnectedComponentGrouping

Remarks

Complexity

Parameters

Returns

edgeBetweennessClustering

Remarks

Complexity

Preconditions

Parameters

Returns

Throws

edgeBetweennessClustering

Remarks

Complexity

Preconditions

Parameters

Returns

Throws

getClusteringCoefficient

Remarks

Parameters

Returns

getModularity

Remarks

Parameters

Returns

Throws

See Also

hierarchicalClustering

Remarks

Complexity

Parameters

Returns

Throws

hierarchicalClustering

Remarks

Complexity

Parameters

Returns

Throws

hierarchicalClustering

Remarks

Complexity

Parameters

Returns

Throws

kMeansClustering

Remarks

Complexity

ParametersOptions Overload

Returns

Throws

labelPropagation

Remarks

ParametersOptions Overload

Returns

louvainModularity

Remarks

Parameters

Returns