This class provides methods for automatically partitioning nodes of a graph into groups.
Remarks
Note: Methods of this class work with instances of Graph. To partition an IGraph into clusters use one of the following classes instead:
- EdgeBetweennessClustering
- BiconnectedComponentClustering
- KMeansClustering
- HierarchicalClustering
- LouvainModularityClustering
- LabelPropagationClustering
- ClusteringCoefficient
Partitions can be defined using edge betweenness centrality, biconnectivity, k-means clustering, hierarchical clustering or community detection algorithms like label propagation or louvain modularity.
This class also contains methods that provide clustering-related information like the average clustering coefficient, the average weighted coefficient and the modularity of a graph network.
Definitions
- Betweenness centrality is a measure for how often a node lies on a shortest path between each pair of nodes in the graph.
- Biconnected graph is a graph that has no cut vertex or articulation point (i.e., a node whose removal disconnects the graph).
- K-means clustering algorithm partitions the nodes of a graph into
k
-clusters based on their positions on the plane and a given distance metric. - Hierarchical clustering creates a hierarchy of clusters in a bottom-to-top approach based on some distance metric and linkage.
- Label Propagation algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors.
Our implementation is based on the description in: "Near linear time algorithm to detect community structures in large-scale networks" by U. N. Raghavan, R. Albert, and S. Kumara in Phys. Rev. E 76, 036106, 2007
- Louvain Modularity algorithm iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized.
Our implementation is based on the description in: "Fast unfolding of communities in large networks" by V.D. Blondel, J.L. Guillaume, R. Lambiotte and E. Lefebvre, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008 (12pp).
- Average clustering coefficient is the average of all the local clustering coefficient values of a graph. The local clustering coefficient of a node depends on the number of edges between its neighbors. A value of
1
indicates that the node's neighbors form a clique (i.e., there is an edge between each pair of neighbors). - Average weighted clustering coefficient is the average of all the local clustering coefficient values of a graph where each local value of a node is weighted by the maximum possible number of edges between its neighbors.
- Modularity is the fraction of edges that belong to a community minus the expected fraction if these edges were distributed at random, see https://en.wikipedia.org/wiki/Modularity_(networks).
Type Details
- yfiles module
- algorithms
- yfiles-umd modules
- All layout modules, view-layout-bridge
- Legacy UMD name
- yfiles.algorithms.Groups
Static Methods
This method partitions the graph by analyzing its biconnected components.
Remarks
Note: This method works with instances of Graph. To partition an IGraph into clusters by its biconnected components use BiconnectedComponentClustering instead.
Nodes will be grouped such that the nodes within each group are biconnected. Nodes that belong to multiple biconnected components will be assigned to exactly one of these components.
Complexity
O(graph.E() + graph.N())
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- groupIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs or null
for isolated nodes with only self-loop edges or no edges at all
Returns
- ↪number
- the resulting number of different groups
groupID
for such a node will be null
.edgeBetweennessClustering
(graph: Graph, clusterIDs: INodeMap, directed: boolean, minGroupCount: number, maxGroupCount: number, edgeCosts: IDataProvider) : numberPartitions the graph into groups using edge betweenness centrality.
Remarks
Note: This method works with instances of Graph. To partition an IGraph into clusters with edge-betweenness clustering use EdgeBetweennessClustering instead.
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove. The clustering with the best quality reached during the process will be returned.
The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the overall computation time. The upper bound on the number of groups is graph.N()
. Also, the number of returned groups is never smaller than the number of connected components of the graph.
Complexity
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount
groups have been determined.
Preconditions
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- clusterIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs - directed - boolean
true
if the graph should be considered as directed,false
otherwise- minGroupCount - number
- the minimum number of groups that will be returned
- maxGroupCount - number
- the maximum number of groups that will be returned
- edgeCosts - IDataProvider
- the IDataProvider that holds a positive number cost or
null
if the edges of the graph are considered to be of equal costDomain Edge Values number a non-negative value that represents the cost for each edge or null
if the edges of the graph are considered to be of equal cost
Returns
- ↪number
- the resulting number of different groups
Throws
- Exception({ name: 'ArgumentError' })
- if
minGroupCount > maxGroupCount
orminGroupCount > graph.N()
ormaxGroupCount <= 0
edgeBetweennessClustering
(graph: Graph, clusterIDs: INodeMap, qualityTimeRatio: number, minGroupCount: number, maxGroupCount: number, refine: boolean) : numberPartitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman.
Remarks
Note: This method works with instances of Graph. To partition an IGraph into clusters with edge-betweenness clustering use EdgeBetweennessClustering instead.
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove or if the requested maximum number of groups is found. The clustering with the best quality reached during the process is returned.
The algorithm includes several heuristic speed-up techniques available through the quality/time ratio. For the highest quality setting, it is used almost unmodified. The fast betweenness approximation of Brandes and Pich (Centrality Estimation in Large Networks) is employed for values around 0.5
. Typically, this results in a tiny decrease in quality but a large speed-up and is the recommended setting. To achieve the lowest running time, a local betweenness calculation is used (Gregory: Local Betweenness for Finding Communities in Networks).
The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the overall computation time. The upper bound on the number of groups is graph.N()
. Also, the number of returned groups is never smaller than the number of connected components of the graph.
Complexity
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount
groups have been determined.
Preconditions
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- clusterIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs - qualityTimeRatio - number
- a value between
0.0
(low quality, fast) and1.0
(high quality, slow); the recommended value is0.5
- minGroupCount - number
- the minimum number of groups that will be returned
- maxGroupCount - number
- the maximum number of groups that will be returned
- refine - boolean
true
if the algorithm refines the current grouping,false
if the algorithm discards the current grouping
Returns
- ↪number
- the resulting number of different groups
Throws
- Exception({ name: 'ArgumentError' })
- if
minGroupCount > maxGroupCount
orminGroupCount > graph.N()
ormaxGroupCount <= 0
Calculates the local clustering coefficient for each node and returns the average clustering coefficient.
Remarks
Note: This method works with instances of Graph. To measure the clustering coefficients in an IGraph use ClusteringCoefficient instead.
The clustering coefficient measures the degree to which the nodes of a network tend to cluster together, see https://en.wikipedia.org/wiki/Clustering_coefficient. More precisely, for a node n, the local clustering coefficient is the actual number of edges between the neighbors of n divided by the maximum possible number of such edges. Hence, it is always a number value between 0.0
and 1.0
.
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- coefficientMap - INodeMap
- a map that returns the clustering coefficient for each node
Domain YNode Values number a value between 0.0
and1.0
that represents the local clustering coefficient - directed - boolean
- whether or not the graph is directed
Returns
- ↪number
- returns the average clustering coefficient
Computes the modularity of a given graph.
Remarks
The modularity value estimates the quality of the partition of the nodes into the given communities and is a value between [-1/2,1]
. High modularity values denote that nodes with dense connections are placed in the same community, while nodes with sparse connections are assigned to different communities.
To compute the modularity, a IDataProvider communityIndex
is needed that returns an integer value representing the community index for each node.
If no IDataProvider for the edge weights is given, the algorithm will do the calculation assuming that all edges have edge weight equals to 1.0
. Note that in the case where, all edges have edge weights equal to 0.0
, the modularity will be 0.0
, too.
Parameters
A map of options to pass to the method.
- graph - Graph
- The input graph.
- communityIndex - IDataProvider
- The community index for each node (nodes with same index are associated with the same community).
Domain YNode Values number an value representing the index of the community of a node - edgeWeight - IDataProvider
The weight of the edges.
If no IDataProvider is given, an equal weight of
1.0
will be assumed for all edges.Domain Edge Values number a value representing the weight of an edge
Returns
- ↪number
- the modularity of a given graph
Throws
- Exception({ name: 'ArgumentError' })
- if no IDataProvider
communityIndex
for the community indices is specified
See Also
hierarchicalClustering
(graph: Graph, distances: INodeDistanceProvider, linkage: Linkage) : DendrogramPartitions the graph into clusters based on hierarchical clustering.
Remarks
Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances metric. The algorithm continues until all nodes belong to the same cluster.
The result is returned as a Dendrogram object which represents the result of the clustering algorithm as a binary tree structure. It can easily be traversed by starting from the root node and moving on to nodes of the next level via method getChildren.
Complexity
O(graph.N() ^ 3)
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- distances - INodeDistanceProvider
- a given INodeDistanceProvider object that determines the distance between any two nodes
- linkage - Linkage
- one of the predefined linkage values
Returns
- ↪Dendrogram
- a Dendrogram which represents the result of the clustering as a binary tree or
null
if the given graph is empty
Throws
- Exception({ name: 'ArgumentError' })
- if an unknown linkage is given
hierarchicalClustering
(graph: Graph, clusterIDs: INodeMap, distances: INodeDistanceProvider, linkage: Linkage, cutOff: number) : numberPartitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given cut-off value.
Remarks
Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given cut-off value that is used for cutting the hierarchical tree at a point such that the dissimilarity values of the nodes that remain at the dendrogram are less than this value.
Complexity
O(graph.N() ^ 3)
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- clusterIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs - distances - INodeDistanceProvider
- a given INodeDistanceProvider object that determines the distance between any two nodes
- linkage - Linkage
- one of the predefined linkage values
- cutOff - number
- the cut-off value that determines where to cut the hierarchic tree into clusters
Returns
- ↪number
- the resulting number of clusters
Throws
- Exception({ name: 'ArgumentError' })
- if an unknown linkage is used
hierarchicalClustering
(graph: Graph, maxCluster: number, clusterIDs: INodeMap, distances: INodeDistanceProvider, linkage: Linkage) : numberPartitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given maximum number of clusters.
Remarks
Note: This method works with instances of Graph. To partition an IGraph using hierarchical clustering use HierarchicalClustering instead.
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given maximum number of clusters value that is used for cutting the hierarchical tree at a point such that the number of remaining clusters equals to this value.
The maximum number of clusters needs to be greater than zero and less than the number of the nodes of the graph.
Complexity
O(graph.N() ^ 3)
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- maxCluster - number
- the maximum number of clusters that determines where to cut the hierarchic tree into clusters
- clusterIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs - distances - INodeDistanceProvider
- a given INodeDistanceProvider object that determines the distance between any two graph nodes
- linkage - Linkage
- one of the predefined linkage values
Returns
- ↪number
- the resulting number of clusters
Throws
- Exception({ name: 'ArgumentError' })
- if an unknown linkage is given or if the maximum number of clusters is less than or equal to zero or greater than the number of nodes of the graph
kMeansClustering
(graph: Graph, clusterIDs: INodeMap, nodePositions: IDataProvider, distanceMetric: DistanceMetric, k: number, iterations?: number, centroids?: YPoint[]) : numberPartitions the graph into clusters using k-means clustering algorithm.
Remarks
Note: This method works with instances of Graph. To partition an IGraph using k-means clustering use KMeansClustering instead.
The nodes of the graph will be partitioned in k
clusters based on their positions such that their distance from the cluster's mean (centroid) is minimized.
The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.
Complexity
O(graph.N() * k * d * I)
where k
is the number of clusters, I
the maximum number of iterations and d
the dimension of the points
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- clusterIDs - INodeMap
- the INodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Domain YNode Values number a non-negative value that represents the ID of the cluster to which each node belongs - nodePositions - IDataProvider
- the IDataProvider that holds a point representing the current position of each node in the graph
Domain YNode Values YPoint a point representing the current position of each node in the graph - distanceMetric - DistanceMetric
- one of the predefined distance metrics
- k - number
- the number of clusters
- iterations - number
- the maximum number of iterations performed by the algorithm for convergence
- centroids - YPoint[]
- the initial centroids
Returns
- ↪number
- the number of resulting (non-empty) clusters
Throws
- Exception({ name: 'ArgumentError' })
- if the given distance metric is not supported
k
will be set equal to 2
.k
or if no centroids are given, random initial centroids will be assigned for all clusters.labelPropagation
(graph: Graph, finalLabel: INodeMap, initialLabel?: IDataProvider, nodeWeight?: IDataProvider, edgeWeight?: IDataProvider, edgeDirectedness?: IDataProvider) : numberDetects the communities in the specified input graph by applying a label propagation algorithm.
Remarks
Note: This method works with instances of Graph. To partition an IGraph into clusters using label propagation use LabelPropagationClustering instead.
This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them. After applying the algorithm, all nodes with the same label (i.e. nodes for which INodeMap finalLabel
returns the same integer value) belong to the same community.
For a given node and neighbor, the weight of the neighbors label is nodeWeight(neighbor) * edgeWeight((node, neighbor))
. In addition, the overall weight of a specific label is the sum of such weights for all neighbors with matching label.
At the start of the algorithm, each node is associated with the label returned by method getInt of initialLabel
. Negative values are ignored, i.e., at the start there may be nodes without label. This implies that there may be results, where some of the nodes are not associated with labels, too. For such nodes method getInt of finalLabel
returns -1
. If no initialLabel
provider is given each node starts with a unique label value.
If IDataProvider edgeDirectedness
is specified, the algorithm only considers a subset of the neighbors when determining the label of a node. More precisely, it only considers the neighbors connected by edges to a node n for which
- the edge's directedness is greater than
0.0
and the edge is incoming (i.e. n is the edge's target), - the edge's directedness is smaller than
0.0
and the edge is outgoing (i.e. n is the edge's source), - or the edge's directedness is
0.0
(i.e. the edge is considered to be undirected).
Parameters
A map of options to pass to the method.
- graph - Graph
- the undirected input graph
- finalLabel - INodeMap
- the INodeMap that returns the integer labels of each node after applying the algorithm where nodes without a label receive the value
-1
Domain YNode Values number an value representing the label after applying the algorithm (nodes without a label receive the value -1
) - initialLabel - IDataProvider
IDataProvider that stores the initial integer labels of each node (negative values indicate that a node is not associated with an initial label).
If no provider is specified, each node starts with a unique label.
Domain YNode Values number a positive value representing the original label (use negative values if a node should not be associated with an initial label) - nodeWeight - IDataProvider
IDataProvider that stores the weights of the nodes.
If no provider is specified, the algorithm assumes that all nodes have the weight
1.0
.Domain YNode Values number the weight of a node - edgeWeight - IDataProvider
IDataProvider that stores the weights of the edges.
If no provider is specified, the algorithm assumes that all edges have the weight
1.0
.Domain Edge Values number the weight of an edge - edgeDirectedness - IDataProvider
IDataProvider that stores the directedness of the edges.
If no provider is specified, the algorithm assumes that all edges are undirected (i.e. have directedness
0.0
).Domain Edge Values number the directedness of an edge
Returns
- ↪number
- the number of resulting communities
0.0
results in ignoring them.edgeDirectedness
provider is specified, the algorithm assumes that all edges are undirected (i.e. have directedness 0.0
).Detects the communities in the specified input graph by applying the louvain modularity method.
Remarks
Note: This method works with instances of Graph. To partition an IGraph into clusters using louvain modularity use LouvainModularityClustering instead.
The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.
The community index of a node corresponds to the index of the associated community. If there are nodes that are not associated with a community, their index is -1
.
If no IDataProvider for the edge weights is given, the algorithm will assume that all edges have edge weights equal to 1.0
.
Parameters
A map of options to pass to the method.
- graph - Graph
- the input graph
- communityIndex - INodeMap
- INodeMap that stores, for each node, the integer community index of the node.
Domain YNode Values number an value representing the index of the community of a node - edgeWeight - IDataProvider
IDataProvider that stores the weights of the edges.
If nothing is given, the algorithm will assume that all edges have edge weights equal to
1.0
.Domain Edge Values number the weight of an edge
Returns
- ↪number
- the number of resulting communities