Search this API

y.algo
Class Groups

java.lang.Object
  extended by y.algo.Groups

public class Groups
extends java.lang.Object

This class provides methods for automatically partitioning nodes of a graph into groups.

Partitions can be defined using edge betweenness centrality, biconnectivity, k-means clustering, hierarchical clustering or community detection algorithms like label propagation or louvain modularity.

This class also contains methods that provide clustering-related information like the average clustering coefficient, the average weighted coefficient and the modularity of a graph network.

Definitions

 

Nested Class Summary
static class Groups.Dendrogram
          This class provides the result of hierarchical clustering algorithms by means of a binary tree structure.
static interface Groups.Distances
          An interface that determines the distance between two nodes of a graph.
 
Field Summary
static byte DISTANCE_METRIC_CHEBYCHEV
          A specifier for Chebychev distance metric.
static byte DISTANCE_METRIC_EUCLIDEAN
          A specifier for euclidean distance metric.
static byte DISTANCE_METRIC_EUCLIDEAN_SQUARED
          A specifier for euclidean squared distance metric.
static byte DISTANCE_METRIC_MANHATTAN
          A specifier for Manhattan distance metric.
static byte LINKAGE_AVERAGE
          A specifier for average-linkage (or mean-linkage) clustering.
static byte LINKAGE_COMPLETE
          A specifier for complete-linkage (or maximum-linkage) clustering.
static byte LINKAGE_SINGLE
          A specifier for single-linkage (or minimum-linkage) clustering.
 
Method Summary
static int biconnectedComponentGrouping(Graph graph, NodeMap groupIDs)
          This method partitions the graph by analyzing its biconnected components.
static int edgeBetweennessClustering(Graph graph, NodeMap clusterIDs, boolean directed, int minGroupCount, int maxGroupCount, DataProvider edgeCosts)
          Partitions the graph into groups using edge betweenness centrality.
static int edgeBetweennessClustering(Graph graph, NodeMap clusterIDs, double qualityTimeRatio, int minGroupCount, int maxGroupCount, boolean refine)
          Partitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman.
static double getClusteringCoefficient(Graph graph, NodeMap coefficientMap, boolean directed)
          Calculates the local clustering coefficient for each node and returns the average clustering coefficient.
static double getModularity(Graph graph, DataProvider communityIndex)
          Computes the modularity of a given graph.
static double getModularity(Graph graph, DataProvider communityIndex, DataProvider edgeWeight)
          Computes the modularity of a given graph.
static Groups.Dendrogram hierarchicalClustering(Graph graph, Groups.Distances distances, byte linkage)
          Partitions the graph into clusters based on hierarchical clustering.
static int hierarchicalClustering(Graph graph, int maxCluster, NodeMap clusterIDs, Groups.Distances distances, byte linkage)
          Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given maximum number of clusters.
static int hierarchicalClustering(Graph graph, NodeMap clusterIDs, Groups.Distances distances, byte linkage, double cutOff)
          Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given cut-off value.
static int kMeansClustering(Graph graph, NodeMap clusterIDs, DataProvider nodePositions, byte distanceMetric, int k)
          Like kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), but no initial centroids are given as input.
static int kMeansClustering(Graph graph, NodeMap clusterIDs, DataProvider nodePositions, byte distanceMetric, int k, int iterations, YPoint[] centroids)
          Partitions the graph into clusters using k-means clustering algorithm.
static int labelPropagation(Graph graph, NodeMap finalLabel)
          Detects the communities in the specified input graph by applying a label propagation algorithm.
static int labelPropagation(Graph graph, NodeMap finalLabel, DataProvider initialLabel)
          Detects the communities in the specified input graph by applying a label propagation algorithm.
static int labelPropagation(Graph graph, NodeMap finalLabel, DataProvider initialLabel, DataProvider nodeWeight, DataProvider edgeWeight, DataProvider edgeDirectedness)
          Detects the communities in the specified input graph by applying a label propagation algorithm.
static int louvainModularity(Graph graph, NodeMap communityIndex)
          Detects the communities in the specified input graph by applying the louvain modularity method.
static int louvainModularity(Graph graph, NodeMap communityIndex, DataProvider edgeWeight)
          Detects the communities in the specified input graph by applying the louvain modularity method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DISTANCE_METRIC_EUCLIDEAN

public static final byte DISTANCE_METRIC_EUCLIDEAN
A specifier for euclidean distance metric.

See Also:
kMeansClustering(Graph, NodeMap, DataProvider, byte, int), kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), Constant Field Values

DISTANCE_METRIC_EUCLIDEAN_SQUARED

public static final byte DISTANCE_METRIC_EUCLIDEAN_SQUARED
A specifier for euclidean squared distance metric.

See Also:
kMeansClustering(Graph, NodeMap, DataProvider, byte, int), kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), Constant Field Values

DISTANCE_METRIC_MANHATTAN

public static final byte DISTANCE_METRIC_MANHATTAN
A specifier for Manhattan distance metric.

See Also:
kMeansClustering(Graph, NodeMap, DataProvider, byte, int), kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), Constant Field Values

DISTANCE_METRIC_CHEBYCHEV

public static final byte DISTANCE_METRIC_CHEBYCHEV
A specifier for Chebychev distance metric.

See Also:
kMeansClustering(Graph, NodeMap, DataProvider, byte, int), kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), Constant Field Values

LINKAGE_SINGLE

public static final byte LINKAGE_SINGLE
A specifier for single-linkage (or minimum-linkage) clustering.

The distance between two clusters is determined by the minimum distance. Clusters thus are as close as their closest pair of nodes.

See Also:
hierarchicalClustering(Graph, Distances, byte), hierarchicalClustering(Graph, int, NodeMap, Distances, byte), hierarchicalClustering(Graph, NodeMap, Distances, byte, double), Constant Field Values

LINKAGE_COMPLETE

public static final byte LINKAGE_COMPLETE
A specifier for complete-linkage (or maximum-linkage) clustering.

The distance between two clusters is determined by the maximum distance. Clusters thus are as close as the pair of nodes that are furthest away from each other.

See Also:
hierarchicalClustering(Graph, Distances, byte), hierarchicalClustering(Graph, int, NodeMap, Distances, byte), hierarchicalClustering(Graph, NodeMap, Distances, byte, double), Constant Field Values

LINKAGE_AVERAGE

public static final byte LINKAGE_AVERAGE
A specifier for average-linkage (or mean-linkage) clustering.

The distance between two clusters is the average distance between each pair of nodes.

See Also:
hierarchicalClustering(Graph, Distances, byte), hierarchicalClustering(Graph, int, NodeMap, Distances, byte), hierarchicalClustering(Graph, NodeMap, Distances, byte, double), Constant Field Values
Method Detail

edgeBetweennessClustering

public static int edgeBetweennessClustering(Graph graph,
                                            NodeMap clusterIDs,
                                            boolean directed,
                                            int minGroupCount,
                                            int maxGroupCount,
                                            DataProvider edgeCosts)
Partitions the graph into groups using edge betweenness centrality.

In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove. The clustering with the best quality reached during the process will be returned.

The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the overall computation time. The upper bound on the number of groups is graph.N(). Also, the number of returned groups is never smaller than the number of connected components of the graph.

Preconditions:
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
Complexity:
O(graph.E()) * O(edgeBetweenness)

In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount groups have been determined.

Parameters:
graph - the input graph
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
directed - true if the graph should be considered as directed, false otherwise
minGroupCount - the minimum number of groups that will be returned
maxGroupCount - the maximum number of groups that will be returned
edgeCosts - the DataProvider that holds a positive Double cost or null if the edges of the graph are considered to be of equal cost
Returns:
the resulting number of different groups
Throws:
java.lang.IllegalArgumentException - if minGroupCount > maxGroupCount or minGroupCount > graph.N() or maxGroupCount <= 0

edgeBetweennessClustering

public static int edgeBetweennessClustering(Graph graph,
                                            NodeMap clusterIDs,
                                            double qualityTimeRatio,
                                            int minGroupCount,
                                            int maxGroupCount,
                                            boolean refine)
Partitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman.

In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove or if the requested maximum number of groups is found. The clustering with the best quality reached during the process is returned.

The algorithm includes several heuristic speed-up techniques available through the quality/time ratio. For the highest quality setting, it is used almost unmodified. The fast betweenness approximation of Brandes and Pich (Centrality Estimation in Large Networks) is employed for values around 0.5. Typically, this results in a tiny decrease in quality but a large speed-up and is the recommended setting. To achieve the lowest running time, a local betweenness calculation is used (Gregory: Local Betweenness for Finding Communities in Networks).

The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the overall computation time. The upper bound on the number of groups is graph.N(). Also, the number of returned groups is never smaller than the number of connected components of the graph.

Preconditions:
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
Complexity:
O(graph.E()) * O(edgeBetweenness)

In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm terminates after maxGroupCount groups have been determined.

Parameters:
graph - the input graph
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
qualityTimeRatio - a value between 0.0 (low quality, fast) and 1.0 (high quality, slow); the recommended value is 0.5
minGroupCount - the minimum number of groups that will be returned
maxGroupCount - the maximum number of groups that will be returned
refine - true if the algorithm refines the current grouping, false if the algorithm discards the current grouping
Returns:
the resulting number of different groups
Throws:
java.lang.IllegalArgumentException - if minGroupCount > maxGroupCount or minGroupCount > graph.N() or maxGroupCount <= 0

biconnectedComponentGrouping

public static int biconnectedComponentGrouping(Graph graph,
                                               NodeMap groupIDs)
This method partitions the graph by analyzing its biconnected components.

Nodes will be grouped such that the nodes within each group are biconnected. Nodes that belong to multiple biconnected components will be assigned to exactly one of these components.

 
Biconnected components are defined for undirected graphs only. As a consequence, this algorithm ignores self-loops and isolated nodes with only self-loop edges or no edges at all are not assigned to any group, i.e. the groupID for such a node will be null.
Complexity:
O(graph.E() + graph.N())
Parameters:
graph - the input graph
groupIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
Returns:
the resulting number of different groups

labelPropagation

public static int labelPropagation(Graph graph,
                                   NodeMap finalLabel)
Detects the communities in the specified input graph by applying a label propagation algorithm.

This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them. After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap finalLabel returns the same integer value) belong to the same community.

At the start of the algorithm, each node has a unique label. Use method labelPropagation(Graph, NodeMap, DataProvider) or labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider) to specify custom initial labels.

 
A node with a self-loop is considered to be a neighbor of itself. Hence, the node's label is also included when calculating the most frequent label among the neighbors. If you want to prevent this, you can use method labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider) that allows to specify edge weights. Setting the weight of self-loops to 0.0 results in ignoring them.
Parameters:
graph - the undirected input graph
finalLabel - the NodeMap that returns the integer labels of each node after applying the algorithm where nodes without a label receive the value -1
Returns:
the number of resulting communities

labelPropagation

public static int labelPropagation(Graph graph,
                                   NodeMap finalLabel,
                                   DataProvider initialLabel)
Detects the communities in the specified input graph by applying a label propagation algorithm.

This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them. After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap finalLabel returns the same integer value) belong to the same community.

At the start of the algorithm, each node is associated with the label returned by method DataProvider.getInt(Object) of initialLabel. Negative values are ignored, i.e., at the start there may be nodes without label. This implies that there may be results, where some of the nodes are not associated with labels, too. For such nodes method NodeMap.getInt(Object) of finalLabel returns -1.

 
A node with a self-loop is considered to be a neighbor of itself. Hence, the node's label is also included when calculating the most frequent label among the neighbors. If you want to prevent this, you can use method labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider) that allows to specify edge weights. Setting the weight of self-loops to 0.0 results in ignoring them.
Parameters:
graph - the undirected input graph
finalLabel - the NodeMap that returns the integer labels of each node after applying the algorithm where nodes without a label receive the value -1
initialLabel - DataProvider that stores the initial integer labels of each node (negative values indicate that a node is not associated with an initial label)
Returns:
the number of resulting communities

labelPropagation

public static int labelPropagation(Graph graph,
                                   NodeMap finalLabel,
                                   DataProvider initialLabel,
                                   DataProvider nodeWeight,
                                   DataProvider edgeWeight,
                                   DataProvider edgeDirectedness)
Detects the communities in the specified input graph by applying a label propagation algorithm.

This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them. After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap finalLabel returns the same integer value) belong to the same community.

For a given node and neighbor, the weight of the neighbors label is nodeWeight(neighbor) * edgeWeight((node, neighbor)). In addition, the overall weight of a specific label is the sum of such weights for all neighbors with matching label.

At the start of the algorithm, each node is associated with the label returned by method DataProvider.getInt(Object) of initialLabel. Negative values are ignored, i.e., at the start there may be nodes without label. This implies that there may be results, where some of the nodes are not associated with labels, too. For such nodes method NodeMap.getInt(Object) of finalLabel returns -1.

If DataProvider edgeDirectedness is specified, the algorithm only considers a subset of the neighbors when determining the label of a node. More precisely, it only considers the neighbors connected by edges to a node n for which

 
A node with a self-loop is considered to be a neighbor of itself. Hence, the node's label is also included when calculating the most frequent label among the neighbors. If you want to prevent this, you can specify suitable edge weights. Setting the weight of self-loops to 0.0 results in ignoring them.
Parameters:
graph - the undirected input graph
finalLabel - the NodeMap that returns the integer labels of each node after applying the algorithm where nodes without a label receive the value -1
initialLabel - DataProvider that stores the initial integer labels of each node (negative values indicate that a node is not associated with an initial label)
nodeWeight - DataProvider that stores the weights of the nodes
edgeWeight - DataProvider that stores the weights of the edges
edgeDirectedness - DataProvider that stores the directedness of the edges
Returns:
the number of resulting communities

louvainModularity

public static int louvainModularity(Graph graph,
                                    NodeMap communityIndex)
Detects the communities in the specified input graph by applying the louvain modularity method.

The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.

The community index of a node corresponds to the index of the associated community. If there are nodes that are not associated with a community, their index is -1.

 
All edge weights are considered to be 1.0. Use method louvainModularity(Graph, NodeMap, DataProvider) to specify custom edge weights.
Parameters:
graph - the input graph
communityIndex - NodeMap that stores, for each node, the integer community index of the node
Returns:
the number of resulting communities

louvainModularity

public static int louvainModularity(Graph graph,
                                    NodeMap communityIndex,
                                    DataProvider edgeWeight)
Detects the communities in the specified input graph by applying the louvain modularity method.

The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.

The community index of a node corresponds to the index of the associated community. If there are nodes that are not associated with a community, their index is -1.

If no DataProvider for the edge weights is given, the algorithm will assume that all edges have edge weights equal to 1.

Parameters:
graph - the input graph
communityIndex - NodeMap that stores, for each node, the integer community index of the node
edgeWeight - DataProvider that stores the weights of the edges
Returns:
the number of resulting communities

getModularity

public static double getModularity(Graph graph,
                                   DataProvider communityIndex)
Computes the modularity of a given graph.

The modularity value estimates the quality of the partition of the nodes into the given communities and is a value between [-1/2,1]. High modularity values denote that nodes with dense connections are placed in the same community, while nodes with sparse connections are assigned to different communities.

To compute the modularity, a DataProvider communityIndex is needed that returns an integer value representing the community index for each node.

 
All edge weights are considered to be 1.0. Use method getModularity(Graph, DataProvider, DataProvider) to specify custom edge weights.
Parameters:
graph - the input graph
communityIndex - the community index for each node (nodes with same index are associated with the same community)
Returns:
the modularity of a given graph
Throws:
java.lang.IllegalArgumentException - if no DataProvider communityIndex for the community indices is specified
See Also:
louvainModularity(Graph, NodeMap), labelPropagation(Graph, NodeMap)

getModularity

public static double getModularity(Graph graph,
                                   DataProvider communityIndex,
                                   DataProvider edgeWeight)
Computes the modularity of a given graph.

The modularity value estimates the quality of the partition of the nodes into the given communities and is a value between [-1/2,1]. High modularity values denote that nodes with dense connections are placed in the same community, while nodes with sparse connections are assigned to different communities.

To compute the modularity, a DataProvider communityIndex is needed that returns an integer value representing the community index for each node.

If no DataProvider for the edge weights is given, the algorithm will do the calculation assuming that all edges have edge weight equals to 1.0. Note that in the case where, all edges have edge weights equal to 0.0, the modularity will be 0.0, too.

Parameters:
graph - the input graph
communityIndex - the community index for each node (nodes with same index are associated with the same community)
edgeWeight - the weight of the edges
Returns:
the modularity of a given graph
Throws:
java.lang.IllegalArgumentException - if no DataProvider communityIndex for the community indices is specified
See Also:
louvainModularity(Graph, NodeMap, DataProvider), labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider)

hierarchicalClustering

public static Groups.Dendrogram hierarchicalClustering(Graph graph,
                                                       Groups.Distances distances,
                                                       byte linkage)
Partitions the graph into clusters based on hierarchical clustering.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances metric. The algorithm continues until all nodes belong to the same cluster.

The result is returned as a Groups.Dendrogram object which represents the result of the clustering algorithm as a binary tree structure. It can easily be traversed by starting from the root node and moving on to nodes of the next level via method Groups.Dendrogram.getChildren(Node).

 
If the Groups.Distances object returns a negative distance value for a pair of nodes, zero will be used instead.
Complexity:
O(graph.N() ^ 3)
Parameters:
graph - the input graph
distances - a given Groups.Distances object that determines the distance between any two nodes
linkage - one of the predefined linkage values
Returns:
a Groups.Dendrogram which represents the result of the clustering as a binary tree or null if the given graph is empty
Throws:
java.lang.IllegalArgumentException - if an unknown linkage is given

hierarchicalClustering

public static int hierarchicalClustering(Graph graph,
                                         NodeMap clusterIDs,
                                         Groups.Distances distances,
                                         byte linkage,
                                         double cutOff)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given cut-off value.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.

The result will be given based on the given cut-off value that is used for cutting the hierarchical tree at a point such that the dissimilarity values of the nodes that remain at the dendrogram are less than this value.

 
The cut-off value should be greater that zero. If a negative value is given, zero will be used instead.
Complexity:
O(graph.N() ^ 3)
Parameters:
graph - the input graph
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
distances - a given Groups.Distances object that determines the distance between any two nodes
linkage - one of the predefined linkage values
cutOff - the cut-off value that determines where to cut the hierarchic tree into clusters
Returns:
the resulting number of clusters
Throws:
java.lang.IllegalArgumentException - if an unknown linkage is used

hierarchicalClustering

public static int hierarchicalClustering(Graph graph,
                                         int maxCluster,
                                         NodeMap clusterIDs,
                                         Groups.Distances distances,
                                         byte linkage)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given maximum number of clusters.

The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.

The result will be given based on the given maximum number of clusters value that is used for cutting the hierarchical tree at a point such that the number of remaining clusters equals to this value.

The maximum number of clusters needs to be greater than zero and less than the number of the nodes of the graph.

 
If the Groups.Distances object returns a negative distance value for a pair of nodes, zero will be used instead.
Complexity:
O(graph.N() ^ 3)
Parameters:
graph - the input graph
maxCluster - the maximum number of clusters that determines where to cut the hierarchic tree into clusters
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
distances - a given Groups.Distances object that determines the distance between any two graph nodes
linkage - one of the predefined linkage values
Returns:
the resulting number of clusters
Throws:
java.lang.IllegalArgumentException - if an unknown linkage is given or if the maximum number of clusters is less than or equal to zero or greater than the number of nodes of the graph

kMeansClustering

public static int kMeansClustering(Graph graph,
                                   NodeMap clusterIDs,
                                   DataProvider nodePositions,
                                   byte distanceMetric,
                                   int k,
                                   int iterations,
                                   YPoint[] centroids)
Partitions the graph into clusters using k-means clustering algorithm.

The nodes of the graph will be partitioned in k clusters based on their positions such that their distance from the cluster's mean (centroid) is minimized.

The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.

 
If the given number of clusters is greater than the number of nodes of the graph, number k will be set equal to 2.
 
If the number of given centroids is smaller than k or if no centroids are given, random initial centroids will be assigned for all clusters.
Complexity:
O(graph.N() * k * d * I) where k is the number of clusters, I the maximum number of iterations and d the dimension of the points
Parameters:
graph - the input graph
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
nodePositions - the DataProvider that holds a point representing the current position of each node in the graph
distanceMetric - one of the predefined distance metrics
k - the number of clusters
iterations - the maximum number of iterations performed by the algorithm for convergence
centroids - the initial centroids
Returns:
the number of resulting (non-empty) clusters
Throws:
java.lang.IllegalArgumentException - if the given distance metric is not supported

kMeansClustering

public static int kMeansClustering(Graph graph,
                                   NodeMap clusterIDs,
                                   DataProvider nodePositions,
                                   byte distanceMetric,
                                   int k)
Like kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]), but no initial centroids are given as input.

 
If the given number of clusters is greater than the number of nodes of the graph, number k will be set equal to 2.
 
The maximum number of iterations that will be performed is set to 100.
Complexity:
O(graph.N() * k * d * I) where k is the number of clusters, I the maximum number of iterations and d the dimension of the points
Parameters:
graph - the input graph
clusterIDs - the NodeMap that will be filled during the execution and returns an integer value (cluster ID) for each node
nodePositions - the DataProvider that holds a point representing the current position of each node in the graph
distanceMetric - one of the predefined distance metrics
k - the number of clusters
Returns:
the number of resulting (non-empty) clusters
Throws:
java.lang.IllegalArgumentException - if the given distance metric is not supported

getClusteringCoefficient

public static double getClusteringCoefficient(Graph graph,
                                              NodeMap coefficientMap,
                                              boolean directed)
Calculates the local clustering coefficient for each node and returns the average clustering coefficient.

The clustering coefficient measures the degree to which the nodes of a network tend to cluster together, see https://en.wikipedia.org/wiki/Clustering_coefficient. More precisely, for a node n, the local clustering coefficient is the actual number of edges between the neighbors of n divided by the maximum possible number of such edges. Hence, it is always a Double value between 0.0 and 1.0.

 
The algorithm ignores self-loops and parallel edges. Group nodes are handled like common nodes.
Parameters:
graph - the input graph
coefficientMap - a map that returns the clustering coefficient for each node
directed - whether or not the graph is directed
Returns:
returns the average clustering coefficient

© Copyright 2000-2022,
yWorks GmbH.
All rights reserved.