
Search this API  
PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 
java.lang.Object y.algo.Groups
public class Groups
This class provides methods for automatically partitioning nodes of a graph into groups.
Partitions can be defined using edge betweenness centrality, biconnectivity, kmeans clustering, hierarchical clustering or community detection algorithms like label propagation or louvain modularity.
This class also contains methods that provide clusteringrelated information like the average clustering coefficient, the average weighted coefficient and the modularity of a graph network.
k
clusters based on their
positions on the plane and a given distance metric.
Our implementation is based on the description in: "Near linear time algorithm to detect community structures in largescale networks" by U. N. Raghavan, R. Albert, and S. Kumara in Phys. Rev. E 76, 036106, 2007
Our implementation is based on the description in: "Fast unfolding of communities in large networks" by V.D. Blondel, J.L. Guillaume, R. Lambiotte and E. Lefebvre, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008 (12pp).
1
indicates that the node's neighbors form a clique (i.e., there is an edge between each
pair of neighbors).
Nested Class Summary  

static class 
Groups.Dendrogram
This class provides the result of hierarchical clustering algorithms by means of a binary tree structure. 
static interface 
Groups.Distances
An interface that determines the distance between two nodes of a graph. 
Field Summary  

static byte 
DISTANCE_METRIC_CHEBYCHEV
A specifier for Chebychev distance metric. 
static byte 
DISTANCE_METRIC_EUCLIDEAN
A specifier for euclidean distance metric. 
static byte 
DISTANCE_METRIC_EUCLIDEAN_SQUARED
A specifier for euclidean squared distance metric. 
static byte 
DISTANCE_METRIC_MANHATTAN
A specifier for Manhattan distance metric. 
static byte 
LINKAGE_AVERAGE
A specifier for averagelinkage (or meanlinkage) clustering. 
static byte 
LINKAGE_COMPLETE
A specifier for completelinkage (or maximumlinkage) clustering. 
static byte 
LINKAGE_SINGLE
A specifier for singlelinkage (or minimumlinkage) clustering. 
Method Summary  

static int 
biconnectedComponentGrouping(Graph graph,
NodeMap groupIDs)
This method partitions the graph by analyzing its biconnected components. 
static int 
edgeBetweennessClustering(Graph graph,
NodeMap clusterIDs,
boolean directed,
int minGroupCount,
int maxGroupCount,
DataProvider edgeCosts)
Partitions the graph into groups using edge betweenness centrality. 
static int 
edgeBetweennessClustering(Graph graph,
NodeMap clusterIDs,
double qualityTimeRatio,
int minGroupCount,
int maxGroupCount,
boolean refine)
Partitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman. 
static double 
getClusteringCoefficient(Graph graph,
NodeMap coefficientMap,
boolean directed)
Calculates the local clustering coefficient for each node and returns the average clustering coefficient. 
static double 
getModularity(Graph graph,
DataProvider communityIndex)
Computes the modularity of a given graph. 
static double 
getModularity(Graph graph,
DataProvider communityIndex,
DataProvider edgeWeight)
Computes the modularity of a given graph. 
static Groups.Dendrogram 
hierarchicalClustering(Graph graph,
Groups.Distances distances,
byte linkage)
Partitions the graph into clusters based on hierarchical clustering. 
static int 
hierarchicalClustering(Graph graph,
int maxCluster,
NodeMap clusterIDs,
Groups.Distances distances,
byte linkage)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given maximum number of clusters. 
static int 
hierarchicalClustering(Graph graph,
NodeMap clusterIDs,
Groups.Distances distances,
byte linkage,
double cutOff)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given cutoff value. 
static int 
kMeansClustering(Graph graph,
NodeMap clusterIDs,
DataProvider nodePositions,
byte distanceMetric,
int k)
Like kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[]) , but no initial centroids are
given as input. 
static int 
kMeansClustering(Graph graph,
NodeMap clusterIDs,
DataProvider nodePositions,
byte distanceMetric,
int k,
int iterations,
YPoint[] centroids)
Partitions the graph into clusters using kmeans clustering algorithm. 
static int 
labelPropagation(Graph graph,
NodeMap finalLabel)
Detects the communities in the specified input graph by applying a label propagation algorithm. 
static int 
labelPropagation(Graph graph,
NodeMap finalLabel,
DataProvider initialLabel)
Detects the communities in the specified input graph by applying a label propagation algorithm. 
static int 
labelPropagation(Graph graph,
NodeMap finalLabel,
DataProvider initialLabel,
DataProvider nodeWeight,
DataProvider edgeWeight,
DataProvider edgeDirectedness)
Detects the communities in the specified input graph by applying a label propagation algorithm. 
static int 
louvainModularity(Graph graph,
NodeMap communityIndex)
Detects the communities in the specified input graph by applying the louvain modularity method. 
static int 
louvainModularity(Graph graph,
NodeMap communityIndex,
DataProvider edgeWeight)
Detects the communities in the specified input graph by applying the louvain modularity method. 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Field Detail 

public static final byte DISTANCE_METRIC_EUCLIDEAN
kMeansClustering(Graph, NodeMap, DataProvider, byte, int)
,
kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[])
,
Constant Field Valuespublic static final byte DISTANCE_METRIC_EUCLIDEAN_SQUARED
kMeansClustering(Graph, NodeMap, DataProvider, byte, int)
,
kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[])
,
Constant Field Valuespublic static final byte DISTANCE_METRIC_MANHATTAN
kMeansClustering(Graph, NodeMap, DataProvider, byte, int)
,
kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[])
,
Constant Field Valuespublic static final byte DISTANCE_METRIC_CHEBYCHEV
kMeansClustering(Graph, NodeMap, DataProvider, byte, int)
,
kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[])
,
Constant Field Valuespublic static final byte LINKAGE_SINGLE
The distance between two clusters is determined by the minimum distance. Clusters thus are as close as their closest pair of nodes.
hierarchicalClustering(Graph, Distances, byte)
,
hierarchicalClustering(Graph, int, NodeMap, Distances, byte)
,
hierarchicalClustering(Graph, NodeMap, Distances, byte, double)
,
Constant Field Valuespublic static final byte LINKAGE_COMPLETE
The distance between two clusters is determined by the maximum distance. Clusters thus are as close as the pair of nodes that are furthest away from each other.
hierarchicalClustering(Graph, Distances, byte)
,
hierarchicalClustering(Graph, int, NodeMap, Distances, byte)
,
hierarchicalClustering(Graph, NodeMap, Distances, byte, double)
,
Constant Field Valuespublic static final byte LINKAGE_AVERAGE
The distance between two clusters is the average distance between each pair of nodes.
hierarchicalClustering(Graph, Distances, byte)
,
hierarchicalClustering(Graph, int, NodeMap, Distances, byte)
,
hierarchicalClustering(Graph, NodeMap, Distances, byte, double)
,
Constant Field ValuesMethod Detail 

public static int edgeBetweennessClustering(Graph graph, NodeMap clusterIDs, boolean directed, int minGroupCount, int maxGroupCount, DataProvider edgeCosts)
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove. The clustering with the best quality reached during the process will be returned.
The method requires the maximum number of groups that will be returned. The smaller this value is, the
faster the overall computation time.
The upper bound on the number of groups is graph.N()
. Also, the number of returned groups is
never smaller than the number of connected components of the graph.
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this
algorithm terminates after maxGroupCount
groups have been determined.
graph
 the input graphclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value
(cluster ID) for each nodedirected
 true
if the graph should be considered as directed, false
otherwiseminGroupCount
 the minimum number of groups that will be returnedmaxGroupCount
 the maximum number of groups that will be returnededgeCosts
 the DataProvider
that holds a positive Double
cost or null
if
the edges of the graph are considered to be of equal cost
java.lang.IllegalArgumentException
 if minGroupCount > maxGroupCount
or
minGroupCount > graph.N()
or
maxGroupCount <= 0
public static int edgeBetweennessClustering(Graph graph, NodeMap clusterIDs, double qualityTimeRatio, int minGroupCount, int maxGroupCount, boolean refine)
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove or if the requested maximum number of groups is found. The clustering with the best quality reached during the process is returned.
The algorithm includes several heuristic speedup techniques available through the quality/time ratio. For the
highest quality setting, it is used almost unmodified. The fast betweenness approximation of Brandes and Pich
(Centrality Estimation in Large Networks) is employed for values around 0.5
. Typically, this
results in a tiny decrease in quality but a large speedup and is the recommended setting. To achieve the lowest
running time, a local betweenness calculation is used
(Gregory: Local Betweenness for Finding Communities in Networks).
The method requires the maximum number of groups that will be returned. The smaller this value is, the
faster the overall computation time.
The upper bound on the number of groups is graph.N()
. Also, the number of returned groups is
never smaller than the number of connected components of the graph.
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this
algorithm terminates after maxGroupCount
groups have been determined.
graph
 the input graphclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value
(cluster ID) for each nodequalityTimeRatio
 a value between 0.0
(low quality, fast) and 1.0
(high quality, slow);
the recommended value is 0.5
minGroupCount
 the minimum number of groups that will be returnedmaxGroupCount
 the maximum number of groups that will be returnedrefine
 true
if the algorithm refines the current grouping, false
if the
algorithm discards the current grouping
java.lang.IllegalArgumentException
 if minGroupCount > maxGroupCount
or
minGroupCount > graph.N()
or
maxGroupCount <= 0
public static int biconnectedComponentGrouping(Graph graph, NodeMap groupIDs)
Nodes will be grouped such that the nodes within each group are biconnected. Nodes that belong to multiple biconnected components will be assigned to exactly one of these components.
groupID
for such a node will be null
.O(graph.E() + graph.N())
graph
 the input graphgroupIDs
 the NodeMap
that will be filled during the execution and returns an integer value (cluster ID)
for each node
public static int labelPropagation(Graph graph, NodeMap finalLabel)
This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent
label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them.
After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap
finalLabel
returns the same integer value) belong to the same community.
At the start of the algorithm, each node has a unique label. Use method
labelPropagation(Graph, NodeMap, DataProvider)
or
labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider)
to specify custom initial labels.
labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider)
that
allows to specify edge weights. Setting the weight of selfloops to 0.0
results in ignoring them.graph
 the undirected input graphfinalLabel
 the NodeMap
that returns the integer labels of each node after applying the algorithm
where nodes without a label receive the value 1
public static int labelPropagation(Graph graph, NodeMap finalLabel, DataProvider initialLabel)
This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent
label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them.
After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap
finalLabel
returns the same integer value) belong to the same community.
At the start of the algorithm, each node is associated with the label returned by method DataProvider.getInt(Object)
of initialLabel
. Negative values are ignored, i.e., at the start there may be nodes without label.
This implies that there may be results, where some of the nodes are not associated with labels, too.
For such nodes method NodeMap.getInt(Object)
of finalLabel
returns 1
.
labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider)
that allows
to specify edge weights. Setting the weight of selfloops to 0.0
results in ignoring them.graph
 the undirected input graphfinalLabel
 the NodeMap
that returns the integer labels of each node after applying the algorithm
where nodes without a label receive the value 1
initialLabel
 DataProvider
that stores the initial integer labels of each node (negative values
indicate that a node is not associated with an initial label)
public static int labelPropagation(Graph graph, NodeMap finalLabel, DataProvider initialLabel, DataProvider nodeWeight, DataProvider edgeWeight, DataProvider edgeDirectedness)
This algorithm iteratively assigns a label to each node. The label of a node is set to the most frequent
label among its neighbors. If there are multiple candidates the algorithm randomly chooses one of them.
After applying the algorithm, all nodes with the same label (i.e. nodes for which NodeMap
finalLabel
returns the same integer value) belong to the same community.
For a given node and neighbor, the weight of the neighbors label is
nodeWeight(neighbor) * edgeWeight((node, neighbor))
. In addition, the overall weight of a
specific label is the sum of such weights for all neighbors with matching label.
At the start of the algorithm, each node is associated with the label returned by method DataProvider.getInt(Object)
of initialLabel
. Negative values are ignored, i.e., at the start there may be nodes without label.
This implies that there may be results, where some of the nodes are not associated with labels, too.
For such nodes method NodeMap.getInt(Object)
of finalLabel
returns 1
.
If DataProvider
edgeDirectedness
is specified, the algorithm only considers a subset of the
neighbors when determining the label of a node. More precisely, it only considers the neighbors connected
by edges to a node n for which
0.0
and the edge is incoming (i.e. n is the edge's target),
0.0
and the edge is outgoing (i.e. n is the edge's source),
0.0
(i.e. the edge is considered to be undirected).
0.0
results in ignoring them.graph
 the undirected input graphfinalLabel
 the NodeMap
that returns the integer labels of each node after applying the algorithm
where nodes without a label receive the value 1
initialLabel
 DataProvider
that stores the initial integer labels of each node (negative values
indicate that a node is not associated with an initial label)nodeWeight
 DataProvider
that stores the weights of the nodesedgeWeight
 DataProvider
that stores the weights of the edgesedgeDirectedness
 DataProvider
that stores the directedness of the edges
public static int louvainModularity(Graph graph, NodeMap communityIndex)
The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.
The community index of a node corresponds to the index of the associated community. If there are nodes that are
not associated with a community, their index is 1
.
1.0
. Use method
louvainModularity(Graph, NodeMap, DataProvider)
to specify custom edge weights.graph
 the input graphcommunityIndex
 NodeMap
that stores, for each node, the integer community index of the node
public static int louvainModularity(Graph graph, NodeMap communityIndex, DataProvider edgeWeight)
The algorithm starts by assigning each node to its own community. Then, iteratively tries to construct communities by moving nodes from their current community to another until the modularity is locally optimized. At the next step, the small communities found are merged to a single node and the algorithm starts from the beginning until the modularity of the graph cannot be further improved.
The community index of a node corresponds to the index of the associated community. If there are nodes that are
not associated with a community, their index is 1
.
If no DataProvider
for the edge weights is given, the algorithm will assume that all edges have edge weights
equal to 1
.
graph
 the input graphcommunityIndex
 NodeMap
that stores, for each node, the integer community index of the nodeedgeWeight
 DataProvider
that stores the weights of the edges
public static double getModularity(Graph graph, DataProvider communityIndex)
The modularity value estimates the quality of the partition of the nodes into the given communities and is a value
between [1/2,1]
. High modularity values denote that nodes with dense connections are placed in the same
community, while nodes with sparse connections are assigned to different communities.
To compute the modularity, a DataProvider
communityIndex
is needed that returns an
integer value representing the community index for each node.
1.0
.
Use method getModularity(Graph, DataProvider, DataProvider)
to specify custom edge weights.graph
 the input graphcommunityIndex
 the community index for each node (nodes with same index are associated with the same community)
java.lang.IllegalArgumentException
 if no DataProvider
communityIndex
for the community indices is specifiedlouvainModularity(Graph, NodeMap)
,
labelPropagation(Graph, NodeMap)
public static double getModularity(Graph graph, DataProvider communityIndex, DataProvider edgeWeight)
The modularity value estimates the quality of the partition of the nodes into the given communities and is a value
between [1/2,1]
. High modularity values denote that nodes with dense connections are placed in the same
community, while nodes with sparse connections are assigned to different communities.
To compute the modularity, a DataProvider
communityIndex
is needed that returns an
integer value representing the community index for each node.
If no DataProvider
for the edge weights is given, the algorithm will do the calculation assuming that all
edges have edge weight equals to 1.0
. Note that in the case where, all edges have edge weights
equal to 0.0
, the modularity will be 0.0
, too.
graph
 the input graphcommunityIndex
 the community index for each node (nodes with same index are associated with the same community)edgeWeight
 the weight of the edges
java.lang.IllegalArgumentException
 if no DataProvider
communityIndex
for the community indices is specifiedlouvainModularity(Graph, NodeMap, DataProvider)
,
labelPropagation(Graph, NodeMap, DataProvider, DataProvider, DataProvider, DataProvider)
public static Groups.Dendrogram hierarchicalClustering(Graph graph, Groups.Distances distances, byte linkage)
The clustering is performed using the agglomerative strategy i.e., a bottomup approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances metric. The algorithm continues until all nodes belong to the same cluster.
The result is returned as a Groups.Dendrogram
object which represents the result of the clustering
algorithm as a binary tree structure. It can easily be traversed by starting from the
root node
and moving on to nodes of the next level via method
Groups.Dendrogram.getChildren(Node)
.
Groups.Distances
object returns a negative distance value for a pair of nodes, zero will be used instead.O(graph.N() ^ 3)
graph
 the input graphdistances
 a given Groups.Distances
object that determines the distance between any two nodeslinkage
 one of the predefined linkage values
Groups.Dendrogram
which represents the result of the clustering as a binary tree
java.lang.IllegalArgumentException
 if an unknown linkage is givenpublic static int hierarchicalClustering(Graph graph, NodeMap clusterIDs, Groups.Distances distances, byte linkage, double cutOff)
The clustering is performed using the agglomerative strategy i.e., a bottomup approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given cutoff value that is used for cutting the hierarchical tree at a point such that the dissimilarity values of the nodes that remain at the dendrogram are less than this value.
O(graph.N() ^ 3)
graph
 the input graphclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value (cluster ID)
for each nodedistances
 a given Groups.Distances
object that determines the distance between any two nodeslinkage
 one of the predefined linkage valuescutOff
 the cutoff value that determines where to cut the hierarchic tree into clusters
java.lang.IllegalArgumentException
 if an unknown linkage is usedpublic static int hierarchicalClustering(Graph graph, int maxCluster, NodeMap clusterIDs, Groups.Distances distances, byte linkage)
The clustering is performed using the agglomerative strategy i.e., a bottomup approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given maximum number of clusters value that is used for cutting the hierarchical tree at a point such that the number of remaining clusters equals to this value.
The maximum number of clusters needs to be greater than zero and less than the number of the nodes of the graph.
Groups.Distances
object returns a negative distance value for a pair of nodes, zero will be used instead.O(graph.N() ^ 3)
graph
 the input graphmaxCluster
 the maximum number of clusters that determines where to cut the hierarchic tree into clustersclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value (cluster ID)
for each nodedistances
 a given Groups.Distances
object that determines the distance between any two graph nodeslinkage
 one of the predefined linkage values
java.lang.IllegalArgumentException
 if an unknown linkage is given or if the maximum number of clusters is less
than or equal to zero or greater than the number of nodes of the graphpublic static int kMeansClustering(Graph graph, NodeMap clusterIDs, DataProvider nodePositions, byte distanceMetric, int k, int iterations, YPoint[] centroids)
The nodes of the graph will be partitioned in k
clusters based on their positions such that their
distance from the cluster's mean (centroid) is minimized.
The distance can be defined using diverse metrics as euclidean distance, euclideansquared distance, manhattan distance or chebychev distance.
k
will
be set equal to 2
.k
or if no centroids are given, random
initial centroids will be assigned for all clusters.O(graph.N() * k * d * I)
where k
is the number of clusters,
I
the maximum number of iterations and d
the dimension of the pointsgraph
 the input graphclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value (cluster ID)
for each nodenodePositions
 the DataProvider
that holds a point
representing the current position of each
node in the graphdistanceMetric
 one of the predefined distance metricsk
 the number of clustersiterations
 the maximum number of iterations performed by the algorithm for convergencecentroids
 the initial centroids
java.lang.IllegalArgumentException
 if the given distance metric is not supportedpublic static int kMeansClustering(Graph graph, NodeMap clusterIDs, DataProvider nodePositions, byte distanceMetric, int k)
kMeansClustering(Graph, NodeMap, DataProvider, byte, int, int, YPoint[])
, but no initial centroids are
given as input.
k
will
be set equal to 2
.100
.O(graph.N() * k * d * I)
where k
is the number of clusters,
I
the maximum number of iterations and d
the dimension of the pointsgraph
 the input graphclusterIDs
 the NodeMap
that will be filled during the execution and returns an integer value (cluster ID)
for each nodenodePositions
 the DataProvider
that holds a point
representing the current position of each
node in the graphdistanceMetric
 one of the predefined distance metricsk
 the number of clusters
java.lang.IllegalArgumentException
 if the given distance metric is not supportedpublic static double getClusteringCoefficient(Graph graph, NodeMap coefficientMap, boolean directed)
The clustering coefficient measures the degree to which the nodes of a network tend to cluster together,
see https://en.wikipedia.org/wiki/Clustering_coefficient.
More precisely, for a node n, the local clustering coefficient is the actual number of edges between the neighbors
of n divided by the maximum possible number of such edges.
Hence, it is always a Double
value between 0.0
and 1.0
.
graph
 the input graphcoefficientMap
 a map that returns the clustering coefficient for each nodedirected
 whether or not the graph is directed

© Copyright 20002021, yWorks GmbH. All rights reserved. 

PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 