public final class Groups extends Object
Partitions can be defined using edge betweenness centrality, biconnectivity, k-means clustering or hierarchical clustering.
k
-clusters based on their positions on the plane and a given distance metric.
Modifier and Type | Class and Description |
---|---|
static class |
Groups.Dendrogram
This class provides the result of hierarchical clustering algorithms by means of a binary tree structure.
|
static interface |
Groups.INodeDistanceProvider
An interface that determines the distance between two nodes of a graph.
|
Modifier and Type | Method and Description |
---|---|
static int |
biconnectedComponentGrouping(Graph graph,
INodeMap groupIDs)
This method partitions the graph by analyzing its biconnected components.
|
static int |
edgeBetweennessClustering(Graph graph,
INodeMap clusterIDs,
boolean directed,
int minGroupCount,
int maxGroupCount,
IDataProvider edgeCosts)
Partitions the graph into groups using edge betweenness centrality.
|
static int |
edgeBetweennessClustering(Graph graph,
INodeMap clusterIDs,
double qualityTimeRatio,
int minGroupCount,
int maxGroupCount,
boolean refine)
Partitions the graph into groups using edge betweenness clustering proposed by Girvan and Newman.
|
static Groups.Dendrogram |
hierarchicalClustering(Graph graph,
Groups.INodeDistanceProvider distances,
Linkage linkage)
Partitions the graph into clusters based on hierarchical clustering.
|
static int |
hierarchicalClustering(Graph graph,
INodeMap clusterIDs,
Groups.INodeDistanceProvider distances,
Linkage linkage,
double cutOff)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given
cut-off value.
|
static int |
hierarchicalClustering(Graph graph,
int maxCluster,
INodeMap clusterIDs,
Groups.INodeDistanceProvider distances,
Linkage linkage)
Partitions the graph into clusters based on hierarchical clustering, while the dendrogram is cut based on a given
maximum number of clusters.
|
static int |
kMeansClustering(Graph graph,
INodeMap clusterIDs,
IDataProvider nodePositions,
DistanceMetric distanceMetric,
int k)
Partitions the graph into clusters using k-means clustering algorithm.
|
static int |
kMeansClustering(Graph graph,
INodeMap clusterIDs,
IDataProvider nodePositions,
DistanceMetric distanceMetric,
int k,
int iterations)
Partitions the graph into clusters using k-means clustering algorithm.
|
static int |
kMeansClustering(Graph graph,
INodeMap clusterIDs,
IDataProvider nodePositions,
DistanceMetric distanceMetric,
int k,
int iterations,
YPoint[] centroids)
Partitions the graph into clusters using k-means clustering algorithm.
|
public static final int biconnectedComponentGrouping(Graph graph, INodeMap groupIDs)
Nodes will be grouped such that the nodes within each group are biconnected. Nodes that belong to multiple biconnected components will be assigned to exactly one of these components.
groupID
for
such a node will be null
.O(graph.E() + graph.N())
graph
- the input graphgroupIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodepublic static final int edgeBetweennessClustering(Graph graph, INodeMap clusterIDs, boolean directed, int minGroupCount, int maxGroupCount, IDataProvider edgeCosts)
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove. The clustering with the best quality reached during the process will be returned.
The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the
overall computation time. The upper bound on the number of groups is
graph.N()
. Also, the number of returned groups is never smaller than the number of connected components of
the graph.
IllegalArgumentException
- if minGroupCount > maxGroupCount
or minGroupCount > graph.N()
or maxGroupCount <= 0
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm
terminates after maxGroupCount
groups have been determined.
graph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodedirected
- true
if the graph should be considered as directed, false
otherwiseminGroupCount
- the minimum number of groups that will be returnedmaxGroupCount
- the maximum number of groups that will be returnededgeCosts
- the IDataProvider
that holds a positive Double
cost or null
if the edges of the graph are
considered to be of equal costpublic static final int edgeBetweennessClustering(Graph graph, INodeMap clusterIDs, double qualityTimeRatio, int minGroupCount, int maxGroupCount, boolean refine)
In each iteration the edge with the highest betweenness centrality is removed from the graph. The method stops, if there are no more edges to remove or if the requested maximum number of groups is found. The clustering with the best quality reached during the process is returned.
The algorithm includes several heuristic speed-up techniques available through the quality/time ratio. For the highest
quality setting, it is used almost unmodified. The fast betweenness approximation of Brandes and Pich
(Centrality Estimation in Large Networks) is employed for values around 0.5
. Typically, this results in a
tiny decrease in quality but a large speed-up and is the recommended setting. To achieve the lowest running time, a
local betweenness calculation is used (Gregory: Local Betweenness for Finding Communities in Networks).
The method requires the maximum number of groups that will be returned. The smaller this value is, the faster the
overall computation time. The upper bound on the number of groups is
graph.N()
. Also, the number of returned groups is never smaller than the number of connected components of
the graph.
IllegalArgumentException
- if minGroupCount > maxGroupCount
or minGroupCount > graph.N()
or maxGroupCount <= 0
minGroupCount <= maxGroupCount
minGroupCount <= graph.N()
maxGroupCount > 0
O(graph.E()) * O(edgeBetweenness)
In practice, it is faster because edge betweenness is computed for subgraphs during the process and this algorithm
terminates after maxGroupCount
groups have been determined.
graph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodequalityTimeRatio
- a value between 0.0
(low quality, fast) and 1.0
(high quality, slow); the recommended value is
0.5
minGroupCount
- the minimum number of groups that will be returnedmaxGroupCount
- the maximum number of groups that will be returnedrefine
- true
if the algorithm refines the current grouping, false
if the algorithm discards the current
groupingpublic static final Groups.Dendrogram hierarchicalClustering(Graph graph, Groups.INodeDistanceProvider distances, Linkage linkage)
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances metric. The algorithm continues until all nodes belong to the same cluster.
The result is returned as a Groups.Dendrogram
object which represents the result of the clustering algorithm as a
binary tree structure. It can easily be traversed by starting from the root node
and moving
on to nodes of the next level via method Groups.Dendrogram.getChildren(Node)
.
IllegalArgumentException
- if an unknown linkage is givenGroups.INodeDistanceProvider
object returns a negative distance value for a pair of nodes, zero will be used
instead.O(graph.N() ^ 3)
graph
- the input graphdistances
- a given Groups.INodeDistanceProvider
object that determines the distance between any two nodeslinkage
- one of the predefined linkage valuesGroups.Dendrogram
which represents the result of the clustering as a binary treepublic static final int hierarchicalClustering(Graph graph, INodeMap clusterIDs, Groups.INodeDistanceProvider distances, Linkage linkage, double cutOff)
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given cut-off value that is used for cutting the hierarchical tree at a point such that the dissimilarity values of the nodes that remain at the dendrogram are less than this value.
IllegalArgumentException
- if an unknown linkage is usedO(graph.N() ^ 3)
graph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodedistances
- a given Groups.INodeDistanceProvider
object that determines the distance between any two nodeslinkage
- one of the predefined linkage valuescutOff
- the cut-off value that determines where to cut the hierarchic tree into clusterspublic static final int hierarchicalClustering(Graph graph, int maxCluster, INodeMap clusterIDs, Groups.INodeDistanceProvider distances, Linkage linkage)
The clustering is performed using the agglomerative strategy i.e., a bottom-up approach according to which at the beginning each node belongs to its own cluster. At each step pairs of clusters are merged while moving up to the hierarchy. The dissimilarity between clusters is determined based on the given linkage and the given node distances. The algorithm continues until all nodes belong to the same cluster.
The result will be given based on the given maximum number of clusters value that is used for cutting the hierarchical tree at a point such that the number of remaining clusters equals to this value.
The maximum number of clusters needs to be greater than zero and less than the number of the nodes of the graph.
IllegalArgumentException
- if an unknown linkage is given or if the maximum number of clusters is less than or equal to zero or greater than the
number of nodes of the graphGroups.INodeDistanceProvider
object returns a negative distance value for a pair of nodes, zero will be used
instead.O(graph.N() ^ 3)
graph
- the input graphmaxCluster
- the maximum number of clusters that determines where to cut the hierarchic tree into clustersclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodedistances
- a given Groups.INodeDistanceProvider
object that determines the distance between any two graph nodeslinkage
- one of the predefined linkage valuespublic static final int kMeansClustering(Graph graph, INodeMap clusterIDs, IDataProvider nodePositions, DistanceMetric distanceMetric, int k)
The nodes of the graph will be partitioned in k
clusters based on their positions such that their distance from
the cluster's mean (centroid) is minimized.
The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.
IllegalArgumentException
- if the given distance metric is not supportedk
will be set equal to
2
.k
or if no centroids are given, random initial centroids will
be assigned for all clusters.O(graph.N() * k * d * I)
where k
is the number of clusters, I
the maximum number of iterations
and d
the dimension of the pointsgraph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodenodePositions
- the IDataProvider
that holds a point
representing the current position of each node in the graphdistanceMetric
- one of the predefined distance metricsk
- the number of clusterspublic static final int kMeansClustering(Graph graph, INodeMap clusterIDs, IDataProvider nodePositions, DistanceMetric distanceMetric, int k, int iterations)
The nodes of the graph will be partitioned in k
clusters based on their positions such that their distance from
the cluster's mean (centroid) is minimized.
The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.
IllegalArgumentException
- if the given distance metric is not supportedk
will be set equal to
2
.k
or if no centroids are given, random initial centroids will
be assigned for all clusters.O(graph.N() * k * d * I)
where k
is the number of clusters, I
the maximum number of iterations
and d
the dimension of the pointsgraph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodenodePositions
- the IDataProvider
that holds a point
representing the current position of each node in the graphdistanceMetric
- one of the predefined distance metricsk
- the number of clustersiterations
- the maximum number of iterations performed by the algorithm for convergencepublic static final int kMeansClustering(Graph graph, INodeMap clusterIDs, IDataProvider nodePositions, DistanceMetric distanceMetric, int k, int iterations, YPoint[] centroids)
The nodes of the graph will be partitioned in k
clusters based on their positions such that their distance from
the cluster's mean (centroid) is minimized.
The distance can be defined using diverse metrics as euclidean distance, euclidean-squared distance, manhattan distance or chebychev distance.
IllegalArgumentException
- if the given distance metric is not supportedk
will be set equal to
2
.k
or if no centroids are given, random initial centroids will
be assigned for all clusters.O(graph.N() * k * d * I)
where k
is the number of clusters, I
the maximum number of iterations
and d
the dimension of the pointsgraph
- the input graphclusterIDs
- the INodeMap
that will be filled during the execution and returns an integer value (cluster ID) for each nodenodePositions
- the IDataProvider
that holds a point
representing the current position of each node in the graphdistanceMetric
- one of the predefined distance metricsk
- the number of clustersiterations
- the maximum number of iterations performed by the algorithm for convergencecentroids
- the initial centroids