Partition-Based Clustering in Object Bases
From Theory to Practice


Authors:
Carsten A. Gerlhof
Alfons Kemper
Christoph Kilger
Guido Moerkotte
Technical Report:
No. 92-34, RWTH Aachen, D-52056 Aachen, December 1992.
Abstract:
We classify clustering algorithms into sequence-based techniques - which transform the object net into a linear sequence - and partition-based clustering algorithms. Tsangaris and Naughton have shown that the partition-based techniques are superior. However, their work is based on a single partitioning algorithm, the Kernighan and Lin heuristics, which is not applicable to realistically large object bases because of its high running-time complexity.

The contribution of this paper is two-fold:

  1. we devise a new class of greedy object graph partitioning algorithms (GGP) whose running-time complexity is moderate while still yielding good quality results. For large object graphs GGP is the best known heuristics with an acceptable running-time.
  2. We carry out an extensive quantitative analysis of all well-known partitioning algorithms for clustering object graphs.
Our analysis yields that no one algorithm performs superior for all object net characteristics. Therefore, we derive a multi-dimensional grid: the dimensions correspond to particular characteristics of the object base configurations and the grid entries indicate the best clustering algorithm for the particular configuration. We propose an adaptable clustering strategy by determining first the characteristics of the clustering problem - given by, e.g., number and size of objects, degree of object sharing - and then applying the most suitable algorithm which is obtained from the multi-dimensional grid.

A shorter version of this paper is published at the FODO 1993 conference.

To receive the whole postscript file klick here.


Carsten A. Gerlhof, 02.06.1994