Partition-Based Clustering in Object Bases:
From Theory to Practice

Carsten A. Gerlhof
Alfons Kemper
Christoph Kilger
Guido Moerkotte
Proc. of the 4th Intl. Conf. on Foundations of Data Organization and Algorithms (FODO),
Lecture Notes in Computer Science (LNCS), vol. 730, pages 301-316,
Chicago, Illinois, October 1993, Springer-Verlag.
We classify clustering algorithms into sequence-based techniques - which transform the object net into a linear sequence - and partition-based clustering algorithms. Tsangaris and Naughton have shown that the partition-based techniques are superior. However, their work is based on a single partitioning algorithm, the Kernighan and Lin heuristics, which is not applicable to realistically large object bases because of its high running-time complexity.

The contribution of this paper is two-fold:

  1. We devise a new class of greedy object graph partitioning algorithms (GGP) whose running-time complexity is moderate while still yielding good quality results.
  2. Our extensive quantitative analysis of all well-known partitioning algorithms indicates that no one algorithm performs superior for all object net characteristics.
Therefore, we propose an adaptable clustering strategy according to a multi-dimensional grid: the dimensions correspond to particular characteristics of the object base - given by, e.g., number and size of objects, degree of object sharing - and the grid entries indicate the most suitable clustering algorithm for the particular configuration.

An extended version of this paper is available as technical report.

To receive the whole postscript file klick here.

Carsten A. Gerlhof, 02.06.1994