Lumbermark: Resistant Clustering via Chopping Up Mutual Reachability Minimum Spanning Trees

Lumbermark

Keywords: Lumbermark, clustering, HDBSCAN*, DBSCAN, outliers, minimum spanning tree, MST, density estimation, mutual reachability distance.

Lumbermark is a fast and resistant divisive clustering algorithm that identifies a user-specified number of clusters.

It iteratively chops off sizeable limbs that are joined by protruding segments of a dataset’s mutual reachability minimum spanning tree.
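The general idea of divisive clustering on a minimum spanning tree can be illustrated with a deliberately simplified sketch. It is not Lumbermark's actual splitting criterion, which this page does not spell out. The sketch builds the MST of a dense distance matrix with Prim's algorithm. It then repeatedly cuts the longest remaining edge whose removal leaves every piece with at least `min_size` points, so that a lone far-away point is not chopped off as a "cluster" of its own. The names `prim_mst`, `components`, `chop_mst` and the `min_size` parameter are hypothetical and chosen for illustration only.

```python
import math

def prim_mst(d):
    """Prim's algorithm on a dense symmetric distance matrix.
    Returns the MST as a list of (weight, u, v) edges."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((d[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges

def components(n, edges):
    """Label the connected components of a forest using union-find;
    labels are numbered in order of first appearance (point 0 gets label 0)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, u, v in edges:
        parent[find(u)] = find(v)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def chop_mst(d, n_clusters, min_size):
    """Hypothetical greedy divisive sketch (NOT Lumbermark's criterion):
    cut the longest MST edge whose removal keeps all pieces >= min_size."""
    n = len(d)
    kept = sorted(prim_mst(d), reverse=True)   # longest edges first
    for _ in range(n_clusters - 1):
        for idx in range(len(kept)):
            trial = kept[:idx] + kept[idx + 1:]
            labels = components(n, trial)
            sizes = [labels.count(c) for c in set(labels)]
            if min(sizes) >= min_size:          # admissible cut found
                kept = trial
                break
        else:
            raise ValueError("no admissible edge left to cut")
    return components(n, kept)
```

For the 1-D points 0, 1, 2, 10, 11, 12, 30 with `n_clusters=2` and `min_size=2`, the longest edge (12 to 30) would isolate a single point and is skipped, so the cut falls between 2 and 10. In practice, `d` would hold mutual reachability distances rather than plain Euclidean ones.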

The use of a mutual reachability distance pulls peripheral points farther away from each other.
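A minimal sketch of the mutual reachability distance follows. It assumes the HDBSCAN*-style definition d_M(i, j) = max(c_k(i), c_k(j), d(i, j)). Here c_k(i) is the Euclidean distance from point i to its k-th nearest neighbor, with the point itself excluded. Conventions vary, and Lumbermark's own choice of k and of this convention may differ. The function names are hypothetical.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, k):
    """Dense matrix of max(core_i, core_j, dist(i, j)); the diagonal is 0."""
    core = core_distances(points, k)
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dij = max(core[i], core[j], math.dist(points[i], points[j]))
            d[i][j] = d[j][i] = dij
    return d
```

With k = 2 and the 1-D points 0, 1, 2, 10, the isolated point at 10 has core distance 9. Its mutual reachability distance to 2 therefore grows from 8 to 9, while points in the dense group stay comparatively close.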

When combined with the deadwood package, it can act as an outlier detector.

Contributing

lumbermark is distributed under the open source GNU AGPL v3 license. Its source code can be downloaded from GitHub.

The Python version is available from PyPI. The R version can be fetched from CRAN.

The core functionality is implemented as a C++ library, so it can easily be adapted for use in other environments. New contributions are welcome, for example, Julia or MATLAB/GNU Octave wrappers.

Author and Maintainer: Marek Gagolewski