ELKI is modeled around a database-inspired core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases). This database core provides nearest neighbor search, range/radius search, and distance query functionality with index acceleration for a wide range of dissimilarity measures. Algorithms based on such queries (e.g. k-nearest-neighbor algorithm, local outlier factor and
DBSCAN) can be implemented easily and benefit from the index acceleration. The database core also provides fast, memory-efficient collections for objects and for associative structures such as nearest neighbor lists.

ELKI makes extensive use of Java interfaces, so that it can be extended easily in many places. For example, custom data types, distance functions, index structures, algorithms, input parsers, and output modules can be added and combined without modifying the existing code. This includes the possibility of defining a custom distance function and using existing indexes for acceleration. ELKI uses a service loader architecture to allow publishing extensions as separate jar files.

ELKI uses optimized collections for performance rather than the standard Java API.
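As a rough illustration of why a purpose-built collection can be cheaper than the standard Java API, the following minimal sketch stores integer IDs in a primitive `int[]` and traverses them with a single cursor object. The names `IntIdList` and `Iter` are hypothetical and were made up for this example; they are not ELKI classes, and ELKI's actual implementation may differ.

```java
import java.util.Arrays;

/** Hypothetical primitive-backed ID list (illustration only, not an ELKI class). */
public class IntIdList {
    // IDs are kept as primitive ints, so no Integer objects are created.
    private int[] ids = new int[8];
    private int size = 0;

    /** Append an ID, doubling the backing array when it is full. */
    public void add(int id) {
        if (size == ids.length) {
            ids = Arrays.copyOf(ids, size * 2);
        }
        ids[size++] = id;
    }

    /** Number of stored IDs. */
    public int size() {
        return size;
    }

    /** Create a cursor positioned at the first element. */
    public Iter iter() {
        return new Iter();
    }

    /** Cursor-style iterator: one small object serves the whole traversal. */
    public final class Iter {
        private int pos = 0;

        /** True while the cursor points at a stored element. */
        public boolean valid() {
            return pos < size;
        }

        /** Move the cursor to the next element. */
        public void advance() {
            pos++;
        }

        /** Read the current element as a primitive int. */
        public int get() {
            return ids[pos];
        }
    }

    public static void main(String[] args) {
        IntIdList list = new IntIdList();
        for (int i = 1; i <= 10; i++) {
            list.add(i * i); // store the squares 1, 4, ..., 100
        }
        long sum = 0;
        for (Iter it = list.iter(); it.valid(); it.advance()) {
            sum += it.get();
        }
        System.out.println(sum); // prints 385
    }
}
```

By comparison, a `List<Integer>` typically holds a reference to a separate `Integer` object for every element, which costs additional memory and gives the garbage collector more objects to track.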
For loops, for example, are written similarly to C++ iterators:

```java
for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
    relation.get(iter);     // E.g., get the referenced object
    idcollection.add(iter); // E.g., add the reference to a DBID collection
}
```

In contrast to typical Java iterators (which can only iterate over objects), this conserves memory, because the iterator can internally use
primitive values for data storage. The reduced garbage collection overhead improves the runtime. Optimized collections libraries such as GNU Trove3, Koloboke, and fastutil employ similar optimizations. ELKI includes data structures such as object collections and heaps (for, e.g.,
nearest neighbor search) using such optimizations.

== Visualization ==