Author/Editor     Kejžar, Nataša; Korenjak-Černe, Simona; Batagelj, Vladimir
Title     Clustering of modal-valued symbolic data
Type     article
Year of publication     2020
Extent     pp.
ISSN     1862-5347 - Advances in data analysis and classification
Language     eng
Abstract     Symbolic data analysis is based on special descriptions of data known as symbolic objects (SOs). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special type of SO is a representation with frequency or probability distributions (modal values). This representation enables us to simultaneously consider variables of all measurement types during the clustering process. In this paper, we present the theoretical basis for compatible leaders and agglomerative clustering methods with alternative dissimilarities for modal-valued SOs. The leaders method efficiently solves clustering problems with large numbers of units, while the agglomerative method can be applied either alone to a small data set, or to leaders, obtained from the compatible leaders clustering method. We focus on (a) the inclusion of weights that enables clustering representatives to retain the same structure as if clustering only first order units and (b) the selection of relative dissimilarities that produce more interpretable, i.e., meaningful optimal clustering representatives. The usefulness of the proposed methods with adaptations was assessed and substantiated by carefully constructed simulation settings and demonstrated on three different real-world data sets gaining in interpretability from the use of weights (population pyramids and ESS data) or relative dissimilarity (US patents data).
Free-text subject terms     statistical methods
hierarchical clustering
modal-valued symbolic data