I am using hierarchical clustering to analyze time series data. My code uses the Mathematica function DirectAgglomerate[...], which generates hierarchical clusters given the following inputs:
a distance matrix D
the name of the method used to determine inter-cluster linkage.
I have calculated the distance matrix D using Manhattan distance:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
where $x$ and $y$ are two time series and $n$ is the number of data points in my time series.
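For concreteness, here is a minimal sketch of how such a Manhattan distance matrix could be built. It is plain Python rather than Mathematica, and the helper names `manhattan` and `distance_matrix` as well as the toy series are made up for illustration.

```python
def manhattan(x, y):
    """L1 (Manhattan) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def distance_matrix(series):
    """Symmetric matrix of pairwise Manhattan distances."""
    m = len(series)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            # Fill both triangles, since the distance is symmetric.
            D[i][j] = D[j][i] = float(manhattan(series[i], series[j]))
    return D


# Three toy time series of length 3 (hypothetical values).
series = [[0, 0, 1], [1, 2, 1], [4, 0, 0]]
D = distance_matrix(series)
for row in D:
    print(row)
```

A matrix like `D` is what would be handed to the clustering routine in place of the raw observations.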
My question is, is it ok to use Ward's inter-cluster linkage with a Manhattan distance matrix? Some sources suggest that Ward's linkage should only be used with Euclidean distance.
Note that DirectAgglomerate[...] calculates Ward's linkage using the distance matrix only, not the original observations. Unfortunately, I am unsure how Mathematica modifies Ward's original algorithm, which (from my understanding) worked by minimizing the error sum of squares of the observations, calculated with respect to the cluster mean. For example, for a cluster consisting of a vector of univariate observations $x_1, \dots, x_m$ with mean $\bar{x}$, Ward formulated the error sum of squares as:
$$ESS = \sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2 = \sum_{i=1}^{m} (x_i - \bar{x})^2$$
(Other software tools such as Matlab and R also implement Ward's clustering using just a distance matrix so the question isn't specific to Mathematica.)
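One common way to run Ward's method from a distance matrix alone is the Lance-Williams recurrence, which updates inter-cluster distances after each merge without touching the observations. Whether DirectAgglomerate does exactly this is an assumption; the sketch below only illustrates the recurrence, and `ward_update` is a hypothetical helper name. With Euclidean input the squared result equals twice the increase in the error sum of squares; with Manhattan input the formula still runs, but that interpretation is lost.

```python
def ward_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Distance from cluster k to the merger of clusters i and j.

    Lance-Williams form of Ward's linkage, applied to squared distances.
    n_i, n_j, n_k are the cluster sizes.
    """
    total = n_i + n_j + n_k
    squared = ((n_i + n_k) * d_ik ** 2
               + (n_j + n_k) * d_jk ** 2
               - n_k * d_ij ** 2) / total
    return squared ** 0.5


# Toy points (hypothetical): i = (0, 0), j = (2, 2), k = (7, 1), all singletons.
# Euclidean pairwise distances: d_ik^2 = 50, d_jk^2 = 26, d_ij^2 = 8.
euclid = ward_update(50 ** 0.5, 26 ** 0.5, 8 ** 0.5, 1, 1, 1)

# Manhattan pairwise distances for the same points: d_ik = 8, d_jk = 6, d_ij = 4.
manhat = ward_update(8, 6, 4, 1, 1, 1)

# The merged cluster {i, j} has centroid (1, 1); the increase in the error sum
# of squares from then adding k is (2 * 1 / 3) * 36 = 24, so twice that is 48.
print(euclid ** 2)  # close to 48: matches 2 * (increase in ESS)
print(manhat ** 2)  # close to 61.33: no longer tied to the ESS
```

The point of the comparison is that the software accepts any distance matrix, so nothing stops the computation; the question is only what the resulting merge heights mean.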
Answers:
The Ward clustering algorithm is a hierarchical clustering method that minimizes an 'inertia' criterion at each step. This inertia quantifies the sum of squared residuals between the reduced signal and the initial signal: it is a measure of the variance of the error in an ℓ2 (Euclidean) sense. Actually, you even mention it in your question. This is why, I believe, it makes no sense to apply it to a distance matrix that is not an ℓ2 (Euclidean) distance.
On the other hand, an average linkage or a single linkage hierarchical clustering would be perfectly suitable for other distances.
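To make the inertia criterion concrete, the following sketch computes the increase in the error sum of squares caused by merging two clusters, and checks it against the well-known closed form, which depends only on the cluster sizes and the squared Euclidean distance between the centroids. The function names and the toy clusters are assumptions made for illustration.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ess(points):
    """Error sum of squares: squared Euclidean deviations from the centroid."""
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))


def merge_cost(a, b):
    """Increase in ESS when clusters a and b are merged."""
    return ess(a + b) - ess(a) - ess(b)


def centroid_form(a, b):
    """Same quantity as |a||b| / (|a| + |b|) times the squared centroid distance."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


# Toy clusters (hypothetical values).
cluster_a = [(0, 0), (2, 2)]
cluster_b = [(7, 1)]
print(merge_cost(cluster_a, cluster_b))
print(centroid_form(cluster_a, cluster_b))
```

Both functions return the same value up to rounding, which shows that the quantity Ward's method minimizes is a squared Euclidean distance between cluster means.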
I can't think of any reason why Ward should favor any metric. Ward's method is just another option for deciding which clusters to merge next during agglomeration. This is achieved by finding the two clusters whose merger will minimize a certain error.
Hence it relies on two concepts: the cluster mean and the distance metric itself.
So: as long as the properties of the chosen metric (e.g. rotation, translation or scale invariance) satisfy your needs, and the metric fits the way the cluster mean is calculated, I don't see any reason not to use it.
I suspect that most people suggest the Euclidean metric because they
Another way of thinking about this, which might lend itself to an adaptation for ℓ1, is that the choice of the mean comes from the fact that the mean is the point that minimizes the sum of squared Euclidean distances. If you're using ℓ1 to measure the distance between time series, then you should be using a center that minimizes the sum of squared ℓ1 distances.
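The link between the distance and the cluster center can be checked numerically in one dimension, where squared ℓ1 and squared Euclidean distances coincide. The sketch below uses a brute-force grid search (an illustrative choice, not an efficient method) on made-up data: the mean minimizes the sum of squared deviations, while the median minimizes the sum of unsquared absolute deviations. For multivariate series, the center minimizing squared ℓ1 distances has no such simple closed form as far as I know and would have to be found numerically.

```python
import statistics


def argmin_on_grid(cost, grid):
    """Return the grid point with the smallest cost (brute force)."""
    return min(grid, key=cost)


# Toy univariate cluster (hypothetical values) with one outlier.
data = [0, 1, 2, 3, 14]

# Candidate centers 0.0, 0.5, ..., 14.0.
grid = [k / 2 for k in range(29)]


def squared_cost(c):
    """Sum of squared deviations from candidate center c."""
    return sum((x - c) ** 2 for x in data)


def absolute_cost(c):
    """Sum of absolute deviations from candidate center c."""
    return sum(abs(x - c) for x in data)


best_squared = argmin_on_grid(squared_cost, grid)
best_absolute = argmin_on_grid(absolute_cost, grid)

print(best_squared, statistics.mean(data))     # minimizer of squared cost vs. mean
print(best_absolute, statistics.median(data))  # minimizer of absolute cost vs. median
```

The outlier pulls the mean away from the bulk of the data but leaves the median in place, which is one practical reason the choice of center matters once the distance changes.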