Pattern Recognition

Pattern Recognition Prof. Christian Bauckhage

outline

additional material for lecture 13

general advice for data clustering

note

clustering is generally an ill-posed problem

note

different kinds of data require different kinds of clustering algorithms ⇔ there is no one-size-fits-all solution for clustering

⇒ check which implicit assumptions your favorite algorithm makes and verify whether or not they apply to the problem at hand
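As one illustration of such implicit assumptions, consider k-means. In the usual notation (an assumption here, since these slides do not fix one), with data points x_j, clusters C_i and cluster means μ_i, it minimizes the within-cluster sum of squared Euclidean distances:

```latex
E(C_1, \ldots, C_k) = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2 ,
\qquad
\mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x_j \in C_i} x_j
```

Each point ends up with its nearest mean, so the resulting clusters are convex regions (Voronoi cells of the means). The objective therefore tends to favor compact, roughly spherical clusters and is a questionable fit for, say, elongated or nested structures.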

note

prototype-based clustering algorithms (such as k-means) crucially depend on their initialization ⇔ different runs on the same data will very likely produce different results

⇒ when using prototype-based clustering algorithms, always run them repeatedly and keep the “best” result
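The advice to restart can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes Lloyd's algorithm for k-means, initialization from k randomly drawn data points, and "best" meaning the lowest sum of squared errors (SSE). The function names, the toy data, and the parameters n_runs and seed are made up for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization."""
    # Assumption: initialize the prototypes with k distinct data points.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest prototype.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each prototype to the mean of its cluster;
        # an empty cluster keeps its old prototype.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse


def kmeans_restarts(points, k, n_runs=20, seed=0):
    """Run k-means n_runs times; return the lowest-SSE run and all SSE values."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    best_run = min(runs, key=lambda run: run[1])
    return best_run, [sse for _, sse in runs]


# Toy data: three Gaussian blobs in the plane (purely illustrative).
data_rng = random.Random(1)
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(30)
]

(best_centroids, best_sse), all_sse = kmeans_restarts(points, k=3)
print("best SSE:", best_sse)
print("worst SSE over all runs:", max(all_sse))
```

Comparing the best and worst SSE over the runs gives a quick impression of how strongly the result depends on the initialization for a given data set.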

note

cluster quality measures or cluster quality indices suggest a form of objectivity that is hardly ever justified

⇒ do not put too much faith in cluster quality measures
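A small experiment shows how a single number can mislead. The sketch below uses the within-cluster sum of squared errors (SSE) as an example of a quality measure; this choice is an assumption, since no specific index is named here. It computes the optimal SSE for one-dimensional data by dynamic programming, using the fact that optimal clusters in 1D are contiguous in sorted order. The data are evenly spaced and have no cluster structure at all, yet the SSE improves with every additional cluster.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def optimal_sse_1d(points, k):
    """Minimal SSE of any partition of 1D points into k non-empty clusters."""
    xs = sorted(points)
    n = len(xs)
    inf = float("inf")
    # best[j][i]: minimal SSE when the first i sorted points form j clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            # The last cluster consists of the sorted points s, ..., i - 1.
            best[j][i] = min(
                best[j - 1][s] + sse(xs[s:i]) for s in range(j - 1, i)
            )
    return best[k][n]


# Evenly spaced points: no cluster structure by construction.
data = [float(i) for i in range(12)]
sse_by_k = {k: optimal_sse_1d(data, k) for k in range(1, 7)}
for k, value in sse_by_k.items():
    print(k, value)
```

The SSE keeps falling as k grows and reaches zero once every point is its own cluster, so a better score alone says little about whether the clusters found are meaningful.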

question

how to choose k in k-means clustering?

answer

if you perform cluster analysis to assist human experts, set k ≈ 7 ± 2 (it simply does not make sense to report to management that their customers can be grouped into 152 clusters)

if you perform cluster analysis to facilitate subsequent computations, determine experimentally whether large or small values of k lead to better results