Demographics in CCEA clustering

Is it possible to include demographic variables (e.g. gender, age) in the CCEA clustering? I was not able to find an option to upload a corresponding .csv file. It is possible to upload a custom ensemble file, but I think that is another topic altogether.
asked Apr 29 by danny Bronze (1,260 points)

1 Answer

0 votes
Sure, it's possible to include demographic variables like gender or age in k-means clustering by creating dummy-coded or indicator-coded versions of the categorical variables. But it probably isn't advisable or statistically robust: k-means works better when all the variables are metric.
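
For anyone unsure what indicator or dummy coding looks like in practice, here is a minimal sketch in plain Python. The function name `indicator_code` and the sample data are made up for illustration and are not part of CCEA.

```python
def indicator_code(values, prefix, drop_first=False):
    """Return one 0/1 column per category, keyed as '<prefix>_<level>'.

    drop_first=True omits the first (alphabetical) level, which gives
    dummy coding with that level as the reference category.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical respondents
gender = ["female", "male", "female", "other"]

indicator = indicator_code(gender, "gender")
dummy = indicator_code(gender, "gender", drop_first=True)
```

Each resulting column can only take the values 0 and 1, which is part of why these variables sit awkwardly next to metric basis variables in k-means.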

Keith Chrzan wrote a nice blog post on segmentation: https://sawtoothsoftware.com/resources/blog/posts/segmentation-how-to-do-it-badly-and-well

In that article, he says, "...when you want to segment on a mix of metric data and ordered or unordered categorical data you can use Latent Gold's latent class clustering (unlike the latent class packages available in R, Latent Gold's package allows basis variables of mixed scale types). If you want to stay within a cluster analysis framework, PAM also allows mixtures of disparate variable types."
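
To give a feel for how a method like PAM can cope with mixed variable types: it works from a respondent-by-respondent dissimilarity matrix rather than from raw coordinates. One common choice for mixed data is a Gower-style dissimilarity, sketched below in plain Python. The blog post does not say which dissimilarity to use, so treat Gower here as my assumption; the function name and data are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Average per-variable dissimilarity between two respondents.

    Numeric variables contribute |a - b| / range; categorical variables
    contribute 0 if the categories match and 1 otherwise.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            value_range = numeric_ranges[key]
            total += abs(a[key] - b[key]) / value_range if value_range else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical respondents with one metric and one categorical variable
r1 = {"age": 25, "gender": "female"}
r2 = {"age": 45, "gender": "male"}
r3 = {"age": 65, "gender": "female"}
ranges = {"age": 65 - 25}  # observed range of age in this tiny sample

d12 = gower_distance(r1, r2, ranges)
d13 = gower_distance(r1, r3, ranges)
```

A PAM routine would then pick medoids using the full matrix of such pairwise values.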

All that said, if you did want to take a crack at leveraging CCEA's ensembles technique, you could let CCEA create its standard ensembles file (a .csv file it outputs automatically). Then you could add new columns (custom solutions) to that file representing your demographic variables, and tell CCEA to use that customized ensemble file of segmentations to come up with a consensus solution. If your demographic variables are in there as additional columns of segmentation solutions, they will have at least a modest impact on the consensus segmentation solution. I think this approach is somewhat "kludgy" compared to the more sound Latent Class (via Latent Gold) or PAM solutions for this.
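
A rough sketch of the column-appending step, in plain Python. The file layout here is an assumption for illustration only (a respondent ID column followed by one column of integer segment codes per solution); check the ensemble file CCEA actually writes before adapting it. The function name, column names, and data are all hypothetical.

```python
import csv
import io


def add_demographic_solutions(ensemble_csv, demo_csv, id_col, demo_cols):
    """Append each demographic variable as an extra 'solution' column.

    Categories are recoded to integers 1..k so each demographic column
    looks like a segment-membership column.
    """
    ensemble = list(csv.DictReader(io.StringIO(ensemble_csv)))
    demo = {row[id_col]: row for row in csv.DictReader(io.StringIO(demo_csv))}

    # Build a category -> integer code map for each demographic variable
    codes = {}
    for col in demo_cols:
        levels = sorted({row[col] for row in demo.values()})
        codes[col] = {level: i for i, level in enumerate(levels, start=1)}

    merged = []
    for row in ensemble:
        new_row = dict(row)
        respondent = demo[row[id_col]]
        for col in demo_cols:
            new_row[f"demo_{col}"] = codes[col][respondent[col]]
        merged.append(new_row)
    return merged


# Hypothetical file contents
ensemble_csv = "id,sol1,sol2\n1,1,2\n2,2,2\n3,1,1\n"
demo_csv = "id,gender,age_group\n1,female,18-34\n2,male,35-54\n3,female,55+\n"

merged = add_demographic_solutions(
    ensemble_csv, demo_csv, "id", ["gender", "age_group"]
)
```

Note that a continuous variable like age would need to be binned into groups first, as in `age_group` above, since each added column has to read as a set of segment memberships.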
answered Apr 29 by Bryan Orme Platinum Sawtooth Software, Inc. (184,340 points)