CBC LCA groups specifications


I am running a CBC Latent Class Analysis, and the results suggest that a 2-group solution probably fits the data best.

Now I want to know which respondent characteristics make someone significantly more likely to be in group 1 rather than group 2, or the reverse, and which characteristics are not significant. In other words, does a given characteristic significantly affect the probability of belonging to one group over the other? How can I go about finding these characteristic differences?

asked May 15, 2020 by AMYN Bronze (3,000 points)

1 Answer

+1 vote
Best answer
If you want to find out which characteristics, outside the conjoint analysis, best predict whether respondents are in group 1 or 2, you can run a supervised data mining model.

First, use the discrete segment assignment variable that's output by the LC software (not the probabilistic assignment to each segment). Use that variable as the dependent variable and run a discriminant analysis, with other variables predicting group membership. Alternatively, use the segment assignment as the dependent variable in a logit model with the other variables as predictors. Or predict it with a tree-based model (I use R for this, as it has several excellent tree models available; for your case I would probably start with rpart and build a classification tree). I usually try all three of these methods to help me identify the significant predictors. In my experience, discrim ends up as the best predictive model about half the time, logit maybe 10% of the time, and a tree the remaining 40% of the time.
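As a rough illustration of trying all three methods side by side, here is a minimal R sketch. Everything about the data is an assumption: `segment` stands in for the discrete membership variable from the LC software, and `age`, `income`, and `female` are made-up respondent characteristics that are simulated here so the code runs on its own. The function calls (`lda` from MASS, `glm`, and `rpart`) are the standard ones, but this is not a tuned workflow.

```r
# Hypothetical data: replace with your own respondent-level file.
library(MASS)   # lda()
library(rpart)  # rpart()

set.seed(1)
n <- 300
d <- data.frame(
  age    = rnorm(n, 45, 12),
  income = rnorm(n, 60, 15),
  female = rbinom(n, 1, 0.5)
)
# Simulate membership so that age and income matter and female does not
p <- plogis(-0.5 + 0.08 * (d$age - 45) - 0.05 * (d$income - 60))
d$segment <- factor(ifelse(runif(n) < p, 2, 1))

# 1. Linear discriminant analysis
fit_lda  <- lda(segment ~ age + income + female, data = d)
pred_lda <- predict(fit_lda)$class

# 2. Binary logit (models the probability of the second level, segment 2)
fit_logit  <- glm(segment ~ age + income + female, data = d, family = binomial)
pred_logit <- factor(ifelse(fitted(fit_logit) > 0.5, 2, 1),
                     levels = levels(d$segment))

# 3. Classification tree
fit_tree  <- rpart(segment ~ age + income + female, data = d, method = "class")
pred_tree <- predict(fit_tree, type = "class")

# In-sample hit rates for the three models
hit_rate <- c(
  lda   = mean(pred_lda   == d$segment),
  logit = mean(pred_logit == d$segment),
  tree  = mean(pred_tree  == d$segment)
)
print(round(hit_rate, 3))

# Coefficient table with z-tests, to see which predictors are significant
print(summary(fit_logit)$coefficients)
```

These hit rates are in-sample, so they flatter all three models (the tree especially); a holdout sample or cross-validation gives a fairer comparison.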

If you wanted to get really fancy you could use the % likelihood of each respondent being in each segment and use an allocation-based logit model to find the best predictors (I do this very infrequently: it's a bit more work and I haven't seen better quality results).
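One possible reading of the probability-based approach, for the two-segment case, is a fractional logit on the segment-2 membership probability. That gives the same point estimates as stacking each respondent once per segment and weighting the rows by the membership probabilities. The sketch below is an assumption about how this might be set up in R, not necessarily the procedure described above; `p_seg2` and the predictors are hypothetical and simulated.

```r
# Hypothetical data: p_seg2 stands in for each respondent's probability of
# belonging to segment 2, as reported by the LC software.
set.seed(2)
n <- 300
d <- data.frame(
  age    = rnorm(n, 45, 12),
  income = rnorm(n, 60, 15)
)
d$p_seg2 <- plogis(-0.5 + 0.08 * (d$age - 45) - 0.05 * (d$income - 60) +
                   rnorm(n, 0, 0.5))

# Fractional logit: the response is a probability in [0, 1].
# family = quasibinomial accepts non-integer responses without warnings.
fit_frac <- glm(p_seg2 ~ age + income, data = d, family = quasibinomial)
print(summary(fit_frac)$coefficients)
```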
answered May 15, 2020 by Keith Chrzan Platinum Sawtooth Software, Inc. (99,300 points)
selected May 16, 2020 by AMYN
Thank you, Keith, for the detailed answer.

First, I wanted to confirm that the variable you are referring to is the discrete one in the file [studyname_segment_membership], named "# group membership" and reported as 1, 2, 3, etc., depending on the maximum number of groups studied.

Next, it is disappointing to see that logit turns out to be the best model only about 10% of the time, as it is the one I am most familiar with and can probably run in R or Stata. Having said that, which package in R, or which command in Stata, can I use for the discriminant analysis? Do you recommend a particular one? The tree solution sounds a bit complicated from where I stand.

Thanks again, and have a great day.

Yes, confirmed that that is the group membership variable you want to use.  

If you mean the pct certainty (or rho-squared) is only 0.10, that's not too surprising - logit is an aggregate model and this could merely indicate that you have a lot of heterogeneity and that respondent level models (from HB) or class-level models (from LC) would improve the fit quite a bit.  
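For reference, pct certainty (McFadden's rho-squared) compares a fitted logit's log-likelihood with that of a baseline model. A minimal R sketch on a generic binary logit follows; the data are simulated, and using an intercept-only model as the baseline is an assumption (a pure chance baseline is another common convention).

```r
# Hypothetical data: one made-up predictor and a two-level outcome.
set.seed(3)
n <- 300
d <- data.frame(age = rnorm(n, 45, 12))
d$segment <- factor(ifelse(runif(n) < plogis(0.05 * (d$age - 45)), 2, 1))

fit  <- glm(segment ~ age, data = d, family = binomial)
null <- glm(segment ~ 1,   data = d, family = binomial)  # baseline (assumed)

# rho-squared = 1 - LL(model) / LL(baseline)
rho2 <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))
print(rho2)
```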

I've found a couple of discrim routines in R, but none that I think are very complete or that automate the process especially well. The one I use most in R is the linear discriminant analysis function called lda in the MASS package, and a nice introduction to it is here: http://www.sthda.com/english/articles/36-classification-methods-essentials/146-discriminant-analysis-essentials-in-r/#linear-discriminant-analysis---lda

But I typically run my discrims in SPSS or in Systat and I'm not a Stata user.
Here's another nice reference about discrim in the MASS package, from QuickR:  https://www.statmethods.net/advstats/discriminant.html
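To make the lda suggestion concrete, here is a minimal sketch of a discriminant analysis in MASS with leave-one-out cross-validation, which `lda` provides through its `CV = TRUE` argument. The data are simulated and the variable names are hypothetical; `segment` stands in for the group membership variable.

```r
library(MASS)

# Hypothetical data: replace with your own respondent-level file.
set.seed(4)
n <- 300
d <- data.frame(
  age    = rnorm(n, 45, 12),
  income = rnorm(n, 60, 15)
)
p <- plogis(0.08 * (d$age - 45) - 0.05 * (d$income - 60))
d$segment <- factor(ifelse(runif(n) < p, 2, 1))

# Standard fit: inspect group means and discriminant coefficients
fit <- lda(segment ~ age + income, data = d)
print(fit$means)    # predictor means within each segment
print(fit$scaling)  # coefficients of the linear discriminant

# Leave-one-out cross-validation: returns predicted classes and posteriors
cv <- lda(segment ~ age + income, data = d, CV = TRUE)
confusion   <- table(actual = d$segment, predicted = cv$class)
cv_hit_rate <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(cv_hit_rate)
```

Note that `lda` does not report significance tests for individual predictors, so the coefficients are best read alongside the group means and the cross-validated hit rate.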
Thank you very much for your valuable help and advice