
Can I do continuous covariates in an HB model?

Regarding CBC HB and covariates:

I've been told that when running categorical covariates you need to be sure that you have a decent n size in each category, say n=150+. What is your reaction to that?

Additionally, what considerations apply when running a continuous covariate? A specific use case might be using $spend as a covariate.
asked Jun 13, 2018 by anonymous

2 Answers

0 votes
I haven't conducted any simulations on this, nor seen any research recommending a minimum sample size per category of a covariate.  Requiring 150+ respondents per category certainly seems safe, but perhaps a bit overly conservative (a smaller size than 150 per category would probably be fine).  I personally would be comfortable with n=60 per category.  That's my opinion at this point, with no firm evidence to offer.

I'd zero-center your continuous covariate, because it makes the output easier to interpret.  But first, clean the data to eliminate outliers: Winsorize (for example, set all dollar spend above the 95th percentile to the 95th-percentile spend amount).
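The cleaning step above can be sketched in a few lines. This is just an illustration of Winsorizing and zero-centering with NumPy (the function name and the 95th-percentile cutoff are my own choices, not part of any Sawtooth tool):

```python
import numpy as np

def prepare_covariate(spend, upper_pct=95):
    """Winsorize spend at the given upper percentile, then zero-center it."""
    spend = np.asarray(spend, dtype=float)
    cap = np.percentile(spend, upper_pct)
    winsorized = np.minimum(spend, cap)   # pull extreme values down to the cap
    return winsorized - winsorized.mean()  # zero-center for easier interpretation
```

After this transformation the covariate has mean zero, so the upper-level model's intercept corresponds to a respondent with average spend.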
answered Jun 13, 2018 by Bryan Orme Platinum Sawtooth Software, Inc. (198,315 points)
0 votes
150 would probably be more in line for modeling that relies more heavily on the upper-level model (as opposed to the lower level) and for more dissimilar categories. For a more balanced approach, where the population-level model is not a much bigger priority, I would say 75 is a minimum (or even 60, if inter-borrowing between respondents from different categories of the covariate is a) allowed and b) possible, meaning some similarities are expected).

I tend to think of a situation with a categorical covariate as something in between two extreme scenarios: running one single model for all categories, with one sample size required, or running 2+ separate conjoint models, one per category, each with its own sample size required.
A covariate lets you run something in between.

In the first case you need N.
In the second case you would need X * N, where X is the number of categories / models.
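The in-between idea can be made concrete with some back-of-the-envelope arithmetic. This is purely my own illustration (the function and the linear interpolation are assumptions, not a published rule): the required sample slides between N for one pooled model and X * N for fully separate models, depending on how similar the categories' preference patterns are assumed to be.

```python
def required_sample(n_per_model, n_categories, similarity):
    """Illustrative interpolation between the two extremes.

    similarity = 1.0 -> categories share the same preference patterns
                        (one pooled model suffices: need N).
    similarity = 0.0 -> categories are completely unique
                        (effectively separate models: need X * N).
    """
    pooled = n_per_model                       # N
    separate = n_categories * n_per_model      # X * N
    return round(pooled * similarity + separate * (1 - similarity))
```

For example, with N=300 and two categories, full similarity gives 300, no similarity gives 600, and a middling assumption lands in between.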

The first extreme scenario assumes that within each category you can find respondents whose patterns of preferences are also observed among some respondents from other categories, as if each category contained the same set of preference patterns and the only difference between categories were the proportions in which those patterns are observed.
In that case, I would gravitate toward a minimal sample size boost to accommodate the covariate.

The second extreme scenario assumes that each category contains a set of absolutely unique preference patterns, so no inter-borrowing between respondents belonging to different categories is possible. It would be like not having enough people for the upper-level model, so the required sample size would gravitate toward doubling or even tripling.

The real-life situation with a covariate is always somewhere in between. You have to think carefully about your covariate and make assumptions, to the best of your knowledge, about how people would respond within each category.

For instance, a covariate with categories like City1 vs. City2 would most likely be closer to the first scenario, so I would go with a minimal boost, or no boost, for it.

That is as opposed to a covariate with categories like Teacher vs. Student, or Investor vs. Financial Advisor, who by definition might not share any similarities in what they would prefer in the conjoint. That would be closer to the second extreme scenario, where I could pump up the sample size by as much as 50%, not just to n=150.
answered Jun 13, 2018 by furoley Bronze (885 points)