Add an interaction on the basis of 1) Counts or 2) Increase in percent certainty?

A 2-way interaction is theoretically plausible. However, it yields 154 terms (constructed lists were used for both attributes; a maximum of 5-6 levels per attribute was brought in per respondent). The expected sample size is at most 600.

This interaction would give a gain in % Certainty of about 3 percentage points compared to the main-effects model. However, I'm not sure whether there are precision issues given my sample size and the increased model complexity (i.e., the possibility of overfitting).
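For reference, my understanding is that % Certainty is a likelihood-based fit measure: the fraction of the distance from the chance (null) model's log-likelihood to a perfect fit (LL = 0) that the model covers. A small sketch (the log-likelihood numbers are made up purely to illustrate a ~3-point gain):

```python
import math

def percent_certainty(ll_model: float, ll_null: float) -> float:
    """Fraction of the distance from the chance-model log-likelihood
    to a perfect fit (LL = 0) covered by the model, in percent."""
    return 100.0 * (1.0 - ll_model / ll_null)

# Hypothetical numbers: 600 respondents x 10 tasks, 3 concepts per task,
# so the chance model's LL is n_obs * ln(1/3).
n_obs = 600 * 10
ll_null = n_obs * math.log(1.0 / 3.0)  # chance model
ll_main = -4200.0                      # main-effects model (made up)
ll_int = -4000.0                       # with interaction terms (made up)

gain = percent_certainty(ll_int, ll_null) - percent_certainty(ll_main, ll_null)
# gain is roughly 3 percentage points with these illustrative numbers
```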

Is that % Certainty gain (along with the theoretical plausibility) sufficient to justify the interaction? Or should I justify it further by looking for more "obvious" patterns in the raw counts?

If overfitting is a concern, perhaps a simpler model would be better and any possible interaction could be described via the counts -- is that a better approach?

asked Jun 18, 2020 by yorkmr (455 points)

1 Answer

0 votes
Be careful about interaction effects discovered in pooled analysis (counts or aggregate logit) for CBC datasets as the increase in fit due to interactions for pooled models will usually not transfer over to HB models.

So, if you are planning on HB estimation for CBC, you should know that we have found that interactions that are statistically significant under aggregate analysis usually tend to go away in HB modeling.  (That indicates that the interactions in pooled analysis were mainly due to unrecognized heterogeneity...but HB explains that heterogeneity.)  To include them in HB modeling would not only make the HB run longer, but often can lead to overfitting.

A few years ago, we did an investigation with (I think) around 22 different CBC datasets and found that interaction effects made a decent improvement in holdout validation for only about 4 of them. For this investigation, we used jack-knife sampling to systematically hold out 1 or 2 choice tasks per respondent while estimating the HB model with the remaining tasks -- over and over again.
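In rough outline, the validation loop looks something like this (a simplified illustration with synthetic data, not our actual implementation -- in a real run you would re-estimate the HB model on the remaining tasks at each step, whereas here the utilities are fixed purely to show the scoring step):

```python
import numpy as np

rng = np.random.default_rng(0)
n_resp, n_tasks, n_concepts, n_params = 50, 10, 3, 6

# Synthetic stand-ins: task design matrices, per-respondent part-worths,
# and observed choices. Real data would come from your CBC study.
X = rng.normal(size=(n_resp, n_tasks, n_concepts, n_params))
beta = rng.normal(size=(n_resp, n_params))      # "estimated" utilities
choices = rng.integers(0, n_concepts, size=(n_resp, n_tasks))

def holdout_hit_rate(X, beta, choices, holdout_task):
    """Share of respondents whose highest-utility concept in the
    held-out task matches their observed choice."""
    u = np.einsum('rcp,rp->rc', X[:, holdout_task], beta)
    return float(np.mean(np.argmax(u, axis=1) == choices[:, holdout_task]))

# Jack-knife over tasks: hold out each task in turn and score it.
rates = [holdout_hit_rate(X, beta, choices, t) for t in range(n_tasks)]
avg_hit_rate = sum(rates) / len(rates)
```

You would run this loop twice -- once for the main-effects model and once with interactions -- and compare average holdout hit rates.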

Lately in our trainings, we've advised our users who intend to use HB for their final models to be very cautious about including interaction terms in their HB models.  Interactions that seem very significant in aggregate logit or counts usually just don't add much value to HB estimation.

To check your CBC dataset for the value of interactions in HB, we've built the "CBC/HB Model Explorer," which is an add-on to our CBC/HB standalone system.  If you are licensed to use our CBC/HB system, then you can install our CBC/HB Model Explorer tool.  It does the jack-knife sampling and calls the CBC/HB software repeatedly in batch mode.  It usually takes about 5 to 10 hours to run when investigating interaction terms for a CBC dataset, so it takes some time commitment.

If you are interested, the CBC/HB Model Explorer may be downloaded from the following webpage (scan this page to find it):   https://www.sawtoothsoftware.com/support/downloads/tools-scripts
answered Jun 18, 2020 by Bryan Orme Platinum Sawtooth Software, Inc. (186,965 points)
Thanks, Bryan, I will check out that Model Explorer -- it sounds like the Interaction Search tool in Lighthouse shouldn't be the only go-to... (that's what I used at first to see that adding the interaction would give a 3 percentage point gain in % Certainty -- but perhaps this is not a meaningful gain in value).

Also, I'm not quite sure what is meant exactly by "interaction effect tends to go away in HB modeling." Looking at the intervals around the interaction terms from HB estimation would suggest that they're statistically significant because the intervals don't cover 0 -- though with HB those are really Bayesian credible intervals rather than frequentist confidence intervals.

What should I see in the results if the interaction effects are indeed "gone" when I do HB? Not quite sure what to expect.   
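One concrete check I'm considering (a rough sketch -- the `draws` array here is just synthetic data standing in for the posterior draws of a population-mean interaction term from CBC/HB): if the effect has truly "gone away," the posterior mean should shrink toward zero and the 95% credible interval should cover 0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior draws of one upper-level (population mean)
# interaction term; centered near zero by construction for illustration.
draws = rng.normal(loc=0.05, scale=0.30, size=5000)

lo, hi = np.percentile(draws, [2.5, 97.5])  # 95% credible interval
covers_zero = bool(lo < 0.0 < hi)           # "gone" if interval covers 0
```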

Overall, it sounds like I should prefer the main effects model.