HB estimation settings & model validity for ACBC

Dear community,

I am struggling a bit with finding the right settings for my ACBC/HB estimation. I have read a good number of forum posts on this but still feel unsure, so I would highly appreciate any input.

Background: Academic study. Holdout tasks not used.

1.) Prior degrees of freedom and prior variance.
The paper "What Are the Optimal HB Priors Settings for CBC and MaxDiff Studies?" gives a list of recommended settings depending on sample size and # of attributes. Since that's a general recommendation, I am not sure whether I should apply those settings or keep the defaults. For example, with 8 attributes I would use a prior variance of 0.3, which is quite different from the default. Changing prior variance and degrees of freedom affects the results (relative importances and part-worth utilities), so I am cautious about which values to choose for both.

2.) # of iterations
Are 20k+20k sufficient or should I change this setting?

3.) Estimate Task-Specific Scale Factors (Otter's Method) should be activated for ACBC, right?

4.) Internal validity.
In my ACBC, I do not intend to include holdout tasks to ensure that the survey does not get too overwhelming for respondents (feedback so far is that it is quite complex, so I'd better not give them additional burden with holdout tasks). This means I can't use hit rates or mean absolute error to tell how good the model is.

Given that I don't use holdout tasks and don't have real-world reference figures, what can I use to state how good the internal validity is? There are measures like McFadden pseudo R2, RLH and Chi-Square, but I don't know how to interpret them.

In one study I read that internal consistency was checked by how closely the BYO selections matched the choice tournament's winning concept. Is that a good method? And what would be a threshold for acceptable consistency? Does, e.g., an average 70% match between the BYO concept and the winning concept indicate good consistency?

5.) How do I interpret Pct. Cert., RLH and Avg Variance?
I feel it is not sufficient to just say "RLH is >0.33" when having 3 concepts per choice task. What is the threshold to declare RLH good or not-so-good? Same for Pct. Cert and Avg Variance.

6.) Comparing results depending on different demographic groups
The "normal" procedure would be to export zero-centered values of all data and then run e.g. a t-test in SPSS, correct?
I am unsure because I read somewhere that demographic groups can also be included directly in the HB estimation? That seems very complex to me.
asked Feb 22 by danny Bronze (1,260 points)

1 Answer

1.  The research that Walt and I did applies only to MaxDiff and CBC, not ACBC.  ACBC is a different animal because it appends three types of choice tasks: BYO, Screener, and Tournament Choices.  My opinion is that typical ACBC studies have much more information at the individual level than similar CBC studies; therefore, a higher prior variance is justified than for a similarly dimensioned CBC study.  Prior DF should probably follow the sample-size recommendations in our article.  So, I'd be inclined to use a prior variance of 1.0 for typical ACBC studies, with Degrees of Freedom set according to sample size as reported in our article.

2.  For typical ACBC studies and when estimating main effects only, 20K + 20K should be sufficient.  But, it wouldn't hurt to increase both of those settings if you have the time.

3.  As long as sample size is above, perhaps, n=200 I'd recommend Otter's method.

4.  I wouldn't compare the stated BYO concept with the final "winning" concept, because the winning concept can be any new concept generated in the near-neighbor design.  With summed pricing (I'm assuming you have summed pricing), random price shocks can make such a concept lower in price, and therefore even better for the respondent, than the BYO concept.

5.  Pct. Cert. is a pseudo R-squared.  It indicates how much better HB's MNL model fits the data than random (uninformed) choices would.  Pct. Cert. of 0 means fit equal to uninformed random choices.  100% means perfect predictions and no respondent error.  RLH is root likelihood and is challenging to interpret for ACBC studies, since RLH depends on the number of concepts in each choice task, and ACBC mixes BYO, Screener, and Tournament choice tasks, each of which can have a different number of alternatives per task.

6.  The Frequentist approach would indeed be to run t-tests or F-tests in SPSS, comparing groups on the zero-centered diffs.  There are also Bayesian approaches to statistical testing between groups, which would involve running HB with covariates within our software.
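
For readers without SPSS, the frequentist comparison in item 6 can be sketched in a few lines of Python. This is only an illustration: the group names and utility values below are made up, the function `welch_t` is a hypothetical helper, and the Welch (unequal-variance) form of the t-test is an assumption, since the answer does not say which variant to use. It returns the t statistic and degrees of freedom; a statistics package would be needed to turn those into a p-value.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group: sample variance / n.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up zero-centered utilities for ONE attribute level,
# split by a hypothetical demographic grouping.
younger = [35.0, 42.5, 28.0, 51.0, 39.5, 44.0]
older = [12.0, 25.5, 18.0, 30.0, 9.5, 21.0]

t_stat, dof = welch_t(younger, older)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The same call would be repeated for each attribute level of interest, so the number of tests grows quickly with the number of levels.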
answered Feb 22 by Bryan Orme Platinum Sawtooth Software, Inc. (184,140 points)
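
To make item 5 of the answer concrete, here is a minimal Python sketch of how RLH and Pct. Cert. relate to the log-likelihood, and why a single RLH threshold such as 0.33 does not carry over to a mix of task sizes. The function name `fit_stats`, the probabilities, and the task sizes are all hypothetical; the formulas are the usual ones (RLH as the geometric mean of the chosen-alternative probabilities, Pct. Cert. as the share of the chance-to-perfect log-likelihood gap that the model closes), not output taken from the software.

```python
import math


def fit_stats(chosen_probs, n_alternatives):
    """Return (rlh, chance_rlh, pct_cert) for one set of choice tasks.

    chosen_probs[i]   : model probability of the alternative chosen in task i
    n_alternatives[i] : number of alternatives shown in task i
    """
    n = len(chosen_probs)
    ll_model = sum(math.log(p) for p in chosen_probs)
    ll_chance = sum(math.log(1.0 / k) for k in n_alternatives)
    # RLH: geometric mean of the chosen-alternative probabilities.
    rlh = math.exp(ll_model / n)
    # RLH an uninformed random chooser would get on the same tasks.
    chance_rlh = math.exp(ll_chance / n)
    # Pct. Cert.: 0 = no better than chance, 1 = perfect prediction.
    pct_cert = 1.0 - ll_model / ll_chance
    return rlh, chance_rlh, pct_cert


# Hypothetical respondent: three 2-alternative and three 3-alternative tasks.
probs = [0.9, 0.8, 0.85, 0.7, 0.6, 0.75]
sizes = [2, 2, 2, 3, 3, 3]

rlh, chance_rlh, pct_cert = fit_stats(probs, sizes)
print(f"RLH = {rlh:.3f}, chance RLH = {chance_rlh:.3f}, Pct. Cert. = {pct_cert:.3f}")
```

Because the chance-level RLH here is the geometric mean of 1/2 and 1/3 rather than a flat 1/3, comparing RLH against its own chance baseline for the actual task mix is more informative than a fixed cutoff.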
Thanks a lot, Bryan! Just a few follow-ups:

4. There is no summed pricing in the study (no price attribute at all). So I guess it may make sense to compare the BYO concept with the winning concept?

As this fits the topic: In the Test Design feature of ACBC I can see that the standard errors of all attribute levels are <0.05 for a particular sample size. In the ongoing study I have fewer respondents than simulated, and I want to check how "far" I am from the targeted <0.05 mark for all levels. The approach would be to simply calculate the standard errors of the attribute levels based on RAW data, not zero-centered, correct? I just want to check whether I can end my survey with a smaller sample size, so I am waiting for the point where all SEs are <0.05.

I suppose it could make sense to compare the BYO concept to the Winning concept, though I don't know what I'd be looking for and what I could conclude.  The Winning concept depends on what occurred within the algorithm in generating the "near-neighbor" design for each respondent.  There are probably thousands or millions of potential near-neighbor concepts that could have been generated for each respondent (depending on the number of attributes in your study).  The Winning concept for a respondent is the near-neighbor concept that won in the choice tournament.

In reality, the strategy of conducting a choice tournament where winners move on to later tasks is intended just to capture efficient and meaningful tradeoffs for main-effects utility estimation under the logit rule, rather than specifically to learn which of the near-neighbor concepts is best for a given respondent.