Chi Square Test for HB


My thesis supervisor advised me to run a chi-square test on the HB results to check the significance of the attribute levels. However, I read in a technical paper that the usual approach is a t-test between levels or a Bayesian test. Also, the Chi-square value is reported only for the Counts results. So does a Chi-square test make sense for HB results? If so, what is the best way to do the Chi-square test?

Thanks for any help!
asked Nov 26, 2020 by bugsbunny (300 points)

1 Answer

I'm confused as well regarding the recommendation for a chi-square test on HB utilities for CBC experiments.  Perhaps you can ask your supervisor for a clarification?

The only connection I could see to Chi-Square is through the log-likelihood fit resulting from the multinomial logit model involved in HB.  Twice the difference in log-likelihood between the fitted model and the null model is distributed approximately Chi-Square.  But, we have an LL for each respondent.  So, for each respondent's LL there is a Chi-square test comparing it to the null LL (the log-likelihood resulting from utilities of zero).  Given what I know about HB and the resulting LL fit, most respondents will pass the threshold of 95% confidence (significantly different from utilities of zero).

For example, if there are 4 alternatives per choice task, then the null fit is ln(1/4) for each choice task.  We add that across choice tasks within the respondent to calculate the total LL for the null model for the respondent.  We could then calculate the likelihood of the chosen alternative in each task for the respondent according to the logit equation, using the HB utilities.  We could add the natural logs of these predicted likelihoods across tasks to get the respondent's LL under the HB utilities.  Degrees of freedom would be the number of estimated parameters in the model.  BTW, Sawtooth Software won't calculate these statistics I'm referring to at the individual level from HB.
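To make the arithmetic above concrete, here is a minimal Python sketch of the per-respondent calculation. It is only an illustration, not something Sawtooth Software produces: the function names, the toy utilities, and the degrees of freedom are all made up, and the small chi-square tail routine is a plain series approximation included so the sketch needs no extra libraries.

```python
import math

def null_log_likelihood(n_tasks, n_alternatives):
    # Utilities of zero give every alternative probability 1/n_alternatives.
    return n_tasks * math.log(1.0 / n_alternatives)

def model_log_likelihood(task_utilities, chosen):
    # task_utilities[t][j]: total utility of alternative j in task t
    # chosen[t]: index of the alternative the respondent picked in task t
    ll = 0.0
    for utils, pick in zip(task_utilities, chosen):
        top = max(utils)  # subtract the maximum for numerical stability
        log_denominator = top + math.log(sum(math.exp(u - top) for u in utils))
        ll += utils[pick] - log_denominator  # log of the logit probability
    return ll

def chi_square_upper_tail(x, df):
    # Upper-tail probability of a chi-square variable, computed from the
    # series expansion of the regularized lower incomplete gamma function.
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

# Hypothetical respondent: 3 tasks with 4 alternatives each (toy numbers).
utilities = [[1.2, 0.1, -0.5, -0.8],
             [0.3, 1.5, -1.0, -0.8],
             [-0.4, 0.2, 2.0, -1.8]]
choices = [0, 1, 2]
n_parameters = 2  # assumed degrees of freedom, for illustration only

ll_null = null_log_likelihood(len(utilities), 4)
ll_model = model_log_likelihood(utilities, choices)
statistic = 2.0 * (ll_model - ll_null)
print(statistic, chi_square_upper_tail(statistic, n_parameters))
```

A respondent would count as significantly different from utilities of zero at the 95% level when the printed tail probability is below 0.05.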
answered Nov 26, 2020 by Bryan Orme Platinum Sawtooth Software, Inc. (181,340 points)
Thanks a lot for the answer! If I have understood you correctly, I can use the Chi-square test only to check for a significant difference from 0? This would not really make sense in my survey design, because I have three attribute levels and, because of the coding, the mean value tends to be very close to 0. I was primarily interested in a significant difference between the respective attribute levels.
If you are interested in testing the differences between attribute levels within the same attribute, then you're back to either the Bayesian tests or the Frequentist t tests.  There is a Chi-Square test based on aggregate counts across respondents (as you know), but this is not a strong test and it is not based on HB analysis.  And, attributes (such as brand and color) for which respondents can differ in preference, can lead to results where an attribute is not seen as significant in the aggregate, even though respondents may find them quite significant in terms of driving their choices.
Hello, I am also interested in this question right now. In my study I have created a CBC with decisions regarding T-shirts (price, design, sustainability, quality, ...) and I would now like to test whether there are significant differences between some groups (age, gender, purchase frequency). How can I best do this? Where in Lighthouse can I find the Bayesian test or the t-test you mentioned? Thanks a lot in advance!
The easiest way is to use standard Frequentist t-tests or F-tests to compare groups of people on their utility scores as estimated via HB and normalized with the zero-centered diffs method.  Is that going to suit your needs?  Or, do you really need a formal Bayesian test of differences between groups of respondents?
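As a rough illustration of such a group comparison, the sketch below computes a Welch (unequal-variance) t statistic and its approximate degrees of freedom for one attribute level across two groups. The group names and utility values are invented, and the inputs are assumed to be zero-centered diffs utilities exported per respondent. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
import statistics

def welch_t(group_a, group_b):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom
    # for two independent samples with possibly unequal variances.
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    se_a = statistics.variance(group_a) / len(group_a)
    se_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical utilities for one level (e.g. "bio cotton") in two groups.
younger = [35.0, 42.5, 28.0, 51.0, 39.5, 44.0]
older = [12.0, 25.5, 18.0, 30.0, 9.5, 21.0]

t_stat, dof = welch_t(younger, older)
print(t_stat, dof)
```

With more than two groups, the same utilities would go into a one-way F-test (ANOVA) instead of pairwise t-tests.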
Thank you! The F- and t-tests fit my needs.
I am a bit confused about the interpretation now.
I ran a Levene F-test to check whether the variance of the preferences differs across the three groups.
For some of my items it is significant (like price and style) and for others it is not. So is it right to say "regarding price and style the groups have different preferences; regarding the other items the groups do not differ significantly"? Or does it only address "which item is the most important?" rather than "they like the level bio cotton more than normal cotton"? Or is the t-test the better choice?
And can I use the normal HB data to say where the preferences differ, once I have established that they differ significantly? Or is there another follow-up test?
Suppose I have a hypothesis that says "for all people the attributes style, quality and price are the most important attributes". Which test can I use for this? I can see it in the normal HB data, but it's always about significance results, isn't it?

Thank you!
I tend to dislike trying to draw big conclusions about importance scores that are supposed to have some sociological or business meaning, because the importance of the attributes can be determined by the researcher through how broad a range of levels one includes for each attribute.  Thus, you could potentially have made the research turn out any way you wanted (in terms of attribute importance) via your choice of the levels to include in the experiment.

I'm not aware of a test that would indicate whether three specific attributes were the three most important attributes for all people.   Though, I suppose I could strain a bit and think of Bayesian tests of beta draws (after convergence) at the individual level.  If we counted across the last 10000 draws (after convergence) for a respondent and found that for at least 95% of the draws the three attributes you indicate are not exceeded in importance by any other attribute, then I think you would have such a test.  This test would have to be at the individual level.  Then, you could summarize something like:  "For 88% of the respondents, these three attributes were the most important attributes at or exceeding the 95% confidence level".  But many readers will find it difficult to grasp what you mean, so you'd have to carefully describe what kind of individual-level test you were doing on the HB draws.
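The draw-counting idea above could be sketched roughly as follows. This is only an illustration under assumptions: each draw is represented as a hypothetical dictionary mapping an attribute name to its level part-worths, importance is taken as the range of part-worths within an attribute (dividing by the sum of ranges would not change the ordering), and the toy numbers are invented rather than taken from any real HB run.

```python
def attribute_ranges(draw):
    # draw: dict mapping attribute name -> list of level part-worths
    return {name: max(levels) - min(levels) for name, levels in draw.items()}

def share_of_draws_on_top(draws, target):
    # Fraction of a respondent's draws in which no attribute outside
    # `target` exceeds any attribute inside `target` in importance.
    hits = 0
    for draw in draws:
        ranges = attribute_ranges(draw)
        lowest_target = min(ranges[name] for name in target)
        highest_other = max(r for name, r in ranges.items() if name not in target)
        if lowest_target >= highest_other:
            hits += 1
    return hits / len(draws)

def share_of_respondents_passing(draws_by_respondent, target, threshold=0.95):
    # Fraction of respondents whose share of "on top" draws meets the threshold.
    passing = sum(
        1 for draws in draws_by_respondent.values()
        if share_of_draws_on_top(draws, target) >= threshold
    )
    return passing / len(draws_by_respondent)

# Toy draws (invented numbers): one where the targets dominate, one where not.
dominant = {"style": [-2.0, 2.0], "quality": [-1.5, 1.5],
            "price": [-1.0, 1.0], "color": [-0.2, 0.2]}
not_dominant = {"style": [-2.0, 2.0], "quality": [-1.5, 1.5],
                "price": [-1.0, 1.0], "color": [-3.0, 3.0]}

draws_by_respondent = {
    "resp_1": [dominant] * 20,
    "resp_2": [dominant] * 18 + [not_dominant] * 2,
}
target = {"style", "quality", "price"}
print(share_of_respondents_passing(draws_by_respondent, target))
```

In a real analysis each respondent would have thousands of saved draws rather than twenty, and the printed share is the figure that would go into a summary sentence like the 88% example.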