ACBC: HB or monotone regression for n=9?

Hello dear forum members,

I'm conducting an ACBC (n=9) with 8 attributes and 2-4 levels each. I'm not too confident about how to estimate part-worth utilities, since the sample size seems to be right on the edge between using monotone regression and HB. Do you have any recommendations? HB seems to provide slightly better hit rates according to the Help manual, but I'd like to hear your opinion on that. Could I simply use both methods and compare them afterwards, or do I have to decide before actually conducting the survey?

Concerning holdout tasks: Does it make sense to include them after my ACBC tasks to check validity? Whenever I test my design, the MAE is very high. Since I'm conducting the ACBC for academic purposes, I'd like to include another criterion to check validity.

Thank you for your support!
asked Mar 29, 2018 by Tim Geyer (270 points)

1 Answer

0 votes
Best answer
I'm assuming you are dealing with an extremely small population, such that n=9 is a census. In that case, there is no sampling error to worry about, because you are surveying every member of the population.

I believe ACBC can do a very good job of understanding and predicting choices for individuals, one at a time (but make sure each level of each attribute appears at least 3x and preferably 4 or 5 times, given your n=9 experiment).  Per the documentation you read, we have broken up ACBC data sets into groups as small as 9 people for running HB and have found reasonably good results (for those subsets of 9 people) compared to running it on the entire sample.  It's obviously better to estimate the models where each individual can benefit from a more stable upper-level model (from a larger population), but it still works reasonably well when the sample sizes are really amazingly small.  HB is robust.

Given your tiny sample size and the assumptions above, I'd use HB estimation.  Furthermore, to improve the resolution of the part-worth utilities, I would ask respondents (outside the ACBC survey) to fully rank the levels of any attribute whose order of preference you don't know with certainty ahead of time (unordered attributes such as brand, style, and color).  That way, you can impose individual-level (customized) utility constraints on the final HB utilities using those outside ranking questions.  The manual describes how to do this in the Lighthouse Studio section (use the search function to find it) entitled "Utility Constraints, Including Customized Constraints".  Pay special attention to the part entitled "Customized Constraints".

It's very strange to think about conducting a predictive validation with just 9 people.  Naturally, respondents are not perfectly consistent.  For example, a good respondent will only be able to answer two identical CBC questions, each involving 3 or 4 alternatives, with 75 to 85% consistency if they are separated by other CBC questions.  So, given the natural variability involved with humans and your tiny 9-person sample size, I just don't see how you would obtain a very robust read on validity.

Internal consistency (the ability to replicate or predict similar conjoint questions answered by the same respondents) is a different thing from external validity, which usually involves at least making predictions for out-of-sample respondents (a different group of respondents from those used to estimate the utilities).  And the highest standard for predictive validity would be to predict people's real-world choices or purchases, rather than their answers to a questionnaire.

If you could observe those same 9 respondents making some purchase or choice among alternatives in the real world and compare the predictions from the ACBC exercise for those same events (involving the same attributes and levels as in the real-world decision), and if you found that 8 or 9 out of 9 respondents were predicted correctly, then that might be impressive...and would well exceed the null (random) prediction rate.  But these conditions are rare.
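To put a rough number on how far 8 or 9 correct predictions out of 9 would exceed the null rate, here is a minimal sketch using a binomial tail probability. It assumes, purely for illustration, that each respondent makes one real-world choice among 3 alternatives, so random guessing is right with probability 1/3; the function name and the number of alternatives are hypothetical and not part of any Sawtooth tool.

```python
from math import comb


def prob_at_least(hits: int, n: int, p_chance: float) -> float:
    """P(X >= hits) for X ~ Binomial(n, p_chance)."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(hits, n + 1)
    )


# Assumption: one real-world choice per respondent among 3 alternatives,
# so a random prediction is correct with probability 1/3.
p_chance = 1 / 3
p_value = prob_at_least(8, 9, p_chance)
print(f"P(at least 8 of 9 correct by chance) = {p_value:.5f}")
```

With more alternatives per choice the chance rate drops and the same hit count becomes even harder to reach by guessing.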
answered Mar 29, 2018 by Bryan Orme Platinum Sawtooth Software, Inc. (201,565 points)
selected Mar 29, 2018 by Tim Geyer
Thanks for your quick and helpful reply, I'll probably go with HB then. As you assumed, I won't be able to check external validity. However, is there any way to gain insight into internal consistency?
Well, ACBC will give you an internal fit score.  But, it's challenging to interpret it (other than knowing that higher represents better consistency) because ACBC mushes multiple formatted tasks together in the HB-logit estimation.  Some tasks are choices among triples, some are pairwise comparisons versus an acceptability threshold, and some are BYO choices of a single level among multiple levels of an attribute.  But, at least you could use the internal fit statistic to judge something about the relative consistency of one respondent versus another.
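A small sketch of why a fit score pooled over mixed task formats is hard to read in absolute terms. It assumes the fit statistic is a root likelihood (RLH), i.e. the geometric mean of the predicted probabilities of the chosen options, and it uses a made-up mix of tasks for one respondent; how ACBC actually codes each task type for estimation is not shown here.

```python
from math import exp, log


def rlh(probs):
    """Root likelihood: geometric mean of the probabilities of the chosen options."""
    return exp(sum(log(p) for p in probs) / len(probs))


def chance_rlh(task_sizes):
    """RLH of a model that guesses uniformly among the options in each task."""
    return rlh([1 / k for k in task_sizes])


# Hypothetical respondent (assumed counts, not from the thread):
# 4 BYO choices among 3 levels, 6 binary accept/reject comparisons, 5 triples.
task_sizes = [3] * 4 + [2] * 6 + [3] * 5
print(f"chance-level RLH for this mix: {chance_rlh(task_sizes):.3f}")
print(f"chance-level RLH for triples only: {chance_rlh([3] * 5):.3f}")
```

Because the chance baseline shifts with the mix of task formats, the same RLH value can mean different things for different designs, which is why comparing respondents within one study is the safer use.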
When using ranking questions as an input for customized constraints, do the attribute levels have to contain the "+XX €" similar to the BYO section?  I'd like to use BYO choices as inputs for customized constraints for attributes with two levels and ask separate questions for the other attributes (since I can't really use any global constraints).
When using "summed pricing" (where price premiums are attached to certain levels and shown in the BYO section of ACBC to respondents) AND you want to use customized constraints (constraining each respondent's part-worth utilities to follow a customized order for that respondent, such as for unordered attributes like brand, color, or style), then you should ask a separate question that isolates the attribute levels without the price changes.  You should not use the answers to the BYO questions, because the BYO questions asked respondents to tell us the best level of each attribute given price differences among levels, rather than isolating the preference for the levels keeping price constant.

For two-level attributes and customized constraints, with a separate question (separate from the BYO) you can just show the two levels. Tell the respondent to consider that the only difference between products containing these levels is the difference shown, and to assume the price is the same.
That makes sense, thank you. How should I ask the question? I know that I have to make sure that preferred levels receive higher numbers in the constraints. Does that mean that the respondent will have to rank levels from least preferred to most preferred in order to assign higher numbers to more preferred levels?
Indeed, when you actually do the HB estimation and point the "customized constraints" dialog at the specific variables in your study that contain the rankings, higher values need to correspond to higher preference.  So, whether you collect the data as a ranking, a select question, or whatever, just make sure you look at the data in the data management dialog area.  Make sure a higher number means higher preference, and missing means no information.  If this isn't how you collected the data, no problem: you can add a new variable to the table with the values transformed to how you want the constraints to go, and then just point the HB estimation dialog at the new variable you added to the data table.
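The recoding described above could look something like the following sketch. It assumes the ranking question stored 1 for the most preferred level and that the recode is done outside Lighthouse Studio before the values are put into a new variable; the function name and the example brand variables are hypothetical.

```python
from typing import Optional


def rank_to_constraint(rank: Optional[int], n_levels: int) -> Optional[int]:
    """Flip a rank (1 = most preferred) so that higher means more preferred.

    Missing ranks stay missing, meaning "no information" for that level.
    """
    if rank is None:
        return None
    if not 1 <= rank <= n_levels:
        raise ValueError(f"rank {rank} is outside 1..{n_levels}")
    return n_levels + 1 - rank


# Hypothetical respondent who ranked three brands and skipped the fourth.
ranks = {"brand_A": 2, "brand_B": 1, "brand_C": 3, "brand_D": None}
constraint_values = {
    level: rank_to_constraint(rank, n_levels=4) for level, rank in ranks.items()
}
print(constraint_values)
```

If the question already stored higher numbers for more preferred levels, no flip is needed and the original variable can be used directly.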
That's perfect. Thank you for your patience!
I have collected the data now. Which criterion would you use to decide whether to use estimated part-worth utilities with or without individual constraints? Should I rely on average RLH or Pct. Cert. (both slightly higher with constraints) or on avg. variance (much lower without constraints)? Is there any rule of thumb on which method has a better "goodness of fit"?
If you have the luxury of a large sample size and more than 4 holdout choice tasks, then you can test whether utility estimation with or without constraints provides better fit to the holdout data.  You are dealing with a small sample size, so this isn't going to give you enough statistical power to discriminate between models.

Usually, the internal fit statistic is harmed by imposing constraints (and this is expected, because under constraints, the utilities cannot shift around wherever they want to try to improve fit to the choices).  But, just because the fit goes down doesn't mean the utilities are worse!  They could actually become more reasonable and have better predictive power even though you slightly degraded the internal fit to the tasks used in estimating the utilities by imposing utility constraints.
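For readers who do have enough holdout data, the comparison between constrained and unconstrained utilities could be sketched as a simple holdout hit rate. Everything below is illustrative: the data layout, level names, and part-worth values are made up, and the prediction rule (pick the alternative with the highest summed part-worths) is the usual first-choice rule rather than anything specific to this thread.

```python
def total_utility(partworths, alternative):
    """Sum the part-worths of the levels that make up one alternative."""
    return sum(partworths[level] for level in alternative)


def hit_rate(partworths_by_resp, holdouts):
    """Share of holdout tasks where the highest-utility alternative was chosen."""
    hits = total = 0
    for resp, tasks in holdouts.items():
        for alternatives, chosen in tasks:
            utils = [total_utility(partworths_by_resp[resp], a) for a in alternatives]
            hits += utils.index(max(utils)) == chosen
            total += 1
    return hits / total


# Made-up holdout data: one respondent, two tasks, index of the chosen alternative.
holdouts = {
    "r1": [
        ([["red", "small"], ["blue", "large"]], 1),
        ([["red", "large"], ["blue", "small"]], 0),
    ]
}
# Made-up part-worths from two hypothetical estimation runs.
unconstrained = {"r1": {"red": 0.2, "blue": 0.5, "small": 0.4, "large": 0.3}}
constrained = {"r1": {"red": 0.4, "blue": 0.5, "small": 0.1, "large": 0.6}}

print("unconstrained:", hit_rate(unconstrained, holdouts))
print("constrained:  ", hit_rate(constrained, holdouts))
```

With only 9 respondents and a handful of holdout tasks, differences in such hit rates would be far too noisy to choose between models, which is the point made above.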