Why do our results change so strongly when we change the threshold for the None variable in ACBC?

Hey there,
We conducted an ACBC for eggs with the attributes size of box (6 or 10 eggs), husbandry system (Organic, Free-range or Barn), male chick killing (Yes or No) and price.
After the surveys we calculated the mean willingness to pay (WTP) for each product by applying a Hierarchical Bayes estimation and using a common calculation method (e.g. Miller KM, Hofstetter R, Krohmer H, Zhang ZJ. How Should Consumers' Willingness to Pay be Measured? An Empirical Comparison of State-of-the-Art Approaches. Journal of Marketing Research. 2011).
Benchmarks for the validity of our study are all fine:
Root likelihood 0.742 (our study had 3 concepts per choice task), pseudo R-squared 0.606, holdout tasks predicted correctly by our model in 82.6% of cases, mean absolute error 5.25%.
However, our results are not robust to slight changes in the threshold of the None variable: for an organic carton of 10 eggs without chick killing, estimation and calculation lead to an average willingness to pay of 4.12 Euro (threshold 3) and -5.2 Euro (threshold 3.75). On average, participants only have a positive willingness to pay up to a threshold of 3.4.
Do you have any idea why this result occurs and how we can improve our ACBC to get more valid results?
Felix

There are numerous questions that come up for me.  First, I don't know that paper by Miller et al., so I don't know what method it uses for estimating WTP.

Next, if you are using our ACBC software to calibrate the None utility per the threshold on the 5-pt scale, then it only changes the utility of the None parameter.  The other utilities remain unchanged.  So, you'll want to investigate what the Miller et al. WTP method does with respect to the None parameter.

Other questions come up in my mind such as data cleaning for respondents who are not consistent in their ACBC choices.  Also, I wonder whether you have implemented monotonicity constraints to constrain price to be negative sloping.  If you have too many "bad" respondents in the dataset, it can make WTP estimation not robust.

Everything I know about ACBC suggests that when done well, the WTP results should be robust.  So, there are so many questions surrounding what you have done.

You may or may not know that we have built an automatic WTP estimation routine into our ACBC software's simulator (the Lighthouse Studio Simulator).

And you can search the help in Lighthouse Studio for "Willingness to Pay" or "WTP".
answered Nov 18, 2021 by Platinum (198,315 points)

Let me first give you some background information: Our final goal is to get the willingness to pay of each individual, which we want to compare. However, as a first approach we would also be happy to get a robust average WTP.

Miller et al. first generate a "WTP-residuum" for a product by subtracting the aggregated utilities of the properties of one product from the None value (for one individual). This "WTP-residuum" is converted into a WTP value using the utility of the price. Due to this procedure the None value is, as you wrote, essential for calculating the WTPs, which change quite significantly depending on how we calibrate the None variable.
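
To make the role of the None value concrete, here is a minimal Python sketch of the calculation as described above. It is not necessarily identical to the procedure in Miller et al.: the function name, the example utilities, and the assumption of a single linear price slope per respondent are all hypothetical and only for illustration.

```python
def wtp_from_utilities(attribute_utils, none_util, price_slope):
    """Price at which the product's total utility equals the None utility.

    Assumes (hypothetically) a linear price effect:
        total utility = sum(attribute_utils) + price_slope * price
    """
    if price_slope >= 0:
        raise ValueError("price_slope must be negative to convert utility into money")
    # "WTP residuum": None utility minus the summed non-price utilities.
    residuum = none_util - sum(attribute_utils)
    # Dividing by the (negative) price slope converts utility units into currency.
    return residuum / price_slope


# Illustrative numbers only, not taken from the study.
utils = [1.5, 1.0, 0.5]  # e.g. box size, husbandry system, no chick killing
for slope in (-0.5, -0.25):
    low = wtp_from_utilities(utils, none_util=1.0, price_slope=slope)
    high = wtp_from_utilities(utils, none_util=1.5, price_slope=slope)
    print(f"price slope {slope}: WTP {low:.2f} -> {high:.2f} when None rises by 0.5")
```

Under this linear assumption, a change in the None utility shifts the WTP by that change divided by the absolute price slope, so respondents with a flat price slope react most strongly to the threshold.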

- We used the RLH method to remove respondents with inconsistent choices (only 11 of 661 showed inconsistent behaviour).
- We constrained the price attribute to be negative (and also constrained all other attributes, as there seems to be no reason why, e.g., people should prefer male chick killing or a smaller box instead of a big one for the same price; however, we also played around with these constraints without getting better, more robust or more plausible results).
- Might the problem be that we only used three attributes (size of box, chick killing and husbandry system) plus the price? We observed that none of the participants got a "must-have" or "unacceptable" question even though we implemented these...
- Furthermore, we set the number of screening tasks to 6; however, the participants always got only 3 screening questions.

Of course, we also used the simulator to try to get better results. The problem is the choice of the exponent, which seems quite arbitrary without real data for comparison. We therefore applied different exponents between 0.3 and 0.9 to evaluate the predictive quality of our model. Unfortunately, this leads to widely differing results without losing much explanatory power. For an organic carton of 10 eggs without chick killing, the WTP lies between 7.1 Euro (exponent 0.3) and 4.7 Euro (exponent 0.9).
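
The effect of the exponent can be illustrated with a small Python sketch of a logit share-of-preference rule. This assumes the exponent acts as a multiplier on the utilities before exponentiation; the function name and the utilities are hypothetical, and this is not the simulator's actual code.

```python
import math


def shares_of_preference(utilities, exponent=1.0):
    """Logit shares for one respondent, with utilities scaled by `exponent`."""
    scaled = [exponent * u for u in utilities]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative utilities for three alternatives (e.g. two products and None).
utilities = [1.0, 0.5, 0.0]
for exponent in (0.3, 0.9):
    shares = shares_of_preference(utilities, exponent)
    print(exponent, [round(s, 3) for s in shares])
```

A smaller exponent flattens the shares, so a product needs a larger utility change (and hence a larger price change) to move its share by the same amount, which is one reason a simulated WTP can vary with the exponent.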

We would be very grateful for any further ideas, or for help in understanding the mistakes we have made and how to get a robust and non-arbitrary WTP.
Thanks a lot!
Felix
ACBC isn't typically used for just 4 attributes (typically 6 or more attributes are used), but as long as you use appropriate settings, it should work out.  I've done an ACBC with 4 attributes before, and it seemed to work out fine.

Only 11 out of 661 respondents flagged as random seems low, in our experience.  We often find that 10% to 30% of the sample answers randomly.  I'm assuming you ran a few hundred random respondents (using the data generator) through and examined the distribution of RLH (from HB estimation) for random responders.  That works well as long as the conjoint experiment is sufficiently powered to distinguish random responders from real ones.  In terms of an ACBC, that means having a reasonable number of "near-neighbor concepts" for each respondent compared to the number of parameters to estimate.  You had such a tiny experiment with so few parameters to estimate that it seems unlikely this wasn't the case.
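
As a rough illustration of that RLH check, here is a minimal Python sketch. It assumes you already have the RLH values of the simulated random respondents and of the real respondents from HB estimation; the function names and the choice of the 95th percentile as the cutoff are assumptions for illustration, not a fixed rule.

```python
import math


def rlh(chosen_probs):
    """Root likelihood: geometric mean of the predicted probabilities
    of the alternatives the respondent actually chose."""
    return math.exp(sum(math.log(p) for p in chosen_probs) / len(chosen_probs))


def rlh_cutoff(random_rlhs, quantile=0.95):
    """RLH value that the given share of random respondents falls at or below."""
    ordered = sorted(random_rlhs)
    index = math.ceil(quantile * len(ordered)) - 1
    return ordered[min(max(index, 0), len(ordered) - 1)]


def flag_likely_random(real_rlhs, cutoff):
    """True for respondents whose RLH does not exceed the random-data cutoff."""
    return [value <= cutoff for value in real_rlhs]


# With 3 concepts per task, pure guessing predicts each choice with p = 1/3.
print(round(rlh([1 / 3] * 8), 3))
```

Respondents flagged this way are only candidates for removal; how many are flagged depends heavily on the cutoff and on how well the design separates random from real answers.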

Also, there are questions about whether you used a BYO or whether you just skipped the BYO.  I'm assuming you skipped the BYO section, which I probably would have done in your case, given the tiny design.

My understanding is that your design was tiny: 3 attributes with 2 or 3 levels each and another attribute for price.  That's usually something that would call for a CBC study rather than an ACBC; ACBC could be overkill.  Again, depending on your ACBC settings, it could work OK for ACBC.  It's just that ACBC wouldn't be the natural choice for such a small conjoint study.

So, the main concern I have is the level of engagement and realism the respondents gave you.  As a gut-check, I sometimes take the survey myself four or five times, answering realistically and with WTP in my mind.  I'll take the survey while telling myself that I'm only willing to spend a certain amount more for certain enhanced features...and then I analyze the results just for my n=5.  I run the WTP analysis and see what the Sawtooth Software WTP analysis tells me (using default settings for WTP with the five randomly drawn competitors, using Sampling of Scenarios).  Doing that exercise can give me confidence that the survey and the analysis are working properly, given respondents who are answering in realistic, known ways.

I wish I could pinpoint what is going on for you, that you are getting WTP values that are much greater than expected.  I hope some of my thoughts above point you in a good direction.