We conducted an ACBC (adaptive choice-based conjoint) study for eggs with the attributes box size (6 or 10 eggs), husbandry system (Organic, Free-range, or Barn), male chick killing (Yes or No), and price.

After the surveys we calculated the mean WTP of each product by applying a Hierarchical Bayes estimation and using a common calculation method (e.g. Miller KM, Hofstetter R, Krohmer H, Zhang ZJ. How Should Consumers' Willingness to Pay Be Measured? An Empirical Comparison of State-of-the-Art Approaches. Journal of Marketing Research. 2011).

The standard validity benchmarks for our study all look fine:

- Root likelihood: 0.742 (our study had 3 concepts per choice task, so chance level is 0.333)
- Pseudo R-squared: 0.606
- Holdout tasks predicted correctly by our model: 82.6 %
- Mean absolute error: 5.25 %
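For readers less familiar with these statistics: root likelihood is simply the geometric mean of the probabilities the estimated model assigned to the choices respondents actually made. A minimal sketch, with made-up per-task probabilities (not values from our study):

```python
import math

# Hypothetical probabilities the model assigned to the alternatives
# that respondents actually chose (illustrative values only).
chosen_probs = [0.80, 0.70, 0.75, 0.72]

# Root likelihood = geometric mean of those probabilities.
rlh = math.exp(sum(math.log(p) for p in chosen_probs) / len(chosen_probs))

# With 3 concepts per task, chance level is 1/3, so an RLH around 0.74
# is well above chance.
print(round(rlh, 3))  # → 0.742
```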

However, our results are not reliable and robust with respect to slight changes in the threshold of the none variable: for an organic carton of 10 eggs without chick killing, the estimation leads to a willingness to pay of 4.12 Euro (threshold 3) and -5.2 Euro (threshold 3.75), even when looking only at the average. Participants only have a positive average willingness to pay up to a threshold of 3.4.

Do you have any idea why this result occurs and how we can improve our ACBC to get more valid results?

Thanks in advance,

Felix

Let me first give you some background information: Our final goal is to get the willingness to pay of each individual, which we want to compare. However, as a first approach we would also be happy to get a robust average WTP.

Miller et al. first generate a "WTP-residuum" for a product by subtracting the aggregated utilities of one product's attribute levels from the none value (for one individual). This "WTP-residuum" is then converted into a WTP value using the utility of the price. Because of this procedure the none value is, as you wrote, essential for calculating the WTPs, and the resulting WTPs change quite significantly depending on how we calculate the none variable.
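To make the dependence on the none value concrete, here is a minimal sketch of that procedure, assuming a linear price utility; all attribute names and numbers are hypothetical, not estimates from our study:

```python
# Hypothetical part-worth utilities for one respondent (illustrative only).
part_worths = {"size_10": 0.8, "organic": 1.2, "no_chick_killing": 0.9}
none_utility = 2.0      # the "none" value, in utils
price_utility = -1.1    # assumed linear price slope, in utils per Euro

def wtp(attrs):
    """WTP-residuum approach: subtract the product's summed utilities
    from the none value, then convert utils to Euro via the price slope.
    When the product beats the none option, residuum and price slope are
    both negative, so the quotient is a positive Euro amount."""
    residuum = none_utility - sum(part_worths[a] for a in attrs)
    return residuum / price_utility

print(round(wtp(["size_10", "organic", "no_chick_killing"]), 2))  # → 0.82
```

Because the none utility enters the residuum directly, any shift in how the none threshold is computed moves every WTP estimate by the same amount in utils, which the price slope then magnifies or shrinks in Euro terms.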

Regarding your further questions/suggestions:

- We used the RLH method to screen out inconsistent choices (only 11 of 661 respondents showed inconsistent behaviour)

- We constrained the price attribute to be negative (and all other attributes as well, since there seems to be no reason why, e.g., people should prefer male chick killing, or a smaller box over a bigger one at the same price; however, we also experimented with these constraints without getting better, more robust, or more plausible results)

- Might the problem be that we only used three attributes (box size, chick killing, and husbandry system) plus price? We observed that none of the participants got a "must-have" or "unacceptable" question, even though we implemented these...

- Furthermore, we set the number of screening tasks to 6; however, the participants always got only 3 screening questions.

Of course, we also used the simulator to try to obtain better results. The problem is the choice of the exponent, which seems quite arbitrary without real data for comparison. We therefore applied different exponents between 0.3 and 0.9 to evaluate the predictive quality of our model. Unfortunately this leads to highly differing results without losing much explanatory power: for an organic carton of 10 eggs without chick killing the estimates lie between 7.1 Euro (exponent 0.3) and 4.7 Euro (exponent 0.9).
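To illustrate why the exponent matters so much: in a share-of-preference simulator the exponent scales the utilities inside the logit, so smaller exponents flatten the predicted shares toward equality and larger ones sharpen them. A minimal sketch with made-up utilities (not our estimates):

```python
import math

# Hypothetical total utilities of three concepts in one choice task.
utils = [2.1, 1.4, 0.3]

def shares(u, exponent):
    """Logit share of preference with a scale exponent:
    exp(exponent * u_i) / sum_j exp(exponent * u_j)."""
    e = [math.exp(exponent * x) for x in u]
    s = sum(e)
    return [x / s for x in e]

for exponent in (0.3, 0.9):
    print(exponent, [round(p, 3) for p in shares(utils, exponent)])
# The same utilities imply noticeably different simulated choice shares
# (and hence different derived WTPs) depending on the exponent chosen.
```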

We would be very grateful for any further ideas, or for help understanding the mistakes we have made and how to get a robust, non-arbitrary WTP.

Thanks a lot!

Felix