Synthetic Data and Holdouts


I have a question regarding working with computer-generated choices. I am developing a tool which generates individual-level utilities and choices given a set of mean utilities and design, and I want this tool to be able to compute the holdout hit rates (after the choices generated are fed to HB).

My question is whether I should apply a Gumbel-distributed error term to individual level total product utilities when simulating holdout choices on "true" utilities.

I have two thoughts that go in opposite directions:
1. When people answer holdout tasks, the error term is present
2. On the other hand, when we assess the model's accuracy, aren't we trying to see whether HB has recovered the true utilities?

Could you please advise me which way I should take?
I sense it would be the first one, just need a good reason to be totally convinced.

Many thanks!
asked Oct 13, 2015 by IGaaa (270 points)

1 Answer

When simulating robotic respondents’ answers to a CBC questionnaire, in order to recover known “true” utilities, one simulates the respondents answering the questionnaire according to known utilities but where the total utility of alternatives within each task is perturbed by independent Gumbel error.  This is true to the logit rule and simulates the idea that respondents answer the calibration questionnaire with error.
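As a rough illustration of that step, here is a minimal Python sketch using NumPy. The function name `simulate_choices`, the example utilities, and the array layout (one row per task, one column per alternative) are assumptions made for illustration, not part of any Sawtooth Software tool.

```python
import numpy as np

def simulate_choices(utilities, rng):
    """Simulate one robotic respondent's answers.

    utilities: array of shape (n_tasks, n_alts) holding the total "true"
    utility of each alternative in each task (hypothetical layout).
    Returns the index of the chosen alternative in each task.
    """
    # Independent standard Gumbel error for every alternative in every task
    noise = rng.gumbel(loc=0.0, scale=1.0, size=utilities.shape)
    # The respondent picks the alternative with the highest perturbed utility
    return np.argmax(utilities + noise, axis=1)

rng = np.random.default_rng(0)
# Made-up total utilities for 2 tasks with 3 alternatives each
true_utils = np.array([[1.0, 0.2, -0.5],
                       [0.0, 0.8, 0.3]])
choices = simulate_choices(true_utils, rng)
```

Each call produces one noisy set of answers, which is the analogue of a single respondent completing the questionnaire once.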

When one wants to use the utilities estimated from the CBC questionnaire choices to predict new holdout choices, one could indeed simulate discrete choices again using Gumbel error (applied to the total utility of concepts as estimated from the calibration questionnaire).  But to stabilize the choice probabilities for holdout tasks at the individual level, one would need to simulate those choices thousands of times with different draws of Gumbel error (and average the results at the individual level across the draws).  However, we know that as the number of Gumbel draws grows, the average simulated choice probabilities for each individual's holdout tasks converge to the logit rule.  So, to shortcut the simulation procedure for holdout choices, one can just use the logit rule as a closed-form solution.
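The convergence described above can be checked numerically. The sketch below, with hypothetical function names and made-up utilities for a single holdout task, compares the closed-form logit shares with the average of many Gumbel-perturbed first choices.

```python
import numpy as np

def logit_shares(utilities):
    """Closed-form logit rule: exp(U_i) / sum_j exp(U_j)."""
    z = utilities - utilities.max()  # subtract the max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

def simulated_shares(utilities, n_draws, rng):
    """Share of draws in which each alternative wins under Gumbel error."""
    noise = rng.gumbel(size=(n_draws, utilities.size))
    picks = np.argmax(utilities + noise, axis=1)
    return np.bincount(picks, minlength=utilities.size) / n_draws

rng = np.random.default_rng(1)
# Made-up estimated total utilities for one respondent's holdout task
est_utils = np.array([1.0, 0.2, -0.5])
closed_form = logit_shares(est_utils)
monte_carlo = simulated_shares(est_utils, 200_000, rng)
print(closed_form)
print(monte_carlo)
```

With a few hundred thousand draws the two vectors should agree to roughly two decimal places, which is why the closed form is the practical choice.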
answered Oct 13, 2015 by Bryan Orme Platinum Sawtooth Software, Inc. (181,240 points)
Thanks so much Bryan!

So, if I get you right, the calibration questionnaire should be simulated once per robotic respondent, both training and holdout choices using Gumbel error. After that, the estimated utilities could be used to simulate the holdouts with only the logit rule, just as we do in market simulators with the Share of Preference simulation method.

Typically, discrete choice answers to the calibration questionnaire are simulated once per robotic respondent...training set only.  Holdout choices are not used to estimate the utilities...they are "held out".  Rather, holdout choices may be simulated using the estimated utilities under the Share of Preference (logit rule) simulation method.
OK, I just thought one could treat the robotic respondents as people, and make them answer the holdouts too. Then, as usual, run estimation without the holdouts. And after that one could compute the individual hit rate rather than comparing the utilities.
Yes, this last statement is correct.  Robotic respondents are treated like people, and you simulate their holdout responses too, using the same procedure as for simulating their choices in the calibration questionnaire.  So maybe we aren't misunderstanding one another after all.
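For readers who want to try this, here is a minimal sketch of the individual hit rate computation discussed in this thread. Several things are assumptions for illustration: the function name `hit_rate`, the array shapes, the use of a first-choice (highest estimated utility) prediction, and especially `est_utils`, which is just the true utilities plus normal noise standing in for real HB output.

```python
import numpy as np

def hit_rate(est_utils, holdout_choices):
    """Share of holdout tasks predicted correctly, per respondent.

    est_utils: (n_resp, n_tasks, n_alts) estimated total utilities.
    holdout_choices: (n_resp, n_tasks) index of the alternative chosen.
    """
    # Predict the alternative with the highest estimated utility
    predicted = np.argmax(est_utils, axis=2)
    return (predicted == holdout_choices).mean(axis=1)

rng = np.random.default_rng(2)
n_resp, n_tasks, n_alts = 5, 4, 3
# Made-up "true" total utilities for each holdout alternative
true_utils = rng.normal(size=(n_resp, n_tasks, n_alts))
# Robotic respondents answer the holdouts with Gumbel error, like people would
holdout_choices = np.argmax(true_utils + rng.gumbel(size=true_utils.shape), axis=2)
# Stand-in for HB estimates (assumption): true utilities plus small noise
est_utils = true_utils + rng.normal(scale=0.3, size=true_utils.shape)
rates = hit_rate(est_utils, holdout_choices)
```

Because the holdout answers themselves contain Gumbel error, even perfectly recovered utilities would not be expected to reach a 100% hit rate.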