In-sample and out-of-sample holdouts

Dear Sawtooth professionals,

I have a question about "In sample" and "out of sample" validation.

I have collected an initial sample of 80 customer panel respondents, using CVA scenarios without holdout sets.

After reading the technical paper "how many holdouts for model validation?", a question occurred to me:

I plan to collect 450 more customer panel respondents using the CVA scenarios, this time adding three holdout choice tasks.

If I use the initial sample of 80 for "out of sample" validation, and use the second phase of data collection (450 customer panel respondents answering the exact same CVA scenarios as the initial 80, plus 3 holdout sets) for "in sample" validation of my model, am I misunderstanding these concepts?

Or, if I had included the 3 holdout sets from the beginning, in both the sample of 80 and the sample of 450, would this approach make sense?

Another technical paper stated that a different set of questions needs to be asked for "out of sample" holdouts.

Would adding 3 fixed holdout sets to the original CVA questions be sufficient to count as a different set of questions?

I really appreciate the response.  Thank you.
asked Jun 18 by anonymous

1 Answer

0 votes
Out-of-sample holdouts are considered the gold standard for model validation. Typically these are questions NOT asked of the estimation sample. It would be a little odd to use the estimation sample to predict the exact same questions, only from different respondents.

What you could do, I suppose, with your sample of 530, is divide it into two groups of 265 each; call them Group A and Group B. Then for Group A, run your model and simulate the responses to the three holdout questions. Now compare those predictions to the actual holdout responses from Group B. Then reverse it and use Group B's utilities to predict the holdouts for Group A.

Now you can look at holdout validation as in-sample (the whole sample used to predict the whole sample's 3 fixed holdouts) and as out-of-sample (as described in the previous paragraph).
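As a rough illustration of the split-sample idea above, here is a minimal Python sketch. It is not Sawtooth Software's simulator or API. It assumes individual-level part-worth utilities have already been estimated and exported, and that every respondent answered the same fixed holdout tasks. The data layout, the function names, the first-choice simulation rule, and the use of mean absolute error (MAE) between predicted and observed shares are all assumptions chosen for illustration.

```python
import random

def split_half(respondent_ids, seed=0):
    """Randomly split respondents into two (near) equal groups, A and B."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

def predicted_shares(utilities, task, ids):
    """Simulate one holdout task with a first-choice rule (an assumption):
    each respondent 'chooses' the concept with the highest total utility.
    utilities: {respondent: {level_name: part_worth}}
    task: list of concepts, each concept a tuple of level names."""
    counts = [0] * len(task)
    for r in ids:
        totals = [sum(utilities[r][level] for level in concept) for concept in task]
        counts[totals.index(max(totals))] += 1
    return [c / len(ids) for c in counts]

def observed_shares(choices, task_index, n_concepts, ids):
    """Actual holdout choice shares.
    choices: {respondent: [index of chosen concept for each holdout task]}"""
    counts = [0] * n_concepts
    for r in ids:
        counts[choices[r][task_index]] += 1
    return [c / len(ids) for c in counts]

def holdout_mae(utilities, choices, tasks, est_ids, val_ids):
    """Mean absolute error between shares predicted from est_ids' utilities
    and shares observed among val_ids, averaged over all holdout tasks."""
    errors = []
    for t, task in enumerate(tasks):
        pred = predicted_shares(utilities, task, est_ids)
        obs = observed_shares(choices, t, len(task), val_ids)
        errors.append(sum(abs(p - o) for p, o in zip(pred, obs)) / len(task))
    return sum(errors) / len(errors)

def validate(utilities, choices, tasks, seed=0):
    """Return (in_sample_mae, out_of_sample_mae) for the two views above."""
    all_ids = list(utilities)
    group_a, group_b = split_half(all_ids, seed)
    in_sample = holdout_mae(utilities, choices, tasks, all_ids, all_ids)
    a_to_b = holdout_mae(utilities, choices, tasks, group_a, group_b)
    b_to_a = holdout_mae(utilities, choices, tasks, group_b, group_a)
    return in_sample, (a_to_b + b_to_a) / 2
```

Calling `validate(utilities, choices, tasks)` returns one error figure for each view: the whole sample predicting its own holdouts, and the average of the A-to-B and B-to-A predictions. A lower MAE means predicted shares sit closer to observed shares. Swapping in a different simulation rule would only require changing `predicted_shares`.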
answered Jun 18 by Keith Chrzan Platinum Sawtooth Software, Inc. (95,775 points)
Thank you so much, Keith, for your clear explanation.