Cleaning the data set by removing respondents with inconsistent answers to identical holdout questions

Dear Sawtooth team,

I am wondering whether I should remove respondents who provided inconsistent answers to my identical holdout tasks (of approximately 1,300 respondents in total, about 380 provided inconsistent answers).

I have run an HB analysis with
(a) the data set reduced by the fastest respondents (based on time needed for the entire survey, not only the CBC section):
Pct. Cert.: Current: 0.755 , Average: 0.759
Fit statistics: Current: 0.734, Average: 0.738
Avg Variance: Current: 3.384, Average: 3.474
Parameter RMS: Current: 3.110, Average: 3.095
(b) the data set reduced by the fastest respondents and by respondents who inconsistently answered the identical holdout tasks:
Pct. Cert.: Current: 0.761, Average: 0.763
Fit statistics: Current: 0.740, Average: 0.741
Avg Variance: Current: 3.581, Average: 3.447
Parameter RMS: Current: 3.070, Average: 3.094
(c) the data set reduced by the fastest respondents and by respondents who answered more than one choice task in 4 seconds or less:
Pct. Cert.: Current: 0.7666, Average: 0.765
Fit statistics: Current: 0.744, Average: 0.744
Avg Variance: Current: 3.324, Average: 3.324
Parameter RMS: Current: 3.107, Average: 3.107
(d) the data set reduced by the fastest respondents, respondents who answered more than one choice task in 4 seconds or less, and respondents who inconsistently answered the identical holdout tasks:
Pct. Cert.: Current: 0.756, Average: 0.755
Fit statistics: Current: 0.735, Average: 0.734
Avg Variance: Current: 2.912, Average: 2.970
Parameter RMS: Current: 2.903, Average: 2.923
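For reference, Percent Certainty compares the model's log-likelihood to that of a chance model; a minimal sketch of the usual definition (this is an illustration of the standard formula, not Sawtooth's actual code) might look like:

```python
import math

def percent_certainty(log_likelihood, n_choices, k_alternatives):
    """Pct. Cert. = 1 - LL / LL_chance, where the chance model picks
    uniformly among k alternatives (including None) on each choice."""
    ll_chance = n_choices * math.log(1.0 / k_alternatives)
    return 1.0 - log_likelihood / ll_chance

# A perfect model (LL = 0) scores 1.0; a chance-level model scores 0.0.
print(percent_certainty(0.0, 10, 4))                       # 1.0
print(percent_certainty(10 * math.log(0.25), 10, 4))       # 0.0
```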

When comparing runs, do you look at the current or the average values?

Based on the model fit indicators, I would conclude that I should not reduce my sample by the respondents who answered inconsistently. Is that right?

How would you use the holdout tasks in your analysis (I included three, two of them identical)?

Thank you very much in advance!

Best regards
asked Jan 24, 2020 by Mel (250 points)

1 Answer

+1 vote
Hi, Mel.

We typically don't look at the RLH statistic in aggregate for determining which respondents might be inconsistent responders.  We do look at it at the respondent level, and we have a recommended way of doing so described on our LinkedIn Sawtooth Software User's Group, here:  https://www.linkedin.com/pulse/identifying-consistency-cutoffs-identify-bad-respondents-orme/?trackingId=pI%2FkvpPoo65MNbNIuYtwsQ%3D%3D

I agree that you may not want to use your holdouts to establish inconsistency.  A certain amount of inconsistency is assumed by the logit model, and if your two identical holdout questions use product concepts that happen to have similar total utilities, we would expect more inconsistency rather than less.  Again, the LinkedIn article noted above is a better way to measure respondent inconsistency.

I do pay attention to the Percent Certainty as a measure of aggregate fit, and based on the results you report above, I also wouldn't eliminate the respondents who answered your two identical holdouts inconsistently.
answered Jan 24, 2020 by Keith Chrzan Platinum Sawtooth Software, Inc. (102,700 points)
In the meantime I have run the simulation.
Holdout question one (since I asked this exact holdout question twice, I took the average share here):
actual shares:
Concept 1: 4.3%
Concept 2: 16.4%
Concept 3: 2.9%
None: 76.5%

Simulation shares:
Concept 1: 5.8%, Std Err 0.3%, CI (5.3%, 6.4%)
Concept 2: 14.9%, Std Err 0.6%, CI (13.8%, 16.1%)
Concept 3: 3.4%, Std Err 0.3%, CI (2.9%, 3.9%)
None: 75.8%, Std Err 0.9%, CI (74.0%, 77.6%)
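Checking which actual shares fall inside the simulator's confidence intervals can be sketched as follows (the figures are copied from the shares above; this is just an illustrative check, not Sawtooth's tooling):

```python
# Actual holdout share vs. simulated 95% CI, per concept.
shares = {
    "Concept 1": (4.3, (5.3, 6.4)),
    "Concept 2": (16.4, (13.8, 16.1)),
    "Concept 3": (2.9, (2.9, 3.9)),
    "None":      (76.5, (74.0, 77.6)),
}

for name, (actual, (lo, hi)) in shares.items():
    inside = lo <= actual <= hi
    print(f"{name}: actual {actual}% in CI ({lo}%, {hi}%)? {inside}")
```

This reproduces the observation below: Concepts 1 and 2 fall outside their intervals, while Concept 3 and None fall inside.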

I have done the same for holdout question two.

(1) The predictions for Concepts 1 and 2 both lie outside the CI. Would you nevertheless say that the model is good enough? Or would you try to improve the prediction accuracy for the holdouts by adding, e.g., interactions or further covariates to the HB analysis?

(2) In this case, how would you calculate the mean absolute error (MAE)? Is the following approach correct?
MAE = (1.5% + 1.5% + 0.5% + 0.7%) / 4 = 1.05%
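The same calculation in a short sketch, using the actual and simulated shares quoted above:

```python
# MAE between actual holdout shares and simulated shares (in percentage points),
# for Concept 1, Concept 2, Concept 3, and None respectively.
actual    = [4.3, 16.4, 2.9, 76.5]
simulated = [5.8, 14.9, 3.4, 75.8]

mae = sum(abs(a - s) for a, s in zip(actual, simulated)) / len(actual)
print(round(mae, 2))  # 1.05
```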

Thank you very much in advance!
Mel, your MAE calculation is correct, and that's the statistic folks usually look at when they use holdouts to assess the quality of the model.  I would not be surprised or concerned to see actual shares fall outside of the predicted confidence intervals.
Great! Thanks for the super fast reply!
And would you say a MAE of 1.05 is good?
If you had a larger number of holdouts, you would see that some of them have larger MAE and some smaller.  1.05% is good, but it's only one observation and without having seen a MAE based on a larger number of holdouts, it's difficult to get too excited about it.  But to answer your question, yes, all by itself MAE of 1% is pretty good (though a MAE of 1% based on 10 holdout questions would have been more impressive).
Thanks a lot, Keith! :)