
In-Sample and Out-of-Sample Validation Results Using MAE and Hit Rate

Dear Sawtooth Experts,

Data collection is complete for my study, and I wanted to conduct in-sample and out-of-sample validation for my CVA study.

I had 15 CVA questions plus 1 test-retest reliability CVA question (identical to the first CVA question). I also included 3 holdout choice tasks, each with 3 alternatives.

My overall results seem good and highlight some interesting findings.

I ran a very simple test-retest reliability check in SPSS using the pair of identical CVA tasks; the correlation was significant, suggesting that reliability is quite good.
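For reference, the test-retest check can be sketched outside SPSS as well. Below is a minimal Pearson-correlation sketch in Python; the ratings are hypothetical stand-ins for the real CVA responses, not data from this study:

```python
# Pearson correlation between ratings on the first CVA task and the
# identical repeated task (test-retest). Ratings below are hypothetical.
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first_task  = [7, 5, 9, 4, 6, 8]  # ratings on the original CVA task
retest_task = [6, 5, 8, 5, 6, 9]  # ratings on the identical repeat

print(round(pearson(first_task, retest_task), 3))  # → 0.878
```

A high, significant correlation between the two identical tasks is what supports the "reliability is quite good" conclusion.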

However, I am really concerned about my validation results.

My sample size is 424, which is not large.

First, for MAE, I obtained the following results.

(I estimated utilities using HB OLS and used RFC/Genetic for simulation; none of my attributes were price related, and I applied correlated error to all attributes used in the study.)

In sample (n = 424)

Holdout 1 (MAE = 3.60%)
Predicted    Actual    |Error|
39.00%       44.10%    5.10%
29.00%       23.60%    5.40%
32.00%       32.30%    0.30%

Holdout 2 (MAE = 12.73%)
Predicted    Actual    |Error|
35.70%       22.60%    13.10%
31.50%       25.50%    6.00%
32.80%       51.90%    19.10%

Holdout 3 (MAE = 13.80%)
Predicted    Actual    |Error|
30.10%       20.80%    9.30%
33.50%       54.20%    20.70%
36.40%       25.00%    11.40%

Out of sample
(random split using SPSS, resulting in subsamples of 207 and 217)

Predicted (n = 217) vs Actual (n = 207)

Holdout 1 (MAE = 4.97%)
Predicted    Actual    |Error|
39.10%       43.50%    4.40%
28.70%       21.30%    7.40%
32.20%       35.30%    3.10%

Holdout 2 (MAE = 10.93%)
Predicted    Actual    |Error|
34.50%       21.70%    12.80%
32.60%       29.00%    3.60%
32.90%       49.30%    16.40%

Holdout 3 (MAE = 15.07%)
Predicted    Actual    |Error|
29.80%       19.80%    10.00%
33.00%       55.60%    22.60%
37.20%       24.60%    12.60%

Predicted (n = 207) vs Actual (n = 217)

Holdout 1    MAE = 4.07%
Holdout 2    MAE = 14.23%
Holdout 3    MAE = 12.83%
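The random split step can also be sketched outside SPSS. A minimal version, assuming sequential respondent IDs (the seed value is arbitrary, chosen only for reproducibility):

```python
# Random split of 424 respondent IDs into two out-of-sample halves,
# mirroring the SPSS random split described above (207 / 217).
import random

random.seed(42)  # fixed seed so the split is reproducible

respondent_ids = list(range(1, 425))             # 424 respondents
half_a = set(random.sample(respondent_ids, 207)) # first subsample
half_b = [r for r in respondent_ids if r not in half_a]

print(len(half_a), len(half_b))  # → 207 217
```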

As presented, only Holdout 1 seems all right; validation is quite poor for Holdouts 2 and 3, if I understood the technical papers and other forum postings correctly. Am I right? Is there a good reference range for MAE percentages? For CBC studies with 3 alternative products, an MAE of around 4-5 seems acceptable. I wonder what could have caused this, and what I should tell my audience about my study, other than saying that my conjoint model was not very successful. I would really appreciate your help and advice in figuring this out.
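For clarity, MAE here is just the mean absolute difference between the simulated and actual holdout shares. A minimal sketch, using the in-sample Holdout 1 shares from the tables above:

```python
# Mean Absolute Error (MAE) between simulated and actual holdout shares.
# Share values below are the in-sample Holdout 1 numbers from this post.

def mae(predicted, actual):
    """Mean absolute difference between two lists of share percentages."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

predicted = [39.00, 29.00, 32.00]  # simulated shares of preference (%)
actual    = [44.10, 23.60, 32.30]  # observed holdout choice shares (%)

print(round(mae(predicted, actual), 2))  # → 3.6
```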

Secondly, although hit rate is not as good a measure as MAE or MSE, I still wanted to check it, so this is what I did:

In the Sawtooth Software choice simulator, with HB OLS utility estimation, I selected the "First Choice" rule for each scenario using an individual-level utility run (producing Individual Results) to get a 1/0 hit-rate metric. I exported the results to Excel and compared them with the actual choices for the 3 holdout choice tasks.

Thus, for hit rate, I obtained the following results:
Holdout set 1    156/424    36.79%
Holdout set 2    165/424    38.92%
Holdout set 3    191/424    45.05%
For tasks with 3-4 product alternatives, a 55-75% hit rate is the usual range, so I am quite sure that my range of 36.79% to 45.05% is not good enough.
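The hit-rate tally itself can be sketched as follows, once the individual first-choice predictions are exported; the per-respondent choices below are hypothetical, for illustration only:

```python
# Hit rate: share of respondents whose predicted first-choice alternative
# matches the alternative they actually chose in the holdout task.
# The choice data below are hypothetical, not from this study.

def hit_rate(predicted_choices, actual_choices):
    """Fraction of respondents where predicted first choice == actual choice."""
    assert len(predicted_choices) == len(actual_choices)
    hits = sum(p == a for p, a in zip(predicted_choices, actual_choices))
    return hits / len(predicted_choices)

predicted = [1, 3, 2, 1, 3]  # first-choice alternative per respondent
actual    = [1, 2, 2, 3, 3]  # actual holdout choice per respondent

print(hit_rate(predicted, actual))  # → 0.6
```

With real data, `predicted` would come from the simulator's Individual Results export and `actual` from the holdout responses.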

Given the MAE and hit-rate results, I am considering reporting this as a limitation of the conjoint portion of my study: that my conjoint model should be interpreted with caution because it has a relatively low hit rate and high MAE.

At this point, data collection is already complete, and these are the actual results of my study.

Do you have any other advice or suggestions on this issue?

I have looked at many peer-reviewed journal articles that used conjoint analysis, and it seems that most researchers omit reporting the reliability and validity of their conjoint analyses.

However, I wanted to do as much as I can and be transparent with my study.

Any comments are appreciated.  

I really appreciate your valuable time for looking into my very lengthy questions.
asked Aug 9 by anonymous

1 Answer

Your results do seem low for predictive validity of holdouts. In my experience, hit-rate prediction for CBC-looking choice tasks with 3 alternatives should be somewhere around 60% to 85%.

There are many things that could be going wrong and should be reviewed:

1. Engagement level of respondents with the questionnaire: time to complete the survey, other consistency checks from questions outside the conjoint section. Are there too many "bad" respondents in your sample?
2. Internal R-squared of the CVA tasks (though I don't know how many degrees of freedom you have in your conjoint regressions, so I don't know whether there are enough for this to be a reliable indicator of internal consistency).
3. Data processing errors.
4. Model specification errors (e.g., did you accidentally apply utility constraints to some attributes in the opposite order of actual preferences, or specify a linear utility term instead of part-worths when linear isn't justified?).
5. Errors in setting up the holdout scenarios in the market simulator.
6. Errors in calculating hit rate.
7. Maybe the holdout choice tasks were very utility-balanced and thus very hard to predict?

There are many steps that should be reviewed to make sure that data cleaning and analysis are in good shape. If the study is very important and you have the budget to hire an outside consultant, it could be worthwhile to have a capable Sawtooth Software analyst review and redo your work.
answered Aug 12 by Bryan Orme Platinum Sawtooth Software, Inc. (177,015 points)