Internal validity measures for ACBC

Hey everyone,

First of all, thanks for the great software and also for this forum, which really is of great help. Now I have a request myself.

In one sentence: Which measure do I use to assess the internal validity of my HB model from ACBC? :)

I know how to assess face validity and predictive validity (via holdout sets) but am unsure about internal validity.

1) Is internal validity actually provided by the goodness-of-fit measures of ACBC (i.e. RLH, Pct. Cert.)?

1.1) Are RLH / Pct. Cert. actually measures similar to the (adjusted) R-squared of, e.g., a CVA?

2) As one gets these measures for every single iteration, which final figure do I use? The average that is shown at the end of the estimation? Or just the figure from the last iteration?

Thanks so much in advance! I appreciate it very much.
asked Feb 22, 2017 by FeSch91 (120 points)

1 Answer

+1 vote
For ACBC, measuring internal validity using the internal fit from HB is not as clear-cut as for CBC or CVA.

With CBC, respondents are always shown the same number of concepts per task. Let's say it's 4 concepts plus an additional None concept. That's 5 possible choices per question. CBC/HB reports root likelihood (RLH) and percent certainty. With 5 concepts per task, the null root likelihood (the fit you would get from uninformative utilities that predict every concept as equally likely) is 1/5 = 0.2. Percent Certainty is a pseudo R-squared that refers to how well the utilities fit the data relative to the null (naive) solution.
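As a rough illustration of these two statistics, here is a minimal Python sketch. It assumes the usual definitions: RLH as the geometric mean of the predicted probabilities of the chosen concepts, and Percent Certainty as the improvement in log-likelihood over the chance model, relative to a perfect fit. The function names and the input format are made up for this example and are not part of the Sawtooth software.

```python
import math

def rlh(chosen_probs):
    """Root likelihood: geometric mean of the predicted probabilities
    of the concepts that were actually chosen."""
    log_lik = sum(math.log(p) for p in chosen_probs)
    return math.exp(log_lik / len(chosen_probs))

def percent_certainty(chosen_probs, n_alternatives):
    """Pseudo R-squared: 0 means no better than chance, 1 means a perfect fit.
    n_alternatives[i] is the number of concepts shown in task i."""
    log_lik = sum(math.log(p) for p in chosen_probs)
    null_log_lik = sum(math.log(1.0 / k) for k in n_alternatives)
    return 1.0 - log_lik / null_log_lik

# Hypothetical respondent: 4 CBC tasks with 5 concepts each (4 + None).
probs = [0.6, 0.45, 0.7, 0.3]   # assumed model probabilities of the chosen concepts
print(rlh(probs))                        # compare against the null RLH of 0.2
print(percent_certainty(probs, [5] * 4))
```

With utilities that carry no information, every chosen concept gets probability 0.2, so RLH is 0.2 and Percent Certainty is 0.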

With CVA under OLS estimation, you get an R-squared which reports what percent of the dependent variable is explained by the regression weights applied to the independent variables.

But, with ACBC, the logistic regression is fit to a mixture of choice sets with differing numbers of concepts per task:

1.  For the BYO section, if an attribute has 6 levels, then it is coded as a choice of a single level among six alternatives.  If an attribute has 2 levels, it's coded as a choice among two alternatives.  Further complicating matters, if you use constructed lists per attribute (to only bring the relevant levels forward to the BYO questions), then the number of concepts per BYO "task" can be different across people.
2.  For the Screeners, each choice is a binary choice (a possibility or not a possibility).
3.  For the Choice Tournament, each choice is typically a choice among three alternatives (though the software allows you to ask this section as pairs rather than triples).  And, the number of choice tasks is customized to the respondent depending on how many concepts are marked "a possibility" from the Screener section.

And, the response error is different across the three sections.

So, you can see that internal fit statistics are no longer so clear to interpret for ACBC!  The number of concepts per task and number of tasks per section can vary depending on the respondent's previous answers!  This affects the scaling of the RLH.
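To make the scaling issue concrete, here is a small Python sketch of the chance-level (null) RLH for a mixture of task sizes. The two respondents and their task counts are invented for illustration, and the sketch ignores the differing response error across sections.

```python
import math

def null_rlh(n_alternatives):
    """Chance-level RLH: geometric mean of 1/k over all coded choice tasks,
    where k is the number of alternatives in each task."""
    mean_log = sum(math.log(1.0 / k) for k in n_alternatives) / len(n_alternatives)
    return math.exp(mean_log)

# Hypothetical respondents (task counts are assumptions, not real data):
# BYO with a 6-level and a 2-level attribute, 8 binary Screener choices,
# and a different number of Tournament triples for each person.
respondent_a = [6, 2] + [2] * 8 + [3] * 3
respondent_b = [6, 2] + [2] * 8 + [3] * 7

print(null_rlh(respondent_a))
print(null_rlh(respondent_b))
print(null_rlh([5] * 10))  # plain CBC with 5 concepts per task
```

The two ACBC respondents end up with different chance baselines, so the same reported RLH does not mean the same quality of fit for both, whereas the fixed CBC design always has a baseline of 0.2.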
answered Feb 22, 2017 by Bryan Orme Platinum Sawtooth Software, Inc. (175,415 points)
Thanks a lot for the quick and extensive reply, Bryan! Just a few remarks:

- Regarding RLH: Without a lot of effort put into computing the null solution, it is hard to interpret for ACBC.
- Regarding Percent Certainty: This could be interpreted because it is already relative to a null (naive) solution?
- Which Percent Certainty would I use? The average across all iterations?

Or would you even say that because of the high computational effort and the probably low additional information provided by these measures, predictive validity is enough to assess the goodness-of-fit of my model (I am not comparing this to other models)?
Thanks in advance!
RLH, as output by the software, is hard to interpret since the number of concepts per choice set differs across respondents.

Percent Certainty takes into account the number of concepts shown per person and per set.  But, the mixture of different choice contexts (BYO, Screeners, and Tournament Tasks) with different numbers of tasks in each section per person, makes it difficult to use Percent Certainty as a reliable indicator of internal consistency across respondents.

So, my best recommendation is to add CBC-looking holdout tasks to each respondent's questionnaire (prior to collecting the data, obviously).  Compute holdout hit rates within each individual (the likelihood that the ACBC utilities can predict each respondent's holdout choices).
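A minimal Python sketch of an individual-level hit rate calculation, assuming a first-choice rule (the concept with the highest total utility is the predicted choice). The data layout, the function names, and the example utilities are all hypothetical; the software's export format will differ.

```python
def predict_choice(utilities, task):
    """Return the index of the concept with the highest total utility.
    utilities: dict mapping (attribute, level) -> part-worth for one respondent.
    task: list of concepts, each a dict mapping attribute -> level."""
    totals = [
        sum(utilities[(attr, level)] for attr, level in concept.items())
        for concept in task
    ]
    return max(range(len(totals)), key=totals.__getitem__)

def hit_rate(utilities, tasks, choices):
    """Share of holdout tasks in which the predicted concept was the one chosen."""
    hits = sum(predict_choice(utilities, t) == c for t, c in zip(tasks, choices))
    return hits / len(tasks)

# Hypothetical respondent with two attributes and two holdout tasks.
utilities = {
    ("brand", "A"): 1.0, ("brand", "B"): -1.0,
    ("price", "low"): 0.5, ("price", "high"): -0.5,
}
holdout_tasks = [
    [{"brand": "A", "price": "high"}, {"brand": "B", "price": "low"}],
    [{"brand": "B", "price": "low"}, {"brand": "B", "price": "high"}],
]
holdout_choices = [0, 1]  # index of the concept the respondent actually picked

print(hit_rate(utilities, holdout_tasks, holdout_choices))  # 0.5
```

Averaging these individual hit rates across respondents gives the overall hit rate, which can be compared with the chance level of 1 divided by the number of concepts per holdout task.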
Perfect, thanks!
I calculated hit rates with holdout tasks already and they look quite good. Therefore, it should be enough for the purpose of my study.
Glad you've got those!  Good work!
Hi everyone,

I am currently analyzing ACBC data and, unfortunately, I did not add CBC-looking holdout tasks to each respondent's questionnaire.

Is there an alternative for measuring face validity and predictive validity without any holdout tasks?

Thank you very much in advance!
All the best,
Face validity to me means that the utilities make sense from a rational, managerially-informed opinion.  So, expert judgment could be used to say if the results have face validity.

Perhaps the highest form of predictive validity is the ability of the market simulator from conjoint analysis to predict actual market shares. However, for this to work, the assumptions of conjoint analysis (including equal awareness, equal distribution, equal time of the market to maturity, and equal effectiveness of sales force, among other things) need to hold in the real marketplace.