(This functionality is available for both CBC and ACBC analysis.)
Introduction to Interaction Effects
Most conjoint analysis studies over the last 30 years have assumed a simple additive model, where the total utility of a product concept is simply the sum of its separate part-worth utilities. This approach has tended to work quite well in practice, especially for conjoint studies modeled using individual-level estimation. However, there are many cases in which two or more levels (from different attributes) have a different utility in combination than the sum of their separate parts. For example, most people like cheese and most people like chocolate, but very few people like chocolate-covered cheese!
HB modeling to estimate individual-level part-worths for CBC and ACBC has improved the predictive ability of market simulators. The vast majority of the interaction effects that appear significant under aggregate (pooled) logit estimation are largely due to unrecognized heterogeneity. As an illustration, an interaction effect between “premium brand” and “price” may stem from the fact that the same respondents who strongly prefer premium brands are also less price sensitive. Once the main effects for brands and prices are captured within individuals (e.g., via HB estimation), the interaction effect that appeared so significant under pooled estimation usually all but disappears.
This benefit of individual-level modeling has been a good thing, but researchers should still be on the lookout for potential interaction effects that could be useful under HB modeling. That said, we recently undertook a substantial investigation of about two dozen CBC datasets, looking for interaction effects that could significantly lift holdout prediction hit rates. Only one or two of those datasets seemed to benefit much from the inclusion of interaction effects.
CBC provides a counting report that quickly summarizes (via the Chi-Square statistic) which interaction effects are significant. ACBC, however, uses an adaptive design strategy that doesn't make it possible to use counting analysis and Chi-Square statistics to test for interactions. Although the adaptive ACBC designs do a fine job of supporting the precise estimation of interaction effects, most ACBC users don't take the time to investigate their inclusion in utility estimation models. This may be due to a lack of knowledge, but it is most likely due to a lack of time. We hope to solve both of these problems by including this automated Interaction Search Tool within the Lighthouse Studio interface.
The automated Interaction Search tool makes it very easy to test all potential 2-way interaction effects in your CBC or ACBC data. It does this using the 2-log likelihood (2LL) test, which is a more sensitive statistical test than the Chi-Square statistic from counting analysis.
We should note that the 2LL tests computed using pooled logit do not involve the same model that most users end up estimating (HB). But until HB estimation becomes faster, for many data sets it is simply not feasible to use HB estimation to test for interaction effects. The findings of the 2LL tests from aggregate logit can be a useful guide for considering which interaction effects could perform well within HB estimation.
Warning: including too many interaction effects can lead to overfitting, convergence problems, and extremely long run times. Be prudent and highly selective regarding the inclusion of interaction terms within your CBC and ACBC models.
The 2-Log Likelihood Test for Interaction Effects
The null hypothesis is that there is no interaction effect. We test each interaction effect, one at a time, to see if there is enough evidence to reject the null hypothesis with a given degree of confidence.
The test involves these steps (the interaction search tool automates these steps):
1. Using aggregate (pooled) logit, estimate the model using main effects only. Record the log-likelihood for this model.
2. Using aggregate logit, estimate the model using main effects plus a 2-way interaction effect. Record the log-likelihood for this second, larger model.
3. Compute the difference in log-likelihood between the main effects only model and the main effects model that includes a selected interaction effect. That value times 2 is distributed Chi-Square, with degrees of freedom equal to the difference in the number of parameters in the two models. Use the Chi-Square distribution to compute the p-value, which is the likelihood that we would observe a difference this large in fit just by chance. If the p-value is <0.05, then we are at least 95% confident that the interaction effect is significant.
Repeat steps 2 and 3 for all potential 2-way interaction effects. For an 8-attribute study, there are (8)(7)/2 = 28 unique interaction effects to test. Then, sort the results in terms of lowest p-value to highest p-value.
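The steps above can be sketched in code. The following is a hypothetical illustration (not the tool's actual implementation); the log-likelihood values are made up, and the closed-form chi-square p-value used here holds only for even degrees of freedom, which is the case for an interaction between two part-worth attributes with odd numbers of levels (df = (a-1)(b-1)):

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function (p-value) for EVEN degrees of freedom,
    using the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    assert df > 0 and df % 2 == 0, "closed form holds for even df only"
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def two_ll_test(ll_main, ll_with_interaction, extra_params):
    """2LL test: twice the log-likelihood gain is distributed chi-square,
    with df equal to the number of added interaction parameters."""
    stat = 2.0 * (ll_with_interaction - ll_main)
    return stat, chi2_sf_even_df(stat, extra_params)

# An interaction of two 3-level attributes adds (3-1)(3-1) = 4 parameters.
# The log-likelihoods below are hypothetical illustrative values.
stat, p = two_ll_test(-1000.0, -995.2, 4)   # stat = 9.6
```

Here a 2LL difference of 9.6 on 4 degrees of freedom gives a p-value just under 0.05, so this interaction would barely clear the 95% confidence threshold. For odd degrees of freedom one would use a general chi-square routine (e.g., `scipy.stats.chi2.sf`) instead.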
Please note that since we are running a series of independent 2LL tests, about 5% of them will pass the threshold of 95% confidence just by chance under the null hypothesis. To reduce the likelihood of false positives, you should probably use a much stricter threshold than 95% confidence (p<.05).
Modified 2-Log Likelihood Test Using Individual-Level Utilities File
Many readers will note that the aggregate logit 2LL tests don't reflect the eventual analysis (HB) they will be running. The significant interaction effects discovered using the aggregate logit 2LL tests may be due to unrecognized heterogeneity. Indeed, it would be best to run separate HB models to account for heterogeneity while testing each interaction effect, but doing so would probably take too long for most practitioners and studies in practice.
A second available test within this automated Interaction Search Tool accounts for heterogeneity within the main effects, but estimates the strength of interaction effects using pooled aggregate logit. If you already have a file containing individual-level main effects (such as from HB estimation), you can specify that the test should leverage this file. (You'll typically find this file in a folder within your project folder called myexercise_hb, where myexercise is the name of your CBC or ACBC exercise.) If you supply this additional file, then the series of 2LL tests runs even faster than the basic method described above (though the small time savings is not the real reason to use this option). We'll now describe the differences between the two approaches.
In the basic 2LL test described in the previous section, we estimate main effects by first coding the attribute levels in a design matrix using effects-coding. For example, with 6 attributes each with 3 levels, there are (6)(3-1)=12 parameters to be effects-coded in the design matrix and estimated via aggregate logit. Aggregate logit finds the single set of population utility values that best fit the individual choice tasks across the population. Because respondents often differ significantly in their preferences, the fit that a single set of average utilities can provide to each respondent's choices is typically somewhat poor. After estimating main effects in this way, a second model is estimated that uses not only the 12 columns in the design matrix representing the main effects, but additional effects-coded columns for the interaction effects.
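As a small illustration of this effects coding (a hypothetical sketch, not Lighthouse Studio's internal code), each part-worth attribute with L levels contributes L-1 columns, with the last level serving as the reference coded -1 throughout:

```python
def effects_code(level, num_levels):
    """Effects-code a 1-based level into num_levels - 1 columns: levels
    1..num_levels-1 get a +1 in their own column; the last (reference)
    level is coded -1 in every column."""
    if level == num_levels:
        return [-1] * (num_levels - 1)
    cols = [0] * (num_levels - 1)
    cols[level - 1] = 1
    return cols

def code_concept(levels, sizes):
    """Concatenate the effects codes of each attribute for one concept."""
    row = []
    for level, size in zip(levels, sizes):
        row.extend(effects_code(level, size))
    return row

# 6 attributes with 3 levels each -> (6)(3-1) = 12 columns per concept
row = code_concept([1, 3, 2, 2, 1, 3], [3] * 6)   # len(row) == 12
```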
In the modified 2LL test leveraging individual-level utilities, rather than coding the main effects as an effects-coded design matrix, we code the design matrix with a single column containing the total predicted main-effect utility for each product concept as computed from the individual-level utility file. For instance, consider the example we previously described with 6 attributes each with 3 levels. The standard 2LL test encodes each product concept in the design matrix as a series of 12 effects-coded columns (containing +1s, 0s, and -1s). The modified test, however, encodes each product concept in the design matrix with a single value: the predicted utility of that concept for the given respondent, based on the supplemental individual-level utility file (typically coming out of HB estimation). Thus, if we looked at the design matrix we might find an X value associated with a product concept such as 3.641, where 3.641 is the total utility across the six attributes for the product shown to this given respondent within this given task. If that same product concept were shown to a different respondent, the X value in the design matrix for that respondent would almost certainly be something different, such as 2.713.
In the modified 2LL test, the additional columns in the design matrix accounting for the interaction effects are effects-coded as with the standard test.
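A minimal sketch of how one row of this modified design matrix might be built (all names, part-worths, and levels below are hypothetical, and the interaction columns are formed as products of the two attributes' effects codes):

```python
def effects_code(level, num_levels):
    """Effects-code a 1-based level (last level = reference, all -1s)."""
    if level == num_levels:
        return [-1] * (num_levels - 1)
    cols = [0] * (num_levels - 1)
    cols[level - 1] = 1
    return cols

def modified_row(part_worths, levels, sizes, int_pair):
    """One row of the modified design matrix: the first column holds this
    respondent's total main-effect utility for the concept (summed from
    the individual-level utilities file); the remaining columns
    effects-code one 2-way interaction as products of the two attributes'
    effects codes."""
    total = sum(part_worths[a][lvl - 1] for a, lvl in enumerate(levels))
    a1, a2 = int_pair
    c1 = effects_code(levels[a1], sizes[a1])
    c2 = effects_code(levels[a2], sizes[a2])
    return [total] + [x * y for x in c1 for y in c2]

# Hypothetical respondent with two 3-level attributes; the concept shows
# level 1 on the first attribute and level 3 on the second.
utils = [[0.5, -0.2, -0.3], [1.0, 0.0, -1.0]]
row = modified_row(utils, [1, 3], [3, 3], (0, 1))
# row -> [-0.5, -1, -1, 0, 0]
```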
Aggregate logit is used as before to estimate the log-likelihood fit of the model to the choices. You will note that the log-likelihood fit is much better for these models (closer to zero), since they leverage the heterogeneity reflected in the supplemental HB utilities file. In essence, we explain much more of the variance using the individual-level main-effect utilities, and then investigate (via pooled analysis) how much additional fit is gained by adding certain interaction effects.
Theoretically, this modified test should do a better job (than the standard LL test) of prioritizing which interaction effects would be the most useful within HB estimation.
Settings
The Interaction Search Tool provides some additional settings for you to choose.
Confidence Interval This controls the significance threshold: interaction effects that meet it are bolded in the report and have their data written out for charting on the Charts tab.
Individual-level utility file This lets you specify a file containing main-effect utility estimates for the respondents for running the modified 2LL test that accounts for heterogeneity in the main effects.
Attribute Coding Allows you to set the coding (part-worth, linear, log-linear, or piecewise) used for the different attributes. (Note: we don't permit interaction effects between attributes and piecewise coded price.)
Task Filter Allows you to select which tasks to include in the analysis.
Output
The software summarizes (on the Search Summary tab) the interaction effects, sorted from smallest p-value (least likely to occur by chance) to largest. The 2LL test is a very sensitive test, and in our experience you will often find multiple interaction effects with p-values less than 0.001. We recommend you also look at the difference in Percent Certainty (directly related to RLH, a pseudo R-squared measure) between models. Due to major differences between aggregate logit and HB models, there is no well-defined statistic for how much improvement in fit for an aggregate logit model constitutes a clear opportunity to improve your HB models. In our experience so far, an interaction effect that increases the Percent Certainty of the aggregate model by 1% or more seems a good candidate to potentially improve an HB model. If you have holdout choice tasks in your study, you should also examine whether the addition of interaction terms improves holdout predictability.
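Percent Certainty is conventionally computed as 1 minus the ratio of the model's log-likelihood to that of a chance model. A rough sketch of comparing two runs follows; the respondent counts and log-likelihoods are hypothetical, and the chance model here assumes each concept in a task is equally likely to be chosen:

```python
import math

def percent_certainty(ll_model, num_tasks, alts_per_task):
    """Percent Certainty = 1 - LL(model) / LL(chance), where the chance
    model picks each of the alts_per_task concepts with equal probability."""
    ll_chance = num_tasks * math.log(1.0 / alts_per_task)
    return 1.0 - ll_model / ll_chance

# Hypothetical study: 300 respondents x 10 tasks, 3 concepts per task.
pc_main = percent_certainty(-2200.0, 3000, 3)  # main effects only
pc_int = percent_certainty(-2150.0, 3000, 3)   # main effects + interaction
gain = pc_int - pc_main   # ~1.5 points: a candidate worth testing in HB
```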
Interactions that lead to ill-conditioned models (deficient designs) are listed with dashes (---) in the output (and are sorted to the bottom of the report). Such ill-conditioned models result when the attributes involved in the interaction also include prohibitions between them.
The net effect of the main effects plus interactions is summarized for you within tables (on the Charts tab) that can be cut-and-pasted into Excel for further investigation. Plotting these results using line graphs can help you visualize the interaction effect in question. Only the interactions that exceed the chosen significance threshold are written to the Charts tab.
You can review the detailed results of any of the logit runs by selecting the run from the Runs tab.
Notes:
ACBC doesn't support interactions involving piecewise price functions, so such interactions are skipped by the Interaction Search Tool.
Interactions between attributes and the None thresholds within ACBC are not estimable. For expediency in programming, the interaction search tool attempts to estimate these interactions but stops when it finds they are inestimable. These are sorted to the bottom of the list as least valuable, and you should ignore them.