Estimating Utilities with Logit


Note: Aggregate Logit has been used for more than three decades in the analysis of CBC data.  It is useful as a top-line diagnostic tool (both to assess the quality of the experimental design and to estimate the average preferences for the sample), but we recommend using Latent Class or HB for developing final, more accurate results, especially for conducting market simulations.

 

When you run Logit (Analysis | Analysis Manager..., then selecting Logit as the Analysis Type and clicking Run), the results are displayed in the report window and stored in the internal database file.  You can weight the data and select subsets of respondents or tasks to process.

 

If you want access to the utilities in a text file, there are two ways to do that.  Prior to estimating the utilities, you can specify that utility runs should be saved to an output folder by clicking the Settings (gear) icon and making that change in the File Options dialog; the utilities are then also saved into a subfolder within your project directory.  Alternatively, after estimating a utility run, you can click Export Utilities to write the utilities out to a file.

 


What are Utilities?

 

A utility is a measure of relative desirability or worth.  When computing utilities using logit, latent class, or HB, every attribute level in a conjoint project is assigned a utility (also referred to as a part worth). The higher the utility, the more desirable the attribute level.  Levels with high utilities have a large positive influence on respondents' product choices.

 

When using logit, Latent Class, or HB, the raw utilities are zero-centered within each attribute.  For example:

 

  Level    Utility

   $300      -0.6

   $200       0.1

   $100       0.5

 

This example shows respondents preferring lower price levels to higher ones.  For information on interpreting conjoint utilities, please see the following section entitled Interpreting Conjoint Utilities.

 


Choosing Effects for Logit

 

By default, CBC estimates utilities for all main-effects.  Main-effects reflect the impact of each attribute on product choice measured independently of the other attributes.  Main-effect models are the simplest models, resulting in a single utility value associated with each attribute level in the study.

 

You can add additional terms to the model to account for two-way attribute interactions.  For example, if the combination of "Red" with "Mazda Miata" results in greater preference than the main-effect utilities would suggest, an interaction term can capture and reflect that synergy.  Interaction terms can also reflect differences in price sensitivity for different brands.

 

We suggest only adding interaction terms that result in a significant improvement in the overall fit of the model.  Including too many terms in a model can lead to overfitting, which has the potential of modeling a good deal of noise along with true effects.

 

When using aggregate logit, it is important to investigate whether including interaction terms can significantly improve the fit of the model.  With methods that recognize respondent differences (Latent Class and HB), many complex effects (including, but not limited to, interactions) can be reflected in market simulations using only main-effects.  It is our experience that including interaction terms in logit models can often significantly improve the predictability of the model, but that those same terms added to Latent Class or HB are not as valuable and can even, in some cases, be detrimental.  Thus, with little effort, one can achieve excellent results with Latent Class or HB using CBC's default main-effects. Even so, one should be on the lookout for opportunities in Latent Class and HB to improve the models by including interaction effects.

 


Running Logit Analysis

 

Logit analysis is an iterative procedure to find the maximum likelihood solution for fitting a multinomial logit model to the data. When only main-effects are estimated, a part worth is produced for each attribute level, which can be interpreted as an average utility value for the respondents analyzed. After logit converges on a solution, the output is displayed in the results window.  

 

The computation starts with estimates of zero for all effects (utilities), and determines a gradient vector indicating how those estimates should be modified for greatest improvement.  A step is made in the indicated direction, with a step size of 1.0.  The user can modify the step size; a smaller step size will probably produce a slower computation, but perhaps slightly more precise estimates.  Further steps are taken until the solution stops improving by more than a predetermined threshold amount.

 

For each iteration the log-likelihood is reported, together with a value of "RLH."  RLH is short for "root likelihood" and is an intuitive measure of how well the solution fits the data.  The best possible value is 1.0, and the worst possible is the reciprocal of the number of choices available in the average task.  For these data, where each task presented four concepts plus a "None" option, the minimum possible value of RLH is .2.

 

Iterations continue until the maximum number of iterations is reached (default 100), the log-likelihood increases by too little (less than 1 in the fifth decimal place), or the gradient is too small (every element less than 1 in the fifth decimal place).

 

The user also has the option of saving variances and covariances of the estimates.  The square roots of the variances are equal to the standard errors and are always displayed.  The default is not to display variances and covariances.

 

Here is the output for a computation, using the same data as were used in the second example described in the previous chapter on Counting analysis (about 100 respondents, each of whom answered 8 randomized choice tasks, each including 4 concepts and a "None" option).

 

    CBC System  Multinomial Logit Estimation

    Copyright 1993-2012 Sawtooth Software

 

    Main Effects

      Specifications for this run:

        Max iterations        20

        Variances and covariances not saved

        Step size                   1.00000

        Max change in loglike        8e-007

 

    Iter    1  log-likelihood =  -1471.20588  rlh =  0.24828

    Iter    2  log-likelihood =  -1462.75221  rlh =  0.25028

    Iter    3  log-likelihood =  -1462.73822  rlh =  0.25028

    Iter    4  log-likelihood =  -1462.73822  rlh =  0.25028

    Iter    5  log-likelihood =  -1462.73822  rlh =  0.25028

    Iter    6  log-likelihood =  -1462.73822  rlh =  0.25028

 

    Converged.

 

    Log-likelihood for this model =  -1462.73822

    Log-likelihood for null model =  -1699.56644

 

    Difference =  236.82822     Chi Square =  473.656

 

           Effect        Std Err      t Ratio      Attribute Level

    1      0.62150       0.05126      12.12462     1 1 Brand A

    2     -0.05740       0.06021      -0.95331     1 2 Brand B

    3     -0.26472       0.06411      -4.12943     1 3 Brand C

    4     -0.29938       0.06509      -4.59957     1 4 Brand D

    5      0.13859       0.04899       2.82895     2 1 Shape 1

    6      0.07652       0.04962       1.54217     2 2 Shape 2

    7     -0.21510       0.05199      -4.13734     2 3 Shape 3

    8      0.15207       0.04895       3.10636     3 1 Large Size

    9      0.04925       0.04939       0.99716     3 2 Medium Size

   10     -0.20132       0.05201      -3.87086     3 3 Small Size

   11     -0.52970       0.07101      -7.45947     4 1 Price 1

   12     -0.22737       0.06409      -3.54773     4 2 Price 2

   13      0.17347       0.05708       3.03928     4 3 Price 3

   14      0.58361       0.05185      11.25616     4 4 Price 4

   15     -1.07590       0.12434      -8.65299     NONE

 

The output starts with a listing of the settings that govern the computation.  The user will rarely need to modify these.  The default maximum number of iterations is 100, and this computation required only 6.

 

Logit analyses are often evaluated by Chi Square statistics.  The procedure is to determine the log likelihood that would be obtained, given the sample size and the nature of the data, if the estimated effects were all zero.  That log likelihood is compared to the log likelihood for the estimates obtained.  Twice the difference between those two log likelihood values is distributed as Chi Square, with degrees of freedom equal to the number of parameters estimated.

 

The number of parameters estimated here is 11, obtained by adding the total number of levels (14), subtracting the number of attributes (4), and adding one for the "None" parameter.  With 11 degrees of freedom, a Chi Square of about 25.0 would be significant at the .01 level.  The obtained value of 473.6 is safely larger than this, so we would conclude that respondent choices are significantly affected by the attribute composition of the concepts.
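
If you wish to verify this test yourself, the following short sketch (in Python, assuming the scipy library, which is not part of CBC) reproduces the calculation using the log-likelihood values from the output above:

    from scipy.stats import chi2

    ll_model = -1462.73822               # log-likelihood for this model
    ll_null  = -1699.56644               # log-likelihood for the null model
    chi_square = 2 * (ll_model - ll_null)     # about 473.66
    df = 14 - 4 + 1                      # levels - attributes + the "None" parameter = 11
    critical_01 = chi2.ppf(0.99, df)     # about 24.7, the .01 critical value
    p_value = chi2.sf(chi_square, df)    # essentially zero, i.e. highly significant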

 

Within each attribute, the effects sum to zero.  That is because we actually omit one level for each attribute in doing the estimation, and then supply a value afterward for the missing level that is equal to the negative of the sum of the others.  Our use of "effects coding" makes this possible.  In the logit analysis output, "effect" is synonymous with utility.

 

To the right of each estimate is a standard error, and to the right of that is a t ratio.  The t ratio is a measure of the significance of the difference between that level's effect and the average of zero for all levels within the attribute.

 

(Additional statistics such as CAIC, Percent Certainty, and Relative Chi Square are for comparison to runs produced by the Latent Class Module add-on.  See the Latent Class manual for further details.)

 

When there are no interaction effects, as in this example, the relative attractiveness of a concept can be assessed by adding up the effects for its component attribute levels.  For example, consider these two hypothetical concepts:

 

        Concept 1              Concept 2

        Effect                 Effect

 

        Brand A      0.62150   Brand B     -0.05740

        Shape 3     -0.21510   Shape 1      0.13859

        Small       -0.20132   Large        0.15207

        Price 3      0.17347   Price 1     -0.52970

        Total        0.37855               -0.29644

 

These two hypothetical concepts are each "scored" by adding up the effects of their component attribute levels.  Concept 1 should be strongly preferred to Concept 2 by these respondents.  In fact, we can go one step further and estimate how strong that preference would be.  If we exponentiate (take the antilog of) each of the total values we can then express them as percentages to predict the proportion of respondents who would choose each concept if they had to choose one or the other:

 

                     Total      exp(total)   Percent

         Concept 1   0.37855      1.460       66.3%

         Concept 2  -0.29644      0.743       33.7%

                          Total   2.203

 

If forced to choose between Concept 1 and Concept 2, about two thirds of these respondents should choose Concept 1 and one third should choose Concept 2.

 

To show how the "None" parameter is used, we add a third item with total equal to -1.07590, the estimate for choosing "None."

 

                     Total      exp(total)   Percent

         Concept 1   0.37855      1.460       57.4%

         Concept 2  -0.29644      0.743       29.2%

         "None"     -1.07590      0.341       13.4%

                          Total   2.544

 

In a three-way contest between Concept 1, Concept 2, and the option of choosing neither, about 57% should choose Concept 1, 29% should choose Concept 2, and 13% should choose "None."
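
These share calculations are easy to reproduce.  The following is a minimal sketch in Python (not part of CBC) using the totals computed above:

    import math

    totals = {"Concept 1": 0.37855, "Concept 2": -0.29644, "None": -1.07590}
    expd = {name: math.exp(t) for name, t in totals.items()}   # exponentiate each total
    denom = sum(expd.values())                                 # about 2.544
    shares = {name: e / denom for name, e in expd.items()}     # 57.4%, 29.2%, 13.4%
    # Removing the "None" entry before normalizing reproduces the earlier
    # two-way split of about 66.3% versus 33.7%.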

 

To see how logit output and analysis by counting choices output are similar, we use the same procedure to estimate the distribution of choices for four concepts differing only in brand, plus a "None" option.  The resulting numbers in the logit output should be similar to the main-effects for Brand as reported in the output for analysis by counting choices.

 

              Effect    exp(effect)   Proportion   From COUNT   Difference

 Brand A      0.62150     1.862         0.400        0.387        0.013

 Brand B     -0.05740     0.944         0.203        0.207       -0.004

 Brand C     -0.26472     0.767         0.165        0.173       -0.008

 Brand D     -0.29938     0.741         0.159        0.165       -0.006

 None        -1.07590     0.341         0.073        0.068        0.005

 Total                    4.655         1.000        1.000        0.000

 

Although the estimates derived from logit analysis are not exactly the same as those observed by counting choices, they are very similar.  The differences are due to slight imbalance in the randomized design (and due to other factors mentioned at the end of the previous chapter).  The estimates produced by the logit calculation are slightly more accurate.  With a larger sample size we might expect differences between the two kinds of analysis to be even smaller.

 

After logit has finished running, results are displayed in the results window, including summary information about the computation (the total number of tasks and the time required).  You are given the option to save the run to the utilities file.  You can also print the logit report or cut-and-paste it into a file.

 

We emphasize that logit analysis is an advanced statistical technique that requires understanding to be used properly.  We have made it easy to conduct logit analyses with CBC.  The burden is on you to use the results wisely.  We urge you to consult appropriate reference materials.  We also urge you to consider the limitations of logit analysis relative to the Latent Class and HB (hierarchical Bayes) estimation approaches.

 


Notes on Determining Significant Interaction Effects

 

By default, CBC's logit program calculates all main-effect utilities.  Sometimes main-effects models can be improved upon by adding interaction terms.  We'll define an "improved" model as one that:

 

 1.  Significantly improves the model fit in terms of log-likelihood, and

 

 2.  Improves the accuracy of the market simulator in terms of aggregate shares vs. fixed holdout choice tasks or market shares.

 

We urge you to only include interaction terms in the model that significantly improve results.  To determine whether to include interaction terms, we suggest you first study the two-way tables provided by Counts analysis.  The Chi-square statistic is one indication of possible significant interaction, but we suggest you look further.  We also suggest plotting the Counts proportions with a scatter plot display to verify that deviations from proportionality in the table seem plausible given what you know about buyer preferences.
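
As one example of such a display, the sketch below (in Python with the matplotlib library, which is not part of CBC, and using made-up proportions purely for illustration) plots each brand's counts proportion across the price levels; systematic departures from proportionality across the lines suggest an interaction worth testing:

    import matplotlib.pyplot as plt

    price_levels = ["Price 1", "Price 2", "Price 3", "Price 4"]
    # Hypothetical two-way counts proportions (brand x price), for illustration only.
    counts = {
        "Brand A": [0.26, 0.33, 0.44, 0.55],
        "Brand B": [0.12, 0.17, 0.24, 0.31],
    }
    for brand, proportions in counts.items():
        plt.plot(price_levels, proportions, marker="o", label=brand)
    plt.ylabel("Counts proportion")
    plt.legend()
    plt.show()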

 

Most importantly, we suggest you conduct a "2 log-likelihood" test.  We alluded to this test earlier in the section with respect to determining whether the computed logit effects significantly improve our ability to predict respondent choices relative to the null model (effects of zero).  A 2 log-likelihood test can also be used to determine whether the addition of an interaction term to the logit model significantly improves the fit.  We will illustrate with a simple example.

 

Assume a simple CBC design with two attributes, each with four levels.  Under main-effects, a total of eight part-worths result.  Recall, however, that each four-level attribute is coded as just three parameters in the logit model under effects-coding.  Assume we run this main-effects logit model and achieve a log-likelihood of -1500.00.

 

The interaction between two four-level attributes, each effects-coded as three parameters, results in an additional 3 x 3 = 9 parameters added to the model.  Suppose we include this interaction term and the log-likelihood improves to -1490.00.  Adding the interaction term has improved the log-likelihood by 10.00.  Two times the difference in log-likelihood is distributed as Chi-square.  We refer to a Chi-square table (available in any statistics textbook) and look up the p-value for a Chi-square of 20.00 with 9 degrees of freedom (the number of additional parameters added to the model).  We find that the p-value is roughly 0.02, suggesting a significant improvement in the model by adding the interaction terms, with a confidence level of 98%.
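
Expressed as a short calculation (again a sketch, assuming scipy for the Chi-square p-value), the test in this example looks like this:

    from scipy.stats import chi2

    ll_main_effects     = -1500.00
    ll_with_interaction = -1490.00
    lr = 2 * (ll_with_interaction - ll_main_effects)   # 20.00
    df = 3 * 3                                         # extra parameters for the interaction
    p_value = chi2.sf(lr, df)                          # roughly 0.02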

 

CBC software provides an automated Interaction Search Tool that automatically investigates all potential 2-way interaction effects under aggregate logit.

 


Technical Notes on Logit

 

CBC uses multinomial logit analysis (MNL) to estimate effects for attribute levels.  Additional information is provided by Hosmer and Lemeshow in Applied Logistic Regression (Wiley, 1989).

 

MNL is similar in many ways to multiple regression and discriminant analysis.  Like those methods, it seeks "weights" for attribute levels (or for combinations of them, if interactions are included in addition to main effects).  Those weights are analogous to "utilities" in conjoint analysis, and are computed so that when the weights corresponding to the attribute levels in each concept are added up, the sums for each concept are related to respondents' choices among concepts.  

 

MNL assumes that the relative probabilities of respondents' choosing each concept within any task can be estimated from the weights in the following way (a short sketch in code follows these steps):

 

1.  Sum the weights for the attributes appearing in each concept to get a value analogous to that concept's "total utility."

 

2.  Convert total utilities to positive values by exponentiating them.  The resulting values may be considered analogous to relative probabilities, except that they do not lie within the unit interval.

 

3.  Normalize the resulting values so that within each task they sum to unity, by dividing the values for concepts within each task by their sum.
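
As a sketch of steps 1 through 3 in code (hypothetical Python names; this is an illustration, not CBC's own implementation):

    import math

    def choice_probabilities(weights, concepts):
        """weights: dict mapping attribute level -> estimated effect (weight).
        concepts: list of concepts in one task, each a list of attribute levels."""
        totals = [sum(weights[level] for level in concept) for concept in concepts]  # step 1
        expd = [math.exp(total) for total in totals]                                 # step 2
        return [e / sum(expd) for e in expd]                                         # step 3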

 

Suppose we have such probabilities for the concepts in one choice task.  Then we can say that, according to our model, the likelihood of a respondent's choosing each concept is equal to the probability for that concept.  If we assume choices for different tasks to be independent, then we can compute the likelihood (under our model) of seeing any particular pattern of choices for any number of respondents and tasks.  That likelihood is just the product of a number of computed probabilities.  If there are N respondents, each having responded to k choice tasks, then the likelihood of that pattern of responses is just the product of Nk such probabilities.  

 

MNL chooses weights that maximize the likelihood of the observed pattern of respondent choices, using probabilities derived from the weights as just described.  

 

Since likelihoods are obtained by multiplying many probabilities together and each is less than unity, likelihoods are usually very small positive numbers.  It is more convenient to think about their logarithms, and users of logit estimation usually speak of "log-likelihoods" rather than likelihoods themselves.  This is just a matter of convenience, because the set of weights that maximize the likelihood must also maximize the log likelihood.

 

In CBC we also report another somewhat more intuitive measure, "root likelihood," which we abbreviate "RLH."  This is just the geometric mean of the probabilities corresponding to the choices made by respondents, obtained by taking the Nk'th root of the product of the Nk probabilities.  The best possible value of RLH is unity, achieved only if the computed solution correctly accounts for all the choices made in all tasks by all respondents.
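
In code form, the two measures can be sketched as follows (assuming p_chosen holds the model probabilities of the alternatives respondents actually chose, one entry per task per respondent, N x k entries in all):

    import math

    def log_likelihood_and_rlh(p_chosen):
        ll = sum(math.log(p) for p in p_chosen)    # log of the product of the N x k probabilities
        rlh = math.exp(ll / len(p_chosen))         # geometric mean = (product)**(1/(N x k))
        return ll, rlh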

 

The computation method used by CBC is iterative.  It starts with a "null model" consisting of weights all equal to zero, and computes the likelihood of the respondents' choices, given probabilities corresponding to that initial solution.  During that computation information is also accumulated about a "gradient vector," indicating how the initial solution should be changed for greatest improvement.  Those changes are made, and a second iteration evaluates an improved solution, while also accumulating information about how a third iteration might improve on the second.

 

The iterative process continues until one or more of the following occurs:

 

The number of iterations exceeds a limit.  The default is a limit of 100.

 

The change in log-likelihood from one iteration to the next is less than a limit (convergence limit 1).  The default is 1 in the fifth decimal place.  

 

We do not anticipate that users will need to change these defaults, but it can be done by clicking the Estimation Settings button.  The dialog includes four parameters:

 

1.  Step size (default = 1.0).  This number governs the sizes of changes made in the iterative computation.  If set to a smaller number, such as .5 or .1, the computation will be slower but may be somewhat more precise.

 

2.  Maximum number of iterations (default 100).

 

3.  Log-likelihood Convergence Limit (default is 1 in the fifth decimal place).

 

4.  Report details.  If set to Full, more information about variances and covariances of estimates is provided in the output.
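
To make the roles of these settings concrete, here is a rough sketch of the iterative estimation in Python with numpy.  It uses a plain gradient step rather than CBC's exact update rule, and all names are hypothetical; it is meant only to illustrate how the step size, the iteration limit, and the convergence limit interact:

    import numpy as np

    def fit_mnl(X, chosen, step_size=1.0, max_iters=100, conv_limit=1e-5):
        """X: effects-coded design array, shape (n_tasks, n_concepts, n_params).
        chosen: index of the concept chosen in each task, shape (n_tasks,)."""
        n_tasks = X.shape[0]
        beta = np.zeros(X.shape[2])                        # null model: all weights zero
        prev_ll = -np.inf
        for iteration in range(1, max_iters + 1):
            utilities = X @ beta                           # (n_tasks, n_concepts)
            utilities -= utilities.max(axis=1, keepdims=True)   # for numerical stability
            expu = np.exp(utilities)
            probs = expu / expu.sum(axis=1, keepdims=True)
            ll = np.log(probs[np.arange(n_tasks), chosen]).sum()
            rlh = np.exp(ll / n_tasks)
            if ll - prev_ll < conv_limit:                  # log-likelihood convergence limit
                break
            prev_ll = ll
            # Gradient: the chosen concepts' coding minus the probability-weighted coding.
            grad = (X[np.arange(n_tasks), chosen]
                    - (probs[..., None] * X).sum(axis=1)).sum(axis=0)
            beta += step_size * grad / n_tasks             # one step in the gradient direction
        return beta, ll, rlh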

 


Effects Coding

 

Logit analysis, like least squares regression, suffers from linear dependencies among attribute levels when all the levels of an attribute are entered into the computation.  For this reason one level of each attribute is deleted automatically, although results for that level may be obtained by inference from the others, and those results are provided in the output.

 

Logit deletes the last level (rather than the first, as done by Test Design).  Remaining levels are represented using "effects coding."  This is a way of coding for levels so that the average effect within each attribute will be zero.  After the logit computation, we provide the estimate for each deleted level just by computing the negative of the sum of the included levels.  Logit analysis permits estimates not just of main effects, but also of two-way interactions.
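
A minimal sketch of this coding for a single attribute (a hypothetical helper; levels are numbered 1 through k, and the last level is the one omitted):

    def effects_code(level, n_levels):
        """Return the n_levels - 1 coded columns for one attribute level."""
        if level == n_levels:                  # the omitted (last) level
            return [-1.0] * (n_levels - 1)
        row = [0.0] * (n_levels - 1)
        row[level - 1] = 1.0
        return row

    # For a 4-level attribute:
    #   level 1 -> [ 1,  0,  0]
    #   level 2 -> [ 0,  1,  0]
    #   level 3 -> [ 0,  0,  1]
    #   level 4 -> [-1, -1, -1]
    # The estimate for the omitted level is minus the sum of the other three
    # estimates, so the utilities within the attribute sum to zero.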

 


Tests of Significance

 

CBC offers two kinds of significance tests: a t ratio and an overall Chi Square test.  Probably the more useful is the Chi Square for the difference between two "nested" models, where the second includes all effects of the first plus one or more others.  Twice the difference between the log-likelihoods of those two models is distributed as Chi Square, with degrees of freedom equal to the number of additional effects in the second model.  This test is performed automatically in each logit analysis, comparing the results of that analysis to what would be obtained with no effects at all.

 

The test can also be computed manually to compare any other nested runs, such as one run with all main effects and another with just a subset, or one with just main effects and another with main effects and one or more interactions.  (The Chi Square of the difference is the difference of the Chi Squares.)

 

The other test is a t ratio provided for each attribute level.  This tests the difference between that level and the average of all levels for that attribute (which is zero).  These t ratios can provide useful guidance, but for measuring whether an effect is significant the overall Chi Square test is preferable.

 

The standard errors used to compute t tests are taken from the matrix of covariances among estimates that is obtained by inverting a sum of squares and cross-products matrix.  If many parameters are being estimated or there are relatively few observations, this matrix may be ill-conditioned (having determinant less than 1.0E-10).  In this case a "ridge adjustment" is made by adding unity to each diagonal position of the sum of squares and cross-products matrix, reinverting, and displaying an error message.  We do this so the computation will provide useful information; however, the presence of a "ridge adjustment" message indicates the solution to be unsatisfactory, and you should probably remove some interaction terms and re-estimate.
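
The idea can be sketched as follows (an illustration with numpy; this is not CBC's actual code):

    import numpy as np

    def standard_errors(xtx):
        """xtx: the sum of squares and cross-products matrix for the estimates."""
        if abs(np.linalg.det(xtx)) < 1.0e-10:        # ill-conditioned
            xtx = xtx + np.eye(xtx.shape[0])         # ridge adjustment: add unity to the diagonal
            print("Warning: ridge adjustment applied; consider removing interaction terms.")
        covariances = np.linalg.inv(xtx)             # covariance matrix of the estimates
        return np.sqrt(np.diag(covariances))         # standard errors

    # t ratios are then effect / standard error, e.g. 0.62150 / 0.05126 = 12.12.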

 

 
