CBC Latent Class Settings

Respondent ID

If estimating utilities using Latent Class or HB, select the variable to use as the Respondent ID; it identifies respondents in the utility files and segment membership files. If using the utilities within the SMRT system, make sure that the chosen variable has unique numeric values in the range of 0 to 999999999. The online simulator can handle unique integers or unique alphanumeric strings up to 255 characters long.
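As a pre-flight check before exporting utilities, the ID rules above can be expressed as two small validators. This is a sketch only: the function names are hypothetical, and the checks encode just the limits stated in this section, not any additional rules the software may apply.

```python
from typing import Sequence, Union

# Limits taken from the text: SMRT needs unique integers in 0..999999999;
# the online simulator accepts unique integers or unique strings of <= 255 chars.
SMRT_MAX_ID = 999_999_999
ONLINE_MAX_LEN = 255


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def ids_ok_for_smrt(ids: Sequence[Union[int, str]]) -> bool:
    """Hypothetical check: unique integer IDs within 0..999999999."""
    if len(set(ids)) != len(ids):
        return False
    return all(_is_int(i) and 0 <= i <= SMRT_MAX_ID for i in ids)


def ids_ok_for_online_simulator(ids: Sequence[Union[int, str]]) -> bool:
    """Hypothetical check: unique IDs that are integers or strings of <= 255 chars."""
    if len(set(ids)) != len(ids):
        return False
    return all(_is_int(i) or (isinstance(i, str) and len(i) <= ONLINE_MAX_LEN) for i in ids)
```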

Respondent Filter

Filters allow you to perform analysis on a subset of the data. Categories of the sys_RespStatus variable (Incomplete, Disqualified, or Qualified/Complete) may be selected for inclusion in the analysis. Using Manage Filters..., you may select additional filters from variables in your study, or from new variables that you build from existing variables or combinations of existing variables.

Weights

Counts, Logit, and Latent Class can use weights to give some respondents more importance in the calculations than others. You may select an existing variable with numeric responses to use as a weight (such as the numeric variable FamilySize). In that case, values associated with each case are applied as weights. Or, you may specify weights to apply to categories within an existing select-type variable (for example, the select-type variable Gender, where Male is assigned a weight of 1.25 and Female 0.75). Weights may not be applied during HB estimation.
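A minimal sketch of what weighting does in an estimator, assuming (as in standard weighted maximum likelihood) that each respondent's log-likelihood contribution is multiplied by that respondent's weight. The helper names are hypothetical; the Gender weights are the ones from the example above.

```python
from typing import Dict, List, Sequence


def category_weights(categories: Sequence[str], weight_map: Dict[str, float]) -> List[float]:
    """Look up each respondent's weight from his or her category (e.g. Gender)."""
    return [weight_map[c] for c in categories]


def weighted_log_likelihood(resp_loglik: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-respondent log-likelihoods, each multiplied by its weight.
    A respondent weighted 1.25 counts 1.25 times as much as one weighted 1.0."""
    return sum(w * ll for w, ll in zip(weights, resp_loglik))


gender = ["Male", "Female", "Male"]
weights = category_weights(gender, {"Male": 1.25, "Female": 0.75})
```

With a numeric weight variable such as FamilySize, the values themselves would be passed as `weights` instead of being looked up by category.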

You may also assign weights to categories of a new segmentation variable that you create (under Analysis | Segments and Weights...). New segmentation variables are created by specifying logic relating to one or more variables. For example, from the Segments and Weights dialog, you could create a new segmentation variable that categorizes people into four buckets depending on their gender and the size of their family: a) Male_Small_Family, b) Male_Large_Family, c) Female_Small_Family, d) Female_Large_Family. Then, you could assign discrete weights to categories a-d for use in analysis.
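The four-bucket example could be built as follows. This is an illustrative sketch: the cut-off separating small from large families (3 or more people) and the weight values are hypothetical, since the text does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off: families of 3 or more people count as "Large".
LARGE_FAMILY_MIN = 3

# Discrete weights for categories a-d (illustrative values only).
SEGMENT_WEIGHTS: Dict[str, float] = {
    "Male_Small_Family": 1.1,
    "Male_Large_Family": 0.9,
    "Female_Small_Family": 1.2,
    "Female_Large_Family": 0.8,
}


def segment(gender: str, family_size: int) -> str:
    """Combine gender and family size into one segmentation category."""
    size = "Large" if family_size >= LARGE_FAMILY_MIN else "Small"
    return f"{gender}_{size}_Family"


def segment_weights(people: List[Tuple[str, int]]) -> List[float]:
    """Weight for each (gender, family size) pair via its segment."""
    return [SEGMENT_WEIGHTS[segment(g, n)] for g, n in people]
```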

Tasks

This area allows you to select which tasks to include in analysis. Fixed (holdout) tasks are omitted from analysis by default. You may wish to omit other tasks as well, such as the first task if you regard it as a warm-up.

Attribute Coding

This dialog lets you select which attributes to include in your analysis, and how to code them (part-worth or linear coding). If you select linear coding, a single utility coefficient is fit to the attribute, such as a slope for price, speed, or weight.

Linear Coding of Quantitative Attributes:

When you specify that an attribute should be coded as a linear term, a single column of values is used for this attribute in the independent variable matrix during utility estimation, and a single weight (slope) that best fits the data is estimated for that column. A column opens up for you to specify the Value used in the estimation matrix for each level of the quantitative attribute. By default, these are 1, 2, 3, etc. However, you should specify values that metrically correspond to the quantities shown to respondents. For example, if respondents saw levels of "1 lb., 2 lb., and 6 lb.", then the values associated with those levels should be 1, 2, and 6. Please note that CBC automatically zero-centers any values you specify when creating the independent variable matrix. So, values of 1, 2, 6 are converted to -2, -1, and +3 in the independent variable matrix prior to utility estimation.
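The zero-centering step described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the mean is taken over the level Values you enter, which reproduces the 1, 2, 6 to -2, -1, +3 example.

```python
from typing import List


def zero_center(values: List[float]) -> List[float]:
    """Subtract the mean of the level Values from each Value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def linear_column(shown_levels: List[int], level_values: List[float]) -> List[float]:
    """Build the single design-matrix column for a linearly coded attribute.

    shown_levels: 0-based level index of each alternative shown
    level_values: the Value entered for each level (e.g. 1, 2, 6 for lb.)
    """
    centered = zero_center(level_values)
    return [centered[k] for k in shown_levels]
```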

If using linear coding and HB, please note that level values should be on the order of single digits to lead to quick and proper convergence. In other words, rather than specifying 10000, 40000, 70000, one should specify 1, 4, 7. And, rather than specifying 0.01, 0.04, 0.07, one should specify 1, 4, 7.
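One way to bring level values into single digits is to divide them all by the same power of ten, which keeps their relative spacing intact. This is a hypothetical helper for preparing Values, not a feature of the software.

```python
import math
from typing import List, Tuple


def rescale_to_single_digits(values: List[float]) -> Tuple[List[float], float]:
    """Divide all level values by one power of ten so the largest magnitude
    lies in [1, 10). Returns the rescaled values and the divisor used."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values), 1.0
    k = math.floor(math.log10(largest))
    divisor = 10.0 ** k
    # round() strips floating-point noise such as 7.000000000000001
    return [round(v / divisor, 12) for v in values], divisor
```

Because only the units change, a slope estimated on the rescaled values can be converted back to a per-original-unit slope by dividing it by the returned divisor.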

Interactions

Sometimes estimating a separate set of utility values for each attribute (main effects) does not fit the data as well as when also fitting selected interaction effects. This occurs when the utilities for attributes are not truly independent. We encourage you to consider interaction terms that can significantly improve fit. But, we caution against adding too many interaction terms, as this can lead to overfitting and slow estimation times.

The interaction between two attributes with j and k levels leads to (j-1)(k-1) interaction terms to be estimated. But, when the utilities are "expanded" to include the reference levels in the reports and utility files, a total of jk interaction terms are reported.
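To see where the (j-1)(k-1) count comes from, the sketch below builds interaction columns as products of the two attributes' coded columns. It assumes effects coding of part-worth attributes (an assumption here, not something this section states); the function names are hypothetical.

```python
from typing import List


def effects_code(level: int, n_levels: int) -> List[int]:
    """Effects coding: n_levels-1 columns; the last level is -1 in every column."""
    if level == n_levels - 1:
        return [-1] * (n_levels - 1)
    return [1 if c == level else 0 for c in range(n_levels - 1)]


def interaction_columns(level_a: int, j: int, level_b: int, k: int) -> List[int]:
    """Products of each coded column of attribute A with each of attribute B,
    giving (j-1)*(k-1) interaction columns."""
    a = effects_code(level_a, j)
    b = effects_code(level_b, k)
    return [x * y for x in a for y in b]
```

For a 3-level attribute crossed with a 4-level attribute, this gives 2 x 3 = 6 estimated terms, while the expanded report lists 3 x 4 = 12.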

Constraints

If certain attributes have levels with known utility order (best to worst, or worst to best) that you expect all respondents would agree with, you may decide to constrain these attributes so that all respondents (or groups, in the case of logit or latent class) adhere to those utility constraints.
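A hypothetical helper for checking whether a set of level utilities already respects a known order. It only checks a solution; it does not show how the software enforces constraints during estimation.

```python
from typing import Sequence


def satisfies_order(utilities: Sequence[float], best_to_worst: bool = True) -> bool:
    """True if level utilities never increase (best_to_worst=True)
    or never decrease (best_to_worst=False) from one level to the next."""
    pairs = list(zip(utilities, utilities[1:]))
    if best_to_worst:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)
```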

Constrain groups to a common scale is a second type of constraint that restricts the solutions to those in which each group has equal scale (where we use the standard deviation across the vector of utilities for each group as a proxy for scale). This helps ensure that we don't discover groups that differ just in terms of scale (magnitude of utilities) but have the same relative pattern of preferences.
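The scale proxy mentioned above is easy to compute. In this sketch (helper names hypothetical; using the population rather than sample standard deviation is an assumption), two groups with the same preference pattern at different magnitudes become identical once rescaled to a common standard deviation, which is the kind of scale-only difference the constraint is meant to rule out.

```python
import statistics
from typing import List


def scale_proxy(utilities: List[float]) -> float:
    """Standard deviation across a group's utility vector, used as a proxy for scale."""
    return statistics.pstdev(utilities)


def to_common_scale(utilities: List[float], target: float = 1.0) -> List[float]:
    """Rescale a group's utilities so their standard deviation equals target.
    Assumes the utilities are not all equal (nonzero standard deviation)."""
    s = scale_proxy(utilities)
    return [u * target / s for u in utilities]


group_1 = [1.0, 0.0, -1.0]
group_2 = [3.0, 0.0, -3.0]  # same pattern, three times the scale
```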

For more information, please see the Latent Class standalone software manual.

Estimation Settings

Minimum and maximum numbers of groups: (defaults: Minimum = 2, Maximum = 5) are the numbers of segments for which solutions should be computed. Up to a 30-group solution can be modeled. We recommend that you compare latent class results to aggregate logit results (a single group solution), if only to assess the relative gain from fitting a second group.

Number of replications for each solution (default: 5) lets you conduct automatic replications of each solution from different random starting points. Although the results of all replications are displayed to the screen (and saved in the Report.txt file), only the solution with the highest likelihood for each number of groups is saved to the final output. If your problem is relatively large, you will probably want to use only one replication in your initial investigation of the data set. However, before accepting a particular solution as optimal, we urge that you replicate that solution from several different random starting points. Lack of agreement among replications suggests that the data do not naturally segment along the dimensions you've specified.
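The replication logic amounts to running one fit per random starting point and keeping the solution with the highest log-likelihood. In the sketch below, `fit_once` stands in for a single latent class run; `toy_fit` is a hypothetical placeholder that returns a made-up log-likelihood so the example runs on its own.

```python
import random
from typing import Callable, List, Tuple

# fit_once(n_groups, seed) -> (log_likelihood, solution); hypothetical signature.
FitFn = Callable[[int, int], Tuple[float, object]]


def best_of_replications(fit_once: FitFn, n_groups: int,
                         seeds: List[int]) -> Tuple[float, object, List[float]]:
    """Run one fit per seed; return the best log-likelihood, its solution,
    and every replication's log-likelihood (for judging agreement)."""
    results = [fit_once(n_groups, s) for s in seeds]
    all_ll = [ll for ll, _ in results]
    best_ll, best_solution = max(results, key=lambda r: r[0])
    return best_ll, best_solution, all_ll


def toy_fit(n_groups: int, seed: int) -> Tuple[float, object]:
    """Placeholder 'fit': a fake log-likelihood that varies with the seed."""
    rng = random.Random(seed)
    ll = -1000.0 + 10 * n_groups + rng.uniform(-5, 5)
    return ll, {"groups": n_groups, "seed": seed}


best_ll, solution, lls = best_of_replications(toy_fit, 3, [1, 2, 3, 4, 5])
```

Comparing the values in `lls` across seeds is one quick way to judge how well the replications agree.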

Maximum number of iterations (default: 100) determines how long the computation will be permitted to go when it has difficulty converging. To understand what happens during each iteration, it may help to review how the latent class estimation process works:

1. Initially, select random estimates of each group's utility values.
2. Use each group's estimated utilities to fit each respondent's data, and estimate the relative probability of each respondent belonging to each group.
3. Using those probabilities as weights, re-estimate the logit weights for each group. Accumulate the log-likelihood over all groups.
4. Continue repeating steps 2 and 3 until the log-likelihood fails to improve by more than some small amount (the convergence limit). Each iteration consists of a repetition of steps 2 and 3.
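The four steps above can be sketched as a small EM-style loop for a latent class multinomial logit model. This is a minimal illustration under stated assumptions, not the software's implementation: the data layout and all names are hypothetical, step 3 uses a fixed number of plain gradient-ascent steps rather than whatever optimizer the software uses, and the log-likelihood is accumulated at the start of each pass (that is, for the solution produced by the previous step 3). The `max_iter` and `tol` defaults mirror the documented defaults of 100 and 0.01.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: a task is (alternatives, chosen_index), each alternative a
# list of coded attribute values; a respondent is a list of tasks.
Task = Tuple[List[List[float]], int]
Respondent = List[Task]


def choice_probs(beta: List[float], alts: List[List[float]]) -> List[float]:
    """Multinomial logit choice probabilities for one task."""
    v = [sum(b * x for b, x in zip(beta, alt)) for alt in alts]
    top = max(v)
    e = [math.exp(u - top) for u in v]
    s = sum(e)
    return [x / s for x in e]


def resp_loglik(beta: List[float], resp: Respondent) -> float:
    """Log-likelihood of one respondent's choices under one group's utilities
    (computed via log-sum-exp so it never takes log(0))."""
    total = 0.0
    for alts, c in resp:
        v = [sum(b * x for b, x in zip(beta, alt)) for alt in alts]
        top = max(v)
        total += v[c] - top - math.log(sum(math.exp(u - top) for u in v))
    return total


def fit_latent_class(data: List[Respondent], n_groups: int, n_params: int,
                     max_iter: int = 100, tol: float = 0.01, seed: int = 1,
                     lr: float = 0.5, inner_steps: int = 20):
    rng = random.Random(seed)
    # Step 1: random starting utilities for each group; equal group sizes.
    betas = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(n_groups)]
    shares = [1.0 / n_groups] * n_groups
    history: List[float] = []
    members: List[List[float]] = []
    for _ in range(max_iter):
        # Step 2: membership probabilities, plus the log-likelihood of the
        # current solution (the one produced by the previous step 3).
        members, total_ll = [], 0.0
        for resp in data:
            logs = [math.log(max(shares[g], 1e-300)) + resp_loglik(betas[g], resp)
                    for g in range(n_groups)]
            top = max(logs)
            denom = sum(math.exp(lv - top) for lv in logs)
            total_ll += top + math.log(denom)
            members.append([math.exp(lv - top) / denom for lv in logs])
        history.append(total_ll)
        # Step 4: stop once the log-likelihood improves by no more than tol.
        if len(history) > 1 and history[-1] - history[-2] <= tol:
            break
        # Step 3: re-estimate each group's utilities, weighting each respondent
        # by his or her membership probability (plain gradient ascent).
        for g in range(n_groups):
            n_obs = sum(mem[g] * len(resp) for mem, resp in zip(members, data))
            if n_obs <= 0:
                continue
            for _ in range(inner_steps):
                grad = [0.0] * n_params
                for mem, resp in zip(members, data):
                    for alts, c in resp:
                        p = choice_probs(betas[g], alts)
                        for i in range(n_params):
                            expected = sum(pj * alt[i] for pj, alt in zip(p, alts))
                            grad[i] += mem[g] * (alts[c][i] - expected)
                betas[g] = [b + lr * gi / n_obs for b, gi in zip(betas[g], grad)]
        shares = [sum(mem[g] for mem in members) / len(data) for g in range(n_groups)]
    return betas, shares, members, history
```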

The default iteration limit is 100, although acceptable convergence may be achieved in many fewer iterations. You may substitute any other iteration limit.

Convergence limit for log-likelihood (default: 0.01) determines how much improvement there must be in the log-likelihood from one iteration to the next for the computation to continue. We arbitrarily use a limit of .01 as a default, but you may substitute a different value.

Advanced

Total task weight for constant sum data This option is only applicable if respondents provided allocation-based responses rather than discrete choices. If you believe that respondents allocated ten chips independently, you should use a value of ten. If you believe that the allocation of chips within a task is entirely dependent (such as if every respondent awards all chips to the same alternative), you should use a value of one. Probably the truth lies somewhere in between, and for that reason we suggest 5 as a default value. A data file using discrete choices will always use a total task weight of 1.
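A sketch of how a total task weight might be applied, assuming the chip allocation is converted to proportions and then scaled so that the task's weights sum to the total task weight. This mechanism is an assumption, and the function name is hypothetical. Under that assumption, a weight of ten treats ten chips as ten independent choices, and a weight of one treats the task as a single choice.

```python
from typing import List


def allocation_weights(chips: List[float], total_task_weight: float = 5.0) -> List[float]:
    """Convert one task's chip allocation into per-alternative choice weights
    that sum to total_task_weight (assumed mechanism, for illustration)."""
    total = sum(chips)
    return [total_task_weight * c / total for c in chips]
```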

Include 'None' parameter if available Although you may have included a "None" alternative in your questionnaire, you may not want to include respondents' propensities to choose that alternative in the information used for segmentation.

Note: For any respondents who choose None for all tasks, there will be no information with which to classify them into clusters, and they will be automatically classified into the largest cluster.

Tasks to include for best/worst data If you used the best/worst input option, you can select which tasks to include in utility estimation: best only, worst only, or best and worst.

Report standard errors: Standard errors and t ratios are only reported if this box is checked. The numerical output of Latent Class is much more voluminous than that of Logit, so we have made this information optional.

Display re-scaled utilities and attribute importances determines whether the Latent Class output includes a table in which the part-worth utilities for each group are re-scaled to be more comparable from group to group (using the normalization method of "zero-centered diffs"). The logit algorithm employed in Latent Class produces utilities close to zero if members of a group are confused or inconsistent in their choices, and produces larger values if members of a group have consistent patterns of responses. Because groups differ in the scaling of their utilities, it is often difficult to interpret differences from group to group. This table re-scales the part-worth utilities for each group to make them comparable: the utilities are scaled so that the attribute ranges (the difference between each attribute's maximum and minimum utilities) average 100. For linear attributes, we compute maximum and minimum utilities by examining the maximum and minimum values you furnished (under the Values column within the Attributes area). This option also provides summaries of attribute importances for each group, obtained by percentaging the attributes' utility ranges. Note that even if you don't choose to include the re-scaled utilities in the output, you can later display re-scaled part-worth utilities and importances by segment using the market simulator within SMRT or the online simulator.
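The zero-centered diffs re-scaling and the importance calculation described above can be sketched as follows, for one group. The function name is hypothetical. For a linear attribute, this sketch assumes you pass the slope multiplied by each level's Value as that attribute's level utilities.

```python
from typing import List, Tuple


def zero_centered_diffs(partworths: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """partworths: one list of level utilities per attribute, for one group.
    Returns re-scaled utilities (attribute ranges average 100) and importances (%).
    Assumes at least one attribute has a nonzero range."""
    # Zero-center the levels within each attribute.
    centered = [[u - sum(a) / len(a) for u in a] for a in partworths]
    ranges = [max(a) - min(a) for a in centered]
    total = sum(ranges)
    # Scale so that the sum of ranges is 100 x (number of attributes).
    factor = 100.0 * len(partworths) / total
    rescaled = [[u * factor for u in a] for a in centered]
    # Importances: each attribute's share of the total utility range.
    importances = [100.0 * r / total for r in ranges]
    return rescaled, importances
```

Multiplying a group's utilities by any positive constant leaves both outputs unchanged, which is what makes the re-scaled table comparable across groups.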

Tabulate all pairs of solutions refers to tabulations of the respondent group membership for all the solutions with one another. Although each respondent has some probability of belonging to each group, we also classify each respondent into the group to which he or she has the highest probability of belonging, and tabulate each solution against those adjacent to it. That is to say, the two-group solution is tabulated against the three-group solution, the three-group solution is tabulated against the four-group solution, etc. If you check this box, then all pairs of solutions are tabulated against each other rather than just those that are adjacent.
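The classification and tabulation described above can be sketched with hypothetical helpers: assign each respondent to his or her highest-probability group, then count respondents for each pair of group assignments. `adjacent_pairs` corresponds to the default behavior and `all_pairs` to checking this box.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify(members: List[List[float]]) -> List[int]:
    """Assign each respondent to the group with the highest membership probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in members]


def crosstab(a: List[int], b: List[int]) -> Dict[Tuple[int, int], int]:
    """Count respondents by (group in solution A, group in solution B)."""
    return dict(Counter(zip(a, b)))


def adjacent_pairs(n_min: int, n_max: int) -> List[Tuple[int, int]]:
    """Solution sizes tabulated by default: 2 vs 3, 3 vs 4, and so on."""
    return [(n, n + 1) for n in range(n_min, n_max)]


def all_pairs(n_min: int, n_max: int) -> List[Tuple[int, int]]:
    """Every pair of solution sizes."""
    return [(i, j) for i in range(n_min, n_max + 1) for j in range(i + 1, n_max + 1)]
```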

Starting seed (default: 0, meaning random start based on system clock) gives you control of the random number generator that is used to provide the initial random start. This control lets you repeat a run later if you want to. Latent class analysis will probably give you a somewhat different solution each time you re-run the analysis with the same data set, because the solutions depend on the initial starting values, which are random. If you provide a value here (typically in the range from 1 to 32000), you will get the same solution each time, while different values will generally lead to different solutions. If you provide a value of zero, which is the default, then the time of day is used as a seed for the random number generator.
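A sketch of the seeding rule described above, using Python's `random.Random` as a stand-in for the software's own generator (an assumption; the helper names are hypothetical): a seed of zero falls back to the clock, and any other seed reproduces the same starting values.

```python
import random
import time
from typing import List


def make_rng(seed: int = 0) -> random.Random:
    """Seed 0 means 'use the clock'; any other value gives a repeatable stream."""
    if seed == 0:
        seed = time.time_ns()
    return random.Random(seed)


def random_start(seed: int, n_params: int) -> List[float]:
    """Hypothetical random starting utilities for one group."""
    rng = make_rng(seed)
    return [rng.uniform(-1, 1) for _ in range(n_params)]
```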