ACBC creates an array of product concepts for the respondent to evaluate within the Screener and Choice Tasks sections of the questionnaire. These concepts are designed to be "near-neighbors" to the product concept the respondent chooses in the BYO task, but still include the full range of levels taken into each respondent's ACBC survey.
Because the BYO-specified product concept differs across respondents and the number of attributes and levels taken into each respondent's ACBC exercise can be dynamic (though in most projects the list will be static), it isn't possible to create an experimental design prior to fielding the study (as is done in non-adaptive CBC studies). Customized designs must be generated on-the-fly for each respondent. Because dozens or even hundreds of respondents might simultaneously be completing surveys over the web, we are concerned about the amount of computing resources demanded of the server to generate designs. Therefore, we've developed a relatively quick algorithm. The algorithm cannot be said to produce optimal designs, but its designs are near-orthogonal and have proven to work exceptionally well in many methodological studies to date comparing ACBC to standard CBC.
Inputs
The respondent provides the following input to the customized design:
•C0, a vector with as many elements as attributes included in this respondent's BYO question, describing which levels were included in the BYO concept.
The analyst provides some inputs that control the design:
•T, the number of total product concepts to generate,
•Amin, the minimum number of attributes to vary from the BYO concept,
•Amax, the maximum number of attributes to vary from the BYO concept (restricted to be no more than half the number of attributes in the BYO question plus 1, not counting Summed Price),
•If a "summed price" attribute is used, a range specifying how much price should vary (randomly) from the summed components' price (e.g., 30% below to 20% above the summed price).
The Design Algorithm
Near-orthogonal designs are generated using a controlled, randomized process. The steps involved in selecting each of T concepts in the design are as follows:
1. Randomly select an integer (Ai) from Amin to Amax that specifies how many attributes within C0 will be modified to create new (near-neighbor) concept Ci.
2. Randomly select Ai elements within C0 to modify.
3. Randomly select new (non-BYO selected) levels for the attributes chosen in step 2 (all other attributes remain at the BYO-selected levels).
4. Check to ensure that the concept chosen doesn't violate any prohibited pairs and is not a duplicate of another concept previously selected for this respondent. If prohibited or duplicate, discard the concept and return to step 1.
5. For non-BYO selected levels, examine whether relabeling levels to another non-BYO selected level within the same attribute improves the relative D-efficiency of the design for this respondent. Examine whether swapping non-BYO selected levels between two concepts improves the relative D-efficiency. Any relabeling or swapping that increases the efficiency while not making the target level count balance worse is accepted.
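To make steps 1-4 concrete, here is a minimal Python sketch under simplifying assumptions: attribute and level draws are purely uniform (the counts-based balancing described later is omitted), prohibitions are checked by a hypothetical is_prohibited() function, and the D-efficiency relabeling/swapping of step 5 is not shown.

    import random

    def generate_concepts(byo_levels, num_levels, T, A_min, A_max,
                          is_prohibited=lambda concept: False, max_tries=10000):
        """Generate T near-neighbor concepts from the BYO-selected levels.

        byo_levels : list of the BYO-chosen level index for each attribute (C0)
        num_levels : list giving the number of levels for each attribute
        Assumes A_max does not exceed the number of attributes and every
        attribute has at least two levels. Steps 1-4 only, purely random draws.
        """
        concepts = []
        tries = 0
        while len(concepts) < T and tries < max_tries:
            tries += 1
            # Step 1: how many attributes to vary from the BYO concept
            a_i = random.randint(A_min, A_max)
            # Step 2: which attributes to vary
            attrs_to_vary = random.sample(range(len(byo_levels)), a_i)
            # Step 3: pick a new (non-BYO) level for each varied attribute
            concept = list(byo_levels)
            for attr in attrs_to_vary:
                alternatives = [lvl for lvl in range(num_levels[attr])
                                if lvl != byo_levels[attr]]
                concept[attr] = random.choice(alternatives)
            # Step 4: reject prohibited or duplicate concepts
            if is_prohibited(concept) or concept in concepts:
                continue
            concepts.append(concept)
        return concepts

    # Example: 6 attributes, BYO concept C0, vary 2-4 attributes per concept
    C0 = [0, 2, 1, 3, 0, 1]
    levels_per_attribute = [3, 4, 2, 5, 3, 4]
    design = generate_concepts(C0, levels_per_attribute, T=36, A_min=2, A_max=4)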
Steps 1-5 are repeated as many times as can be done within about one second per respondent. This means that faster servers can lead to slightly more efficient designs. Server response is faster when the server is under a lighter load, running on faster hardware, or both. However, do not become too preoccupied with these issues, as even one pass through the design algorithm leads to highly efficient designs. Once a design is quite good (such as after a first pass through the algorithm), comparing it to a different design with an even higher degree of D-efficiency will usually show little practical difference in the quality of the final utilities.
Issues Related to Level Balance and Efficiency
"Counts" arrays (at the individual level) are maintained to ensure that the designs have much greater balance than would be achieved if the above strategy involved purely random selections. A counts array keeps track of how many times each element has been selected or modified. For example, we maintain a counts array for each attribute that records how many times each level within that attribute (other than the BYO-selected level) has been included in the design. When a relative deficit occurs in the target frequency of selection, we increase the likelihood that the element with the deficit will be selected in the next concept (the mechanism for doing this is described in the section Deviation from Target Frequencies). This allows for "controlled" randomized designs, and leads to a relatively high degree of level balance (but not perfect level balance). Our approach leads to a high degree of balance for: a) how many times Amin to Amax attributes were varied when generating T concepts, b) how many times each attribute was varied, and c) how many times each level (other than the BYO-specified level) was included across the T concepts.
For attributes without logical preference order (such as color, style, and brand), we try to achieve level balance across the non-BYO chosen levels. For attributes featuring levels with numeric/logical order (such as speed, size, weight, etc.), we overweight by a factor of two the selection of levels directly adjacent to the BYO-chosen level. Thus, the counts array describing the frequency of level occurrence across T=36 concepts for four levels of speed might be as follows (assuming the respondent chose level three in the BYO concept): 3, 6, 21, 6. Levels 2 and 4 are oversampled by a factor of 2x relative to level 1.
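The sketch below illustrates one way such a counts-based draw could work for a single attribute. The 2x weighting of levels adjacent to the BYO level matches the description above, but the deficit-boost formula is a hypothetical stand-in for the mechanism described in the section Deviation from Target Frequencies.

    import random

    def pick_non_byo_level(byo_level, num_levels, counts, ordered):
        """Pick a non-BYO level for one attribute, favoring under-sampled levels.

        byo_level  : index of the level chosen in the BYO question
        num_levels : number of levels in this attribute
        counts     : list giving how many times each level has already appeared
        ordered    : True if the attribute has a logical preference order, in which
                     case levels adjacent to the BYO level get double the target weight
        """
        candidates = [lvl for lvl in range(num_levels) if lvl != byo_level]
        targets = {lvl: (2.0 if ordered and abs(lvl - byo_level) == 1 else 1.0)
                   for lvl in candidates}
        total_target = sum(targets.values())
        total_count = sum(counts[lvl] for lvl in candidates) or 1
        weights = []
        for lvl in candidates:
            deficit = max(targets[lvl] / total_target - counts[lvl] / total_count, 0.0)
            # Hypothetical boost: levels behind their target share are drawn more often
            weights.append(targets[lvl] * (1.0 + 5.0 * deficit))
        return random.choices(candidates, weights=weights, k=1)[0]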
When a "summed price" attribute is in place, we first sum the total price for the concept (according to the level-based prices specified in the questionnaire setup). We then draw a random price variation (continuous random variable, rectangular distribution) within the range of price variation specified by the questionnaire author (for example a -30% to +30% price variation involves drawing a price multiplier from 0.7 to 1.3). Our randomization strategy ensures a high degree of balance across the price variations. We divide the range of price variation into quartiles, and randomly select (without replacement) which quartile will be used for the next random price draw (a random continuous value within that quartile). For example, the range for the price multiplier of 0.7 to 1.3 is broken into quartiles 0.7 to 0.85, 0.85 to 1.0, 1.0 to 1.15, and 1.15 to 1.3 (for tracking purposes). If we need to draw a price variation from the fourth quartile in creating concept Ci, we draw a random price multiplier anywhere within the range 1.15 to 1.3. After multiplying the summed price by that random price multiplier, the resulting price can then be rounded to the nearest x dollars (or monetary units of choice), where x is controlled by the survey author. As examples, one can round to the nearest 100 dollars, the nearest 1 dollar, or the nearest 0.1 dollars. One can also choose to add a constant after the rounding, such as -0.01, such that rounding to the nearest 0.1 dollars and subtracting 0.01 leads to prices ending in 9s. The actual price shown is recorded in the data file for use in analysis.
Certain constraints (such as attribute prohibitions or avoiding dominated concepts) can make it difficult or impossible to maintain a desired degree of balance. If the design algorithm is unable to find a new concept Ci with the counts balancing goals in place, it will relax the balancing criteria in order to generate a valid concept. In that case, a message will be written to a special ACBC log file named studyname_ACBC_log.cgi stored within the Admin folder for your information and review.
From the standpoint of achieving maximum statistical efficiency, the optimal approach would vary all attributes independently, as in a traditional orthogonal array. Varying only a narrow subset of the attributes (from the BYO-specified concept) when creating each new concept results in lower statistical efficiency. However, ACBC's approach has three important benefits to counteract the loss in statistical efficiency: 1) Respondents can answer with less noise when fewer attributes are varying within each concept, 2) The concepts seem more relevant and plausible to the respondent, since they are near-neighbors to the BYO-specified product, and 3) The design concentrates on learning about preferences for the levels directly surrounding the respondent's highest levels of preference. In our previous methodological tests with about eight or nine total attributes, we have found that using 2 and 4 for Amin and Amax, respectively, works quite well. We tried varying from 2 to 5 attributes when generating each concept in one of our experiments, and didn't see an increase in performance. At a recent Sawtooth Software Conference (2013), two additional tests were done that experimented with how many attributes to vary from the BYO selections for the near-neighbor designs. Whether fewer or more attributes were varied, ACBC results were quite robust and of high quality.
Because the designs are adaptive, one cannot estimate the exact design efficiency prior to fielding the study. However, the Test Design capability simulates respondents who answer ACBC questionnaires randomly, followed by aggregate logit estimation to investigate the size of standard errors (reflecting the precision of estimates) for pooled parameters.
When Attributes Are Dropped from the BYO Section
There are instances in which an attribute with a priori preference order seems out-of-place in a BYO question. If price premiums aren't assigned to such an attribute, it is painfully obvious which level respondents would prefer in their BYO product, so there is little point in asking. In such cases, you can drop attributes from the BYO question. The software will also let you drop attributes from the BYO question even if you don't establish a priori preference order (though this is an exception rather than typical practice, and you lose an opportunity to learn which level is preferred via the BYO question).
When attributes with known a priori order are skipped in the BYO section, for the software developer's convenience the software automatically fills the missing BYO answers as if the respondent had picked the best level. For attributes dropped from the BYO section without known a priori order, the software automatically fills the missing BYO answers with randomly selected levels for those attributes. However, in either case, the design algorithm doesn't treat these automatically filled responses as legitimately chosen BYO levels for purposes of utility estimation or design generation. All levels of the skipped attributes are sampled roughly evenly in the near-neighbor design. "Dropped" attributes from the BYO section are not included in the vector C0, and are thus not counted as attributes to be modified from their BYO levels.
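A minimal sketch of this fill-in behavior, assuming a hypothetical attribute specification that records the number of levels and (where known) the a priori best level; the filled values are placeholders only and are not treated as chosen levels for design generation or utility estimation.

    import random

    def fill_missing_byo(byo_answers, attributes):
        """Fill BYO answers for attributes dropped from the BYO question.

        byo_answers : dict of attribute name -> chosen level for answered attributes
        attributes  : dict of attribute name -> {'num_levels': int,
                      'a_priori_best': int or None}
        """
        filled = dict(byo_answers)
        for name, spec in attributes.items():
            if name in filled:
                continue
            if spec['a_priori_best'] is not None:
                filled[name] = spec['a_priori_best']                 # best level if order is known
            else:
                filled[name] = random.randrange(spec['num_levels'])  # random level otherwise
        return filled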
Strategies for Selecting Attributes to Vary (Attribute Balance vs. Mixed Approach)
The first three methodological tests we ran comparing ACBC to standard CBC used the strategy described above, where for the generation of each new concept we randomly selected a subset of attributes to vary from the BYO-selected levels such that each attribute had been varied from its BYO-selected level about an equal number of times. However, it has occurred to us that such a strategy misses an opportunity to improve the amount of information we gain from each respondent in the typical cases in which we have asymmetric designs (where the number of levels per attribute varies). For studies with asymmetric designs, attributes with fewer levels have their levels probed more often than attributes with more levels.
Consider an attribute with 6 levels versus another attribute with 2 levels. If the two attributes are varied from their BYO-selected levels an equal number of times, the single non-BYO level of the 2-level attribute will clearly be probed (included in the design) far more often than any of the 5 non-BYO levels of the 6-level attribute. However, if our strategy is to reduce uncertainty over the entire attribute level space, it follows that we should select the 6-level attribute for variation from its BYO-selected level more often than the 2-level attribute. This can be accomplished simply by changing the manner in which we increment the counts array that keeps track of how many times each attribute has been varied from its BYO-selected level. For the first three studies we ran, every time we altered an attribute from its BYO-selected level, we added a "1" to the counts array with respect to that attribute (where the counts array has as many elements as attributes in the study). The goal was to select subsets of attributes to modify from BYO-selected levels such that the counts array was balanced, for example, [18, 18, 18, 18, 18, 18] for a study involving six attributes, indicating that each attribute had been varied 18 times from its BYO-specified level.
If we want to select attributes to vary such that there is greater balance in the number of times each non-BYO level has been probed, we could add 1/k to the counts array each time we select an attribute for variation from its BYO-selected level, where k is equal to the number of levels in that attribute. This would reduce the uncertainty across the entire attribute level space, but at the cost of adding a potential psychological bias. If one attribute is shown to vary from its BYO-selected level much more often than another across the product concepts, then undue attention may be called to that attribute. For that reason, we have implemented a mixed strategy that is a middling position between these two approaches (attribute-selection balance versus level-selection balance). It puts more attention toward testing non-BYO levels for attributes with more levels than for attributes with fewer levels; but it doesn't enforce strict level balance. We believe that the mixed approach should be better than the attribute balance approach, but look to future empirical research regarding this issue. Such research must involve human respondents rather than computer-generated respondents, since it involves a potential psychological bias.
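The difference between the two counting schemes (and a possible middle ground) can be expressed compactly. The exact weighting used by the mixed strategy is not documented here, so the 'mixed' branch below is only an illustrative compromise, not the software's formula.

    def increment_attribute_counts(counts, varied_attrs, levels_per_attr, strategy):
        """Update the counts array that tracks how often each attribute has been varied.

        strategy: 'attribute' -> add 1 per variation (attribute balance)
                  'level'     -> add 1/k, where k is the attribute's number of levels
                  'mixed'     -> a middle ground, illustrated here as the average of the two
        """
        for attr in varied_attrs:
            k = levels_per_attr[attr]
            if strategy == 'attribute':
                counts[attr] += 1.0
            elif strategy == 'level':
                counts[attr] += 1.0 / k
            else:  # 'mixed' -- illustrative only; the software's actual weighting may differ
                counts[attr] += 0.5 * (1.0 + 1.0 / k)
        return counts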
Generating Replacement Cards
A key aim of ACBC questionnaires is to identify any levels that are "unacceptable" or "must haves." Rather than ask the respondent upfront regarding cutoff rules, we formulate hypotheses regarding what levels might be unacceptable or must-haves based on the respondent's observed choices within the Screening section. For example, if we notice that the respondent only selects as "a possibility" concepts featuring Brand A, we might suspect Brand A is a "must-have" level. After we have observed a pattern of such choices, we present a list of "unacceptable" or "must have" rules that we suspect the respondent might be employing. If the respondent confirms a non-compensatory rule on the list, we then mark as "not a possibility" any concepts not yet evaluated that would fail to satisfy that rule. This leads to more efficient questionnaires and the opportunity to create "replacement cards" that probe deeper within the respondent's relevant preference space. For example, if after a respondent has evaluated the first 10 concepts (marking each as a "possibility" or "not a possibility"), the respondent verifies that Brand A is a "must-have" rule, this might eliminate from consideration 8 of the upcoming concepts (all featuring a brand other than Brand A) that we had planned to ask the respondent to evaluate. We generate 8 replacement concepts (all featuring Brand A) according to the design algorithm specified above, the exception being that the brand attribute will be removed from consideration as a potential selection in step 2 (above).
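The following sketch shows replacement generation under the same simplifying assumptions as the earlier concept-generation example: confirmed must-have rules are passed as locked attribute levels and those attributes are excluded from the set varied in step 2. The function and its arguments are hypothetical.

    import random

    def generate_replacements(n, byo_levels, num_levels, locked, A_min, A_max,
                              existing, is_prohibited=lambda c: False, max_tries=10000):
        """Generate n replacement concepts that honor confirmed must-have rules.

        locked   : dict {attribute index: required level}; these attributes are fixed
                   at the required level and excluded from the attributes varied in step 2
        existing : concepts already generated, used for duplicate checking
        Assumes A_min does not exceed the number of unlocked attributes.
        """
        free_attrs = [a for a in range(len(byo_levels)) if a not in locked]
        replacements = []
        tries = 0
        while len(replacements) < n and tries < max_tries:
            tries += 1
            a_i = random.randint(A_min, min(A_max, len(free_attrs)))
            varied = random.sample(free_attrs, a_i)
            concept = list(byo_levels)
            for attr, lvl in locked.items():
                concept[attr] = lvl                      # enforce the must-have level
            for attr in varied:
                alternatives = [l for l in range(num_levels[attr]) if l != byo_levels[attr]]
                concept[attr] = random.choice(alternatives)
            if is_prohibited(concept) or concept in existing or concept in replacements:
                continue
            replacements.append(concept)
        return replacements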
Concept Presentation in the First Few Screening Tasks
We rely heavily upon respondents' evaluations of the first few screens of concepts within the Screening section to identify cutoff rules. It is typical to show four or five product concepts per screen within the Screening section. It seems useful to us that the first two screens of concepts should reflect the full variety of levels included in each respondent's attribute list. Therefore, after generating T concepts as described above, we attempt to select sets of concepts for the first two screens that contain the full range of levels included in the study. We employ a simple algorithm that counts how many levels are represented on each screen and selects sets from our T concepts that score relatively high on that count.
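A simple randomized stand-in for this selection heuristic might look as follows; the per_screen and num_candidates parameters are hypothetical, and the software's actual routine may differ.

    import random

    def coverage_score(screen):
        """Count the distinct (attribute, level) pairs represented on one screen."""
        return len({(attr, lvl) for concept in screen for attr, lvl in enumerate(concept)})

    def pick_first_screen(concepts, per_screen=4, num_candidates=200):
        """Pick a set of concepts for an early screen that covers many distinct levels."""
        best, best_score = None, -1
        for _ in range(num_candidates):
            screen = random.sample(concepts, per_screen)
            score = coverage_score(screen)
            if score > best_score:
                best, best_score = screen, score
        return best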
Concept Presentation in the Choice Tasks Tournament
The concepts the respondent indicates are "possibilities" in the Screening section are taken forward to the Choice Tasks section where they are compared in triples until an overall winner is identified. It takes t/2 triples to identify the overall winner, where t is the number of concepts to be evaluated (in the case that t is odd and t/2 is not an integer, one rounds down to determine the required number of triples). It is easier to compare the concepts if some of the attributes are tied (share the same level) across the triple. When this happens, we "gray-out" the row so that respondents can focus on the differences when choosing the best concept. We can reduce the cognitive difficulty of the task if we assemble the concepts so that a healthy number of rows are "grayed-out." To that end, we take modest steps to increase the likelihood of tied attributes, but stop well short of maximizing the "amount of gray." This sacrifices some design efficiency (in the traditional sense, assuming compensatory processing), but the tasks are more easily completed and the responses will contain less error.
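Expressed as a formula, the number of choice tasks required is ceil((t-1)/2), which equals t/2 for even t and rounds down for odd t, consistent with the description above. A small sketch:

    import math

    def triples_needed(t):
        """Choice tasks needed to find one winner from t concepts when each
        task eliminates two concepts (the last task may compare only two)."""
        return math.ceil((t - 1) / 2)  # equals t/2 for even t; rounds down for odd t

    # Examples: 7 concepts -> 3 tasks; 10 concepts -> 5 tasks
    print(triples_needed(7), triples_needed(10))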