Technical Details


What is the bandit in Bandit MaxDiff? "One-armed bandit" is slang for a casino slot machine: it has one arm (the lever you pull) and it usually takes your money (like a bandit).  For at least the last sixty years, statisticians have been interested in what they call multi-armed bandit problems.  For example, if you want to invest your resources over multiple time periods across multiple activities, each with uncertain outcomes (like pulling different arms across multiple slot machines), how should you allocate the resources (your bets) to maximize the long-term payoff?  Bandit solutions maximize the expected return over multiple time periods by effectively exploring the possible outcomes (advertising, clinical trials, product launches, website modifications) while exploiting the information learned along the way.

Bandit MaxDiff employs Thompson Sampling to select the items to oversample for each new respondent.  Thompson Sampling leverages prior estimates of each item's mean and variance, which can be estimated via aggregate logit or, adequately and much more quickly, via counting analysis.

We recognize that our prior estimates of item preferences are subject to uncertainty, so we pick the items for the next respondent probabilistically via Thompson Sampling, perturbing each prior mean with a normal draw whose standard deviation equals the population estimate from logit or counting analysis.  For each new respondent, the perturbed item scores are sorted from best to worst and the top t items (the NumThompsonItems argument in the BanditMaxDiff list-building command) are selected for inclusion in the respondent's MaxDiff questionnaire, where NumThompsonItems may be specified by the researcher.  To make the approach even more robust (see the section below regarding misinformed first respondents), we recommend that at least some of the items be selected with a very diffuse prior (very large variance) or even purely at random.  The specific solution we use for sampling a subset of items with a very diffuse prior (the items making up the difference between Items and NumThompsonItems in the BanditMaxDiff instruction) is to select those remaining items from among the items seen fewest times so far by prior respondents.
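
To make the selection step concrete, here is a minimal Python sketch of the per-respondent logic just described.  The function and argument names (select_items, item_means, item_sds, exposure_counts, num_items, num_thompson_items) are illustrative assumptions rather than Lighthouse Studio internals; num_thompson_items plays the role of NumThompsonItems and num_items the role of Items, and the standard deviations would come from the logit or counting estimates described above.

    import random

    def select_items(item_means, item_sds, exposure_counts,
                     num_items, num_thompson_items):
        # Perturb each prior mean with a normal draw scaled by its estimated
        # standard deviation, then keep the top scorers (Thompson Sampling).
        perturbed = {i: random.gauss(item_means[i], item_sds[i])
                     for i in item_means}
        ranked = sorted(perturbed, key=perturbed.get, reverse=True)
        chosen = ranked[:num_thompson_items]

        # Fill the remaining slots with the least-shown items not yet chosen,
        # which serves as a very diffuse prior for those items.
        leftovers = sorted((i for i in item_means if i not in chosen),
                           key=lambda i: exposure_counts[i])
        return chosen + leftovers[:num_items - num_thompson_items]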

Lighthouse Studio’s Bandit MaxDiff capability utilizes counting analysis to estimate each item’s preference and variance (for the purpose of Thompson sampling), where the variance is estimated via the well-known pq/n formula for the variance of proportions.  Our strategy for counting analysis involves exploding the MaxDiff task into all inferred pairwise comparisons and simply counting the percent of “wins” (p) for each item across all inferred pairs (across all tasks and respondents).  We compute n as the average of the sample size and the number of times each item appears across all MaxDiff tasks for all respondents.
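
To make the counting step concrete, here is a minimal Python sketch under the usual MaxDiff exploding rule (the "best" item beats every other item shown, and every other shown item beats the "worst" item).  The variable names and helper functions are illustrative assumptions, not Lighthouse Studio's actual code.

    from collections import defaultdict

    wins = defaultdict(int)         # pairwise "wins" per item
    pairs = defaultdict(int)        # pairwise comparisons per item
    appearances = defaultdict(int)  # times each item has been shown

    def tally_task(shown, best, worst):
        # Explode one MaxDiff task into its inferred pairwise comparisons.
        for item in shown:
            appearances[item] += 1
        for item in shown:
            if item == best:
                continue
            # The "best" item beats every other item shown in the task.
            wins[best] += 1
            pairs[best] += 1
            pairs[item] += 1
            if item != worst:
                # Every other shown item beats the "worst" item.
                wins[item] += 1
                pairs[item] += 1
                pairs[worst] += 1

    def estimate(item, sample_size):
        # p is the share of pairwise "wins"; n is the average of the sample
        # size and the item's appearances; variance uses the pq/n formula.
        p = wins[item] / pairs[item]
        n = (sample_size + appearances[item]) / 2.0
        return p, p * (1 - p) / n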

As a prior (and to avoid dividing by zero), we initialize the percent "win" for each item to 50%.  More specifically, before the first respondent completes a survey, we assume that each item already has been involved in 10 paired comparisons and has been chosen "best" in 5 of those 10 comparisons.
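
Continuing the sketch above, that prior amounts to seeding each item's pairwise counts before any data arrive (only the win counts are covered here, since that is what the text describes).  With this seed, an item that later wins 55 of 100 observed pairs would carry p = (55 + 5) / (100 + 10), or roughly 0.545 (illustrative numbers only).

    def initialize_priors(items):
        # Seed each item with 5 wins out of 10 paired comparisons (p = 50%),
        # so the win proportion is defined before any respondent data arrive.
        for item in items:
            wins[item] = 5
            pairs[item] = 10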
