MaxDiff design imbalance - how to remedy?


I used Sawtooth Designer to generate 10 MaxDiff module versions for fieldwork, allocated randomly across 600 respondents. The MaxDiff module was composed of 18 tasks with 6 items each, drawn from 35 items in total. Recently, the client asked me to analyze a Subgroup of respondents, n=393.

I noticed that not all MaxDiff versions were displayed evenly in the Subgroup. One of the 10 versions was displayed only 32 times (8.1%), while another was displayed 45 times (11.5%). Going further, I checked the display frequency of each of the 35 items. The mean frequency varied from 2.96 to 3.14 - some items were displayed up to 6% more often than others.

I used a t-test to compare mean display frequency between the Subgroup and the rest of the sample. It showed a few significant differences, but the eta-squared values were rather low (typically between 0.01 and 0.10). A Friedman test comparing the display frequencies of the 35 items within the Subgroup was statistically significant - the 6% bonus seems to be relevant.
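(For anyone who wants to reproduce this kind of check: here is a minimal sketch of the Friedman test on a respondents-by-items display-count matrix. The data below are simulated stand-ins, not my actual fieldwork file - in practice you would load the real per-respondent design counts instead of drawing them uniformly.)

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
n_respondents, n_items, showings = 393, 35, 108  # 18 tasks x 6 items

# Hypothetical stand-in for the real design export: each respondent's
# per-item display counts, summing to 108 showings per respondent.
counts = np.zeros((n_respondents, n_items), dtype=int)
for r in range(n_respondents):
    shown = rng.choice(n_items, size=showings, replace=True)
    counts[r] = np.bincount(shown, minlength=n_items)

# Friedman test across the 35 items, with respondents as blocks:
# are some items systematically displayed more often than others?
stat, p = friedmanchisquare(*[counts[:, i] for i in range(n_items)])
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```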

So my first question is: how can I determine whether it is safe to analyze results for a subgroup in which quotas on MaxDiff versions were not controlled?

Moreover, I also noticed some intriguing patterns in the Overall Sample as well. The imbalance in version usage was rather small, close to perfect - from 59 (9.8%) to 61 (10.2%) completes per version. Nonetheless, the mean display frequency of the 35 items varied quite a bit - from 3.00 to 3.11 - so some items were displayed up to 4% more often than others. Again, the Friedman test was statistically significant, which - in theory - would lead to the conclusion that items were not displayed completely at random. And that is only at the aggregate level of the entire module; I haven't checked, e.g., the average display position within a task (1st/2nd/.../6th), nor the average task number within the module (1st/2nd/.../18th), where most likely I would observe additional significant discrepancies.

So my second question is: how does Sawtooth CBC-HB address the issue that some items are displayed more frequently than others?


asked Nov 21, 2016 by Piotr (185 points)

1 Answer

0 votes

The MaxDiff designer achieves excellent balance across a large number of versions, but in most cases it cannot produce perfect balance within each version. Your case is a great example: asking 18 sets of 6 items each makes for 108 item showings in total. Dividing that by 35 shows that each item is shown 3.0857 times per respondent on average, so most respondents will see a given item 3 times, but a few will see it 4 times. This happens necessarily, because your number of questions times the number of items per question is not an integer multiple of the total number of items. And even if it were an integer multiple, we might be able to balance the "one-way" item frequencies, but not the "two-way" frequencies of how many times each item is shown with each other item (you only get this when a balanced incomplete block design exists for your exact problem, and that is not always the case).
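The arithmetic above can be verified in a couple of lines (plain Python, nothing Sawtooth-specific):

```python
n_tasks, items_per_task, n_items = 18, 6, 35

total_showings = n_tasks * items_per_task   # 108 item showings per respondent
avg_per_item = total_showings / n_items     # 3.0857... not a whole number

print(total_showings)                 # 108
print(round(avg_per_item, 4))         # 3.0857
# Perfect within-version one-way balance would require this to be True:
print(total_showings % n_items == 0)  # False (remainder of 3)
```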

So there is necessarily some imbalance between versions. Versions are assigned randomly across respondents when they START the survey, and there is no provision for ensuring the same number of completes per version. This applies to subgroups too: you can, will, and in fact did see different versions shown more or less often within your subgroup of respondents.

This would be an important problem if you were analyzing your data with one of the simple count analyses described in the academic literature on MaxDiff. Those count analyses work so well only because such experiments tend to include small numbers of items, and the researchers have the luxury of making perfectly balanced designs - usually simple one-block designs, in fact. Unlike the counts-and-proportions utility estimates you may read about in the MaxDiff academic literature, the multinomial logit statistical model we use to estimate utilities provides statistical control for the slight imbalances in the designs that respondents see.
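To see why raw counts are sensitive to exposure, here is a toy illustration with made-up tallies for 4 items. (The numbers are hypothetical, and this shows only the simple per-showing normalization idea - not the logit model we actually use, which conditions on exactly what was shown in each set.)

```python
import numpy as np

# Hypothetical tallies: item 2 is most preferred per showing,
# but item 0 happened to be shown more often.
times_shown = np.array([120, 100, 90, 100])
best_picks  = np.array([ 40,  30, 36,  20])

raw_order  = np.argsort(-best_picks)                # rank by raw "best" counts
norm_order = np.argsort(-best_picks / times_shown)  # rank by picks per showing

print(raw_order)   # [0 2 1 3] -> item 0 looks best (exposure bias)
print(norm_order)  # [2 0 1 3] -> item 2 is really best per showing
```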

Now, all that said, we know that there are version effects, but the research we've done finds that they are small, accounting for 2% to 5% of the variance in MaxDiff utilities, according to analysis published in our 2013 Conference Proceedings (see the paper by Chrzan and Hill here: http://www.sawtoothsoftware.com/support/technical-papers/100-support/proceedings/1426-proceedings2013).
answered Nov 21, 2016 by Keith Chrzan Platinum Sawtooth Software, Inc. (93,025 points)