MaxDiff Design judgment


I wanted to ask about the MaxDiff design report:

1- How do I read and evaluate the design report to know whether the design is good, bad, or needs an update?

2- Suppose I got a relatively bad design (one-way frequency std. dev. of 6.2, two-way frequency of 2.2, and a warning about lack of connectivity among items), but I still want to use it because the cases in the design are also being used in the CBC, since I want to compare respondents' behavior across the two methods. How will this affect my results, and what remedies are suggested to improve my overall analysis?
The aim of the study is to produce part-worth utilities for the attribute levels using CBC, MaxDiff, and another method, then compare the three methods based on both the average results of the sample and the individual results of each respondent. I am therefore using half of the cases generated by the CBC in the MaxDiff, to keep the cases consistent and reduce the burden of the questionnaire.

Thanks for your help.
asked May 6, 2019 by AMYN Bronze (2,980 points)

2 Answers

+1 vote
The standard design report for our MaxDiff software assumes a standard (Case 1) MaxDiff study, where the researcher isn't setting up mutually exclusive (conjoint-style, Best-Worst Case 2) prohibitions.

Because standard MaxDiff (Case 1) is so robust to prohibitions (and because we wanted to keep the software easy for beginning researchers to use), we decided not to build in a more advanced experimental design testing capability.  We didn't want to make the software interface confusing for a junior researcher.

If more advanced researchers want to go further in evaluating the statistical efficiency of their MaxDiff designs (beyond the simple counts reporting, with standard deviations of one-way and two-way frequencies), then we recommend using the random responses data generator to create a dataset of random responders at essentially the same sample size you expect to collect.  Then, run aggregate logit and examine the standard errors.  Compare those standard errors to another data generation run of random responders who receive the same MaxDiff questionnaire setup, except with no prohibitions.  This shows you the loss in design efficiency for a design with prohibitions compared to a design without them.
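The same random-responder check can be sketched outside the software as well. Below is a minimal, hypothetical illustration (not Sawtooth's implementation): draw random tasks with and without a prohibited item pair, have "robotic" respondents pick at random, fit an aggregate best-choice MNL by Newton-Raphson, and compare the standard errors. The item count, task size, and the specific prohibition are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tasks(n_tasks, n_items, per_task, prohibited=()):
    """Draw random MaxDiff tasks; reject any task containing a prohibited pair."""
    tasks = []
    while len(tasks) < n_tasks:
        t = rng.choice(n_items, per_task, replace=False)
        if not any(a in t and b in t for a, b in prohibited):
            tasks.append(t)
    return tasks

def fit_mnl(tasks, picks, n_items):
    """Aggregate MNL on 'best' picks via Newton-Raphson.
    The last item's utility is fixed at 0 for identification.
    Returns (utilities, standard errors) for the first n_items - 1 items."""
    k = n_items - 1
    beta = np.zeros(k)
    for _ in range(15):
        grad = np.zeros(k)
        hess = np.zeros((k, k))
        for task, pick in zip(tasks, picks):
            X = np.zeros((len(task), k))      # dummy coding of the shown items
            for row, item in enumerate(task):
                if item < k:
                    X[row, item] = 1.0
            u = X @ beta
            p = np.exp(u - u.max())
            p /= p.sum()
            grad += X[list(task).index(pick)] - p @ X
            hess -= X.T @ (np.diag(p) - np.outer(p, p)) @ X
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.abs(step).max() < 1e-8:
            break
    se = np.sqrt(np.diag(np.linalg.inv(-hess)))
    return beta, se

# Same questionnaire shape, with vs. without one mutually exclusive pair:
for label, proh in [("no prohibitions", ()),
                    ("items 0/1 never shown together", ((0, 1),))]:
    tasks = random_tasks(1500, 8, 4, proh)
    picks = [rng.choice(t) for t in tasks]    # purely random responders
    beta, se = fit_mnl(tasks, picks, 8)
    print(f"{label}: mean std. error = {se.mean():.3f}")
```

Running the fit on each design and comparing the mean standard errors mirrors the comparison described above: the prohibited design typically shows somewhat larger standard errors, which is the efficiency loss in question.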

Lack of connectivity is a potential concern in what you are describing.  The software complains if it finds lack of connectivity within any of the unique versions (blocks) of the design.  Sometimes researchers purposefully create designs that lack connectivity within each specific version, as long as the design has connectivity when the multiple versions are pooled for the analysis (such as for aggregate logit or latent class MNL with relatively few classes).  For example, Sparse MaxDiff, Express MaxDiff, and Bandit MaxDiff approaches all lack connectivity within versions, but have connectivity when multiple versions are pooled and considered together.
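For readers who want to test this themselves, connectivity reduces to a graph question: treat items as nodes, link any two items that ever appear together in a task, and check whether a single component covers all items. A small sketch (hypothetical item indices, not the software's algorithm):

```python
from collections import defaultdict

def is_connected(tasks, n_items):
    """True if every item can be reached from every other item
    via chains of tasks that show items together."""
    adj = defaultdict(set)
    for task in tasks:
        for a in task:
            for b in task:
                if a != b:
                    adj[a].add(b)
    seen, stack = set(), [0]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj[node] - seen)
    return len(seen) == n_items

# Two versions that each lack connectivity on their own...
v1 = [(0, 1, 2), (3, 4, 5)]
v2 = [(0, 3, 1), (2, 5, 4)]
print(is_connected(v1, 6))        # False: {0,1,2} never meets {3,4,5}
print(is_connected(v1 + v2, 6))   # True once the versions are pooled
```

This is exactly the Sparse/Express/Bandit situation: each version fails the within-version check, yet the pooled design passes.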
answered May 6, 2019 by Bryan Orme Platinum Sawtooth Software, Inc. (177,015 points)
Thank you, Bryan, for your suggestion. But if I removed the prohibitions, the design would become Case 1, not Case 2, and I need Case 2 since some of the levels belong to the same attribute and can't appear together.
So will lacking connectivity and imbalance be considered a major drawback of the design that requires modification, or can I get away with it?
+1 vote
To add to Bryan's comment, I'm a little concerned that you're using half the items from the CBC.  I'm not sure how you chose which half to ask, but unless you did so carefully, your selection process itself could be causing some of the imbalance you're seeing.  I wouldn't care nearly as much about using (some of) the same product profiles from the CBC as I would about using a well-constructed set of product profiles for Best-Worst Case 2.  You'll be open to harsher criticism for using a substandard Best-Worst Case 2 design than for using profiles in the Best-Worst Case 2 experiment that were not present in the CBC (in fact, I'm not clear on the value of using ANY profiles from the CBC in the Best-Worst Case 2 experiment, since the response tasks are so different).
answered May 6, 2019 by Keith Chrzan Platinum Sawtooth Software, Inc. (95,775 points)
Glad that Keith is chiming in on this.  I wanted to clarify something as well regarding the use of the MaxDiff designer to create conjoint-style (Best-Worst Case 2) designs.  I've done this with good success in our MaxDiff software.  Unless the number of levels is constant across attributes, however, the design will naturally be reported as having poor one-way and two-way level balance.

For example, if one attribute has 4 levels and another has 2 levels (and you set up the proper prohibitions in the MaxDiff design such that only one level from each attribute can be brought into a MaxDiff question), then naturally the items from the 2-level attribute must appear twice as often as the levels from the 4-level attribute.  The designer will then give you a "bad" evaluation of your one-way item frequency.  This is of course expected for Best-Worst Case 2 when the number of levels differs across attributes, and the lack of one-way frequency balance across items wouldn't by itself pose any problem for analysis.  However, one must be vigilant for the other design issues that affect conjoint designs and Best-Worst Case 2 designs, namely correlation in the design matrix across factors, and connectivity.
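The arithmetic in that example can be verified with a toy count (the levels A1..A4 and B1/B2 are hypothetical, not from any real design):

```python
from collections import Counter
from itertools import cycle, islice

# BW Case 2 shape: attribute A has 4 levels, attribute B has 2, and each
# task shows exactly one level of each (the mutually exclusive prohibition).
a_levels = ["A1", "A2", "A3", "A4"]
b_levels = ["B1", "B2"]

n_tasks = 8
tasks = list(zip(islice(cycle(a_levels), n_tasks),
                 islice(cycle(b_levels), n_tasks)))

counts = Counter(item for task in tasks for item in task)
for item in sorted(counts):
    print(item, counts[item])
# Each of A1..A4 appears 2 times, but B1 and B2 each appear 4 times:
# perfect balance within each attribute, yet "imbalanced" one-way
# frequencies across items, exactly as the design report would flag.
```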
Hello Keith,
Thank you for sharing your thoughts.
I am using half of the cases in the CBC following the design published in this paper:
Krucien N, Watson V, Ryan M. Is Best–Worst Scaling Suitable for Health State Valuation? A Comparison with Discrete Choice Experiments. Health Econ 2016:n/a-n/a. doi:10.1002/hec.3459.

One of the aims is to compare CBC and MaxDiff performance in producing the utilities of a certain preference-based patient-reported outcome measure. In the paper, they used the left case of each CBC task in the MaxDiff (half of the cases) as a method for comparing the resulting utilities. Other advantages of using the same cases are reducing the burden on respondents and giving a chance to compare the effects of respondents' personal characteristics across the different exercises.

A final point I want to share: I got a suggestion to reduce the number of tasks in the CBC, increase the number of versions (to 100 from 18), and include BOTH cases in the MaxDiff design with the same number of versions. What do you think about this suggestion?
It would leave me with a huge load of manual programming for the third method I am using in my study (Time Trade-Off), as all of its questions are done through Free Format (I would need 100 versions of 5 questions each). But if you think this suggestion will make the study more efficient, then I will consider applying it.

Note that one of our attributes has 6 levels, while the other 5 attributes have 5 levels each (31 levels in total).

No, I think your idea of using the left alternative in each choice set to build your MaxDiff is a good one.  Do be sure to check, however, that doing so gives you a balanced MaxDiff experiment, lest you inadvertently impair the quality of your MaxDiff and thereby harm your empirical comparison with CBC.
Unfortunately, when I did so it did not give me a balanced MaxDiff experiment. Correct me if I am wrong: this imbalanced MaxDiff is by itself inferior to the CBC design, so if the CBC results were superior, this might be due to the design and NOT because the technique is better.
So should I use half of the cases but with more versions, which will probably improve the balance, OR should I use both cases (left and right) and get a better-balanced design than with half of the cases, even though it might increase the burden of the questionnaire on the respondent?
The CBC is about 8 tasks; the MaxDiff will be either 8 or 16!

One more thing: do you think that repeating one task (for example, making task 10 the same as task 2 in the CBC) to measure the reliability of the respondent is a good or bad idea, compared to adding one fixed task for all respondents or using some relatively complicated statistics for the same aim?
Correct: if you use an imbalanced MaxDiff design, you're adding an unwanted design artifact to your experiment.  But see my other comment about the design necessarily being different for MaxDiff than for CBC if you include death in the MaxDiff but not in the CBC.
Removing the prohibitions is just to create a comparison (in terms of aggregate logit standard errors) to the real design you want to field (the BW Case 2 design).  It's to ensure that your estimates will be reasonably precise, relative to a design that isn't burdened with prohibitions.   

And, testing the design with robotic respondents and estimating the parameters via aggregate logit is a way to ensure that, across respondents, you at least have connectivity, even though within any specific version (block) of the design you might not.

When you put the mutually exclusive prohibitions in place, if the attributes have different numbers of levels, then the one-way frequencies will obviously be shifted somewhat compared to a Case 1 design without the prohibitions that create the conjoint-style BW Case 2 design.