Exclusion criteria

Hi everyone,

I closed my CBC survey and downloaded the data.

I'm having a hard time deciding which criteria to use to exclude participants. Currently I exclude respondents who took too long (more than 20 minutes) or answered too fast (less than 4 minutes), though it was difficult to decide where to set those thresholds.

I also deleted any respondents who always picked the None alternative.

I saw that I could also identify and exclude bad respondents using RLH.

However, I don't want to end up excluding half of my respondents. I got 320 complete responses. Using the above criteria I already excluded about 70. Using the simulator I estimated that I would need about 150 responses for statistical significance. So it's basically a relatively small survey for my thesis.

Could you kindly advise me on which criteria to use in my case?

Thank you very much.

asked Jun 22 by maxive94 Bronze (780 points)

1 Answer

Best answer
Unfortunately, some sources of online panelists today produce very large numbers of bad respondents, and it's not always easy to avoid this. Good panel providers have many ways to detect bots and fraud and remove them before they ever have a chance to take your survey. However, there are many server farms out there staffed by professional respondents who are only in it for the incentives and are very hard to detect.

With CBC, there are statistical ways to try to recognize respondents who are answering randomly (RLH from HB analysis).  However, it's not very hard for cheaters/fraudsters to fool CBC by always picking the same brand or always picking the lowest price--two examples of easy-to-implement decision rules that will be rewarded with high RLH.

To use HB's RLH statistic to flag bad respondents, you first need enough CBC questions relative to the complexity of your attribute list to identify them with high reliability. A good rule of thumb is that every level should have an opportunity to be seen at least six times by each respondent. So, if you have a 6-level attribute and you are showing 3 concepts (other than the None) per task, it takes 12 tasks to show each level six times.
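As a rough sketch of that arithmetic (the 6-level/3-concept numbers are just this thread's example; plug in your own design):

```python
import math

def min_tasks(max_levels: int, concepts_per_task: int,
              exposures_per_level: int = 6) -> int:
    """Minimum number of CBC tasks so each level of the largest attribute
    can be shown at least `exposures_per_level` times per respondent.
    Each task exposes the attribute once per concept shown (None excluded)."""
    return math.ceil(max_levels * exposures_per_level / concepts_per_task)

# The answer's example: 6-level attribute, 3 concepts per task
print(min_tasks(6, 3))  # -> 12
```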

If you have enough tasks to identify random respondents with a high degree of reliability, then you can generate 100s of random responders for identifying an appropriate RLH cutoff level.  You can use Lighthouse Studio's random data generator and a copy of your original project.  Or, you can import 100s of "paper-and-pencil" respondents that you generate randomly in Excel using the RANDBETWEEN function to make choices to CBC tasks.  
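If Excel isn't handy, the same kind of random-responder file can be sketched in Python; the column names and default counts below are placeholders, not Lighthouse Studio's required import format:

```python
import csv
import random

def write_random_cbc(path: str, n_respondents: int = 300,
                     n_tasks: int = 12, n_concepts: int = 4) -> None:
    """Write a CSV of uniformly random CBC choices, one row per respondent.
    `n_concepts` includes the None option, so each choice is drawn from
    1..n_concepts, mirroring Excel's RANDBETWEEN(1, n_concepts)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["respondent_id"] +
                        [f"task_{t}" for t in range(1, n_tasks + 1)])
        for rid in range(1, n_respondents + 1):
            writer.writerow([rid] + [random.randint(1, n_concepts)
                                     for _ in range(n_tasks)])

write_random_cbc("random_cbc.csv")
```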

Run HB analysis on your random data (perhaps in a copy study of your real one). Sort the HB results by RLH, and find the RLH value below which 90% of random respondents fall. This means there is only a 10% chance that a random responder gets lucky enough to score a higher RLH. Setting the cutoff about here (at the 90th percentile) will keep you from accidentally discarding very many "good" human respondents.
answered Jun 22 by Bryan Orme Platinum Sawtooth Software, Inc. (198,715 points)
selected Jun 24 by maxive94
This post reminds me of the best presentation at the Sawtooth Software 2022 conference in Florida.
Thank you very much Bryan. It is an honor to get an answer from the CEO!
This is why I think Sawtooth Software is special. Those at the top show a ton of care and assist the smallest of clients with wonderful advice.
That is true! I am thankful though for anyone that helps me out on this forum. You guys are all great at what you do!