duplicate sys_respNum in CAPI possible?

I am fielding a CAPI survey using a field service provider to which I have given the Sawtooth CAPI tool. The survey itself was programmed by me.
When importing the CSV data files, I am notified several times of a duplicate sys_respNum; each flagged record is a complete data set, but the records are otherwise quite different. I recall reading that the CAPI sys_respNum is generated automatically in a way that makes duplicates very unlikely.

Is it possible that the service provider has manipulated the data files before passing them on to me?
asked Aug 26, 2013 by alex.wendland Bronze (2,410 points)
retagged Aug 26, 2013 by Walter Williams

1 Answer

+2 votes
Best answer
Hi Alex,

I wouldn't immediately conclude that the service provider had manipulated the data.  Theoretically, duplicate respondent numbers in CAPI shouldn't happen very often.  But, it turns out that the random number generator we use for that particular feature isn't nearly as random as we'd like, and duplicates are actually pretty common.  (Note that the random number generator we use for CAPI is NOT the same one we use for most other processes, such as conjoint design generation.)

On a recent survey with about 600 respondents, we found that duplicate respondent numbers had occurred about 30 times. And the more respondents you have, the more likely it is that duplicates will occur, even if you had access to a perfect random number generator!  If we used a perfectly random generator to pick 100 numbers between 1 and 1,000,000, there's about a 0.5% chance that we'd draw at least one duplicate.  With 200 draws, the probability of at least one duplicate rises to about 2%.  With 500 draws, it is about 11.7%, and with 1,000 draws, it is about 39%; the odds only reach 50/50 at roughly 1,180 draws.
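
The figures for a perfect generator are an instance of the "birthday problem" and can be checked with a few lines of Python. This is only a sketch: the pool of 1,000,000 values is the illustrative range from the paragraph above, not necessarily the range CAPI actually draws from, and `duplicate_probability` is a name made up for this example.

```python
import math

def duplicate_probability(draws: int, pool: int) -> float:
    """Exact chance that at least two of `draws` uniform picks from `pool` values coincide."""
    if draws > pool:
        return 1.0
    # P(all distinct) = product over k of (1 - k/pool); sum the logs for numerical stability.
    log_p_distinct = sum(math.log1p(-k / pool) for k in range(draws))
    return 1.0 - math.exp(log_p_distinct)

for n in (100, 200, 500, 1000):
    print(n, f"{duplicate_probability(n, 1_000_000):.1%}")
# 100 0.5%
# 200 2.0%
# 500 11.7%
# 1000 39.3%
```

For small samples the quick estimate n(n-1)/(2N) is close enough, but it overstates the exact probability as the number of draws grows.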

More recent versions of SSI Web (starting with SSI Web v8.2.0) implement a smarter CAPI data merge that reassigns respondent numbers when duplicates are encountered.  It also stores the original number, so it can recognize the same record if it encounters it again.
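
To make the idea concrete, here is a rough sketch of a merge along those lines. It is not Sawtooth's actual implementation: the function name, the record layout (`orig_num`, `answers`), the 1 to 1,000,000 range, and the rule of matching a re-imported record on its original number plus its answers are all assumptions made for illustration.

```python
import random

def merge_capi_records(merged, incoming, rng=None, max_num=1_000_000):
    """Merge incoming rows into `merged`, a dict of assigned number -> stored record."""
    rng = rng or random.Random()
    for row in incoming:
        orig = row["sys_respNum"]
        # Same original number AND same answers: treat it as a re-import and skip it.
        already_imported = any(
            rec["orig_num"] == orig and rec["answers"] == row["answers"]
            for rec in merged.values()
        )
        if already_imported:
            continue
        # Same number but different answers: keep the record under a fresh number.
        assigned = orig
        while assigned in merged:
            assigned = rng.randint(1, max_num)
        merged[assigned] = {"orig_num": orig, "answers": row["answers"]}
    return merged

merged = {}
first = {"sys_respNum": 42, "answers": {"Q1": 1}}
second = {"sys_respNum": 42, "answers": {"Q1": 5}}  # different respondent, same number
merge_capi_records(merged, [first, second])
merge_capi_records(merged, [first, second])  # importing the same file again adds nothing
print(len(merged))  # 2
```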

So, while I wouldn't rule out data tampering, this certainly isn't a "smoking gun" indicating that data manipulation has occurred.
answered Aug 28, 2013 by Aaron Hill Gold Sawtooth Software, Inc. (10,995 points)
selected Aug 29, 2013 by alex.wendland
Hi Aaron,
Thanks for clarifying. I guess this is a good example of probability at work :) In fact, duplicates occurred about 5 times in 500 here, and the sample will eventually reach 1,000. So your response and experience give me some peace of mind.
Hi Aaron,

There is a high risk of data manipulation in CAPI projects, because the data is downloaded as a CSV file, which can easily be edited.

I strongly recommend devising a system in which the downloaded file is password protected. The password could be the same as the modify password of the admin module.

An interviewer can simply copy and paste rows and cleverly change the data to fill the desired quotas/numbers.
Hi Aaron,
I had also thought about ways to avoid or at least detect manipulation. Isn't it possible to add an extra variable/number to the CAPI data files that is calculated or constructed via some rule based on all variables in the set? When the data is imported into SSI again, this variable could be checked against the rest of the variables to see whether it matches, or whether the data and the control value suggest that values have been altered.
I also like the idea of exporting already password protected csv files.
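
The control-variable idea could be sketched as a keyed hash (HMAC) over each row. Nothing here is an existing SSI Web or CAPI feature; `SECRET_KEY`, `row_signature`, and `row_is_intact` are hypothetical names, and the sketch assumes the key is known only to the survey author and cannot be read from the interviewer's installation.

```python
import hashlib
import hmac
import json

# Hypothetical secret; in practice it must not be recoverable by the field service.
SECRET_KEY = b"known-only-to-the-survey-author"

def row_signature(row: dict, key: bytes = SECRET_KEY) -> str:
    """Control value computed from every variable in the row."""
    # Sorting the keys makes the signature independent of column order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def row_is_intact(row: dict, signature: str, key: bytes = SECRET_KEY) -> bool:
    """True if the row still matches the control value stored with it."""
    return hmac.compare_digest(row_signature(row, key), signature)

row = {"sys_respNum": "123456", "Q1": "3", "Q2": "1"}
sig = row_signature(row)           # stored as an extra column at export time

print(row_is_intact(row, sig))     # True
row["Q1"] = "5"                    # an edited answer
print(row_is_intact(row, sig))     # False
```

Because sys_respNum is part of the signed data, a copied row whose answers or respondent number were changed would fail the check. A row copied verbatim would still pass, so exact duplicates would need to be caught separately.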