linear coding of categorical variable in CBC

I would like to compare the results of my CBC-HB (part-worth utilities) with a linear model. So I am wondering whether there is a useful way to code my categorical variable (an attribute with 7 levels, e.g. red, blue, purple ...) with linear coding.
I would also like to know how to code ordinal variables (e.g. no, minimum, medium, maximum) linearly. Is it simply 1, 2, 3, 4?
Thanks a lot for your help.
asked Mar 20, 2019 by bs77 Bronze (855 points)

1 Answer

0 votes

Linear coding is (sort of) plausible for the second case, IF you make the big assumption that the steps between adjacent levels are all the same size, for example that the difference between No and Minimum is the same size as the difference between Medium and Maximum.  That's a little bit of a stretch, but perhaps not too much, especially if you test it with the -2 log likelihood test and find that the categorical coding does not fit significantly better than the linear coding.
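
To make the comparison concrete, here is a minimal Python sketch of the two codings and of the -2 log likelihood test. It assumes effects coding for the part-worth version and assumes you already have the log-likelihoods of the two fitted models from your own estimation. The function names (`linear_code`, `effects_code`, `chi2_sf`, `lr_test`) are made up for illustration and are not part of any Sawtooth Software product.

```python
import math

LEVELS = ["no", "minimum", "medium", "maximum"]

def linear_code(level):
    """One column; assumes equal spacing between adjacent levels."""
    return [LEVELS.index(level) + 1]  # no=1, minimum=2, medium=3, maximum=4

def effects_code(level):
    """Three columns (part-worth coding); the last level is coded -1 on all."""
    i = LEVELS.index(level)
    k = len(LEVELS) - 1
    if i == k:
        return [-1] * k
    return [1 if j == i else 0 for j in range(k)]

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (stdlib only)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-half)
    else:
        s, q = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(s+1, h) = Q(s, h) + h^s * exp(-h) / Gamma(s+1)
    while s < df / 2.0 - 1e-9:
        q += math.exp(s * math.log(half) - half - math.lgamma(s + 1.0))
        s += 1.0
    return q

def lr_test(ll_restricted, ll_full, df):
    """-2 log likelihood test: restricted = linear, full = part-worth."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)

# Degrees of freedom = extra parameters in the part-worth model: 3 - 1 = 2
df = len(effects_code("no")) - len(linear_code("no"))
```

A small p-value from `lr_test` would suggest that the part-worth coding fits significantly better, i.e. that the equal-spacing assumption is not supported; a large one would suggest that the linear simplification is tenable.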

For colors, I don't know how to turn colors into numbers.  They seem irreducibly categorical to me.  Maybe a physicist would tell us that we could arrange them in the order they are on the spectrum, but I still think that's too much of a reach.  I couldn't really show this to an audience while keeping a straight face, so I would not do this.
answered Mar 21, 2019 by Keith Chrzan Platinum Sawtooth Software, Inc. (115,150 points)
Ok, I was expecting that, but I wanted to be sure. Of course it's always an issue to "guarantee" the equidistance, but I also think that in this case one can do that.

Regarding the nominal variable "colors", I still wonder if there is a convenient way to handle this. Just thinking out loud: would it be possible to somehow calculate an individual "mean" of the part-worth utilities of the attribute "colors"? Of course the level-specific information would be lost, but if I only need an individual utility for the variable itself (and not for each level), is there a way to calculate this (maybe in post-processing)?
Sorry, Boris, I just don't see how what you want to do would make sense. The mean of the color utilities will be zero because of the way it is coded.  I do not know of a sensible way to do what you want in post-processing, either.
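
A small Python sketch of why the mean is zero, assuming effects coding in which the omitted level's utility is minus the sum of the estimated ones. The numbers are made-up placeholders, not real results, and `full_utilities` is a hypothetical helper name.

```python
def full_utilities(estimated):
    """Append the omitted level, fixed at minus the sum of the others."""
    return list(estimated) + [-sum(estimated)]

# Hypothetical utilities for six of the seven colors (illustration only)
estimated = [0.8, -0.3, 0.1, 0.5, -0.9, 0.2]
utils = full_utilities(estimated)

# The seven utilities sum to zero by construction, so their mean is zero
# for every respondent and carries no information.
mean_utility = sum(utils) / len(utils)
```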
Hey Keith
That's true. When I wrote that post, I forgot that they sum to zero. Sorry for that.
But let's stay with this thought experiment: what if the colors (I don't really have this attribute, but let's stick with it for now) could be clustered into, e.g., warm colors (red and orange) and cold colors (blue and arctic)? Is it legitimate to assign the utilities of the 4 attribute levels (red, orange, blue, arctic) to 2 new attribute levels (warm, cold)? And if so, what would be the best way to do so? Just sum up the utilities, so that the warm and cold utilities sum to zero again?
I would not do this by post-processing the data.  If I really believed I could replace 4 colors with two color categories, I would recode the design matrix so that the original 4-level color variable becomes a 2-level color category variable.  Then I would test whether my simplification is warranted by running the -2 log likelihood test and seeing if having 4 colors provides a significantly better fit than having the 2 color categories.
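
As a rough illustration of that recoding, the following Python sketch maps the 4 colors to 2 categories and effects-codes both versions. It assumes effects coding, the list of shown colors is a made-up placeholder, and the names (`CATEGORY`, `effects_code`, `recode_row`) are hypothetical.

```python
COLORS = ["red", "orange", "blue", "arctic"]
CATEGORIES = ["warm", "cold"]
CATEGORY = {"red": "warm", "orange": "warm", "blue": "cold", "arctic": "cold"}

def effects_code(level, levels):
    """Effects coding: len(levels) - 1 columns, last level coded -1 on all."""
    i = levels.index(level)
    k = len(levels) - 1
    if i == k:
        return [-1] * k
    return [1 if j == i else 0 for j in range(k)]

def recode_row(color):
    """Replace the 4-level color columns by the single 2-level category column."""
    return effects_code(CATEGORY[color], CATEGORIES)

# Hypothetical colors of four concepts in the design (illustration only)
shown = ["red", "arctic", "orange", "blue"]

full_design = [effects_code(c, COLORS) for c in shown]   # 3 columns per row
restricted_design = [recode_row(c) for c in shown]       # 1 column per row

# The 2-category model is the 4-color model with red = orange and
# blue = arctic imposed, so the test has 3 - 1 = 2 degrees of freedom.
df = len(full_design[0]) - len(restricted_design[0])
```

Both models would then be estimated on the same choices, and twice the difference in their log-likelihoods compared against a chi-square distribution with `df` degrees of freedom.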