# Difference between ZC Diffs from HB and Latent Class

I did a CBC on the subject of the distribution of mail.

I ran a HB analysis to calculate WTP.
From the HB analysis, we saw a logical order in the average utilities of the levels of the attribute "delivery location":

| Level | Average utility (HB) |
|---|---|
| Post point | -11.7 |
| Post office | -18.6 |
| Parcel locker 500 meters from my house | 10.3 |
| Parcel locker 2 km from my house | -34.0 |
| Parcel locker 4 km from my house | -83.8 |

The client has now asked me to run a segmentation on the CBC output.
So I ran a Latent Class analysis within Lighthouse.

When I look at the "1 group" solution, I see the same order in the ZC Diffs of the levels of the attribute "delivery location" as seen in the HB analysis.

| Level | ZC Diffs (Latent Class, 1 group) |
|---|---|
| Post point | -21.9 |
| Post office | -28.9 |
| Parcel locker 500 meters from my house | 9.3 |
| Parcel locker 2 km from my house | -45.8 |
| Parcel locker 4 km from my house | -94.8 |

However, the absolute values are very different. My first question therefore is: what causes this difference?
If I am correct, the average utilities in HB are the average ZC Diffs calculated by HB. And in Latent Class, we also get average ZC Diffs. But what is the difference between the ZC Diffs from HB and Latent Class?

A second question is regarding the 5-group solution I get from the Latent class output.
As you can see from the 1-group output, the level "Parcel locker 500 meters from my house" is preferred over "Parcel locker 2 km from my house" and "Parcel locker 4 km from my house", which seems logical.
However, in my 5-group solution, there is one segment that has the following ZC Diffs for "delivery location":

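| Level | ZC Diffs (Latent Class, one segment of the 5-group solution) |
|---|---|
| Post point | 53.76 |
| Post office | 63.57 |
| Parcel locker 500 meters from my house | -311.07 |
| Parcel locker 2 km from my house | 53.43 |
| Parcel locker 4 km from my house | 18.29 |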

So this is a group that really doesn't prefer "Parcel locker 500 meters from my house", but does prefer the parcel lockers at a larger distance from the house. This seems a bit strange.

I wanted to check the ZC Diffs for this segment, but I can of course only do this with the ZC Diffs from HB (respondent-level ZC Diffs are not an output of the Latent Class analysis).
But when I single out the respondents from this segment and calculate the average ZC Diffs I get the following output:

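| Level | Average ZC Diffs (HB, respondents in this segment) |
|---|---|
| Post point | -10.33 |
| Post office | -23.52 |
| Parcel locker 500 meters from my house | 7.42 |
| Parcel locker 2 km from my house | -65.27 |
| Parcel locker 4 km from my house | -128.52 |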

So now I find, for the respondents for this specific segment, that the "Parcel locker 500 meters from my house" IS preferred over "Parcel locker 2 km from my house" and "Parcel locker 4 km from my house".

So my second question is, what causes this difference? Latent Class gives me a segment that is really opposed to "Parcel locker 500 meters from my house", but when I single out these respondents and look at the HB ZC Diffs, this doesn't seem to be the case.

Many thanks for any help

For the first question, HB estimates a model for each individual and then reports the average of those models. That will differ from a 1-group Latent Class solution, which is closer to pooling all respondents into a single model that tries to fit every choice task (as if one respondent had answered all of the questions). As long as your data is good, you would expect the two approaches to agree in general, but the values will still differ because they are different ways of modeling the same data set.
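
To make the "average of individual models" versus "one group-level model" point concrete, here is a minimal sketch in Python. It assumes the usual zero-centered diffs rescaling (center each attribute, then scale so the best-minus-worst ranges sum to 100 times the number of attributes). The two respondents, their utilities, and the helper names `zc_diffs` and `average` are made up for illustration. Averaging raw utilities is only a crude stand-in for a pooled estimate, not what Latent Class actually does.

```python
def zc_diffs(utilities):
    """Rescale raw part-worths to zero-centered diffs.

    `utilities` maps attribute name -> list of raw level utilities.
    """
    # Center each attribute so its levels average to zero.
    centered = {}
    for attr, levels in utilities.items():
        mean = sum(levels) / len(levels)
        centered[attr] = [u - mean for u in levels]
    # Scale so the best-minus-worst ranges sum to 100 * number of attributes.
    total_range = sum(max(v) - min(v) for v in centered.values())
    scale = 100 * len(centered) / total_range
    return {attr: [u * scale for u in v] for attr, v in centered.items()}


def average(models):
    """Level-by-level mean over a list of utility dicts."""
    return {
        attr: [sum(col) / len(models) for col in zip(*(m[attr] for m in models))]
        for attr in models[0]
    }


# Two hypothetical respondents (made-up numbers, not from the study).
respondents = [
    {"location": [2.0, 0.0, -2.0], "price": [1.0, -1.0]},
    {"location": [0.2, 0.0, -0.2], "price": [0.5, -0.5]},
]

# HB-style reporting: normalize each respondent, then average.
hb_style = average([zc_diffs(r) for r in respondents])

# Group-level reporting: one utility vector, normalized once.
# Averaging raw utilities is only a stand-in for a pooled estimate.
group_style = zc_diffs(average(respondents))

print([round(u, 1) for u in hb_style["location"]])
print([round(u, 1) for u in group_style["location"]])
```

Both results keep the same order of levels, but the magnitudes differ because the rescaling is applied at different points: once per respondent in the first case, once for the whole group in the second.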

For the second part of your question: as with most clustering algorithms, there is no guarantee of a clean 2-, 3-, 4-, or 5-group solution every time. Sometimes one of the groups is just people who didn't fit nicely into the others, or it contains many respondents who answered more randomly than the rest. You might want to check the fit statistics of these people. If they are answering more randomly, HB may be leaning more heavily on the other respondents to "smooth" them toward the sample means, while Latent Class is going to be a lot less stable when it has to calculate a group-level solution for a set of random-answering people.
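
As a rough illustration of what such a fit check measures, the sketch below computes RLH (root likelihood) as the geometric mean of the predicted probabilities of the alternatives a respondent actually chose, under a standard multinomial logit rule. The function names and the example tasks are hypothetical. In practice you would read RLH from the software output rather than recompute it.

```python
import math


def choice_probabilities(alt_utilities):
    """Multinomial logit shares for the alternatives in one task."""
    top = max(alt_utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in alt_utilities]
    total = sum(exps)
    return [e / total for e in exps]


def rlh(tasks):
    """Geometric mean of the predicted probability of each chosen alternative.

    `tasks` is a list of (alt_utilities, chosen_index) pairs.
    """
    log_lik = sum(math.log(choice_probabilities(u)[c]) for u, c in tasks)
    return math.exp(log_lik / len(tasks))


# Hypothetical respondent with three alternatives per task.
consistent = [([2.0, 0.0, -1.0], 0), ([0.5, 1.5, -0.5], 1)]
# If the model cannot tell the alternatives apart, RLH falls to chance (1/3).
uninformative = [([0.0, 0.0, 0.0], 0), ([0.0, 0.0, 0.0], 2)]

print(rlh(consistent), rlh(uninformative))
```

With three alternatives per task, chance level is 1/3, so an RLH close to that suggests the utilities explain little about that respondent's choices.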
answered Apr 20, 2021 by Platinum (55,820 points)
Hi Brian,
The answer to my first question is clear.

For the second question, I checked the fit statistic for this segment; and the average RLH for this group is actually higher than the average RLH over all respondents.
Does this mean that this is a "real" group of people, that just doesn't prefer "Parcel locker 500 meters from my house" but DOES prefer the parcel locker at larger distances? And not just a group of people that random-answered the tasks? Or can it still be a group of respondents that "just don't fit in" anywhere else; and that is the only reason they are together?
I still find it a bit strange that the very big aversion towards "Parcel locker 500 meters from my house" is not reflected in the HB utilities.
How do you handle such a group?

Also, is it allowed to create segments with Latent Class, but then use the average HB utilities per segment to compare the segments?

Many thanks
kind regards
Tina
It sure seems weird that that one parcel locker level would be so negative, and that the averages for those same respondents would not show that preference when they are modeled with HB. I can hypothesize more, like maybe a lot of the respondents in that group don't find location important, but it is difficult to give you any sort of concrete answer without diving into the data. That's something someone from our consulting team could do as a small project if you wanted to reach out to them at analytics@sawtoothsoftware.com.

Generally speaking, I think it's much more common to use Latent Class to identify groups but use HB for running simulations. It's just so odd that you aren't seeing similar patterns between the Latent Class group and the HB averages, so again it is difficult to say sure, use the Latent Class group assignments but report averages using HB (which don't reflect the reason the Latent Class group seems to exist). Maybe that is evidence not to use the 5-group solution?
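
If you do go that route, the mechanics are simple. Here is a minimal sketch, assuming you have exported each respondent's Latent Class group assignment and their HB ZC Diffs. The respondent IDs, group labels, and numbers are made up, and `segment_means` is a hypothetical helper, not a Lighthouse function.

```python
from collections import defaultdict


def segment_means(assignments, utilities):
    """Average respondent-level utilities within each segment.

    `assignments` maps respondent id -> segment label.
    `utilities` maps respondent id -> list of level utilities.
    """
    grouped = defaultdict(list)
    for rid, segment in assignments.items():
        grouped[segment].append(utilities[rid])
    return {
        segment: [sum(col) / len(rows) for col in zip(*rows)]
        for segment, rows in grouped.items()
    }


# Hypothetical exports: Latent Class membership and HB ZC Diffs per respondent.
lc_segment = {101: "A", 102: "A", 103: "B"}
hb_zc_diffs = {101: [10.0, -10.0], 102: [30.0, -30.0], 103: [-5.0, 5.0]}

print(segment_means(lc_segment, hb_zc_diffs))
```

Comparing these per-segment HB averages against the Latent Class utilities for the same group is one way to see whether the two methods tell the same story before you commit to a given number of groups.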