Calculating the model null log-likelihood

Hi all,

Despite some very informative posts like https://legacy.sawtoothsoftware.com/forum/12252/rlh-and-percent-certainty I am still struggling to understand and calculate model-related indicators like the log-likelihood.

- Assume I have 15 choice tasks and 800 respondents.
- Each choice task consists of 2 alternatives plus one "none" alternative, which makes 3 alternatives in total per choice task.

HB Analysis gives me each respondent's RLH and an average (model) RLH of about 0.6.

The null LL (the "floor", as it is called in the post mentioned above) would be:
"number of choice tasks" * LN(1/"number of alternatives") = 15 * LN(1/3) = -16.479. This is the benchmark for the lowest fit of my model: an LL at or below that value (more negative) would mean my model shows no improvement over a completely random model.

The best possible solution would be 15 * LN(1.0) = 0.000.

What I am struggling with is calculating the actual model LL. Is it obtained simply by summing the LL values of all 800 respondents? That would lead to a total LL of about -7000. But I assume I have misunderstood something, as this would not make any sense (my model would be far worse than a completely random model, which cannot be the case).

Also this would lead to a ridiculous result when calculating the pseudo R-squared.

Happy for any ideas on that.
Thanks!
asked Jun 7, 2021 by bs77 Bronze (855 points)
edited Jun 7, 2021 by bs77

1 Answer

+1 vote
You're almost there: just multiply the per-respondent "floor" you calculated above by the number of respondents. That gives -16.479 x 800 = -13,183.2, the null LL to compare your summed model LL against, and you're home.
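To make the arithmetic concrete, here is a minimal Python sketch of the null LL for the setup in the question (15 tasks, 3 alternatives, 800 respondents). The function name `null_log_likelihood` is only illustrative and is not part of any Sawtooth Software tool.

```python
import math

def null_log_likelihood(n_tasks, n_alternatives, n_respondents=1):
    """LL of a model that gives every alternative the same choice probability."""
    return n_respondents * n_tasks * math.log(1.0 / n_alternatives)

# Values taken from the question: 15 tasks, 3 alternatives, 800 respondents.
per_respondent = null_log_likelihood(15, 3)
total = null_log_likelihood(15, 3, n_respondents=800)

print(round(per_respondent, 3))  # -16.479
print(round(total, 1))           # -13183.3
```

The total differs from -13,183.2 in the last digit only because that figure multiplies the already rounded per-respondent value (-16.479) by 800.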
answered Jun 7, 2021 by Keith Chrzan Platinum Sawtooth Software, Inc. (111,275 points)
In addition to Keith's correct direction, remember that our software reports an RLH (root likelihood) for each respondent. That RLH is the average of the RLHs across the respondent's used draws. So if you are trying to reproduce a respondent's RLH, compute the RLH for each used draw and then average those values.
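As a rough illustration of that averaging, the sketch below treats RLH as the geometric mean of the probabilities of the chosen alternatives and then averages it over draws. The probabilities are made-up numbers (3 draws, 4 tasks), and the function names are hypothetical; this is not the software's internal code.

```python
import math

def rlh(chosen_probs):
    """Root likelihood: geometric mean of the chosen alternatives' probabilities."""
    log_likelihood = sum(math.log(p) for p in chosen_probs)
    return math.exp(log_likelihood / len(chosen_probs))

def mean_rlh_over_draws(draws):
    """Average the per-draw RLH values for one respondent."""
    return sum(rlh(d) for d in draws) / len(draws)

# Hypothetical data: each inner list holds one draw's predicted probabilities
# for the alternatives this respondent actually chose in each task.
draws = [
    [0.7, 0.5, 0.6, 0.8],
    [0.6, 0.6, 0.5, 0.7],
    [0.8, 0.4, 0.7, 0.9],
]
respondent_rlh = mean_rlh_over_draws(draws)

# A purely random model with 3 alternatives has an RLH of 1/3.
chance_rlh = rlh([1.0 / 3.0] * 15)
```

Because the reported value is an average of per-draw RLHs, converting it back with "number of tasks" * LN(RLH) gives only an approximation of a respondent's LL.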
Thanks to both of you. That clarified a lot. So, to calculate the information explained compared with a totally random model with the same number of attributes and levels, one can do the following, right?
(LH of estimated model * 100) / LH null model -> Improvement in percentage
Almost:  rho-squared = 1-(log likelihood of estimated model/log likelihood of the null model).
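Plugging in the numbers from this thread gives a quick check of that formula. This is only a sketch: the model LL of -7000 is the approximate total quoted in the question, and `rho_squared` is an illustrative name.

```python
import math

def rho_squared(ll_model, ll_null):
    """Rho-squared: 1 - (LL of estimated model / LL of null model)."""
    return 1.0 - ll_model / ll_null

# Null LL for 800 respondents, 15 tasks, 3 alternatives per task.
ll_null = 800 * 15 * math.log(1.0 / 3.0)
# Approximate total model LL quoted in the question (an assumption here).
ll_model = -7000.0

rho2 = rho_squared(ll_model, ll_null)
print(round(rho2, 3))  # 0.469
```

A model no better than chance (ll_model equal to ll_null) gives 0, and a perfect model (ll_model of 0) gives 1.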
...