I think averaging across the iterations reported in the log file and averaging across respondents should give very similar results. If you are working with respondent-level draws, though, it would be better to average the LLs or RLHs across each respondent's draws. That's what our software does when it reports the RLH alongside each respondent's point estimates (the average across the draws): we average the RLH fit across the draws for that respondent and report the result as the respondent's summary RLH.
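To make the per-respondent averaging concrete, here is a minimal Python sketch. It is not our software's actual code: the function names and the input layout (one list of chosen-alternative probabilities per draw) are assumptions for illustration, and the numbers are made up.

```python
import math

def rlh(chosen_probs):
    """RLH for one draw: geometric mean of the predicted
    probabilities of the alternatives the respondent chose."""
    total_ll = sum(math.log(p) for p in chosen_probs)
    return math.exp(total_ll / len(chosen_probs))

def summary_rlh(draws):
    """Average the per-draw RLH across all draws for one respondent."""
    per_draw = [rlh(d) for d in draws]
    return sum(per_draw) / len(per_draw)

# Made-up example: 3 draws for one respondent, 4 choice tasks each.
# Each value is the predicted probability of the chosen alternative.
example_draws = [
    [0.50, 0.50, 0.50, 0.50],
    [0.25, 0.25, 0.25, 0.25],
    [0.60, 0.40, 0.70, 0.30],
]
print(summary_rlh(example_draws))
```

The same structure works for averaging LLs instead: replace `rlh(d)` with the sum of the logged probabilities for each draw.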

Our software reports RLH (root likelihood), as you are aware. RLH is one measure of fit, but unlike Pseudo R-squared (Percent Certainty), it doesn't have the nice property of being expressed on a 0-100 scale.

But you can convert RLH scores to Percent Certainty with a bit of algebra:

To keep things simple, let's imagine we had a CBC exercise with just 4 choice tasks, where each choice task had 4 alternatives. The null log likelihood (fit due to chance) for a single respondent would be equal to 4*ln(1/4) = -5.5452.

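Let's say our software reported an RLH of 0.5 for a beta draw for that respondent. Because RLH is the geometric mean of the likelihoods across the choice tasks, we can easily convert it to log-likelihood by taking its natural log and multiplying by the number of tasks: 4*ln(0.5) = -2.7726.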

The best possible LL for a respondent is 0.

Now, Pseudo R-squared is computed by finding what percent of the way the observed fit for the beta vector lies between the null LL fit and the perfect fit (which is 0).
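For anyone who wants to script the conversion, the steps above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes every choice task has the same number of alternatives, as in the 4-task, 4-alternative example.

```python
import math

def rlh_to_percent_certainty(rlh, n_tasks, n_alternatives):
    """Convert an RLH value to Percent Certainty on a 0-100 scale.

    Assumes every task shows the same number of alternatives.
    """
    # Null LL: fit due to chance, e.g. 4*ln(1/4) = -5.5452
    null_ll = n_tasks * math.log(1.0 / n_alternatives)
    # Observed LL implied by the RLH, e.g. 4*ln(0.5) = -2.7726
    observed_ll = n_tasks * math.log(rlh)
    # The best possible LL is 0
    best_ll = 0.0
    # Percent of the way from the null fit to the perfect fit
    return 100.0 * (observed_ll - null_ll) / (best_ll - null_ll)

pc = rlh_to_percent_certainty(0.5, n_tasks=4, n_alternatives=4)
print(round(pc, 1))  # 50.0
```

So in this example an RLH of 0.5 works out to (-2.7726 - (-5.5452)) / (0 - (-5.5452)) = 0.5, or a Percent Certainty of 50. An RLH at the chance level (0.25) maps to 0, and a perfect RLH of 1 maps to 100.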