This is a very involved question. First, two points:
1) Obtaining the highest fit in terms of RLH or Pct. Cert. is not the goal for HB and shouldn't be your goal either. HB purposefully looks for a compromise solution that fits each respondent's data reasonably well while also obtaining individual-level estimates that seem to have a high likelihood of coming from a multivariate-normally distributed population (with variances according to the priors set in the software).
If we were just interested in obtaining the highest individual-level internal fit to the data, we'd ignore the latter part of the hierarchical model (the upper-level model) and would just run individual-level logit. You can approximate this by setting the prior variance in the HB settings to be huge, such as 100, and the prior degrees of freedom to be huge, such as 5000. You'll see your fit VASTLY improves...but you'll be overfitting, and these individual-level utilities will not predict new observations (e.g., out-of-sample choices) as well.
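For readers who want to see what these two fit statistics measure, here is a minimal sketch, not our software's code. It assumes you already have, for one respondent, the model's predicted probability for the alternative actually chosen in each task, and that every task shows the same number of alternatives. The function names rlh and pct_cert are made up for illustration.

```python
import math

def rlh(chosen_probs):
    """Root likelihood: geometric mean of the predicted probabilities
    of the alternatives the respondent actually chose."""
    log_lik = sum(math.log(p) for p in chosen_probs)
    return math.exp(log_lik / len(chosen_probs))

def pct_cert(chosen_probs, n_alternatives):
    """Percent certainty: how far the log-likelihood sits between the
    chance model (0.0) and a perfect fit (1.0).
    Assumes the same number of alternatives in every task."""
    log_lik = sum(math.log(p) for p in chosen_probs)
    null_log_lik = len(chosen_probs) * math.log(1.0 / n_alternatives)
    return 1.0 - log_lik / null_log_lik

# Hypothetical respondent: 4 tasks, 4 alternatives per task.
probs = [0.60, 0.45, 0.70, 0.30]
print(rlh(probs), pct_cert(probs, n_alternatives=4))
```

Both numbers are computed on the same choices used for estimation, which is why loosening the priors pushes them up without telling you anything about holdout prediction.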
2) The default priors (prior variances and covariances) we've set in our HB software work well as long as the beta coefficients are in the expected range. We're using a prior variance of 1.0, which we've found works well for beta parameters when the design matrix is coded as effects-coded or dummy-coded (involving 1s, 0s, and -1s). However, if you code a quantitative attribute in a way that leads to betas of a very different magnitude than we're assuming, then the priors we've set aren't quite right. Of course, you could just adjust your prior variances to account for a very differently scaled X matrix. Alternatively, you could just code your X values to keep things well behaved with our default priors. We've recommended the latter in our documentation.
I've found that if you keep your quantitative (linear) coding for variables in the X matrix to differences of around single digits, then convergence tends to be good (given our default priors). The best convergence happens if the range of your quantitative coding of an attribute (for the X matrix) is around 1 or 2 units and if the values are zero-centered. However, if you code things from (say) -100 to +100 for a quantitative attribute in the X matrix, convergence will suffer and parameters may be biased. If you code things from -0.01 to +0.01, you'll also have trouble with convergence. That's because rescaling X by some factor rescales the resulting beta by the inverse of that factor, so in either of these two examples, the expected variance of the resulting beta is too far from what the priors assume.
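As a rough illustration of the recoding described above, here is a minimal sketch that zero-centers a quantitative attribute's level values and squeezes them into a range of about 2 units. The function name rescale_levels and the price values are hypothetical, and centering on the mean of the levels is an assumption on my part (centering on the midpoint would also give zero-centered values).

```python
def rescale_levels(levels, target_range=2.0):
    """Zero-center level values and rescale them so that
    max - min equals target_range.

    Returns (coded_values, scale), where scale is the number of
    coded units per original unit.
    """
    lo, hi = min(levels), max(levels)
    if hi == lo:
        raise ValueError("attribute needs at least two distinct levels")
    mean = sum(levels) / len(levels)   # assumption: center on the level mean
    span = hi - lo
    coded = [(v - mean) / span * target_range for v in levels]
    return coded, target_range / span

# Hypothetical price levels in dollars.
prices = [100, 150, 200, 300]
coded, scale = rescale_levels(prices)
print(coded)   # values to use in the X matrix
print(scale)   # coded units per dollar
```

Because utility is beta times X, a beta estimated on the coded values can be put back on a per-original-unit basis by multiplying it by scale. The rescaling only changes the units the beta is expressed in, which is the point: it keeps the beta in the neighborhood the default priors expect.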