Tricky question, indeed. The scale factor issue can cause trouble unless you watch for it. The greater the response error, the lower the scale: the parameters shrink toward zero, so the differences between the utilities are smaller. The lower the response error, the higher the scale and the larger the magnitude of the parameters.
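For readers who want the algebra behind this, here is the standard multinomial logit setup written out. The symbols (V for the systematic utility, mu for the scale, sigma for the standard deviation of the Gumbel error) are my own notation for illustration, not anything specific to a particular software package.

```latex
% Choice probability for alternative i among k alternatives
P(i) = \frac{\exp(\mu V_i)}{\sum_{j=1}^{k} \exp(\mu V_j)},
\qquad V_i = \beta^{\top} x_i

% Scale is inversely related to the error standard deviation
\operatorname{Var}(\varepsilon) = \sigma^2 = \frac{\pi^2}{6\mu^2}
\quad\Longrightarrow\quad
\mu = \frac{\pi}{\sigma\sqrt{6}}

% Only the product is identified, so the estimates absorb the scale
\hat{\beta} \approx \mu\,\beta
```

Since only the product of scale and parameters can be estimated, a respondent with more response error (larger sigma, smaller mu) ends up with raw utilities that are pulled toward zero.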
One of the biggest pitfalls is comparing respondents or groups of respondents on the raw parameters from logit-based estimation routines (aggregate logit, latent class, or HB). Unless you normalize the scale so that each respondent (or group of respondents in the case of LC) has the same range of utility scores, you can draw incorrect conclusions due to large differences in scale factor. And, if you used raw utilities in k-means clustering, the scale differences could become the main driver of cluster membership (bad!).
Luckily, Rich Johnson noted a related issue with ratings-based ACA back in the 1980s. He applied a normalization transformation to the utilities, so that our software would report average utilities for groups or the market as a whole on the normalized scale. (He also recommended using the rescaled scores when submitting utilities to k-means clustering.) Today, when presenting average utility results in our simulation routines, we use a very similar normalization procedure called "Zero-Centered Diffs" that ensures that each respondent gets equal weight. It is a post hoc "band-aid" to try to remove the differences in scale factor.
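To make the normalization idea concrete, here is a minimal sketch in Python. It assumes the usual description of Zero-Centered Diffs: center the part-worths within each attribute, then rescale each respondent so that the ranges (best level minus worst level) summed across attributes equal 100 times the number of attributes. The function name, the `attribute_slices` argument, and the toy data are all hypothetical, made up for illustration; this is not our software's actual code.

```python
import numpy as np

def zero_centered_diffs(utilities, attribute_slices):
    """Rescale raw part-worths so every respondent has the same total range.

    utilities: array of shape (respondents, levels), raw part-worths.
    attribute_slices: one slice per attribute, marking that attribute's
        columns (hypothetical layout, assumed for this sketch).
    """
    u = np.array(utilities, dtype=float)  # copy so the input is untouched
    # Step 1: zero-center the levels within each attribute, per respondent.
    for cols in attribute_slices:
        u[:, cols] -= u[:, cols].mean(axis=1, keepdims=True)
    # Step 2: sum of (max - min) across attributes, per respondent.
    total_range = sum(
        u[:, cols].max(axis=1) - u[:, cols].min(axis=1)
        for cols in attribute_slices
    )
    # Step 3: rescale so the ranges sum to 100 * number of attributes.
    # (Assumes no respondent has completely flat utilities.)
    factor = 100.0 * len(attribute_slices) / total_range
    return u * factor[:, None]

# Toy data: two attributes (3 levels and 2 levels). Respondent 2 has the
# same preferences as respondent 1 but three times the scale.
attribute_slices = [slice(0, 3), slice(3, 5)]
raw = np.array([
    [1.0, 0.0, -1.0, 0.5, -0.5],
    [3.0, 0.0, -3.0, 1.5, -1.5],
])
normalized = zero_centered_diffs(raw, attribute_slices)
print(normalized.round(2))
```

After rescaling, the two toy respondents have identical rows, which is the point: differences that were purely scale no longer show up in averages or in the distances a k-means routine would compute.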
Market simulations, however, use the raw utilities with their potential differences in scale. This has been argued to be proper, since respondents who have more response error should also have probability predictions for market choices that are more even (closer to 1/k, where k is the number of alternatives in the market simulation). And scale factor differences across people have less impact in market simulations, since each respondent's predicted choice probabilities must sum to 1.0.
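A small sketch of that last point, assuming a plain logit (share of preference) rule for one respondent and k = 3 alternatives. The utilities and the two scale multipliers are made-up numbers, and `logit_shares` is a hypothetical helper rather than any particular simulator's function.

```python
import numpy as np

def logit_shares(product_utilities):
    """Logit choice probabilities for one respondent across k alternatives."""
    u = np.asarray(product_utilities, dtype=float)
    u = u - u.max()          # subtract the max for numerical stability
    expu = np.exp(u)
    return expu / expu.sum()  # probabilities sum to 1.0 by construction

# Same preference ordering, two different scale factors (illustrative values).
base = np.array([1.0, 0.5, -0.5])
high_scale = logit_shares(2.0 * base)    # low response error
low_scale = logit_shares(0.25 * base)    # high response error

print(high_scale.round(3), high_scale.sum())
print(low_scale.round(3), low_scale.sum())
```

The low-scale respondent's shares sit much closer to 1/3 each, while the high-scale respondent concentrates on the preferred alternative. Either way each respondent contributes exactly 1.0 to the simulation, which limits how much scale differences can distort the aggregate shares.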
I don't know whether one discrete-choice based conjoint method should be more immune to scale effects than another.