After recreating the holdout tasks in the simulator and running the specified model to predict the answers, how can I compare the predictions to the actual answers and calculate the hit rate?

From the simulator, export respondent-level choice predictions for each holdout task, and assume the alternative with the highest predicted share is the one chosen.
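As a rough sketch of that step, assuming the export gives you one list of predicted shares per holdout task for each respondent (the layout, the numbers, and the name `predicted_choices` are made up for illustration, not the simulator's actual export format), picking the highest-share alternative could look like this:

```python
def predicted_choices(shares_by_task):
    """Return the 1-based index of the highest-share alternative for each task.

    If two alternatives tie for the highest share, the first one wins.
    """
    return [
        max(range(len(shares)), key=lambda i: shares[i]) + 1
        for shares in shares_by_task
    ]


# Hypothetical predicted shares for one respondent across 3 holdout tasks,
# each task showing 3 alternatives.
jones_shares = [
    [0.10, 0.65, 0.25],
    [0.50, 0.30, 0.20],
    [0.20, 0.35, 0.45],
]

jones_predicted = predicted_choices(jones_shares)
print(jones_predicted)  # [2, 1, 3]
```

How you break ties is a judgment call; taking the first alternative is just the simplest option.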

Now, let's say you had 3 holdout tasks. Each task where the prediction matches the respondent's actual choice is a hit. So if, for a given respondent Jones, your model correctly predicts one of the three choices, your hit rate for that respondent is 33.33%. Three hits would be 100%, zero hits 0%, and so on. Now average that across respondents. Voila, a hit rate.
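Here is a minimal sketch of that calculation, assuming you already have the predicted and actual choices as 1-based alternative numbers keyed by respondent (the respondent names, the data, and the `hit_rate` function are hypothetical):

```python
def hit_rate(predicted, actual):
    """Compute per-respondent hit rates and their average across respondents.

    predicted, actual: dicts mapping respondent id -> list of chosen
    alternatives, one entry per holdout task, in the same task order.
    """
    per_respondent = {}
    for rid, preds in predicted.items():
        choices = actual[rid]
        # A hit is a task where the predicted alternative equals the actual one.
        hits = sum(p == a for p, a in zip(preds, choices))
        per_respondent[rid] = hits / len(preds)
    overall = sum(per_respondent.values()) / len(per_respondent)
    return per_respondent, overall


# Hypothetical data: 2 respondents, 3 holdout tasks each.
predicted = {"Jones": [2, 1, 3], "Smith": [1, 1, 2]}
actual = {"Jones": [2, 3, 1], "Smith": [1, 1, 2]}

per_respondent, overall = hit_rate(predicted, actual)
print(per_respondent)          # Jones gets 1 of 3 right, Smith 3 of 3
print(f"{overall:.2%}")        # 66.67%
```

Because every respondent sees the same number of holdout tasks here, averaging the per-respondent rates gives the same result as dividing total hits by total tasks.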

