The larger your sample size, the smaller the MAE should be. In my experience, with about n=800, I can often get the MAE across a series of holdout tasks down to about 3 or a bit lower on average.
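For anyone who wants to see the arithmetic, here is a minimal sketch of how MAE across holdout tasks could be computed. It assumes MAE means the mean absolute difference between observed and predicted choice shares, expressed in percentage points, averaged first within each task and then across tasks. The function names and the share values are hypothetical, made up only for illustration; they are not output from any Sawtooth Software tool.

```python
def holdout_mae(observed, predicted):
    """MAE for one holdout task: mean absolute difference between
    observed and predicted shares across the task's alternatives."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def average_mae(tasks):
    """Average the per-task MAE over a list of (observed, predicted) pairs."""
    return sum(holdout_mae(o, p) for o, p in tasks) / len(tasks)


# Hypothetical shares (percentage points) for two holdout tasks,
# each with three alternatives. Assumption: shares sum to 100 per task.
tasks = [
    ([40.0, 35.0, 25.0], [43.0, 33.0, 24.0]),  # absolute errors 3, 2, 1
    ([20.0, 50.0, 30.0], [24.0, 45.0, 31.0]),  # absolute errors 4, 5, 1
]

for i, (obs, pred) in enumerate(tasks, start=1):
    print(f"Task {i} MAE: {holdout_mae(obs, pred):.2f}")
print(f"Average MAE: {average_mae(tasks):.2f}")
```

With only two tasks, as in this toy example, the average rests on very few data points, so it can swing noticeably from one task to the next.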
Just 2 holdout choice tasks are probably enough for a cursory view of whether the model seems to be predicting OK. But we've found that for strong confirmation, and to be able to use holdouts to distinguish between alternative predictive models (such as a main-effects model vs. the same model with additional interaction effects), one typically needs about 5 to 7 holdout choice tasks to have enough statistical information.
Sawtooth Software by default "suggests" 2 holdout choice tasks, mainly to introduce the user to the idea of holdout validation; but two aren't enough for the kind of thorough validation expected in academic-type reporting or journal articles.