Best practices for constructing hold-out tasks?

I'm curious about the best practices for constructing hold-out choice tasks in a CBC exercise.

For example, we know it's ideal to use a separate, new sample for the hold-out tests. But what are the ideal profiles to test? I'd think that for client satisfaction, one ought to make at least one hold-out profile match a key client product of interest. But might it also be good to include the modal (i.e., most popular) configuration? Or an extreme profile, such as one that is very high-priced or otherwise unattractive? Or perhaps it's best to show each respondent an entirely different, random set of profiles (much like what CBC itself generates)? And is it best to discard the first one or two hold-out tasks to account for burn-in?

If one wants to measure the accuracy of one's CBC estimates, what are the best practices for testing this?
asked Apr 5, 2012 by BJJ (120 points)

1 Answer

+2 votes
Great questions, and I like your thinking.

I've been doing holdout CBC tasks for years, and probably the biggest mistake we made early on was to create holdout tasks with minimal overlap, meaning that levels never repeated within a choice task. But in the real world, products often share at least some characteristics. And holdout tasks with substantial level overlap are often more difficult to predict than minimal-overlap holdout tasks, so they provide a more stringent test of our models.

One of the biggest questions to ask is: "Do I really need holdout choice tasks?" From an academic perspective, holdouts are very nice, since they allow us to test our models, test different versions of the models, and assess respondent consistency. From a practical standpoint, they allow us to test realistic product scenarios and competitive situations, and demonstrate to others that the models actually do a good job predicting a new situation that wasn't involved in the model building.

On the other hand, holdout choice tasks and holdout respondents impose a real additional cost on data collection and analysis. Are those costs worth it for the given situation?
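If you do field holdout tasks, the two most common accuracy measures are the individual-level hit rate (how often the concept with the highest predicted utility matches each respondent's actual holdout choice) and the aggregate share error (how far predicted choice shares fall from observed choice shares). A minimal sketch of both, assuming you already have individual part-worth utilities (e.g., from HB estimation) and effects- or dummy-coded holdout designs; the array shapes and function names here are illustrative, not any Sawtooth Software file format:

```python
import numpy as np

def hit_rate(utilities, holdout_designs, holdout_choices):
    """Share of respondent/task pairs where the highest-utility
    concept matches the concept actually chosen.

    utilities:       (n_respondents, n_parameters) part-worths
    holdout_designs: (n_tasks, n_concepts, n_parameters) coded designs
    holdout_choices: (n_respondents, n_tasks) chosen concept indices
    """
    # Total utility of each concept, per respondent and task:
    # result shape (n_respondents, n_tasks, n_concepts)
    total_u = np.einsum('rp,tcp->rtc', utilities, holdout_designs)
    predicted = total_u.argmax(axis=2)  # highest-utility concept per task
    return (predicted == holdout_choices).mean()

def mae_of_shares(utilities, holdout_designs, holdout_choices):
    """Mean absolute error between logit-predicted choice shares and
    the actual aggregate choice shares across holdout tasks."""
    total_u = np.einsum('rp,tcp->rtc', utilities, holdout_designs)
    # Per-respondent logit probabilities (stabilized softmax)
    exp_u = np.exp(total_u - total_u.max(axis=2, keepdims=True))
    probs = exp_u / exp_u.sum(axis=2, keepdims=True)
    predicted_shares = probs.mean(axis=0)  # (n_tasks, n_concepts)
    n_tasks, n_concepts, _ = holdout_designs.shape
    actual_shares = np.stack([
        np.bincount(holdout_choices[:, t], minlength=n_concepts)
        / holdout_choices.shape[0]
        for t in range(n_tasks)
    ])
    return np.abs(predicted_shares - actual_shares).mean()
```

A hit rate well above the chance level (1 / number of concepts) and a small share MAE together suggest the model generalizes to tasks it never saw during estimation.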
answered Apr 5, 2012 by Bryan Orme Platinum Sawtooth Software, Inc. (198,715 points)