#77
September 23rd, 2019, 12:38 AM
dok
GenCon Main Event Champion - 2010, 2011, & 2017
 
Re: 2019 C3V Playtest Tournament (Round 3 until 9/29)

So, I didn't have any hand in organizing this event, and I don't speak for the playtesting department, but I disagree with you on a pretty fundamental level, @infectedsloth. And I think this disagreement boils down to a pretty basic idea - playtesting is ultimately a qualitative analysis of results.

This isn't an absolute - of course, the playtesters look at the collected results of many games, and if one unit keeps winning and winning (or losing and losing), that's a red flag. But the idea that we get enough results to make this a matter of statistics - where we could look at the results of a bunch of games against a wide range of known armies of a given power level and parse out the power level of the unit in question from its record - is just an absolute fantasy. And FWIW, I've been involved in playtesting for commercial products (with Plaid Hat), and I'm here to tell you that it's a fantasy in commercial game production, too. It takes too many games, the results are too hard to parse, and it's way too much effort for too little return.
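Just to put some rough numbers on why that is - and this is purely an illustrative back-of-the-envelope sketch with made-up figures (the 18-12 record and 30-game sample are invented), not anything the playtesting department actually computes - look at how wide the uncertainty on a win rate stays after a realistic number of games:

    # Rough illustration with made-up numbers: how uncertain is a win rate
    # estimate after a realistic number of playtest games?
    import math

    def win_rate_interval(wins, games, z=1.96):
        """Approximate 95% confidence interval for a win rate (normal approximation)."""
        p = wins / games
        se = math.sqrt(p * (1 - p) / games)
        return p - z * se, p + z * se

    # Suppose a tested unit goes 18-12 over a round of games.
    lo, hi = win_rate_interval(18, 30)
    print(f"observed 60% win rate, 95% CI roughly {lo:.0%} to {hi:.0%}")
    # -> roughly 42% to 78%: even a record that "feels" like winning
    #    can't statistically separate a balanced unit from an overpowered one.

Shrinking that interval down to something you could actually tune points against takes hundreds of games per matchup, which is exactly the resource problem I'm talking about.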

Rather, at the most fundamental level, we rely on the impressions of our testers. We're not just collecting W/L data, we're collecting impressions. And we're in luck, because most of our testers are experienced folks who know what a strong unit or a weak unit feels like. Again, going back to my experience with commercial game development - most of the biggest mistakes I've seen in overshooting or undershooting power level happened not because too little independent data was collected, but because the data came predominantly from less experienced players deploying the tested figures in ways that didn't push them.

So... yeah, we're testing units against other units that are themselves being tested. And if we were trying to compile meaningful win/loss percentages over dozens of games against a range of armies at a specific power level, that would be a big no-no. But we're not. We don't have the resources to do that, and frankly, even if we did, there would be better things to do with them.

The one area where I kinda agree with you is that Bring 2 has some limitations here. Namely, as people figure out the power levels of these armies, we're seeing some armies get a lot more play than others. So in the end we're getting a lot of data, but maybe not as broad a data set as we'd like.