I don't think it makes any sense at all to seed a new player over someone like Megasilver, who has a slightly below-50% record but has proven himself to be a solid player.
My gut says a new player should just be given a matchup with someone who has a middling ranking in their tier for Rounds 1 and 2. So looking at how you do the matchups for those rounds, it would make sense to have them seeded as if they were ranked last. That actually produces a fair matchup for them.
As I alluded to in the S10 thread, the one change in our systems since the last time I explained them (at your request) was that we no longer boost new player rankings at all. They get thrown in the same pot as everyone else.
That said, because of the way we pair rounds 1 and 2, this naturally leads to them drawing mid-tier opponents for the first two rounds. (This was actually the rationale for getting rid of the artificial boost - we realized that it was redundant given the pairing rules.)
I also think it's a large overstatement to say the system "drastically favors people with more total games played". Case in point would be that Megasilver is quite a bit behind people with drastically fewer games, like robber, who is 9-6.
Very few people can see the actual value differential between two players in terms of a ‘rating value’. We can only see W/L and rank. Based on that information, your comparison seems like a poor example. Robber has a 60% win rate and Mega is at 48%. I also know that in Season 6 Robber was runner-up in the tournament, beating 4 people in the top 25 (including two people in the top 10), with his only played loss being to the #1 ranked player. That tells me that his ‘strength of schedule’ should be relatively high (obviously I am basing this declaration off that result only, but that is an impressive result).
Right, and if that were his only result he would be higher. Note, however, that his record is 9-6. He was 6-2 in that event, but 3-4 in the other two events he's played in.
This is my point, in a roundabout way: very few players actually maintain those sorts of results over an extended period.
So I would expect him to be significantly higher than Megasilver if quantity of games were negligible. That they are only 2 ranking spots away from each other tells me that quantity of games has a significant effect on their overall ranking.
I suspect that if Robber’s results were normalized to the same number of games as Megasilver, he would be significantly higher.
That's absolutely true. If we ranked everyone assuming the same variance (i.e. as if they had all played with the same consistency over the same number of games) then robber would be a few spots higher than he is, and Megasilver would be a lot lower than he is.
As I've said before, these are conservative estimates of skill. The algorithm is saying "we are very confident that this player is at least this good". In the case of Megasilver, the algorithm is very confident that he's at least fairly good.
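To make the "conservative estimate" idea concrete, here's a rough sketch. It assumes the ranking value is the mean minus some multiple of the standard deviation; the multiplier k = 3 is the convention commonly used with Trueskill leaderboards, not something confirmed for our setup, and the two players' numbers are made up purely for illustration.

```python
def conservative_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """Lower bound on skill: 'we are very confident the player is at least this good'.

    k = 3 is an assumed multiplier, not a confirmed league setting.
    """
    return mu - k * sigma


# Hypothetical players (illustrative numbers only):
# a veteran with a modest mean but many games, so low uncertainty...
veteran = conservative_rating(mu=27.0, sigma=1.5)   # 27.0 - 4.5 = 22.5
# ...and a newcomer with a higher mean but few games, so high uncertainty.
newcomer = conservative_rating(mu=30.0, sigma=4.0)  # 30.0 - 12.0 = 18.0

# The veteran ranks higher even though the newcomer's mean is better.
veteran_ranks_higher = veteran > newcomer
```

Under those assumptions the veteran sits above the newcomer until the newcomer plays enough games to shrink sigma.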
Robber or Boromir would probably be the best case sample for normalization. If Boromir’s results of 10-3 were projected to a 40-12 record with the exact same opponents played as he did in his first 13, then I suspect he would be top 5 easily. And if that’s the case then it tells me that the number of games is a significant factor.
Boromir would not be top 5 based solely on our mean estimate of his skill, but he would be top 10. (The top 5 would actually be unchanged, although the order would be different.) You would also have stuff like gurdil in the top 25 based on his 3-0 record in the team tournament.
Again, though, it's completely reasonable to be suspicious of a good record built on a small number of games. Heroscape is a game with variance built in. Most players who have started out with 3:1 win ratios (or posted 3:1 win ratios for some short period) have not been able to maintain those win ratios. If we assumed that everyone who played that well for a brief stretch was good enough to do that consistently, we would see more retrospective errors in our rankings than we have. In other words, it's better to underestimate Wanderer for a little while than to overestimate a lot of other people for similar lengths of time.
The reality is that the ranking algorithm is appropriately suspicious of people who start off with a big winning streak. Lots of players have maintained a win ratio of 3:1 or better for a brief stretch. For most of them, this is a statistical blip, and they fall back into a lower win ratio. The players who have managed to maintain such an impressive ratio for a significant number of games have found their way into the top 5.
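To put a rough number on the "statistical blip" point, here's a back-of-the-envelope sketch. It assumes every game is an independent coin flip with a fixed win probability, which is a simplification and not how the ranking algorithm itself works. The function name is just for illustration.

```python
from math import comb


def prob_at_least(wins: int, games: int, p: float) -> float:
    """Chance of winning at least `wins` of `games` independent games,
    each won with probability p (binomial upper tail)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# A perfectly average player (p = 0.5) going 10-3 or better over 13 games:
short_run = prob_at_least(10, 13, 0.5)  # 378/8192, about 4.6%

# The same player holding that ratio over 52 games (40-12 or better):
long_run = prob_at_least(40, 52, 0.5)   # far smaller than short_run
```

So under this toy model roughly one average player in twenty would post a 10-3 start by luck alone, while keeping that pace over 52 games by luck is much less likely. That's why a long record at that ratio tells the algorithm far more than a short one.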
I have some questions about the rating system.
1) Does a forfeit loss count in these ratings?
No, we remove forfeits.
2) Is there a static threshold where number of games played causes the ‘confidence’ value to stabilize, or does that confidence value fluctuate as a percentage of the total number of games played? (or total games played by the player with the highest amount of games played)
If you want details on how Trueskill and Trueskill through Time work, I suggest the links in the OP of this thread. (FWIW we are now updating using Trueskill through Time during the seasons as well.)
Each player's ranking has a mean and a variance. The variance starts high and falls as a player plays more games. Results that are consistent with your previous results will drive your variance down more quickly. Games against players with low variance will drive your variance down more quickly as well (so, beating me will lower your variance more quickly than beating vegie's dad, because the algorithm has more confidence in the significance of a win against me).
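For anyone who wants to see the variance effect in numbers, here's a minimal sketch of the standard one-game, no-draw Trueskill update for the winner. This is plain Trueskill, not Trueskill through Time, and it's not our actual code; the starting values (mean 25, sigma 25/3, beta 25/6) are the usual published defaults and are assumptions here.

```python
from math import sqrt
from statistics import NormalDist

BETA = 25 / 6          # assumed per-game performance noise (common default)
_STD = NormalDist()    # standard normal, used for pdf/cdf


def update_winner(mu_w, sigma_w, mu_l, sigma_l, beta=BETA):
    """Return the winner's (mean, sigma) after one win, with no draws allowed."""
    c = sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)  # total uncertainty in the result
    t = (mu_w - mu_l) / c                            # how expected the win was
    v = _STD.pdf(t) / _STD.cdf(t)                    # size of the mean shift
    w = v * (v + t)                                  # size of the variance shrink
    new_mu = mu_w + (sigma_w**2 / c) * v
    new_sigma = sigma_w * sqrt(1 - (sigma_w**2 / c**2) * w)
    return new_mu, new_sigma


START_MU, START_SIGMA = 25.0, 25 / 3  # assumed starting rating for a new player

# Same new player, same opponent mean; only the opponent's uncertainty differs.
_, sigma_after_veteran = update_winner(START_MU, START_SIGMA, 25.0, 1.0)
_, sigma_after_rookie = update_winner(START_MU, START_SIGMA, 25.0, START_SIGMA)
```

With these assumed numbers the new player's sigma drops further after beating the low-variance opponent than after beating another brand-new player, which is the effect described above.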
If everyone has played only 5 games, do I have a very high confidence value because I have the same number of games as everyone else, or does everyone simply have the same confidence value?
If everyone has 5 games played, then everyone's variances will be fairly similar (not identical - someone who is 5-0 or 0-5 will probably have a lower variance than someone who is 3-2). They will all be relatively high, though.
If I have 5 games played, and everyone else has 20 games played, do I have the same confidence value as in Scenario 1 above, or is it lower?
Your variance is probably going to be a little lower just because the people you played will have lower variances. So your "confidence value", to put it in your terms, would actually be (slightly) higher.
3) How is your rating affected by previous opponents’ future results? I have heard that it is definitely a factor, but is it relatively small or relatively large? If I beat someone when they were 0-0 all time and they later are #1 with a record of 19-1, at the point they became 19-1 would I have the same ranking at 1-0 as if I beat them when they were 19-0?
Pretty much, yeah. When you beat them doesn't matter, except that if you have two games on your record and one of them is a long time ago, the more recent game will matter more.
This is the big difference between Trueskill through Time and regular Trueskill. And I want to emphasize again what an achievement it is that we have TTT in use. We really are on the cutting edge of ranking algorithms here. Xorlof deserves a huge amount of credit.
And I hope I don’t sound like I’m picking on Megasilver or anything here. His case is the best one to discuss because he is under .500 and on the top 25 list.
No, by all means, pick on Megasilver.
As always, thanks for the discussion.