Hey, creator of Social RV here! Awesome to hear you're enjoying what we're doing.
Some answers to your questions:
- the target pool has 275 targets in it
- we USED to use the last 9 targets as decoys, but changed to randomly sampling 9 targets from the pool instead several months ago. I've updated the FAQ to reflect that
- the unique identifiers we show the LLM for the decoy targets are not the file names but rather the DB primary keys for those targets. There should be no information in them that the AI could use to bias a decision
- regarding the tournament-style elimination, we have a new judge coming out soon that does a single pass. When this was originally built, a single-pass judge wasn't reliable enough on the models available at the time
Thanks very much for your thoughtful feedback and questions about what we're doing!
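For anyone curious what the random decoy sampling looks like, here is a minimal sketch, not our production code. The names `pick_decoys` and `build_lineup` are made up for illustration, the integer IDs stand in for DB primary keys, and excluding the real target from the decoy draw is an assumption of the sketch.

```python
import random

POOL_SIZE = 275   # pool size mentioned above
NUM_DECOYS = 9    # decoys sampled per session, as mentioned above


def pick_decoys(pool_ids, target_id, k=NUM_DECOYS, rng=None):
    """Sample k distinct decoy IDs uniformly from the pool.

    Assumption: the real target is excluded from the decoy draw.
    """
    rng = rng or random.Random()
    candidates = [pid for pid in pool_ids if pid != target_id]
    return rng.sample(candidates, k)


def build_lineup(pool_ids, target_id, rng=None):
    """Return the target plus its decoys in shuffled order.

    Only opaque IDs are returned, so neither the ordering nor a
    file name hints at which entry is the real target.
    """
    rng = rng or random.Random()
    lineup = pick_decoys(pool_ids, target_id, rng=rng) + [target_id]
    rng.shuffle(lineup)
    return lineup


# Hypothetical pool: integers standing in for DB primary keys.
pool_ids = list(range(1, POOL_SIZE + 1))
lineup = build_lineup(pool_ids, target_id=42, rng=random.Random(0))
```

The seeded `random.Random(0)` is only there to make the example repeatable; a real draw would not use a fixed seed.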
You should do a Show HN post; it's definitely interesting. This is one of the few cases where a blockchain is actually useful.
And maybe find a statistician to define a proper metric for statistical significance here. My gut feeling is that naively using a uniform distribution of ratings as the null hypothesis isn't correct (see https://news.ycombinator.com/item?id=46438181)
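To make the concern concrete, here is a rough sketch of the naive baseline I mean, not a recommendation. It assumes each session is an independent forced choice among the target plus 9 decoys, so pure chance gives a 1-in-10 hit rate; the function name `binomial_p_value` is made up for illustration.

```python
from math import comb


def binomial_p_value(hits, trials, p_chance=0.1):
    """One-sided exact binomial tail: P(X >= hits) under the naive null.

    The default p_chance=0.1 assumes 1 target + 9 decoys with every
    candidate equally likely to be picked, which is exactly the
    assumption that may not hold in practice.
    """
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Example: chance of 20 or more hits in 100 sessions under the naive null.
p = binomial_p_value(hits=20, trials=100)
```

If the judge has systematic preferences (say, it favors certain kinds of images regardless of the session), the real chance rate per session is not 0.1 and this p-value is off, which is why a proper null model is worth working out.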