2024-08-14T15:17:02+00:00
Thoughts on lmsys. Say a large cohort of human annotators gives feedback for RLHF and generates training data, and you train a model on that. Then you direct the same cohort of people onto lmsys to vote. Your model will score artificially well, because it has been tuned to the preferences of those specific people rather than those of the broader user base.