68 sats \ 2 replies \ @optimism 6h \ on: Achieving 10,000x training data reduction with high-fidelity labels AI
Another one of these and I can train the next frontier model on my RPi2B!
It seems like they're using existing statistical techniques to filter the data down to the examples that will be most impactful for training, and then pulling good examples from each group... Very cool system
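Here's a minimal sketch of the kind of curation loop that description suggests (my reading of the comment, not the paper's actual pipeline): embed the examples, cluster the embeddings, and keep only a few representatives per cluster for expert labeling. Function and parameter names are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def curate(embeddings: np.ndarray, n_clusters: int = 50, per_cluster: int = 2) -> np.ndarray:
    """Return indices of a small, diverse subset of the dataset."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # "Good examples from the group": the points closest to the cluster centroid.
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        keep.extend(members[np.argsort(dists)[:per_cluster]].tolist())
    return np.array(sorted(keep))

# e.g. curate(np.random.rand(10_000, 64)) keeps ~100 of 10,000 examples for expert labeling
```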
I was looking at these κ charts and wondered whether .38 for higher complexity and .56 for lower complexity is really a great result, if human experts reach .78 and .81 among themselves?
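For context on what those numbers mean: Cohen's kappa is agreement corrected for chance, kappa = (observed agreement − chance agreement) / (1 − chance agreement), so .38 is only "fair" agreement while .78 is "substantial". A toy example with made-up labels (not from the paper):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings from an expert and a model on six items.
expert = ["clickbait", "benign", "clickbait", "benign", "benign", "clickbait"]
model  = ["clickbait", "benign", "benign",    "benign", "benign", "clickbait"]

# 5/6 raw agreement, but chance agreement is high, so kappa comes out lower: ~0.67.
print(cohen_kappa_score(expert, model))
```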
I knew I'd seen a paper about this: https://arxiv.org/abs/2501.08167, but it's kinda stone age:
| Comparison | Percentage Agreement | Cohen's Kappa |
| --- | --- | --- |
| Human vs Claude 2.1 Ratings | 79% | 0.41 |
| Human vs Titan Express Ratings | 78% | 0.35 |
| Human vs Sonnet 3.5 Ratings | 76% | 0.44 |
| Human vs Llama 3.3 70b Ratings | 79% | 0.39 |
| Human vs Nova Pro | 76% | 0.34 |
Looks awesome when you consider that Google's results were with a 3.25B model, but the evaluation data provided in the paper was "a mockup", so we don't know if this is apples-to-apples. Nevertheless, I'm a big fan of "less junk in".