50 sats \ 0 replies \ @optimism 21h \ on: Search-capable AI agents may cheat on benchmark tests AI
I was reading about this last week, I think.
Here are the eval logs: https://huggingface.co/datasets/ScaleAI/stc/tree/main