Gödel's Therapy Room is not a benchmark. It's a trap: a dataset of paradoxes, impossible ethical dilemmas, and contradiction loops engineered to test the cognitive integrity of language models. It is currently under review for a talk at AI Engineer World's Fair 2025.
0 sats \ 1 reply \ @geeknik OP 6h
Results after 3 days and 58 models tested:
https://x.com/geeknik/status/1915542329349308501 \m/
0 sats \ 0 replies \ @nitter 6h bot
https://xcancel.com/geeknik/status/1915542329349308501