287 sats \ 1 reply \ @optimism 7h \ on: GDPval: Measuring the performance of our models on real-world tasks - OpenAI AI
This is why I feel that this is all a sales pitch. Also, I don't hire in the bottom 50%. I hire in the top 2%. Get 100 resumes, burn 98, invite 2, hire 1.
The other thing is that I'd be Gell-Mann-amnesia-style betraying my own conscience by believing this, as just this morning I got code that didn't work when I tried something. And it wasn't even that hard to do it right. So expert level? No. Only if you are a lil yolo bitch with a big mouth on twitter that calls themselves an expert. In that case, you shall lose your internet credits. Preferably yesterday.
Well, a lot of this really depends on who the "expert" humans in the blind test were.
I read 50% on the chart to mean that it is a coin toss whether graders thought the human or the ai did better work. Less than 50% means graders tended to rank the ai as doing worse work than the humans. Greater than 50% means they tended to rank the ai as doing better work than the humans.
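To make that reading concrete, here is a rough sketch of how I picture the number being computed. The verdict list is made up for illustration, and the function name `ai_win_rate` and the choice to ignore ties are my own assumptions, not anything taken from the OpenAI write-up.

```python
from collections import Counter

def ai_win_rate(verdicts):
    """Fraction of decided blind comparisons in which graders preferred the AI output.

    Assumption: each verdict is the string "ai" or "human"; anything else
    (for example a tie) is ignored.
    """
    counts = Counter(verdicts)
    decided = counts["ai"] + counts["human"]
    if decided == 0:
        raise ValueError("no decided comparisons")
    return counts["ai"] / decided

# Hypothetical grader verdicts, invented for illustration only.
verdicts = ["ai", "human", "human", "ai", "human", "ai", "human", "human"]
rate = ai_win_rate(verdicts)
print(f"AI win rate: {rate:.1%}")
```

With these made-up verdicts the ai wins 3 of 8 comparisons, so it lands below the 50% coin-toss line. The same number would mean very different things depending on who the humans on the other side were.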
So the important factor is whether the humans the ai was graded against were "top 2%" kind of people.
Also, the point about ai failure being more likely to be catastrophic is valid.
Finally, I'd say I have no doubt that OpenAI is pumping their own bags with a sales pitch in every piece of info they put out. But even so, there is something here.
It feels to me like when social media was bursting onto the scene. I mostly dismissed it because I didn't see the utility and I didn't trust the promoters. Yet lately I have come around, and I see that there may be some utility here. It may be an open question whether it is a net benefit, but it certainly is a powerful tool to do something. I see ai in the same light (and perhaps I'm just scared of repeating what I now see as a mistake in my attitude toward social media).