
Solve, don't ask me any questions
I told you

This "adversarial" style of evaluating an LLM is interesting, but it's not the best evaluation. The best way to measure how good an LLM is is to give it your best prompt instead of antagonizing the chatbot. That's how we find out its maximum capabilities.

reply

Because it asks tons of questions otherwise. My best prompt has always been to just share the screenshot, and that worked great in the past. Now it just antagonizes me with its stupidity, laziness and lies.

reply

The image on the second one doesn't show for me. A bug perhaps?

reply

Yep, they no longer store it in the app history. This was the screen:

reply

Right. So I'd guess it's either a bug, or a filter to specifically stop people from using gpt to solve Duolingo.

reply

Unlikely it's a filter. There is no benefit whatsoever to cheating on Duolingo. You pay to learn.

It's a new model that prefers to engage in conversations rather than do what it is told. They also reduced the number of picture uploads per day from 3 to 1 on a free account. Push people to pay for this crap.

reply

It could be cheating on tests in general

reply

gpt5-main (the non-thinking model) has instruction-following regressions that are still unsolved (I guess they don't want to fix them).

Just out of interest I ran your image with the same instruction through a small gemma3 distill:

ggml-org/gemma-3-4b-it-GGUF:Q4_K_M using llama.cpp server:

I don't know if the answer is in any way correct, but all of this is runnable locally with minimal memory (this particular model should run in about 4 GB).
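For anyone who wants to reproduce this, a sketch of the invocation (flags may differ slightly depending on your llama.cpp version; `-hf` pulls the quantized model straight from Hugging Face on first run):

```shell
# Start a local llama.cpp server with the 4-bit Gemma 3 4B quant.
# The model (~3 GB) is downloaded and cached on first launch.
llama-server -hf ggml-org/gemma-3-4b-it-GGUF:Q4_K_M --port 8080

# Then send the image plus instruction via the OpenAI-compatible endpoint,
# or just use the built-in web UI at http://localhost:8080
```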

reply

It simply did OCR on the Japanese symbols

reply

I noticed this pattern recently a lot

Me: Solve problem X.
ChatGPT: I did Y. Would you like me to also do Z? (where Z is obviously the reasonable thing to do in the first place as part of the solution to X)
Me: Don't ask me. Just complete the task; Z is part of the solution.

reply

It seems they just want paying customers to burn through API calls

reply

Or they just optimized their model for 'user engagement' too much.

Their models have always had hooks for further conversation. Perhaps that objective was weighted too heavily in training.

reply

So true

reply

Tried that absolutely useless piece of garbage again ((

https://chatgpt.com/share/68f7579b-a70c-800c-b98f-d9fd5962c899

reply

This is why I have a massive ollama model backup.
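In case anyone wants to do the same, a minimal sketch of building up a local model stash with the ollama CLI (model tags are examples; check the ollama library for what's current):

```shell
# Pull a few models so they're cached locally and usable offline
ollama pull gemma3:4b
ollama pull llama3.2

# Verify what's stored locally
ollama list

# Run one interactively
ollama run gemma3:4b
```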

reply