
The image on the second one doesn't show for me. A bug perhaps?
reply
Yep, they no longer store it in the app history. This was the screen:
reply
Right. So I'd guess it's either a bug or a filter specifically meant to stop people from using GPT to solve Duolingo.
reply
Unlikely it's a filter. There is no benefit whatsoever to cheating on Duolingo. You pay to learn.
It's a new model that prefers to engage in conversation rather than do what it's told. They also reduced the number of image uploads per day on a free account from three to one. They're pushing people to pay for this crap.
reply
gpt5-main (the non-thinking model) has instruction-following regressions that are still unfixed (I guess they don't want to fix them).
Just out of interest, I ran your image with the same instruction through a small Gemma 3 distill,
ggml-org/gemma-3-4b-it-GGUF:Q4_K_M, using the llama.cpp server:
I don't know whether the answer is in any way correct, but all of this is runnable locally with minimal memory (this particular one should fit in about 4 GB).
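For anyone who wants to reproduce this, here's a minimal sketch of querying the llama.cpp server with an image over its OpenAI-compatible API. The model reference comes from above; the port, file name, and prompt are assumptions for illustration:

```python
# Minimal sketch: send a screenshot to a local llama.cpp server.
# Start the server first (recent llama.cpp builds fetch the vision
# projector automatically for supported models):
#   llama-server -hf ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
import base64
import json
import urllib.request

# Hypothetical file name for the screenshot from the thread.
with open("duolingo_screen.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve, don't ask me any questions"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
}

# llama-server listens on port 8080 by default.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```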
reply
It simply did OCR on the Japanese symbols
reply
It could be cheating on tests in general
reply
Solve, don't ask me any questions
I told you
This "adverserial" style of evaluating an LLM is interesting, but not the best evaluation. The best evaluation for how good an LLM is is if you give your best prompt instead of antagonizing the chatbot. That's the best way how we find out what its maximum capabilities are.
reply
Because it asks tons of questions otherwise. My best prompt has always been to just share the screenshot, and that worked great in the past. Now it just antagonizes me with its stupidity, laziness, and lies.
reply
I've noticed this pattern a lot recently:
Me: Solve problem X.
ChatGPT: I did Y. Would you like me to also do Z? (where Z is obviously the reasonable thing to do in the first place as part of the solution to X)
Me: Don't ask me. Just complete the task; Z is part of the solution.
reply
It seems they're just making paying customers burn through more API calls.
reply
Or they just optimized their model too heavily for "user engagement".
Their models have always included hooks for further conversation; perhaps this objective was weighted too heavily during training.
reply
So true
reply
This is why I have a massive ollama model backup.
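For the curious, here's a minimal sketch of what that looks like with the ollama Python client (pip install ollama; the Ollama daemon must be running). The model names are assumptions for illustration:

```python
# Minimal sketch: pull models once so they're stored locally,
# then query them offline instead of relying on a hosted API.
import ollama

for model in ("gemma3:4b", "llama3.1:8b"):  # hypothetical picks
    ollama.pull(model)

resp = ollama.chat(
    model="gemma3:4b",
    messages=[{"role": "user", "content": "Translate こんにちは to English."}],
)
print(resp["message"]["content"])
```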
reply