
The work we do as developers typically demands precision. An LLM's lack of intelligence can be well hidden in an essay it writes, which contains some facts, some claims, and plenty of filler language. But boil things down to a logic problem delivered in human language and the statistical model that an LLM is breaks down quickly. We're not looking for an answer; we're looking for the answer.
If you happen to deliver your prompt the "right" way, you can stumble upon correctness.
But if you prompt it a different way, it will be confidently wrong.
So, is this "intelligence"? Can it gather user requirements better than you? No. There is no thought behind this; it depends on you to stop asking questions once you get the right answer.
A lot of these challenges can be overcome by teaching the LLM to use a separate tool [1]. For example, the LLM doesn't need to know how to play chess; it just needs to know how to use a chess engine. It doesn't need to know how to calculate; it just needs to know how to write a Python script using NumPy/SciPy.
If you take your example and have the LLM respond with Python code that counts the letters and returns the positions, it is far more likely to get it correct. At some point the OpenAI team will figure out how to do this behind the scenes. You can already ask it to analyze a CSV file and it automatically writes Python code to do so, so this is close to reality.
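As a minimal sketch of the kind of script it could emit (the word and letter are stand-ins for whatever the prompt asks about):

```python
# Count occurrences of a letter and return their positions,
# instead of asking the model to "eyeball" the string.
def letter_positions(word: str, letter: str) -> list[int]:
    return [i for i, ch in enumerate(word) if ch == letter]

positions = letter_positions("strawberry", "r")
print(len(positions), positions)  # 3 [2, 7, 8]
```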
Eventually we'll see these systems translate prompts into logical propositions and then use tools like Coq [2] to make provably true statements. They can start leaning on compilers and language servers to error-correct as they write code. Along the way they can write tests to check their work. The bar these LLMs have to clear is just to produce no more bugs than the average software engineer, and that's honestly a fairly low bar.
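To illustrate that last point, a hypothetical generated helper could ship with its own self-check, so obvious breakage surfaces before a human ever reviews the code:

```python
# Hypothetical LLM-written helper plus a test it wrote for itself.
def median(values: list[float]) -> float:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def test_median():
    assert median([3.0, 1.0, 2.0]) == 2.0        # odd-length input
    assert median([4.0, 1.0, 3.0, 2.0]) == 2.5   # even-length input

test_median()
```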

  1. https://arxiv.org/abs/2302.04761
  2. https://coq.inria.fr/about-coq
reply
The example OP used is like saying, "look, this sports car can't go up stairs, therefore walking is the superior form of transport."
LLMs are not meant to count instances of a character in a string. We have better tools for that.
reply
That's why it's called "artificial" intelligence.
It's just predicting the statistical likelihood of the next token based on the corpus of human language in writing.
Autocorrect isn't an author; by the same token, LLMs aren't developers. But thanks to spell check and autocorrect, human authors don't need to hire as many editors and proofreaders.
Likewise, project managers may not need to hire as many devs to write software that meets their requirements.
reply
Would you therefore argue that spell check ultimately took away editing jobs?
reply
In a sense, yes. But also, spell check (and word processing in general) made it easier than ever to become an author, which may have increased demand for human editors.
Likewise, with LLMs it's never been easier to be a project manager. So the demand for human devs may still increase even if the number of devs per project decreases.
reply
I'm skeptical that LLMs make it easier to be a project manager, but I suppose time will tell.
reply
Someone who programmed in raw assembly in the 1980s would probably say a Python dev in 2010 is basically writing English (and is not a 'true' programmer).
Now imagine that Python dev looking ahead 30 years at someone chatting with an LLM to produce code. Maybe it's the PMs that are going to lose their jobs. Maybe the devs of tomorrow look a lot more like the PMs of today.
reply
Which model is this: GPT-3.5, 4, or 4o?
reply
Ok, finally found it. This is the 4o model.
reply
I just asked 4o, then swapped the model to 3.5, then back to 4o:
That's hilarious.
reply
It's kinda dumb that it considers itself two different models. It would be like you having a conversation with me, someone swapping my brain out and back in, and me apologizing for the mistake the other brain made.
reply
I still cannot understand the hype in the developer community. I tried GitHub's Copilot 6 months ago, and several plugins with ChatGPT 3.5 and 4 integrations a few months ago, and for me the experience has been very bad. At most it's a better autocomplete, but not that much better. The only time I've seen it save me time was when writing an "if this element is not in this data structure, then add it", which would have taken me 20 seconds to write anyway...
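For reference, that task is roughly this (a stand-in, not my actual code):

```python
# The entire "if this element is not in this data structure, add it" task.
items = ["a", "b"]
element = "c"
if element not in items:
    items.append(element)
print(items)  # ['a', 'b', 'c']
```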
But then I see people saying that AI is "making me 10x more productive", or that "not using it is like typing with just one hand"... I just cannot understand it.
Last week I asked one of those "10x faster" people what IDE and specific model he was using, to see if I'm doing something wrong, and he replied, "ok, if it doesn't work for your specific workflow, then too bad".
So no, I don't think our jobs are going to be taken by AIs anytime soon...
reply
The problem was produced by the client (you :D). Prompt engineering is not as simple as people expect, and the ChatGPT interface isn't the way to use it for coding anyway. It's no problem to create full-stack apps with "one" perfect prompt (really an iterative process) using GPT-4 Omni, but this requires hardcore coding skills plus excellent prompt engineering, and you should expect billions of tokens to be communicated ($5 per 1M input tokens, $15 per 1M output tokens). It takes some hours to go through the whole process, and if you've done it well you end up with a working full-stack app plus a test suite, best practices throughout, and even documentation. Otherwise you'll spend days reviewing code that has tons of bugs or doesn't work at all, and finally end up dumping the whole repo. In that case OpenAI will not refund you for the API usage xD
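For a sense of scale, the arithmetic at those quoted rates (the 80/20 input/output split here is just an assumption):

```python
# Back-of-the-envelope API cost at the quoted gpt-4o rates.
INPUT_USD_PER_M = 5.0    # $5 per 1M input tokens (quoted above)
OUTPUT_USD_PER_M = 15.0  # $15 per 1M output tokens (quoted above)

input_tokens = 80_000_000   # assumed split for a long iterative run
output_tokens = 20_000_000

cost = (input_tokens * INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
print(f"${cost:,.2f}")  # -> $700.00
```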
reply
What process are you using through the ChatGPT API to generate a full-stack app?
I imagine there are instructions and some code examples, but how does that get up to a million tokens? How many instructions do you have to add?
reply
It's iterative, and it definitely needs a bunch of scripts to handle the communication with the API and the command execution. There are some tools like gpt-engineer or gpt-pilot which, unfortunately, don't work very well; the approach is not low-level enough, in my opinion. Definitely use a VM, plus some good bash scripts for the API requests with the right parameters, so the response comes back in a specific syntax that can be parsed effectively. Use the response to either execute commands (downloads, file creation, etc.) or request more client input (since 4o this can even be the client in front of a camera ;)). Then use the console logs and other relevant log data for the next iteration. Sorry for the very rough explanation, but I think it's enough to understand why the process can produce not just a couple of million tokens, more like 100 million+. Oh, and possible but NOT recommended for obvious reasons: you could even involve wallet/payment data and give GPT the ability to create other VMs in Google Cloud, deploy, get a domain, etc.
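A very rough Python sketch of that loop (the single-command-per-turn format and the prompts are placeholders; the OpenAI client calls are the standard chat-completions API, and a real setup needs proper sandboxing and response parsing):

```python
# Rough sketch of the iterate-on-logs loop described above.
# Assumes the OpenAI Python client (v1) and OPENAI_API_KEY in the env.
import subprocess
from openai import OpenAI

client = OpenAI()
history = [{"role": "system",
            "content": "Reply with exactly one shell command per turn, "
                       "nothing else."}]

def next_command(feedback: str) -> str:
    """Send the latest logs to the model and get its next command."""
    history.append({"role": "user", "content": feedback})
    resp = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = resp.choices[0].message.content.strip()
    history.append({"role": "assistant", "content": reply})
    return reply

feedback = "Goal: scaffold a todo app with a test suite. VM is Ubuntu 22.04."
for _ in range(10):  # hard cap so the token bill stays bounded
    command = next_command(feedback)
    # A real pipeline would parse a structured response (commands, file
    # writes, questions for the user) instead of trusting raw shell text.
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=300)
    feedback = (result.stdout + result.stderr)[-8000:]  # recent logs only
```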
reply
I’m a dev. What I tell other devs is that AI and LLM/ML aren’t going to take our jobs. However, those who don’t use these tools will be left in the dust and make themselves obsolete.
reply
It’s an intelligence of its own. What’s important is that the learning curve of these language models is enormously steep. In a year or two we will be laughing at this.
reply