AI and replacing people
A tale of a new manager and new hires, a parable that doesn't quite absolve AI slop and fantasizing.
I have been dribbling out posts recently on the AI that is coming for our jobs. My rationale has been that:
AI (Generative AI, as in chatbots and diffusion models) is able to take commands and generate output
This output seems reasonable
The higher in the org a leader is, the more they’re convinced that AI (again, Generative AI) is the bee’s knees (as their output is high-level, flowery stories without much detail, but it sounds good)
The further down in an organization you go, the more the output of GenAI models falls short.
By and large, this is because AI as it exists today is not really “smart,” and it doesn’t really reason.
Sure, there are chain-of-reasoning models that attempt to iteratively improve their performance to be less … well … bullshit. Yet, as we are seeing with the latest from OpenAI, their o4 model is more prone to hallucination than the prior release.
Yet, that is not stopping corporations from pushing the pedal to the metal to replace workers with AI1.
That brings me to an outstanding piece by Shrivu Shankar:
In this post, he describes a hypothetical new leader who hires three people: a Market Analyst, a Coder, and a Recruiter. This new manager then gives them each a task with some common constraints, and he walks through the shortcomings and how this leads to “hallucination” by the flesh-and-blood hires.
I strongly recommend reading this, even if you are not a people manager or have never been in this situation2.
In short, the terse instructions lead to some ambiguity, some conflicting instructions, and then some corrective actions.
The point being that the hallucinations happen with real people, and that by increasing the specificity of the asks, with a cascading ladder of constraints, you can “train” the LLM to get closer to the original intent.
The author makes this comparison in good faith, and I agree with his reasoning.
That is, you hire three people, you spit orders at them with one or two clarifying blurbs, and you iterate.
So far, so good. (although, I will point out that Alice probably needed some coaching on how to onboard and prepare her new team to perform)
The point being that even in meat-space, people make mistakes, misinterpret commands, and get it wrong. (And that it is incumbent on Alice, the “leader” to recognize these shortfalls, and to correct the behavior in the future)
Shrivu’s point is that you can work with LLMs to get there. This is the whole “prompt engineering” phenomenon, which shifts the blame for bad results from the model to the person interacting with it.
And that is only half the story.
The Real Difference between people and LLMs
As I mentioned, that goes only so far. Yes, you can learn to prompt better, to reject bogus results, and to refine your asks until you get a decent result.
And in this case, with Alice presumably using three different models to perform the three tasks, Shrivu lays out how to build better constraints into the prompts to get closer to an optimal result.
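To make that “cascading ladder of constraints” idea concrete, here is a minimal sketch of what iterative prompt refinement can look like. This is my own illustration, not code from Shrivu’s post; `call_model` is a hypothetical stand-in for whatever LLM API you actually use.

```python
# A minimal sketch of "prompt engineering as a ladder of constraints".
# call_model is a placeholder for a real LLM API call (OpenAI, Anthropic, etc.);
# it is stubbed out here so the example runs on its own.

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned answer."""
    return f"[model output for a {len(prompt.split())}-word prompt]"

def build_prompt(task: str, constraints: list[str]) -> str:
    """Stack the original task on top of every constraint gathered so far."""
    lines = [f"Task: {task}"]
    lines += [f"Constraint {i + 1}: {c}" for i, c in enumerate(constraints)]
    return "\n".join(lines)

task = "Summarize the competitive landscape for our widget product."
constraints: list[str] = []

# Each round, a human reviews the output and, if it misses the intent,
# adds another constraint -- the "cascading ladder".
for feedback in [
    "Limit the summary to our top three competitors.",
    "Cite only sources published in the last twelve months.",
    "Flag any claim you cannot attribute to a source.",
]:
    output = call_model(build_prompt(task, constraints))
    print(output)  # a person still has to judge whether this is usable
    constraints.append(feedback)
```

Note that the loop never decides anything on its own: every rung of the ladder is added by a person who noticed the output missed the mark.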
Yet, as anyone who has managed people knows, you have to give them some autonomy. Early on they struggle, but as time goes on their comfort and experience grow, and the amount of handholding and course correction naturally lessens. As they settle in, they begin to understand terse instructions, to read between the lines, and to deliver the correct results.
But LLMs? No, they do not think. Sure, if you are using some commercial model, it internalizes what you toss at it as you interact with it and adds it to its training pool (and if enough people, tens of millions of them, are doing similar tasks, this extension of the training set with actual prompts and internalized data will begin to shape the model).
Thus, even if that happened an infinite number of times, the model would still be a huge repository of information that has been digested and tokenized, and an adjusted statistical model would still be selecting likely or probable strings to spit out (yeah, I know this is oversimplified). It does not think. It just guesses. Sure, you can improve the quality of the guess, but it still doesn’t have any cognitive ability, no matter how much it may seem like it.
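If it helps to see what “selecting likely or probable strings” means, here is a toy sketch with made-up numbers (not any real model’s distribution): the next word is picked by weighted chance, and tuning the temperature only sharpens the guess, it doesn’t add thought.

```python
import random

# Toy next-token distribution for a prompt like "The market analysis shows ..."
# These probabilities are invented for illustration; a real model has a learned
# distribution over tens of thousands of tokens.
next_token_probs = {
    "growth": 0.40,
    "decline": 0.25,
    "stability": 0.20,
    "unicorns": 0.15,
}

def sample_next_token(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Pick the next token by weighted chance; lower temperature sharpens the odds."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

# Lower temperature makes the guess more predictable, but it is still a guess:
for t in (1.0, 0.5, 0.1):
    print(t, sample_next_token(next_token_probs, temperature=t))
```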
And that is why three models doing market analysis, coding, and recruiting will not be as good as three real people in those roles, people who bring judgement, prior experience, and the ability to question whether the results they are getting are aligned with the original intent3.
If Alice were an LLM or a bot, she wouldn’t know that she was getting garbage. She would just surface it, let it stand, and move on.
That scares the fuck out of me: we (corporations large and small) are rushing to replace people with these models (the next big thing is AI Agents that can do general-purpose tasks, except that they are terrible at some pretty common things).
Sam Altman and Dario Amodei are bullish on true AGI4 appearing soon, but the clear thing is that the current trajectory of LLMs and diffusion models is hitting a wall, and tossing exponentially more compute and training data at them is not leading to measurable improvements in performance. Indeed, the o4 release of ChatGPT’s chain-of-reasoning model is objectively worse than the prior version, while costing multiples more to process queries and output results.
Oops.
What it means
Well, it’s not good. Corporate America is chomping at the bit to expand the use of AI to replace its flesh-and-blood workers, particularly in areas like coding and marketing. The first wave is going to slaughter early-career positions, but that will cascade.
Have you experienced Generative AI in the workplace? Please share your thoughts in the comments.
1. My company is quietly making it much harder to hire flesh and blood, requiring a delicate dance about why you can’t just use AI for the work needed.
2. Since you aren’t gonna read it, the TL;DR is: Alice is given three requisitions; she hires three people, a Market Analyst, a SW Engineer, and a Recruiter. She gives them terse instructions, augments them with some confusing clarifications, and they each deliver a reasonable result, but not what she was seeking. The gist of the article is that, sure, AI can do things so-so, but so do people, and you can iteratively improve your asks (in the GenAI lexicon, “prompt engineering”) and get good results. Thus AI is not really that different from real live people.
3. The best of both worlds is to have three people who then work with three models to get results; as the people learn, they can judge the efficacy and utility of the LLM output.
4. AGI is artificial general intelligence, that is, a machine that is as smart as a skilled human and can accomplish just about anything such a human could, on just about any task.