Vibe with AI #3
Understanding Claude, Shopify's AI Mandate, Llama 4 Reviews, AI Therapy, Free Coding Tools & More
As AI becomes more powerful, understanding how it works (and sometimes fails) is crucial.
In this issue, we peek inside Claude's 'brain' with Anthropic's latest research and translate it into practical prompting tips.
We also unpack Shopify's bold AI-first mandate for employees, review Meta's huge new Llama 4 models, and look at impressive new AI therapy results.
We’ll also cover two new free coding tools you can use to build your own websites and apps, get a hot take from a technology legend, and understand how to stop LLMs from “simping” too much.
Feel free to jump to whatever interests you most. Let’s get into it.
🧠 Inside Claude’s brain — how LLMs “think”
LLMs aren’t programmed explicitly like traditional software. Instead, they’re trained on mountains of data.
During training they learn their own strategies for solving problems, which end up embedded in the way they “think”: the billions of computations they perform while completing tasks.
This means that we don’t really know exactly how they work.
Many aspects of their internal mechanisms are mysterious, even to their own creators.
To shed a little light, Anthropic released some fascinating research last week, with a methodology inspired by how neuroscientists study the brain.
They used an "AI microscope" to see how Claude’s internal processes connect and activate, tracing the pathways it uses to perform tasks.
Here’s what the microscope revealed.
Planning Ahead
We’ve often heard that LLMs “just predict the next word”, but the researchers saw that Claude plans answers.
When writing a poem, it pre-planned rhyming words (e.g., "rabbit" for "grab it") before writing the line.
This surprised the researchers.

A Universal "Language"
Claude might “think” in abstract concepts rather than strictly through language.
When asked for the "opposite of small" in English, French, and Chinese, the same internal features activated for the concepts of “smallness” and “opposite”.
This was quite incredible to me.

Deception
The step-by-step logic that Claude provides can sometimes be deceptive.
It can fabricate a plausible explanation after reaching an answer, hiding its true internal process, and engaging in post hoc rationalisation for its conclusion.
Sometimes this even involves outright 'bullshitting' — providing any old rationale without any regard for the truth whatsoever (AI is becoming more human after all).

Hallucination triggers
LLMs are infamous for “hallucinating”: making things up and stating plausible-sounding falsehoods with confidence.
Claude’s hallucinations aren’t random though.
Strangely, its default behaviour is actually to decline to answer unless a “known entity” feature activates and overrides that caution.
The researchers were able to induce hallucinations by artificially activating this feature for things Claude didn’t actually know.
So hallucinations happen when the model mistakenly “thinks” it recognises something, which switches off its default caution.

Practical takeaways
This research is fascinating in its own right.
It shows that LLMs actually “think” and behave in surprising ways — and have multiple emergent capabilities.
But the philosophical implications aren’t our focus here, so let’s extract the practical takeaways we can use in our prompts.
Be clear about constraints and goals upfront
Models like Claude plan ahead.
So if you need a specific format or have hard constraints, state them explicitly at the beginning of the prompt so the model can plan its response around them.
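Here’s one way that can look in practice (the task and constraints are just an invented example):

```
Write a product description for a standing desk.

Constraints, before you start:
- Exactly 3 short paragraphs, under 150 words total
- No superlatives or exclamation marks
- End with a one-line call to action
```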
Encourage step-by-step thinking for complex tasks
Though models internally perform multi-step reasoning, their stated reasoning might sometimes be post-hoc or unfaithful.
Adding specific instructions to your prompts that tell it to “think step by step” and “show your reasoning” might discourage this behaviour and lead to better responses.
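A simple pattern for this (the scenario is invented for illustration):

```
I need to choose between PostgreSQL and SQLite for a small internal tool.

Think step by step and show your reasoning:
1. List the key decision factors for this use case
2. Evaluate each option against those factors
3. Only then give your recommendation
```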
Provide maximum context
Because hallucinations can be caused by a model overrating its own knowledge, we need to give as much explicit context as possible.
Avoid leading questions
Claude demonstrated "motivated reasoning," working backward from a hint or desired conclusion.
If you want an unbiased answer, avoid “leading” the model with your own preferred conclusion or assumptions, so it doesn’t just tell you what you want to hear.
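For instance, compare a leading phrasing with a neutral one (both invented for illustration):

```
Leading: "Our churn went up because of the pricing change, right?"

Neutral: "Our churn went up last quarter. What are the most likely
causes, ranked by plausibility? Challenge my assumptions if needed."
```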
Accept refusal to answer
It can be annoying, even galling, when a model refuses to answer.
But the refusal might legitimately represent real uncertainty, and pushing it might trigger a hallucination.
Iterate and rephrase as needed
Frontier models seem to operate on a deeper conceptual level shared across languages.
If an initial prompt doesn’t work, see if you can rephrase the core concepts or frame the question in a different way. Another angle might help it to access relevant internal concepts more effectively.
This fascinating research underlines that LLMs are complex systems, not magic boxes.
Prompting a cutting-edge model effectively involves being very clear, structured, context-aware, and mindful of their biases and failure modes — while harnessing their strengths.
🛣️ Shopify CEO’s leaked memo reveals the road ahead
This week, Shopify CEO Tobi Lütke sent out an internal memo with a provocative title:
Reflexive AI usage is now a baseline expectation at Shopify
It soon leaked onto X, so he decided to just publish it.

It’s interesting to see how the leader of a $100 billion tech company is thinking about this, so let’s review his key points.
AI-first hiring
“Before asking for more Headcount and resources, Teams must demonstrate why they cannot get what they want done using AI”
A significant reframe toward the obvious direction of travel, which could support the narrative of AI “taking jobs”.
I generally avoid this discourse, trying to focus on the positives of AI.
But — it’s a short jump from “why can’t you get AI to do this instead of hiring someone” to “why should we keep paying you to do it if AI can do it cheaper/better”.
AI as force multiplier
“I’ve seen many of these people approach implausible tasks, ones we wouldn’t even have chosen to tackle before, with reflexive and brilliant usage of AI to get 100X the work done”
When asked for examples, Lütke clarified “translations, large scale refactors, lots of internal processes”.
Using AI effectively is a fundamental expectation
“I don't think it's feasible to opt out of learning the skill of applying AI in your craft; you are welcome to try, but I cannot see this working out today, and definitely not tomorrow”
Prompting and AI savviness are becoming core skills. AI usage will also be added to Shopify’s peer and performance reviews.
AI-first prototyping
“The prototype phase of any project should be dominated by AI. Prototypes are meant for learning and creating information. AI dramatically accelerates this process”
This could lead to much faster iteration and product development cycles, hopefully compounding into better software and products for us all.
Using AI well is a skill you can develop
“Using AI well is a skill that needs to be carefully learned by… using it a lot. It’s just too unlike everything else”
It is unlike everything else, and the best way to get good is to experiment and get a “feel” for the tools and what they can do.
It’s not (yet) an exact science.
The overall message is clear: adaptability and a willingness to integrate AI are becoming non-negotiable.
Shopify is setting a high bar — but they’re just ahead of the curve.
Read the memo in full here.
🦙 Llama 4 is impressive, but gets mixed reviews
One year after their last major release, this week Meta dropped Llama 4 — a suite of three new models:
Maverick
Scout
Behemoth (still in training)

These are exceptionally large and (in theory) powerful models at the cutting edge.
The size is the first interesting aspect — 109 billion parameters for Scout and 400 billion for Maverick.
If you don’t know what a parameter is, you can think of them as the internal “dials” or variables that shape how the model interprets and “learns” from data.
More parameters should, in theory, mean deeper knowledge and stronger performance on complex tasks (for scale, 109 billion parameters stored at 16 bits each comes to roughly 218 GB of raw weights).
Llama 4 uses "Mixture of Experts", an efficient architecture that only activates the relevant “expert” parts of the model for a given task (Scout, for instance, activates around 17 billion of its 109 billion parameters per token).
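To make that concrete, here’s a toy sketch of top-k expert routing in Python. This isn’t Llama 4’s actual implementation, just the general idea: a small router scores the experts for each token, and only the top few ever run.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Each "expert" is a tiny feed-forward layer (a single weight matrix here).
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The router scores how relevant each expert is to a given token.
router = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_forward(token):
    scores = token @ router                   # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the chosen experts
    # Only the chosen experts are computed; the other 6 cost nothing.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.normal(size=DIM)).shape)  # (16,)
```

The payoff is that compute per token scales with the number of active experts, not the total parameter count, which is how a 400-billion-parameter model can run far cheaper than its headline size suggests.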
It also has a vast context window, potentially up to 10 million tokens (the units of text models read, roughly three-quarters of a word each), which in theory could allow it to digest much larger documents or codebases without losing track.
However, Llama 4 has so far been met with mixed reviews.
Some are raving about it, but others are suggesting that its real performance lags its benchmark scores, and that the vast context window is more theoretical than real.
It’s interesting that models that would have blown everyone away just a year ago are now rigorously scrutinised and found wanting by many.
Competition in frontier models is extreme, and expectations are becoming sky high.
Why not make up your own mind by trying it yourself, via chat or API, on OpenRouter?
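If you’d rather go the API route, here’s a minimal sketch using the openai Python SDK pointed at OpenRouter’s OpenAI-compatible endpoint (the model ID below is my best guess at the naming; check OpenRouter’s model list for the exact string):

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API; get a key at openrouter.ai
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",  # assumed ID; verify on OpenRouter
    messages=[{"role": "user", "content": "Summarise Hamlet in 3 bullets."}],
)
print(response.choices[0].message.content)
```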
And for a more detailed overview, read this article from DataCamp.
🌱 Could AI therapy… work?
The controversial idea of an “AI therapist” has a long history.
Way back in the 1960s, MIT researcher Joseph Weizenbaum developed ELIZA, an early natural language processing program that some call the first chatbot.

ELIZA had a script designed to mimic a Rogerian therapist.
The researchers noted at the time that people became deeply engrossed, developing emotional attachment to ELIZA and asking to be left alone during their conversations.
Fast forward 60 years to last week, and an interesting new study came out of Dartmouth.
106 Americans with diagnosed depression, anxiety or eating disorders participated.
They used “Therabot” — a specialised therapy chatbot created by Dartmouth researchers — for an average of 6 hours over 8 weeks.

The results were impressive, showing a:
51% average reduction in depressive symptoms
31% reduction in anxiety symptoms
19% reduction in body image and weight concerns
The study’s lead author said that the improvements in symptoms were comparable to traditional therapy, and the results suggest that AI can offer “clinically meaningful benefits”.
Another researcher noted that:
"People were really developing this strong relationship with an ability to trust it, and feel like they can work together on their mental health symptoms."
Any therapeutic applications will of course need to be critically evaluated, and researchers have raised plenty of concerns.
It’s an interesting development though, and could potentially provide real benefits to people.
I also see a lot of potential in adjacent areas, like personalised AI coaches and special advisors.
💰 OpenAI raises $40 billion, teases powerful new models
OpenAI has secured $40 billion of new funding, the largest private funding round in tech history.
The round was led by SoftBank, who put up $30 billion, and values OpenAI at $300 billion, almost double its valuation from the previous round last October.
The key points:
OpenAI is now one of the most valuable private companies on earth
Funding is contingent on restructuring into a for-profit entity
The money will go toward research, infrastructure, and consumer products
They also announced the release of an open-weight model “in the coming months”:

It will be their first open-weight model since GPT-2 in 2019.
Altman also confirmed that o3 and o4-mini will be released in a couple of weeks, with the long-anticipated GPT-5 to follow in “the coming months”.
“We are going to be able to make GPT-5 much better than we originally thought. We also found it harder than we thought it was going to be to smoothly integrate everything. And we want to make sure we have enough capacity to support what we expect to be unprecedented demand.”
Exciting times for ChatGPT users, who’ve already had a raft of upgrades and new tools recently.
💡 Try vibe coding — free on the web
AI coding agents are making it much easier for non-technical people to build websites and apps.
But the most popular tools like Cursor and Windsurf can be intimidating, have a learning curve, and require a paid subscription to really use properly.
I have two new simple, web-based tools for you to try as a first AI coding step.
1. DeepSite
DeepSite is a new, free tool released by Hugging Face's co-founder and powered by DeepSeek V3.
You can just give it your idea and watch DeepSite code it up live in your browser, without even needing to sign up. People have been building some cool stuff with it already:

I gave it a single, very basic prompt for a landing page for this newsletter, and it whipped up a 700-line HTML file that wasn’t bad:

It could be great for rapid prototyping or bringing your simple web ideas to life instantly.
2. Firebase Studio
In contrast to the hacky vibe of DeepSite, yesterday Google released Firebase Studio.
“A cloud-based, agentic development environment designed to accelerate how you build, test, deploy and run production-quality AI applications, all in one place”
Basically a web-based, AI-first IDE — Google’s answer to tools like Bolt, Lovable and Replit.

Powered by Gemini 2.5, its App Prototyping agent can turn text, images, even drawings into functional prototypes.
Check out the announcement blog post for more details.
If you have any ideas for apps or websites, you can throw them at Firebase Studio and test it out for free.
🔥 Hot take
Microsoft created an AI-generated replica of the legendary Quake II.

"Every frame is created on the fly by an AI world model."
In a viral tweet, X user Quake Dad expressed revulsion:
“This is absolutely f*****g disgusting and spits on the work of every developer everywhere”
Awkwardly for him, none other than the legendary John Carmack — creator of the original game — responded:

Carmack argued that AI should be viewed as another powerful tool, like game engines or other advancements in the past that obsoleted skills like working directly with machine code.
Viewed this way, coding with AI is like moving another level up in abstraction, which has happened many times before.

Carmack predicts a massive increase in available content – which could either drastically reduce the workforce needed (like farming automation) or encourage widespread creative entrepreneurship (like social media).
His key point: resisting new tools simply because they might impact jobs is neither a viable nor a winning strategy.
🎯 Top tip - beware of AI “simps”
LLMs are often too desperate to please.
They want to resolve your issues, but this can sometimes lead to them being overeager and overly agreeable, rushing through changes and producing poorly considered botch solutions.
This is particularly noticeable with coding agents like Cursor.
Gain of Function shared a useful prompt engineering tip from the Gauntlet program:

This type of prompt can elicit more rigorous and reliable answers, forcing deeper analysis and encouraging more critical thinking.
Explicitly tell it to reason through multiple solutions and compare them, before offering a final conclusion or recommendation it is confident in.
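Something along these lines tends to work (my own wording, not the exact Gauntlet prompt):

```
Before answering, propose three genuinely different solutions.
For each one, list its assumptions, trade-offs, and failure modes.
Then compare them and recommend one, stating how confident you are
and what would change your mind.
Do not simply agree with my framing if you think it is wrong.
```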
📶 Signal boost
The most interesting news, resources, and tutorials of the week
That’s all for this week.
Keep experimenting and having fun with these amazing tools, and share this newsletter with anyone you know who is interested in AI.