Vibe with AI #1
Vibe coding struggles, voice agents, Gemini 2.5, and the rise of MCP
AI is radically expanding what one person can accomplish.
The technology and best practices are moving rapidly and people are building remarkable things, but the signal-to-noise ratio is abysmal.
The gems are scattered across platforms, buried in feeds, and drowning in hype. A goldmine, but a mess.
That's why I'm launching Vibe with AI — a weekly snapshot designed to help professionals, builders, and creators stay informed while cutting through the noise.
I've worked in tech for years, but it wasn't until I started using tools like Cursor and Bolt that I really felt the ground shift. Suddenly, projects and tasks that were beyond my skills and resources became possible.
I've become obsessed with following the space and testing out the new tech and tools, using them to bring my ideas to life. I’m learning and experimenting every day, trying to discover what works, and I want to bring you along too.
We won’t get deep into abstract theory — just curated news, practical techniques, useful new tools to try out, and guides that actually work.
Each week, you'll get:
A roundup of the week's most important AI developments
Real-world examples of interesting projects people have launched
Frameworks, guides, prompts, and tutorials you can use right away
The goal? Help you understand and use AI to transform your work — because mastering this tech today is one of the highest-leverage things you can focus on.
Now that you know what to expect, let's jump into this first edition.
🥷 There’s no such thing as vibe security
“Vibe coding” is a new trend on X. Some think it’s cringe, others are having great fun with it.
Andrej Karpathy, a famous engineer, coined the term a couple of months back:
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it.
Vibe coding is incredible indeed, and it's now real thanks to tools like Cursor and Windsurf.
But it’s one thing to vibe code as an elite-level programmer and technologist like Andrej, and quite another as a mere mortal.
This was illustrated last week when Leo posted about how he built his SaaS with “zero hand written code” and already had paying users.
Nice, right?
Well, the problem with releasing vibe coded apps into production became painfully clear when he started getting hammered with security exploits within hours of his post going viral.
Turns out shipping AI-generated code straight to production without proper security audits isn't a good idea.
In a follow-up tweet, Leo shared how he fixed some security issues:
"Rolled all API keys and moved them to environment variables"
"Implemented authentication to API endpoints"
These are not advanced security measures - they're day-one basics that any production app should have. It's like launching a jewellery store and only remembering to put locks on the doors after someone has robbed the place.
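To make that concrete, here's a minimal Python sketch of those two day-one basics (the session store is a toy stand-in for a real auth provider, and all the names are purely illustrative):

```python
import os

# BAD: hardcoding a secret means it ships with your source (and your screenshots)
# OPENAI_API_KEY = "sk-proj-abc123..."

# GOOD: read secrets from the environment at runtime, server-side only
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if OPENAI_API_KEY is None:
    raise RuntimeError("Set OPENAI_API_KEY before starting the server")

# Toy in-memory session store standing in for a real auth provider
SESSIONS = {"token-abc": "user-1"}

def require_user(session_token: str) -> str:
    """Every API endpoint should verify the caller before doing any work."""
    user_id = SESSIONS.get(session_token)
    if user_id is None:
        raise PermissionError("Unauthenticated request")
    return user_id
```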
The point isn’t to attack the guy; to his credit, he took it all in his stride and is now completely rebuilding the app with Bubble. He will bounce back.
The real point is that while AI can help you write code much faster, and can now build entire features and even whole apps and websites — you cannot trust it to build them securely.
This is less of an issue for building your own tools for personal use, or to share with friends, family and colleagues. But it’s critical if you want to launch publicly, store user data, or charge money for a product.
This incident sparked a lot of discussion. Many voices were unsympathetic, but some helpful advice was shared too.
Ted Werbel shared some good security tips:
Are you a vibe coder building in public?
Here are 5 important tips to keep your code + users protected 👇
1/ don’t share a photo like the one below!! if sharing your screen in videos or live streams, do not show your env variables. Make sure you know the difference b/w client…
— Ted Werbel (@tedx_ai), Mar 19, 2025
Kaamiiaar also put out a solid 17 point security checklist tailored to NextJS apps.
The bottom line though:
You need to know what you are doing or pay someone who does.
You simply cannot “trust” an LLM to do a job properly when the price of failure is high. It requires supervision and checking its output.
They can do a brilliant job on one task and blow you away, then randomly screw everything up and cut a ton of corners on the next, for no apparent reason.
Personally, I am going to brush up on security best practices, take a couple of good courses and learn the basics. I am also going to hire an expert for an audit before I deploy anything I want to take seriously.
🗣️ Voice Agents will really take off soon
OpenAI just unleashed a trio of audio models that’ll make voice apps sound more human than ever.
Three new state-of-the-art audio models in the API:
🗣️ Two speech-to-text models — outperforming Whisper
💬 A new TTS model — you can instruct it *how* to speak
🤖 And the Agents SDK now supports audio, making it easy to build voice agents.
Try TTS now at openai.fm.
— OpenAI Developers (@OpenAIDevs), Mar 20, 2025
Firstly, they released two new speech-to-text models:
gpt-4o-transcribe
gpt-4o-mini-transcribe
Both outperform the previous king, Whisper, with transcription error rates as low as 2.46% in English.
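Using them is a couple of lines with the OpenAI Python SDK. A minimal sketch, assuming you have an audio file on disk and OPENAI_API_KEY set in your environment:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with the new model ("meeting.mp3" is a placeholder)
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```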
Flipping the use case, their new gpt-4o-mini-tts text-to-speech model lets you direct its tone — from “sympathetic customer service rep” to “medieval knight narrating an epic tale” or even a “true crime buff.”
They’re available in the API, and the Agents SDK now supports audio, so we’ll see an explosion of voice agents built with this tech in the coming weeks and months.
Try it out yourself at openai.fm (it’s quite fun!).
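For the code-inclined, here's a hedged sketch of steering that tone via the new instructions parameter in the OpenAI Python SDK (the voice and phrasing are just examples inspired by the demo page):

```python
from openai import OpenAI

client = OpenAI()

# Stream generated speech to a file; `instructions` steers *how* it speaks
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the preset voices featured on openai.fm
    input="Thanks for calling! Let me pull up your order right away.",
    instructions="Speak like a warm, sympathetic customer service rep.",
) as response:
    response.stream_to_file("support_greeting.mp3")
```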
There are so many applications for this. Here are a few ideas you could build:
Automated Customer Support Role-Play Tool: a training tool for customer service teams that simulates challenging caller scenarios, using voices like a frustrated customer or a confused senior, with the system listening to responses and providing feedback.
Dynamic Audio Marketing Campaigns: a service for businesses to generate personalized audio ads on the fly, narrating in styles like a professional announcer or a casual surfer, tailored to customer demographics, and instantly converting website text into engaging audio for podcasts or social media.
Interactive Audiobook Adventures: choose-your-own-adventure audiobooks that narrate in dynamic voices—like a medieval knight for fantasy scenes or a true crime buff for mysteries—adapting tone based on your choices, while listening to your voice commands to steer the story.
Virtual Language Tutor with Accent Coaching: an app that analyzes your spoken language, identifies accent nuances or mispronunciations, and models correct pronunciation, offering real-time feedback to help you improve.
AI-Powered Radio Drama Generator: a tool where you input a short script, and it generates a full radio drama with distinct character voices—like a surfer for a laid-back hero or a soothing bedtime narrator—complete with emotional tone shifts, ready to share as a podcast.
The possibilities are endless. Thanks to AI, we really might enter an age of voice tools after a few false starts over the past decades.
Lettercast.ai gave us a demo of the models in action, with a mad scientist and a chill surfer narrating Sam Altman’s “The Intelligence Age”:
Also this week — Sesame AI released CSM-1B.
It’s a billion-parameter voice model that powers its viral assistant Maya, and generates eerily realistic voices from text or audio inputs.
They’ve also released it open source under an Apache 2.0 license!
The real game-changer is how these tools are being packaged for developers. With simple APIs, reasonable pricing, and high-end open source — we're about to see an explosion of voice-enabled applications.
🚀 MCP Madness
If you follow AI news at all, you’ll have heard about Model Context Protocol aka MCP.
MCP is an open protocol that acts like a “translator” to help AI systems connect to tools and data sources.
Imagine you have an AI app that can answer questions or do tasks, but it needs to access things like your Notion database, a browser, or a calendar to get the job done.
The MCP server acts as a middleman: it describes the tools and data in a way the AI can understand, and then lets the AI use them properly. MCP will dramatically simplify how we build AI tools that need to interface with multiple data sources.
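To give a feel for how little ceremony this involves, here's a minimal toy MCP server built with the FastMCP helper from the official Python SDK (the "calendar" is fabricated in-memory data for the example):

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calendar-demo")

# Fabricated in-memory "calendar" so the example is self-contained
EVENTS = [{"title": "Standup", "time": "09:30"}]

@mcp.tool()
def list_events() -> list[dict]:
    """Return today's calendar events."""
    return EVENTS

@mcp.tool()
def add_event(title: str, time: str) -> str:
    """Add an event; the type hints become the schema the AI sees."""
    EVENTS.append({"title": title, "time": time})
    return f"Added '{title}' at {time}"

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio so any compatible client can connect
```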
If you want a deeper primer:
A lot of businesses and platforms have been adopting MCP lately, and there are more than 1,800 MCP servers listed on cursor.directory, including many big names.
This week though, MCP got even bigger.
Firstly, Sam Altman announced that OpenAI is adding MCP support across their products, starting with the Agents SDK.
This will allow developers to connect agents with tools through a massive (and growing) range of MCP servers and open up a lot of new possibilities.
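Here's a sketch of what that wiring looks like in the Python Agents SDK, based on the shape of its MCP examples (the filesystem server is one of the off-the-shelf reference servers; treat the details as illustrative):

```python
# pip install openai-agents
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    # Launch a local reference MCP server that exposes filesystem tools
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "."],
        }
    ) as fs_server:
        agent = Agent(
            name="File assistant",
            instructions="Answer questions using the filesystem tools.",
            mcp_servers=[fs_server],  # the agent discovers the server's tools
        )
        result = await Runner.run(agent, "What files are in this directory?")
        print(result.final_output)

asyncio.run(main())
```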
This wasn’t on everyone’s radar, and has taken some industry veterans by (generally pleasant) surprise.
That’s because MCP was built and released by Anthropic, and collaboration between AI giants isn’t the norm in an industry known for cutthroat competition. This makes it all the more significant as a step toward industry-wide shared standards and interoperability.
Next up, Zapier launched an MCP server, enabling AI assistants to connect with over 8,000 apps in its ecosystem.
They announced that:
We just launched Zapier’s Model Context Protocol (MCP) – a simple way to give your AI the power to do real things like sending emails, updating spreadsheets, scheduling meetings, and more, across 8,000+ apps. Instead of building custom integrations, you expose Zapier actions to your model with a single endpoint. It’s secure, scalable, and works with any LLM that supports function calling or tools. We built this for developers and AI builders who want to move fast and skip the plumbing.
As if those two weren’t enough — Cloudflare launched a way to build and deploy remote MCP servers to their global network.
As their Director of Product said, this will change the internet.
This is a big deal because it eliminates the need for local server setups. Previously you had to set everything up on your own machine, and MCP hadn’t really been brought online.
Now though, they can live in the cloud with built-in auth, making them accessible to anyone through a simple login, bringing AI agent capabilities to a much wider audience.
Whether you're building AI tools or just using them, the days of them being confined to chat windows are rapidly coming to an end.
If you want to see some more MCP servers in action, check out:
This trend is still heating up, and it seems like MCP has emerged as the AI integration standard.
P.S. If you feel like getting a little more hands-on, Ian Nuttall released a cool short tutorial on setting up your own MCP server in Cursor — check it out.
🥇 Gemini 2.5 Pro is really impressive
Google seemed to get off to a sluggish start in the AI race, but they’ve now found their stride.
Yesterday they launched Gemini 2.5 Pro, their most advanced model yet.
It’s currently number 1 on the LMArena leaderboard (a crowdsourced AI benchmarking platform), showing excellent abilities in reasoning, coding, math, and creative tasks.
It has the signature massive Gemini context window (1 million tokens, soon expanding to 2M), and handles multimodal inputs like text, audio, images, and video.
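If you'd rather poke at it from code, here's a minimal sketch using Google's google-genai Python SDK (the experimental model id shown is the one used at launch and may have changed since):

```python
# pip install google-genai
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Launch-era experimental model id; check the docs for the current one
response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents="Walk me through a plan for building a small Lego physics sim in the browser.",
)
print(response.text)
```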
Learn more:
Video review of the model, where it one-shots a Rubik’s cube, a virus simulator, a Lego sim, and other tasks that top models struggle with.
Cool breakdown and demo
Overall this seems like a real step forward for Gemini — give 2.5 Pro a try!
🖼️ OpenAI’s new image tool is insane
GPT-4o, OpenAI's natively multimodal flagship model, just got a major upgrade — with native image generation across ChatGPT tiers.
Previously, ChatGPT called a separate image generation model, DALL-E.
But GPT-4o is natively multimodal, meaning it understands text, code, and images as one unified language.
Also interesting: it's an autoregressive image generation model.
Most image gen models, like DALL-E or Midjourney, are diffusion models. This means that they start out with "noisy" images then refine them, gradually cutting out noise until the image is clear.
Autoregressive models, by contrast, create images sequentially, building them "pixel by pixel" and predicting each new part from what has already been generated (see the toy sketch below).
This makes them well suited for:
Image-to-image transformation
Photorealism
Following instructions, accurately rendering text and diagrams
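To make the distinction concrete, here's a toy sketch of the two generation loops. Neither reflects a real model's architecture; it only contrasts the control flow:

```python
import random

def autoregressive_generate(length: int) -> list[int]:
    """Build the output one token at a time, each step conditioned on
    everything generated so far (a real model predicts; we pick randomly)."""
    tokens: list[int] = []
    for _ in range(length):
        next_token = random.randrange(256)  # a real model conditions on `tokens`
        tokens.append(next_token)
    return tokens

def diffusion_generate(length: int, steps: int = 10) -> list[float]:
    """Start from pure noise and refine the whole image at once, step by step."""
    pixels = [random.gauss(0, 1) for _ in range(length)]
    for _ in range(steps):
        # a real model predicts the noise to remove; here we just damp it
        pixels = [p * 0.5 for p in pixels]
    return pixels
```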
The text rendering is the best I’ve seen. It’s fantastic for creating infographics and educational images, and it seems like a knockout for marketing materials and ads.
People have been having a lot of fun with this on X, generating some truly impressive images from simple, single prompts.
This all went super viral over the past couple of days. Maybe a bit too viral…
Anyway, AI is revolutionising creative design and marketing just as much as it is software development — and the rise of the “vibe marketer” is inevitable.
🧠 Top Tip
André Luis Pulcherio, a student of Gauntlet — an intensive bootcamp for AI-first engineers — shared a snap from a lecture with some useful prompting tips.
Keep these in mind when working with coding agents.
Use Clear Indicators
Capitalization: highlight critical instructions in capitals, like "DO NOT", "ALWAYS", or "IMPORTANT".
Bullet points or numbers: Separate key details into clear, digestible chunks.
Explicit Markers: Surround important content with markers or XML tags (e.g., >>> IMPORTANT <<<).
Highlight Key Information
Front-load Critical Info: Place the most essential instructions first, as the model prioritizes initial context more heavily.
Reduce Noise: Limit extraneous details. Clearly articulate the main goals, constraints, or rules upfront.
Repetition for Emphasis: Repeat particularly critical points once or twice (not excessively) to reinforce their importance.
A couple of examples:
"You are creating a React application. IMPORTANT: Always use TailwindCSS. NEVER include inline styles. The priority is accessibility."
"Generate a React form component for user registration. IMPORTANT: Ensure the form validates inputs client-side. The form should have accessibility features (use ARIA attributes). Again, it’s important the form is accessible."
Having worked with these tools a lot over the past few months, I have noticed that SHOUTING at the agent seems to get a better result… sometimes.
⚒️ Tool of the week
This week Eyal Toledano released Task Master, a CLI tool that converts product requirement documents (PRDs) into a local task management system for Cursor Agent.
If this works well it’ll be very helpful and I love the pitch:
“Graduate from building cute little apps to ambitious projects without overwhelming yourself or Cursor”
It’s available on npm, and I’m going to try it over the weekend — I’ll let you know how it goes!
🔥 Hot take
One of my favourite YT channels, Fireship, put out an interesting video this week on the vibe coding phenomenon.
Though he appears at first to compare it to a mental illness, he goes on to offer a balanced perspective and actionable advice.
Drawing a distinction between coding (which AI is automating) and programming, he makes the point that fundamental programming knowledge is still absolutely required to "vibe code" with any degree of success.
Most successful vibe coders are following three critical rules:
Stick to popular tech stacks where LLMs have abundant training data, because LLMs excel at solving problems people have already solved on GitHub and Stack Overflow
Master Git and use it religiously: when AI takes control of your code, it can delete working parts in an instant, and it's nearly impossible to prompt that code back into existence
Think like a product manager: break complex requirements into small, specific steps with all the necessary context. You want your LLM to be as deterministic as possible, not creative.
Relying solely on vibes is still a bad idea for most of us.
AI-assisted coding works best with disciplined, structured approaches, and as the tools get better, your ability to think systematically becomes more valuable, not less.
📶 Signal Boost
A few of the cool resources, projects, and releases I came across this week:
Indie Hackers compiled an ultimate guide to building games with Cursor
Ian Nuttall turned Cursor into a content writing tool with an agent
An extension to turn simple requests into expert level prompts
Great guide on building effective agents by Elvis Saravia
Solid workflow for vibe coding a SaaS
Vibe coding workflow leveraging Grok, Cursor and MCP
SpatialLM, an LLM for spatial understanding
Anthropic launched a new blog for practical advice and new developments
Interesting thread demoing AI avatars which are getting scary good
Nano browser, an open source web automation alternative to Operator
StarVector, a model for generating SVGs from images and text
An AI content strategist, which I might test out this week
8 new Claude Code features
Microsoft released free courses on generative AI + JavaScript
That's all for this first edition of Vibe with AI.
Until next week, keep experimenting and having fun with this exciting new technology.