Hands-On with Gemini 2.5 Pro

How Google's latest AI model delivers advanced reasoning, vast context, and multimodal power you can use today

Last week Google released their latest frontier model: Gemini 2.5 Pro.

I’ve been trying it out over the weekend, and it’s really impressive.

In this guide I’ll explain:

  • What it is

  • Why it matters

  • What it’s useful for

  • How to try it out (free)

Let’s get straight into it.

Core capabilities

Gemini 2.5 Pro has a few key qualities that make it so impressive — let’s go through them to understand their significance.

It uses “advanced reasoning”

2.5 Pro is a reasoning model.

This means that unlike traditional LLMs, which go straight from prompt to answer, it’s trained to first generate a “thinking process” as part of its response.

It breaks down tasks into structured steps to work through complex problems methodically, allowing it to tackle logic-heavy tasks.

It’s a bit like an expert working step-by-step through a puzzle.

You can see its reasoning as it works through a problem, to understand how it came to a conclusion.

It isn’t the first reasoning model.

It might be the best though (so far).

It has a huge 1-million-token context window

An LLM’s context window is like its working memory: how much text it can “see” or consider at a given moment.

Let’s explain it briefly.

The current LLM boom kicked off in 2022 with the launch of ChatGPT. The model that powered it — GPT-3.5 — had a context window of around 4,000 tokens (words or bits of words).

That’s only around 3,000 words, the most it could “remember” at once.

It’s why talking to early ChatGPT was a bit like interacting with a goldfish, or talking with someone who only remembered the last few sentences.

It would constantly “forget” earlier parts of the conversation, as it pushed them out to make room for the newer parts.

It couldn’t:

  • Process long documents

  • Have long, in-depth conversations

  • Understand codebases

Since then context windows have rapidly expanded, turning LLMs from what felt like toys into genuinely useful sidekicks.

Now back to Gemini 2.5 Pro. Its context window is massive.

1 million tokens — soon to be expanded to 2 million.
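
To put those numbers in perspective, here’s some quick back-of-the-envelope maths (using the rough rule of thumb of 0.75 English words per token; an approximation, not an exact figure):

```python
# Back-of-the-envelope context window maths.
# Assumption: roughly 0.75 English words per token (a heuristic, not exact).
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)

print(f"{tokens_to_words(4_000):,} words")      # GPT-3.5 era: ~3,000 words
print(f"{tokens_to_words(1_000_000):,} words")  # Gemini 2.5 Pro: ~750,000 words
print(f"{tokens_to_words(2_000_000):,} words")  # Planned 2M window: ~1.5M words
```

For reference, War and Peace runs to roughly 580,000 words, so a 1-million-token window swallows it whole.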

This is industry-leading.

OpenAI's o3-mini and Claude 3.7 Sonnet tap out at 200K tokens, and DeepSeek R1 at just 128K. Only Grok 3 matches it, for now.

This means it can handle even extremely long conversations with ease, and take on challenging tasks like:

  • Processing large documents like entire books (even War and Peace!)

  • Analysing and summarising massive reports, studies, and docs

  • Understanding and working on an entire (medium-sized) codebase

These huge context windows are a real shift in what AI can do for you.

Along with improved reasoning, they’ve allowed LLMs to evolve from quirky chatbots into partners that can take on complex, interconnected problems.
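
If you’d like to exercise that long context programmatically rather than through the chat UI, here’s a minimal sketch using the google-generativeai Python SDK. The model ID is an assumption based on the experimental release name at the time of writing; check Google’s docs for the current one.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free key from Google AI Studio

# Model ID is an assumption -- check the docs for the current name.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# Upload an entire book via the File API, then ask about it in one prompt.
book = genai.upload_file(path="war_and_peace.txt")  # hypothetical local file
response = model.generate_content(
    [book, "Summarise the main plot threads and how they interconnect."]
)
print(response.text)
```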

It’s Multimodal 

Gemini 2.5 Pro is a multimodal model that can take text, image, video and audio as inputs.

Multimodal models are built to handle it all — like the Swiss Army knife of AI.

(Image source: dida.do)

How they work is too complex to get into here, but think of it like having a built-in “universal translator”.

It’s (kind of) analogous to how our brains process information through sight, sound, physical sensations and internal thoughts.

Multimodal models are trained to convert the language of pixels, sound waves, and text into a common internal representation – a shared space of understanding and processing.
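
In practice, that shared internal representation means you can mix modalities in a single request. A minimal sketch with the same SDK (the file names here are hypothetical, and the model ID is an assumption as before):

```python
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")  # ID is an assumption

# One prompt mixing text, an image, and an audio file.
screenshot = PIL.Image.open("checkout_flow.png")          # hypothetical file
voice_memo = genai.upload_file(path="user_feedback.m4a")  # hypothetical file

response = model.generate_content([
    "Here is a screenshot of our checkout flow and a voice memo of user feedback.",
    screenshot,
    voice_memo,
    "What UX problems do they point to?",
])
print(response.text)
```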

Along with the context size, the multimodality gives Gemini 2.5 Pro some interesting potential use cases:

  • Developers: drop in key files, UI screenshots, and voice memos for analysis

  • Marketers: Upload competitor videos, creative, and copy to quickly understand their strategy

  • Educators: Combine textbooks, images, and lecture audio to build lesson plans

  • Analysts: Upload reports, charts, and call audio to keep on top of trends and developments

  • Managers: Drop in meeting recordings, team reports, and project docs to get concise summaries

Ultimately, the multimodal capability gives the model new powers to process information in a richer, more intuitive way.

It’s crushing benchmarks

Whenever a big new model comes out, it gets put through its paces on a series of ‘benchmarks’.

These are like obstacle courses for AI — challenges that test its brainpower and performance on a series of tricky tasks. The idea is to get an objective sense of how “smart” a model is relative to others.

Gemini 2.5 Pro has been performing very well on these.

It’s leading the pack across a series of challenging tests, showing particular skill in reasoning, maths, and tasks based on visual understanding.

In coding there’s fierce competition, but it performs extremely well.

It’s also leading on LMArena, an open platform where people vote on which model gives the best responses.

Finally — and it’s debatable how valid this form of testing is — it apparently scored 130 on an online IQ test.

Assuming the test is legit, that’s equivalent to the top 2% of humans!

The bottom line: Gemini 2.5 Pro is a highly capable, state-of-the-art (SOTA) model.

Demos

If you need a little inspiration, here are some fun real-world projects and demos I've come across.

Testing it out

Now that you know the essentials of this model, you should go and try it out.

It’s free (for a while); all you need is a Google account.

Give it a try.

Make sure you also try out Canvas, an interactive space for iterating on documents and code. Here’s Canvas showing a (playable) chess game I got Gemini to make earlier.

You’ll see the option to turn on Canvas at the bottom of the chat.

Here are a few ideas for testing out its capabilities:

  • Give it a large document and ask it to summarise it, comment on certain parts, or answer specific questions (there’s an API sketch for this one after the list)

  • Give it an app idea and get it to mock up the UI in Canvas

  • Upload a public (small to medium) GitHub repo and ask for a comprehensive code review and architecture suggestions

  • Upload your blog posts and ask for specific ways to repurpose them for different formats

  • Upload a YouTube video and get it to write a blog post version of it

  • Give it a complex image and ask it to write a poem inspired by it

  • Give it a series of charts or visualisations and ask it to analyse and write up a report based on them

  • Give it a podcast episode and ask it to turn it into practical takeaways and advice on one page

  • Share screenshots of an app and ask for suggested UX improvements

  • Send it a photo of the food you have in your fridge and tell it to suggest recipes

  • Upload a song and get it to analyse the meaning, then write an essay as a music critic on the core themes

  • Upload your CV along with links to 10 job descriptions and ask which positions are the best fit, with specific talking points for each

  • Upload technical documents or articles and ask it to explain the concepts across increasing levels of expertise (5-year-old to PhD)
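
And if you’d rather script these experiments than click around the app, the first idea on the list looks something like this through the same SDK: a multi-turn chat where you can keep asking follow-up questions about the uploaded document (same assumptions about the model ID as above):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")  # ID is an assumption

report = genai.upload_file(path="annual_report.pdf")  # hypothetical document

# A chat session keeps the document in context across follow-up questions.
chat = model.start_chat()
summary = chat.send_message([report, "Summarise this report in ten bullet points."])
print(summary.text)

follow_up = chat.send_message("Which risks does it flag, and how serious are they?")
print(follow_up.text)
```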

If you haven’t tried a frontier model in a while, it’s a great chance to test one out at no cost.

I’m impressed with it, and will switch to it for everyday use, along with the free versions of Grok and Claude, which are great for certain tasks. I’m also using it in Cursor, with solid results.

Some doubted Google’s gen AI strategy, since they seemed to get off to a slow start over the past couple of years. That was never a smart bet, though, and this release should put that impression to bed.

They weren’t the first ones to build a search engine either.

But they were the best.

Give Gemini a try and let me know what you think!