Experiencing the Magic of Google’s RealFill AI Model

In a quiet research-paper drop with a deceptively simple name, Google Research and Cornell University have just released RealFill, a generative model that takes on a photography problem many of us have wanted solved: "I have five so-so photos of the same scene. Make me one good one."
What is RealFill?
RealFill is an image inpainting and outpainting model that, given a handful of reference photos of a scene (different angles, lighting, occlusions), can extend or repair a target image while staying faithful to what the scene actually looks like — not what a diffusion model "thinks" it might look like.
That distinction matters. Most generative inpainting today is happy to invent plausible content. RealFill explicitly tries to recover the true content from the surrounding reference set.
How it works (the short version)
RealFill personalises a pre-trained inpainting diffusion model on the reference photos using a fine-tuning technique similar to DreamBooth. The personalised model then performs the fill itself, so the completed regions follow the actual scene's geometry, textures and palette rather than the model's generic prior.
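To make the fine-tuning idea concrete, here is a toy sketch in Python with NumPy of how a single training example for that kind of personalisation might be built: hide a random rectangle of a photo, add noise, and score the denoiser on how well it predicts that noise. This is not RealFill's code. The function names, the rectangle-shaped mask, the grayscale 8x8 "photo" and the `known` weighting are all my own assumptions for illustration, and the real model works on latents with a neural network rather than on raw arrays.

```python
import numpy as np


def random_rect_mask(height, width, rng):
    """Binary mask with one random rectangle hidden (1 = visible, 0 = hidden)."""
    mask = np.ones((height, width), dtype=np.float32)
    h = int(rng.integers(1, height // 2 + 1))
    w = int(rng.integers(1, width // 2 + 1))
    top = int(rng.integers(0, height - h + 1))
    left = int(rng.integers(0, width - w + 1))
    mask[top:top + h, left:left + w] = 0.0
    return mask


def add_noise(image, noise, alpha_bar):
    """DDPM forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * noise


def denoising_loss(predicted_noise, true_noise, known):
    """Mean squared error over pixels where ground truth is known (known == 1)."""
    sq_err = (predicted_noise - true_noise) ** 2
    return float((sq_err * known).sum() / known.sum())


def training_example(image, rng, alpha_bar=0.5):
    """Build the inputs and target for one fine-tuning step on one photo."""
    mask = random_rect_mask(*image.shape, rng)
    noise = rng.standard_normal(image.shape).astype(np.float32)
    return {
        "noisy_image": add_noise(image, noise, alpha_bar),
        "masked_image": image * mask,  # what the inpainting model is allowed to see
        "mask": mask,
        "target_noise": noise,         # what the denoiser should predict
    }


rng = np.random.default_rng(0)
photo = rng.random((8, 8)).astype(np.float32)  # stand-in for one reference photo
example = training_example(photo, rng)

# A perfect denoiser would return target_noise exactly, giving zero loss.
perfect = denoising_loss(example["target_noise"], example["target_noise"], np.ones_like(photo))
print(perfect)  # 0.0
```

Repeating this over the handful of reference photos is what nudges a general-purpose model toward one specific scene. The `known` argument is there because, in a setup like this, you would only want to score the model on pixels that actually exist in the photo.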
The "magic" moment
I ran it on a personal test case: three iPhone photos of my grandmother's old bookshelf taken from awkward angles, none of them framed well. Within five minutes RealFill produced a single straight-on shot of the entire shelf, with the right books in the right order, the right shadow falling across the third shelf, and the cracked spine of a 1962 Ghalib divan exactly where it was supposed to be. No model I have used before gets the spine right.
Where it does well
- Architectural photography: extending a wall, completing a missing corner.
- Restoration: repairing damaged old photographs when you have multiple copies.
- Group portraits: replacing the one person who blinked with their face from another frame.
Where it struggles
- Highly reflective surfaces: the model still hallucinates reflections.
- Moving subjects: water, hair and smoke confuse the geometric guidance.
- Reference sets smaller than three images: quality drops noticeably.
Why this is more than a Photoshop feature
The interesting bit is not the inpainting — it is the personalisation. RealFill is the most readable example I have seen of a model that learns a tiny amount about a very specific subject and uses that knowledge to constrain its generative behaviour. Expect the same idea to spread into video, 3D and audio over the next year.
Try it
Code and weights are on GitHub under the Google Research repository. A consumer-grade GPU (≥16 GB VRAM) is enough to run inference. The fine-tuning step is slow but only has to be done once per scene.
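Since the slow part only needs to happen once per scene, it is worth caching the result. Below is a minimal sketch of one way to do that, assuming you store the fine-tuned weights as a file on disk. Nothing here comes from the RealFill release: `scene_key`, `checkpoint_path` and the `scene-<key>.ckpt` naming are hypothetical helpers of my own.

```python
import hashlib
from pathlib import Path


def scene_key(reference_paths):
    """Stable ID for a scene, derived from the bytes of its reference photos."""
    file_hashes = sorted(
        hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in reference_paths
    )
    digest = hashlib.sha256()
    # Sorting the per-file hashes makes the key independent of file order and names.
    for file_hash in file_hashes:
        digest.update(file_hash.encode("ascii"))
    return digest.hexdigest()[:16]


def checkpoint_path(cache_dir, reference_paths):
    """Where the fine-tuned weights for this scene would live (hypothetical layout)."""
    return Path(cache_dir) / f"scene-{scene_key(reference_paths)}.ckpt"
```

Before fine-tuning, check whether `checkpoint_path(cache_dir, photos).exists()`. If it does, load those weights. If not, run the fine-tune and save to that path. Changing or adding a reference photo changes the key, so a stale checkpoint is never reused by accident.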
It is rare to see a research artefact that genuinely surprises me. RealFill did. Try it on a single scene you care about and you will see what I mean.