Johannes Forseth

Build log

How I built an AI that edits my videos for me

I record raw, rambling, fumbled takes. Then a system I built does the cut, the animations and the captions on its own. Here’s exactly how it works: the real tools, with the full build on video.

June 16, 2026 · 7 min read

I don’t really edit my videos anymore. I pick up the camera, talk for as long as it takes — fumbles, restarts, three versions of the same line — and hand the raw footage to a system I built. A minute or so later it hands me back a tight, cut, captioned video. Here’s exactly how it works, and the actual tools behind it.

Quick honesty up front: this is built around how I work, not a product you can buy. I record a certain way and the system is shaped to that. Treat it as a build log — the principles transfer, the exact setup is mine.

Why I built it

Recording is easy. You can film a week of content in an afternoon. Editing is where it all backs up — hours per video, the boring middle between “I made it” and “I shipped it.” So the clips pile up unedited and the channel stalls. The bottleneck was never ideas. It was the cut. So I made the cut the one thing I don’t do.

It’s two tools, and an editor named Carl

The whole thing runs in Claude Code — the AI editor I call Carl. When I say “send it to Carl,” I mean I hand the raw footage to Claude Code and it takes over. It’s really two skills I built: one does the cut, one does everything that makes it look like mine. I just record and review.

Step 1 — Record a mess (on purpose)

For the clip in this post, that’s seven and a half minutes of me talking — a basket of dirty laundry. I run the same lines a few ways: one take straight off the script, one half-script half-rambling, and one where I just go and chase the energy. Fumbles, tangents, “let me start that over,” all of it. I never stop to get it right, because sorting that out is the system’s job.

Step 2 — The cut (a skill I call shortform-cutaway)

First it transcribes every word with WhisperX. Whisper is an AI model that runs on my own machine and turns speech into text; the “X” part force-aligns each word to the audio, so it knows the exact start and end of every word. With that, it cuts on the word edges and drops the silences — no clipped words, no dead air.
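To make the word-edge idea concrete, here is a minimal sketch in Python, not the skill's actual code. It assumes the word timings have already been exported as start and end times in seconds; the Word class, the keep_segments name, and the max_gap and pad values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, from the forced alignment
    end: float    # seconds


def keep_segments(words, max_gap=0.35, pad=0.03):
    """Merge word timings into (start, end) spans worth keeping.

    Words separated by at most max_gap seconds stay in one span; a longer
    gap counts as dead air and ends the span. pad adds a little breathing
    room on both sides so a cut never lands exactly on a word edge.
    Keep pad below max_gap / 2 so padded spans cannot overlap.
    """
    spans = []
    for w in words:
        if spans and w.start - spans[-1][1] <= max_gap:
            spans[-1][1] = w.end             # extend the current span
        else:
            spans.append([w.start, w.end])   # silence was too long: new span
    return [(max(0.0, s - pad), e + pad) for s, e in spans]


if __name__ == "__main__":
    demo = [Word("I", 0.50, 0.62), Word("record", 0.70, 1.10), Word("raw", 3.00, 3.40)]
    for start, end in keep_segments(demo):
        print(f"keep {start:.2f}s to {end:.2f}s")
```

Each span starts and ends on a word boundary, so nothing gets clipped, and the spans can be handed to whatever tool does the actual cutting.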

Then comes the real work: the red thread. The model — Opus in my case, but any capable one works — reads the context across all my takes and, going off my tone and even the imagery on screen, decides which take of each line to keep and which to throw away. It stitches the best version of each line into one clean story. (It’s not magic — English isn’t my first language, so now and then it mishears a word and I fix it. About 99% of the time it’s right.)
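The picking itself is the model's judgment and can't be reduced to a few lines. What can be sketched is the shape of the problem: repeated takes of the same line have to be recognised as the same line before one can be chosen. The Python below is a crude stand-in that groups takes by word overlap using the standard library's difflib; the function names and the 0.6 threshold are illustrative assumptions, not anything from the actual skill.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two takes, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def group_takes(takes, threshold=0.6):
    """Group takes that look like attempts at the same line.

    Each take is compared with the first take of every existing group and
    joins the first group it resembles; otherwise it starts a new group.
    Choosing the best take inside each group is left to the model.
    """
    groups = []
    for take in takes:
        for group in groups:
            if similarity(group[0], take) >= threshold:
                group.append(take)
                break
        else:
            groups.append([take])
    return groups


if __name__ == "__main__":
    raw = [
        "Editing is where it all backs up",
        "So I made the cut the one thing I don't do",
        "Editing is where everything backs up",
    ]
    for group in group_takes(raw):
        print(group)
```

A string-overlap score says nothing about tone or what is on screen, which is exactly why the post leaves the real decision to a model that reads the context.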

That clean cut gets dropped into DaVinci Resolve. I keep that step in Resolve on purpose: it’s where I get motion blur, the effects, and my colour grade, without touching anything else. I record for about seven minutes; about a minute later Carl hands me a tight cut and asks what I think.

I talk for as long as it takes. The system gives me back a tight, captioned cut with the story intact — in the time it takes to make coffee.

Step 3 — The look (a second skill, shortform-finish)

This is where it stops looking like a raw talking head and starts looking like mine. The skill analyses the cut: it pulls a frame, finds the negative space so a graphic never lands on my face, respects the TikTok safe zones, and adds the zoom moves on a nice curve. Then it reaches into an asset library I built.
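The placement step can be sketched as a rectangle check. This is a minimal Python illustration under stated assumptions: the face box is taken as given (how the skill finds it is not covered here), the margins are placeholder numbers and not TikTok's official safe zones, and the Box and place_graphic names are hypothetical.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int
    h: int


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def inside(inner: Box, outer: Box) -> bool:
    """True if inner lies completely within outer."""
    return (inner.x >= outer.x and inner.y >= outer.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def place_graphic(frame_w, frame_h, face, size, margins) -> Optional[Box]:
    """Return the first candidate slot that stays in the safe area and off the face.

    margins is (top, right, bottom, left) in pixels: placeholder values
    standing in for the platform's UI overlays.
    """
    top, right, bottom, left = margins
    safe = Box(left, top, frame_w - left - right, frame_h - top - bottom)
    gw, gh = size
    cx = safe.x + (safe.w - gw) // 2
    cy = safe.y + (safe.h - gh) // 2
    candidates = [
        Box(cx, safe.y, gw, gh),                    # top centre
        Box(cx, safe.y + safe.h - gh, gw, gh),      # bottom centre
        Box(safe.x, cy, gw, gh),                    # middle left
        Box(safe.x + safe.w - gw, cy, gw, gh),      # middle right
    ]
    for slot in candidates:
        if inside(slot, safe) and not overlaps(slot, face):
            return slot
    return None  # no free slot: skip the graphic for this frame


if __name__ == "__main__":
    face = Box(340, 300, 400, 500)
    print(place_graphic(1080, 1920, face, size=(600, 300), margins=(200, 120, 400, 60)))
```

Returning None when nothing fits is a deliberate choice in this sketch: dropping a graphic is a smaller failure than covering the face.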

Every animation in that library is built in Remotion — so it’s just code, reusable, with the sound effects already baked in. Carl grabs whichever one fits. But here’s the part that still feels like cheating: it rewrites the text inside the animation to match the video. A template might say “5 viral hooks”; it reads what this video is actually about and rewrites that line in the Remotion code itself — same animation, my topic, nothing typed by hand.
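Because the animations are code, changing the on-screen text is a source edit. In the post the model makes that edit itself; the Python below is only a deterministic stand-in showing what the change amounts to. The template, the headline constant, and the Title component are invented for this sketch and are not taken from the actual library.

```python
import json
import re

# Hypothetical template source; the real library's files will look different.
TEMPLATE = '''export const HookCard = () => {
  const headline = "5 viral hooks";
  return <Title text={headline} />;
};
'''

# Matches the double-quoted string assigned to the headline constant,
# including any escaped characters inside it.
HEADLINE_RE = re.compile(r'(const headline = )"(?:[^"\\]|\\.)*";')


def set_headline(source: str, new_text: str) -> str:
    """Return the template source with its headline string replaced."""
    def swap(match):
        # json.dumps produces a quoted, escaped literal that is also valid JavaScript.
        return match.group(1) + json.dumps(new_text) + ";"

    new_source, count = HEADLINE_RE.subn(swap, source, count=1)
    if count != 1:
        raise ValueError("headline constant not found in template")
    return new_source


if __name__ == "__main__":
    print(set_headline(TEMPLATE, "3 cuts I never make by hand"))
```

The animation, timing, and sound stay untouched; only one string literal changes, which is why the same template can serve any topic.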

How that library exists at all: I found animation styles I loved, reverse-engineered them in After Effects through an After Effects MCP server, then ported them into Remotion — because the AI is far better at Remotion and code than at driving After Effects by hand. That’s hundreds of animations now, in a style that’s mine.

Carl, the part people remember

Somewhere in there, Carl stops being invisible and shows up on screen — an editor with opinions who cuts in, makes a joke, takes over a section, and hands it back. He started as a throwaway gag and turned into the signature; people comment just to talk to him. The same system that makes the cut makes him too.

Captions, then it’s done

Last, it lays the captions word-by-word — each one timed to the exact moment I say it, reusing the word timings from the cut — in the bold style I use across everything. It renders them as a transparent layer, lays them over the video, and that’s it. I never opened a timeline by hand.
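Reusing the word timings for captions mostly means converting seconds into frame numbers. The sketch below is an assumption-laden illustration in Python, not the skill's code: the function names are hypothetical, and it uses an exact fractional frame rate because rounding a rate such as 30000/1001 to 30 is one common way captions drift. The post does not say which frame rate caused its own drift.

```python
from fractions import Fraction

NTSC_30 = Fraction(30000, 1001)  # the exact rate usually written as "29.97 fps"


def to_frame(seconds, fps):
    """Convert a timestamp in seconds to the nearest frame index at an exact rate."""
    return round(Fraction(str(seconds)) * fps)


def caption_frames(words, fps):
    """Turn (text, start_s, end_s) word timings into (text, first_frame, end_frame).

    end_frame is exclusive, and every word is shown for at least one frame.
    """
    cues = []
    for text, start, end in words:
        first = to_frame(start, fps)
        end_frame = max(to_frame(end, fps), first + 1)
        cues.append((text, first, end_frame))
    return cues


if __name__ == "__main__":
    timings = [("I", 0.50, 0.62), ("record", 0.70, 1.10)]
    print(caption_frames(timings, NTSC_30))
    # Drift from assuming 30 fps on 30000/1001 footage after 450 seconds:
    print(to_frame(450, Fraction(30)) - to_frame(450, NTSC_30), "frames")
```

Over a 450-second recording, treating 30000/1001 footage as a flat 30 fps puts the last caption 13 frames off, which is close to half a second.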

Here’s one it cut

I recorded this. The system did the rest — the cut, the captions, the animations, all of it.

What broke along the way

Plenty. A normal transcription guessed at the word timings and everything landed early, so the first cuts felt off — that’s the whole reason for the force-alignment. The Remotion timeline got messy enough that I keep the final assembly in Resolve instead. Captions drifted out of sync on footage with an odd frame rate. And because English isn’t my first language, the odd word still gets misheard. None of that is in the highlight reel — but it’s the real build, and it’s why this actually works now.

I run this on my own channel every week — and it’s the kind of system I build for the people I work with. I find the best way to automate the slow parts of a business so it can actually scale, build the whole thing for you, then hand you the keys — it’s yours to run, and it’s easy. Want something like it?
