Ascynd – a local AI video clipper, written in Rust
We built Ascynd, an AI video clipper that runs entirely on your machine. No uploads, no credits, no monthly caps. One Rust codebase across Mac and Windows.
Ascynd Team

Today we're shipping the public beta of Ascynd — a local AI video clipper that takes long-form video, picks the best moments, and exports them as captioned vertical clips. The whole pipeline runs on your machine. No upload step, no per-minute meter, no monthly credit cap.
Ascynd is written in Rust end-to-end — the GUI, the encoder, the inference glue, the segmentation logic. One codebase, one binary on Mac and Windows.
This is the writeup we wish we'd had when we started it.
Why we built it
We record long podcasts and lectures and wanted to repurpose them. The cloud tools in this category — Opus Clip, Vizard, Klap, Submagic — are competent. They're also expensive at any serious volume, and they all want you to upload your raw footage to their servers before clipping starts.
Pricing details vary, but the structural thing they share is metered usage. You pay per credit, where one credit is roughly one minute of source video. A creator who records four hours a week burns through an entry tier in a couple of sessions. The next tier is double or triple the cost, and the cap just moves up a step.
The other thing that bothered us — more a matter of values than economics — is that we didn't love uploading every raw recording we make to a third party. Some of what we record is rough, unedited, and full of things we'd rather not hand off. The vendors are reputable. The architecture is still "send us your data and we'll send a clip back." That's not the shape we wanted.
So we built one that runs locally.
What Ascynd does
The pipeline is roughly:
- Transcribe the source video on-device with a Whisper-family model.
- Score every segment of the transcript with a few engagement signals — semantic completeness, hook strength, pacing, sentiment shifts.
- Cut the highest-scoring segments into 30–90 second clips.
- Detect and remove silence and obvious filler words ("um," "uh," repeated false starts).
- Reframe to vertical (9:16) using a saliency tracker so the speaker stays in frame.
- Render animated word-by-word captions in a few common styles, baked into the output.
Everything on that list runs on your machine. There's no server-side step. On first launch it pulls the model weights down, and after that it works offline.
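The scoring and cutting steps in that list reduce to a selection pass over scored transcript segments. Here's a minimal sketch in Rust — the `Scored` type and `select_clips` are illustrative names under assumed constraints, not Ascynd's actual API:

```rust
#[derive(Debug, Clone)]
struct Scored {
    start_s: f64,
    end_s: f64,
    score: f64, // combined engagement score from the signals above
}

/// Keep segments whose duration fits the 30–90 s clip window,
/// ranked by score, capped at `max_clips`.
fn select_clips(mut segments: Vec<Scored>, max_clips: usize) -> Vec<Scored> {
    segments.retain(|s| {
        let dur = s.end_s - s.start_s;
        (30.0..=90.0).contains(&dur)
    });
    // Highest score first; scores are finite, so unwrap is safe here.
    segments.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    segments.truncate(max_clips);
    segments
}
```

In practice the scores would come from weighting the engagement signals (hook strength, pacing, sentiment shifts) against each other, which is where most of the tuning effort lives.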
The audio model and the encoder are the slow parts. With GPU acceleration the pipeline runs at a workable pace; on older machines without it, it still works — just slower.
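The silence-and-filler pass can likewise be expressed as a filter over per-word timestamps, assuming the transcription stage emits them (Whisper-family models can). Everything in this sketch — the `Word` type, the filler list, the gap threshold — is an assumption for illustration, not the shipped implementation:

```rust
#[derive(Debug, Clone)]
struct Word {
    text: String,
    start_s: f64,
    end_s: f64,
}

// A real filler list would be longer and handle repeated false starts.
const FILLERS: &[&str] = &["um", "uh", "erm"];

/// Drop filler words, then return the kept words plus the (start, end)
/// spans of silences longer than `gap_s` between consecutive kept words.
fn clean(words: &[Word], gap_s: f64) -> (Vec<Word>, Vec<(f64, f64)>) {
    let kept: Vec<Word> = words
        .iter()
        .filter(|w| !FILLERS.contains(&w.text.to_lowercase().as_str()))
        .cloned()
        .collect();
    let mut silences = Vec::new();
    for pair in kept.windows(2) {
        if pair[1].start_s - pair[0].end_s > gap_s {
            silences.push((pair[0].end_s, pair[1].start_s));
        }
    }
    (kept, silences)
}
```

The returned silence spans become cut ranges handed to the encoder stage.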
Why an all-Rust stack
We didn't set out to write everything in Rust. We started with a Tauri prototype — Rust backend, JS frontend — and ended up consolidating onto Rust for the UI as well. The reasons, in roughly the order we felt them:
- One codebase across platforms. Mac and Windows from a single project. No Electron, no ifdef forest. Linux is mostly a CI question after that.
- Binary size. The current installer is small enough that early testers asked whether the download had failed. No Chromium runtime tagging along for the ride.
- Performance ceiling. The encoder, the segmentation pass, and the ranking step all benefit from staying in tight native code without crossing a language boundary. The Rust↔JS bridges in the prototype were a real source of latency on long videos.
- Memory footprint. AI clipping is memory-hungry by nature — the audio model alone is meaningful. Leaving headroom for the user mattered.
- The UI story is more workable than we expected. We won't oversell this. UI in Rust is rougher than UI in React. But the iteration loop is livable, and we'd rather pay that cost once than pay the bundle and runtime cost on every install.
The downside: some things took longer to build than they would have in Electron. Anything that looked like "drop in a webview-friendly library and move on" was usually a small reimplementation. On the flip side, anything involving native OS bits — file pickers, drag-and-drop, codec setup — was easier than we'd feared.
Limitations
A few things Ascynd doesn't do well, or doesn't do yet:
- No collaboration features. No shared dashboards, no multi-user libraries. Single-user desktop tool by design. If your workflow needs team review built in, this isn't the right shape.
- B-roll generation, voice cloning, and translation are out of scope for now. Some cloud tools include those. We'd rather do clipping well than do everything mediocrely.
- Linux is unofficial. It builds and runs there, but we're not testing every release on it. That'll improve.
Try the beta
Ascynd is in free public beta. Mac and Windows builds, no watermarked output during the beta. If you record long-form video and want to feel what a local-first version of this category is like, you can grab it at ascynd.io.