How to Make Captions Like Alex Hormozi (Full Style Breakdown)

Ascynd Team

TL;DR: Hormozi captions are large, ALL-CAPS, word-by-word animated captions in a condensed heavy sans-serif (Montserrat Black, Anton, or Bebas Neue) with a thick black stroke and a yellow highlight on emphasized keywords. They sit in the lower-middle of the frame, reveal one to three words at a time synced to speech, and cover roughly 10–15% of the screen height. This guide breaks down every element and shows you how to replicate the style — manually or with AI.

Alex Hormozi built one of the largest business audiences on the internet — 13+ million followers across Instagram, TikTok, YouTube, and X as of 2026 — and the visual signature of every one of his clips is the same: big, bold, word-by-word captions that snap onto the screen in rhythm with his voice.

The style is so distinctive that creators now call it by name. "Make it look like Hormozi captions." "Use the Hormozi style." It has become the default caption format for business, fitness, and self-development short-form video.

This guide breaks down every element of the Hormozi caption style — the font, the size, the stroke, the highlight color, the positioning, the timing — and shows you exactly how to replicate it on your own videos.

Table of Contents

  1. What Are Hormozi Captions?
  2. Why the Hormozi Caption Style Works
  3. The 7 Elements of a Hormozi-Style Caption
  4. The Exact Specs (Font, Color, Size, Position)
  5. How to Make Hormozi Captions Manually
  6. How to Make Hormozi Captions Automatically with AI
  7. Platform-Specific Adjustments
  8. Mistakes That Kill the Hormozi Look
  9. FAQ

What Are Hormozi Captions?

Hormozi captions are a specific caption style popularized by entrepreneur and author Alex Hormozi, founder of Acquisition.com and author of $100M Offers and $100M Leads. The format features:

  • A heavy condensed sans-serif font (typically Montserrat Black, Anton, or Bebas Neue)
  • ALL-CAPS text for every word
  • A thick black outline (stroke) around white letters
  • A yellow or green highlight on the emphasized keyword in each phrase
  • Word-by-word timing — captions appear as each word is spoken, not as full sentences
  • Lower-middle screen placement — roughly 60–70% down from the top
  • A caption block that covers roughly 10–15% of screen height (80–120 px type on a 1080×1920 frame)

The format is intentionally loud. It is designed to be read at a glance on a muted mobile feed, and to force the viewer's attention to the single most important word in each beat.


Why the Hormozi Caption Style Works

The Hormozi style isn't popular because it looks cool. It is popular because it is engineered around three facts about how people watch short-form video:

  1. 92% of mobile video is watched with the sound off (Verizon Media / Publicis Media). Captions have to do the job the audio can't.
  2. The first 3 seconds decide retention. On TikTok, Reels, and Shorts, completion rate is the dominant ranking signal. Captions that punch in fast buy you those 3 seconds.
  3. The average mobile attention span on vertical video is ~8 seconds per clip. Word-by-word animation creates a rhythmic pull that holds the eye longer than static subtitles.

TikTok's own research found that videos with text overlays and captions receive a 55.7% higher impression rate than videos without them. Captioned videos see up to a 40% increase in views compared to uncaptioned ones, and 80% of viewers are more likely to watch to the end when captions are present.

Hormozi's caption style takes every one of those variables and optimizes it. Big text wins the mute-scroll. ALL-CAPS + stroke wins readability at thumb distance. Word-by-word timing wins completion. Yellow highlights win comprehension — the viewer instantly knows which word matters.

For a broader look at why captions drive views, see our data breakdown on whether captions actually increase video views.


The 7 Elements of a Hormozi-Style Caption

Every Hormozi caption has the same seven components. Miss any of them and the style falls apart.

1. Heavy Condensed Sans-Serif Font

The font is non-negotiable. It must be a display-weight sans-serif with minimal internal spacing. The three most common choices, plus one fallback:

  • Montserrat Black (free, Google Fonts) — the most common Hormozi-exact match
  • Anton (free, Google Fonts) — tighter, more condensed
  • Bebas Neue (free, Google Fonts) — slightly lighter weight, still works
  • Impact (pre-installed on most systems) — a passable backup

Avoid any thin, rounded, or serif font. The style depends on visual weight.

2. ALL-CAPS

Every word is uppercase. This is the single most copied element and the one that makes the style instantly recognizable. Title case and sentence case do not produce the same feel.

3. White Fill + Thick Black Stroke

Letters are white. The outline is black, thick, and consistent. On a 1080×1920 vertical video, the stroke is usually 8–12 pixels wide — thick enough that the caption reads against any background (including blown-out highlights and deep shadows).

4. Yellow Highlight on the Emphasized Word

This is the detail most amateur imitations miss. Hormozi (and his editors) pick one word per phrase — the noun or verb that carries the meaning — and change its fill from white to yellow (or sometimes green). The highlighted word becomes the anchor the viewer's eye locks onto.

The highlight color matters. Hormozi uses a bright, saturated yellow close to #FFD93D or #FFEE33. Green alternates sometimes use #39FF14 or #A6FF00.

5. Word-by-Word Timing

Captions don't appear as full sentences. Each word snaps in as it is spoken, then is replaced by the next word or cleared at the end of the phrase. This creates the rhythmic "beat" that makes the clips feel punchy.

Typical per-word display time: 200–500ms, matching the cadence of spoken English.
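
To make the beat structure concrete, here is a minimal sketch of grouping word-level timestamps into beats of one to three words. The `Word` type, the `group_into_beats` function, and the 300 ms pause threshold are illustrative assumptions, not something taken from any particular captioning tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int
    end_ms: int


def group_into_beats(words, max_words=3, max_gap_ms=300):
    """Group consecutive words into beats of at most max_words.

    A new beat also starts whenever the pause before a word exceeds
    max_gap_ms (an assumed threshold; tune it to the speaker).
    """
    beats = []
    current = []
    for word in words:
        if current:
            gap = word.start_ms - current[-1].end_ms
            if len(current) >= max_words or gap > max_gap_ms:
                beats.append(current)
                current = []
        current.append(word)
    if current:
        beats.append(current)
    return beats


# Hypothetical timestamps, as a speech-to-text tool might return them.
words = [
    Word("most", 0, 250),
    Word("people", 250, 600),
    Word("quit", 600, 900),
    Word("too", 900, 1100),
    Word("early", 1100, 1500),
    Word("don't", 2100, 2400),
]
beats = group_into_beats(words)
beat_texts = [" ".join(w.text.upper() for w in beat) for beat in beats]
```

Breaking on pauses as well as on word count keeps a beat from straddling a breath, which is usually where a phrase ends.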

6. Lower-Middle Placement

Captions sit roughly 60–70% of the way down the frame — below the face, above the bottom UI (likes, comments, share buttons on TikTok/Reels/Shorts). Never centered vertically, never at the very top, never overlapping the speaker's mouth.

7. No Fancy Animations

The style is intentionally minimalist in motion. No bounce, no zoom, no spin, no fade. Captions either pop in instantly or use a tiny scale-up (≤105%). Heavy animation dates the clip and breaks the rhythm.


The Exact Specs

Here is the reference sheet, in one table, for replicating the look on a 1080×1920 vertical video.

Element | Specification
Font | Montserrat Black, Anton, or Bebas Neue
Case | ALL CAPS
Fill color (default) | #FFFFFF (pure white)
Fill color (highlight) | #FFD93D (yellow) or #39FF14 (green)
Stroke color | #000000 (pure black)
Stroke width | 8–12 px on a 1080×1920 canvas
Font size | 80–120 px (≈4–6% of frame height per line; roughly 10–15% for a two-line block with line spacing)
Letter spacing | 0 to −2% (slightly tight)
Words on screen | 1–3 per beat
Timing | Word-by-word, 200–500 ms per word
Position | Y ≈ 60–70% from top (1150–1350 px on a 1920-high canvas)
Animation | None, or instant snap + subtle scale-up

If you want to match the style pixel-for-pixel, those are the numbers to start from and then adjust to your frame.
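
As a rough illustration of "adjust to your frame", the sketch below stores the reference values in one place and scales the pixel-based ones to another frame height. The `CaptionSpec` class and its field names are hypothetical; the defaults are simply the midpoints of the ranges listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptionSpec:
    frame_width: int = 1080
    frame_height: int = 1920
    font_size_px: int = 100         # midpoint of the 80-120 px range
    stroke_px: int = 10             # midpoint of the 8-12 px range
    y_fraction: float = 0.65        # midpoint of the 60-70% range
    fill_hex: str = "#FFFFFF"
    highlight_hex: str = "#FFD93D"
    stroke_hex: str = "#000000"

    def y_px(self) -> int:
        """Vertical caption position in pixels from the top of the frame."""
        return round(self.frame_height * self.y_fraction)

    def scaled_to(self, width: int, height: int) -> "CaptionSpec":
        """Scale pixel-based values in proportion to the new frame height."""
        scale = height / self.frame_height
        return replace(
            self,
            frame_width=width,
            frame_height=height,
            font_size_px=round(self.font_size_px * scale),
            stroke_px=round(self.stroke_px * scale),
        )


spec = CaptionSpec()
small = spec.scaled_to(720, 1280)   # e.g. a 720x1280 export
```

Scaling by height rather than width is a design choice here: the position is already expressed as a fraction of height, so the type stays in proportion to it.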


How to Make Hormozi Captions Manually

If you are doing this by hand in a traditional editor, the workflow is long but achievable. Here is the process in CapCut, Premiere Pro, or After Effects:

Step 1 — Transcribe the Audio

Use a speech-to-text tool (CapCut has this built-in; Premiere has Speech-to-Text in the Text panel) to generate a transcript. Confirm accuracy word-by-word — names, jargon, and numbers are frequent failure points.

Step 2 — Split Into Word-Sized Clips

Each caption should be 1–3 words. Break the transcript at natural speech beats. In CapCut, set "Split by word" on the caption track. In Premiere, duplicate the text layer per word and trim each to its audio duration.

Step 3 — Apply the Text Style

Set the font, size, stroke, and fill to the specs in the table above. Save this as a text preset (Premiere) or template (CapCut) so you don't rebuild it every clip.

Step 4 — Highlight the Key Word

For each phrase, identify the single most meaningful word and change its fill from white to yellow. This is the highest-leverage manual step — and the one AI tools automate best.
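
If you want a first-pass suggestion before choosing by hand, a crude heuristic is to take the longest word that is not a common filler word. The sketch below is only that: the `STOPWORDS` set and `pick_keyword_index` function are made up for illustration, and this is not how any specific AI tool selects keywords.

```python
# A small, hand-picked filler list; extend it for your own content.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "for",
    "is", "are", "was", "it", "you", "your", "i", "we", "that", "this", "with",
}


def pick_keyword_index(phrase_words):
    """Return the index of the word to highlight in a 1-3 word phrase.

    Picks the longest non-stopword (earliest wins on ties) and falls
    back to the last word if every word is a stopword.
    """
    if not phrase_words:
        raise ValueError("phrase_words must not be empty")
    best_index = len(phrase_words) - 1
    best_length = -1
    for i, word in enumerate(phrase_words):
        cleaned = word.strip(".,!?\"'").lower()
        if cleaned in STOPWORDS:
            continue
        if len(cleaned) > best_length:
            best_index, best_length = i, len(cleaned)
    return best_index


phrase = ["your", "offer", "is", "weak"]
highlighted = phrase[pick_keyword_index(phrase)]
```

Treat the result as a suggestion to review. Word length is a weak proxy for meaning, and the word that carries the point is often a short one.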

Step 5 — Position and Time

Place the text block at 60–70% screen height. Align each word's in-point to the spot in the audio waveform where that word begins. For a 60-second talking-head clip with ~150 words, this step alone takes 20–40 minutes.

Step 6 — Export

Render at 1080×1920, 30 or 60fps, H.264, high bitrate. Check the caption readability in the preview at actual mobile size before exporting.
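
If you export from the command line instead of an editor, the same settings can be expressed as an ffmpeg invocation. This sketch only builds the argument list. It assumes ffmpeg is installed with libass support, that the source is already 1080×1920, and that the captions live in a hypothetical `captions.ass` file. The CRF value of 18 is an assumed stand-in for "high bitrate".

```python
def build_export_command(source, subtitles, output, fps=30, crf=18):
    """Build an ffmpeg command that burns in captions and encodes H.264."""
    return [
        "ffmpeg",
        "-i", source,
        "-vf", f"subtitles={subtitles}",  # burn the styled captions into the picture
        "-r", str(fps),                   # 30 or 60 fps
        "-c:v", "libx264",                # H.264 video
        "-crf", str(crf),                 # lower CRF = higher quality and bitrate
        "-pix_fmt", "yuv420p",            # widest player compatibility
        "-c:a", "copy",                   # leave the audio untouched
        output,
    ]


cmd = build_export_command("clip.mp4", "captions.ass", "clip_captioned.mp4")
# To run it: subprocess.run(cmd, check=True)
```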

Total time for a 60-second clip: 30–60 minutes of manual work. This is why most creators either batch-edit once a week or automate.


How to Make Hormozi Captions Automatically with AI

Modern AI captioning tools do the entire Hormozi workflow in seconds:

  1. Automatic transcription with word-level timestamps
  2. Pre-built Hormozi-style presets (font, color, stroke, position already configured)
  3. Automatic keyword highlighting — AI picks the emphasized word in each phrase and colors it yellow
  4. Word-by-word animation applied to the entire clip in one pass
  5. Platform-optimized exports for TikTok, Reels, and Shorts

This compresses 30–60 minutes of manual work into under a minute. For creators posting daily, it is the difference between captions being possible and captions being impossible.
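
To show what the output of such a pipeline can look like, the sketch below serializes one beat as a dialogue line in the Advanced SubStation Alpha (.ass) subtitle format, with the keyword recolored by an inline override tag. Choosing ASS is an assumption made for illustration; nothing here describes how Ascynd or any other tool works internally, and the function names are hypothetical.

```python
def hex_to_ass_color(hex_color):
    """Convert '#RRGGBB' to ASS '&HBBGGRR&' (ASS orders bytes blue-green-red)."""
    rgb = hex_color.lstrip("#")
    red, green, blue = rgb[0:2], rgb[2:4], rgb[4:6]
    return f"&H{blue}{green}{red}&".upper()


def ms_to_ass_time(ms):
    """Format milliseconds as the ASS timestamp H:MM:SS.cc (centiseconds)."""
    total_cs = ms // 10
    cs = total_cs % 100
    total_s = total_cs // 100
    seconds = total_s % 60
    minutes = (total_s // 60) % 60
    hours = total_s // 3600
    return f"{hours}:{minutes:02d}:{seconds:02d}.{cs:02d}"


def dialogue_line(words, keyword_index, start_ms, end_ms,
                  highlight_hex="#FFD93D", style="Default"):
    """Build one ASS Dialogue event for a beat, highlighting one word."""
    parts = []
    for i, word in enumerate(words):
        text = word.upper()
        if i == keyword_index:
            # \c sets the fill colour; \r resets to the style defaults.
            text = "{\\c" + hex_to_ass_color(highlight_hex) + "}" + text + "{\\r}"
        parts.append(text)
    start = ms_to_ass_time(start_ms)
    end = ms_to_ass_time(end_ms)
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,," + " ".join(parts)


line = dialogue_line(["most", "people", "quit"], 2, 0, 900)
```

The font, stroke, and position would live in the file's style definition (the `Default` style referenced here), so each event only carries text, timing, and the highlight.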

Ascynd includes a Hormozi-style preset out of the box, with word-by-word timing and automatic keyword highlighting on every clip you generate. Drop in a long-form video (podcast, interview, webinar), and Ascynd clips the most engaging moments and captions them in the Hormozi format automatically.

For the broader workflow — from long-form source to captioned short — see our guides on AI auto captions and building an AI content creation workflow.


Platform-Specific Adjustments

The core Hormozi style works on every short-form platform, but small adjustments matter for each.

TikTok

  • Watch the UI zone. TikTok's right-side action buttons and bottom text overlap the lower ~20% of the frame. Keep Hormozi captions at roughly Y=60% (not 70%) so they don't clash with the like button or the caption/username text.
  • Avoid the top 15% — TikTok's own UI lives there.
  • Word-by-word timing plays especially well with TikTok's completion-rate algorithm. See our breakdown on AI TikTok clips for more.

Instagram Reels

  • The safe zone differs from TikTok's. Reels UI covers the bottom ~17% and the top ~11%, which by these figures leaves slightly more usable height than TikTok.
  • Keep captions at Y=62–65% for Reels specifically.
  • Instagram's algorithm now indexes on-screen text as a discovery signal, so accurate captions double as keywords.

YouTube Shorts

  • Shorts has the most forgiving safe zone of the three platforms.
  • YouTube indexes caption text for search, so precise transcription matters more here than on other platforms — AI captions with >95% accuracy will outperform sloppy hand-typed captions every time.
  • Because Shorts displays on larger screens (tablets, TVs) more often than TikTok, you can push the font size slightly smaller (~9% of frame height) without hurting readability.

LinkedIn

  • LinkedIn video autoplays on mute and sees a 70% engagement boost with captions.
  • On LinkedIn, the Hormozi style can feel too loud for some B2B audiences. Consider dropping the yellow highlight and keeping every word white, with the keyword set apart by weight alone, while keeping the word-by-word timing.
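
The safe-zone advice above can be checked with simple arithmetic. This sketch uses only the TikTok and Reels margins quoted in this section (no figures are given here for Shorts or LinkedIn, so they are left out). It assumes the Y position refers to the top edge of the caption block, which is an interpretation rather than something the platforms define.

```python
# (top UI fraction, bottom UI fraction) as quoted in this section.
UI_MARGINS = {
    "tiktok": (0.15, 0.20),
    "reels": (0.11, 0.17),
}


def caption_fits(platform, y_fraction, block_height_fraction):
    """Return True if the caption block stays clear of the platform UI.

    y_fraction is the block's top edge and block_height_fraction its
    height, both as fractions of frame height.
    """
    top_margin, bottom_margin = UI_MARGINS[platform]
    block_top = y_fraction
    block_bottom = y_fraction + block_height_fraction
    return block_top >= top_margin and block_bottom <= 1 - bottom_margin


tiktok_at_60 = caption_fits("tiktok", 0.60, 0.15)
tiktok_at_70 = caption_fits("tiktok", 0.70, 0.15)
reels_at_65 = caption_fits("reels", 0.65, 0.15)
```

With a block 15% of the frame tall, a TikTok caption starting at 60% clears the bottom UI while one starting at 70% runs into it, which matches the recommendation to stay near Y=60% there.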

Mistakes That Kill the Hormozi Look

The style is simple, but easy to get wrong. These are the mistakes that give away a low-effort imitation:

  • Thin fonts. If you use Montserrat Regular instead of Montserrat Black, the style collapses. Weight matters more than family.
  • No stroke, or a gray stroke. The black stroke is load-bearing. Without it, captions disappear on bright backgrounds.
  • Too many words per caption. Four or more words on screen at once is a readability wall. Keep it to 1–3.
  • Full-sentence captions instead of word-by-word. Static captions work, but they are not the Hormozi style. The timing is half the format.
  • Captions placed over the face. Cover the mouth and you lose lip-sync cues that help viewers track audio even when muted.
  • Random highlight colors. Yellow (or green) is the convention. Pink, blue, or red highlights don't read the same way.
  • Inconsistent highlighting cadence. Highlighting every other word defeats the purpose. One keyword per phrase is the rule.
  • Auto-generated captions with typos. Platform auto-captions (TikTok, Instagram, YouTube) misfire on jargon, names, and numbers. Spot-check before publishing. For the trade-offs, see manual vs. AI captioning.

FAQ

What font does Alex Hormozi use for his captions?

The most common match is Montserrat Black, a free Google Font. Alex Hormozi's editors have also used Anton and Bebas Neue — all three are heavy, condensed, uppercase-friendly sans-serifs. Montserrat Black is the closest 1:1 match to the style most viewers associate with his recent clips.

What color is the highlight word in Hormozi captions?

Bright yellow, approximately #FFD93D or #FFEE33. Occasionally a saturated green (#39FF14) is used for variety. The point is high contrast against white — any muted or pastel color breaks the effect.

How big should Hormozi-style captions be?

Roughly 10–15% of the vertical frame height for the caption block. On a 1080×1920 video, that usually means an 80–120 pixel font size, with each line of text taking about 4–6% of the height. Big enough to read at arm's length on a phone, small enough to leave room for two or three words per line.

Do I need Premiere Pro or After Effects to make Hormozi captions?

No. CapCut (free, mobile and desktop) can produce the Hormozi style manually with its word-splitting and highlight features. AI captioning tools like Ascynd produce the style automatically without any editing software at all. Premiere and After Effects give you more control, but they aren't required.

How long does it take to make Hormozi-style captions manually?

For a 60-second talking-head clip, expect 30–60 minutes of manual work: transcription, word splitting, styling, keyword highlighting, and timing. AI captioning tools reduce this to under a minute per clip.

Are Hormozi-style captions the best format for every video?

For talking-head business, fitness, self-development, and sales content, yes — the data consistently shows this format drives higher completion and retention. For music-driven, cinematic, or purely visual content (food ASMR, travel B-roll, ambient), softer subtitle-style captions usually fit better. Match the caption style to the content.

Do Hormozi captions work on long-form YouTube videos?

They can, but the format is designed for short-form vertical video. On long-form horizontal YouTube, more traditional subtitle-style captions (bottom-aligned, white text, smaller size) tend to read better because viewers are in a lean-back, sound-on watching mode.


The Bottom Line

The Hormozi caption style is not magic; it is a tightly engineered answer to how people actually watch short-form video on muted mobile screens. Heavy condensed font. ALL CAPS. White fill with a black stroke. Yellow keyword. Word-by-word timing. Lower-middle placement. Minimal animation. Seven elements, no more.

Replicate those seven elements and you replicate the style. Miss any one of them and the clip reads as an imitation.

If you want the style without the 30-minute-per-clip manual workflow, try Ascynd — the Hormozi preset is built in, keyword highlighting is automatic, and you can caption dozens of clips a week without touching a timeline.