
YouTube Content Production Pipeline

Carlos Gutierrez at Beast Consulting

INTERNAL TOOLING · Live in production — drives daily YouTube output · Updated 2026-04-25

Summary

A command-line YouTube content production pipeline that takes a video transcript (or a topic prompt) and fans it out to 18 generators producing every adjacent asset a creator needs to ship: titles, descriptions, SEO tags, hooks, thumbnail briefs, AI-generated thumbnails and banners, content briefs, section illustrations, and downstream products. One transcript in, a publish-ready folder out. Built as a plug-in architecture so adding the 19th generator is a 7-step recipe.

The Problem

Shipping a YouTube video isn't just shooting the video. Every upload needs a title that survives the algorithm, a description that hits a known SEO pattern, hashtags, a thumbnail brief, the actual thumbnail image, a section-illustration set for the body of the video, sometimes a content brief for repurposing, and product descriptions if you're selling alongside the video. Doing all of that by hand for each upload is the reason most creators ship sporadically.

I wanted a command — ytcc generate transcript.md — that produced every one of those assets, on-brand, in under a minute. With the freedom to skip generators, run only specific ones, or regenerate from a saved analysis without paying for the analysis step twice.

The Approach

The pipeline is built around a ContentContext dataclass and a registry of generators. The transcript first goes through a single frontier-model analysis pass that produces a structured JSON with topic, outline, hook candidates, and audience signal. That analysis becomes part of the context every downstream generator consumes — no generator re-analyzes the video, they all read the cached JSON.

transcript.md  →  Frontier-model analysis (one call)
                          ↓
                 ContentContext (transcript + analysis + brand)
                          ↓
        ┌─────────────────┼─────────────────┐
        ↓                 ↓                 ↓
   Text generators   Image generators   Product generators
   (titles, SEO,     (thumbnails,       (descriptions,
    descriptions,     banners,           briefs,
    hooks, briefs)    illustrations)     splits)
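In sketch form, the shared context and the cached one-pass analysis look roughly like this (ContentContext is the name from the write-up; the field set, cache-file convention, and analyze callable are illustrative assumptions, not the pipeline's actual code):

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ContentContext:
    """Shared state every generator reads; no generator re-analyzes the video."""
    transcript: str
    analysis: dict                      # topic, outline, hook candidates, audience signal
    brand: dict = field(default_factory=dict)


def load_or_analyze(transcript_path: Path, analyze) -> ContentContext:
    """Run the frontier-model analysis once; reuse the cached JSON afterwards."""
    cache = transcript_path.with_suffix(".analysis.json")
    if cache.exists():
        analysis = json.loads(cache.read_text())
    else:
        analysis = analyze(transcript_path.read_text())  # the single model call
        cache.write_text(json.dumps(analysis, indent=2))
    return ContentContext(transcript=transcript_path.read_text(), analysis=analysis)
```

The cache file sitting next to the transcript is what makes "regenerate without paying for analysis again" a free operation.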

Generators register themselves via a decorator and inherit from a BaseGenerator class with a single generate(context) → Path method. Each one declares its prompt file, output extension, and output directory in config. Adding a new generator is 7 mechanical steps with no framework code to touch. That plug-in shape is the reason the pipeline grew to 18 generators without becoming unmaintainable.
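The decorator-plus-base-class shape described above can be sketched as follows (the registry name, decorator signature, and TitlesGenerator example are assumptions for illustration):

```python
from pathlib import Path

REGISTRY = {}  # name -> generator class


def register(name):
    """Class decorator: each generator self-registers under a CLI-facing name."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


class BaseGenerator:
    prompt_file = None      # each subclass declares its own prompt template
    output_ext = ".md"
    output_dir = "out"

    def generate(self, context) -> Path:
        raise NotImplementedError


@register("titles")
class TitlesGenerator(BaseGenerator):
    prompt_file = "prompts/titles.txt"

    def generate(self, context) -> Path:
        # The real version renders the prompt with the context and calls the model;
        # here we only show where the output would land.
        return Path(self.output_dir) / f"titles{self.output_ext}"
```

A new generator is then exactly one file: subclass, decorator, prompt path, config entry — the framework never changes.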

What I Built

  • One-pass analysis layer — single frontier-model call produces a structured JSON consumed by every downstream generator; saves cost and keeps every asset internally consistent
  • 18 plug-in generators — text, image, and product generators each with their own prompt template and output schema
  • Image generation pipeline — multimodal model produces thumbnails and banners; ImageMagick handles composition and text overlay
  • Brand context module — channel name, audience, tone, default hashtags, SEO tag pool centralized so every generator stays on-brand
  • --only / --skip / regenerate flags — run specific generators, skip expensive ones, or re-run from cached analysis without paying for the analysis again
  • Topic mode — ytcc concept "<topic>" skips the transcript and generates from a prompt for fast pre-production
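The --only/--skip behavior reduces to set filtering over the registry; a minimal sketch (generator names are illustrative, and the real CLI parses these flags with Click):

```python
def select_generators(registry, only=None, skip=None):
    """--only narrows the run to the named generators; --skip removes names;
    with neither flag, every registered generator runs."""
    names = list(registry)
    if only:
        names = [n for n in names if n in only]
    if skip:
        names = [n for n in names if n not in skip]
    return names
```

Because selection happens before any model call, skipping the expensive image generators costs nothing.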

Engineering Highlights

  • One analysis, many generators. The most expensive step is the structured analysis pass. Every other generator reads the cached JSON and never re-analyzes. Drops cost and runtime and — more importantly — keeps the title, description, thumbnail brief, and hooks all working off the same understanding of the video.
  • Plug-in architecture as a productivity multiplier. A new generator is one Python file, one decorator, one prompt template, one config entry. The shape lets me add capability faster than I add maintenance burden.
  • Dual-vendor AI by design. Text generation runs on a frontier reasoning model; image generation runs on a separate multimodal vendor. The vendor choice is configured per generator, abstracted behind environment variables. Switching either vendor is a config change, not a refactor.
  • Title format codified, not "creative." The channel has a tested title formula that performs. The titles generator enforces the formula via the prompt instead of asking the model to be clever. Boring + tested beats clever + unproven on YouTube algorithms.
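The per-generator vendor abstraction could be sketched like this (the environment-variable names and the default fallback are assumptions; the write-up only says vendors are configured per generator behind environment variables):

```python
import os

# Per-generator kind -> credential variable. Names here are illustrative,
# not the pipeline's actual configuration keys.
VENDOR_ENV = {
    "text": "TEXT_MODEL_API_KEY",
    "image": "IMAGE_MODEL_API_KEY",
}


def client_for(kind):
    """Resolve (vendor, api_key) for a generator kind from the environment.
    Swapping a vendor is a config change, never a code change."""
    key = os.environ.get(VENDOR_ENV[kind], "")
    vendor = os.environ.get(f"{kind.upper()}_VENDOR", "default")
    return vendor, key
```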

Outcome

Daily content output for a sports betting channel goes from "an afternoon of asset prep" to a single command. The channel now ships every weekday on schedule. Adding a new asset type — a new generator — takes under an hour. The pipeline has grown from a few generators to 18 without rewrites.

Tech footprint

  • Frontend — Click CLI with Rich console output; no web UI
  • Backend — Python 3.8+ pipeline with a generator registry pattern
  • AI — frontier reasoning model for text and analysis, separate multimodal model for thumbnails and banners (vendor-abstracted)
  • Image processing — ImageMagick (Wand) for composition and text overlay
  • Config — python-dotenv, centralized brand context module
  • Output — markdown, JSON, PNG into per-generator directories
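The per-generator output layout is simple enough to sketch in a few lines (the helper name and run-folder convention are assumptions):

```python
from pathlib import Path


def output_path(base, generator, stem, ext):
    """Each generator writes into its own subdirectory of the run folder,
    e.g. <base>/thumbnails/video01.png."""
    d = Path(base) / generator
    d.mkdir(parents=True, exist_ok=True)
    return d / f"{stem}{ext}"
```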
