How Our AI Summarization Pipeline Works
A technical look at how CandleForrest extracts structured trading strategies from YouTube videos using yt-dlp and GPT-4o-mini.
The Pipeline
When you submit a YouTube URL, a lot happens behind the scenes in a few seconds. Here's the full pipeline:
1. Metadata Extraction
We use yt-dlp to fetch video metadata — title, description, channel name, thumbnail, duration, and channel URL. No YouTube API key needed. yt-dlp scrapes it directly.
2. Transcript Fetching
yt-dlp also grabs the auto-generated English subtitles in VTT format. We clean the raw VTT by:
- Stripping timestamps and formatting tags
- Deduplicating consecutive identical lines (YouTube's auto-captions repeat a lot)
- Producing clean plain text
3. AI Summarization
The cleaned transcript goes to GPT-4o-mini via OpenAI's API with Instructor for structured outputs. We ask for:
- A concise summary of the strategy
- Tags for categorization (normalized via our synonym map)
- A containsStrategy boolean — if false, the submission is rejected
4. Strategy Step Extraction
A second LLM call extracts structured strategy steps:
- Setup conditions — what market state triggers the strategy
- Entry signal — the specific trigger to enter a trade
- Exit / take profit — when and how to close
- Stop loss & risk — risk management rules
- Indicators, timeframes, and markets — the technical context
5. Channel Resolution
We also create or update a channel record — scraping the YouTube channel page for the real profile picture, follower count, and handle.
Why Not Use the YouTube API?
The YouTube Data API requires API keys, has strict quotas, and doesn't provide transcripts. yt-dlp gives us everything we need with zero configuration and no rate limits for our scale.
Tag Normalization
We maintain a synonym map of ~80 entries that normalizes common variations:
- "moving average", "MA", "SMA", "EMA" →
moving-averages - "support and resistance", "S/R", "support/resistance" →
support-resistance - "risk reward", "R:R", "risk-to-reward" →
risk-management
This keeps the tag filter clean and ensures related strategies cluster together.
What Could Go Wrong
The AI isn't perfect. Auto-generated transcripts can be noisy, especially for videos with heavy jargon or non-native speakers. That's why every strategy card shows a disclaimer: "AI-generated summary — watch the original video before trading this strategy."
Verified strategies (manually reviewed) get the disclaimer removed.