You can speed up audio (2x or 3x) with ffmpeg before sending it to OpenAI’s transcription API to reduce duration, tokens, cost, and wait time with minimal quality loss.
A simple workflow: extract the audio with yt-dlp, speed it up with ffmpeg's atempo filter, then send the faster audio to gpt-4o-transcribe and summarize the transcript with an LLM.
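The ffmpeg step of this workflow might look roughly like the sketch below. It only builds the command; the helper names `atempo_chain` and `build_ffmpeg_cmd` and the file names are hypothetical, not taken from the original write-up. Speeds above 2x are split into chained atempo stages, because ffmpeg's documentation notes that a single stage above 2.0 skips samples rather than blending them.

```python
import subprocess


def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio filter string for the requested speed-up.

    Assumption: speed-ups only (speed >= 1.0). Each stage is kept at or
    below 2.0 and the stages multiply to the requested speed.
    """
    if speed < 1.0:
        raise ValueError("this sketch only handles speed >= 1.0")
    stages = []
    remaining = speed
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    stages.append(remaining)
    return ",".join(f"atempo={s:g}" for s in stages)


def build_ffmpeg_cmd(src: str, dst: str, speed: float) -> list[str]:
    """Return the ffmpeg argument list that writes a sped-up copy of src to dst."""
    return [
        "ffmpeg", "-y",                      # overwrite dst if it exists
        "-i", src,                           # input audio file
        "-vn",                               # drop any video stream
        "-filter:a", atempo_chain(speed),    # apply the tempo change
        dst,
    ]


def speed_up(src: str, dst: str, speed: float) -> None:
    """Run ffmpeg; requires ffmpeg on PATH (not called automatically here)."""
    subprocess.run(build_ffmpeg_cmd(src, dst, speed), check=True)
```

Calling `speed_up("talk.mp3", "talk_3x.mp3", 3.0)` would run ffmpeg with the filter `atempo=2,atempo=1.5`. The resulting file can then be uploaded to the transcription API in place of the original.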
At 2x speed, a 40-minute file becomes ~20 minutes, cutting input token cost roughly in half; at 3x (~13 minutes), input cost drops to about one third of the original, a saving of ~67%.
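The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes audio input tokens (and therefore input cost) scale linearly with audio duration; the function names are illustrative only.

```python
def sped_up_minutes(original_minutes: float, speed: float) -> float:
    """Duration of the audio after playing it `speed` times faster."""
    return original_minutes / speed


def input_cost_saving(speed: float) -> float:
    """Fraction of input cost saved, assuming cost is proportional to duration."""
    return 1.0 - 1.0 / speed


for speed in (2.0, 3.0):
    minutes = sped_up_minutes(40.0, speed)
    saving = input_cost_saving(speed)
    print(f"{speed:g}x: {minutes:.1f} min, input cost saving {saving:.0%}")
```

Under that linear assumption, the remaining input cost at 3x is one third of the original, so the saving is about two thirds.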
Output token count remains the same at 2x and 3x speeds, so most savings come from reduced audio input tokens.
Speeds beyond 3x (e.g., 4x) degrade transcription quality significantly.
Doubling or tripling audio speed is a quick, effective hack to save time and money on AI transcriptions with acceptable fidelity.