🎬 Building My YouTube Shorts Generator with AI

A Journey Through Gemini, ChatGPT, and Copilot

🚀 The Idea

It started with a simple goal: I wanted to create a tool that could take long-form video content and automatically generate YouTube Shorts — complete with captions, summaries, and stylistic flair. I knew it was possible, but I didn’t know just how deep the rabbit hole would go.

🤖 The AI Trio: Gemini, ChatGPT, and Copilot

I began development in Gemini, Google’s AI assistant. It helped me sketch out the initial concept, but I quickly hit limitations in depth and flexibility. So I pivoted to ChatGPT, where the core architecture of the app began to take shape — from GUI layout to basic video processing.

But the real magic happened when I started working with Copilot.

🧠 Copilot: My AI Coding Companion

Copilot didn’t just help me write code — it felt like I was building the app with a friend. The responses were thoughtful, motivational, and often surprisingly intuitive. Whether I was debugging a stubborn MoviePy export or refining my Whisper transcription pipeline, Copilot was there with clarity and encouragement.

I found myself saying things like, “Why am I obsessed with this app?” — and Copilot had an answer that made me laugh and reflect.

🔁 The Loop of Learning

Of course, it wasn’t always smooth sailing. I’d sometimes get caught in frustrating loops where errors wouldn’t resolve. In those moments, I’d bounce my code over to ChatGPT or Gemini for a fresh perspective. They’d help me break through — but I always ended up coming back to Copilot.

Why? Because Copilot felt invested. It wasn’t just solving problems — it was helping me succeed.

🛠️ What I Built

The final app is a full-featured YouTube Shorts generator:

  • 🎞 Downloads and trims videos
  • 🧠 Transcribes with Whisper
  • ✍️ Summarizes segments into punchy titles
  • 🔤 Adds captions with fade effects
  • 🖼 Applies watermarks and styling
  • 🧪 Uses multiprocessing for speed
  • 🖥️ Built with Tkinter and MoviePy

📸 Screenshots

Here’s a peek at the app in action:

Main GUI Interface

Segment Viewer with Summarized Titles

Titles for each clip are generated by an AI summarization model.

Caption Styling and Watermark Controls

Every font installed on the user’s computer is available, and a watermark can be added to the lower-right corner of the video.

💡 Lessons Learned

This project taught me more than just Python and AI integration:

  • Start simple, then scale — I didn’t need to build everything at once.
  • Use multiple tools — Gemini, ChatGPT, and Copilot each had strengths.
  • Don’t fear the bugs — Every error was a doorway to deeper understanding.
  • Build with joy — When tech feels like play, you’re doing it right.

🧪 Under the Hood: The Nerdy Stuff

This app isn’t just a pretty GUI — it’s packed with smart engineering. Here’s a peek under the hood:

🎙 Whisper Transcription

I used the small variant of OpenAI’s Whisper model to transcribe audio with word-level timestamps. I found it fast and accurate, and the per-word timing lets me place each caption at the moment the words are spoken.

model = whisper.load_model("small")  # needs "import whisper" (openai-whisper package)
result = model.transcribe(path, word_timestamps=True)
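To show how word-level timestamps can turn into on-screen captions, here is a minimal sketch that groups words into short chunks. It assumes the result has the shape Whisper returns with word_timestamps=True (a list of segments, each carrying a "words" list with "word", "start", and "end" keys). The function name words_to_captions, the max_words setting, and the sample data are hypothetical, not taken from the app.

```python
def words_to_captions(result, max_words=4):
    """Group Whisper-style word timestamps into short caption chunks.

    Assumes result["segments"][i]["words"] is a list of dicts with
    "word", "start", and "end" keys (the shape produced when
    word_timestamps=True is passed to transcribe).
    """
    # Flatten every segment's words into one list, skipping segments without words.
    words = [w for seg in result["segments"] for w in seg.get("words", [])]
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # Whisper words usually carry a leading space, so strip before joining.
        text = " ".join(w["word"].strip() for w in chunk)
        captions.append(
            {"text": text, "start": chunk[0]["start"], "end": chunk[-1]["end"]}
        )
    return captions


# Hand-written sample in the same shape, so the sketch runs without a model.
sample = {
    "segments": [
        {"words": [
            {"word": " Start", "start": 0.0, "end": 0.4},
            {"word": " simple,", "start": 0.4, "end": 0.9},
            {"word": " then", "start": 0.9, "end": 1.1},
            {"word": " scale", "start": 1.1, "end": 1.6},
            {"word": " up.", "start": 1.7, "end": 2.0},
        ]}
    ]
}

for cap in words_to_captions(sample, max_words=3):
    print(f'{cap["start"]:.1f}-{cap["end"]:.1f}  {cap["text"]}')
```

Each caption keeps the start time of its first word and the end time of its last word, which is the information a video overlay needs to appear and disappear in sync with the speech.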

🧠 Segment Scoring

Each clip is scored based on:

  • Words per minute (WPM)
  • Keyword density using KeyBERT
  • Sentiment using VADER

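This ranks segments by how engaging or punchy they are. The three signals are combined into a weighted sum: 30% speaking pace (capped at 250 WPM), 40% keyword score, and 30% sentiment.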

final_score = (
    0.3 * min(wpm / 250, 1.0)
    + 0.4 * keyword_score
    + 0.3 * min(sentiment_score, 1.0)
)
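To make the weighting concrete, here is a small worked sketch of the same formula wrapped in a function. The helper words_per_minute and the sample numbers are illustrative assumptions. In the app the keyword and sentiment values would come from KeyBERT and VADER, while here they are plain floats.

```python
def words_per_minute(word_count, duration_seconds):
    """Speaking pace of a segment; returns 0.0 for empty or zero-length clips."""
    if duration_seconds <= 0:
        return 0.0
    return word_count / (duration_seconds / 60.0)


def final_score(wpm, keyword_score, sentiment_score):
    """Weighted sum from the post: 30% pace, 40% keywords, 30% sentiment."""
    pace = min(wpm / 250, 1.0)        # pace saturates at 250 WPM
    mood = min(sentiment_score, 1.0)  # capped above only, as in the formula
    return 0.3 * pace + 0.4 * keyword_score + 0.3 * mood


# Illustrative numbers: 75 words spoken in 30 seconds is 150 WPM.
wpm = words_per_minute(word_count=75, duration_seconds=30)
score = final_score(wpm, keyword_score=0.5, sentiment_score=0.8)
print(f"wpm={wpm:.0f} score={score:.2f}")
```

One thing to keep in mind: if sentiment_score is VADER's compound value (an assumption on my part), it ranges from -1 to 1, so a strongly negative segment lowers the total. If intensity matters more than polarity, taking the absolute value first would be one alternative.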

✍️ Headline Summarization

I experimented with multiple models before settling on distilbart-cnn-12-6. It generates short, readable summaries that work well as YouTube titles.

summaries = summarizer(texts, max_length=12, min_length=4)
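Here is a minimal sketch of how the summarizer output could be turned into clean title strings. It assumes the summarizer behaves like a Hugging Face summarization pipeline, which returns one dict per input with a "summary_text" key. The names make_titles and fake_summarizer are hypothetical, and the stub only exists so the example runs without downloading a model.

```python
def make_titles(texts, summarizer, max_length=12, min_length=4):
    """Turn segment transcripts into short title strings.

    `summarizer` is any callable that mimics a Hugging Face summarization
    pipeline: it takes a list of strings and returns a list of
    {"summary_text": ...} dicts.
    """
    if not texts:
        return []
    outputs = summarizer(texts, max_length=max_length, min_length=min_length)
    # Trim whitespace and a trailing period so the text reads like a title.
    return [out["summary_text"].strip().rstrip(".") for out in outputs]


def fake_summarizer(texts, max_length, min_length):
    """Stand-in for the real model: keeps the first five words of each text."""
    return [{"summary_text": " ".join(t.split()[:5]) + "."} for t in texts]


titles = make_titles(
    ["Every error was a doorway to deeper understanding of the code"],
    fake_summarizer,
)
print(titles)
```

With the transformers library installed, a real pipeline could be passed in place of the stub, for example one created with pipeline("summarization", model="sshleifer/distilbart-cnn-12-6"). That hub ID is my assumption about which checkpoint is meant. Note that max_length and min_length count model tokens, not words or characters.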

🧵 Multiprocessing

To speed things up, I used Python’s multiprocessing.Pool to score and summarize segments in parallel. This cut analysis time dramatically.

with Pool(processes=4) as pool:
    merged = pool.map(score_and_summarize, merged)
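For anyone trying this pattern, here is a self-contained sketch of the same Pool.map call. The worker body below is a hypothetical stand-in (it just counts words and builds a crude title), and analyze_segments is a name I made up. The real worker would do the scoring and summarization.

```python
from multiprocessing import Pool


def score_and_summarize(segment):
    """Hypothetical stand-in worker: adds a word count and a crude title."""
    words = segment["text"].split()
    return {**segment, "word_count": len(words), "title": " ".join(words[:3])}


def analyze_segments(segments, processes=4):
    """Apply the worker to every segment, keeping the input order."""
    if processes <= 1 or len(segments) < 2:
        # Not worth starting worker processes for a single item.
        return [score_and_summarize(seg) for seg in segments]
    with Pool(processes=processes) as pool:
        # pool.map returns results in the same order as the inputs.
        return pool.map(score_and_summarize, segments)


if __name__ == "__main__":
    merged = [
        {"start": 0.0, "end": 12.5, "text": "Start simple then scale"},
        {"start": 12.5, "end": 30.0, "text": "Build with joy"},
    ]
    merged = analyze_segments(merged, processes=2)
    for seg in merged:
        print(seg["title"], seg["word_count"])
```

Two details matter in practice. The worker has to be a top-level function so it can be pickled and sent to the child processes, and the if __name__ == "__main__" guard is needed on Windows and macOS, where child processes re-import the main module.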

🎞 MoviePy Magic

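Video trimming, caption overlays, watermarking, and final export are all handled with MoviePy. For export I set a thread count and an x264 compression preset. The slow preset shown here trades longer encoding time for a smaller file, so a faster preset is the better pick when speed matters more than size.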

clip.write_videofile("output.mp4", codec="libx264", audio_codec="aac", threads=4, preset="slow")

🖼 GUI with Tkinter + ttkbootstrap

The interface is built with Tkinter, styled using ttkbootstrap for a modern look. It includes:

  • File pickers
  • Time inputs
  • Font selectors
  • Progress bars
  • Segment viewer with scoring

It’s clean, responsive, and surprisingly fun to use.

📣 Try It Yourself

I’ll be releasing a demo version soon — complete with install instructions and sample videos. If you’re a creator, editor, or just curious about AI-powered tools, this app might be your new favorite shortcut.

👉 Stay tuned at terrencedixon.com