๐ŸŽฅ Video Question Answering and Summarizer

Upload a video file to transcribe its audio content, then ask questions or generate summaries using NVIDIA's Canary-Qwen-2.5B model.

Features:

  • Extract and transcribe audio from video files (handles long videos with chunking)
  • Ask questions about the video content
  • Generate different types of summaries
  • Powered by NVIDIA NeMo Canary-Qwen-2.5B

๐Ÿš€ Step 1: Load Model

๐Ÿ“น Step 2: Upload and Process Video

โ“ Step 3: Ask Questions

๐Ÿ“ Step 4: Generate Summary

Summary Type

๐Ÿ’ก Tips & Improvements:

  1. Supported formats: MP4, AVI, MOV, MKV, and other common video formats
  2. Audio quality: Better audio quality leads to more accurate transcriptions
  3. Long videos: The app now automatically splits long audio files into chunks for complete transcription
  4. Processing time: Longer videos are processed in chunks, which may take more time but ensures completeness
  5. Questions: Be specific with your questions for better answers
  6. Summaries: Choose the summary type that best fits your needs

๐Ÿ”ง Recent Fixes:

  • Increased token limits for complete transcriptions (2048 tokens for transcription, 1536 for summaries)
  • Audio chunking for videos longer than 30 seconds to prevent cutoffs
  • Improved transcript cleaning to remove model artifacts
  • Better progress tracking during video processing
  • Copy buttons for easy text copying

โš ๏ธ Requirements:

  • PyTorch 2.6+ for FSDP2 support
  • CUDA-compatible GPU recommended for optimal performance
  • Sufficient disk space for temporary audio files