Video Question Answering and Summarizer
Upload a video file to transcribe its audio content, then ask questions or generate summaries using NVIDIA's Canary-Qwen-2.5B model.
Features:
- Extract and transcribe audio from video files (long videos are handled by chunking)
- Ask questions about the video content
- Generate different types of summaries
- Powered by NVIDIA NeMo Canary-Qwen-2.5B
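The audio-extraction feature can be illustrated with a minimal sketch. It assumes `ffmpeg` is installed and on the `PATH`, and that the speech model expects 16 kHz mono WAV input; the function names `build_ffmpeg_command` and `extract_audio` are hypothetical and are not taken from the app's source.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes a mono 16-bit PCM WAV from a video file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", video_path,          # input video (MP4, AVI, MOV, MKV, ...)
        "-vn",                     # drop the video stream, keep audio only
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample; 16 kHz is an assumption here
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> Path:
    """Run ffmpeg and return the path of the WAV file it wrote.

    Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )
    return Path(wav_path)
```

Keeping the command builder separate from the call that runs it makes the arguments easy to inspect or log before any processing starts.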
 
Step 1: Load Model
Step 2: Upload and Process Video
Step 3: Ask Questions
Step 4: Generate Summary
Tips & Improvements:
- Supported formats: MP4, AVI, MOV, MKV, and other common video formats
- Audio quality: Better audio quality leads to more accurate transcriptions
- Long videos: The app now automatically splits long audio files into chunks for complete transcription
- Processing time: Longer videos are processed in chunks, which may take more time but ensures completeness
- Questions: Be specific with your questions for better answers
- Summaries: Choose the summary type that best fits your needs
 
Recent Fixes:
- Increased token limits for complete transcriptions (2048 tokens for transcription, 1536 for summaries)
- Audio chunking for videos longer than 30 seconds to prevent cutoffs
- Improved transcript cleaning to remove model artifacts
- Better progress tracking during video processing
- Copy buttons for easy text copying
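The 30-second chunking mentioned above can be sketched as follows. This is one possible approach, not necessarily how the app does it: it uses fixed, non-overlapping windows, and the helper names `chunk_spans` and `join_transcripts` are hypothetical.

```python
def chunk_spans(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) windows of at most chunk_s seconds."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    duration_s = float(duration_s)
    spans: list[tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # the last window may be shorter
        spans.append((start, end))
        start = end
    return spans


def join_transcripts(parts: list[str]) -> str:
    """Join per-chunk transcripts in order, skipping chunks that produced no text."""
    return " ".join(p.strip() for p in parts if p.strip())


print(chunk_spans(75))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Fixed windows can cut a word at a boundary; overlapping windows or splitting on silence would be natural refinements, at the cost of extra merging logic.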
 
Requirements:
- PyTorch 2.6+ for FSDP2 support
- CUDA-compatible GPU recommended for optimal performance
- Sufficient disk space for temporary audio files
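A quick pre-flight check for these requirements might look like the sketch below. The 2 GB free-space threshold is an arbitrary assumption (the app states no figure), and the helper names `meets_min_version` and `check_environment` are hypothetical.

```python
import importlib.util
import re
import shutil
import tempfile


def meets_min_version(version: str, minimum: tuple[int, int] = (2, 6)) -> bool:
    """Return True if a version string such as '2.6.0+cu124' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_environment(min_free_gb: float = 2.0) -> dict[str, bool]:
    """Report whether PyTorch, CUDA, and temp-directory disk space look adequate."""
    report = {"torch_ok": False, "cuda_ok": False, "disk_ok": False}
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            report["torch_ok"] = meets_min_version(str(torch.__version__))
            report["cuda_ok"] = bool(torch.cuda.is_available())
        except Exception:
            pass  # a broken install is reported as "not ok" rather than crashing
    # min_free_gb is an assumed threshold for temporary audio files
    free_bytes = shutil.disk_usage(tempfile.gettempdir()).free
    report["disk_ok"] = free_bytes >= min_free_gb * 1024**3
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or insufficient'}")
```

Comparing parsed integer tuples avoids the string-comparison pitfall where "2.10" would sort below "2.6".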