🎥 Video Question Answering and Summarizer
Upload a video file to transcribe its audio content, then ask questions or generate summaries using NVIDIA's Canary-Qwen-2.5B model.
Features:
- Extract and transcribe audio from video files (handles long videos with chunking)
- Ask questions about the video content
- Generate different types of summaries
- Powered by NVIDIA NeMo Canary-Qwen-2.5B
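
The audio-extraction feature can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes the `ffmpeg` command-line tool is installed and on `PATH`, and that the model is fed 16 kHz mono WAV audio. The function names `build_ffmpeg_command` and `extract_audio` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the video's audio track as mono PCM WAV.

    The 16 kHz default is an assumption about the model's expected input.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the extracted WAV file."""
    video = Path(video_path)
    wav = Path(wav_path) if wav_path else video.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video, wav), check=True, capture_output=True)
    return wav
```

Separating command construction from execution keeps the flags easy to inspect and to test without running ffmpeg.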
Step 1: Load Model
📹 Step 2: Upload and Process Video
❓ Step 3: Ask Questions
Step 4: Generate Summary
Summary Type
💡 Tips & Improvements:
- Supported formats: MP4, AVI, MOV, MKV, and other common video formats
- Audio quality: Better audio quality leads to more accurate transcriptions
- Long videos: The app now automatically splits long audio files into chunks for complete transcription
- Processing time: Long videos are transcribed chunk by chunk, which takes longer but ensures the full audio is covered
- Questions: Be specific with your questions for better answers
- Summaries: Choose the summary type that best fits your needs
🔧 Recent Fixes:
- Increased token limits for complete transcriptions (2048 tokens for transcription, 1536 for summaries)
- Audio chunking for videos longer than 30 seconds to prevent cutoffs
- Improved transcript cleaning to remove model artifacts
- Better progress tracking during video processing
- Copy buttons for easy text copying
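
As a rough illustration of the chunking fix, the sketch below computes chunk boundaries for a given audio duration and stitches per-chunk transcripts back together. The list above gives 30 seconds only as the threshold, so the 30-second chunk length is an assumption. The helper names `chunk_spans` and `join_transcripts` are hypothetical, and the app's real implementation may differ (for example, by overlapping chunks).

```python
import math


def chunk_spans(duration_s, chunk_s=30.0):
    """Return (start, end) times in seconds covering the whole duration.

    Every chunk is chunk_s long except possibly the last, shorter one.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    if duration_s <= 0:
        return []
    count = math.ceil(duration_s / chunk_s)
    # Index-based boundaries avoid accumulating floating-point drift
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(count)]


def join_transcripts(pieces):
    """Concatenate per-chunk transcripts, skipping empty results."""
    return " ".join(p.strip() for p in pieces if p.strip())
```

For a 75-second clip, `chunk_spans(75)` yields three spans: two full 30-second chunks and a final 15-second remainder.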
⚠️ Requirements:
- PyTorch 2.6+ for FSDP2 support
- CUDA-compatible GPU recommended for optimal performance
- Sufficient disk space for temporary audio files
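
The requirements above could be checked before loading the model with something like the sketch below. It is illustrative only: `meets_min_version`, `free_disk_gb`, and `check_environment` are hypothetical helpers, and the 1 GB free-space threshold is an arbitrary placeholder rather than a documented figure.

```python
import re
import shutil


def meets_min_version(version, minimum):
    """Compare the leading 'major.minor' of a version string to a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


def free_disk_gb(path="."):
    """Free space, in gigabytes, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


def check_environment(min_torch=(2, 6), min_free_gb=1.0):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_min_version(torch.__version__, min_torch):
            problems.append(f"PyTorch {torch.__version__} is older than 2.6")
        if not torch.cuda.is_available():
            problems.append("No CUDA GPU detected; processing will be slow")
    if free_disk_gb() < min_free_gb:  # placeholder threshold, not a documented value
        problems.append("Low disk space for temporary audio files")
    return problems
```

Calling `check_environment()` at startup and showing any returned messages to the user is one way to surface these issues early.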