A robust Python tool for bulk downloading YouTube videos with proxy support, configurable resolution settings, and S3 storage integration.
- Bulk video download from CSV lists
- Smart proxy management with automatic testing and failover
- Configurable video resolution settings
- Concurrent downloads with thread pooling
- S3 storage integration
- Progress tracking and persistence
- Separate video and audio download options
- Comprehensive error handling and logging
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
Create a .env file with the following settings:
YTBULK_MAX_RETRIES=3
YTBULK_MAX_CONCURRENT=5
YTBULK_ERROR_THRESHOLD=10
YTBULK_TEST_VIDEO=<video_id>
YTBULK_PROXY_LIST_URL=<proxy_list_url>
YTBULK_PROXY_MIN_SPEED=1.0
YTBULK_DEFAULT_RESOLUTION=1080p
- YTBULK_MAX_RETRIES: Maximum retry attempts per download
- YTBULK_MAX_CONCURRENT: Maximum concurrent downloads
- YTBULK_ERROR_THRESHOLD: Error threshold before stopping
- YTBULK_TEST_VIDEO: Video ID used for proxy testing
- YTBULK_PROXY_LIST_URL: URL to fetch the proxy list from
- YTBULK_PROXY_MIN_SPEED: Minimum acceptable proxy speed (MB/s)
- YTBULK_DEFAULT_RESOLUTION: Default video resolution (360p, 480p, 720p, 1080p, 4K)
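The following minimal sketch shows how these variables could be loaded and validated. It is not the project's YTBulkConfig. The class name `ConfigSketch`, the `RESOLUTION_HEIGHTS` table, the `AUDIO_FORMAT` constant, and the defaults (which simply mirror the example values above) are all assumptions. It reads `os.environ` by default, which python-dotenv's `load_dotenv()` would populate from the .env file. It also shows one plausible way a resolution label could become a yt-dlp format selector.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical label -> maximum frame height table for the documented resolutions.
RESOLUTION_HEIGHTS = {"360p": 360, "480p": 480, "720p": 720, "1080p": 1080, "4K": 2160}

# Hypothetical yt-dlp selector for the separate audio download (.m4a).
AUDIO_FORMAT = "bestaudio[ext=m4a]"


@dataclass(frozen=True)
class ConfigSketch:
    max_retries: int
    max_concurrent: int
    error_threshold: int
    test_video: str
    proxy_list_url: str
    proxy_min_speed: float  # MB/s
    default_resolution: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ConfigSketch":
        """Build a config from a mapping (os.environ by default) and validate it."""
        env = os.environ if env is None else env
        resolution = env.get("YTBULK_DEFAULT_RESOLUTION", "1080p")
        if resolution not in RESOLUTION_HEIGHTS:
            raise ValueError(f"unsupported resolution: {resolution!r}")
        cfg = cls(
            # Assumed defaults mirror the example .env values.
            max_retries=int(env.get("YTBULK_MAX_RETRIES", "3")),
            max_concurrent=int(env.get("YTBULK_MAX_CONCURRENT", "5")),
            error_threshold=int(env.get("YTBULK_ERROR_THRESHOLD", "10")),
            # No sensible default exists for these two, so a missing value raises KeyError.
            test_video=env["YTBULK_TEST_VIDEO"],
            proxy_list_url=env["YTBULK_PROXY_LIST_URL"],
            proxy_min_speed=float(env.get("YTBULK_PROXY_MIN_SPEED", "1.0")),
            default_resolution=resolution,
        )
        if cfg.max_retries < 0 or cfg.max_concurrent < 1 or cfg.error_threshold < 1:
            raise ValueError("retry, concurrency or error-threshold value out of range")
        return cfg

    def video_format(self, max_resolution: Optional[str] = None) -> str:
        """yt-dlp selector for the best MP4 video stream at or below the height cap."""
        label = max_resolution or self.default_resolution
        if label not in RESOLUTION_HEIGHTS:
            raise ValueError(f"unsupported resolution: {label!r}")
        return f"bestvideo[height<={RESOLUTION_HEIGHTS[label]}][ext=mp4]"
```

A selector string like this would be passed to yt-dlp through the `format` option of its `YoutubeDL` params dict.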
python -m cli CSV_FILE ID_COLUMN --work-dir WORK_DIR --bucket S3_BUCKET [OPTIONS]
- CSV_FILE: Path to the CSV file containing video IDs
- ID_COLUMN: Name of the column containing YouTube video IDs
- --work-dir: Working directory for temporary files
- --bucket: S3 bucket name for storage
- --max-resolution: Maximum video resolution (optional)
- --video/--no-video: Enable/disable video download
- --audio/--no-audio: Enable/disable audio download
python -m cli videos.csv video_id --work-dir ./downloads --bucket my-youtube-bucket --max-resolution 720p
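The CSV_FILE/ID_COLUMN pair implies that video IDs are read from a named column. Below is a minimal sketch of that step using the standard `csv` module. The function name `read_video_ids` and the choice to skip blank cells and drop duplicates are assumptions, not documented behaviour.

```python
import csv
import io
from typing import List, TextIO


def read_video_ids(stream: TextIO, id_column: str) -> List[str]:
    """Return non-empty, de-duplicated video IDs from a CSV stream, in file order."""
    reader = csv.DictReader(stream)
    # Accessing fieldnames reads the header row; it is None for an empty file.
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise KeyError(f"column {id_column!r} not found in CSV header")
    seen = set()
    ids: List[str] = []
    for row in reader:
        # Short rows yield None for missing columns, so normalise to "".
        video_id = (row.get(id_column) or "").strip()
        if video_id and video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    return ids


if __name__ == "__main__":
    sample = io.StringIO("video_id,title\nabc123,Example\n,Blank row\n")
    print(read_video_ids(sample, "video_id"))
```

With a real file, open it with `newline=""` as the csv module recommends: `with open("videos.csv", newline="") as f: ids = read_video_ids(f, "video_id")`.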
- YTBulkConfig (config.py)
  - Handles configuration loading and validation
  - Environment variable management
  - Resolution settings
- YTBulkProxyManager (proxies.py)
  - Manages proxy pool
  - Tests proxy performance
  - Handles proxy rotation and failover
  - Persists proxy status
- YTBulkStorage (storage.py)
  - Manages local and S3 storage
  - Handles file organization
  - Manages metadata
  - Tracks processed videos
- YTBulkDownloader (download.py)
  - Core download functionality
  - Video format selection
  - Download process management
- YTBulkCLI (cli.py)
  - Command-line interface
  - Progress tracking
  - Concurrent download management
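For the concurrent download management handled by YTBulkCLI, a thread pool sized by YTBULK_MAX_CONCURRENT is a natural fit. The sketch below uses `concurrent.futures.ThreadPoolExecutor`. Three things in it are assumptions about how the tool behaves: the function name `run_downloads`, the per-ID status strings, and the rule that scheduling stops once the error count reaches YTBULK_ERROR_THRESHOLD.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_downloads(
    video_ids: Iterable[str],
    download: Callable[[str], None],
    max_concurrent: int = 5,
    error_threshold: int = 10,
) -> Dict[str, str]:
    """Run `download` for each ID in a thread pool; stop scheduling after too many errors."""
    results: Dict[str, str] = {}
    errors = 0
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(download, vid): vid for vid in video_ids}
        for future in as_completed(futures):
            vid = futures[future]
            try:
                future.result()  # re-raises any exception from the worker thread
                results[vid] = "ok"
            except Exception as exc:
                errors += 1
                results[vid] = f"error: {exc}"
                if errors >= error_threshold:
                    # Cancel queued downloads that have not started; ones already
                    # running finish when the pool shuts down at the end of `with`.
                    for pending in futures:
                        pending.cancel()
                    break
    return results
```

Threads suit this workload because downloads are I/O-bound, so the GIL is not the bottleneck. The `download` callable is where a yt-dlp call and a progress-bar update would go.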
The proxy system features:
- Automatic proxy testing
- Speed-based verification
- State persistence
- Automatic failover
- Concurrent proxy usage
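Speed-based verification, state persistence, and failover can be illustrated with a small pool class. This is a hypothetical sketch, not the YTBulkProxyManager API. `ProxyPoolSketch`, `ProxyState`, and the JSON schema are invented here. The speed measurement is injected as a callable; in the real tool it would presumably time a download of YTBULK_TEST_VIDEO through each proxy.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ProxyState:
    url: str
    speed: float = 0.0  # last measured throughput, MB/s
    working: bool = False


class ProxyPoolSketch:
    """Speed-tested proxy pool with fastest-first selection, failover and JSON persistence."""

    def __init__(self, cache_file: Path, min_speed: float) -> None:
        self.cache_file = cache_file
        self.min_speed = min_speed
        self.proxies: Dict[str, ProxyState] = {}
        if cache_file.exists():  # restore state saved by a previous run
            for item in json.loads(cache_file.read_text()):
                state = ProxyState(**item)
                self.proxies[state.url] = state

    def test_all(self, urls: Iterable[str], measure: Callable[[str], float]) -> None:
        """Measure each proxy; only those at or above min_speed are marked working."""
        for url in urls:
            try:
                speed = measure(url)
            except Exception:
                speed = 0.0  # an unreachable proxy counts as failed
            self.proxies[url] = ProxyState(url, speed, speed >= self.min_speed)
        self.save()

    def best(self) -> Optional[str]:
        """Fastest working proxy, or None if none are left."""
        working = [p for p in self.proxies.values() if p.working]
        return max(working, key=lambda p: p.speed).url if working else None

    def mark_failed(self, url: str) -> Optional[str]:
        """Failover: disable a proxy and return the next-fastest working one."""
        if url in self.proxies:
            self.proxies[url].working = False
            self.save()
        return self.best()

    def save(self) -> None:
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        payload = [asdict(p) for p in self.proxies.values()]
        self.cache_file.write_text(json.dumps(payload, indent=2))


if __name__ == "__main__":
    cache = Path(tempfile.mkdtemp()) / "cache" / "proxies.json"
    pool = ProxyPoolSketch(cache, min_speed=1.0)
    pool.test_all(["http://10.0.0.1:8080", "http://10.0.0.2:8080"], lambda url: 2.0)
    print(pool.best())
```

Sharing one pool across worker threads, as concurrent proxy usage implies, would additionally need a `threading.Lock` around `mark_failed` and `save`. The sketch omits that for brevity.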
Files are organized in the following structure:
work_dir/
├── cache/
│   └── proxies.json
└── downloads/
    └── {channel_id}/
        └── {video_id}/
            ├── {video_id}.mp4
            ├── {video_id}.m4a
            └── {video_id}.info.json
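Given the layout above, path construction reduces to a few pathlib joins. In this sketch, `video_paths`, `proxy_cache_path`, and `s3_key` are hypothetical helpers. Mirroring the local path below `downloads/` as the S3 object key is also an assumption, since the actual key scheme is not documented here.

```python
from pathlib import Path, PurePosixPath
from typing import Dict


def video_paths(work_dir: Path, channel_id: str, video_id: str) -> Dict[str, Path]:
    """Local file paths for one video under work_dir/downloads/{channel_id}/{video_id}/."""
    base = work_dir / "downloads" / channel_id / video_id
    return {
        "video": base / f"{video_id}.mp4",
        "audio": base / f"{video_id}.m4a",
        "info": base / f"{video_id}.info.json",
    }


def proxy_cache_path(work_dir: Path) -> Path:
    """Location of the persisted proxy state."""
    return work_dir / "cache" / "proxies.json"


def s3_key(work_dir: Path, local_path: Path) -> str:
    """Assumed S3 key: the path relative to work_dir/downloads, with forward slashes."""
    relative = local_path.relative_to(work_dir / "downloads")
    return PurePosixPath(*relative.parts).as_posix()
```

The resulting key could then be passed to boto3's `S3.Client.upload_file(Filename, Bucket, Key)`.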
- Comprehensive error logging
- Automatic retry mechanism
- Proxy failover
- File integrity verification
- S3 upload confirmation
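The retry mechanism and S3 upload confirmation could look roughly like the sketch below. `with_retries` and `confirm_upload` are hypothetical names, and exponential backoff is an assumed policy. `confirm_upload` expects an object with a boto3-style `head_object(Bucket=..., Key=...)` method whose response includes `ContentLength`. Comparing sizes is a cheap sanity check, not a full checksum-based integrity test.

```python
import time
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    on_failure: Optional[Callable[[int, Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`; on error retry up to `max_retries` more times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception as exc:
            if on_failure is not None:
                on_failure(attempt, exc)  # hook for logging or rotating to another proxy
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the default base
            attempt += 1


def confirm_upload(client: Any, bucket: str, key: str, expected_size: int) -> bool:
    """Compare the uploaded object's size with the local file's size.

    `client` is assumed to behave like a boto3 S3 client: head_object returns
    metadata including ContentLength and raises if the object is missing.
    """
    head = client.head_object(Bucket=bucket, Key=key)
    return head["ContentLength"] == expected_size
```

For example, an upload might run as `with_retries(lambda: client.upload_file(str(path), bucket, key), max_retries=3)` and then be checked with `confirm_upload(client, bucket, key, path.stat().st_size)`.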
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License
- yt-dlp: YouTube download functionality
- click: Command line interface
- python-dotenv: Environment configuration
- tqdm: Progress bars
- boto3: AWS S3 integration