# Download System v2: Complete Redesign

## Goal

Replace the 9500-line monolithic `download-manager.ts` with a clean, modular download system that fixes:

1. Downloads hanging without clean restart
2. Wrong error classification leading to wrong retry paths
3. Unreliable resume (corrupt files, unnecessary restarts)
4. Post-processing (extraction) breaking or looping

## Constraints

- Same IPC interface: drop-in replacement, no UI changes needed
- Same external dependencies (debrid.ts, storage.ts, integrity.ts)
- Same session/settings persistence format

## Architecture

### Module Structure

```
src/main/download/
├── download-manager.ts    # Orchestrator (~500 lines), coordination only
├── scheduler.ts           # Queue management, slot allocation, priorities
├── pipeline.ts            # Single download flow: unrestrict → stream → verify
├── stream-writer.ts       # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts    # Typed error system (enums, not string matching)
├── retry-manager.ts       # Central retry logic, backoff, shelving, state
└── post-processor.ts      # Extraction queue, hybrid retry, cleanup
```

### Module Responsibilities

#### 1. download-manager.ts (Orchestrator)

- Holds session state, packages, items
- Exposes same IPC methods as current (startRun, stopRun, pauseItem, etc.)
- Delegates to Scheduler for queue management
- Delegates to Pipeline for individual downloads
- Delegates to PostProcessor for extraction
- Emits same events as current (progress, status changes)
- Handles persistence (save/load session)

#### 2. scheduler.ts

- `findNextItem()`: priority-based queue with provider cooldown awareness
- `fillSlots()`: start downloads up to maxParallel
- Scheduler loop with generation guard (prevents stale schedulers)
- Global stall watchdog
- Provider cooldown tracking (circuit breaker)
- AllDebrid paced-start / hoster-limit logic

#### 3. pipeline.ts

- `runDownload(item, context)`: single download lifecycle
  - Step 1: Unrestrict link via debrid service
  - Step 2: Stream file via StreamWriter
  - Step 3: Verify integrity (CRC if available)
  - Step 4: Signal completion
- Each step returns a typed result or throws a typed DownloadError
- No retry logic here; the pipeline only reports what happened

#### 4. stream-writer.ts

- `streamToFile(url, targetPath, options)`: HTTP streaming
- Resume support with pre-validation:
  - Check existing file size against tracked downloadedBytes
  - Truncate if sparse file detected (pre-allocated > actual)
  - Send Range header only after validation
- HTTP 416 handling (complete vs incomplete)
- Server-ignored-range detection (200 instead of 206)
- Buffered writing with NTFS 4KB alignment
- Sparse file pre-allocation (Windows)
- Content-Disposition filename override
- Stall detection (configurable timeout, default 10s)
- Drain timeout for slow disks (default 5min)
- Progress reporting via callback

#### 5. error-classifier.ts

- `DownloadErrorKind` enum with all error categories
- `DownloadError` class extending Error with `.kind` property
- `classifyError(error, context)`: takes raw error + context, returns DownloadError
  - Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
  - No post-hoc string matching needed
- `classifyHttpStatus(status, headers)`: HTTP-specific classification
- `classifyFetchError(error)`: network-level classification
- `classifyUnrestrictError(error)`: debrid-specific classification

```typescript
enum DownloadErrorKind {
  // Network
  NetworkReset,       // ECONNRESET, socket hang up, EPIPE
  Timeout,            // No data received within stall timeout
  DnsFailure,         // ENOTFOUND

  // HTTP
  RangeNotSatisfied,  // 416: file may be complete or need restart
  RangeIgnored,       // Server sent 200 instead of 206
  ServerError,        // 500, 502, 503
  RateLimited,        // 429
  Forbidden,          // 403: link expired
  NotFound,           // 404: file removed from CDN

  // Provider/Debrid
  UnrestrictFailed,   // Provider can't convert link
  ProviderBusy,       // Concurrent download limit
  ProviderDown,       // Provider service unavailable
  HosterUnavailable,  // Hoster down (not provider issue)
  LinkDead,           // Permanent: file deleted at source
  QuotaExceeded,      // Daily traffic limit

  // Filesystem
  DiskFull,           // ENOSPC
  PermissionDenied,   // EACCES, EPERM
  FileLocked,         // EBUSY (Windows)

  // Integrity
  FileCorrupt,        // CRC/size mismatch after download
  FileTruncated,      // Downloaded less than expected

  // Extraction
  WrongPassword,      // Archive password incorrect
  ArchiveCorrupt,     // Archive header/data damaged
  ExtractorCrash,     // 7-Zip/WinRAR process crashed
  ExtractionLoop,     // Same archive failed extraction 3+ times
}
```

#### 6. retry-manager.ts

- `RetryManager` class holds all retry state per item
- Declarative retry policies per DownloadErrorKind:

```typescript
interface RetryPolicy {
  maxRetries: number;          // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;          // Delete partial file before retry
  switchProvider: boolean;     // Try different provider
  refreshLink: boolean;        // Get new direct link from debrid
  providerCooldownMs?: number; // Apply cooldown to current provider
}
```

- `shouldRetry(itemId, error)`: returns `{ retry: boolean, delayMs, actions[] }`
- `recordFailure(itemId, error)`: tracks failure for shelving
- Shelving: after N total failures (configurable, default 15), pause 90s + reset provider
- State persists across stop/start (same format as current retryStateByItem)
- `resetItem(itemId)`: clear all retry state (manual reset)

#### 7. post-processor.ts

- `PostProcessor` class with extraction queue
- State machine per package:

```
pending → extracting → done
              ↓
         retry (max 2) → failed
```

- Tracks extraction attempts per archive (max 3 attempts, i.e. the first try plus 2 retries)
- No infinite loops: hard cap on retry count
- Hybrid extract retry: if archive corrupt + redownload suggested, queue redownload (max 1 time)
- Cleanup: remove partial extracts on failure
- Empty folder cleanup after successful extraction

### Data Flow

```
User clicks Start
    ↓
DownloadManager.startRun()
    ↓
Scheduler.start(): begins loop
    ↓
Scheduler.findNextItem(): picks highest priority queued item
    ↓
Pipeline.runDownload(item)
    ├── debridService.unrestrict(item.link)
    │   └── error? → ErrorClassifier.classify() → DownloadError
    ├── StreamWriter.streamToFile(url, path, opts)
    │   ├── Resume validation
    │   ├── HTTP streaming with stall detection
    │   └── error? → ErrorClassifier.classify() → DownloadError
    └── integrityCheck(file)
        └── error? → DownloadError(FileCorrupt)
    ↓
Success → mark completed → Scheduler fills next slot
Error → RetryManager.shouldRetry(item, error)
    ├── retry: true → Scheduler.queueRetry(item, delay, actions)
    └── retry: false → mark failed
    ↓
All items done → PostProcessor.run(package)
    ├── Extract archives
    ├── Verify extracted files
    └── Cleanup
```

### Resume Validation (Key Improvement)

Current problem: Resume trusts file size blindly, leading to corrupt files.

New approach:

1. Before sending Range header, validate existing file:
   - `stat.size` must match `item.downloadedBytes` (±1KB tolerance for flush timing)
   - If mismatch > 1MB: file is from sparse pre-allocation → truncate to downloadedBytes
   - If mismatch < 1MB but > 1KB: suspicious → delete and restart fresh
2. After resume response, validate:
   - 206 with correct Content-Range → continue
   - 200 (range ignored) → classify as RangeIgnored, retry with fresh link
   - 416 → check if file actually complete (existingBytes >= expectedTotal)
3. After download complete, validate:
   - Final file size matches expected total
   - CRC check if manifest available

### Stall Detection (Key Improvement)

Current problem: Downloads hang and stall detection sometimes doesn't trigger properly.

New approach:

- **Per-download heartbeat**: StreamWriter emits heartbeat every second with bytes received
- **Scheduler monitors heartbeats**: if no heartbeat for stallTimeoutMs → abort + retry
- **Disk-write awareness**: separate tracking for "blocked on disk write" vs "blocked on network"
- **Global watchdog**: if ALL active downloads show zero progress for 60s (excluding disk-blocked), abort all and re-queue
- **Validating timeout**: if unrestrict takes > 30s, abort and retry (prevents infinite hang in validation phase)

### Post-Processing State Machine (Key Improvement)

Current problem: Extraction can loop infinitely if archive keeps failing.

New approach:

```
ExtractionState per archive:
{
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;        // max 3
  lastError?: string;
  redownloaded: boolean;   // max 1 redownload
}
```

Rules:

- Max 3 extraction attempts per archive
- If `ArchiveCorrupt` + `redownloaded === false` → queue redownload, set redownloaded = true
- If `ArchiveCorrupt` + `redownloaded === true` → fail permanently
- If `WrongPassword` → try next password from list, fail after all exhausted
- If `ExtractorCrash` → retry once, fail on second crash
- Package marked as "completed with errors" if any archive fails permanently

## Migration Strategy

1. New code lives in `src/main/download/` directory
2. Old `src/main/download-manager.ts` stays untouched as reference
3. New `download-manager.ts` in `src/main/download/` implements same class interface
4. Switch import in `main.ts` from old to new
5. Test with real downloads
6. Delete old file when stable

## Testing Strategy

- Unit tests for ErrorClassifier (classify every known error string)
- Unit tests for RetryManager (policy application, shelving threshold)
- Unit tests for StreamWriter resume validation logic
- Unit tests for PostProcessor state machine
- Integration test: Scheduler + Pipeline with mocked debrid/HTTP
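
The StreamWriter resume-validation tests listed here are easier to write if the size check from the Resume Validation section is isolated as a pure function. The sketch below shows one possible shape and is not the actual implementation. `decideResume` and `ResumeDecision` are hypothetical names. Three behaviours are assumptions the design does not state: within the tolerance, resume from the smaller of the two sizes; restart when the file on disk is more than 1 MB *smaller* than the tracked count; treat a mismatch of exactly 1 MB as suspicious.

```typescript
// Hypothetical types and names for illustration; not part of the design above.
type ResumeDecision =
  | { action: "resume"; offset: number }    // send Range: bytes=<offset>-
  | { action: "truncate"; offset: number }  // cut file to <offset>, then resume
  | { action: "restart" };                  // delete partial file, start at 0

const KB = 1024;
const MB = 1024 * KB;

/**
 * Decide what to do with an existing partial file before sending a Range header.
 * fileSize is stat.size on disk; downloadedBytes is the tracked counter.
 */
function decideResume(fileSize: number, downloadedBytes: number): ResumeDecision {
  const mismatch = Math.abs(fileSize - downloadedBytes);

  // Within the ±1KB flush tolerance: safe to resume.
  // Assumption: resume from the smaller value so no gap is left in the file.
  if (mismatch <= KB) {
    return { action: "resume", offset: Math.min(fileSize, downloadedBytes) };
  }

  // File is more than 1MB larger than tracked: sparse pre-allocation,
  // so truncate back to the tracked byte count.
  if (fileSize - downloadedBytes > MB) {
    return { action: "truncate", offset: downloadedBytes };
  }

  // Everything else (1KB < mismatch <= 1MB, or file much smaller than tracked)
  // is treated as suspicious. The "much smaller" and "exactly 1MB" cases are
  // assumptions, since the design only names the 1KB..1MB band.
  return { action: "restart" };
}

// Example calls: one per rule.
console.log(decideResume(10 * MB + 512, 10 * MB));      // tolerance rule
console.log(decideResume(50 * MB, 10 * MB));            // sparse pre-allocation rule
console.log(decideResume(10 * MB + 100 * KB, 10 * MB)); // suspicious rule
```

Keeping the decision free of filesystem and HTTP calls lets the tests cover each threshold with plain numbers, while the truncate, delete and Range-header steps stay in `streamToFile`.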