beta-real-debrid-downloader/docs/plans/2026-03-08-download-system-v2-design.md
Sucukdeluxe efa0909e11 feat: Download System v2 — complete rewrite of download pipeline
Replace monolithic download-manager.ts (9500 lines) with 7 focused modules:

- error-classifier.ts: 25+ typed DownloadErrorKind enum, classifier functions
  for network/HTTP/debrid/extraction errors — no more string matching
- retry-manager.ts: Declarative per-error-kind retry policies, exponential
  backoff, shelving after 15 failures, state export/import
- stream-writer.ts: HTTP stream → file with pre-resume validation, stall
  detection, NTFS-aligned buffered writing, Range-ignored detection
- pipeline.ts: Single download lifecycle (unrestrict → stream → verify),
  throws typed errors, caller decides retry strategy
- post-processor.ts: Extraction state machine with hard caps (3 attempts
  per archive, 5 rounds per package), no infinite loops
- scheduler.ts: Queue management with priority-based slot allocation,
  heartbeat stall detection, global watchdog, provider cooldowns
- download-manager.ts: Drop-in orchestrator (~1500 lines), same public API

Fixes:
1. Hanging downloads: heartbeat-based stall detection + global watchdog
2. Wrong error classification: typed enum at point of origin
3. Unreliable resume: file size vs tracker validation, Range-ignored detection
4. Extraction loops: bounded retries with state machine

215 new unit tests for error-classifier and retry-manager (all passing).
Build compiles cleanly. Same IPC interface — UI unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 18:14:17 +01:00

Download System v2 — Complete Redesign

Goal

Replace the 9500-line monolithic download-manager.ts with a clean, modular download system that fixes:

  1. Downloads hanging without clean restart
  2. Wrong error classification leading to wrong retry paths
  3. Unreliable resume (corrupt files, unnecessary restarts)
  4. Post-processing (extraction) breaking or looping

Constraints

  • Same IPC interface — drop-in replacement, no UI changes needed
  • Same external dependencies (debrid.ts, storage.ts, integrity.ts)
  • Same session/settings persistence format

Architecture

Module Structure

src/main/download/
├── download-manager.ts      # Orchestrator (~500 lines) — coordination only
├── scheduler.ts             # Queue management, slot allocation, priorities
├── pipeline.ts              # Single download flow: unrestrict → stream → verify
├── stream-writer.ts         # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts      # Typed error system (enums, not string matching)
├── retry-manager.ts         # Central retry logic, backoff, shelving, state
└── post-processor.ts        # Extraction queue, hybrid retry, cleanup

Module Responsibilities

1. download-manager.ts (Orchestrator)

  • Holds session state, packages, items
  • Exposes same IPC methods as current (startRun, stopRun, pauseItem, etc.)
  • Delegates to Scheduler for queue management
  • Delegates to Pipeline for individual downloads
  • Delegates to PostProcessor for extraction
  • Emits same events as current (progress, status changes)
  • Handles persistence (save/load session)

2. scheduler.ts

  • findNextItem(): priority-based queue with provider cooldown awareness
  • fillSlots(): start downloads up to maxParallel
  • Scheduler loop with generation guard (prevents stale schedulers)
  • Global stall watchdog
  • Provider cooldown tracking (circuit breaker)
  • AllDebrid paced-start / hoster-limit logic
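
As a rough illustration of how findNextItem() could combine priorities with provider cooldowns, here is a minimal sketch. The QueuedItem fields, the ProviderCooldowns class, and the "higher number wins" priority rule are assumptions made for this example; the design does not specify them.

```typescript
// Hypothetical item shape; the real item type is not shown in this design.
interface QueuedItem {
  id: string;
  provider: string;
  priority: number;    // assumption: higher value = more urgent
  notBeforeMs: number; // earliest start time, e.g. set by a retry delay
}

// Tracks until when each provider is blocked (simple circuit breaker).
class ProviderCooldowns {
  private readonly until = new Map<string, number>();

  // Block a provider until nowMs + durationMs; never shortens an existing cooldown.
  apply(provider: string, durationMs: number, nowMs: number): void {
    const current = this.until.get(provider) ?? 0;
    this.until.set(provider, Math.max(current, nowMs + durationMs));
  }

  isCoolingDown(provider: string, nowMs: number): boolean {
    return (this.until.get(provider) ?? 0) > nowMs;
  }
}

// Pick the highest-priority item that is ready and whose provider is usable.
function findNextItem(
  queue: QueuedItem[],
  cooldowns: ProviderCooldowns,
  nowMs: number,
): QueuedItem | undefined {
  let best: QueuedItem | undefined;
  for (const item of queue) {
    if (item.notBeforeMs > nowMs) continue; // retry delay still running
    if (cooldowns.isCoolingDown(item.provider, nowMs)) continue;
    if (best === undefined || item.priority > best.priority) best = item;
  }
  return best;
}
```

Passing nowMs in explicitly keeps the selection logic deterministic, which makes the cooldown behaviour easy to unit-test without fake timers.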

3. pipeline.ts

  • runDownload(item, context): single download lifecycle
  • Step 1: Unrestrict link via debrid service
  • Step 2: Stream file via StreamWriter
  • Step 3: Verify integrity (CRC if available)
  • Step 4: Signal completion
  • Each step returns typed result or throws typed DownloadError
  • No retry logic here — just reports what happened
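
The four steps might be wired together roughly as follows. This is a sketch under stated assumptions: PipelineDeps, PipelineItem, and the step() helper are hypothetical, the real signatures of debrid.ts and integrity.ts are not shown in this design, and the fallback kind chosen for each step is illustrative only (the design classifies errors at their point of origin instead).

```typescript
// Subset of the design's DownloadErrorKind, repeated so the sketch stands alone.
enum DownloadErrorKind { UnrestrictFailed, NetworkReset, FileCorrupt }

class DownloadError extends Error {
  readonly kind: DownloadErrorKind;

  constructor(kind: DownloadErrorKind, message: string) {
    super(message);
    // Keeps instanceof working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "DownloadError";
    this.kind = kind;
  }
}

// Hypothetical dependency bundle, injected so the pipeline can be tested with mocks.
interface PipelineDeps {
  unrestrict(link: string): Promise<string>;
  streamToFile(url: string, targetPath: string): Promise<number>;
  verify(targetPath: string): Promise<boolean>;
}

interface PipelineItem { link: string; targetPath: string; }

// Run one step; untyped failures leave the pipeline as a DownloadError.
async function step<T>(fallbackKind: DownloadErrorKind, run: () => Promise<T>): Promise<T> {
  try {
    return await run();
  } catch (err) {
    if (err instanceof DownloadError) throw err; // already classified at origin
    throw new DownloadError(fallbackKind, err instanceof Error ? err.message : String(err));
  }
}

// unrestrict -> stream -> verify; no retry logic, the caller decides what to do.
async function runDownload(item: PipelineItem, deps: PipelineDeps): Promise<{ bytes: number }> {
  const url = await step(DownloadErrorKind.UnrestrictFailed, () => deps.unrestrict(item.link));
  const bytes = await step(DownloadErrorKind.NetworkReset, () => deps.streamToFile(url, item.targetPath));
  const ok = await step(DownloadErrorKind.FileCorrupt, () => deps.verify(item.targetPath));
  if (!ok) throw new DownloadError(DownloadErrorKind.FileCorrupt, "integrity check failed");
  return { bytes };
}
```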

4. stream-writer.ts

  • streamToFile(url, targetPath, options): HTTP streaming
  • Resume support with pre-validation:
    • Check existing file size against tracked downloadedBytes
    • Truncate if sparse file detected (pre-allocated > actual)
    • Send Range header only after validation
  • HTTP 416 handling (complete vs incomplete)
  • Server-ignored-range detection (200 instead of 206)
  • Buffered writing with NTFS 4KB alignment
  • Sparse file pre-allocation (Windows)
  • Content-Disposition filename override
  • Stall detection (configurable timeout, default 10s)
  • Drain timeout for slow disks (default 5min)
  • Progress reporting via callback
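
To make "buffered writing with NTFS 4KB alignment" more concrete, the sketch below collects incoming chunks and only releases multiples of 4096 bytes until the stream ends. The AlignedBuffer class, its sink callback (a stand-in for the actual file write), and the 256 KiB flush threshold are assumptions for illustration, not part of the design.

```typescript
const ALIGNMENT = 4096; // the "4KB" cluster alignment named in the design

// Buffers incoming chunks and hands them to `sink` in 4 KiB multiples.
class AlignedBuffer {
  private buffered: Uint8Array = new Uint8Array(0);
  private readonly sink: (block: Uint8Array) => void;
  private readonly thresholdBytes: number;

  // thresholdBytes: how much to collect before flushing (assumed default).
  constructor(sink: (block: Uint8Array) => void, thresholdBytes: number = 256 * 1024) {
    this.sink = sink;
    this.thresholdBytes = thresholdBytes;
  }

  push(chunk: Uint8Array): void {
    const merged = new Uint8Array(this.buffered.length + chunk.length);
    merged.set(this.buffered, 0);
    merged.set(chunk, this.buffered.length);
    this.buffered = merged;
    if (this.buffered.length >= this.thresholdBytes) this.flushAligned();
  }

  // Write the largest 4 KiB multiple and keep the tail for the next flush.
  private flushAligned(): void {
    const alignedLength = this.buffered.length - (this.buffered.length % ALIGNMENT);
    if (alignedLength === 0) return;
    this.sink(this.buffered.slice(0, alignedLength));
    this.buffered = this.buffered.slice(alignedLength);
  }

  // End of stream: the remaining tail is written as-is, whatever its size.
  finish(): void {
    if (this.buffered.length > 0) this.sink(this.buffered);
    this.buffered = new Uint8Array(0);
  }
}
```

Re-copying the buffer on every push keeps the example short; a production writer would more likely keep a list of chunks or a preallocated ring buffer.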

5. error-classifier.ts

  • DownloadErrorKind enum with all error categories
  • DownloadError class extending Error with .kind property
  • classifyError(error, context): takes raw error + context, returns DownloadError
    • Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
    • No post-hoc string matching needed
  • classifyHttpStatus(status, headers): HTTP-specific classification
  • classifyFetchError(error): network-level classification
  • classifyUnrestrictError(error): debrid-specific classification
enum DownloadErrorKind {
  // Network
  NetworkReset,        // ECONNRESET, socket hang up, EPIPE
  Timeout,             // No data received within stall timeout
  DnsFailure,          // ENOTFOUND

  // HTTP
  RangeNotSatisfied,   // 416 — file may be complete or need restart
  RangeIgnored,        // Server sent 200 instead of 206
  ServerError,         // 500, 502, 503
  RateLimited,         // 429
  Forbidden,           // 403 — link expired
  NotFound,            // 404 — file removed from CDN

  // Provider/Debrid
  UnrestrictFailed,    // Provider can't convert link
  ProviderBusy,        // Concurrent download limit
  ProviderDown,        // Provider service unavailable
  HosterUnavailable,   // Hoster down (not provider issue)
  LinkDead,            // Permanent: file deleted at source
  QuotaExceeded,       // Daily traffic limit

  // Filesystem
  DiskFull,            // ENOSPC
  PermissionDenied,    // EACCES, EPERM
  FileLocked,          // EBUSY (Windows)

  // Integrity
  FileCorrupt,         // CRC/size mismatch after download
  FileTruncated,       // Downloaded less than expected

  // Extraction
  WrongPassword,       // Archive password incorrect
  ArchiveCorrupt,      // Archive header/data damaged
  ExtractorCrash,      // 7-Zip/WinRAR process crashed
  ExtractionLoop,      // Same archive failed extraction 3+ times
}
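
A minimal sketch of how DownloadError and the HTTP classifier could look, using only the HTTP kinds from the enum above. Two points are assumptions: the rangeRequested parameter (the design lists classifyHttpStatus(status, headers), but detecting an ignored Range needs to know whether one was sent) and the httpStatus field on the error.

```typescript
// HTTP subset of the design's DownloadErrorKind, repeated so the sketch stands alone.
enum DownloadErrorKind {
  RangeNotSatisfied,
  RangeIgnored,
  ServerError,
  RateLimited,
  Forbidden,
  NotFound,
}

class DownloadError extends Error {
  readonly kind: DownloadErrorKind;
  readonly httpStatus: number; // hypothetical extra field for logging

  constructor(kind: DownloadErrorKind, httpStatus: number) {
    super(`HTTP ${httpStatus}: ${DownloadErrorKind[kind]}`);
    // Keeps instanceof working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "DownloadError";
    this.kind = kind;
    this.httpStatus = httpStatus;
  }
}

// Maps a status code to a kind; undefined means "not an error this sketch knows".
function kindForStatus(status: number, rangeRequested: boolean): DownloadErrorKind | undefined {
  if (status === 200 && rangeRequested) return DownloadErrorKind.RangeIgnored;
  switch (status) {
    case 416: return DownloadErrorKind.RangeNotSatisfied;
    case 429: return DownloadErrorKind.RateLimited;
    case 403: return DownloadErrorKind.Forbidden;
    case 404: return DownloadErrorKind.NotFound;
    case 500:
    case 502:
    case 503: return DownloadErrorKind.ServerError;
    default: return undefined;
  }
}

// Classify at the HTTP layer, before the response body is touched.
function classifyHttpStatus(status: number, rangeRequested: boolean): DownloadError | undefined {
  const kind = kindForStatus(status, rangeRequested);
  return kind === undefined ? undefined : new DownloadError(kind, status);
}
```

Because the kind is attached where the status code is still available, later layers can switch on error.kind instead of parsing message strings.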

6. retry-manager.ts

  • RetryManager class holds all retry state per item
  • Declarative retry policies per DownloadErrorKind:
interface RetryPolicy {
  maxRetries: number;          // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;          // Delete partial file before retry
  switchProvider: boolean;     // Try different provider
  refreshLink: boolean;       // Get new direct link from debrid
  providerCooldownMs?: number; // Apply cooldown to current provider
}
  • shouldRetry(itemId, error): returns { retry: boolean, delayMs, actions[] }
  • recordFailure(itemId, error): tracks failure for shelving
  • Shelving: after N total failures (configurable, default 15), pause 90s + reset provider
  • State persists across stop/start (same format as current retryStateByItem)
  • resetItem(itemId): clear all retry state (manual reset)
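
The interaction of per-kind policies, backoff, and shelving can be sketched as below. Several details are assumptions: onFailure() merges shouldRetry() and recordFailure() into one call for brevity, backoff is counted per error kind, and shelving triggers on every multiple of the threshold. The policy actions (resetFile, switchProvider, refreshLink) and state persistence are left out.

```typescript
// Two kinds from the design's enum, repeated so the sketch stands alone.
enum DownloadErrorKind { NetworkReset, LinkDead }

// Subset of the RetryPolicy fields listed above.
interface RetryPolicy {
  maxRetries: number;
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
}

interface RetryDecision { retry: boolean; delayMs: number; shelved: boolean; }
interface ItemRetryState { total: number; perKind: Map<DownloadErrorKind, number>; }

// attempt is 1-based: the first retry waits baseDelayMs.
function backoffDelayMs(policy: RetryPolicy, attempt: number): number {
  let delay = policy.baseDelayMs;
  if (policy.backoff === "linear") delay = policy.baseDelayMs * attempt;
  if (policy.backoff === "exponential") delay = policy.baseDelayMs * Math.pow(2, attempt - 1);
  return Math.min(delay, policy.maxDelayMs);
}

class RetryManager {
  private readonly items = new Map<string, ItemRetryState>();
  private readonly policies: Map<DownloadErrorKind, RetryPolicy>;
  private readonly shelveAfter: number;
  private readonly shelveDelayMs: number;

  constructor(policies: Map<DownloadErrorKind, RetryPolicy>, shelveAfter = 15, shelveDelayMs = 90000) {
    this.policies = policies;
    this.shelveAfter = shelveAfter;
    this.shelveDelayMs = shelveDelayMs;
  }

  // Record one failure and decide whether and when to retry.
  onFailure(itemId: string, kind: DownloadErrorKind): RetryDecision {
    let state = this.items.get(itemId);
    if (state === undefined) {
      state = { total: 0, perKind: new Map() };
      this.items.set(itemId, state);
    }
    state.total += 1;
    const count = (state.perKind.get(kind) ?? 0) + 1;
    state.perKind.set(kind, count);

    const policy = this.policies.get(kind);
    if (policy === undefined || count > policy.maxRetries) {
      return { retry: false, delayMs: 0, shelved: false }; // permanent failure
    }
    if (state.total % this.shelveAfter === 0) {
      return { retry: true, delayMs: this.shelveDelayMs, shelved: true }; // shelve
    }
    return { retry: true, delayMs: backoffDelayMs(policy, count), shelved: false };
  }

  // Manual reset: forget all retry state for the item.
  resetItem(itemId: string): void {
    this.items.delete(itemId);
  }
}
```

With baseDelayMs = 1000 and maxDelayMs = 5000, the exponential policy yields 1000, 2000, 4000, then 5000 ms for every later attempt.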

7. post-processor.ts

  • PostProcessor class with extraction queue
  • State machine per package:
    pending → extracting → done
                  ↓
              retry (max 2) → failed
    
  • Tracks extraction attempts per archive (max 3 attempts: the first try plus up to 2 retries)
  • No infinite loops: hard cap on retry count
  • Hybrid extract retry: if archive corrupt + redownload suggested, queue redownload (max 1 time)
  • Cleanup: remove partial extracts on failure
  • Empty folder cleanup after successful extraction

Data Flow

User clicks Start
    ↓
DownloadManager.startRun()
    ↓
Scheduler.start() — begins loop
    ↓
Scheduler.findNextItem() — picks highest priority queued item
    ↓
Pipeline.runDownload(item)
    ├── debridService.unrestrict(item.link)
    │   └── error? → ErrorClassifier.classify() → DownloadError
    ├── StreamWriter.streamToFile(url, path, opts)
    │   ├── Resume validation
    │   ├── HTTP streaming with stall detection
    │   └── error? → ErrorClassifier.classify() → DownloadError
    └── integrityCheck(file)
        └── error? → DownloadError(FileCorrupt)
    ↓
Success → mark completed → Scheduler fills next slot
Error → RetryManager.shouldRetry(item, error)
    ├── retry: true → Scheduler.queueRetry(item, delay, actions)
    └── retry: false → mark failed
    ↓
All items done → PostProcessor.run(package)
    ├── Extract archives
    ├── Verify extracted files
    └── Cleanup

Resume Validation (Key Improvement)

Current problem: Resume trusts file size blindly, leading to corrupt files.

New approach:

  1. Before sending Range header, validate existing file:
    • stat.size must match item.downloadedBytes (±1KB tolerance for flush timing)
    • If the file is more than 1MB larger than downloadedBytes: it comes from sparse pre-allocation → truncate to downloadedBytes
    • If mismatch < 1MB but > 1KB: suspicious → delete and restart fresh
  2. After resume response, validate:
    • 206 with correct Content-Range → continue
    • 200 (range ignored) → classify as RangeIgnored, retry with fresh link
    • 416 → check if file actually complete (existingBytes >= expectedTotal)
  3. After download complete, validate:
    • Final file size matches expected total
    • CRC check if manifest available
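
The pre-request and post-response checks above could be expressed as two pure functions, sketched here. The names planResume and checkResumeResponse are hypothetical, and a few cases the rules leave open are filled with labeled assumptions: a file that is much smaller than the tracker restarts, a mismatch of exactly 1 MB restarts, a resume within tolerance continues from the on-disk size, and a 206 with an unexpected Content-Range restarts.

```typescript
const TOLERANCE_BYTES = 1024;               // ±1KB flush-timing tolerance
const SPARSE_THRESHOLD_BYTES = 1024 * 1024; // 1MB: above this, assume pre-allocation

type ResumePlan =
  | { action: "resume"; offset: number }
  | { action: "truncate-then-resume"; offset: number }
  | { action: "restart" };

// Step 1: decide what to do with the existing file before sending Range.
function planResume(fileSize: number, downloadedBytes: number): ResumePlan {
  if (fileSize <= 0 || downloadedBytes <= 0) return { action: "restart" };
  const diff = fileSize - downloadedBytes;
  if (Math.abs(diff) <= TOLERANCE_BYTES) {
    return { action: "resume", offset: fileSize }; // assumption: trust bytes on disk
  }
  if (diff > SPARSE_THRESHOLD_BYTES) {
    return { action: "truncate-then-resume", offset: downloadedBytes };
  }
  return { action: "restart" }; // suspicious mismatch (or file far too small)
}

type ResumeResponse = "continue" | "range-ignored" | "already-complete" | "restart" | "unhandled";

// Step 2: validate the server's answer to the Range request.
function checkResumeResponse(
  status: number,
  contentRange: string | undefined,
  offset: number,
  expectedTotal: number | undefined,
): ResumeResponse {
  if (status === 206) {
    // Expected form: "bytes <start>-<end>/<total or *>"
    const match = /^bytes (\d+)-\d+\/(?:\d+|\*)$/.exec(contentRange ?? "");
    return match !== null && Number(match[1]) === offset ? "continue" : "restart";
  }
  if (status === 200) return "range-ignored"; // maps to RangeIgnored
  if (status === 416) {
    return expectedTotal !== undefined && offset >= expectedTotal ? "already-complete" : "restart";
  }
  return "unhandled"; // left to the general HTTP classifier
}
```

Keeping both checks free of file and network I/O means the resume rules can be covered by plain unit tests, as the testing strategy calls for.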

Stall Detection (Key Improvement)

Current problem: Downloads hang and stall detection sometimes doesn't trigger properly.

New approach:

  • Per-download heartbeat: StreamWriter emits heartbeat every second with bytes received
  • Scheduler monitors heartbeats: if no heartbeat for stallTimeoutMs → abort + retry
  • Disk-write awareness: separate tracking for "blocked on disk write" vs "blocked on network"
  • Global watchdog: if ALL active downloads show zero progress for 60s (excluding disk-blocked), abort all and re-queue
  • Validating timeout: if unrestrict takes > 30s, abort and retry (prevents infinite hang in validation phase)
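
A possible shape for the scheduler-side heartbeat tracking is sketched below. StallMonitor and its method names are hypothetical. It assumes the per-download check fires on missing heartbeats, while the global watchdog looks at the time since the last heartbeat that carried bytes and ignores downloads currently blocked on disk writes.

```typescript
interface HeartbeatState {
  lastBeatMs: number;     // last heartbeat of any kind
  lastProgressMs: number; // last heartbeat that reported received bytes
  diskBlocked: boolean;   // download is waiting on a disk write
}

class StallMonitor {
  private readonly beats = new Map<string, HeartbeatState>();
  private readonly stallTimeoutMs: number;
  private readonly globalTimeoutMs: number;

  constructor(stallTimeoutMs = 10000, globalTimeoutMs = 60000) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.globalTimeoutMs = globalTimeoutMs;
  }

  // Called for every heartbeat emitted by a download.
  beat(id: string, bytesSinceLast: number, diskBlocked: boolean, nowMs: number): void {
    const prev = this.beats.get(id);
    const progressed = bytesSinceLast > 0 || prev === undefined;
    this.beats.set(id, {
      lastBeatMs: nowMs,
      lastProgressMs: progressed ? nowMs : prev.lastProgressMs,
      diskBlocked,
    });
  }

  remove(id: string): void {
    this.beats.delete(id);
  }

  // Downloads whose heartbeat has been missing for stallTimeoutMs.
  stalledIds(nowMs: number): string[] {
    const result: string[] = [];
    this.beats.forEach((hb, id) => {
      if (nowMs - hb.lastBeatMs >= this.stallTimeoutMs) result.push(id);
    });
    return result;
  }

  // True if every download that is not disk-blocked has made no progress
  // for globalTimeoutMs. With nothing to judge, no global stall is reported.
  isGlobalStall(nowMs: number): boolean {
    let considered = 0;
    let anyRecentProgress = false;
    this.beats.forEach((hb) => {
      if (hb.diskBlocked) return;
      considered += 1;
      if (nowMs - hb.lastProgressMs < this.globalTimeoutMs) anyRecentProgress = true;
    });
    return considered > 0 && !anyRecentProgress;
  }
}
```

The scheduler loop would call stalledIds() and isGlobalStall() on each tick and abort or re-queue accordingly; that wiring is omitted here.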

Post-Processing State Machine (Key Improvement)

Current problem: Extraction can loop infinitely if archive keeps failing.

New approach:

ExtractionState per archive:
{
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;        // max 3
  lastError?: string;
  redownloaded: boolean;   // max 1 redownload
}

Rules:

  • Max 3 extraction attempts per archive
  • If ArchiveCorrupt + redownloaded === false → queue redownload, set redownloaded = true
  • If ArchiveCorrupt + redownloaded === true → fail permanently
  • If WrongPassword → try next password from list, fail after all exhausted
  • If ExtractorCrash → retry once, fail on second crash
  • Package marked as "completed with errors" if any archive fails permanently
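
The rules can be read as a single transition function over ExtractionState, sketched below. The crashes counter and the passwordsLeft parameter are hypothetical additions needed to express "retry once" and "all passwords exhausted". The sketch also assumes that wrong-password tries do not count toward the 3-attempt cap, which the rules do not state.

```typescript
// Extraction subset of the design's DownloadErrorKind, repeated so the sketch stands alone.
enum DownloadErrorKind { WrongPassword, ArchiveCorrupt, ExtractorCrash }

const MAX_ATTEMPTS = 3;

// ExtractionState from above plus a hypothetical crash counter.
interface TrackedExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;
  lastError?: string;
  redownloaded: boolean;
  crashes: number; // assumption: needed for "retry once, fail on second crash"
}

type NextStep = "retry" | "redownload" | "next-password" | "fail";

function markFailed(state: TrackedExtractionState): NextStep {
  state.status = "failed";
  return "fail";
}

// Apply one failed extraction to the state and return what to do next.
function applyFailure(
  state: TrackedExtractionState,
  kind: DownloadErrorKind,
  message: string,
  passwordsLeft: number,
): NextStep {
  state.lastError = message;

  if (kind === DownloadErrorKind.WrongPassword) {
    // Assumption: password tries are not counted as extraction attempts.
    return passwordsLeft > 0 ? "next-password" : markFailed(state);
  }

  state.attempts += 1;
  if (state.attempts >= MAX_ATTEMPTS) return markFailed(state); // hard cap

  if (kind === DownloadErrorKind.ArchiveCorrupt) {
    if (state.redownloaded) return markFailed(state);
    state.redownloaded = true;
    state.status = "pending";
    return "redownload";
  }

  // ExtractorCrash: one retry, then give up.
  state.crashes += 1;
  if (state.crashes >= 2) return markFailed(state);
  state.status = "pending";
  return "retry";
}
```

Because every path either returns "fail" or consumes a bounded budget (attempts, redownload flag, crash count, remaining passwords), the function cannot request work indefinitely for the same archive.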

Migration Strategy

  1. New code lives in src/main/download/ directory
  2. Old src/main/download-manager.ts stays untouched as reference
  3. New download-manager.ts in src/main/download/ implements same class interface
  4. Switch import in main.ts from old to new
  5. Test with real downloads
  6. Delete old file when stable

Testing Strategy

  • Unit tests for ErrorClassifier (classify every known error string)
  • Unit tests for RetryManager (policy application, shelving threshold)
  • Unit tests for StreamWriter resume validation logic
  • Unit tests for PostProcessor state machine
  • Integration test: Scheduler + Pipeline with mocked debrid/HTTP