beta-real-debrid-downloader/docs/plans/2026-03-08-download-system-v2-design.md
Sucukdeluxe efa0909e11 feat: Download System v2 — complete rewrite of download pipeline
Replace monolithic download-manager.ts (9500 lines) with 7 focused modules:

- error-classifier.ts: 25+ typed DownloadErrorKind enum, classifier functions
  for network/HTTP/debrid/extraction errors — no more string matching
- retry-manager.ts: Declarative per-error-kind retry policies, exponential
  backoff, shelving after 15 failures, state export/import
- stream-writer.ts: HTTP stream → file with pre-resume validation, stall
  detection, NTFS-aligned buffered writing, Range-ignored detection
- pipeline.ts: Single download lifecycle (unrestrict → stream → verify),
  throws typed errors, caller decides retry strategy
- post-processor.ts: Extraction state machine with hard caps (3 attempts
  per archive, 5 rounds per package), no infinite loops
- scheduler.ts: Queue management with priority-based slot allocation,
  heartbeat stall detection, global watchdog, provider cooldowns
- download-manager.ts: Drop-in orchestrator (~1500 lines), same public API

Fixes:
1. Hanging downloads: heartbeat-based stall detection + global watchdog
2. Wrong error classification: typed enum at point of origin
3. Unreliable resume: file size vs tracker validation, Range-ignored detection
4. Extraction loops: bounded retries with state machine

215 new unit tests for error-classifier and retry-manager (all passing).
Build compiles cleanly. Same IPC interface — UI unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 18:14:17 +01:00


# Download System v2 — Complete Redesign
## Goal
Replace the 9500-line monolithic `download-manager.ts` with a clean, modular download system that fixes:
1. Downloads hanging without clean restart
2. Wrong error classification leading to wrong retry paths
3. Unreliable resume (corrupt files, unnecessary restarts)
4. Post-processing (extraction) breaking or looping
## Constraints
- Same IPC interface — drop-in replacement, no UI changes needed
- Same external dependencies (debrid.ts, storage.ts, integrity.ts)
- Same session/settings persistence format
## Architecture
### Module Structure
```
src/main/download/
├── download-manager.ts # Orchestrator (~500 lines) — coordination only
├── scheduler.ts # Queue management, slot allocation, priorities
├── pipeline.ts # Single download flow: unrestrict → stream → verify
├── stream-writer.ts # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts # Typed error system (enums, not string matching)
├── retry-manager.ts # Central retry logic, backoff, shelving, state
└── post-processor.ts # Extraction queue, hybrid retry, cleanup
```
### Module Responsibilities
#### 1. download-manager.ts (Orchestrator)
- Holds session state, packages, items
- Exposes same IPC methods as current (startRun, stopRun, pauseItem, etc.)
- Delegates to Scheduler for queue management
- Delegates to Pipeline for individual downloads
- Delegates to PostProcessor for extraction
- Emits same events as current (progress, status changes)
- Handles persistence (save/load session)
#### 2. scheduler.ts
- `findNextItem()`: priority-based queue with provider cooldown awareness
- `fillSlots()`: start downloads up to maxParallel
- Scheduler loop with generation guard (prevents stale schedulers)
- Global stall watchdog
- Provider cooldown tracking (circuit breaker)
- AllDebrid paced-start / hoster-limit logic
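A minimal sketch of how `findNextItem()` and `fillSlots()` could combine priority ordering with provider cooldowns. The `QueuedItem` shape, the numeric `priority` field (higher wins), the `queuedAt` tie-breaker, and the `cooldownUntil` map are assumptions made for illustration, not the project's actual types.

```typescript
// Hypothetical item shape; the real session items carry many more fields.
interface QueuedItem {
  id: string;
  provider: string; // debrid provider that would serve this item
  priority: number; // assumption: higher number = more urgent
  queuedAt: number; // ms timestamp, used as a tie-breaker
}

// Pick the best startable item, skipping providers that are cooling down.
function findNextItem(
  queue: readonly QueuedItem[],
  cooldownUntil: ReadonlyMap<string, number>, // provider -> ms timestamp
  now: number,
): QueuedItem | undefined {
  let best: QueuedItem | undefined;
  for (const item of queue) {
    const until = cooldownUntil.get(item.provider) ?? 0;
    if (until > now) continue; // provider is still in cooldown
    if (
      best === undefined ||
      item.priority > best.priority ||
      (item.priority === best.priority && item.queuedAt < best.queuedAt)
    ) {
      best = item;
    }
  }
  return best;
}

// Remove items from the queue until maxParallel slots are occupied.
function fillSlots(
  queue: QueuedItem[],
  activeCount: number,
  maxParallel: number,
  cooldownUntil: ReadonlyMap<string, number>,
  now: number,
): QueuedItem[] {
  const started: QueuedItem[] = [];
  while (activeCount + started.length < maxParallel) {
    const next = findNextItem(queue, cooldownUntil, now);
    if (next === undefined) break; // nothing startable right now
    queue.splice(queue.indexOf(next), 1);
    started.push(next);
  }
  return started;
}
```

Keeping both functions free of timers and I/O means the scheduler loop only has to supply `now` and the current cooldown map, which also makes the selection logic easy to unit test.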
#### 3. pipeline.ts
- `runDownload(item, context)`: single download lifecycle
  - Step 1: Unrestrict link via debrid service
  - Step 2: Stream file via StreamWriter
  - Step 3: Verify integrity (CRC if available)
  - Step 4: Signal completion
- Each step returns typed result or throws typed DownloadError
- No retry logic here — just reports what happened
#### 4. stream-writer.ts
- `streamToFile(url, targetPath, options)`: HTTP streaming
- Resume support with pre-validation:
  - Check existing file size against tracked downloadedBytes
  - Truncate if sparse file detected (pre-allocated > actual)
  - Send Range header only after validation
- HTTP 416 handling (complete vs incomplete)
- Server-ignored-range detection (200 instead of 206)
- Buffered writing with NTFS 4KB alignment
- Sparse file pre-allocation (Windows)
- Content-Disposition filename override
- Stall detection (configurable timeout, default 10s)
- Drain timeout for slow disks (default 5min)
- Progress reporting via callback
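One possible reading of "buffered writing with NTFS 4KB alignment" is that incoming chunks are accumulated and handed to the disk only in multiples of 4096 bytes, with the unaligned remainder written once at the end of the stream. The sketch below illustrates that reading; the class name `AlignedWriteBuffer` and the `sink` callback are hypothetical, and a real writer would also need backpressure and error handling.

```typescript
const ALIGNMENT = 4096; // 4 KB, the alignment named in the design

class AlignedWriteBuffer {
  private pending: Uint8Array = new Uint8Array(0);
  private readonly sink: (block: Uint8Array) => void;

  // `sink` stands in for the actual file write (hypothetical).
  constructor(sink: (block: Uint8Array) => void) {
    this.sink = sink;
  }

  // Append a network chunk and flush the largest 4 KB-aligned prefix.
  push(chunk: Uint8Array): void {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);
    const alignedLen = merged.length - (merged.length % ALIGNMENT);
    if (alignedLen > 0) this.sink(merged.slice(0, alignedLen));
    this.pending = merged.slice(alignedLen);
  }

  // Flush whatever is left at end of stream (may be unaligned).
  finish(): void {
    if (this.pending.length > 0) this.sink(this.pending);
    this.pending = new Uint8Array(0);
  }
}
```

Copying on every `push` keeps the sketch short; a production version would more likely write into a preallocated buffer to avoid the extra allocations.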
#### 5. error-classifier.ts
- `DownloadErrorKind` enum with all error categories
- `DownloadError` class extending Error with `.kind` property
- `classifyError(error, context)`: takes raw error + context, returns DownloadError
  - Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
  - No post-hoc string matching needed
- `classifyHttpStatus(status, headers)`: HTTP-specific classification
- `classifyFetchError(error)`: network-level classification
- `classifyUnrestrictError(error)`: debrid-specific classification
```typescript
enum DownloadErrorKind {
  // Network
  NetworkReset, // ECONNRESET, socket hang up, EPIPE
  Timeout, // No data received within stall timeout
  DnsFailure, // ENOTFOUND
  // HTTP
  RangeNotSatisfied, // 416: file may be complete or need restart
  RangeIgnored, // Server sent 200 instead of 206
  ServerError, // 500, 502, 503
  RateLimited, // 429
  Forbidden, // 403: link expired
  NotFound, // 404: file removed from CDN
  // Provider/Debrid
  UnrestrictFailed, // Provider can't convert link
  ProviderBusy, // Concurrent download limit
  ProviderDown, // Provider service unavailable
  HosterUnavailable, // Hoster down (not provider issue)
  LinkDead, // Permanent: file deleted at source
  QuotaExceeded, // Daily traffic limit
  // Filesystem
  DiskFull, // ENOSPC
  PermissionDenied, // EACCES, EPERM
  FileLocked, // EBUSY (Windows)
  // Integrity
  FileCorrupt, // CRC/size mismatch after download
  FileTruncated, // Downloaded less than expected
  // Extraction
  WrongPassword, // Archive password incorrect
  ArchiveCorrupt, // Archive header/data damaged
  ExtractorCrash, // 7-Zip/WinRAR process crashed
  ExtractionLoop, // Same archive failed extraction 3+ times
}
```
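To illustrate classification at the point of origin, here is a minimal sketch that maps HTTP statuses and Node-style error codes to kinds, using only the mappings given in the enum comments. It models a subset of the kinds as a string-literal union instead of the `enum` to keep the snippet short, omits the `headers` parameter of `classifyHttpStatus`, and introduces a hypothetical helper `classifyByCode`; none of this is the actual module code.

```typescript
// Subset of the kinds listed in the design, as a string-literal union.
type DownloadErrorKind =
  | "NetworkReset" | "DnsFailure"
  | "RangeNotSatisfied" | "ServerError" | "RateLimited" | "Forbidden" | "NotFound"
  | "DiskFull" | "PermissionDenied" | "FileLocked";

class DownloadError extends Error {
  readonly kind: DownloadErrorKind;
  constructor(kind: DownloadErrorKind, message: string) {
    super(message);
    this.name = "DownloadError";
    this.kind = kind;
  }
}

// Map an HTTP status to a kind; null means "not covered by this sketch".
function classifyHttpStatus(status: number): DownloadError | null {
  switch (status) {
    case 403: return new DownloadError("Forbidden", "HTTP 403");
    case 404: return new DownloadError("NotFound", "HTTP 404");
    case 416: return new DownloadError("RangeNotSatisfied", "HTTP 416");
    case 429: return new DownloadError("RateLimited", "HTTP 429");
    case 500:
    case 502:
    case 503: return new DownloadError("ServerError", "HTTP " + status);
    default: return null;
  }
}

// Error codes taken from the enum comments (network and filesystem).
const CODE_TO_KIND = new Map<string, DownloadErrorKind>([
  ["ECONNRESET", "NetworkReset"],
  ["EPIPE", "NetworkReset"],
  ["ENOTFOUND", "DnsFailure"],
  ["ENOSPC", "DiskFull"],
  ["EACCES", "PermissionDenied"],
  ["EPERM", "PermissionDenied"],
  ["EBUSY", "FileLocked"],
]);

// Hypothetical helper: classify a raw error by its `code` property.
function classifyByCode(error: unknown): DownloadError | null {
  const code = (error as { code?: unknown } | null | undefined)?.code;
  if (typeof code !== "string") return null;
  const kind = CODE_TO_KIND.get(code);
  return kind === undefined ? null : new DownloadError(kind, code);
}
```

Returning `null` for unknown input leaves the fallback decision to the caller, so an unrecognised error is never silently mapped to the wrong retry path.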
#### 6. retry-manager.ts
- `RetryManager` class holds all retry state per item
- Declarative retry policies per DownloadErrorKind:
```typescript
interface RetryPolicy {
  maxRetries: number; // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean; // Delete partial file before retry
  switchProvider: boolean; // Try different provider
  refreshLink: boolean; // Get new direct link from debrid
  providerCooldownMs?: number; // Apply cooldown to current provider
}
```
- `shouldRetry(itemId, error)`: returns { retry: boolean, delayMs, actions[] }
- `recordFailure(itemId, error)`: tracks failure for shelving
- Shelving: after N total failures (configurable, default 15), pause 90s + reset provider
- State persists across stop/start (same format as current retryStateByItem)
- `resetItem(itemId)`: clear all retry state (manual reset)
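The sketch below shows how a policy table, backoff calculation, and shelving could fit together. It uses a reduced `RetryPolicy` and three error kinds; every concrete number except the 15-failure threshold and the 90 s pause is an illustrative assumption, and resetting the total counter after shelving is an assumption as well.

```typescript
type Kind = "Timeout" | "RateLimited" | "LinkDead"; // subset for the sketch

interface RetryPolicy {
  maxRetries: number;
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
}

interface RetryDecision {
  retry: boolean;
  delayMs: number;
  actions: string[];
}

// Illustrative values only; the design does not fix concrete numbers.
const POLICIES: Record<Kind, RetryPolicy> = {
  Timeout: { maxRetries: 5, backoff: "exponential", baseDelayMs: 1000, maxDelayMs: 30000 },
  RateLimited: { maxRetries: 8, backoff: "linear", baseDelayMs: 5000, maxDelayMs: 60000 },
  LinkDead: { maxRetries: 0, backoff: "fixed", baseDelayMs: 0, maxDelayMs: 0 },
};

// Delay before retry number `attempt` (1-based), capped at maxDelayMs.
function computeDelayMs(policy: RetryPolicy, attempt: number): number {
  const n = Math.max(attempt, 1);
  let raw = policy.baseDelayMs; // "fixed"
  if (policy.backoff === "linear") raw = policy.baseDelayMs * n;
  if (policy.backoff === "exponential") raw = policy.baseDelayMs * Math.pow(2, n - 1);
  return Math.min(raw, policy.maxDelayMs);
}

class RetryManager {
  private readonly perKind = new Map<string, number>(); // "itemId:kind" -> failures
  private readonly total = new Map<string, number>(); // itemId -> failures
  private readonly shelveAfter: number;

  constructor(shelveAfter: number = 15) {
    this.shelveAfter = shelveAfter;
  }

  recordFailure(itemId: string, kind: Kind): void {
    const key = itemId + ":" + kind;
    this.perKind.set(key, (this.perKind.get(key) ?? 0) + 1);
    this.total.set(itemId, (this.total.get(itemId) ?? 0) + 1);
  }

  // Call after recordFailure() for the same failure.
  shouldRetry(itemId: string, kind: Kind): RetryDecision {
    const policy = POLICIES[kind];
    const failures = this.perKind.get(itemId + ":" + kind) ?? 0;
    if (failures > policy.maxRetries) return { retry: false, delayMs: 0, actions: [] };
    if ((this.total.get(itemId) ?? 0) >= this.shelveAfter) {
      this.total.set(itemId, 0); // assumption: shelving resets the total counter
      return { retry: true, delayMs: 90000, actions: ["resetProvider"] };
    }
    return { retry: true, delayMs: computeDelayMs(policy, failures), actions: [] };
  }
}
```

With these example policies, a third consecutive `Timeout` waits 4000 ms, while a single `LinkDead` is final because its `maxRetries` is 0.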
#### 7. post-processor.ts
- `PostProcessor` class with extraction queue
- State machine per package:
```
pending → extracting → done
              └→ retry (max 2) → failed
```
- Tracks extraction attempts per archive (max 3 attempts)
- No infinite loops: hard cap on retry count
- Hybrid extract retry: if archive corrupt + redownload suggested, queue redownload (max 1 time)
- Cleanup: remove partial extracts on failure
- Empty folder cleanup after successful extraction
### Data Flow
```
User clicks Start
  DownloadManager.startRun()
    Scheduler.start(): begins loop
      Scheduler.findNextItem(): picks highest priority queued item
        Pipeline.runDownload(item)
          ├── debridService.unrestrict(item.link)
          │   └── error? → ErrorClassifier.classify() → DownloadError
          ├── StreamWriter.streamToFile(url, path, opts)
          │   ├── Resume validation
          │   ├── HTTP streaming with stall detection
          │   └── error? → ErrorClassifier.classify() → DownloadError
          └── integrityCheck(file)
              └── error? → DownloadError(FileCorrupt)
        Success → mark completed → Scheduler fills next slot
        Error → RetryManager.shouldRetry(item, error)
          ├── retry: true → Scheduler.queueRetry(item, delay, actions)
          └── retry: false → mark failed
      All items done → PostProcessor.run(package)
        ├── Extract archives
        ├── Verify extracted files
        └── Cleanup
```
### Resume Validation (Key Improvement)
Current problem: Resume trusts file size blindly, leading to corrupt files.
New approach:
1. Before sending Range header, validate existing file:
   - `stat.size` must match `item.downloadedBytes` (±1KB tolerance for flush timing)
   - If mismatch > 1MB: file is from sparse pre-allocation → truncate to downloadedBytes
   - If mismatch < 1MB but > 1KB: suspicious → delete and restart fresh
2. After resume response, validate:
   - 206 with correct Content-Range → continue
   - 200 (range ignored) → classify as RangeIgnored, retry with fresh link
   - 416 → check if file actually complete (existingBytes >= expectedTotal)
3. After download complete, validate:
   - Final file size matches expected total
   - CRC check if manifest available
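Steps 1 and 2 can be sketched as two pure functions. Several details are assumptions because the rules leave them open: truncation is applied only when the file on disk is larger than the tracked count, a mismatch of exactly 1MB is treated as a restart, resuming starts from the smaller of the two sizes, and a 206 with an unexpected `Content-Range` leads to a restart. The names `planResume` and `judgeResumeResponse` are hypothetical.

```typescript
const KB = 1024;
const MB = 1024 * KB;

type ResumePlan =
  | { action: "resume"; offset: number }
  | { action: "truncate"; offset: number }
  | { action: "restart" };

// Step 1: compare the on-disk size with the tracked byte count.
function planResume(statSize: number, downloadedBytes: number): ResumePlan {
  const mismatch = Math.abs(statSize - downloadedBytes);
  if (mismatch <= KB) {
    // Assumption: resume from the smaller value so no gap is left in the file.
    return { action: "resume", offset: Math.min(statSize, downloadedBytes) };
  }
  if (mismatch > MB && statSize > downloadedBytes) {
    // File is larger than tracked: treat the excess as sparse pre-allocation.
    return { action: "truncate", offset: downloadedBytes };
  }
  return { action: "restart" }; // suspicious mismatch, or file shorter than tracked
}

type ResponseVerdict = "continue" | "rangeIgnored" | "complete" | "restart" | "other";

// Step 2: interpret the server's answer to a Range request starting at `offset`.
function judgeResumeResponse(
  status: number,
  contentRange: string | null,
  offset: number,
  expectedTotal: number,
): ResponseVerdict {
  if (status === 200) return "rangeIgnored";
  if (status === 416) return offset >= expectedTotal ? "complete" : "restart";
  if (status === 206) {
    // Content-Range has the form "bytes <start>-<end>/<total or *>".
    const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(contentRange ?? "");
    return match !== null && Number(match[1]) === offset ? "continue" : "restart";
  }
  return "other"; // leave remaining statuses to the HTTP classifier
}
```

For example, a 50MB file on disk with 10MB tracked yields a truncate to 10MB, while a file that is 5KB longer than tracked yields a restart.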
### Stall Detection (Key Improvement)
Current problem: Downloads hang and stall detection sometimes doesn't trigger properly.
New approach:
- **Per-download heartbeat**: StreamWriter emits heartbeat every second with bytes received
- **Scheduler monitors heartbeats**: if no heartbeat for stallTimeoutMs → abort + retry
- **Disk-write awareness**: separate tracking for "blocked on disk write" vs "blocked on network"
- **Global watchdog**: if ALL active downloads show zero progress for 60s (excluding disk-blocked), abort all and re-queue
- **Validating timeout**: if unrestrict takes > 30s, abort and retry (prevents infinite hang in validation phase)
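A minimal sketch of the heartbeat bookkeeping, assuming that a heartbeat without new bytes counts as "no progress" (one reading of the design) and that timestamps are passed in by the caller. The class name `HeartbeatMonitor` and its method names are hypothetical.

```typescript
interface Beat {
  lastProgressAt: number; // ms timestamp of the last heartbeat that carried new bytes
  totalBytes: number;
  diskBlocked: boolean; // true while the writer is waiting on the disk
}

class HeartbeatMonitor {
  private readonly beats = new Map<string, Beat>();
  private readonly stallTimeoutMs: number;
  private readonly watchdogMs: number;

  constructor(stallTimeoutMs: number, watchdogMs: number) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.watchdogMs = watchdogMs;
  }

  // Called by the writer roughly once per second.
  heartbeat(id: string, totalBytes: number, diskBlocked: boolean, now: number): void {
    const prev = this.beats.get(id);
    const lastProgressAt =
      prev !== undefined && totalBytes <= prev.totalBytes ? prev.lastProgressAt : now;
    this.beats.set(id, { lastProgressAt, totalBytes, diskBlocked });
  }

  // Downloads without network progress for stallTimeoutMs; disk-blocked ones are excused.
  stalled(now: number): string[] {
    const ids: string[] = [];
    this.beats.forEach((beat, id) => {
      if (!beat.diskBlocked && now - beat.lastProgressAt >= this.stallTimeoutMs) ids.push(id);
    });
    return ids;
  }

  // Global watchdog: every download that is not disk-blocked has been idle for watchdogMs.
  allIdle(now: number): boolean {
    let considered = 0;
    let idle = 0;
    this.beats.forEach((beat) => {
      if (beat.diskBlocked) return;
      considered += 1;
      if (now - beat.lastProgressAt >= this.watchdogMs) idle += 1;
    });
    return considered > 0 && idle === considered;
  }

  remove(id: string): void {
    this.beats.delete(id);
  }
}
```

Passing `now` explicitly instead of calling `Date.now()` inside the class keeps the timeout logic deterministic in unit tests.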
### Post-Processing State Machine (Key Improvement)
Current problem: Extraction can loop infinitely if archive keeps failing.
New approach:
```typescript
// One ExtractionState per archive
interface ExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number; // max 3
  lastError?: string;
  redownloaded: boolean; // max 1 redownload
}
```
Rules:
- Max 3 extraction attempts per archive
- If `ArchiveCorrupt` + `redownloaded === false` → queue redownload, set redownloaded = true
- If `ArchiveCorrupt` + `redownloaded === true` → fail permanently
- If `WrongPassword` → try next password from list, fail after all exhausted
- If `ExtractorCrash` → retry once, fail on second crash
- Package marked as "completed with errors" if any archive fails permanently
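The rules above can be expressed as a single transition function. This is a sketch under stated assumptions: wrong-password tries do not count toward the 3-attempt cap, and the fields `crashes` and `passwordIndex` as well as the function name `onExtractionError` are hypothetical additions that do not appear in the state shape above.

```typescript
type ExtractionErrorKind = "ArchiveCorrupt" | "WrongPassword" | "ExtractorCrash";
type ExtractionAction = "retry" | "redownload" | "nextPassword" | "fail";

interface ArchiveState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;
  lastError?: string;
  redownloaded: boolean;
  crashes: number; // hypothetical: extractor crashes seen so far
  passwordIndex: number; // hypothetical: index into the password list
}

const MAX_ATTEMPTS = 3;

function newArchiveState(archivePath: string): ArchiveState {
  return {
    archivePath,
    status: "pending",
    attempts: 0,
    redownloaded: false,
    crashes: 0,
    passwordIndex: 0,
  };
}

// Decide what to do after a failed extraction; updates `state` in place.
function onExtractionError(
  state: ArchiveState,
  kind: ExtractionErrorKind,
  passwordCount: number,
): ExtractionAction {
  state.lastError = kind;
  const decide = (): ExtractionAction => {
    if (kind === "WrongPassword") {
      // Assumption: password tries do not count toward MAX_ATTEMPTS.
      state.passwordIndex += 1;
      return state.passwordIndex < passwordCount ? "nextPassword" : "fail";
    }
    state.attempts += 1;
    if (state.attempts >= MAX_ATTEMPTS) return "fail"; // hard cap, no loops
    if (kind === "ArchiveCorrupt") {
      if (state.redownloaded) return "fail";
      state.redownloaded = true;
      return "redownload";
    }
    state.crashes += 1; // ExtractorCrash: retry once, fail on the second crash
    return state.crashes >= 2 ? "fail" : "retry";
  };
  const action = decide();
  state.status = action === "fail" ? "failed" : "pending";
  return action;
}
```

Because every path either increments a bounded counter or returns `"fail"`, the function cannot request an unbounded number of retries for the same archive.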
## Migration Strategy
1. New code lives in `src/main/download/` directory
2. Old `src/main/download-manager.ts` stays untouched as reference
3. New `download-manager.ts` in `src/main/download/` implements same class interface
4. Switch import in `main.ts` from old to new
5. Test with real downloads
6. Delete old file when stable
## Testing Strategy
- Unit tests for ErrorClassifier (classify every known error string)
- Unit tests for RetryManager (policy application, shelving threshold)
- Unit tests for StreamWriter resume validation logic
- Unit tests for PostProcessor state machine
- Integration test: Scheduler + Pipeline with mocked debrid/HTTP