Replace monolithic download-manager.ts (9500 lines) with 7 focused modules:
- error-classifier.ts: 25+ typed DownloadErrorKind enum, classifier functions for network/HTTP/debrid/extraction errors; no more string matching
- retry-manager.ts: Declarative per-error-kind retry policies, exponential backoff, shelving after 15 failures, state export/import
- stream-writer.ts: HTTP stream → file with pre-resume validation, stall detection, NTFS-aligned buffered writing, Range-ignored detection
- pipeline.ts: Single download lifecycle (unrestrict → stream → verify), throws typed errors, caller decides retry strategy
- post-processor.ts: Extraction state machine with hard caps (3 attempts per archive, 5 rounds per package), no infinite loops
- scheduler.ts: Queue management with priority-based slot allocation, heartbeat stall detection, global watchdog, provider cooldowns
- download-manager.ts: Drop-in orchestrator (~1500 lines), same public API
Fixes:
1. Hanging downloads: heartbeat-based stall detection + global watchdog
2. Wrong error classification: typed enum at point of origin
3. Unreliable resume: file size vs tracker validation, Range-ignored detection
4. Extraction loops: bounded retries with state machine
215 new unit tests for error-classifier and retry-manager (all passing). Build compiles cleanly. Same IPC interface, UI unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Download System v2 — Complete Redesign
Goal
Replace the 9500-line monolithic download-manager.ts with a clean, modular download system that fixes:
- Downloads hanging without clean restart
- Wrong error classification leading to wrong retry paths
- Unreliable resume (corrupt files, unnecessary restarts)
- Post-processing (extraction) breaking or looping
Constraints
- Same IPC interface — drop-in replacement, no UI changes needed
- Same external dependencies (debrid.ts, storage.ts, integrity.ts)
- Same session/settings persistence format
Architecture
Module Structure
src/main/download/
├── download-manager.ts # Orchestrator (~500 lines) — coordination only
├── scheduler.ts # Queue management, slot allocation, priorities
├── pipeline.ts # Single download flow: unrestrict → stream → verify
├── stream-writer.ts # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts # Typed error system (enums, not string matching)
├── retry-manager.ts # Central retry logic, backoff, shelving, state
└── post-processor.ts # Extraction queue, hybrid retry, cleanup
Module Responsibilities
1. download-manager.ts (Orchestrator)
- Holds session state, packages, items
- Exposes same IPC methods as current (startRun, stopRun, pauseItem, etc.)
- Delegates to Scheduler for queue management
- Delegates to Pipeline for individual downloads
- Delegates to PostProcessor for extraction
- Emits same events as current (progress, status changes)
- Handles persistence (save/load session)
2. scheduler.ts
- findNextItem(): priority-based queue with provider cooldown awareness
- fillSlots(): start downloads up to maxParallel
- Scheduler loop with generation guard (prevents stale schedulers)
- Global stall watchdog
- Provider cooldown tracking (circuit breaker)
- AllDebrid paced-start / hoster-limit logic
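The selection step could look roughly like the sketch below. It is only an illustration: the QueuedItem shape, the cooldownUntil map, and the convention that a larger priority number wins are all assumptions, not taken from the existing code.

```typescript
// Hypothetical item shape; the real session items carry more fields.
interface QueuedItem {
  id: string;
  priority: number; // assumption: larger number = more urgent
  provider: string;
}

// Picks the highest-priority item whose provider is not cooling down.
// cooldownUntil maps provider name -> timestamp (ms) when the cooldown ends.
function findNextItem(
  queue: QueuedItem[],
  cooldownUntil: Map<string, number>,
  nowMs: number,
): QueuedItem | undefined {
  let best: QueuedItem | undefined;
  for (const item of queue) {
    const until = cooldownUntil.get(item.provider) ?? 0;
    if (until > nowMs) continue; // provider still in cooldown, skip
    if (!best || item.priority > best.priority) best = item;
  }
  return best;
}
```

Passing nowMs in, rather than calling Date.now() inside, keeps the function deterministic for unit tests.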
3. pipeline.ts
- runDownload(item, context): single download lifecycle
- Step 1: Unrestrict link via debrid service
- Step 2: Stream file via StreamWriter
- Step 3: Verify integrity (CRC if available)
- Step 4: Signal completion
- Each step returns typed result or throws typed DownloadError
- No retry logic here — just reports what happened
4. stream-writer.ts
- streamToFile(url, targetPath, options): HTTP streaming
- Resume support with pre-validation:
  - Check existing file size against tracked downloadedBytes
  - Truncate if sparse file detected (pre-allocated > actual)
  - Send Range header only after validation
- HTTP 416 handling (complete vs incomplete)
- Server-ignored-range detection (200 instead of 206)
- Buffered writing with NTFS 4KB alignment
- Sparse file pre-allocation (Windows)
- Content-Disposition filename override
- Stall detection (configurable timeout, default 10s)
- Drain timeout for slow disks (default 5min)
- Progress reporting via callback
5. error-classifier.ts
- DownloadErrorKind enum with all error categories
- DownloadError class extending Error with .kind property
- classifyError(error, context): takes raw error + context, returns DownloadError
- Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
- No post-hoc string matching needed
- classifyHttpStatus(status, headers): HTTP-specific classification
- classifyFetchError(error): network-level classification
- classifyUnrestrictError(error): debrid-specific classification
enum DownloadErrorKind {
  // Network
  NetworkReset,       // ECONNRESET, socket hang up, EPIPE
  Timeout,            // No data received within stall timeout
  DnsFailure,         // ENOTFOUND
  // HTTP
  RangeNotSatisfied,  // 416: file may be complete or need restart
  RangeIgnored,       // Server sent 200 instead of 206
  ServerError,        // 500, 502, 503
  RateLimited,        // 429
  Forbidden,          // 403: link expired
  NotFound,           // 404: file removed from CDN
  // Provider/Debrid
  UnrestrictFailed,   // Provider can't convert link
  ProviderBusy,       // Concurrent download limit
  ProviderDown,       // Provider service unavailable
  HosterUnavailable,  // Hoster down (not provider issue)
  LinkDead,           // Permanent: file deleted at source
  QuotaExceeded,      // Daily traffic limit
  // Filesystem
  DiskFull,           // ENOSPC
  PermissionDenied,   // EACCES, EPERM
  FileLocked,         // EBUSY (Windows)
  // Integrity
  FileCorrupt,        // CRC/size mismatch after download
  FileTruncated,      // Downloaded less than expected
  // Extraction
  WrongPassword,      // Archive password incorrect
  ArchiveCorrupt,     // Archive header/data damaged
  ExtractorCrash,     // 7-Zip/WinRAR process crashed
  ExtractionLoop,     // Same archive failed extraction 3+ times
}
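As an illustration of classifying at the point of origin, the HTTP layer could map a status code straight to a typed error along these lines. This is a sketch under assumptions: a string-literal union stands in for the HTTP subset of DownloadErrorKind so the example is self-contained, and the rangeRequested flag replaces the headers parameter listed above.

```typescript
// Stand-in for the HTTP subset of DownloadErrorKind (assumption: the real
// code uses the enum; a union keeps this sketch self-contained).
type HttpErrorKind =
  | "RangeNotSatisfied"
  | "RangeIgnored"
  | "ServerError"
  | "RateLimited"
  | "Forbidden"
  | "NotFound";

class DownloadError extends Error {
  readonly kind: HttpErrorKind;
  readonly status: number;

  constructor(kind: HttpErrorKind, status: number) {
    super(`${kind} (HTTP ${status})`);
    this.name = "DownloadError";
    this.kind = kind;
    this.status = status;
  }
}

// Returns null when the status is acceptable for this request.
// rangeRequested is a hypothetical flag: true if a Range header was sent.
// Statuses not listed in the enum comments are left to the caller.
function classifyHttpStatus(status: number, rangeRequested: boolean): DownloadError | null {
  if (status === 206) return null;
  if (status === 200) {
    return rangeRequested ? new DownloadError("RangeIgnored", status) : null;
  }
  if (status === 416) return new DownloadError("RangeNotSatisfied", status);
  if (status === 429) return new DownloadError("RateLimited", status);
  if (status === 403) return new DownloadError("Forbidden", status);
  if (status === 404) return new DownloadError("NotFound", status);
  if (status >= 500) return new DownloadError("ServerError", status);
  return null;
}
```

Because the kind is attached where the status is first seen, the retry layer can switch on error.kind instead of parsing message strings.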
6. retry-manager.ts
- RetryManager class holds all retry state per item
- Declarative retry policies per DownloadErrorKind:
interface RetryPolicy {
  maxRetries: number;          // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;          // Delete partial file before retry
  switchProvider: boolean;     // Try different provider
  refreshLink: boolean;        // Get new direct link from debrid
  providerCooldownMs?: number; // Apply cooldown to current provider
}
- shouldRetry(itemId, error): returns { retry: boolean, delayMs, actions[] }
- recordFailure(itemId, error): tracks failure for shelving
- Shelving: after N total failures (configurable, default 15), pause 90s + reset provider
- State persists across stop/start (same format as current retryStateByItem)
- resetItem(itemId): clear all retry state (manual reset)
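A policy could be turned into a concrete delay roughly as sketched below. The delay formulas (base × attempt for linear, base × 2^(attempt−1) for exponential) are assumptions, since the design only names the backoff modes. The decideRetry helper and its parameters are hypothetical, and only the backoff-related fields of RetryPolicy are used.

```typescript
// Backoff-related subset of RetryPolicy.
interface BackoffPolicy {
  maxRetries: number;
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
}

const SHELVE_AFTER_FAILURES = 15; // default from the design
const SHELVE_PAUSE_MS = 90000;    // 90s pause when shelved

// attempt is 1-based: 1 means the first retry.
function computeDelayMs(policy: BackoffPolicy, attempt: number): number {
  const factor =
    policy.backoff === "fixed" ? 1 :
    policy.backoff === "linear" ? attempt :
    2 ** (attempt - 1);
  return Math.min(policy.baseDelayMs * factor, policy.maxDelayMs);
}

interface RetryDecision {
  retry: boolean;
  delayMs: number;
  shelved: boolean;
}

// kindFailures: failures of this error kind so far, including the current one.
// totalFailures: failures of any kind for the item; the caller is assumed to
// reset this counter after an item has been shelved.
function decideRetry(
  policy: BackoffPolicy,
  kindFailures: number,
  totalFailures: number,
): RetryDecision {
  if (kindFailures > policy.maxRetries) {
    return { retry: false, delayMs: 0, shelved: false };
  }
  if (totalFailures >= SHELVE_AFTER_FAILURES) {
    return { retry: true, delayMs: SHELVE_PAUSE_MS, shelved: true };
  }
  return { retry: true, delayMs: computeDelayMs(policy, kindFailures), shelved: false };
}
```

With maxRetries set to 0 the first failure already yields retry: false, which matches the "permanent failure" meaning in the interface comment.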
7. post-processor.ts
- PostProcessor class with extraction queue
- State machine per package:
  pending → extracting → done
                ↓
          retry (max 2) → failed
- Tracks extraction attempts per archive (max 3 attempts, i.e. up to 2 retries)
- No infinite loops: hard cap on retry count
- Hybrid extract retry: if archive corrupt + redownload suggested, queue redownload (max 1 time)
- Cleanup: remove partial extracts on failure
- Empty folder cleanup after successful extraction
Data Flow
User clicks Start
↓
DownloadManager.startRun()
↓
Scheduler.start() — begins loop
↓
Scheduler.findNextItem() — picks highest priority queued item
↓
Pipeline.runDownload(item)
├── debridService.unrestrict(item.link)
│ └── error? → ErrorClassifier.classify() → DownloadError
├── StreamWriter.streamToFile(url, path, opts)
│ ├── Resume validation
│ ├── HTTP streaming with stall detection
│ └── error? → ErrorClassifier.classify() → DownloadError
└── integrityCheck(file)
└── error? → DownloadError(FileCorrupt)
↓
Success → mark completed → Scheduler fills next slot
Error → RetryManager.shouldRetry(item, error)
├── retry: true → Scheduler.queueRetry(item, delay, actions)
└── retry: false → mark failed
↓
All items done → PostProcessor.run(package)
├── Extract archives
├── Verify extracted files
└── Cleanup
Resume Validation (Key Improvement)
Current problem: Resume trusts file size blindly, leading to corrupt files.
New approach:
- Before sending Range header, validate existing file:
  - stat.size must match item.downloadedBytes (±1KB tolerance for flush timing)
  - If mismatch > 1MB: file is from sparse pre-allocation → truncate to downloadedBytes
  - If mismatch < 1MB but > 1KB: suspicious → delete and restart fresh
- After resume response, validate:
  - 206 with correct Content-Range → continue
  - 200 (range ignored) → classify as RangeIgnored, retry with fresh link
  - 416 → check if file actually complete (existingBytes >= expectedTotal)
- After download complete, validate:
  - Final file size matches expected total
  - CRC check if manifest available
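The pre-resume and post-response checks above might be expressed as two pure functions, sketched here. Several details are assumptions the rules leave open: resuming from the on-disk size when within tolerance, treating a file that is more than 1MB smaller than the tracker as a restart, and the function and type names themselves.

```typescript
const TOLERANCE_BYTES = 1024;               // ±1KB flush tolerance
const SPARSE_THRESHOLD_BYTES = 1024 * 1024; // 1MB

type ResumePlan =
  | { action: "resume"; offset: number }
  | { action: "truncate"; offset: number }
  | { action: "restart" };

// fileSize: stat.size of the partial file; downloadedBytes: tracked progress.
function planResume(fileSize: number, downloadedBytes: number): ResumePlan {
  const diff = fileSize - downloadedBytes;
  if (Math.abs(diff) <= TOLERANCE_BYTES) {
    return { action: "resume", offset: fileSize }; // assumption: trust disk size
  }
  if (diff > SPARSE_THRESHOLD_BYTES) {
    // File far larger than tracked: sparse pre-allocation, cut back to tracker.
    return { action: "truncate", offset: downloadedBytes };
  }
  return { action: "restart" }; // suspicious mismatch, start fresh
}

type ResumeCheck = "continue" | "range-ignored" | "already-complete" | "restart";

// Validates the server's answer to a Range request starting at offset.
function checkResumeResponse(
  status: number,
  offset: number,
  expectedTotal: number,
  contentRange?: string,
): ResumeCheck {
  if (status === 200) return "range-ignored";
  if (status === 416) return offset >= expectedTotal ? "already-complete" : "restart";
  if (status === 206 && contentRange) {
    // Expected form: "bytes <start>-<end>/<total>"
    const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(contentRange.trim());
    if (m && Number(m[1]) === offset) return "continue";
  }
  return "restart";
}
```

Keeping both checks free of I/O means the branches can be unit-tested without touching the filesystem or a server.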
Stall Detection (Key Improvement)
Current problem: Downloads hang and stall detection sometimes doesn't trigger properly.
New approach:
- Per-download heartbeat: StreamWriter emits heartbeat every second with bytes received
- Scheduler monitors heartbeats: if no heartbeat for stallTimeoutMs → abort + retry
- Disk-write awareness: separate tracking for "blocked on disk write" vs "blocked on network"
- Global watchdog: if ALL active downloads show zero progress for 60s (excluding disk-blocked), abort all and re-queue
- Validating timeout: if unrestrict takes > 30s, abort and retry (prevents infinite hang in validation phase)
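One way the heartbeat bookkeeping could be structured is sketched below. The HeartbeatMonitor class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be tested without timers. It reads "zero progress" as "byte count has not increased", which is an interpretation of the rules above.

```typescript
interface Heartbeat {
  lastBeatMs: number;     // time of the most recent heartbeat
  lastProgressMs: number; // time the byte count last increased
  bytes: number;
  diskBlocked: boolean;   // writer reported it is waiting on disk
}

class HeartbeatMonitor {
  private readonly beats = new Map<string, Heartbeat>();
  private readonly stallTimeoutMs: number;

  constructor(stallTimeoutMs: number) {
    this.stallTimeoutMs = stallTimeoutMs;
  }

  // Called by the stream writer roughly once per second.
  beat(itemId: string, nowMs: number, bytes: number, diskBlocked = false): void {
    const prev = this.beats.get(itemId);
    const lastProgressMs = prev && bytes <= prev.bytes ? prev.lastProgressMs : nowMs;
    this.beats.set(itemId, { lastBeatMs: nowMs, lastProgressMs, bytes, diskBlocked });
  }

  remove(itemId: string): void {
    this.beats.delete(itemId);
  }

  // Items with no heartbeat for longer than stallTimeoutMs, ignoring
  // downloads that are blocked on disk rather than on the network.
  findStalled(nowMs: number): string[] {
    const stalled: string[] = [];
    this.beats.forEach((hb, id) => {
      if (!hb.diskBlocked && nowMs - hb.lastBeatMs > this.stallTimeoutMs) stalled.push(id);
    });
    return stalled;
  }

  // Global watchdog: true when every non-disk-blocked download has made
  // no byte progress for at least windowMs.
  isGloballyStalled(nowMs: number, windowMs = 60000): boolean {
    let considered = 0;
    let stalled = 0;
    this.beats.forEach((hb) => {
      if (hb.diskBlocked) return;
      considered += 1;
      if (nowMs - hb.lastProgressMs >= windowMs) stalled += 1;
    });
    return considered > 0 && stalled === considered;
  }
}
```

The scheduler loop would call findStalled and isGloballyStalled on each tick and abort or re-queue accordingly.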
Post-Processing State Machine (Key Improvement)
Current problem: Extraction can loop infinitely if archive keeps failing.
New approach:
ExtractionState per archive:
{
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;      // max 3
  lastError?: string;
  redownloaded: boolean; // max 1 redownload
}
Rules:
- Max 3 extraction attempts per archive
- If ArchiveCorrupt + redownloaded === false → queue redownload, set redownloaded = true
- If ArchiveCorrupt + redownloaded === true → fail permanently
- If WrongPassword → try next password from list, fail after all exhausted
- If ExtractorCrash → retry once, fail on second crash
- Package marked as "completed with errors" if any archive fails permanently
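The rules above can be read as a single transition function, sketched here. Some parts are assumptions: the crashes counter is a hypothetical extra field (the state shape above has none), wrong-password tries are assumed not to count against the 3-attempt cap, and the function names are illustrative.

```typescript
type ExtractionFailure = "ArchiveCorrupt" | "WrongPassword" | "ExtractorCrash";
type NextStep = "retry" | "redownload" | "next-password" | "fail";

interface ExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;
  lastError?: string;
  redownloaded: boolean;
  crashes: number; // hypothetical field, used for "fail on second crash"
}

const MAX_ATTEMPTS = 3;

function newExtractionState(archivePath: string): ExtractionState {
  return { archivePath, status: "pending", attempts: 0, redownloaded: false, crashes: 0 };
}

// Applies the rules in order and mutates the counters on state.
function decideNextStep(
  state: ExtractionState,
  kind: ExtractionFailure,
  passwordsLeft: number,
): NextStep {
  if (kind === "WrongPassword") {
    // Assumption: cycling through passwords does not consume attempts.
    return passwordsLeft > 0 ? "next-password" : "fail";
  }
  state.attempts += 1;
  if (state.attempts >= MAX_ATTEMPTS) return "fail"; // hard cap, no loops
  if (kind === "ArchiveCorrupt") {
    if (state.redownloaded) return "fail";
    state.redownloaded = true;
    return "redownload";
  }
  state.crashes += 1;
  return state.crashes >= 2 ? "fail" : "retry";
}

// Records the failure and moves the archive to its next status.
function onExtractionFailure(
  state: ExtractionState,
  kind: ExtractionFailure,
  passwordsLeft = 0,
): NextStep {
  state.lastError = kind;
  const next = decideNextStep(state, kind, passwordsLeft);
  state.status = next === "fail" ? "failed" : "pending";
  return next;
}
```

Every path either increments a bounded counter or returns "fail", so no sequence of failures can keep an archive cycling indefinitely.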
Migration Strategy
- New code lives in src/main/download/ directory
- Old src/main/download-manager.ts stays untouched as reference
- New download-manager.ts in src/main/download/ implements same class interface
- Switch import in main.ts from old to new
- Test with real downloads
- Delete old file when stable
Testing Strategy
- Unit tests for ErrorClassifier (classify every known error string)
- Unit tests for RetryManager (policy application, shelving threshold)
- Unit tests for StreamWriter resume validation logic
- Unit tests for PostProcessor state machine
- Integration test: Scheduler + Pipeline with mocked debrid/HTTP