Replace monolithic download-manager.ts (9500 lines) with 7 focused modules:

- error-classifier.ts: DownloadErrorKind enum with 25+ typed kinds, classifier functions for network/HTTP/debrid/extraction errors (no more string matching)
- retry-manager.ts: Declarative per-error-kind retry policies, exponential backoff, shelving after 15 failures, state export/import
- stream-writer.ts: HTTP stream → file with pre-resume validation, stall detection, NTFS-aligned buffered writing, Range-ignored detection
- pipeline.ts: Single download lifecycle (unrestrict → stream → verify), throws typed errors, caller decides retry strategy
- post-processor.ts: Extraction state machine with hard caps (3 attempts per archive, 5 rounds per package), no infinite loops
- scheduler.ts: Queue management with priority-based slot allocation, heartbeat stall detection, global watchdog, provider cooldowns
- download-manager.ts: Drop-in orchestrator (~1500 lines), same public API

Fixes:

1. Hanging downloads: heartbeat-based stall detection + global watchdog
2. Wrong error classification: typed enum at point of origin
3. Unreliable resume: file size vs tracker validation, Range-ignored detection
4. Extraction loops: bounded retries with state machine

215 new unit tests for error-classifier and retry-manager (all passing). Build compiles cleanly. Same IPC interface, UI unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

260 lines
10 KiB
Markdown

# Download System v2: Complete Redesign

## Goal

Replace the 9500-line monolithic `download-manager.ts` with a clean, modular download system that fixes:

1. Downloads hanging without a clean restart
2. Wrong error classification leading to wrong retry paths
3. Unreliable resume (corrupt files, unnecessary restarts)
4. Post-processing (extraction) breaking or looping

## Constraints

- Same IPC interface: drop-in replacement, no UI changes needed
- Same external dependencies (debrid.ts, storage.ts, integrity.ts)
- Same session/settings persistence format

## Architecture

### Module Structure

```
src/main/download/
├── download-manager.ts    # Orchestrator (~500 lines), coordination only
├── scheduler.ts           # Queue management, slot allocation, priorities
├── pipeline.ts            # Single download flow: unrestrict → stream → verify
├── stream-writer.ts       # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts    # Typed error system (enums, not string matching)
├── retry-manager.ts       # Central retry logic, backoff, shelving, state
└── post-processor.ts      # Extraction queue, hybrid retry, cleanup
```

### Module Responsibilities

#### 1. download-manager.ts (Orchestrator)

- Holds session state, packages, items
- Exposes the same IPC methods as the current implementation (startRun, stopRun, pauseItem, etc.)
- Delegates to Scheduler for queue management
- Delegates to Pipeline for individual downloads
- Delegates to PostProcessor for extraction
- Emits the same events as the current implementation (progress, status changes)
- Handles persistence (save/load session)

#### 2. scheduler.ts

- `findNextItem()`: priority-based queue with provider cooldown awareness
- `fillSlots()`: start downloads up to maxParallel
- Scheduler loop with generation guard (prevents stale schedulers)
- Global stall watchdog
- Provider cooldown tracking (circuit breaker)
- AllDebrid paced-start / hoster-limit logic

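
A minimal sketch of how `findNextItem()` and `fillSlots()` might interact, written as pure functions so they are easy to test. The `QueuedItem` shape, the `cooldownUntil` map, the "higher number wins" priority rule, and the FIFO tie-breaker are all assumptions made for illustration; the real item type and cooldown bookkeeping are not specified in this document.

```typescript
// Hypothetical item shape; the real item and cooldown types are not shown here.
interface QueuedItem {
  id: string;
  provider: string;
  priority: number; // assumption: higher number = more urgent
  queuedAt: number; // ms timestamp, used as FIFO tie-breaker
}

/** Picks the highest-priority queued item whose provider is not cooling down. */
function findNextItem(
  queue: readonly QueuedItem[],
  cooldownUntil: ReadonlyMap<string, number>,
  now: number,
): QueuedItem | undefined {
  let best: QueuedItem | undefined;
  for (const item of queue) {
    const until = cooldownUntil.get(item.provider) ?? 0;
    if (until > now) continue; // provider is in cooldown, skip it for now
    if (
      best === undefined ||
      item.priority > best.priority ||
      (item.priority === best.priority && item.queuedAt < best.queuedAt)
    ) {
      best = item;
    }
  }
  return best;
}

/** Selects items to start until activeCount reaches maxParallel. */
function fillSlots(
  queue: readonly QueuedItem[],
  cooldownUntil: ReadonlyMap<string, number>,
  activeCount: number,
  maxParallel: number,
  now: number,
): QueuedItem[] {
  const remaining = [...queue];
  const started: QueuedItem[] = [];
  while (activeCount + started.length < maxParallel) {
    const next = findNextItem(remaining, cooldownUntil, now);
    if (next === undefined) break; // nothing eligible right now
    started.push(next);
    remaining.splice(remaining.indexOf(next), 1);
  }
  return started;
}
```

Passing `now` explicitly instead of calling `Date.now()` inside keeps the selection deterministic, which helps when unit-testing cooldown expiry.
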
#### 3. pipeline.ts

- `runDownload(item, context)`: single download lifecycle
  - Step 1: Unrestrict link via debrid service
  - Step 2: Stream file via StreamWriter
  - Step 3: Verify integrity (CRC if available)
  - Step 4: Signal completion
- Each step returns a typed result or throws a typed DownloadError
- No retry logic here: the pipeline only reports what happened

#### 4. stream-writer.ts

- `streamToFile(url, targetPath, options)`: HTTP streaming
- Resume support with pre-validation:
  - Check existing file size against tracked downloadedBytes
  - Truncate if sparse file detected (pre-allocated > actual)
  - Send Range header only after validation
- HTTP 416 handling (complete vs incomplete)
- Server-ignored-range detection (200 instead of 206)
- Buffered writing with NTFS 4 KB alignment
- Sparse file pre-allocation (Windows)
- Content-Disposition filename override
- Stall detection (configurable timeout, default 10 s)
- Drain timeout for slow disks (default 5 min)
- Progress reporting via callback

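
One possible reading of "buffered writing with NTFS 4 KB alignment" is sketched below: incoming chunks are accumulated, and only whole multiples of 4096 bytes are handed to the file writer until the stream ends. The class name `AlignedWriteBuffer`, the `flushThreshold` parameter, and its 256 KiB default are hypothetical; the document does not describe the actual buffering strategy.

```typescript
const NTFS_CLUSTER_BYTES = 4096; // the 4 KB alignment named in the list above

/** Accumulates chunks and releases only 4 KB-aligned blocks until the final flush. */
class AlignedWriteBuffer {
  private chunks: Uint8Array[] = [];
  private buffered = 0;
  private readonly flushThreshold: number;

  constructor(flushThreshold: number = 256 * 1024) { // hypothetical default
    this.flushThreshold = flushThreshold;
  }

  /** Adds a chunk; returns an aligned block once the threshold is reached. */
  push(chunk: Uint8Array): Uint8Array | undefined {
    this.chunks.push(chunk);
    this.buffered += chunk.length;
    if (this.buffered < this.flushThreshold) return undefined;
    const alignedLen = this.buffered - (this.buffered % NTFS_CLUSTER_BYTES);
    if (alignedLen === 0) return undefined; // less than one full cluster buffered
    return this.take(alignedLen);
  }

  /** Returns whatever is left (possibly unaligned) at end of stream. */
  flushRemainder(): Uint8Array {
    return this.take(this.buffered);
  }

  /** Removes the first `length` bytes from the buffer and returns them. */
  private take(length: number): Uint8Array {
    const all = new Uint8Array(this.buffered);
    let offset = 0;
    for (const c of this.chunks) {
      all.set(c, offset);
      offset += c.length;
    }
    const out = all.slice(0, length);
    const rest = all.slice(length);
    this.chunks = rest.length > 0 ? [rest] : [];
    this.buffered = rest.length;
    return out;
  }
}
```

The caller would write each returned block to disk and call `flushRemainder()` once when the HTTP stream closes, so only the final write can be unaligned.
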
#### 5. error-classifier.ts

- `DownloadErrorKind` enum with all error categories
- `DownloadError` class extending Error with `.kind` property
- `classifyError(error, context)`: takes raw error + context, returns DownloadError
  - Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
  - No post-hoc string matching needed
- `classifyHttpStatus(status, headers)`: HTTP-specific classification
- `classifyFetchError(error)`: network-level classification
- `classifyUnrestrictError(error)`: debrid-specific classification

```typescript
enum DownloadErrorKind {
  // Network
  NetworkReset,       // ECONNRESET, socket hang up, EPIPE
  Timeout,            // No data received within stall timeout
  DnsFailure,         // ENOTFOUND

  // HTTP
  RangeNotSatisfied,  // 416: file may be complete or need restart
  RangeIgnored,       // Server sent 200 instead of 206
  ServerError,        // 500, 502, 503
  RateLimited,        // 429
  Forbidden,          // 403: link expired
  NotFound,           // 404: file removed from CDN

  // Provider/Debrid
  UnrestrictFailed,   // Provider can't convert link
  ProviderBusy,       // Concurrent download limit
  ProviderDown,       // Provider service unavailable
  HosterUnavailable,  // Hoster down (not provider issue)
  LinkDead,           // Permanent: file deleted at source
  QuotaExceeded,      // Daily traffic limit

  // Filesystem
  DiskFull,           // ENOSPC
  PermissionDenied,   // EACCES, EPERM
  FileLocked,         // EBUSY (Windows)

  // Integrity
  FileCorrupt,        // CRC/size mismatch after download
  FileTruncated,      // Downloaded less than expected

  // Extraction
  WrongPassword,      // Archive password incorrect
  ArchiveCorrupt,     // Archive header/data damaged
  ExtractorCrash,     // 7-Zip/WinRAR process crashed
  ExtractionLoop,     // Same archive failed extraction 3+ times
}
```

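
To illustrate classification at the point of origin, here is a hedged sketch of `DownloadError`, `classifyHttpStatus`, and `classifyFetchError`. To stay short it declares `DownloadErrorKind` as a string-literal union covering only a subset of the kinds above, and `classifyHttpStatus` omits the `headers` parameter. The mappings follow the comments in the enum; the `ERRNO_KINDS` table and the `undefined` return for unrecognized input are assumptions, not the actual implementation.

```typescript
// Subset of the kinds above, as a string-literal union to keep the sketch short.
type DownloadErrorKind =
  | "NetworkReset" | "DnsFailure"
  | "RangeNotSatisfied" | "ServerError" | "RateLimited" | "Forbidden" | "NotFound"
  | "DiskFull" | "PermissionDenied" | "FileLocked";

class DownloadError extends Error {
  readonly kind: DownloadErrorKind;

  constructor(kind: DownloadErrorKind, message: string) {
    super(message);
    this.name = "DownloadError";
    this.kind = kind;
  }
}

/** Maps an HTTP status to an error kind; undefined means "not an error we classify". */
function classifyHttpStatus(status: number): DownloadErrorKind | undefined {
  switch (status) {
    case 416: return "RangeNotSatisfied";
    case 429: return "RateLimited";
    case 403: return "Forbidden";
    case 404: return "NotFound";
    case 500:
    case 502:
    case 503: return "ServerError";
    default: return undefined;
  }
}

// Hypothetical lookup table from Node.js system error codes to kinds.
const ERRNO_KINDS: Record<string, DownloadErrorKind> = {
  ECONNRESET: "NetworkReset",
  EPIPE: "NetworkReset",
  ENOTFOUND: "DnsFailure",
  ENOSPC: "DiskFull",
  EACCES: "PermissionDenied",
  EPERM: "PermissionDenied",
  EBUSY: "FileLocked",
};

/** Classifies a raw error by its `code` property instead of matching message strings. */
function classifyFetchError(error: unknown): DownloadError | undefined {
  const code = (error as { code?: unknown } | null)?.code;
  if (typeof code !== "string") return undefined;
  const kind = ERRNO_KINDS[code];
  if (kind === undefined) return undefined;
  return new DownloadError(kind, `${code} classified as ${kind}`);
}
```

Keying on `error.code` rather than on the message text is what makes the post-hoc string matching unnecessary: callers branch on `.kind` and never inspect the message.
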
#### 6. retry-manager.ts

- `RetryManager` class holds all retry state per item
- Declarative retry policies per DownloadErrorKind:

```typescript
interface RetryPolicy {
  maxRetries: number;           // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;           // Delete partial file before retry
  switchProvider: boolean;      // Try different provider
  refreshLink: boolean;         // Get new direct link from debrid
  providerCooldownMs?: number;  // Apply cooldown to current provider
}
```

- `shouldRetry(itemId, error)`: returns { retry: boolean, delayMs, actions[] }
- `recordFailure(itemId, error)`: tracks failure for shelving
- Shelving: after N total failures (configurable, default 15), pause 90 s + reset provider
- State persists across stop/start (same format as current retryStateByItem)
- `resetItem(itemId)`: clear all retry state (manual reset)

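
The following sketch shows how a `RetryPolicy` could be turned into a retry decision. It is not the actual `RetryManager`: `computeDelayMs`, `decideRetry`, the action names, and the exact formulas (linear = base × attempt, exponential = base × 2^(attempt−1)) are assumptions, as is treating shelving as "retry after a 90 s pause with a provider reset".

```typescript
// Same fields as the RetryPolicy interface above, repeated so the sketch is self-contained.
interface RetryPolicy {
  maxRetries: number;
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;
  switchProvider: boolean;
  refreshLink: boolean;
  providerCooldownMs?: number;
}

interface RetryDecision {
  retry: boolean;
  delayMs: number;
  actions: string[];
}

const SHELVE_AFTER_FAILURES = 15; // default from the list above
const SHELVE_PAUSE_MS = 90_000;   // 90 s pause from the list above

/** Delay before retry number `attempt` (1-based), capped at maxDelayMs. */
function computeDelayMs(policy: RetryPolicy, attempt: number): number {
  let delay = policy.baseDelayMs; // "fixed"
  if (policy.backoff === "linear") {
    delay = policy.baseDelayMs * attempt;
  } else if (policy.backoff === "exponential") {
    delay = policy.baseDelayMs * 2 ** (attempt - 1);
  }
  return Math.min(delay, policy.maxDelayMs);
}

/**
 * attempt: failures of this error kind so far, including the current one.
 * totalFailures: all failures of the item, used for shelving.
 */
function decideRetry(policy: RetryPolicy, attempt: number, totalFailures: number): RetryDecision {
  if (attempt > policy.maxRetries) {
    return { retry: false, delayMs: 0, actions: [] };
  }
  const actions: string[] = [];
  if (policy.resetFile) actions.push("resetFile");
  if (policy.switchProvider) actions.push("switchProvider");
  if (policy.refreshLink) actions.push("refreshLink");
  if (totalFailures >= SHELVE_AFTER_FAILURES) {
    actions.push("resetProvider"); // shelving: long pause plus provider reset
    return { retry: true, delayMs: SHELVE_PAUSE_MS, actions };
  }
  return { retry: true, delayMs: computeDelayMs(policy, attempt), actions };
}
```

With `baseDelayMs: 1000`, `maxDelayMs: 5000`, and exponential backoff, the delays for attempts 1 through 4 are 1000, 2000, 4000, and 5000 ms, the last one being capped.
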
#### 7. post-processor.ts

- `PostProcessor` class with extraction queue
- State machine per package:

```
pending → extracting → done
              ↓
        retry (max 2) → failed
```

- Tracks extraction attempts per archive (max 3 attempts: the first try plus 2 retries)
- No infinite loops: hard cap on retry count
- Hybrid extract retry: if the archive is corrupt and a redownload is suggested, queue a redownload (at most once)
- Cleanup: remove partial extracts on failure
- Empty folder cleanup after successful extraction

### Data Flow

```
User clicks Start
    ↓
DownloadManager.startRun()
    ↓
Scheduler.start() (begins loop)
    ↓
Scheduler.findNextItem() (picks highest-priority queued item)
    ↓
Pipeline.runDownload(item)
    ├── debridService.unrestrict(item.link)
    │   └── error? → ErrorClassifier.classify() → DownloadError
    ├── StreamWriter.streamToFile(url, path, opts)
    │   ├── Resume validation
    │   ├── HTTP streaming with stall detection
    │   └── error? → ErrorClassifier.classify() → DownloadError
    └── integrityCheck(file)
        └── error? → DownloadError(FileCorrupt)
    ↓
Success → mark completed → Scheduler fills next slot
Error → RetryManager.shouldRetry(item, error)
    ├── retry: true → Scheduler.queueRetry(item, delay, actions)
    └── retry: false → mark failed
    ↓
All items done → PostProcessor.run(package)
    ├── Extract archives
    ├── Verify extracted files
    └── Cleanup
```

### Resume Validation (Key Improvement)

Current problem: Resume trusts the on-disk file size blindly, which leads to corrupt files.

New approach:

1. Before sending the Range header, validate the existing file:
   - `stat.size` must match `item.downloadedBytes` (±1 KB tolerance for flush timing)
   - If mismatch > 1 MB: file is from sparse pre-allocation → truncate to downloadedBytes
   - If mismatch < 1 MB but > 1 KB: suspicious → delete and restart fresh
2. After the resume response, validate:
   - 206 with correct Content-Range → continue
   - 200 (range ignored) → classify as RangeIgnored, retry with fresh link
   - 416 → check if file is actually complete (existingBytes >= expectedTotal)
3. After the download completes, validate:
   - Final file size matches expected total
   - CRC check if manifest available

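
Steps 1 and 2 above can be sketched as two pure decision functions. The function names, the result types, and several edge cases are assumptions that the text leaves open: within tolerance the sketch resumes from the actual on-disk size, a file that is *smaller* than tracked by more than 1 KB is treated as suspicious, and a mismatch of exactly 1 MB falls into the conservative "restart" branch.

```typescript
const KB = 1024;
const MB = 1024 * KB;

type ResumeAction =
  | { kind: "resume"; offset: number } // send Range: bytes=<offset>-
  | { kind: "truncate"; to: number }   // cut sparse tail, then resume at `to`
  | { kind: "restart" };               // delete partial file, start fresh

/** Step 1: compare the on-disk size with the tracked byte count. */
function validateBeforeResume(statSize: number, downloadedBytes: number): ResumeAction {
  const diff = statSize - downloadedBytes;
  if (Math.abs(diff) <= KB) {
    return { kind: "resume", offset: statSize }; // assumption: trust bytes on disk
  }
  if (diff > MB) {
    return { kind: "truncate", to: downloadedBytes }; // sparse pre-allocation
  }
  return { kind: "restart" }; // suspicious mismatch
}

type ResponseVerdict = "continue" | "rangeIgnored" | "complete" | "incomplete" | "unexpected";

/** Step 2: check how the server answered the Range request. */
function validateResumeResponse(
  status: number,
  contentRange: string | null,
  existingBytes: number,
  expectedTotal: number,
): ResponseVerdict {
  if (status === 206) {
    // Content-Range format: "bytes <start>-<end>/<total or *>"
    const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(contentRange ?? "");
    return m !== null && Number(m[1]) === existingBytes ? "continue" : "unexpected";
  }
  if (status === 200) return "rangeIgnored"; // server sent the whole file again
  if (status === 416) return existingBytes >= expectedTotal ? "complete" : "incomplete";
  return "unexpected";
}
```

In this sketch a `"rangeIgnored"` verdict would be raised as a `RangeIgnored` error so the retry policy can request a fresh link, while `"complete"` lets the pipeline skip straight to integrity verification.
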
### Stall Detection (Key Improvement)

Current problem: Downloads hang, and stall detection does not always trigger.

New approach:

- **Per-download heartbeat**: StreamWriter emits a heartbeat every second with bytes received
- **Scheduler monitors heartbeats**: if no heartbeat for stallTimeoutMs → abort + retry
- **Disk-write awareness**: separate tracking for "blocked on disk write" vs "blocked on network"
- **Global watchdog**: if ALL active downloads show zero progress for 60 s (excluding disk-blocked), abort all and re-queue
- **Validating timeout**: if unrestrict takes > 30 s, abort and retry (prevents infinite hang in validation phase)

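
A minimal sketch of the heartbeat bookkeeping described above, with the clock passed in explicitly so it can be tested without timers. `StallMonitor` and its method names are hypothetical, and the 10 s stall default is taken from the stream-writer section; the real Scheduler may track this differently.

```typescript
interface Heartbeat {
  lastBeatAt: number;     // when the last heartbeat arrived
  lastBytes: number;      // bytes received as of that heartbeat
  lastProgressAt: number; // last time the byte count actually grew
  diskBlocked: boolean;   // true while the writer waits on a disk flush
}

class StallMonitor {
  private beats = new Map<string, Heartbeat>();
  private readonly stallTimeoutMs: number;
  private readonly watchdogMs: number;

  constructor(stallTimeoutMs: number = 10_000, watchdogMs: number = 60_000) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.watchdogMs = watchdogMs;
  }

  /** Called by the stream writer roughly once per second. */
  heartbeat(id: string, bytesReceived: number, diskBlocked: boolean, now: number): void {
    const prev = this.beats.get(id);
    const lastProgressAt =
      prev !== undefined && bytesReceived <= prev.lastBytes ? prev.lastProgressAt : now;
    this.beats.set(id, { lastBeatAt: now, lastBytes: bytesReceived, lastProgressAt, diskBlocked });
  }

  /** Stops tracking a download (completed, failed, or aborted). */
  remove(id: string): void {
    this.beats.delete(id);
  }

  /** Downloads whose heartbeat has been silent for stallTimeoutMs or longer. */
  stalled(now: number): string[] {
    const ids: string[] = [];
    this.beats.forEach((beat, id) => {
      if (now - beat.lastBeatAt >= this.stallTimeoutMs) ids.push(id);
    });
    return ids;
  }

  /** True if every download that is not disk-blocked made no progress for watchdogMs. */
  watchdogTripped(now: number): boolean {
    let considered = 0;
    let tripped = true;
    this.beats.forEach((beat) => {
      if (beat.diskBlocked) return; // waiting on disk is not a network stall
      considered += 1;
      if (now - beat.lastProgressAt < this.watchdogMs) tripped = false;
    });
    return considered > 0 && tripped;
  }
}
```

Tracking `lastBeatAt` and `lastProgressAt` separately distinguishes a writer that has stopped reporting entirely from one that still reports but receives no bytes, which is the case the global watchdog is meant to catch.
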
### Post-Processing State Machine (Key Improvement)

Current problem: Extraction can loop infinitely if an archive keeps failing.

New approach:

```
ExtractionState per archive:
{
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;        // max 3
  lastError?: string;
  redownloaded: boolean;   // max 1 redownload
}
```

Rules:

- Max 3 extraction attempts per archive
- If `ArchiveCorrupt` + `redownloaded === false` → queue redownload, set redownloaded = true
- If `ArchiveCorrupt` + `redownloaded === true` → fail permanently
- If `WrongPassword` → try next password from list, fail after all exhausted
- If `ExtractorCrash` → retry once, fail on second crash
- Package marked as "completed with errors" if any archive fails permanently

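
The rules above could be expressed as a single transition function, sketched here under several assumptions: the caller increments `attempts` before each extraction run, password tries are handled before the attempt cap (so a long password list is not cut off at 3), and a "second crash" is detected by checking whether the previous `lastError` was also `ExtractorCrash`. The names `onExtractionFailure`, `NextStep`, and `Transition` are hypothetical.

```typescript
type ExtractionErrorKind = "ArchiveCorrupt" | "WrongPassword" | "ExtractorCrash";

interface ExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;      // max 3
  lastError?: string;
  redownloaded: boolean; // max 1 redownload
}

type NextStep = "retry" | "redownload" | "nextPassword" | "fail";

interface Transition {
  state: ExtractionState;
  next: NextStep;
}

const MAX_ATTEMPTS = 3;

/** Decides what happens after a failed extraction; never mutates the input state. */
function onExtractionFailure(
  state: ExtractionState,
  error: ExtractionErrorKind,
  passwordsLeft: number,
): Transition {
  const base: ExtractionState = { ...state, lastError: error };
  const fail: Transition = { state: { ...base, status: "failed" }, next: "fail" };

  if (error === "WrongPassword") {
    // Assumption: password tries do not count against the attempt cap.
    return passwordsLeft > 0
      ? { state: { ...base, status: "pending" }, next: "nextPassword" }
      : fail;
  }
  if (state.attempts >= MAX_ATTEMPTS) return fail; // hard cap, no infinite loops
  if (error === "ArchiveCorrupt") {
    return state.redownloaded
      ? fail
      : { state: { ...base, status: "pending", redownloaded: true }, next: "redownload" };
  }
  // ExtractorCrash: retry once, fail if the previous error was also a crash.
  return state.lastError === "ExtractorCrash"
    ? fail
    : { state: { ...base, status: "pending" }, next: "retry" };
}
```

Because every branch either moves the state to `failed` or consumes a bounded resource (the attempt counter, the single redownload, the password list), repeated calls must eventually return `"fail"`.
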
## Migration Strategy

1. New code lives in the `src/main/download/` directory
2. Old `src/main/download-manager.ts` stays untouched as reference
3. New `download-manager.ts` in `src/main/download/` implements the same class interface
4. Switch the import in `main.ts` from old to new
5. Test with real downloads
6. Delete the old file when stable

## Testing Strategy

- Unit tests for ErrorClassifier (classify every known error string)
- Unit tests for RetryManager (policy application, shelving threshold)
- Unit tests for StreamWriter resume validation logic
- Unit tests for PostProcessor state machine
- Integration test: Scheduler + Pipeline with mocked debrid/HTTP