# Download System v2 — Complete Redesign

## Goal
Replace the 9500-line monolithic `download-manager.ts` with a clean, modular download system that fixes:
1. Downloads that hang and are never cleanly restarted
2. Wrong error classification leading to wrong retry paths
3. Unreliable resume (corrupt files, unnecessary restarts)
4. Post-processing (extraction) breaking or looping

## Constraints
- Same IPC interface — drop-in replacement, no UI changes needed
- Same external dependencies (debrid.ts, storage.ts, integrity.ts)
- Same session/settings persistence format

## Architecture

### Module Structure

```
src/main/download/
├── download-manager.ts      # Orchestrator (~500 lines) — coordination only
├── scheduler.ts             # Queue management, slot allocation, priorities
├── pipeline.ts              # Single download flow: unrestrict → stream → verify
├── stream-writer.ts         # HTTP streaming, resume, buffered writing, NTFS
├── error-classifier.ts      # Typed error system (enums, not string matching)
├── retry-manager.ts         # Central retry logic, backoff, shelving, state
└── post-processor.ts        # Extraction queue, hybrid retry, cleanup
```

### Module Responsibilities

#### 1. download-manager.ts (Orchestrator)
- Holds session state, packages, items
- Exposes same IPC methods as current (startRun, stopRun, pauseItem, etc.)
- Delegates to Scheduler for queue management
- Delegates to Pipeline for individual downloads
- Delegates to PostProcessor for extraction
- Emits same events as current (progress, status changes)
- Handles persistence (save/load session)

#### 2. scheduler.ts
- `findNextItem()`: priority-based queue with provider cooldown awareness
- `fillSlots()`: start downloads up to maxParallel
- Scheduler loop with generation guard (prevents stale schedulers)
- Global stall watchdog
- Provider cooldown tracking (circuit breaker)
- AllDebrid paced-start / hoster-limit logic
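
The scheduling bullets above can be sketched as a small class. This is a minimal illustration, not the planned implementation. The `QueuedItem` shape, the numeric `priority` (higher runs first), the `coolDown()` helper, and the injected clock are all hypothetical. AllDebrid pacing and the stall watchdog are left out.

```typescript
// Hypothetical item shape; the real one lives in the persisted session model.
interface QueuedItem {
  id: string;
  priority: number; // assumption: higher value runs first
  provider: string;
  status: "queued" | "running" | "completed" | "failed";
}

class Scheduler {
  private generation = 0;
  private readonly cooldownUntil = new Map<string, number>();
  private readonly maxParallel: number;
  private readonly now: () => number;

  constructor(maxParallel: number, now: () => number = Date.now) {
    this.maxParallel = maxParallel;
    this.now = now;
  }

  // Every start() bumps the generation; a loop holding an older value is stale.
  start(): number {
    this.generation += 1;
    return this.generation;
  }

  isCurrent(gen: number): boolean {
    return gen === this.generation;
  }

  // Circuit breaker: skip this provider until the cooldown expires.
  coolDown(provider: string, ms: number): void {
    this.cooldownUntil.set(provider, this.now() + ms);
  }

  private isCoolingDown(provider: string): boolean {
    return (this.cooldownUntil.get(provider) ?? 0) > this.now();
  }

  // Highest-priority queued item whose provider is not cooling down.
  findNextItem(items: QueuedItem[]): QueuedItem | undefined {
    let best: QueuedItem | undefined;
    for (const item of items) {
      if (item.status !== "queued" || this.isCoolingDown(item.provider)) continue;
      if (best === undefined || item.priority > best.priority) best = item;
    }
    return best;
  }

  // Starts items until maxParallel are running; a stale generation starts nothing.
  fillSlots(gen: number, items: QueuedItem[]): QueuedItem[] {
    const started: QueuedItem[] = [];
    if (!this.isCurrent(gen)) return started;
    let running = items.filter((i) => i.status === "running").length;
    while (running < this.maxParallel) {
      const next = this.findNextItem(items);
      if (next === undefined) break;
      next.status = "running"; // the real orchestrator would call Pipeline.runDownload here
      started.push(next);
      running += 1;
    }
    return started;
  }
}
```

Because `fillSlots()` checks the generation before doing anything, a loop started by an earlier `startRun()` that wakes up late simply returns without starting downloads.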

#### 3. pipeline.ts
- `runDownload(item, context)`: single download lifecycle
- Step 1: Unrestrict link via debrid service
- Step 2: Stream file via StreamWriter
- Step 3: Verify integrity (CRC if available)
- Step 4: Signal completion
- Each step returns typed result or throws typed DownloadError
- No retry logic here — just reports what happened

#### 4. stream-writer.ts
- `streamToFile(url, targetPath, options)`: HTTP streaming
- Resume support with pre-validation:
  - Check existing file size against tracked downloadedBytes
  - Truncate if sparse file detected (pre-allocated > actual)
  - Send Range header only after validation
- HTTP 416 handling (complete vs incomplete)
- Server-ignored-range detection (200 instead of 206)
- Buffered writing with NTFS 4KB alignment
- Sparse file pre-allocation (Windows)
- Content-Disposition filename override
- Stall detection (configurable timeout, default 10s)
- Drain timeout for slow disks (default 5min)
- Progress reporting via callback
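
The buffered-writing bullet could work roughly like this: chunks accumulate until a threshold, and only the largest 4KB multiple is released, so every write except the final one is cluster-sized. The class name, the 4096-byte cluster size, and the 256KB default threshold are assumptions.

```typescript
const ALIGNMENT = 4096; // assumed NTFS cluster size (4KB default)

// Collects incoming chunks and releases writes whose length is a multiple of
// ALIGNMENT; the remainder stays buffered until more data arrives or end().
class AlignedWriteBuffer {
  private pending: Uint8Array[] = [];
  private pendingBytes = 0;
  private readonly flushThreshold: number;

  constructor(flushThreshold = 64 * ALIGNMENT) {
    // Round the threshold up to a whole number of clusters.
    this.flushThreshold = Math.ceil(flushThreshold / ALIGNMENT) * ALIGNMENT;
  }

  // Returns a block to write now, or null to keep buffering.
  push(chunk: Uint8Array): Uint8Array | null {
    this.pending.push(chunk);
    this.pendingBytes += chunk.length;
    if (this.pendingBytes < this.flushThreshold) return null;
    const aligned = this.pendingBytes - (this.pendingBytes % ALIGNMENT);
    return this.take(aligned);
  }

  // End of stream: returns whatever is left (may be unaligned), or null.
  end(): Uint8Array | null {
    return this.pendingBytes > 0 ? this.take(this.pendingBytes) : null;
  }

  // Concatenates pending chunks, returns the first n bytes, keeps the rest.
  private take(n: number): Uint8Array {
    const all = new Uint8Array(this.pendingBytes);
    let offset = 0;
    for (const c of this.pending) {
      all.set(c, offset);
      offset += c.length;
    }
    const rest = all.slice(n);
    this.pending = rest.length > 0 ? [rest] : [];
    this.pendingBytes = rest.length;
    return all.slice(0, n);
  }
}
```

This keeps write sizes aligned only if the file offset starts on a 4KB boundary; resuming from an unaligned `downloadedBytes` would need a first write sized up to the next boundary, which the sketch does not attempt.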

#### 5. error-classifier.ts
- `DownloadErrorKind` enum with all error categories
- `DownloadError` class extending Error with `.kind` property
- `classifyError(error, context)`: takes raw error + context, returns DownloadError
  - Classifies at point of origin (HTTP layer, fetch layer, debrid layer)
  - No post-hoc string matching needed
- `classifyHttpStatus(status, headers)`: HTTP-specific classification
- `classifyFetchError(error)`: network-level classification
- `classifyUnrestrictError(error)`: debrid-specific classification

```typescript
enum DownloadErrorKind {
  // Network
  NetworkReset,        // ECONNRESET, socket hang up, EPIPE
  Timeout,             // No data received within stall timeout
  DnsFailure,          // ENOTFOUND

  // HTTP
  RangeNotSatisfied,   // 416 — file may be complete or need restart
  RangeIgnored,        // Server sent 200 instead of 206
  ServerError,         // 500, 502, 503
  RateLimited,         // 429
  Forbidden,           // 403 — link expired
  NotFound,            // 404 — file removed from CDN

  // Provider/Debrid
  UnrestrictFailed,    // Provider can't convert link
  ProviderBusy,        // Concurrent download limit
  ProviderDown,        // Provider service unavailable
  HosterUnavailable,   // Hoster down (not provider issue)
  LinkDead,            // Permanent: file deleted at source
  QuotaExceeded,       // Daily traffic limit

  // Filesystem
  DiskFull,            // ENOSPC
  PermissionDenied,    // EACCES, EPERM
  FileLocked,          // EBUSY (Windows)

  // Integrity
  FileCorrupt,         // CRC/size mismatch after download
  FileTruncated,       // Downloaded less than expected

  // Extraction
  WrongPassword,       // Archive password incorrect
  ArchiveCorrupt,      // Archive header/data damaged
  ExtractorCrash,      // 7-Zip/WinRAR process crashed
  ExtractionLoop,      // Same archive failed extraction 3+ times
}
```
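
A sketch of classification at the point of origin, using a subset of the enum above as string literals so the block stands alone. The `Unknown` fallback kind, the `retryAfterMs` field, the `HeaderLookup` interface, the `errorCode()` helper, and mapping `ETIMEDOUT` to `Timeout` are assumptions, not part of the design.

```typescript
// Subset of DownloadErrorKind as string literals; "Unknown" is a hypothetical fallback.
type DownloadErrorKind =
  | "NetworkReset" | "Timeout" | "DnsFailure"
  | "RangeNotSatisfied" | "ServerError" | "RateLimited" | "Forbidden" | "NotFound"
  | "Unknown";

class DownloadError extends Error {
  readonly kind: DownloadErrorKind;
  readonly retryAfterMs: number | undefined;

  constructor(kind: DownloadErrorKind, message: string, retryAfterMs?: number) {
    super(message);
    this.name = "DownloadError";
    this.kind = kind;
    this.retryAfterMs = retryAfterMs;
  }
}

// Structural stand-in for fetch's Headers: only get() is needed.
interface HeaderLookup {
  get(name: string): string | null;
}

function classifyHttpStatus(status: number, headers: HeaderLookup): DownloadError {
  switch (status) {
    case 403:
      return new DownloadError("Forbidden", "HTTP 403: direct link likely expired");
    case 404:
      return new DownloadError("NotFound", "HTTP 404: file removed from CDN");
    case 416:
      return new DownloadError("RangeNotSatisfied", "HTTP 416: range not satisfiable");
    case 429: {
      // Only the delay-seconds form of Retry-After is handled here.
      const secs = Number(headers.get("retry-after"));
      const delay = Number.isFinite(secs) && secs > 0 ? secs * 1000 : undefined;
      return new DownloadError("RateLimited", "HTTP 429: rate limited", delay);
    }
  }
  if (status >= 500 && status <= 599) return new DownloadError("ServerError", `HTTP ${status}`);
  return new DownloadError("Unknown", `Unexpected HTTP ${status}`);
}

// Node.js system errors carry a string `code` such as "ECONNRESET".
const NETWORK_CODES: Record<string, DownloadErrorKind> = {
  ECONNRESET: "NetworkReset", // also used for "socket hang up"
  EPIPE: "NetworkReset",
  ENOTFOUND: "DnsFailure",
  ETIMEDOUT: "Timeout",
};

// Walks the `cause` chain: Node's fetch wraps socket errors in TypeError("fetch failed").
function errorCode(error: unknown): string | undefined {
  let current: unknown = error;
  for (let depth = 0; depth < 3; depth++) {
    if (typeof current !== "object" || current === null) return undefined;
    const { code, cause } = current as { code?: unknown; cause?: unknown };
    if (typeof code === "string") return code;
    current = cause;
  }
  return undefined;
}

function classifyFetchError(error: unknown): DownloadError {
  if (error instanceof DownloadError) return error; // already classified at origin
  const code = errorCode(error);
  const kind = code !== undefined ? NETWORK_CODES[code] : undefined;
  const message = error instanceof Error ? error.message : String(error);
  return new DownloadError(kind ?? "Unknown", message);
}
```

Callers then branch on `err.kind` rather than on message text; the only property inspected is Node's `code`.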

#### 6. retry-manager.ts
- `RetryManager` class holds all retry state per item
- Declarative retry policies per `DownloadErrorKind`:

```typescript
interface RetryPolicy {
  maxRetries: number;          // 0 = no retry (permanent failure)
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
  resetFile: boolean;          // Delete partial file before retry
  switchProvider: boolean;     // Try different provider
  refreshLink: boolean;        // Get new direct link from debrid
  providerCooldownMs?: number; // Apply cooldown to current provider
}
```
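
One possible reading of the `backoff` / `baseDelayMs` / `maxDelayMs` fields, as a pure function. The 1-based `attempt` numbering and the doubling factor for `exponential` are assumptions; jitter is left out.

```typescript
// The RetryPolicy fields that determine the delay.
interface BackoffPolicy {
  backoff: "fixed" | "exponential" | "linear";
  baseDelayMs: number;
  maxDelayMs: number;
}

// attempt is 1-based: the first retry of an item uses attempt = 1.
function computeDelayMs(policy: BackoffPolicy, attempt: number): number {
  const n = Math.max(1, Math.floor(attempt));
  const base = policy.baseDelayMs;
  const raw =
    policy.backoff === "exponential" ? base * 2 ** (n - 1) // base, 2x, 4x, ...
    : policy.backoff === "linear" ? base * n               // base, 2x, 3x, ...
    : base;                                                // fixed
  return Math.min(raw, policy.maxDelayMs);
}
```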

- `shouldRetry(itemId, error)`: returns `{ retry: boolean, delayMs, actions[] }`
- `recordFailure(itemId, error)`: tracks failure for shelving
- Shelving: after N total failures (configurable, default 15), pause 90s + reset provider
- State persists across stop/start (same format as current retryStateByItem)
- `resetItem(itemId)`: clear all retry state (manual reset)
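
A minimal sketch of per-kind retry budgets plus the shelving rule (15 total failures, then a 90s pause with a provider reset). It simplifies the design in labeled ways: `shouldRetry()` takes the error kind and records the failure itself, the four-entry policy table holds invented values, the `resetProvider` action name is hypothetical, and resetting the total counter after shelving is an assumption. Persistence in the `retryStateByItem` format is omitted.

```typescript
type Kind = "NetworkReset" | "Forbidden" | "LinkDead" | "DiskFull";
type RetryAction = "resetFile" | "switchProvider" | "refreshLink" | "resetProvider";

interface SimplePolicy {
  maxRetries: number; // 0 = permanent failure
  delayMs: number;
  actions: RetryAction[];
}

// Illustrative values only.
const POLICIES: Record<Kind, SimplePolicy> = {
  NetworkReset: { maxRetries: 20, delayMs: 2_000, actions: [] },
  Forbidden: { maxRetries: 3, delayMs: 1_000, actions: ["refreshLink"] },
  LinkDead: { maxRetries: 0, delayMs: 0, actions: [] },
  DiskFull: { maxRetries: 0, delayMs: 0, actions: [] },
};

const SHELVE_AFTER_FAILURES = 15; // document default
const SHELVE_PAUSE_MS = 90_000;

interface ItemRetryState {
  perKind: Partial<Record<Kind, number>>;
  total: number;
}

interface RetryDecision {
  retry: boolean;
  delayMs: number;
  actions: RetryAction[];
}

class RetryManager {
  private readonly state = new Map<string, ItemRetryState>();

  recordFailure(itemId: string, kind: Kind): ItemRetryState {
    const s: ItemRetryState = this.state.get(itemId) ?? { perKind: {}, total: 0 };
    s.perKind[kind] = (s.perKind[kind] ?? 0) + 1;
    s.total += 1;
    this.state.set(itemId, s);
    return s;
  }

  shouldRetry(itemId: string, kind: Kind): RetryDecision {
    const policy = POLICIES[kind];
    const s = this.recordFailure(itemId, kind);
    if ((s.perKind[kind] ?? 0) > policy.maxRetries) {
      return { retry: false, delayMs: 0, actions: [] }; // budget for this kind exhausted
    }
    if (s.total >= SHELVE_AFTER_FAILURES) {
      s.total = 0; // assumption: shelving grants a fresh total budget
      return { retry: true, delayMs: SHELVE_PAUSE_MS, actions: ["resetProvider"] };
    }
    return { retry: true, delayMs: policy.delayMs, actions: [...policy.actions] };
  }

  // Manual reset clears everything for the item.
  resetItem(itemId: string): void {
    this.state.delete(itemId);
  }
}
```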

#### 7. post-processor.ts
- `PostProcessor` class with extraction queue
- State machine per package:
  ```
  pending → extracting → done
                ↓
            retry (max 2) → failed
  ```
- Tracks extraction attempts per archive (max 3 attempts: the initial run plus 2 retries)
- No infinite loops: hard cap on retry count
- Hybrid extract retry: if an archive is corrupt and a redownload is suggested, queue a redownload (at most once per archive)
- Cleanup: remove partial extracts on failure
- Empty folder cleanup after successful extraction
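
The package-level diagram can be encoded as a pure transition function, which makes the "no infinite loops" guarantee easy to test: terminal states ignore all events and the retry counter only grows. The event names and the `PackageMachine` shape are assumptions.

```typescript
type PackageState = "pending" | "extracting" | "retry" | "done" | "failed";
type PackageEvent = "start" | "success" | "failure";

const MAX_PACKAGE_RETRIES = 2; // from the diagram: one initial run plus two retries

interface PackageMachine {
  state: PackageState;
  retries: number;
}

function transition(m: PackageMachine, event: PackageEvent): PackageMachine {
  // Terminal states ignore every event, so a finished package can never loop.
  if (m.state === "done" || m.state === "failed") return m;
  if (m.state === "pending" || m.state === "retry") {
    return event === "start" ? { ...m, state: "extracting" } : m;
  }
  // From here on, m.state === "extracting".
  if (event === "success") return { ...m, state: "done" };
  if (event === "failure") {
    return m.retries < MAX_PACKAGE_RETRIES
      ? { state: "retry", retries: m.retries + 1 }
      : { ...m, state: "failed" };
  }
  return m; // "start" while already extracting is a no-op
}
```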

### Data Flow

```
User clicks Start
    ↓
DownloadManager.startRun()
    ↓
Scheduler.start() — begins loop
    ↓
Scheduler.findNextItem() — picks highest priority queued item
    ↓
Pipeline.runDownload(item)
    ├── debridService.unrestrict(item.link)
    │   └── error? → ErrorClassifier.classifyError() → DownloadError
    ├── StreamWriter.streamToFile(url, path, opts)
    │   ├── Resume validation
    │   ├── HTTP streaming with stall detection
    │   └── error? → ErrorClassifier.classifyError() → DownloadError
    └── integrityCheck(file)
        └── error? → DownloadError(FileCorrupt)
    ↓
Success → mark completed → Scheduler fills next slot
Error → RetryManager.shouldRetry(item, error)
    ├── retry: true → Scheduler.queueRetry(item, delay, actions)
    └── retry: false → mark failed
    ↓
All items done → PostProcessor.run(package)
    ├── Extract archives
    ├── Verify extracted files
    └── Cleanup
```

### Resume Validation (Key Improvement)

Current problem: Resume trusts file size blindly, leading to corrupt files.

New approach:
1. Before sending Range header, validate existing file:
   - `stat.size` must match `item.downloadedBytes` within ±1KB (tolerance for flush timing)
   - If the file is more than 1MB larger than `downloadedBytes`: it comes from sparse pre-allocation → truncate to `downloadedBytes`
   - Any other mismatch above 1KB (including a file smaller than `downloadedBytes`): suspicious → delete and restart fresh
2. After resume response, validate:
   - 206 with correct Content-Range → continue
   - 200 (range ignored) → classify as RangeIgnored, retry with fresh link
   - 416 → check whether the file is actually complete (`existingBytes >= expectedTotal`); if so, proceed to the completion checks below, otherwise restart fresh
3. After download complete, validate:
   - Final file size matches expected total
   - CRC check if manifest available
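
The resume checks can be expressed as two pure functions, one before the request and one on the response. The 1KB and 1MB thresholds come from the list above; treating a file smaller than `downloadedBytes` by more than 1KB as a restart, and resuming from the smaller of the two sizes inside the tolerance band, are assumptions. Function and type names are hypothetical.

```typescript
const KB = 1024;
const MB = 1024 * KB;

type ResumePlan =
  | { action: "resume"; offset: number; truncate: boolean }
  | { action: "restart" };

// fileSize: stat.size of the partial file; trackedBytes: item.downloadedBytes.
function planResume(fileSize: number, trackedBytes: number): ResumePlan {
  if (fileSize === 0 || trackedBytes === 0) return { action: "restart" };
  const diff = fileSize - trackedBytes;
  if (Math.abs(diff) <= KB) {
    // Flush-timing tolerance: resume from bytes that are both on disk and counted.
    const offset = Math.min(fileSize, trackedBytes);
    return { action: "resume", offset, truncate: fileSize > offset };
  }
  if (diff > MB) {
    // Far larger than tracked: sparse pre-allocation, so cut back to real data.
    return { action: "resume", offset: trackedBytes, truncate: true };
  }
  return { action: "restart" }; // any other mismatch is treated as suspicious
}

type ResponseCheck = "continue" | "rangeIgnored" | "alreadyComplete" | "restart" | "httpError";

// Inspects the answer to a request sent with `Range: bytes=<offset>-`.
function checkResumeResponse(
  status: number,
  contentRange: string | null,
  offset: number,
  expectedTotal: number | undefined,
): ResponseCheck {
  if (status === 206) {
    const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(contentRange ?? "");
    return m !== null && Number(m[1]) === offset ? "continue" : "restart";
  }
  if (status === 200) return offset > 0 ? "rangeIgnored" : "continue";
  if (status === 416) {
    return expectedTotal !== undefined && offset >= expectedTotal ? "alreadyComplete" : "restart";
  }
  return "httpError"; // hand off to the HTTP status classifier
}
```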

### Stall Detection (Key Improvement)

Current problem: Downloads hang and stall detection sometimes doesn't trigger properly.

New approach:
- **Per-download heartbeat**: StreamWriter emits heartbeat every second with bytes received
- **Scheduler monitors heartbeats**: if a download's byte count has not advanced for stallTimeoutMs (whether heartbeats keep arriving or stop altogether) → abort + retry
- **Disk-write awareness**: separate tracking for "blocked on disk write" vs "blocked on network"
- **Global watchdog**: if ALL active downloads show zero progress for 60s (excluding disk-blocked), abort all and re-queue
- **Validating timeout**: if unrestrict takes > 30s, abort and retry (prevents infinite hang in validation phase)
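
The heartbeat and watchdog rules might be tracked like this. A download counts as stalled when its byte count has not advanced for `stallTimeoutMs`, which covers both heartbeats that arrive with no new bytes and heartbeats that stop entirely. The `Heartbeat` shape, the `StallMonitor` name, and the injected clock are assumptions; the 10s and 60s defaults come from this document.

```typescript
interface Heartbeat {
  bytes: number;        // total bytes received so far
  diskBlocked: boolean; // true while waiting for the write buffer to drain
}

interface Tracked {
  lastBytes: number;
  lastProgressAt: number;
  diskBlocked: boolean;
}

class StallMonitor {
  private readonly downloads = new Map<string, Tracked>();
  private readonly stallTimeoutMs: number;
  private readonly globalTimeoutMs: number;
  private readonly now: () => number;

  constructor(stallTimeoutMs = 10_000, globalTimeoutMs = 60_000, now: () => number = Date.now) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.globalTimeoutMs = globalTimeoutMs;
    this.now = now;
  }

  heartbeat(id: string, hb: Heartbeat): void {
    const t = this.now();
    const prev = this.downloads.get(id);
    if (prev === undefined || hb.bytes > prev.lastBytes) {
      this.downloads.set(id, { lastBytes: hb.bytes, lastProgressAt: t, diskBlocked: hb.diskBlocked });
      return;
    }
    // No new bytes. Time spent blocked on disk is not charged to the network clock.
    if (hb.diskBlocked) prev.lastProgressAt = t;
    prev.diskBlocked = hb.diskBlocked;
  }

  remove(id: string): void {
    this.downloads.delete(id);
  }

  // Network-bound downloads whose byte count has not moved for stallTimeoutMs.
  stalled(): string[] {
    const t = this.now();
    return Array.from(this.downloads.entries())
      .filter(([, d]) => !d.diskBlocked && t - d.lastProgressAt >= this.stallTimeoutMs)
      .map(([id]) => id);
  }

  // Global watchdog: every network-bound download has been idle for globalTimeoutMs.
  globalStall(): boolean {
    const t = this.now();
    const netBound = Array.from(this.downloads.values()).filter((d) => !d.diskBlocked);
    return netBound.length > 0 && netBound.every((d) => t - d.lastProgressAt >= this.globalTimeoutMs);
  }
}
```

A download that goes silent while flagged `diskBlocked` is never reported by this monitor; in the design that case falls to StreamWriter's 5-minute drain timeout.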

### Post-Processing State Machine (Key Improvement)

Current problem: Extraction can loop infinitely if archive keeps failing.

New approach:
```typescript
// Tracked per archive
interface ExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;        // max 3
  lastError?: string;
  redownloaded: boolean;   // max 1 redownload
}
```

Rules:
- Max 3 extraction attempts per archive
- If `ArchiveCorrupt` + `redownloaded === false` → queue redownload, set redownloaded = true
- If `ArchiveCorrupt` + `redownloaded === true` → fail permanently
- If `WrongPassword` → try next password from list, fail after all exhausted
- If `ExtractorCrash` → retry once, fail on second crash
- Package marked as "completed with errors" if any archive fails permanently
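
The rules above, written as a single failure handler. `passwordIndex` and `crashes` are hypothetical extra fields on `ExtractionState`, `markFailed` and `NextStep` are illustrative names, and not counting wrong-password tries against the 3-attempt cap is an assumption (the password list already bounds them).

```typescript
type ExtractionErrorKind = "WrongPassword" | "ArchiveCorrupt" | "ExtractorCrash";

// ExtractionState plus two hypothetical counters needed by the rules.
interface ExtractionState {
  archivePath: string;
  status: "pending" | "extracting" | "done" | "failed";
  attempts: number;
  lastError?: string;
  redownloaded: boolean;
  passwordIndex: number; // hypothetical: index of the password currently in use
  crashes: number;       // hypothetical: ExtractorCrash count
}

type NextStep =
  | { kind: "retry"; password?: string }
  | { kind: "redownload" }
  | { kind: "fail"; reason: string };

const MAX_EXTRACTION_ATTEMPTS = 3;

function markFailed(s: ExtractionState, reason: string): NextStep {
  s.status = "failed"; // permanent: the package ends as "completed with errors"
  return { kind: "fail", reason };
}

function onExtractionFailure(s: ExtractionState, error: ExtractionErrorKind, passwords: string[]): NextStep {
  s.lastError = error;
  if (error === "WrongPassword") {
    // Assumption: password tries are bounded by the list, not by the attempt cap.
    s.passwordIndex += 1;
    return s.passwordIndex < passwords.length
      ? { kind: "retry", password: passwords[s.passwordIndex] }
      : markFailed(s, "all passwords exhausted");
  }
  s.attempts += 1; // counts failed non-password attempts
  if (s.attempts >= MAX_EXTRACTION_ATTEMPTS) return markFailed(s, "attempt limit reached");
  if (error === "ArchiveCorrupt") {
    if (s.redownloaded) return markFailed(s, "still corrupt after redownload");
    s.redownloaded = true;
    return { kind: "redownload" };
  }
  // ExtractorCrash: retry once, fail on the second crash.
  s.crashes += 1;
  return s.crashes >= 2 ? markFailed(s, "extractor crashed twice") : { kind: "retry" };
}
```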

## Migration Strategy

1. New code lives in `src/main/download/` directory
2. Old `src/main/download-manager.ts` stays untouched as reference
3. New `download-manager.ts` in `src/main/download/` implements same class interface
4. Switch import in `main.ts` from old to new
5. Test with real downloads
6. Delete old file when stable

## Testing Strategy

- Unit tests for ErrorClassifier (classify every known error code, HTTP status, and debrid error response)
- Unit tests for RetryManager (policy application, shelving threshold)
- Unit tests for StreamWriter resume validation logic
- Unit tests for PostProcessor state machine
- Integration test: Scheduler + Pipeline with mocked debrid/HTTP