Before: a non-transient / non-file-rejected / non-account-specific
error (e.g. "VOE Upload: <any generic message>") would burn the full
retries-per-account budget on the primary before the rotation logic
even kicked in. On retries=5 that's "Retry 2/5 Primär", "Retry 3/5
Primär", … all on the same broken account before the fallback gets
a shot.
Now:
- main.js pre-resolves the next fallback for every hoster at batch-
start (stored in _accountOverrides via the existing session cache +
primeOverrides path). Pre-job-swap still ignores it until the
primary is actually marked failed, so jobs still begin on primary.
- upload-manager.js: in the retry loop's generic error branch,
_hasPendingOverride() checks whether a usable fallback is ready.
If yes and the error is NOT transient (transient = network glitch
= retry same acc), break out to rotation. Marks primary failed,
rotates to acc2, retries there.
- Result: for a 2-account hoster, worst case is 1 attempt on primary
+ retries-per-account on fallback, instead of N × 2. Transient
network errors (ENOTFOUND / ECONNRESET / socket hang up) keep the
old "retry same account" semantics because the network is the
issue, not the account.
- Single-account hosters: unchanged. No pending override = classic
retry-on-same-account until exhausted.
3 new tests pin: generic + override → rotate on attempt 1; transient
+ override → stay on same acc; no override → classic retry. 87/87
green.
Previously: clicking "Erneut versuchen" after a batch had already
finished spawned a fresh UploadManager with empty _failedAccounts and
_accountOverrides. The first retry then burned the full retry budget
on the account we already knew was dead (e.g. disk-space-full byse
account) before rotation kicked in again — same problem we fixed for
within-batch flow but for across-batch flow.
- main.js: two module-level maps (_sessionFailedAccounts,
_sessionAccountOverrides) cache rotation state across batches in the
same app session. Populated on account-failed and on both
switchAccount paths (event-driven + save-config re-resolve).
- lib/upload-manager.js: startBatch(tasks, opts) accepts
primeFailedAccounts + primeOverrides. State is still cleared first
(legacy behaviour for callers without opts), then re-primed from the
passed session state. batch-start rot-log entry reports how many
entries were primed for diagnostics.
- Tests: prime priority is honored (pre-job-swap fires on first
attempt, no fast-fail, no upload to acc1); back-compat for callers
that don't pass opts.
App restart remains the reset signal — matches the "neuer Tag, acc1
hat vielleicht wieder Platz" expectation.
Three small, unrelated reliability improvements bundled:
1. lib/hosters.js (_resolveByseUploadByName): drop the "only one new
file → claim it" fallback. Under parallel byse uploads, job A's
poller could claim job B's newly-uploaded file and return the wrong
URL. Now requires exact normalized name match. Trade-off: a few
false negatives if byse rewrites the filename beyond our
normalizer, but parallel correctness wins.
2. tests/upload-manager.test.js: pin the transient-network classifier
behaviour with 2 new tests covering common transient strings
(ENOTFOUND, ECONNRESET, socket hang up, fetch failed, EAI_AGAIN…)
and verifying real account-level / file-rejected errors are NOT
misclassified as transient. Baseline stays clean: 82/82 green.
3. main.js: log process.memoryUsage() snapshot at batch-start and
batch-done. One line each — harmless in the happy path, gives us
the data points needed to spot long-session RSS/heap growth across
batches without DevTools instrumentation.
Three related gaps closed so one full byse account stops wasting
attempts on every subsequent job and later-added accounts get picked
up without an app restart.
1. Pre-job-swap moved BEHIND the semaphore acquire. At scale (500 jobs
/ 1 slot) every worker was checking _failedAccounts at spawn time
before the first upload had even tried — so none of them saw the
failed state. Now each worker re-checks right before its first
upload attempt.
2. save-config IPC handler re-resolves fallbacks for any account that
is already in _failedAccounts but has no override set. Previously
account-failed only fired once per account, so a config change
after the first mark-failed was silently ignored and the batch
stayed stuck on the dead account until the app restarted.
3. UploadManager exposes getFailedAccountKeys() and getOverride(hoster)
so main.js can drive the late re-resolve without poking private
fields.
4 new tests: pre-job-swap after semaphore, getters contract, fresh
manager resets learned state, late-added fallback is honored by
subsequent jobs. 80/80 green.
Byse rejects uploads with status like "not enough disk space on your
account" when the account's storage is exhausted. The parser was
flagging every non-OK status as err.fileRejected=true, and the upload-
manager classifier additionally matched the generic "lehnte Datei ab"
prefix as file-rejected. Result: rotation was skipped on a full account
and every subsequent file failed on the same dead account.
- hosters.js: byse parser now distinguishes account-level phrases
(disk space / storage / quota / insufficient / account full) and sets
err.accountError=true for those. File-specific failures (Duplicate,
wrong format, size) keep err.fileRejected=true.
- upload-manager.js: _isFileRejectedError no longer matches the generic
"lehnte Datei ab" prefix and short-circuits when err.accountError is
true. _shouldSkipRetryOnAccountError honors the flag and has added
regex patterns as a safety net.
- Tests: 5 new unit tests covering disk-space/account-level/duplicate
and the accountError-wins-over-fileRejected precedence.
- addJobs injects new tasks into running batch (verified concurrent execution)
- addJobs rejects duplicate jobIds already in batch
- addJobs returns added=0 when not running
These tests verify the fix in v2.6.3 (files added during upload now
get injected into the running batch via addJobsToBatch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- updater: replace undici.request() with fetch() (fixes maxRedirections
error that blocked auto-update from v1.0.0 to v1.1.0)
- upload-manager: move signalCleanup declaration outside try block
(was causing ReferenceError in catch, silently breaking ALL retries)
- upload-manager: _combineSignals now returns cleanup fn to prevent
abort listener accumulation over batch lifetime
- upload-manager: _sleep removes abort listener on normal timer fire
- hosters: apiGet removes abort listener in finally block
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>