Expose a single runnable end-to-end test command for fast regression sweeps and update the smoke test to validate preflight and localization-aware clip errors reliably across language modes.