25 min read

Improving Queue Safety in Laravel

TL;DR: Set $tries on every queue job. Don't rely on platform defaults. 23 findings verified against Laravel 13.4.0 with links to real GitHub issues spanning 2015 to 2026.

A follow-up to Why Your Laravel Jobs Might Retry Forever After an OOM

If you take one thing from this post: set $tries on every queue job. Don’t rely on platform defaults.

If you use an AI coding tool, paste this to audit your own app:

Read https://joshsalway.com/articles/improving-queue-safety-in-laravel then audit my app/Jobs/ directory. Report what’s missing and what could break. Explain the changes and why they are useful. Don’t make changes.

Any changes you make to your application are your responsibility. Please do your own research and understand the potential risks before applying any recommendations.

This post documents 23 queue safety findings — 14 from my audit and 9 reported by other developers. Not all are bugs. Some are unsafe defaults, some are missing limits, some are design decisions with safety implications. All are verified against Laravel Framework v13.4.0 with links to real GitHub issues spanning 2015 to 2026.


How I found these

After two Vapor billing incidents ($140 in 2019, $218.90 in 2023), I traced my original application’s git history commit by commit, cross-referenced support emails, and compared Worker.php across Laravel v10, v12, and v13. My code had gaps, but so did the framework and platform. That led to a broader question: how many other places in the queue system have the same pattern?

Using Claude Code, I audited the full queue system for unbounded behaviour, counters bypassed by process death, and unsafe defaults.

Packages examined:

  • laravel/framework (v13.4.0): src/Illuminate/Queue/, src/Illuminate/Bus/, src/Illuminate/Pipeline/, src/Illuminate/Console/Scheduling/
  • laravel/vapor-core (v2.43.3): VaporWorkCommand.php, QueueHandler.php, VaporWorker.php, VaporJob.php
  • laravel/vapor-cli, laravel/cloud-cli, laravel/forge-cli, laravel/forge-sdk: checked for comparison

I also compared defaults across Sidekiq, BullMQ, Celery, Google Cloud Tasks, and native AWS SQS to find suitable guardrails and recommendations.


What likely caused my billing incidents

After investigating the original application’s git history, support emails, and vapor.yml at the time of the incidents, the root cause of the August 2023 bill ($218.90 in 7 days) became clear.

What was a skill issue

  • I didn’t set $tries on my job classes. That’s in the docs.
  • I used file_get_contents() on arbitrary URLs with no timeout inside a queue job.
  • I set queue-timeout: 900 in vapor.yml, giving each failed attempt 15 minutes of 2048MB Lambda compute time.
  • I didn’t set up AWS budget alerts. The spend was invisible until the invoice arrived.

Those are real mistakes and I own them.

What wasn’t a skill issue

  • The tries configuration has three layers that contradict each other (see finding #4).
  • There is no global default_tries config. One missed property on one job class and the behaviour depends on which layer wins.
  • Retries are completely silent. No warning, no log entry, no event.
  • Support responded to both incidents with “check your AWS invoice” and “set up budget alerts.” Neither response mentioned $tries, retry limits, or the Vapor tries default. If someone had said “add $tries to your jobs” in 2019, the 2023 incident wouldn’t have happened.

Whose fault is it?

In the previous post I wrote that the framework, hosting platform, and application code should all have seatbelts in place to reduce the likelihood of infinite retries in serverless environments. All three failed:

Mine: Jobs without $tries, unbounded HTTP calls, 15-minute timeout.

The framework: Unsafe defaults — $tries null (see #4), backoff 0 (see #2), WithoutOverlapping lock infinite (see #3), maxExceptions bypassed by OOM (see #1) — meant my mistakes had no guardrails. Every other major queue system defaults to bounded retries.

The platform (Vapor): Three layers of tries configuration that contradict each other, none clearly documented (see #4). Support didn’t flag the root cause in either incident.

The git log from August 20, 2023 (private repository) tells the story: 11 commits in a single day, reverts of reverts, and the first $tries = 5 added mid-crisis. No single layer caused this alone. My bad code, combined with unsafe framework defaults, on a platform with inconsistent and undocumented tries configuration, turned a coding mistake into $358.90 across two separate incidents four years apart.


Do we actually need seatbelts or guardrails?

Nobody plans to have a car accident. You don’t put on a seatbelt because you expect to crash. You put it on because if something goes wrong, the seatbelt is the difference between a bad day and a catastrophic one.

Fun fact: When Volvo invented the three-point seatbelt in 1959, they made the patent available to every car manufacturer for free because they believed safety should be a shared standard, not a competitive advantage. It still took decades for seatbelts to become mandatory worldwide.

Queue safety is the same. Most jobs work fine. Most deployments don’t have runaway billing. Most developers never hit an OOM infinite retry. But when it happens, the difference between a job with $tries = 5 and a job with no $tries is the difference between a failed job in your failed_jobs table and a $200 bill you don’t see until the invoice arrives.

Guardrails on a road don’t slow you down. They’re invisible until the moment you need them. $tries, $backoff, $timeout, and failed() are the same. They cost nothing in normal operation. They save you when something unexpected happens: an API goes down, a payload is larger than expected, a memory limit is hit, a deploy goes wrong.

On serverless platforms, a runaway job shows up as a bill. On traditional servers, it’s less obvious. A job retrying in a tight loop just quietly consumes CPU and memory. Your server gets slower, your other jobs back up, but there’s no invoice to trigger an investigation. Without monitoring or logging, you might never know it’s happening.

If you’ve never had a queue incident, that’s great. Add the guardrails anyway. They’re free insurance. Regardless of the safety angle, these recommendations will likely make your queue jobs more reliable. And if you already knew all of this, pat yourself on the back.


Recommendations you can do today

Even after writing this audit, I checked my own applications and found jobs without $tries set — including the exact job that caused my billing incident. That’s how easy this is to miss. Safe defaults would catch it for everyone.

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Throwable;

class ProcessEmailJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 5;
    public $maxExceptions = 0;
    public $backoff = [30, 60, 300];
    public $timeout = 120;

    public function retryUntil()
    {
        return now()->addHours(2);
    }

    public function failed(Throwable $exception)
    {
        // Log, notify, or alert. Don't let failures be silent.
        Log::error('ProcessEmailJob permanently failed', [
            'exception' => $exception->getMessage(),
        ]);
    }
}

Why each property matters:

  • $tries = 5 — hard cap on total attempts. Don’t rely on platform defaults — set this explicitly on every job.
  • $maxExceptions = 0 — fails the job on the first catchable exception. See note below.
  • $backoff = [30, 60, 300] — escalating delays between retries (30 seconds, then 60 seconds, then 5 minutes). Without this, retries are immediate (0 seconds).

Why $maxExceptions = 0? This is deliberately aggressive because of finding #1: the maxExceptions counter never increments during OOM, so any value above 0 allows infinite retries when OOM and catchable exceptions alternate. Setting it to 0 means the first catchable exception stops the loop. Retry tolerance is handled by $tries and $backoff instead, which work across process restarts.

  • $timeout = 120 — kills the job after 2 minutes. Without this, jobs can run until Lambda’s 15-minute ceiling.
  • retryUntil() — time-based circuit breaker. Even if $tries is misconfigured, the job stops after 2 hours.
  • failed() — get notified when a job permanently fails. Silent failures are what cause $200 bills.

On WithoutOverlapping middleware

public function middleware()
{
    return [
        (new WithoutOverlapping($this->key))
            ->expireAfter(minutes: 30)
            ->releaseAfter(seconds: 30),
    ];
}

Always set expireAfter. The default is 0 (never expires). If your worker crashes while holding the lock, the job is permanently blocked without this.

On ShouldBeUnique jobs

class ImportDataJob implements ShouldQueue, ShouldBeUnique
{
    public $uniqueFor = 3600; // seconds
}

Always set $uniqueFor. The default is 0, which means the lock never expires (same pattern as WithoutOverlapping). If the job fails or the worker crashes, the lock persists and blocks all future dispatches of that job until cache TTL. Issue #49890 reported by @naquad was closed as “no-fix for us.”

Prune failed jobs

// In app/Console/Kernel.php or routes/console.php
Schedule::command('queue:prune-failed --hours=168')->daily();

The failed_jobs table grows unbounded. At 300k+ records, queue:retry --all will OOM (Issue #49185 reported by @arharp). Prune weekly.

On external HTTP calls inside jobs

// Bad: no timeout, no size limit
$content = file_get_contents($url);

// Good: bounded timeout, exception on failure
$response = Http::timeout(10)->get($url);
$response->throw();
$content = $response->body();

Every external call inside a queue job should have a timeout. One slow or unresponsive endpoint can hold a Lambda invocation running for minutes.
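The "no size limit" half of the problem isn't solved by Http::timeout() alone. A hedged sketch of one option, assuming the endpoint sends an honest Content-Length header (note this rejects oversized payloads after download; preventing the download itself requires streaming with lower-level Guzzle options):

```php
use Illuminate\Support\Facades\Http;

// Bounded timeout plus a size check before processing.
// The 5MB threshold is illustrative; tune to what your job actually needs.
$response = Http::timeout(10)->get($url);
$response->throw();

if ((int) $response->header('Content-Length') > 5 * 1024 * 1024) {
    throw new \RuntimeException("Refusing to process oversized response from {$url}");
}

$content = $response->body();
```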

Hard limits

These catch problems regardless of whether individual jobs are configured correctly:

  • queue:work --tries=3 — set this on your worker command or supervisor config. Acts as a floor even if a job class forgets $tries. On Vapor, set SQS_TRIES=3 in your environment.
  • SQS dead letter queue — configure a redrive policy with maxReceiveCount in AWS. After N receive attempts, SQS moves the message to a DLQ automatically, even if your application crashes. This is your infrastructure-level circuit breaker and works independently of Laravel.
  • AWS budget alerts — won’t stop the spend, but tells you early. Set via the Vapor UI.
  • Monitor queue depth — as of Laravel 13.4.0, Queue::pendingJobs(), Queue::delayedJobs(), and Queue::reservedJobs() (PR #59511) let you inspect queue state natively. On AWS, you can also use CloudWatch alarms on ApproximateNumberOfMessagesVisible.
  • Lambda concurrency limits — set reserved concurrency on your queue Lambda to cap how many concurrent invocations can run. Limits the burn rate during a runaway.
  • queue-memory and queue-timeout in vapor.yml — keep these as low as your jobs actually need. Don’t set queue-timeout: 900 to “be safe” like I did. Lower timeout = lower cost per failed attempt.
  • Ideally, a cost-based kill switch — a soft limit sends you an email when spend hits a threshold. A hard limit actually stops the workload. Vapor and most serverless platforms only offer soft limits today. A hard limit that pauses queue processing when spend exceeds a configurable amount (e.g. $30/month) would have prevented both of my incidents entirely. The alert told me the house was on fire. A hard limit would have put it out.
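The queue-depth bullet above can be wired straight into the scheduler. A minimal sketch in routes/console.php, assuming Laravel 13.4.0's Queue::pendingJobs() (PR #59511); the 1,000-job threshold and the logging call are placeholders for your own alerting:

```php
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\Facades\Schedule;

Schedule::call(function () {
    $pending = Queue::pendingJobs();

    // Hypothetical threshold; tune to your normal queue depth.
    if ($pending > 1000) {
        // Swap in real alerting (Slack, email, PagerDuty) here.
        Log::warning("Queue depth is {$pending}, above the 1000-job threshold.");
    }
})->everyFiveMinutes();
```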

A base job class

If you want to apply safe defaults across all your jobs without repeating yourself:

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Throwable;

abstract class SafeJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 5;
    public $maxExceptions = 0;
    public $backoff = [30, 60, 300];
    public $timeout = 120;

    public function failed(Throwable $exception)
    {
        Log::error(static::class . ' permanently failed', [
            'exception' => $exception->getMessage(),
        ]);
    }
}

Then extend it:

class ProcessEmailJob extends SafeJob
{
    // Inherits all safe defaults
    // Override any property if this job needs different limits
}

The Findings

Critical: Unbounded retries and permanent resource locks

These findings can cause infinite retry loops, permanent job lockout, or runaway costs on serverless platforms. They are the highest priority for anyone running queues in production.

1. maxExceptions counter only increments inside the catch block

File: src/Illuminate/Queue/Worker.php - process() and markJobAsFailedIfWillExceedMaxExceptions()

This is the bug documented in my previous blog post. The maxExceptions counter is incremented inside handleJobException(), which is called from the catch (Throwable) block. When the process is killed by an out-of-memory error, the catch block never executes. The counter is never incremented. The job retries with a counter of zero indefinitely.

There is a pre-fire check for maxTries (markJobAsFailedIfAlreadyExceedsMaxAttempts) but no equivalent pre-fire check for maxExceptions.

Real-world: Issue #58207 (Dec 2025) reported by @pingencom — 31 comments from production users independently building workarounds.

Suggested fix: Increment the exception counter before fire(), decrement after successful completion. If the worker dies during fire(), the increment persists. On the next pickup, a pre-fire check can fail the job if the counter meets the threshold.
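In rough terms, the suggested fix looks something like this. This is a sketch of the idea, not the actual Worker.php code; the failJob() helper and the cache key format are hypothetical:

```php
// Sketch: if the process dies inside fire(), the pre-incremented counter
// survives in the cache, so the next worker's pre-fire check fails the
// job instead of retrying it with a counter of zero.
$key = 'job-exceptions:'.$job->uuid();

if ($this->cache->get($key, 0) >= $job->maxExceptions()) {
    $this->failJob($job); // hypothetical helper
    return;
}

$this->cache->increment($key);  // count the attempt *before* running it

$job->fire();

$this->cache->decrement($key);  // success: give the attempt back
```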


2. Default backoff is 0 seconds

File: src/Illuminate/Queue/Worker.php - calculateBackoff()

When a job fails and has no backoff property, the default is 0 seconds. The job is immediately re-queued. Combined with a high or unlimited retry count, this creates a tight failure loop where the same job fails and retries as fast as the worker can process it.

Real-world: Issue #44680 (Oct 2022) reported by @hjeldin — backoff ignored after timeout kill, job retries immediately.

Suggested fix: Default backoff to a small positive value (e.g. 3 seconds) instead of 0.


3. WithoutOverlapping middleware defaults to an infinite lock

File: src/Illuminate/Queue/Middleware/WithoutOverlapping.php

The default expiresAfter is 0, which translates to a lock that never expires. If a job using this middleware is killed (OOM, SIGKILL, server crash), the lock is never released. All future instances of that job are permanently blocked, either released back to the queue indefinitely or silently dropped.

Real-world: Issue #37060 (Apr 2021) reported by @lasselehtinen — lock not released on failed jobs. Issue #15750 (Oct 2016) reported by @PieterScheffers — cron stuck due to mutex not cleaned. Issue #50330 (Mar 2024) reported by @nickma42 — race condition in multi-server environments.

Suggested fix: Default expiresAfter to a reasonable value (e.g. 3600 seconds) instead of 0.


4. $tries defaults to null (unlimited) in job payload

Files: src/Illuminate/Queue/Queue.php - getJobTries(), src/Illuminate/Queue/Worker.php - markJobAsFailedIfAlreadyExceedsMaxAttempts()

When a job class does not define a $tries property, the payload contains maxTries: null. This falls through to the command-level --tries option. The queue:work command defaults to --tries=1 (safe). On Laravel Vapor, VaporWorkCommand defines --tries=0 (unlimited) in its command signature, but the QueueHandler runtime that invokes it passes $_ENV['SQS_TRIES'] ?? 3, so in practice the default is 3 unless explicitly overridden. This inconsistency between the command definition and the runtime invocation is confusing and not documented.

The interaction between job-level, command-level, and runtime-level tries configuration is complex. A global default in config/queue.php would provide a single, visible safety net.

Real-world: PR #29385 (Aug 2019) by @SjorsO — attempted to change the default from 0 to 1. Not merged. The PR states: “Changing the default solves the problem of broken jobs getting stuck in an infinite loop when you forget to pass the queue worker a --tries flag.” Issue #58207 (Dec 2025) reported by @pingencom — jobs retried endlessly with $tries=0.

Suggested fix: Add a default_tries option to config/queue.php that applies when neither the job class nor the command line specifies a value.
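As a sketch, the proposed option could look like this in config/queue.php. To be clear, default_tries does not exist in Laravel today; this is what the suggestion would add:

```php
// config/queue.php (hypothetical -- this key does not exist in Laravel 13)
return [
    'default' => env('QUEUE_CONNECTION', 'database'),

    // Proposed safety net: applies only when neither the job class
    // nor the worker command specifies a tries value.
    'default_tries' => env('QUEUE_DEFAULT_TRIES', 3),

    // ... existing connections configuration
];
```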


5. Race condition in maxExceptions counter initialization

File: src/Illuminate/Queue/Worker.php - markJobAsFailedIfWillExceedMaxExceptions()

The counter initialization uses a non-atomic get/put sequence. When multiple workers process retries of the same job UUID simultaneously, both workers can see a missing key and reset the counter, causing the exception count to be lost. An atomic Cache::add() call would prevent this race condition.

Real-world: Silent bug — users see the symptom (jobs retrying forever) without understanding the cause. Compounds Issue #58207 (Dec 2025) reported by @pingencom.

Suggested fix: Replace the Cache::get() / Cache::put() initialization sequence with Cache::add(), which only sets the key if it doesn’t already exist. The subsequent Cache::increment() is already atomic.
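The difference between the racy and the atomic initialization, sketched outside of Worker.php with an illustrative $key and $ttl:

```php
use Illuminate\Support\Facades\Cache;

// Racy: two workers can both observe a missing key and both reset it,
// wiping out any count the other worker had already accumulated.
if (is_null(Cache::get($key))) {
    Cache::put($key, 0, $ttl);
}
Cache::increment($key);

// Atomic: Cache::add() only writes when the key is absent and returns
// false for the loser of the race, so the existing count is preserved.
Cache::add($key, 0, $ttl);
Cache::increment($key);
```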


6. handleJobException releases when failure checks are bypassed

File: src/Illuminate/Queue/Worker.php - handleJobException()

The finally block in handleJobException releases the job back to the queue if it hasn’t been deleted, released, or marked as failed. These three guards are correct and work well when maxTries or maxExceptions properly mark the job as failed. However, when those checks are bypassed (because maxTries is 0 or maxExceptions fails to increment due to OOM), the job is never marked as failed, and the release always proceeds. This amplifies the unlimited retry issue in those specific scenarios.

Real-world: PR #45876 (Jan 2023) by @khepin — “jobs that fail because of high memory usage all stay on the queue and accumulate there. If enough of them have accumulated, the workers keep spinning on jobs that can never go through.”

Suggested improvement: Resolving finding #1 (pre-fire maxExceptions check) would address this for jobs with maxExceptions set. For the broader case, consider logging a warning when a job is released back to the queue with a high attempt count and no maxTries or maxExceptions configured. This wouldn’t change behaviour but would make the problem visible instead of silent.


Important: Unsafe defaults and resource exhaustion risks

These findings involve default values or missing limits that could cause problems under load, after outages, or with specific configuration combinations. They are less likely to cause immediate damage but represent gaps that production systems can hit.

7. Redis migrationBatchSize defaults to unlimited

File: src/Illuminate/Queue/RedisQueue.php

The migrationBatchSize defaults to -1, which means unlimited. The Lua script that migrates expired delayed and reserved jobs fetches all of them in a single call. If a large number of delayed jobs expire simultaneously (for example, after an outage or server restart), this single Lua operation can block Redis for all clients, consume significant memory in the Lua execution context, and potentially trigger the lua-time-limit threshold.

Real-world: PR #43310 (Jul 2022) by @AbiriAmir — “scheduling a large number of jobs for a specific time causes Redis to halt since migrate script is a heavy script.” This PR added the migrationBatchSize config but defaulted it to -1 (unlimited).

Suggested fix: Default migrationBatchSize to a bounded value (e.g. 1000).


8. ThrottlesExceptions retryAfterMinutes defaults to 0

File: src/Illuminate/Queue/Middleware/ThrottlesExceptions.php

When an exception triggers the throttle, the job is released with a delay of retryAfterMinutes * 60. The default for retryAfterMinutes is 0, meaning the job is immediately re-queued after an exception. Combined with a high retry count, this creates a tight failure loop similar to the 0-second backoff issue.

Real-world: Issue #36637 (Mar 2021) reported by @tairau — backoff docblock says seconds but value is used as minutes. Issue #56087 (Jun 2025) reported by @michaeldzjap — ThrottlesExceptions overrides FailOnException, causing jobs to retry despite being told to fail.

Suggested fix: Default retryAfterMinutes to a small positive value (e.g. 5).


9. RateLimited middleware release loop

File: src/Illuminate/Queue/Middleware/RateLimited.php

When a job is rate-limited, it is released back to the queue. Each release counts as an attempt. In high-concurrency environments with many workers competing for the same rate limit, a job can be released and re-attempted many times without ever executing its actual logic. If the job has a high or unlimited retry count, this consumes worker capacity without doing useful work.

Real-world: Issue #53157 (Oct 2024) reported by @amir9480 — RateLimiter perSecond not working as expected for queue jobs.

Suggested improvement: Consider not counting rate-limited releases as attempts, or providing an option to distinguish between “failed” and “deferred” releases.


10. Database queue reserved-but-expired can cause duplicate execution

File: src/Illuminate/Queue/DatabaseQueue.php - getNextAvailableJob() + isReservedButExpired()

Jobs where reserved_at is older than retry_after seconds are treated as available. The default retry_after is 90 seconds. If a job legitimately takes longer than 90 seconds to process, another worker can pick it up while it is still running. The same job executes concurrently in two workers.

Real-world: Issue #8577 (Apr 2015) reported by @m4tthumphrey — multiple Redis workers picking up the same job. Issue #7046 (Jan 2015) reported by @easmith — database queue deadlocks from concurrent execution.

Suggested improvement: Document this interaction clearly and consider a longer default retry_after, or add a mechanism for long-running jobs to extend their reservation.


11. retryUntil can override maxTries

File: src/Illuminate/Queue/Worker.php - markJobAsFailedIfAlreadyExceedsMaxAttempts()

If retryUntil() returns a future timestamp, the maxTries check is skipped entirely. A job with retryUntil() returning a far-future date (e.g. one year) will retry for the entire window regardless of how many times it has failed. Combined with a 0-second backoff, this is a sustained failure loop for the duration of the window.

Real-world: Issue #35199 (Nov 2020) reported by @trevorgehman — “Queue worker ignores job’s maxTries setting if using retryUntil().”

Suggested improvement: Document this interaction clearly. Consider checking both retryUntil and maxTries rather than treating them as mutually exclusive.
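A sketch of what checking both limits might look like. This is hypothetical; the real Worker.php skips the maxTries branch entirely when retryUntil is set, and failJob() is an illustrative helper:

```php
// Hypothetical combined check: fail when EITHER limit is exceeded,
// instead of letting retryUntil disable maxTries entirely.
$expired   = $retryUntil !== null && now()->getTimestamp() > $retryUntil;
$overTries = $maxTries > 0 && $job->attempts() > $maxTries;

if ($expired || $overTries) {
    $this->failJob($job); // hypothetical helper
    return;
}
```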


12. Pipeline memory retention between jobs

File: src/Illuminate/Pipeline/Pipeline.php

The Pipeline retains references to $passable and $pipes after execution. In long-running queue workers, this means the previous job’s data is held in memory until the next job overwrites it. While the retention is bounded to one job’s worth of memory, it contributes to gradual memory growth in workers, increasing the likelihood of OOM events.

Real-world: Issue #59402 (Mar 2026) reported by @JoshSalway — Job::$instance retains command object reference after fire(). Issue #56395 (Jul 2025) reported by @momala454 — job objects retain large data in memory after processing.

Suggested fix: Null $passable and $pipes after pipeline execution.


Improvement: Edge cases and documentation gaps

These findings are lower risk but represent real gaps that specific configurations can hit.

13. Reserved job migration ignores attempt count

File: src/Illuminate/Queue/RedisQueue.php - migrate()

When reserved jobs expire (worker died mid-processing), they are moved back to the ready queue without checking their attempt count. The attempt check only happens when the worker next pops and processes the job. This means a job that has already exceeded its max tries can be re-enqueued and picked up before being failed.

Real-world: Issue #32103 (Mar 2020) reported by @mfn — job retried despite still running, reserved timeout expired mid-execution.

Suggested improvement: The pre-fire check (markJobAsFailedIfAlreadyExceedsMaxAttempts) already handles this when the worker pops the job. The gap is the brief window where an over-limit job sits in the “available” queue before being popped. Checking attempts inside the Lua migration script would be expensive. A lighter approach: log or emit an event when a reserved job is migrated back, so monitoring tools can flag jobs that are cycling.


14. No timeout enforcement on Windows

File: src/Illuminate/Queue/Worker.php - registerTimeoutHandler()

The timeout handler relies on pcntl_alarm, which is only available on systems with the pcntl extension (Linux/Mac). On Windows, Laravel deliberately throws an error if you set a timeout, forcing you to pass --timeout 0 to acknowledge you’re running without timeout protection. This is the right design choice (silently ignoring the timeout would be worse), but it means Windows workers have no timeout enforcement at all. A job that enters an infinite loop or deadlock will block the worker process forever.

Real-world: Issue #15002 (Aug 2016) reported by @StevenBock — queues require explicit --timeout 0 on Windows. Issue #14909 (Aug 2016) reported by @ac1982 — PHP requires --enable-pcntl for queue timeouts.

Suggested improvement: Add a fallback timeout mechanism for environments without pcntl support.


What the community found

The following issues were reported by other developers. I didn’t find these in my audit — they found them first. I’m including them here so everything is in one place.

Worker enters infinite silent loop on non-database exceptions

Issue #59517 (Apr 2026, OPEN) reported by @thuggins-engrain. stopWorkerIfLostConnection() only checks for database connection errors. If SQS SDK, Redis auth, or HTTP errors occur in getNextJob(), the worker catches the exception, sleeps 1 second, and retries forever with no exit condition.

Batch deadlocks under high concurrency

Issue #39722 (Nov 2021) reported by @gm-lunatix. Issue #36478 (Mar 2021) reported by @murphatron. Issue #40574 (Jan 2022) reported by @walkonthemarz. The job_batches table uses SELECT FOR UPDATE, causing row-level lock contention with high-concurrency workers.

Batch never finishes when jobs fail

Issue #36180 (Feb 2021) reported by @stephenstack. Issue #35711 (Dec 2020) reported by @nalingia. When a batch job fails, pending_jobs may never reach 0, so then/finally callbacks never fire. Batch hangs forever. Closed as completed but no linked fix PR found.

ShouldBeUnique lock not released

Issue #49890 (Jan 2024) reported by @naquad — lock not released when dependent model is deleted before processing. Closed with “this is a no-fix for us right now.” Issue #37729 (Jun 2021) reported by @rflatt-reassured — lock only releases after timeout, not after successful completion.

Failed jobs table causes OOM when retrying

Issue #49185 (Nov 2023) reported by @arharp. Issue #52129 (Jul 2024) reported by @godwin-loyaltek. RetryCommand and FailedJobProviderInterface::all() load the entire failed_jobs table into memory. At 300k+ records, it OOMs.

Chain jobs silently terminate on queue restart

Issue #45426 (Dec 2022) reported by @Monilsh. If queue:restart is issued mid-chain, remaining jobs are silently dropped. No error, no failed job record. Closed as “expected behavior.”

Timed-out worker kill leaks resources

Issue #30351 (Oct 2019) reported by @halaei. Worker::kill() sends SIGKILL, which prevents cleanup of temp files, connections, and locks.

No backpressure on dispatch

PR #57787 (Nov 2025) by @yousefkadah — community attempt to add queue depth notifications via a maxPendingJobs property. Not merged. There is no queue depth checking in the dispatch path. If the dispatch rate exceeds the consumption rate, the queue grows without limit.

Failed job providers crash on corrupted payload

Issue #59635 (Apr 2026, OPEN) reported by @ruttydm. UUID-based failed job providers use json_decode($payload, true)['uuid'] without null check. Corrupted payloads crash the provider and the failure record is permanently lost.


Current status

Across the 14 findings in this audit and the 9 community-reported issues above, the resolution status as of April 2026:

  • 3 partially addressed: migrationBatchSize config added but defaults to unlimited (#7, PR #43310), ThrottlesExceptions docblock corrected but default remains 0 (#8, PR #36642), Windows timeout workaround documented but no auto-detection (#15002)
  • 3 closed as intentional design decisions: ShouldBeUnique lock behaviour (#49890, “no-fix for us”), chain jobs dropped on restart (#45426, “expected behavior”), retryUntil/maxTries mutual exclusion (#35199, by design)
  • 4 currently open: OOM infinite retry (#58207), worker silent loop (#59517), corrupted payload crash (#59635), job memory not released (#56395)
  • 13 closed without framework code changes

Some of the closed issues are from older Laravel versions and may have been addressed indirectly through major version changes. Some reflect intentional design trade-offs that reasonable people can disagree on. I’ve included them because the safety implications exist regardless of whether the behaviour is intentional.

I verified all 14 findings and 9 community reports against the current v13.4.0 source. The oldest linked issues date back to January 2015 (#7046) and August 2016 (#15002). The same code patterns, same defaults, and same behaviours are still present.


What the docs and support could improve

Documentation

  • Add a “Queue Safety” section to the queue docs. $tries, $maxExceptions, $backoff, and $timeout are scattered across different sections. A single page showing them together with a recommended safe configuration would help.
  • Document the Vapor tries configuration. VaporWorkCommand defines --tries=0, but the runtime passes SQS_TRIES ?? 3. queue:work defaults to --tries=1. Three layers, three different values, none documented together.
  • Document the retryUntil / maxTries mutual exclusion. When retryUntil() returns a future timestamp, maxTries is skipped entirely.
  • Document the maxExceptions OOM limitation. It only works for catchable exceptions. Fatal errors bypass the counter entirely.
  • Document the ShouldBeUnique lock lifecycle. The default $uniqueFor is 0 (never expires). Issue #49890 was closed as “no-fix for us” but the safety implication remains.
  • Add a serverless queue checklist to the Vapor docs. (“Have you set $tries? $timeout? $backoff? AWS budget alerts? SQS dead letter queue?”)

Support

  • Ask about $tries when customers report unexpected queue costs. A standard triage question like “Do your job classes have $tries set?” would catch the most common root cause immediately.
  • Link to safe configuration examples in cost-related support replies.
  • Consider a proactive dashboard warning when a job has retried more than N times without $tries set. The data is available in SQS’s ApproximateReceiveCount.

Disclaimer

  • This audit was AI-assisted using Claude Code against Laravel Framework v13.4.0. Every finding includes the exact file path and method name so you can verify it yourself.
  • I have not tested every suggested fix in production. Some may have trade-offs or edge cases I haven’t considered.
  • Some findings may have been addressed in ways I haven’t identified. I have not used Laravel Vapor since 2023 and do not currently have an account, so Vapor-specific observations are based on the public vapor-core source code, not runtime testing. Things may have changed.
  • These are starting points, not finished PRs. Published in good faith for the benefit of the community.

There is a reasonable design philosophy where the framework intentionally leaves these guardrails to the developer. The tools exist ($tries, $maxExceptions, $backoff, $timeout) and it’s the developer’s responsibility to configure them. Changing defaults is a breaking change for anyone relying on current behaviour, and some of these suggestions would need careful migration paths.

Where I respectfully disagree is on the defaults. Every other major queue system I compared (Sidekiq, BullMQ, Celery, Google Cloud Tasks) defaults to bounded retries and requires developers to opt in to unlimited behaviour. Laravel does the opposite.

If any of these findings are inaccurate or have already been addressed, I’m happy to update this post with corrections.