fix(webapp): propagate abort signal through realtime proxy fetch by ericallam · Pull Request #3442 · triggerdotdev/trigger.dev

ericallam · 2026-04-24T13:54:11Z

Summary

Fixes an RSS-only memory leak in the three realtime proxy routes (/realtime/v1/runs, /realtime/v1/runs/:id, /realtime/v1/batches/:id). Client disconnects during an in-flight long-poll would leave the upstream fetch to Electric running with no way to abort it, so undici kept the socket open and buffered response chunks that would never be consumed.

Root cause

All three routes flow through RealtimeClient.streamRun/streamRuns/streamBatch → #streamRunsWhere → #performElectricRequest → longPollingFetch(url, { signal }). The chain was already signal-aware, but #streamRunsWhere hardcoded signal=undefined when calling #performElectricRequest, so no signal ever reached longPollingFetch.
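
The call chain above can be sketched roughly as follows. This is a simplified illustration, not the actual RealtimeClient: the class name, the `SketchFetch` type, and the reduced parameter lists are assumptions, and TypeScript `private` methods stand in for the real `#private` ones. It shows only that an optional signal has to be forwarded at every hop to reach the fetch.

```typescript
// Hypothetical stand-in for longPollingFetch: anything that accepts a signal.
type SketchFetch = (url: string, init: { signal?: AbortSignal }) => Promise<string>;

class SketchRealtimeClient {
  private readonly fetchImpl: SketchFetch;

  constructor(fetchImpl: SketchFetch) {
    this.fetchImpl = fetchImpl;
  }

  // Public entry point: the optional signal comes from the route handler.
  streamRun(url: string, signal?: AbortSignal): Promise<string> {
    return this.streamRunsWhere(url, signal);
  }

  // Before the fix, the equivalent of this hop passed `undefined` instead of
  // `signal`, so the signal stopped one call short of the fetch.
  private streamRunsWhere(url: string, signal?: AbortSignal): Promise<string> {
    return this.performElectricRequest(url, signal);
  }

  private performElectricRequest(url: string, signal?: AbortSignal): Promise<string> {
    return this.fetchImpl(url, { signal });
  }
}
```

Because the parameter is optional at every level, a caller that omits it still compiles and simply reaches the fetch with `signal: undefined`.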

When a downstream client aborts a long-poll mid-flight:

Express tears down the downstream response socket.
The longPollingFetch promise has already resolved (it returns as soon as upstream headers arrive) and handed back new Response(upstream.body, {...}).
undici keeps the upstream socket open and continues buffering chunks into the ReadableStream that nothing will ever read from.
The upstream connection is eventually closed by Electric's own poll timeout (~20s). During that window the per-request buffers stay in native memory.

These buffers live below V8's accounting: there is no heapUsed or external growth and no sign in heap snapshots, only RSS growth. A standalone reproducer (fetch against a slow-streaming upstream, discard the Response before consuming its body) measures ~44 KB retained per leaked request after GC. That's consistent with the undici socket + receive buffer + HTTP parser state for a long-lived chunked response. The pattern matches the one documented in nodejs/undici#1108 and #2143.
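
The shape of the leak path can be illustrated without a network. The sketch below is an assumption-laden stand-in: an in-memory ReadableStream plays the role of the upstream body, and the helper names (`makeTrackedUpstream`, `proxyAndDiscard`, `proxyAndCancel`) are hypothetical. It does not reproduce undici's native buffering; it only shows that wrapping a body in a new Response and dropping it never tells the source to stop, whereas an explicit cancel does.

```typescript
// Build a fake "upstream" whose source records whether it was cancelled.
function makeTrackedUpstream() {
  const state = { cancelled: false };
  const body = new ReadableStream<Uint8Array>({
    pull(controller) {
      // Keep producing 32 KB chunks for as long as someone asks for them.
      controller.enqueue(new Uint8Array(32 * 1024));
    },
    cancel() {
      state.cancelled = true;
    },
  });
  return { state, upstream: new Response(body) };
}

// Leak path: hand the body to a new Response, then walk away from it.
function proxyAndDiscard(upstream: Response): void {
  void new Response(upstream.body, { status: upstream.status });
}

// Release path: cancelling the body propagates to the underlying source.
async function proxyAndCancel(upstream: Response): Promise<void> {
  const proxied = new Response(upstream.body, { status: upstream.status });
  await proxied.body?.cancel();
}
```

In the real proxy the cancel is triggered by the abort signal rather than called by hand, but the effect on the upstream body is the same kind of explicit release.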

What changed

realtimeClient.server.ts — add optional signal parameter to streamRun, streamRuns, streamBatch, and the shared #streamRunsWhere; thread it through to #performElectricRequest instead of hardcoding undefined.
realtime.v1.runs.$runId.ts, realtime.v1.runs.ts, realtime.v1.batches.$batchId.ts — pass getRequestAbortSignal() (from httpAsyncStorage.server.ts) at the call site. This is the signal wired to res.on('close') and fires reliably on downstream disconnect.
longPollingFetch.ts — belt-and-suspenders: cancel the upstream body explicitly in the error path, and treat AbortError as a clean 499 instead of a 500. This both releases undici's buffers deterministically on error and avoids spurious 500s in request logs when a client legitimately walks away.
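
A minimal sketch of the error-path behavior described for longPollingFetch.ts, under stated assumptions: the function name, the injectable `fetchImpl` parameter, and the generic 500 fallback are illustrative and not the file's actual contents. Only the AbortError check and the 499 response mirror what the PR describes.

```typescript
async function sketchLongPollingFetch(
  url: string,
  options?: { signal?: AbortSignal },
  fetchImpl: typeof fetch = fetch // injectable so the sketch can be exercised offline
): Promise<Response> {
  let upstream: Response | undefined;
  try {
    upstream = await fetchImpl(url, options);
    // Resolves as soon as headers arrive; the body is streamed through.
    return new Response(upstream.body, {
      status: upstream.status,
      headers: upstream.headers,
    });
  } catch (error) {
    // If headers had already arrived, release the upstream body explicitly
    // instead of waiting for the upstream to time out on its own.
    await upstream?.body?.cancel().catch(() => {});

    // A propagated abort means the downstream client went away:
    // report a client-close (499), not a server error.
    if (error instanceof Error && error.name === "AbortError") {
      throw new Response(null, { status: 499 });
    }
    // Assumed fallback for every other failure in this sketch.
    throw new Response("Upstream request failed", { status: 500 });
  }
}
```

Throwing a Response follows the convention visible in the review diff further down the thread, where the 499 is thrown rather than returned.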

Verification

Standalone reproducer: slow upstream server streams 32 KB chunks every 100 ms for 5 seconds per request. The proxy does fetch(url) with varying signal/cancel strategies, creates new Response(upstream.body, ...), and discards it without consuming the body (simulating the leak path).

Results from 1 000 parallel fetches per variant, measured post-GC:

variant	Δ heap	Δ RSS
A. no signal, body never consumed (the bug)	+0.3 MB	+59.4 MB
B. signal propagated, aborted after headers (this fix)	−0.1 MB	+15.4 MB
C. no signal, explicit `res.body.cancel()`	0 MB	−25.4 MB
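
The PR does not say how the ~44 KB per-request figure was derived. One reading that happens to match the table, offered here as an assumption, is to subtract variant B's one-time overhead from variant A's RSS delta and divide by the 1 000 requests (using 1 MB = 1000 KB):

```typescript
// RSS deltas in MB taken from the table above.
const rssDeltaNoSignalMb = 59.4; // variant A
const rssDeltaWithSignalMb = 15.4; // variant B
const requestsPerVariant = 1000;

// Hypothetical helper: per-request retention in KB after removing the baseline.
function retainedPerRequestKb(leakyMb: number, baselineMb: number, requests: number): number {
  return ((leakyMb - baselineMb) * 1000) / requests;
}

const perRequestKb = retainedPerRequestKb(
  rssDeltaNoSignalMb,
  rssDeltaWithSignalMb,
  requestsPerVariant
);
console.log(`${perRequestKb.toFixed(1)} KB retained per leaked request`);
```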

10-round sustained test of variant B to distinguish accumulating retention from one-time allocator overhead:

round  1/10  Δ=+3.2 MB     round  6/10  Δ=-12.5 MB
round  2/10  Δ=-7.6 MB     round  7/10  Δ=-11.9 MB
round  3/10  Δ=-11.7 MB    round  8/10  Δ=-2.6 MB
round  4/10  Δ=+3.2 MB     round  9/10  Δ=-8.0 MB
round  5/10  Δ=-1.2 MB     round 10/10  Δ=-12.6 MB

RSS oscillates in a 49-65 MB band with no upward trend — signal propagation fully releases the buffers.
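
One quick way to sanity-check the "no upward trend" reading is to fit a least-squares line through the ten per-round deltas. The sketch assumes each Δ is measured against the same baseline, which the PR does not state explicitly; the helper name is hypothetical.

```typescript
// Per-round RSS deltas in MB, in round order, copied from the listing above.
const roundDeltasMb = [3.2, -7.6, -11.7, 3.2, -1.2, -12.5, -11.9, -2.6, -8.0, -12.6];

// Ordinary least-squares slope of the values against their index (MB per round).
function leastSquaresSlope(values: number[]): number {
  const n = values.length;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  values.forEach((v, i) => {
    sxy += (i - meanX) * (v - meanY);
    sxx += (i - meanX) * (i - meanX);
  });
  return sxy / sxx;
}

const slopeMbPerRound = leastSquaresSlope(roundDeltasMb);
const spreadMb = Math.max(...roundDeltasMb) - Math.min(...roundDeltasMb);
console.log({ slopeMbPerRound, spreadMb });
```

A slope at or below zero is what a plateau looks like under this reading; a leak that accumulated per round would show a clearly positive slope instead. The spread of the deltas is also in line with the width of the quoted 49-65 MB band.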

Risk

Behavior change only on aborted long-polls: the upstream fetch now cancels promptly instead of running to its natural timeout. This saves both memory and outbound traffic to Electric.
AbortError now surfaces as 499 rather than 500. Any dashboard or alert that counts 500s in request logs will see slightly fewer of them; this is the intended behavior.
The new signal parameter is optional on RealtimeClient.streamRun/streamRuns/streamBatch, so callers that don't pass one keep the previous behavior.

Test plan

Existing realtime integration tests pass
Dashboard realtime views (runs list, batch details) continue working normally across tab open/close cycles
Under a burst of aborted long-polls, server RSS returns to baseline rather than climbing

The three high-traffic realtime proxy routes (/realtime/v1/runs, /realtime/v1/runs/:id, /realtime/v1/batches/:id) all route through RealtimeClient.streamRun/streamRuns/streamBatch -> #streamRunsWhere -> #performElectricRequest -> longPollingFetch(url, {signal}). The #streamRunsWhere caller hardcoded signal=undefined, so the upstream fetch to Electric had no abort signal. When a downstream client disconnected mid long-poll, undici kept the upstream socket open and continued buffering response chunks that would never be read, until Electric's own poll timeout elapsed (up to ~20s). The buffered bytes live in native memory below V8's accounting, so the retention shows up only in RSS, invisible to heap snapshots.

Thread a signal parameter through streamRun/streamRuns/streamBatch (and the shared #streamRunsWhere) and pass getRequestAbortSignal() from each of the three route handlers. Also cancel the upstream body explicitly in longPollingFetch's error path and treat AbortError as a clean client-close (499) rather than a 500, matching the semantic of 'downstream went away'.

Verified in an isolated standalone reproducer (fetch-a-slow-upstream pattern, 5 rounds of 200 parallel fetches, burst-and-discard):

A: no signal, body never consumed Δrss=+59.4 MB
B: signal propagated, abort on close Δrss=+15.4 MB (plateaus)
C: no signal, res.body.cancel() Δrss=-25.4 MB

Sustained 10-round test with B: RSS oscillates in a 49-65 MB band with no upward trend -> the signal propagation fully releases the undici buffers; the +15 MB residual in the single-round test was one-time allocator overhead, not accumulation.

changeset-bot · 2026-04-24T13:54:20Z

⚠️ No Changeset found

Latest commit: 586315b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2026-04-24T13:54:29Z

Walkthrough

This pull request addresses an RSS-only memory leak in the realtime proxy routes by propagating an abort signal through the fetch chain. Three realtime route handlers (realtime.v1.runs, realtime.v1.runs.$runId, realtime.v1.batches.$batchId) are updated to retrieve the request abort signal and pass it to the corresponding realtimeClient streaming methods. The realtimeClient service class now accepts an optional AbortSignal parameter on the streamRun, streamBatch, and streamRuns methods, which is threaded through to the downstream request logic. The longPollingFetch utility now explicitly cancels upstream response bodies on error and converts AbortError conditions into HTTP 499 responses. A server-changes documentation file describes the fix and its impact on resource release.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: propagating an abort signal through the realtime proxy fetch path to fix a memory leak.
Description check	✅ Passed	The description comprehensively covers root cause, changes, verification with measured results, risks, and test plan, but does not follow the required template structure with explicit sections.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

apps/webapp/app/services/realtimeClient.server.ts (1)

354-360: Nit: parameter order inconsistent with public API.

Public methods place signal as the last parameter (after clientVersion), but #performElectricRequest places signal before clientVersion. Not functionally incorrect, but aligning the ordering avoids future confusion at call sites.

♻️ Optional refactor

-  async #performElectricRequest(
-    url: URL,
-    environment: RealtimeEnvironment,
-    apiVersion: API_VERSIONS,
-    signal?: AbortSignal,
-    clientVersion?: string
-  ) {
+  async #performElectricRequest(
+    url: URL,
+    environment: RealtimeEnvironment,
+    apiVersion: API_VERSIONS,
+    clientVersion?: string,
+    signal?: AbortSignal
+  ) {

(Update the two call sites in #streamRunsWhere accordingly.)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/services/realtimeClient.server.ts` around lines 354 - 360,
The private method `#performElectricRequest` has its parameters out of order
(signal before clientVersion) compared to the public API; reorder the parameter
list so clientVersion comes before signal (i.e., ...apiVersion, clientVersion?:
string, signal?: AbortSignal) and update all call sites in `#streamRunsWhere` to
pass arguments in the new order (swap the two last args where they're currently
passed as signal, clientVersion). Ensure the function signature and every
invocation use the same parameter order to keep the API consistent.

apps/webapp/app/utils/longPollingFetch.ts (1)

50-71: Consider checking the signal's aborted state to handle edge cases in undici error handling.

Node.js fetch (undici) doesn't always throw a DOMException named "AbortError" when a signal is aborted. In edge cases—such as when the request body is already consumed or certain socket closures occur—undici can throw a TypeError instead while the signal is still aborted. Adding a check for options?.signal?.aborted as a fallback alongside the error.name check ensures the 499 response is returned consistently, even if undici's error shape changes in future versions.

♻️ Optional refactor

-    // AbortError is the expected path when downstream disconnects with a
-    // propagated signal — treat as a clean client-close, not a server error.
-    if (error instanceof Error && error.name === "AbortError") {
-      throw new Response(null, { status: 499 });
-    }
+    // AbortError is the expected path when downstream disconnects with a
+    // propagated signal — treat as a clean client-close, not a server error.
+    if (
+      options?.signal?.aborted ||
+      (error instanceof Error && error.name === "AbortError")
+    ) {
+      throw new Response(null, { status: 499 });
+    }
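
The combined condition in the suggested diff can be factored into a small predicate, which makes the fallback easy to exercise on its own. The helper name `isClientAbort` is hypothetical and not part of the PR:

```typescript
// True when the failure should be treated as "downstream went away":
// either the propagated signal is already aborted, or the error is an AbortError.
function isClientAbort(error: unknown, signal?: AbortSignal): boolean {
  return Boolean(signal?.aborted) || (error instanceof Error && error.name === "AbortError");
}

// Example: an aborted signal wins even when the error is reported as a TypeError.
const exampleController = new AbortController();
exampleController.abort();
console.log(isClientAbort(new TypeError("fetch failed"), exampleController.signal)); // true
```

Checking the signal first means the 499 path does not depend on the exact error shape the fetch implementation happens to throw.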

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/utils/longPollingFetch.ts` around lines 50 - 71, The current
catch block treats only Error.name === "AbortError" as a client-close; update
the logic to also detect when the request signal was aborted (e.g.,
options?.signal?.aborted or whichever signal variable is passed into
longPollingFetch) and treat that as the same 499 path. Specifically, inside the
catch after canceling upstream, check (options?.signal?.aborted || (error
instanceof Error && error.name === "AbortError")) and throw new Response(null, {
status: 499 }) in that case; keep the existing TypeError and generic Error
branches (and continue to log via logger) for other cases, referencing upstream,
error, and logger to locate the block to change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.server-changes/fix-realtime-fetch-signal-leak.md:
- Line 6: Summary: hyphenate the compound modifier "mid long-poll" to
"mid-long-poll". Find the sentence that reads "when a client disconnected mid
long-poll" in the changelog entry and replace "mid long-poll" with
"mid-long-poll" (or alternatively "mid long-polling request") so the compound
modifier is properly hyphenated.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 74c97be1-f5e0-4afc-bff5-8faa35615848

📥 Commits

Reviewing files that changed from the base of the PR and between 8dd1fc1 and 586315b.

📒 Files selected for processing (6)

.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/services/realtimeClient.server.ts
apps/webapp/app/utils/longPollingFetch.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: units / e2e-webapp / 🧪 E2E Tests: Webapp
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: typecheck / typecheck
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: sdk-compat / Bun Runtime
GitHub Check: sdk-compat / Cloudflare Workers
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
GitHub Check: sdk-compat / Deno Runtime
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (8)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Add crumbs as you write code using // @Crumbs comments or `// #region @crumbs` blocks. These are temporary debug instrumentation and must be stripped using agentcrumbs strip before merge.

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

**/*.ts{,x}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

apps/webapp/app/services/realtimeClient.server.ts
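
For readers unfamiliar with the pattern behind getRequestAbortSignal(), the following is a rough sketch of how a per-request signal can be tied to the response's 'close' event with AsyncLocalStorage. It is not the contents of httpAsyncStorage.server.ts: the store shape and the names `sketchStorage`, `runWithRequestSignal`, and `sketchGetRequestAbortSignal` are assumptions, and a plain EventEmitter stands in for the Express response.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { EventEmitter } from "node:events";

type RequestStore = { abortController: AbortController };

const sketchStorage = new AsyncLocalStorage<RequestStore>();

// Middleware-style wrapper: one AbortController per request, aborted when
// the response emits 'close'.
function runWithRequestSignal<T>(res: EventEmitter, handler: () => T): T {
  const abortController = new AbortController();
  res.once("close", () => abortController.abort());
  return sketchStorage.run({ abortController }, handler);
}

// Anything called inside the handler can look up the current request's signal.
function sketchGetRequestAbortSignal(): AbortSignal | undefined {
  return sketchStorage.getStore()?.abortController.signal;
}
```

A route handler would then pass `sketchGetRequestAbortSignal()` down to the streaming call, so the signal that reaches the upstream fetch is the one tied to the downstream socket.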

🧠 Learnings (18)

📚 Learning: 2026-04-23T13:26:27.529Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3430
File: apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts:0-0
Timestamp: 2026-04-23T13:26:27.529Z
Learning: In `apps/webapp/app/utils/sse.ts` (and callers such as `apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts`), the string-reason type passed to `.abort()` is NOT load-bearing for heap retention in SSE streams on Node 20. Isolation testing (2000 requests × 200 KB payload) confirmed `.abort()` vs `.abort("string")` produces identical heap deltas once `AbortSignal.any` is removed. The actual root cause of the SSE memory leak was `AbortSignal.any`, which registers signals with a FinalizationRegistry that does not reliably release them on abort (see nodejs/node#54614 and `#55351`). Named sentinel constants (`ABORT_REASON_*`, exported from `sse.ts`) are a readability/style win per the CLAUDE.md named-constant guideline, not a correctness fix. Do not flag `.abort()` vs `.abort("reason string")` as a memory-retention issue in future reviews of this codebase.

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-04-16T14:19:16.330Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.330Z
Learning: Applies to apps/webapp/**/*.server.ts : Never use `request.signal` for detecting client disconnects. Use `getRequestAbortSignal()` from `app/services/httpAsyncStorage.server.ts` instead, which is wired directly to Express `res.on('close')` and fires reliably

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-04-16T14:07:46.808Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3399
File: apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts:282-291
Timestamp: 2026-04-16T14:07:46.808Z
Learning: In `apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts` (`streamResponse`), the pattern `signal.addEventListener("abort", cleanup, { once: true })` does NOT need an explicit `removeEventListener` call in the non-abort cleanup paths (inactivity, cancel). The `AbortController` is per-request, scoped to `httpAsyncStorage` (created in `apps/webapp/server.ts` per-request middleware), so it gets GC'd when the request ends — taking the listener and closure with it. The `isCleanedUp` guard prevents double-execution, and `redis.disconnect()` is called before the request ends. Do not flag this as a listener/closure leak.

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2025-10-08T11:48:12.327Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-03-25T15:29:25.889Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2026-03-25T15:29:25.889Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `metadata.stream()` to stream data in realtime from inside tasks

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-04-16T14:19:16.330Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.330Z
Learning: Applies to apps/webapp/app/v3/services/{cancelTaskRun,batchTriggerV3}.server.ts : When editing services that branch on `RunEngineVersion` to support both V1 and V2 (e.g., `cancelTaskRun.server.ts`, `batchTriggerV3.server.ts`), only modify V2 code paths

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-04-16T13:24:09.546Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3399
File: apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts:26-42
Timestamp: 2026-04-16T13:24:09.546Z
Learning: In `apps/webapp/app/services/realtime/redisRealtimeStreams.server.ts`, `RedisRealtimeStreams` is only ever instantiated once as a process-wide singleton via `singleton("realtimeStreams", initializeRedisRealtimeStreams)` in `apps/webapp/app/services/realtime/v1StreamsGlobal.server.ts` (line 30). Therefore, the instance-level `_sharedRedis` field and `sharedRedis` getter are effectively process-scoped. Do not flag them as a per-request connection leak. The v2 streaming path uses a completely separate class (`S2RealtimeStreams`).

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2025-11-27T16:26:37.432Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to {packages/core,apps/webapp}/**/*.{ts,tsx} : Use zod for validation in packages/core and apps/webapp

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts

📚 Learning: 2026-03-22T13:26:12.060Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-03-22T19:24:14.403Z

Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.ts
apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts
apps/webapp/app/utils/longPollingFetch.ts
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-03-02T12:43:17.177Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/database/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:17.177Z
Learning: Applies to internal-packages/database/**/{app,src,webapp}/**/*.{ts,tsx,js,jsx} : Use `$replica` from `~/db.server` for read-heavy queries in the webapp instead of the primary database connection

Applied to files:

apps/webapp/app/routes/realtime.v1.runs.$runId.ts
apps/webapp/app/routes/realtime.v1.batches.$batchId.ts

📚 Learning: 2026-04-20T15:06:19.815Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/routes/realtime.v1.sessions.$session.$io.ts:37-51
Timestamp: 2026-04-20T15:06:19.815Z
Learning: In `apps/webapp/app/routes/realtime.v1.sessions.$session.$io.ts` (and all session realtime read paths), `$replica` is intentionally used for the `resolveSessionByIdOrExternalId` call — including the `closedAt` guard in the PUT/initialize path. The project convention is to use `$replica` consistently across all session realtime routes. The race window (replica lag allowing a ghost-initialize after close) is accepted as not realistic in practice (clients follow the close API response; they do not race it). If replica lag ever causes issues, the mitigation is to revisit all realtime routes together, not to swap individual routes to `prisma`. Do not flag `$replica` usage in session realtime routes as a stale-read issue.

Applied to files:

apps/webapp/app/routes/realtime.v1.batches.$batchId.ts

📚 Learning: 2026-04-20T15:06:11.054Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/routes/realtime.v1.streams.$runId.$target.$streamId.append.ts:16-26
Timestamp: 2026-04-20T15:06:11.054Z
Learning: In `apps/webapp/app/routes/realtime.v1.streams.$runId.$target.$streamId.append.ts` and `apps/webapp/app/routes/realtime.v1.sessions.$session.$io.append.ts`, the `MAX_APPEND_BODY_BYTES` cap of 512 KiB (1024 * 512) is intentional even though `appendPart` wraps the body in JSON (which could expand quote-heavy payloads beyond S2's 1 MiB per-record limit). The maintainer considers worst-case quote-heavy payloads pathological and not realistic. If S2 rejections occur in practice, an encoded-size guard will be added inside `appendPart` rather than lowering the raw body cap on every caller. Do not flag this as an issue in future reviews.

Applied to files:

.server-changes/fix-realtime-fetch-signal-leak.md
apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-04-07T14:12:59.018Z

Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/app/runEngine/concerns/batchPayloads.server.ts:112-136
Timestamp: 2026-04-07T14:12:59.018Z
Learning: In `apps/webapp/app/runEngine/concerns/batchPayloads.server.ts`, the `pRetry` call wrapping `uploadPacketToObjectStore` intentionally retries **all** error types (no `shouldRetry` filter / `AbortError` guards). The maintainer explicitly prefers over-retrying to under-retrying because multiple heterogeneous object store backends are supported and it is impractical to enumerate all permanent error signatures. Do not flag this as an issue in future reviews.

Applied to files:

apps/webapp/app/utils/longPollingFetch.ts

📚 Learning: 2026-04-16T14:09:34.540Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3388
File: apps/webapp/app/services/platform.v3.server.ts:542-569
Timestamp: 2026-04-16T14:09:34.540Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getEntitlement` SWR loader intentionally returns `undefined` on errors (instead of a failure sentinel) because `unkey/cache` already deduplicates concurrent in-process loader calls via `deduplicateLoadFromOrigin` (a shared promise map keyed by namespace::key). During a billing outage, concurrent requests on the same process share one pending HTTP call rather than fanning out. The fail-open `{ hasAccess: true }` fallback is applied *outside* the SWR call so error results are never committed to cache. The maintainer will revisit if sustained multi-instance outage patterns emerge in practice. Do not re-raise the failure-sentinel suggestion for this function in future reviews.

Applied to files:

apps/webapp/app/utils/longPollingFetch.ts

📚 Learning: 2025-07-21T12:52:44.342Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2284
File: apps/webapp/app/services/realtimeClient.server.ts:111-127
Timestamp: 2025-07-21T12:52:44.342Z
Learning: Electric (the database service used in the realtimeClient) has built-in SQL injection protection and safely handles whereClause parameters passed via URL parameters, so direct string interpolation of runId values into SQL where clauses is safe when using Electric.

Applied to files:

apps/webapp/app/services/realtimeClient.server.ts

📚 Learning: 2026-03-26T09:02:07.973Z

Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 3274
File: apps/webapp/app/services/runsReplicationService.server.ts:922-924
Timestamp: 2026-03-26T09:02:07.973Z
Learning: When parsing Trigger.dev task run annotations in server-side services, keep `TaskRun.annotations` strictly conforming to the `RunAnnotations` schema from `trigger.dev/core/v3`. If the code already uses `RunAnnotations.safeParse` (e.g., in a `#parseAnnotations` helper), treat that as intentional/necessary for atomic, schema-accurate annotation handling. Do not recommend relaxing the annotation payload schema or using a permissive “passthrough” parse path, since the annotations are expected to be written atomically in one operation and should not contain partial/legacy payloads that would require a looser parser.

Applied to files:

apps/webapp/app/services/realtimeClient.server.ts

🪛 LanguageTool

.server-changes/fix-realtime-fetch-signal-leak.md

[grammar] ~6-~6: Use a hyphen to join words.
Context: ...ignal, so when a client disconnected mid long-poll, undici kept the upstream sock...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (5)

apps/webapp/app/utils/longPollingFetch.ts (1)

14-17: LGTM — upstream retention for explicit cancellation.

Hoisting upstream out of the try so the catch can call upstream?.body?.cancel() is the correct shape for releasing undici socket/buffer state when the downstream stream is never consumed. Optional chaining + the inner try/catch also correctly handles the "body already transferred/locked" and "fetch rejected before assignment" cases.
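The hoisting pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual longPollingFetch source: the names `proxyLongPoll` and `FetchLike` are hypothetical, and the fetch implementation is injected only so the sketch can be exercised without a network.

```typescript
// Hypothetical sketch of the "hoist upstream out of the try" shape.
// `FetchLike` and `proxyLongPoll` are illustrative names, not the PR's code.
type FetchLike = (url: string, init: { signal?: AbortSignal }) => Promise<Response>;

async function proxyLongPoll(
  url: string,
  signal: AbortSignal | undefined,
  fetchImpl: FetchLike
): Promise<Response> {
  // Declared outside the try so the catch block can still reach it.
  let upstream: Response | undefined;
  try {
    upstream = await fetchImpl(url, { signal });
    // Hand the upstream body to the downstream client without buffering it.
    return new Response(upstream.body, {
      status: upstream.status,
      headers: upstream.headers,
    });
  } catch (error) {
    try {
      // Best-effort release of the upstream stream. Optional chaining covers
      // the case where fetch rejected before `upstream` was assigned.
      await upstream?.body?.cancel();
    } catch {
      // The body may already be locked or transferred; nothing more to do.
    }
    if (signal?.aborted) {
      // Assumption: aborted requests surface as a 499 Response, as the review describes.
      throw new Response(null, { status: 499 });
    }
    throw error;
  }
}
```

Keeping `upstream` in the outer scope is what makes the cleanup possible: a `const` inside the `try` would be out of reach by the time the `catch` runs.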

apps/webapp/app/routes/realtime.v1.runs.ts (1)

29-37: LGTM.

Signal wiring matches the updated streamRuns signature (position 7), and sourcing it from getRequestAbortSignal() is the correct pattern per the webapp guideline — request.signal would not fire reliably under @remix-run/express. As per coding guidelines: "Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead".
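For readers unfamiliar with the guideline, the idea behind a request-scoped abort signal tied to the response's "close" event can be sketched as below. This is an assumption-laden illustration, not the contents of httpAsyncStorage.server.ts: `runWithAbortSignal`, `getRequestAbortSignalSketch`, and the `RequestStore` shape are hypothetical, and a plain `EventEmitter` stands in for the Express response.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { EventEmitter } from "node:events";

// Hypothetical store shape; the real module may hold more per-request state.
type RequestStore = { abortController: AbortController };

const storage = new AsyncLocalStorage<RequestStore>();

// Runs `handler` with a per-request AbortController that is aborted when the
// response emits "close". `res` is typed as EventEmitter so any object with
// that event (such as an Express response) can be passed in.
function runWithAbortSignal<T>(res: EventEmitter, handler: () => T): T {
  const abortController = new AbortController();
  res.once("close", () => abortController.abort());
  return storage.run({ abortController }, handler);
}

// Returns the signal for the current request, or undefined outside a request.
function getRequestAbortSignalSketch(): AbortSignal | undefined {
  return storage.getStore()?.abortController.signal;
}
```

A route handler running inside `runWithAbortSignal` can call `getRequestAbortSignalSketch()` at any depth and pass the result to downstream fetches, without relying on `request.signal`.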

apps/webapp/app/routes/realtime.v1.batches.$batchId.ts (1)

31-39: LGTM.

Consistent abort-signal propagation — matches streamBatch's new 7th parameter and uses getRequestAbortSignal() per the webapp guideline.

apps/webapp/app/routes/realtime.v1.runs.$runId.ts (1)

44-56: LGTM — nice inline explanation.

The inline comment explaining the RSS/undici buffering mechanism is helpful; signal is correctly placed as the 7th argument to streamRun and sourced from getRequestAbortSignal().

apps/webapp/app/services/realtimeClient.server.ts (1)

112-190: LGTM — signal is threaded consistently through the public API.

Adding signal?: AbortSignal as the final optional parameter on streamRun/streamBatch/streamRuns is backward compatible, and each call into #streamRunsWhere forwards it correctly. Aborted long-polls will throw a Response(499) out of longPollingFetch, and #performElectricRequest's try/catch still awaits #decrementConcurrency before rethrowing, so the concurrency counter is released on client disconnect.
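The two properties noted above (an optional trailing `signal` parameter, and a concurrency counter that is released before the error is rethrown) can be illustrated with a small sketch. Everything here is hypothetical: `RealtimeClientSketch`, the `Fetcher` type, the in-memory counter, and the placeholder URL are stand-ins, and the real class uses private `#` methods and a different concurrency mechanism.

```typescript
// Hypothetical stand-in for the upstream request; resolves with a body string.
type Fetcher = (url: string, signal?: AbortSignal) => Promise<string>;

class RealtimeClientSketch {
  private active = 0;
  private readonly fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  activeCount(): number {
    return this.active;
  }

  // `signal` is the last, optional parameter, so existing callers that omit it
  // keep working unchanged.
  async streamRun(runId: string, signal?: AbortSignal): Promise<string> {
    // Placeholder URL for illustration only.
    const url = `https://electric.invalid/runs/${encodeURIComponent(runId)}`;
    return this.performRequest(url, signal);
  }

  private async performRequest(url: string, signal?: AbortSignal): Promise<string> {
    await this.incrementConcurrency();
    let body: string;
    try {
      // Forward the caller's signal instead of hardcoding undefined.
      body = await this.fetcher(url, signal);
    } catch (error) {
      // Release the slot before rethrowing so an abort cannot leak a count.
      await this.decrementConcurrency();
      throw error;
    }
    await this.decrementConcurrency();
    return body;
  }

  // Async to mirror a counter that might live in an external store (assumption).
  private async incrementConcurrency(): Promise<void> {
    this.active += 1;
  }

  private async decrementConcurrency(): Promise<void> {
    this.active -= 1;
  }
}
```

After both a successful call and an aborted one, `activeCount()` returns to zero, which is the behavior the review comment relies on.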

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

## Summary

8 new features, 18 improvements, 11 bug fixes.

## Breaking changes

- Add server-side deprecation gate for deploys from v3 CLI versions (gated by `DEPRECATE_V3_CLI_DEPLOYS_ENABLED`). v4 CLI deploys are unaffected. ([#3415](#3415))

## Improvements

- Add `--no-browser` flag to `init` and `login` to skip auto-opening the browser during authentication. Also error loudly when `init` is run without `--yes` under non-TTY stdin (previously default-and-exited silently, leaving the project half-initialized). Both commands now show an `Examples` section in `--help`. ([#3483](#3483))
- Add `isReplay` boolean to the run context (`ctx.run.isReplay`), derived from the existing `replayedFromTaskRunFriendlyId` database field. Defaults to `false` for backwards compatibility. ([#3454](#3454))
- Redact the `resolveWaitpoint` runtime log so it only emits `id` and `type` instead of the full completed waitpoint. Previously the log printed the entire waitpoint (including `output`) to stdout in production runs, which could leak sensitive payloads. The value returned by `wait.forToken()` is unchanged. ([#3490](#3490))
- Add `SessionId` friendly ID generator and schemas for the new durable Session primitive. Exported from `@trigger.dev/core/v3/isomorphic` alongside `RunId`, `BatchId`, etc. Ships the `CreateSessionStreamWaitpoint` request/response schemas alongside the main Session CRUD. ([#3417](#3417))
- Truncate large error stacks and messages to prevent OOM crashes. Stack traces are capped at 50 frames (keeping top 5 + bottom 45 with an omission notice), individual stack lines at 1024 chars, and error messages at 1000 chars. Applied in parseError, sanitizeError, and OTel span recording. ([#3405](#3405))

## Server changes

These changes affect the self-hosted Docker image and Trigger.dev Cloud:

- Add a "Back office" tab to `/admin` and a per-organization detail page at `/admin/back-office/orgs/:orgId`. The first action available on that page is editing the org's API rate limit: admins can save a `tokenBucket` override (refill rate, interval, max tokens) and see a plain-English preview of the resulting sustained rate and burst allowance. Writes are audit-logged via the server logger. ([#3434](#3434))
- Optional `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` env var to apply a default repository policy when the webapp creates new ECR repos ([#3467](#3467))
- Ship the Errors page to all users, with a polish + bug-fix pass: pinned "No channel" item in the Slack alert channel picker, viewer-timezone alert timestamps via Slack's `<!date^>` token, Activity sparkline peak tooltip, centered loading spinner and bug-icon empty state on the error detail page, ellipsis on the Configure alerts trigger. ([#3477](#3477))
- Configure the set of machine presets to build boot snapshots for at deploy time via `COMPUTE_TEMPLATE_MACHINE_PRESETS` (CSV of preset names, default `small-1x`). Use `COMPUTE_TEMPLATE_MACHINE_PRESETS_REQUIRED` (CSV, default = full PRESETS list) to scope which preset failures fail a required-mode deploy. Optional preset failures are logged and don't block the deploy. ([#3492](#3492))
- Regenerating a RuntimeEnvironment API key no longer invalidates the previous key immediately. The old key is recorded in a new `RevokedApiKey` table with a 24 hour grace window, and `findEnvironmentByApiKey` falls back to it when the submitted key doesn't match any live environment. The grace window can be ended early (or extended) by updating `expiresAt` on the row. ([#3420](#3420))
- Add the `Session` primitive: a durable, task-bound, bidirectional I/O channel that outlives a single run and acts as the run manager for `chat.agent`. Ships the Postgres `Session` + `SessionRun` tables, ClickHouse `sessions_v1` + replication service, the `sessions` JWT scope, and the public CRUD + realtime routes (`/api/v1/sessions`, `/realtime/v1/sessions/:session/:io`) including `end-and-continue` for server-orchestrated run handoffs and session-stream waitpoints. ([#3417](#3417))
- Add `KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED` flag (off by default) that overrides the cluster default and sets `dnsConfig.options.ndots` on runner pods (defaulting to 2, configurable via `KUBERNETES_POD_DNS_NDOTS`). Kubernetes defaults pods to `ndots: 5`, so any name with fewer than 5 dots (including typical external domains like `api.example.com`) is first walked through every entry in the cluster search list (`<ns>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`) before being tried as-is, turning one resolution into 4+ CoreDNS queries (×2 with A+AAAA). Using a lower `ndots` value reduces DNS query amplification in the `cluster.local` zone. Note: before enabling, make sure no code path relies on search-list expansion for names with dots ≥ the configured value; those names will hit their as-is form first and could resolve externally before falling back to the cluster search path. ([#3441](#3441))
- Vercel integration option to disable auto promotions ([#3376](#3376))
- Make it clear in the admin that feature flags are global and should rarely be changed. ([#3408](#3408))
- Admin worker groups API: add GET loader and expose more fields on POST. ([#3390](#3390))
- Add 60s fresh / 60s stale SWR cache to `getEntitlement` in `platform.v3.server.ts`. Eliminates a synchronous billing-service HTTP round trip on every trigger. Reuses the existing `platformCache` (LRU memory + Redis) pattern already used for `limits` and `usage`. Cache key is `${orgId}`. Errors return a permissive `{ hasAccess: true }` fallback (existing behavior) and are also cached to prevent thundering-herd on billing outages. ([#3388](#3388))
- Show a `MicroVM` badge next to the region name on the regions page. ([#3407](#3407))
- Increase default maximum project count per organization from 10 to 25 ([#3409](#3409))
- Merge execution snapshot creation into the dequeue taskRun.update transaction, reducing 2 DB commits to 1 per dequeue operation ([#3395](#3395))
- Add per-worker Node.js heap metrics to the OTel meter: `nodejs.memory.heap.used`, `nodejs.memory.heap.total`, `nodejs.memory.heap.limit`, `nodejs.memory.external`, `nodejs.memory.array_buffers`, `nodejs.memory.rss`. Host-metrics only publishes RSS, which overstates V8 heap by the external + native footprint; these give direct heap visibility per cluster worker so `NODE_MAX_OLD_SPACE_SIZE` can be sized against observed heap peaks rather than RSS. ([#3437](#3437))
- Tag Prisma spans with `db.datasource: "writer" | "replica"` so monitors and trace queries can distinguish the writer pool from the replica pool. Applies to all `prisma:engine:*` spans (including `prisma:engine:connection` used by the connection-pool monitors) and the outer `prisma:client:operation` span. ([#3422](#3422))
- Clarify the cross-region intent in the Terraform and AI-prompt helpers on the Add Private Connection page. Both already default `supported_regions` to `["us-east-1", "eu-central-1"]`; added an inline comment / parenthetical so the user understands why both regions are listed (Trigger.dev runs in both, so the service must be consumable from either). ([#3465](#3465))
- Add `RUN_ENGINE_READ_REPLICA_SNAPSHOTS_SINCE_ENABLED` flag (default off) to route the Prisma reads inside `RunEngine.getSnapshotsSince` through the read-only replica client. Offloads the snapshot polling queries (fired by every running task runner) from the primary. When disabled, behavior is unchanged. ([#3423](#3423))
- Stop creating TaskRunTag records and _TaskRunToTaskRunTag join table entries during task triggering. The denormalized runTags string array on TaskRun already stores tag names, making the M2M relation redundant write overhead. ([#3369](#3369))
- Stop writing per-tick state (`lastScheduledTimestamp`, `nextScheduledTimestamp`, `lastRunTriggeredAt`) on `TaskSchedule` and `TaskScheduleInstance`. The schedule engine now carries the previous fire time forward via the worker queue payload, eliminating ~270K dead-tuple-driven autovacuums per year on these hot tables and the associated `IO:XactSync` mini-spikes on the writer. Customer-facing `payload.lastTimestamp` semantics are unchanged. ([#3476](#3476))
- Replace the expensive DISTINCT query for task filter dropdowns with a dedicated TaskIdentifier registry table backed by Redis. Environments migrate automatically on their next deploy, with a transparent fallback to the legacy query for unmigrated environments. Also fixes duplicate dropdown entries when a task changes trigger source, and adds active/archived grouping for removed tasks. Moves BackgroundWorkerTask reads in the trigger hot path to the read replica. ([#3368](#3368))
- Public Access Tokens (PATs) minted before an API key rotation now keep working during the 24h grace window. `validatePublicJwtKey` falls back to any non-expired `RevokedApiKey` rows for the signing environment when the primary signature check against the env's current `apiKey` fails. The fallback query only runs on the failure path, so the hot success path is unchanged. ([#3464](#3464))
- Batch items that hit the environment queue size limit now fast-fail without retries and without creating pre-failed TaskRuns. ([#3352](#3352))
- Show the cancel button in the runs list for runs in `DEQUEUED` status. `DEQUEUED` was missing from `NON_FINAL_RUN_STATUSES` so the list hid the button even though the single run page allowed it. ([#3421](#3421))
- Reduce 5xx feedback loops on hot debounce keys by quantizing `delayUntil`, adding an unlocked fast-path skip, and gracefully handling redlock contention in `handleDebounce` so the SDK no longer retries into a herd. ([#3453](#3453))
- Fix RSS memory leak in the realtime proxy routes. `/realtime/v1/runs`, `/realtime/v1/runs/:id`, and `/realtime/v1/batches/:id` called `fetch()` into Electric with no abort signal, so when a client disconnected mid long-poll, undici kept the upstream socket open and buffered response chunks that would never be consumed, retained only in RSS and invisible to V8 heap tooling. Thread `getRequestAbortSignal()` through `RealtimeClient.streamRun/streamRuns/streamBatch` to `longPollingFetch` and cancel the upstream body in the error path. Isolated reproducer showed ~44 KB retained per leaked request; signal propagation releases it cleanly. ([#3442](#3442))
- Fix memory leak where every aborted SSE connection pinned the full request/response graph on Node 20, caused by `AbortSignal.any()` in `sse.ts` retaining its source signals indefinitely (see nodejs/node#54614, nodejs/node#55351). Also clear the `setTimeout(abort)` timer in `entry.server.tsx` so successful HTML renders don't pin the React tree for 30s per request. ([#3430](#3430))
- Preserve filters on the queues page when submitting modal actions. ([#3471](#3471))
- Fix Redis connection leak in realtime streams and broken abort signal propagation. **Redis connections**: Non-blocking methods (ingestData, appendPart, getLastChunkIndex) now share a single Redis connection instead of creating one per request. streamResponse still uses dedicated connections (required for XREAD BLOCK) but now tears them down immediately via disconnect() instead of graceful quit(), with a 15s inactivity fallback. **Abort signal**: request.signal is broken in Remix/Express due to a Node.js undici GC bug (nodejs/node#55428) that severs the signal chain when Remix clones the Request internally. Added getRequestAbortSignal() wired to Express res.on("close") via httpAsyncStorage, which fires reliably on client disconnect. All SSE/streaming routes updated to use it. ([#3399](#3399))
- Prevent dashboard crash (React error #31) when span accessory item text is not a string. Filters out malformed accessory items in SpanCodePathAccessory instead of passing objects to React as children. ([#3400](#3400))
- Upgrade Remix packages from 2.1.0 to 2.17.4 to address security vulnerabilities in React Router ([#3372](#3372))
- Fix Vercel integration settings page (remove redundant section toggles) and improve the Vercel onboarding flow so the modal closes after connecting a GitHub repo and the marketplace `next` URL is preserved across the GitHub app install redirect. ([#3424](#3424))

<details>
<summary>Raw changeset output</summary>

# Releases

## @trigger.dev/build@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

## trigger.dev@4.4.5

### Patch Changes

- Add `--no-browser` flag to `init` and `login` to skip auto-opening the browser during authentication. Also error loudly when `init` is run without `--yes` under non-TTY stdin (previously default-and-exited silently, leaving the project half-initialized). Both commands now show an `Examples` section in `--help`. ([#3483](#3483))
- Updated dependencies:
  - `@trigger.dev/core@4.4.5`
  - `@trigger.dev/build@4.4.5`
  - `@trigger.dev/schema-to-json@4.4.5`

## @trigger.dev/core@4.4.5

### Patch Changes

- Add `isReplay` boolean to the run context (`ctx.run.isReplay`), derived from the existing `replayedFromTaskRunFriendlyId` database field. Defaults to `false` for backwards compatibility. ([#3454](#3454))
- Redact the `resolveWaitpoint` runtime log so it only emits `id` and `type` instead of the full completed waitpoint. Previously the log printed the entire waitpoint (including `output`) to stdout in production runs, which could leak sensitive payloads. The value returned by `wait.forToken()` is unchanged. ([#3490](#3490))
- Add `SessionId` friendly ID generator and schemas for the new durable Session primitive. Exported from `@trigger.dev/core/v3/isomorphic` alongside `RunId`, `BatchId`, etc. Ships the `CreateSessionStreamWaitpoint` request/response schemas alongside the main Session CRUD. ([#3417](#3417))
- Truncate large error stacks and messages to prevent OOM crashes. Stack traces are capped at 50 frames (keeping top 5 + bottom 45 with an omission notice), individual stack lines at 1024 chars, and error messages at 1000 chars. Applied in parseError, sanitizeError, and OTel span recording. ([#3405](#3405))

## @trigger.dev/python@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`
  - `@trigger.dev/build@4.4.5`
  - `@trigger.dev/sdk@4.4.5`

## @trigger.dev/react-hooks@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

## @trigger.dev/redis-worker@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

## @trigger.dev/rsc@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

## @trigger.dev/schema-to-json@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

## @trigger.dev/sdk@4.4.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.5`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

coderabbitai Bot reviewed Apr 24, 2026

Comment thread .server-changes/fix-realtime-fetch-signal-leak.md

devin-ai-integration Bot reviewed Apr 24, 2026

ericallam force-pushed the fix/realtime-fetch-abort-signal branch from aa2ae56 to 586315b Compare April 24, 2026 14:43

myftija approved these changes Apr 24, 2026

ericallam merged commit 5693b62 into main Apr 24, 2026
69 of 76 checks passed

ericallam deleted the fix/realtime-fetch-abort-signal branch April 24, 2026 15:00

github-actions Bot mentioned this pull request Apr 24, 2026

chore: release v4.4.5 #3406

Merged

github-actions Bot mentioned this pull request May 1, 2026

chore: release v4.4.6 #3501

Open


changeset-bot Bot commented Apr 24, 2026

⚠️ No Changeset found
