Skip to content

fix(core): truncate large error stacks and messages to prevent OOM#3405

Merged
ericallam merged 3 commits intomainfrom
fix/truncate-large-error-stacks
Apr 17, 2026
Merged

fix(core): truncate large error stacks and messages to prevent OOM#3405
ericallam merged 3 commits intomainfrom
fix/truncate-large-error-stacks

Conversation

@ericallam
Copy link
Copy Markdown
Member

@ericallam ericallam commented Apr 17, 2026

Summary

Large error stacks and messages can OOM the worker process when serialized into OTel spans or TaskRunError objects. This was reported when throwing an error with a massive .stack property from a chat agent hook.

This adds frame-based stack truncation (similar to Sentry's approach) plus message length limits, applied consistently across all error serialization paths.

What changed

packages/core/src/v3/errors.ts

  • truncateStack() — parses error.stack into message lines + frame lines, caps at 50 frames (keep top 5 closest to throw + bottom 45 entry points, with "... N frames omitted ..." in between). Individual lines capped at 1024 chars.
  • truncateMessage() — caps error messages at 1000 chars
  • Applied in parseError() and sanitizeError()

packages/core/src/v3/otel/utils.ts

  • sanitizeSpanError() now uses truncateStack and truncateMessage from errors.ts instead of duplicating truncation logic
  • Non-Error values (strings, JSON) capped at 5000 chars

packages/core/src/v3/tracer.ts

  • startActiveSpan catch block now delegates to recordSpanException() instead of calling span.recordException() directly

Limits

What Limit Rationale
Stack frames 50 Matches Sentry's STACKTRACE_FRAME_LIMIT
Top frames kept 5 Closest to throw site
Bottom frames kept 45 Entry points / framework frames
Per-line length 1024 Matches Sentry, prevents regex DoS
Message length 1000 Bounded but generous
Generic string (non-Error) 5000 Fallback for JSON/string errors in spans

Test plan

  • 17 unit tests in packages/core/test/errors.test.ts
  • E2E: threw a 300-frame / 5000-char-message error in the ai-chat reference app, verified truncated stack and message in span via get_span_details
  • Verified the run survived the error (no OOM, continued waiting for next message)

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 17, 2026

🦋 Changeset detected

Latest commit: e4b2707

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
Name Type
@trigger.dev/core Patch
@trigger.dev/build Patch
trigger.dev Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

devin-ai-integration[bot]

This comment was marked as resolved.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

Walkthrough

Adds stack and message truncation to packages/core: new exported helpers truncateStack (limits to 50 frames — top 5 + bottom 45 — inserts an omission marker, and caps individual stack lines at 1024 chars) and truncateMessage (caps messages at 1000 chars). These truncation routines are applied in parseError, sanitizeError, and OpenTelemetry recording (recordSpanException), and TriggerTracer.startActiveSpan now delegates exception recording to recordSpanException. A changeset for @trigger.dev/core and unit tests covering truncation and span-recording behavior were added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and concisely summarizes the main change: truncating large error stacks and messages to prevent OOM, which is the core objective of the changeset.
Description check ✅ Passed The PR description comprehensively covers the changes, including detailed explanations of affected files, implementation approach, specific limits with rationale, and a thorough test plan with E2E verification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/truncate-large-error-stacks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as resolved.

- Bound message lines in truncateStack (not just frames) to prevent
  huge error.message embedded in error.stack from bypassing OOM cap
- Wrap oversized CUSTOM_ERROR.raw in a valid JSON envelope so
  downstream JSON.parse in createErrorTaskError doesn't crash
- Make recordSpanException safe for non-serializable values
  (circular refs, BigInt, symbols, functions, undefined)
- Add tests covering all edge cases
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/core/test/recordSpanException.test.ts (1)

80-87: Assertion doesn't verify ERROR status code.

The test is named "always calls setStatus ERROR" but only asserts the call count — it would still pass if setStatus were called with OK or any other code. Consider asserting the argument to lock in the contract:

Proposed tightening
-  it("always calls setStatus ERROR", () => {
+  it("always calls setStatus ERROR", () => {
     const span = createMockSpan();
     recordSpanException(span, new Error("test"));
     recordSpanException(span, "string");
     recordSpanException(span, { obj: true });

     expect(span.setStatus).toHaveBeenCalledTimes(3);
+    for (const call of (span.setStatus as any).mock.calls) {
+      expect(call[0]).toEqual({ code: SpanStatusCode.ERROR });
+    }
   });

Requires adding SpanStatusCode to the @opentelemetry/api import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/test/recordSpanException.test.ts` around lines 80 - 87, The
test "always calls setStatus ERROR" only checks call count; update it to assert
the actual argument passed to span.setStatus is the ERROR status. Import
SpanStatusCode from "@opentelemetry/api", then replace or extend the expect to
verify span.setStatus was called with an object containing code:
SpanStatusCode.ERROR (e.g. using toHaveBeenCalledWith or toHaveBeenNthCalledWith
for each call) for the calls made by recordSpanException and keep the call count
assertion if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/test/recordSpanException.test.ts`:
- Around line 80-87: The test "always calls setStatus ERROR" only checks call
count; update it to assert the actual argument passed to span.setStatus is the
ERROR status. Import SpanStatusCode from "@opentelemetry/api", then replace or
extend the expect to verify span.setStatus was called with an object containing
code: SpanStatusCode.ERROR (e.g. using toHaveBeenCalledWith or
toHaveBeenNthCalledWith for each call) for the calls made by recordSpanException
and keep the call count assertion if desired.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8386b356-370c-4672-8169-294add119196

📥 Commits

Reviewing files that changed from the base of the PR and between 589b03f and d05f017.

📒 Files selected for processing (4)
  • packages/core/src/v3/errors.ts
  • packages/core/src/v3/otel/utils.ts
  • packages/core/test/errors.test.ts
  • packages/core/test/recordSpanException.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/core/src/v3/otel/utils.ts
  • packages/core/test/errors.test.ts
  • packages/core/src/v3/errors.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/core/test/recordSpanException.test.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Add crumbs as you write code using // @Crumbs comments or `// `#region` `@crumbs blocks. These are temporary debug instrumentation and must be stripped using agentcrumbs strip before merge.

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Tests should avoid mocks or stubs and use the helpers from @internal/testcontainers when Redis or Postgres are needed
Use vitest for running unit tests

Files:

  • packages/core/test/recordSpanException.test.ts
packages/core/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (packages/core/CLAUDE.md)

Never import the root package (@trigger.dev/core). Always use subpath imports such as @trigger.dev/core/v3, @trigger.dev/core/v3/utils, @trigger.dev/core/logger, or @trigger.dev/core/schemas

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx}: Use vitest exclusively for testing. Never mock anything — use testcontainers instead.
Place test files next to source files with naming pattern MyService.ts -> MyService.test.ts.
For Redis/PostgreSQL tests in vitest, use testcontainers helpers: redisTest, postgresTest, or containerTest imported from @internal/testcontainers.

Files:

  • packages/core/test/recordSpanException.test.ts
**/*.ts{,x}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • packages/core/test/recordSpanException.test.ts
🧠 Learnings (11)
📚 Learning: 2026-04-15T15:39:06.868Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-15T15:39:06.868Z
Learning: Applies to **/*.test.{ts,tsx} : Use vitest exclusively for testing. Never mock anything — use testcontainers instead.

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-04-07T14:12:18.946Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/test/engine/batchPayloads.test.ts:5-24
Timestamp: 2026-04-07T14:12:18.946Z
Learning: In `apps/webapp/test/engine/batchPayloads.test.ts`, using `vi.mock` for `~/v3/objectStore.server` (stubbing `hasObjectStoreClient` and `uploadPacketToObjectStore`), `~/env.server` (overriding offload thresholds), and `~/v3/tracer.server` (stubbing `startActiveSpan`) is intentional and acceptable. Simulating controlled transient upload failures (e.g., fail N times then succeed) to verify `p-retry` behavior cannot be reproduced with real services or testcontainers. This file is an explicit exception to the repo's general no-mocks policy.

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to **/*.{test,spec}.{ts,tsx} : Use vitest for all tests in the Trigger.dev repository

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-01-15T10:48:02.687Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-15T10:48:02.687Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Use vitest for running unit tests

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-04-16T13:45:18.782Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/test/engine/taskIdentifierRegistry.test.ts:3-19
Timestamp: 2026-04-16T13:45:18.782Z
Learning: In `apps/webapp/test/engine/taskIdentifierRegistry.test.ts`, the `vi.mock` calls for `~/services/taskIdentifierCache.server` (stubbing `getTaskIdentifiersFromCache` and `populateTaskIdentifierCache`), `~/models/task.server` (stubbing `getAllTaskIdentifiers`), and `~/db.server` (stubbing `prisma` and `$replica`) are intentional. The suite uses real Postgres via testcontainers for all `TaskIdentifier` DB operations, but isolates the Redis cache layer and legacy query fallback as separate concerns not exercised in this test file. Do not flag these mocks as violations of the no-mocks policy in future reviews.

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-24T10:42:43.111Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3255
File: apps/webapp/app/routes/api.v1.runs.$runId.spans.$spanId.ts:100-100
Timestamp: 2026-03-24T10:42:43.111Z
Learning: In `apps/webapp/app/routes/api.v1.runs.$runId.spans.$spanId.ts` (and related span-handling code in trigger.dev), `span.entity` is a required (non-optional) field on the `SpanDetail` type and is always present. Do not flag `span.entity.type` as a potential null pointer / suggest optional chaining (`span.entity?.type`) in this context.

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/tests/**/*.test.ts : Implement tests for RunEngine in `src/engine/tests/` using testcontainers for Redis and PostgreSQL containerization

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-02T12:43:25.254Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: internal-packages/run-engine/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:25.254Z
Learning: Applies to internal-packages/run-engine/src/engine/systems/**/*.ts : Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-03T13:07:33.177Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3166
File: internal-packages/run-engine/src/batch-queue/tests/index.test.ts:711-713
Timestamp: 2026-03-03T13:07:33.177Z
Learning: In `internal-packages/run-engine/src/batch-queue/tests/index.test.ts`, test assertions for rate limiter stubs can use `toBeGreaterThanOrEqual` rather than exact equality (`toBe`) because the consumer loop may call the rate limiter during empty pops in addition to actual item processing, and this over-calling is acceptable in integration tests.

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • packages/core/test/recordSpanException.test.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • packages/core/test/recordSpanException.test.ts

devin-ai-integration[bot]

This comment was marked as resolved.

truncateMessage(undefined) and truncateStack(undefined) returned empty
strings, breaking the `error.message ?? fallback` nullish coalescing in
createErrorTaskError. INTERNAL_ERROR.message and stackTrace are optional
in the schema, so sanitizeError now preserves undefined for them.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/core/src/v3/errors.ts (1)

157-207: LGTM on the truncation helpers.

Classification and per-line capping are applied before the frame/message split (line 176-179), so a huge error.message embedded in error.stack is correctly bounded — this addresses the prior concern. Math for the frame keep-window is right: top 5 + bottom 45 with omitted = frameLines.length - 50 when frameLines.length > 50.

Two tiny nits you can take or leave:

  • Line 197: "... ${omitted} frames omitted ..." is grammatically off for omitted === 1 (only triggers at exactly 51 frames). Cheap to fix with a ternary.
  • line.slice(0, MAX_STACK_LINE_LENGTH) + "...[truncated]" yields a final length of 1024 + 14 chars. If you want a strict per-line upper bound, subtract the suffix length from the slice. Not important in practice.
✏️ Pluralization tweak
-    `    ... ${omitted} frames omitted ...`,
+    `    ... ${omitted} ${omitted === 1 ? "frame" : "frames"} omitted ...`,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/v3/errors.ts` around lines 157 - 207, In truncateStack,
make two small fixes: (1) pluralize the omitted-frame message in the array
element built as `"... ${omitted} frames omitted ..."`—use a ternary to produce
"frame" when omitted === 1; (2) enforce the strict per-line cap by accounting
for the suffix length when slicing long lines: instead of slicing to
MAX_STACK_LINE_LENGTH, slice to MAX_STACK_LINE_LENGTH - SUFFIX.length before
appending "...[truncated]" (refer to truncateStack, MAX_STACK_LINE_LENGTH and
the truncation suffix string) so each resulting line never exceeds
MAX_STACK_LINE_LENGTH.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/v3/errors.ts`:
- Around line 157-207: In truncateStack, make two small fixes: (1) pluralize the
omitted-frame message in the array element built as `"... ${omitted} frames
omitted ..."`—use a ternary to produce "frame" when omitted === 1; (2) enforce
the strict per-line cap by accounting for the suffix length when slicing long
lines: instead of slicing to MAX_STACK_LINE_LENGTH, slice to
MAX_STACK_LINE_LENGTH - SUFFIX.length before appending "...[truncated]" (refer
to truncateStack, MAX_STACK_LINE_LENGTH and the truncation suffix string) so
each resulting line never exceeds MAX_STACK_LINE_LENGTH.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fe5993b0-3b32-455d-91be-3c3829fae155

📥 Commits

Reviewing files that changed from the base of the PR and between d05f017 and e4b2707.

📒 Files selected for processing (2)
  • packages/core/src/v3/errors.ts
  • packages/core/test/errors.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/test/errors.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/core/src/v3/errors.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Add crumbs as you write code using // @Crumbs comments or `// `#region` `@crumbs blocks. These are temporary debug instrumentation and must be stripped using agentcrumbs strip before merge.

Files:

  • packages/core/src/v3/errors.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/src/v3/errors.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/core/src/v3/errors.ts
packages/core/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (packages/core/CLAUDE.md)

Never import the root package (@trigger.dev/core). Always use subpath imports such as @trigger.dev/core/v3, @trigger.dev/core/v3/utils, @trigger.dev/core/logger, or @trigger.dev/core/schemas

Files:

  • packages/core/src/v3/errors.ts
**/*.ts{,x}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • packages/core/src/v3/errors.ts
🧠 Learnings (13)
📚 Learning: 2024-10-12T01:08:24.066Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 1387
File: packages/cli-v3/src/executions/taskRunProcess.ts:408-413
Timestamp: 2024-10-12T01:08:24.066Z
Learning: In the `parseExecuteError` method in `packages/cli-v3/src/executions/taskRunProcess.ts`, using `String(error)` to populate the `message` field works fine and even prepends `error.name`.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-04-07T14:12:59.018Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/app/runEngine/concerns/batchPayloads.server.ts:112-136
Timestamp: 2026-04-07T14:12:59.018Z
Learning: In `apps/webapp/app/runEngine/concerns/batchPayloads.server.ts`, the `pRetry` call wrapping `uploadPacketToObjectStore` intentionally retries **all** error types (no `shouldRetry` filter / `AbortError` guards). The maintainer explicitly prefers over-retrying to under-retrying because multiple heterogeneous object store backends are supported and it is impractical to enumerate all permanent error signatures. Do not flag this as an issue in future reviews.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-22T19:32:16.231Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors/route.tsx:82-151
Timestamp: 2026-03-22T19:32:16.231Z
Learning: In `apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts` and `apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors/route.tsx`, the `errorAlertConfig` field on `ProjectAlertChannel` is intentionally `Json?` (nullable). The `ErrorAlertEvaluator.computeMinInterval()` in `apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` uses `ErrorAlertConfig.safeParse(ch.errorAlertConfig)` and falls back to `DEFAULT_INTERVAL_MS = 300_000` when `errorAlertConfig` is null. No UI currently collects this value — it is scaffolding for a future per-channel evaluation interval feature. Do not flag the absence of `errorAlertConfig` in `CreateAlertChannelOptions` or the action handler as a bug; null configs are safe and expected.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-04-16T14:21:09.410Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierCache.server.ts:33-39
Timestamp: 2026-04-16T14:21:09.410Z
Learning: In `apps/webapp/app/services/taskIdentifierCache.server.ts`, the `decode()` function intentionally uses a plain `JSON.parse` cast instead of Zod validation. The Redis cache is exclusively written by the internal `populateTaskIdentifierCache` function via the symmetric `encode()` helper — there is no external input path. Any shape mismatch would be a serialization bug to surface explicitly, not untrusted data to filter out. Do not suggest adding Zod validation to the `decode()` function or the `getTaskIdentifiersFromCache` return path in future reviews.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-26T09:02:11.935Z
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 3274
File: apps/webapp/app/services/runsReplicationService.server.ts:922-924
Timestamp: 2026-03-26T09:02:11.935Z
Learning: In `triggerdotdev/trigger.dev`, `TaskRun.annotations` are always written atomically in one operation that conforms exactly to the `RunAnnotations` schema (from `trigger.dev/core/v3`). Using `RunAnnotations.safeParse` in `#parseAnnotations` (e.g., in `apps/webapp/app/services/runsReplicationService.server.ts`) is intentional and correct — there is no risk of partial/legacy annotation payloads causing schema mismatches, so suggesting a relaxed passthrough schema for this parsing is unnecessary.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-04-13T21:44:00.032Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:24-67
Timestamp: 2026-04-13T21:44:00.032Z
Learning: In `apps/webapp/app/services/taskIdentifierRegistry.server.ts`, the sequential upsert/updateMany/findMany writes in `syncTaskIdentifiers` are intentionally NOT wrapped in a Prisma transaction. This function runs only during deployment-change events (low-concurrency path), and any partial `isInLatestDeployment` state is acceptable because it self-corrects on the next deployment. Do not flag this as a missing-transaction/atomicity issue in future reviews.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-04-13T21:44:00.644Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:49-58
Timestamp: 2026-04-13T21:44:00.644Z
Learning: In `triggerdotdev/trigger.dev`, a deployment with zero tasks is not a realistic scenario in practice. Do not flag missing handling for empty-task deployments in `apps/webapp/app/services/taskIdentifierRegistry.server.ts` or related registry/sync logic.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-26T10:02:25.354Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3254
File: apps/webapp/app/services/platformNotifications.server.ts:363-385
Timestamp: 2026-03-26T10:02:25.354Z
Learning: In `triggerdotdev/trigger.dev`, the `getNextCliNotification` fallback in `apps/webapp/app/services/platformNotifications.server.ts` intentionally uses `prisma.orgMember.findFirst` (single org) when no `projectRef` is provided. This is acceptable for v1 because the CLI (`dev` and `login` commands) always passes `projectRef` in normal usage, making the fallback a rare edge case. Do not flag the single-org fallback as a multi-org correctness bug in this file.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2025-08-14T10:09:02.528Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: internal-packages/run-engine/src/engine/index.ts:466-467
Timestamp: 2025-08-14T10:09:02.528Z
Learning: In the triggerdotdev/trigger.dev codebase, it's acceptable to pass `string | undefined` types directly to Prisma operations (both create and update). The codebase consistently uses this pattern and the team is comfortable with how Prisma handles undefined values.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-04-16T18:11:21.474Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3402
File: apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts:250-253
Timestamp: 2026-04-16T18:11:21.474Z
Learning: In `apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts`, the truthy guard `if (versionValue)` used when building graph data points in `ErrorGroupPresenter.getOccurrences` is intentional: only positive occurrence counts should be set on a `point` object for a given version/bucket — zero values are deliberately excluded. Do not flag this as a bug or suggest changing it to `!== undefined`.

Applied to files:

  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • packages/core/src/v3/errors.ts
🔇 Additional comments (1)
packages/core/src/v3/errors.ts (1)

303-352: Sanitizer changes look correct.

  • CUSTOM_ERROR: the { truncated: true, preview } envelope keeps raw valid JSON for the downstream JSON.parse(error.raw) in createErrorTaskError — good.
  • INTERNAL_ERROR: the != null guards correctly preserve undefined for optional message/stackTrace, so the error.message ?? \Internal error (${code})`fallback increateErrorTaskErrorstill fires. Note the asymmetry withBUILT_IN_ERROR(lines 309/311), which still coercesundefined""viatruncateMessage/truncateStack; that's fine here because BUILT_IN_ERROR consumers (new Error(message), e.stack = stackTrace`) tolerate empty strings and there is no nullish-coalescing fallback downstream.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment thread packages/core/test/recordSpanException.test.ts
@ericallam ericallam merged commit 69acdc2 into main Apr 17, 2026
42 checks passed
@ericallam ericallam deleted the fix/truncate-large-error-stacks branch April 17, 2026 12:25
@github-actions github-actions Bot mentioned this pull request Apr 17, 2026
ericallam pushed a commit that referenced this pull request May 1, 2026
## Summary
8 new features, 18 improvements, 11 bug fixes.

## Breaking changes
- Add server-side deprecation gate for deploys from v3 CLI versions
(gated by `DEPRECATE_V3_CLI_DEPLOYS_ENABLED`). v4 CLI deploys are
unaffected.
([#3415](#3415))

## Improvements
- Add `--no-browser` flag to `init` and `login` to skip auto-opening the
browser during authentication. Also error loudly when `init` is run
without `--yes` under non-TTY stdin (previously default-and-exited
silently, leaving the project half-initialized). Both commands now show
an `Examples` section in `--help`.
([#3483](#3483))
- Add `isReplay` boolean to the run context (`ctx.run.isReplay`),
derived from the existing `replayedFromTaskRunFriendlyId` database
field. Defaults to `false` for backwards compatibility.
([#3454](#3454))
- Redact the `resolveWaitpoint` runtime log so it only emits `id` and
`type` instead of the full completed waitpoint. Previously the log
printed the entire waitpoint (including `output`) to stdout in
production runs, which could leak sensitive payloads. The value returned
by `wait.forToken()` is unchanged.
([#3490](#3490))
- Add `SessionId` friendly ID generator and schemas for the new durable
Session primitive. Exported from `@trigger.dev/core/v3/isomorphic`
alongside `RunId`, `BatchId`, etc. Ships the
`CreateSessionStreamWaitpoint` request/response schemas alongside the
main Session CRUD.
([#3417](#3417))
- Truncate large error stacks and messages to prevent OOM crashes. Stack
traces are capped at 50 frames (keeping top 5 + bottom 45 with an
omission notice), individual stack lines at 1024 chars, and error
messages at 1000 chars. Applied in parseError, sanitizeError, and OTel
span recording.
([#3405](#3405))

## Server changes

These changes affect the self-hosted Docker image and Trigger.dev Cloud:

- Add a "Back office" tab to `/admin` and a per-organization detail page
at `/admin/back-office/orgs/:orgId`. The first action available on that
page is editing the org's API rate limit: admins can save a
`tokenBucket` override (refill rate, interval, max tokens) and see a
plain-English preview of the resulting sustained rate and burst
allowance. Writes are audit-logged via the server logger.
([#3434](#3434))
- Optional `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` env var to
apply a default repository policy when the webapp creates new ECR repos
([#3467](#3467))
- Ship the Errors page to all users, with a polish + bug-fix pass:
pinned "No channel" item in the Slack alert channel picker,
viewer-timezone alert timestamps via Slack's `<!date^>` token, Activity
sparkline peak tooltip, centered loading spinner and bug-icon empty
state on the error detail page, ellipsis on the Configure alerts
trigger.
([#3477](#3477))
- Configure the set of machine presets to build boot snapshots for at
deploy time via `COMPUTE_TEMPLATE_MACHINE_PRESETS` (CSV of preset names,
default `small-1x`). Use `COMPUTE_TEMPLATE_MACHINE_PRESETS_REQUIRED`
(CSV, default = full PRESETS list) to scope which preset failures fail a
required-mode deploy. Optional preset failures are logged and don't
block the deploy.
([#3492](#3492))
- Regenerating a RuntimeEnvironment API key no longer invalidates the
previous key immediately. The old key is recorded in a new
`RevokedApiKey` table with a 24 hour grace window, and
`findEnvironmentByApiKey` falls back to it when the submitted key
doesn't match any live environment. The grace window can be ended early
(or extended) by updating `expiresAt` on the row.
([#3420](#3420))
- Add the `Session` primitive — a durable, task-bound, bidirectional I/O
channel that outlives a single run and acts as the run manager for
`chat.agent`. Ships the Postgres `Session` + `SessionRun` tables,
ClickHouse `sessions_v1` + replication service, the `sessions` JWT
scope, and the public CRUD + realtime routes (`/api/v1/sessions`,
`/realtime/v1/sessions/:session/:io`) including `end-and-continue` for
server-orchestrated run handoffs and session-stream waitpoints.
([#3417](#3417))
- Add `KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED` flag (off by default)
that overrides the cluster default and sets `dnsConfig.options.ndots` on
runner pods (defaulting to 2, configurable via
`KUBERNETES_POD_DNS_NDOTS`). Kubernetes defaults pods to `ndots: 5`, so
any name with fewer than 5 dots — including typical external domains
like `api.example.com` — is first walked through every entry in the
cluster search list (`<ns>.svc.cluster.local`, `svc.cluster.local`,
`cluster.local`) before being tried as-is, turning one resolution into
4+ CoreDNS queries (×2 with A+AAAA). Using a lower `ndots` value reduces
DNS query amplification in the `cluster.local` zone.
  
Note: before enabling, make sure no code path relies on search-list
expansion for names with dots ≥ the configured value — those names will
hit their as-is form first and could resolve externally before falling
back to the cluster search path.
([#3441](#3441))
- Vercel integration option to disable auto promotions
([#3376](#3376))
- Make it clear in the admin that feature flags are global and should
rarely be changed.
([#3408](#3408))
- Admin worker groups API: add GET loader and expose more fields on
POST. ([#3390](#3390))
- Add 60s fresh / 60s stale SWR cache to `getEntitlement` in
`platform.v3.server.ts`. Eliminates a synchronous billing-service HTTP
round trip on every trigger. Reuses the existing `platformCache` (LRU
memory + Redis) pattern already used for `limits` and `usage`. Cache key
is `${orgId}`. Errors return a permissive `{ hasAccess: true }` fallback
(existing behavior) and are also cached to prevent thundering-herd on
billing outages.
([#3388](#3388))
- Show a `MicroVM` badge next to the region name on the regions page.
([#3407](#3407))
- Increase default maximum project count per organization from 10 to 25
([#3409](#3409))
- Merge execution snapshot creation into the dequeue taskRun.update
transaction, reducing 2 DB commits to 1 per dequeue operation
([#3395](#3395))
- Add per-worker Node.js heap metrics to the OTel meter —
`nodejs.memory.heap.used`, `nodejs.memory.heap.total`,
`nodejs.memory.heap.limit`, `nodejs.memory.external`,
`nodejs.memory.array_buffers`, `nodejs.memory.rss`. Host-metrics only
publishes RSS, which overstates V8 heap by the external + native
footprint; these give direct heap visibility per cluster worker so
`NODE_MAX_OLD_SPACE_SIZE` can be sized against observed heap peaks
rather than RSS.
([#3437](#3437))
- Tag Prisma spans with `db.datasource: "writer" | "replica"` so
monitors and trace queries can distinguish the writer pool from the
replica pool. Applies to all `prisma:engine:*` spans (including
`prisma:engine:connection` used by the connection-pool monitors) and the
outer `prisma:client:operation` span.
([#3422](#3422))
- Clarify the cross-region intent in the Terraform and AI-prompt helpers
on the Add Private Connection page. Both already default
`supported_regions` to `["us-east-1", "eu-central-1"]`; added an inline
comment / parenthetical so the user understands why both regions are
listed (Trigger.dev runs in both, so the service must be consumable from
either).
([#3465](#3465))
- Add `RUN_ENGINE_READ_REPLICA_SNAPSHOTS_SINCE_ENABLED` flag (default
off) to route the Prisma reads inside `RunEngine.getSnapshotsSince`
through the read-only replica client. Offloads the snapshot polling
queries (fired by every running task runner) from the primary. When
disabled, behavior is unchanged.
([#3423](#3423))
- Stop creating TaskRunTag records and _TaskRunToTaskRunTag join table
entries during task triggering. The denormalized runTags string array on
TaskRun already stores tag names, making the M2M relation redundant
write overhead.
([#3369](#3369))
- Stop writing per-tick state (`lastScheduledTimestamp`,
`nextScheduledTimestamp`, `lastRunTriggeredAt`) on `TaskSchedule` and
`TaskScheduleInstance`. The schedule engine now carries the previous
fire time forward via the worker queue payload, eliminating ~270K
dead-tuple-driven autovacuums per year on these hot tables and the
associated `IO:XactSync` mini-spikes on the writer. Customer-facing
`payload.lastTimestamp` semantics are unchanged.
([#3476](#3476))
- Replace the expensive DISTINCT query for task filter dropdowns with a
dedicated TaskIdentifier registry table backed by Redis. Environments
migrate automatically on their next deploy, with a transparent fallback
to the legacy query for unmigrated environments. Also fixes duplicate
dropdown entries when a task changes trigger source, and adds
active/archived grouping for removed tasks. Moves BackgroundWorkerTask
reads in the trigger hot path to the read replica.
([#3368](#3368))
- Public Access Tokens (PATs) minted before an API key rotation now keep
working during the 24h grace window. `validatePublicJwtKey` falls back
to any non-expired `RevokedApiKey` rows for the signing environment when
the primary signature check against the env's current `apiKey` fails.
The fallback query only runs on the failure path, so the hot success
path is unchanged.
([#3464](#3464))
- Batch items that hit the environment queue size limit now fast-fail
without
retries and without creating pre-failed TaskRuns.
([#3352](#3352))
- Show the cancel button in the runs list for runs in `DEQUEUED` status.
`DEQUEUED` was missing from `NON_FINAL_RUN_STATUSES` so the list hid the
button even though the single run page allowed it.
([#3421](#3421))
- Reduce 5xx feedback loops on hot debounce keys by quantizing
`delayUntil`,
  adding an unlocked fast-path skip, and gracefully handling redlock
contention in `handleDebounce` so the SDK no longer retries into a herd.
([#3453](#3453))
- Fix RSS memory leak in the realtime proxy routes. `/realtime/v1/runs`,
`/realtime/v1/runs/:id`, and `/realtime/v1/batches/:id` called `fetch()`
into Electric with no abort signal, so when a client disconnected mid
long-poll, undici kept the upstream socket open and buffered response
chunks that would never be consumed — retained only in RSS, invisible to
V8 heap tooling. Thread `getRequestAbortSignal()` through
`RealtimeClient.streamRun/streamRuns/streamBatch` to `longPollingFetch`
and cancel the upstream body in the error path. Isolated reproducer
showed ~44 KB retained per leaked request; signal propagation releases
it cleanly.
([#3442](#3442))
- Fix memory leak where every aborted SSE connection pinned the full
request/response graph on Node 20, caused by `AbortSignal.any()` in
`sse.ts` retaining its source signals indefinitely (see
nodejs/node#54614, nodejs/node#55351). Also clear the
`setTimeout(abort)` timer in `entry.server.tsx` so successful HTML
renders don't pin the React tree for 30s per request.
([#3430](#3430))
- Preserve filters on the queues page when submitting modal actions.
([#3471](#3471))
- Fix Redis connection leak in realtime streams and broken abort signal
propagation.
  
**Redis connections**: Non-blocking methods (ingestData, appendPart,
getLastChunkIndex) now share a single Redis connection instead of
creating one per request. streamResponse still uses dedicated
connections (required for XREAD BLOCK) but now tears them down
immediately via disconnect() instead of graceful quit(), with a 15s
inactivity fallback.
  
**Abort signal**: request.signal is broken in Remix/Express due to a
Node.js undici GC bug (nodejs/node#55428) that severs the signal chain
when Remix clones the Request internally. Added getRequestAbortSignal()
wired to Express res.on("close") via httpAsyncStorage, which fires
reliably on client disconnect. All SSE/streaming routes updated to use
it. ([#3399](#3399))
- Prevent dashboard crash (React error #31) when span accessory item
text is not a string. Filters out malformed accessory items in
SpanCodePathAccessory instead of passing objects to React as children.
([#3400](#3400))
- Upgrade Remix packages from 2.1.0 to 2.17.4 to address security
vulnerabilities in React Router
([#3372](#3372))
- Fix Vercel integration settings page (remove redundant section
toggles) and improve the Vercel onboarding flow so the modal closes
after connecting a GitHub repo and the marketplace `next` URL is
preserved across the GitHub app install redirect.
([#3424](#3424))

<details>
<summary>Raw changeset output</summary>

# Releases
## @trigger.dev/build@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

## trigger.dev@4.4.5

### Patch Changes

- Add `--no-browser` flag to `init` and `login` to skip auto-opening the
browser during authentication. Also error loudly when `init` is run
without `--yes` under non-TTY stdin (previously default-and-exited
silently, leaving the project half-initialized). Both commands now show
an `Examples` section in `--help`.
([#3483](#3483))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`
    -   `@trigger.dev/build@4.4.5`
    -   `@trigger.dev/schema-to-json@4.4.5`

## @trigger.dev/core@4.4.5

### Patch Changes

- Add `isReplay` boolean to the run context (`ctx.run.isReplay`),
derived from the existing `replayedFromTaskRunFriendlyId` database
field. Defaults to `false` for backwards compatibility.
([#3454](#3454))
- Redact the `resolveWaitpoint` runtime log so it only emits `id` and
`type` instead of the full completed waitpoint. Previously the log
printed the entire waitpoint (including `output`) to stdout in
production runs, which could leak sensitive payloads. The value returned
by `wait.forToken()` is unchanged.
([#3490](#3490))
- Add `SessionId` friendly ID generator and schemas for the new durable
Session primitive. Exported from `@trigger.dev/core/v3/isomorphic`
alongside `RunId`, `BatchId`, etc. Ships the
`CreateSessionStreamWaitpoint` request/response schemas alongside the
main Session CRUD.
([#3417](#3417))
- Truncate large error stacks and messages to prevent OOM crashes. Stack
traces are capped at 50 frames (keeping top 5 + bottom 45 with an
omission notice), individual stack lines at 1024 chars, and error
messages at 1000 chars. Applied in parseError, sanitizeError, and OTel
span recording.
([#3405](#3405))

## @trigger.dev/python@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`
    -   `@trigger.dev/build@4.4.5`
    -   `@trigger.dev/sdk@4.4.5`

## @trigger.dev/react-hooks@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

## @trigger.dev/redis-worker@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

## @trigger.dev/rsc@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

## @trigger.dev/schema-to-json@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

## @trigger.dev/sdk@4.4.5

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.5`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants