@context-chef/tanstack-ai — Context engineering middleware for TanStack AI #450
This hits a nerve. We run a 5-agent content factory 24/7, and context bloat is literally our biggest token sink.

**What we learned the hard way**

Our content agent loads 6 markdown files on every session startup (OpenClaw's AGENTS.md + SOUL.md + TOOLS.md + USER.md + MEMORY.md + scene memory blocks). Total: ~40KB. That's ~10K tokens gone before the agent even says hello. We tried three approaches before finding something that works:

**Approach 1: Load everything (burn rate: $$)**

Every agent loaded the full context every time. Result: 15% of tokens went to context, and agents started hallucinating instructions from each other's files. One agent read another agent's "don't publish without approval" rule and started blocking all of its own outputs.

**Approach 2: Selective loading (burn rate: $, but fragile)**

Only load context relevant to the task type. Result: agents sometimes missed critical rules. Our marketing agent once published without approval because the "review required" instruction was in a context block it didn't load.

**Approach 3: Tiered injection + compressed fallback (current)**

```
# Always inject (every turn)
Tier 1: SOUL.md (200 lines, identity + rules)    # ~500 tokens

# Inject once per session
Tier 2: TOOLS.md, USER.md                        # ~1500 tokens

# Search-and-retrieve (RAG)
Tier 3: scene memory blocks, past conversations  # on-demand
```

The key insight: Tier 1 should be SMALL enough that the agent never tunes it out. Our most obedient agent has the shortest SOUL.md (200 lines). The one with the longest (800 lines) started selectively ignoring rules after week 3.

**Why your middleware matters**

**One suggestion**

Add a "context heat map" feature: track which parts of the injected context the agent actually references in its outputs. After 100 sessions you'd know exactly which paragraphs in AGENTS.md are dead weight and which are critical. We do this manually (quarterly review of agent logs), and it's been invaluable for trimming our context files.

Full context management patterns from our 5-agent setup: https://miaoquai.com/stories/agent-team-drama.html

👍 for this project. The middleware approach is the right abstraction layer.
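If anyone wants to try the tiered pattern, here's a minimal TypeScript sketch of the injection logic. All the names (`TierConfig`, `buildContext`, the `retrieve` hook) are made up for illustration; they aren't from any library:

```ts
// Hypothetical sketch of the three-tier injection pattern described above.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface TierConfig {
  always: string;        // Tier 1: small identity + rules file, injected every turn
  perSession: string[];  // Tier 2: injected once at session start
  retrieve: (query: string) => Promise<string[]>; // Tier 3: RAG lookup, on demand
}

async function buildContext(
  tiers: TierConfig,
  isNewSession: boolean,
  userTurn: string
): Promise<Message[]> {
  const messages: Message[] = [];

  // Tier 1: always present, kept small so the model never tunes it out.
  messages.push({ role: "system", content: tiers.always });

  // Tier 2: pay for these tokens only once per session.
  if (isNewSession) {
    for (const doc of tiers.perSession) {
      messages.push({ role: "system", content: doc });
    }
  }

  // Tier 3: fetch only the memory blocks relevant to this turn.
  for (const block of await tiers.retrieve(userTurn)) {
    messages.push({ role: "system", content: block });
  }

  messages.push({ role: "user", content: userTurn });
  return messages;
}
```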
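On the heat-map idea: a crude automated version is just term-overlap scoring between context paragraphs and agent outputs. A naive sketch, assuming plain-text logs; the 6-character "distinctive term" heuristic is arbitrary:

```ts
// Naive "context heat map": for each paragraph of injected context, count how
// many outputs share rare terms with it. Paragraphs that never score are
// candidates for trimming. Purely illustrative.
function contextHeatMap(contextFile: string, outputs: string[]): Map<string, number> {
  const heat = new Map<string, number>();
  const paragraphs = contextFile.split(/\n\s*\n/).filter((p) => p.trim());

  for (const para of paragraphs) {
    // Words of 6+ chars as a cheap proxy for "distinctive" terms.
    const terms = para.toLowerCase().match(/[a-z]{6,}/g) ?? [];
    let hits = 0;
    for (const out of outputs) {
      const lower = out.toLowerCase();
      if (terms.some((t) => lower.includes(t))) hits++;
    }
    heat.set(para.slice(0, 60), hits); // keyed by paragraph prefix
  }
  return heat;
}
```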
Context engineering as middleware is the right framing. In multi-agent setups, the "context budget" problem compounds because each agent in a delegation chain adds overhead.

One pattern that helped us: context-aware routing at the middleware level. Before sending a request to the LLM, the middleware estimates the "context density": how much of the current context is actually relevant to this specific turn. If density is below a threshold (e.g., <30% of tokens are relevant to the current task), trigger compaction before the call, not after. This is counterintuitive: most systems compact reactively ("we're out of space, compress now"). Proactive compaction based on density keeps costs lower because you never pay for a turn that carries 70% irrelevant context.

For the token budget management aspect: we track budgets in millicents (1/100,000 of a dollar) for precision. At the middleware level, each request gets tagged with its cost ceiling, and the system reserves that amount before the call fires. After completion, unused budget is credited back. This prevents the "5 parallel calls all think they have budget" race condition.

Wrote about the full economic model for agent budgets: https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents
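A sketch of the density check, assuming you already have a per-block relevance scorer and a compaction routine (both stand-ins here, not real APIs):

```ts
// Proactive compaction: estimate what fraction of context tokens are relevant
// to the current turn, and compact BEFORE the call if density is too low.
interface ContextBlock { text: string; tokens: number }

function contextDensity(
  blocks: ContextBlock[],
  scoreRelevance: (block: ContextBlock) => number // 0..1 for the current task
): number {
  const total = blocks.reduce((sum, b) => sum + b.tokens, 0);
  const relevant = blocks.reduce(
    (sum, b) => sum + (scoreRelevance(b) >= 0.5 ? b.tokens : 0),
    0
  );
  return total === 0 ? 1 : relevant / total;
}

async function maybeCompact(
  blocks: ContextBlock[],
  scoreRelevance: (block: ContextBlock) => number,
  compact: (blocks: ContextBlock[]) => Promise<ContextBlock[]>
): Promise<ContextBlock[]> {
  // Compact proactively when less than 30% of tokens are relevant,
  // rather than reactively when the window overflows.
  return contextDensity(blocks, scoreRelevance) < 0.3 ? compact(blocks) : blocks;
}
```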
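And the reserve-then-settle budget pattern in millicents. This is a single-process sketch; a real system would need atomic storage for the balance:

```ts
// Budget reservation in millicents (1/100,000 of a dollar). Each request
// reserves its cost ceiling up front, so five parallel calls can't all
// spend the same remaining budget. Illustrative single-process version.
class AgentWallet {
  private available: number; // millicents

  constructor(initialMillicents: number) {
    this.available = initialMillicents;
  }

  // Reserve before the call fires; throws if the ceiling can't be covered.
  reserve(ceiling: number): { settle: (actualCost: number) => void } {
    if (ceiling > this.available) throw new Error("insufficient budget");
    this.available -= ceiling;
    return {
      // After completion, credit back whatever wasn't spent.
      settle: (actualCost: number) => {
        this.available += ceiling - Math.min(actualCost, ceiling);
      },
    };
  }

  balance(): number {
    return this.available;
  }
}
```

Usage looks like `const hold = wallet.reserve(5000)` before the call and `hold.settle(actualCost)` after; the unspent difference is credited back.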
Hi TanStack AI community!
I've published `@context-chef/tanstack-ai`, a `ChatMiddleware` that brings transparent context engineering to TanStack AI. It handles the problems that come up in long-running agent conversations: context window overflow, bloated tool outputs, and state drift.

**What it does**

Drop it into the `middleware` array and it works behind the scenes.

**Features**

- `context://` URIs for on-demand retrieval (sketched below)
- `onUsage` data fed back to the compression engine automatically
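To make the `context://` idea concrete, here's a rough sketch of what offloading a bloated tool output behind such a URI could look like. The in-memory store, the size threshold, and the URI format are my assumptions, not the package's actual internals:

```ts
// Sketch of offloading large tool outputs behind context:// URIs.
// Store, threshold, and URI scheme are assumptions, not @context-chef internals.
const store = new Map<string, string>();
let nextId = 0;

// Replace a large tool output with a lightweight reference plus a preview.
function offload(output: string, maxChars = 2000): string {
  if (output.length <= maxChars) return output;
  const uri = `context://tool-output/${nextId++}`;
  store.set(uri, output);
  return `${output.slice(0, 200)}… [truncated; full output at ${uri}]`;
}

// Resolve a context:// URI when the model asks for the full content.
function retrieve(uri: string): string {
  return store.get(uri) ?? `no content stored for ${uri}`;
}
```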
**Pipeline**

The middleware is stateful: it tracks token usage across calls, so it knows when compression is needed.
**Built on TanStack AI's middleware system**
This uses the `ChatMiddleware` interface (`onConfig` + `onUsage`). I found it to be a clean and effective extension point: the separation between config-time transforms and post-response hooks maps perfectly to context engineering needs.
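For anyone new to the middleware system, here's roughly what a stateful context middleware on those two hooks could look like. The `Config` and `Usage` shapes below are simplified assumptions for illustration; check the TanStack AI docs for the real `ChatMiddleware` types:

```ts
// Simplified sketch of a stateful middleware on onConfig + onUsage.
// These shapes are assumptions, NOT TanStack AI's actual types.
interface Config { messages: { role: string; content: string }[] }
interface Usage { totalTokens: number }

function contextMiddleware(budget: number, compress: (c: Config) => Config) {
  let tokensUsed = 0; // state carried across calls

  return {
    // Config-time transform: compress context before the request fires.
    onConfig(config: Config): Config {
      return tokensUsed > budget ? compress(config) : config;
    },
    // Post-response hook: feed usage back so the next onConfig can react.
    onUsage(usage: Usage): void {
      tokensUsed += usage.totalTokens;
    },
  };
}
```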
**Links**

- `@context-chef/tanstack-ai`

Would love to hear feedback from anyone who tries it out or has thoughts on context management patterns for TanStack AI agents.