Stop Stuffing Prompts: How Asana Made Agents More Effective Through Context Engineering

Megha Bindiganavale
September 17th, 2025
4 min read

Context engineering has become a hot term in the AI world, but Asana has been implementing it to improve our products for nearly a year. Context engineering is at the heart of Asana AI Studio and our forthcoming agent products. Here’s what we've learned about building effective AI systems at scale.

Our system doesn’t rely on retrieving more data to fill the context window; it relies on using the window more efficiently by understanding the user’s intent. We've learned that we get better results for our customers when we analyze the query and load only the smallest set of relevant data possible.

The Context Crisis

Large Language Models (LLMs) face a fundamental problem: more data doesn’t always mean better answers. Bigger context windows create the temptation to load more information, but models don’t attend equally to everything. The result is slower, less accurate responses. By stripping away irrelevant context, we help models focus on the signals that matter most.

In Asana, users ask complex questions through AI Studio and AI chat that span the breadth of their work. For example, "Leave a comment describing the potential dependencies for the incoming task" or "What are Q3 roadmap risks?" Answering these requires more than semantic similarity: business context, time relevance, and intent all matter. This raised a critical question for us: how do you keep context quality high as data volume grows?

Where “retrieve and stuff” fails

Many RAG (Retrieval-Augmented Generation) systems follow the same recipe: embed the query, grab similar docs, stuff them into the prompt, and hope the model sorts it out. It’s become an industry standard, but at scale this approach falls apart. Important details get buried, performance drags, and costs balloon.

The first challenge lies in the limitations of similarity search itself. Vector similarity captures semantic meaning but often misses user intent entirely. A completed task from last year might rank higher than an urgent current task simply because it has better vector similarity, despite being irrelevant to what the user actually needs to know.

Context window challenges compound these problems. While larger context windows create the temptation to include more information, more context often means more noise. LLMs struggle with what researchers call the "lost in the middle" phenomenon, where relevant information gets buried within irrelevant details, leading to degraded performance even when the critical information is present.

At Asana scale, filling the context window with irrelevant information degrades performance, increases latency, and costs a pretty penny. Most existing solutions treat context management as a post-processing problem, applying techniques like naive truncation or sliding windows after retrieval.

Intent-Augmented Retrieval

Our approach flips the script: use query intent to guide every stage of data loading. We call this intent-augmented retrieval. It works in two phases: filter first, then sort and summarize before sending to the final LLM. At every step, we apply filtering based on our understanding of what the query actually needs in order to be resolved.
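
The sketch below shows the overall shape of such a two-phase flow. It is a minimal illustration rather than Asana's actual implementation: the `QueryIntent` fields and the injected callables (`extract_intent`, `fetch`, `rerank`, `summarize`, `llm`) are hypothetical names standing in for the real components described in the rest of this post.

```python
from dataclasses import dataclass, field

@dataclass
class QueryIntent:
    """Hypothetical structured interpretation of a user query."""
    filters: dict = field(default_factory=dict)  # e.g. project scope, due dates, status
    fields: list = field(default_factory=list)   # object fields the answer actually needs
    max_results: int = 20                        # how much data is worth loading


def answer_query(query: str, extract_intent, fetch, rerank, summarize, llm) -> str:
    """Two-phase flow: filter first, then sort and summarize before the final LLM call.

    The collaborators are injected as plain callables so the sketch stays
    backend-agnostic; none of these names are Asana's real APIs.
    """
    intent = extract_intent(query)                # phase 1: understand the query
    candidates = fetch(intent)                    # load only what the intent allows
    ranked = rerank(query, candidates)            # phase 2: order by true relevance
    context = summarize(query, ranked[: intent.max_results])
    return llm(query, context)
```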

Always filter first

Our most important lesson: filter before you fetch. Instead of retrieving broadly and trimming later, we analyze query intent upfront. An LLM transforms a natural query like “Show me overdue tasks in the marketing project” into structured filters for scope, due dates, and completion status, eliminating irrelevant data before it ever hits the database.
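
As a rough illustration of that first step, the snippet below uses the OpenAI Python SDK as a stand-in model provider (the post doesn't name one) to turn a natural-language query into a small JSON filter object. The prompt, model choice, and filter schema are all assumptions for the sake of the example, not our production setup.

```python
import json
from openai import OpenAI  # illustrative provider, not necessarily what Asana uses

client = OpenAI()

FILTER_PROMPT = (
    "Convert the user's request into retrieval filters. "
    "Respond with JSON containing: scope (project or team name, or null), "
    "due_before (ISO date or null), completed (true, false, or null)."
)

def extract_filters(query: str) -> dict:
    """Ask a small LLM call to turn a natural-language query into structured filters."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative model choice
        response_format={"type": "json_object"},  # force parseable output
        messages=[
            {"role": "system", "content": FILTER_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return json.loads(response.choices[0].message.content)

# extract_filters("Show me overdue tasks in the marketing project")
# might return {"scope": "marketing", "due_before": "2025-09-17", "completed": false}
```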

In parallel, we cut noise with additional filters: object field filtering keeps only the specific fields the query actually needs, quantity filtering determines the appropriate data volume, and source strategy filtering routes the query to the most relevant systems from the outset. Query context analysis reveals that "What's overdue?" requires due dates and completion status, while "Who's working on X?" needs assignee and collaboration data. By filtering at the field level, we cut token usage by ~40% while preserving accuracy, and because the filters run in parallel, they return in under 180ms at the 95th percentile.
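
Here is one way such a parallel planning step could look. The heuristics are deliberately toy stand-ins for the real field, quantity, and source classifiers; the point is that the three analyses are independent of one another and can run concurrently.

```python
import asyncio

async def field_filter(query: str) -> list[str]:
    # Toy heuristic standing in for a real classifier or LLM call.
    if "overdue" in query.lower():
        return ["name", "due_date", "completed"]
    if "working on" in query.lower():
        return ["name", "assignee", "collaborators"]
    return ["name", "due_date", "assignee", "completed"]

async def quantity_filter(query: str) -> int:
    # Narrow questions need a handful of results; broad ones need more.
    return 5 if "overdue" in query.lower() else 25

async def source_filter(query: str) -> list[str]:
    # Route to the systems most likely to hold the answer.
    return ["tasks"] if "task" in query.lower() else ["tasks", "docs", "goals"]

async def plan_retrieval(query: str) -> dict:
    # The three analyses are independent, so they run concurrently and the
    # added latency is roughly that of the slowest single call.
    fields, limit, sources = await asyncio.gather(
        field_filter(query), quantity_filter(query), source_filter(query)
    )
    return {"fields": fields, "limit": limit, "sources": sources}

# asyncio.run(plan_retrieval("What's overdue in the marketing project?"))
```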

Sort & summarize with intent

Even with dramatically improved retrieval, we still need to fit within token limits while maximizing information density. Our context refinement process employs several techniques to achieve this balance, starting with advanced reranking that goes far beyond similarity scores. Cross-encoders score results by considering both the query and the content, not just vector proximity. Ordering matters: LLMs pay more attention to what comes first, so better-ranked context produces clearer answers. 
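
A cross-encoder reranking step can be sketched with an off-the-shelf model from the sentence-transformers library. The checkpoint named here is an illustrative public one, not the model we run in production.

```python
from sentence_transformers import CrossEncoder  # illustrative library choice

# A small publicly available cross-encoder used here as a stand-in.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Score each (query, candidate) pair jointly and keep only the best results.

    Unlike vector similarity, the cross-encoder reads the query and the
    candidate together, so intent-relevant results rise to the top and the
    best-ranked context can be placed first in the prompt.
    """
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```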

We also summarize long attachments with the question in mind. Ask about risks, and we pull risk-related points from documents; ask about timelines, and we surface dates and dependencies. Intent-driven summarization ensures the right details make it in without overwhelming the model.
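
An intent-driven summarization call might look like the following, again using an illustrative provider, model, and prompt rather than our production setup.

```python
from openai import OpenAI  # illustrative provider, as in the earlier sketch

client = OpenAI()

def summarize_for_intent(query: str, document: str, max_words: int = 150) -> str:
    """Condense a long attachment, keeping only details relevant to the question."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": (
                 f"Summarize the document in at most {max_words} words, "
                 "keeping only details that help answer the user's question. "
                 "If the question is about risks, keep risk-related points; "
                 "if it is about timelines, keep dates and dependencies."
             )},
            {"role": "user", "content": f"Question: {query}\n\nDocument:\n{document}"},
        ],
    )
    return response.choices[0].message.content
```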

Production Impact & Results

We tested these strategies in production with AI Chat, measuring results across hundreds of thousands of queries and a robust evaluation set. Each step improved both efficiency and quality.

In the end, we achieved a 35% reduction in total input tokens while improving response times by 24% at the 95th percentile. We also reduced cost per call by 30%, showing that better performance and lower costs can coexist.

  • Cross-encoder reranking: Reduced input tokens by 40% and improved response times by enabling fewer, better-ranked results.

  • Object field filtering: Contributed approximately 20% token savings by choosing only relevant fields.

  • Result quantity filtering: Improved assertion accuracy from 92-94% to 95-96%.

Figure: Before and After Impact of Context Engineering Strategies (bar chart comparing tokens, response time, and cost before and after the optimizations).

Figure: Effect of Reranking and Result Quantity Selection (bar charts showing how reranking reduces the number of tasks used, with the largest reduction for simple queries).

What we’ve learned

Putting user intent at the center has become a cornerstone of our context engineering. Knowing what people want before loading data leads to cascading benefits: fewer tokens, faster responses, and higher answer quality. It’s a sharp contrast to the traditional “stuff everything in” approach.

Of course, intent isn’t perfect, so edge cases still crop up. We handle them with fallbacks and post-processing to preserve quality. And while larger context windows make it tempting to load more data, we’ve found that staying intent-driven is what keeps solutions scalable.

Looking ahead, we’re exploring how GraphRAG’s structured knowledge graphs could complement our dynamic methods. We expect future enterprise AI systems will combine approaches to manage context at scale, with intent-augmented retrieval as a key building block.

Author Biography

Megha Bindiganavale is a Software Engineer on the AI Retrieval team, creating RAG-based tooling to ingest Asana and external data so developers can launch AI features quickly and reliably.

Team Shout Outs

The architecture and implementation of the RAG-based tooling at Asana has been a huge team effort, involving everyone on the team: Bradley Portnoy, Aaron Vinh, Charlie McBride, Zhou Lin, Uday Saraf, Tiger He, and Eric Zhao.
