CollabLLM: From Passive Responders to Active Collaborators

1Stanford University, 2Microsoft, 3Georgia Tech
ICML 2025 Oral (1.0% of all submissions)

Make Your LLMs Active Collaborators

CollabLLM is a unified fine-tuning framework that optimizes LLMs for effective and efficient multiturn collaboration with users.

$ conda create -n collabllm python=3.10
$ conda activate collabllm
$ pip install collabllm
$
$ git clone https://github.com/Wuyxin/collabllm.git
$ python -m scripts.engine.build_dataset <...>
$ python -m scripts.train.{sft/*_dpo/ppo} <...>

What is missing from current LLMs?

LLMs act as passive responders, especially when faced with ambiguous inputs. They don't naturally help users explore their needs in multiturn interactions or offer suggestions for next steps.

Why do LLMs fail to understand users?

Most LLMs are tuned on single-turn human preferences. These single-turn rewards encourage models to generate responses that may NOT be useful in the long term.

How do we build collaborative LLMs?

CollabLLM rewards LLM responses based on their long-term impact on the conversation. Fine-tuned with these long-term, interaction-level rewards, LLMs actively seek information and collaborate more effectively with users.

What Users Said About CollabLLM

“Efficient”

I was surprised by the first response. I was expecting a quick summary related to my prompt, but instead the AI asked me some questions. I think this style worked well. I felt like I had to do less editing to personalize the review.

“Stimulates Creativity”

Asking questions and making you think of things you never thought of.

The AI assistant listened extremely well and offered suggestions that made sense as if it were a real conversation

“Safer”

The AI assistant told me why it wouldn't be helpful for this case.

It helped really well to navigate what to say and what information is needed.

Ready to Make Your LLMs Collaborative?

Our code makes it easy to train more collaborative LLMs on your own tasks. Don't waste time interacting with LLMs that fail to understand your needs; start building collaborative LLMs instead!

From the Blog

Insights and updates from our research team

Building the Future of Collaborative AI: Our Journey with CollabLLM

June 12, 2025
6 min read
Shirley Wu, Michel Galley

"The future of AI isn't just about making models smarter—it's about making them truly collaborative partners in human endeavors."

The Challenge We Set Out to Solve

When we first started working with large language models, we noticed something puzzling: these models were incredibly capable, yet we all kept running into a particular kind of frustration, illustrated perfectly by this example from Casey Newton:

My most frustrating experience with Operator was my first one: trying to order groceries.

“Help me buy groceries on Instacart,” I said, expecting it to ask me some basic questions: Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want?

It didn’t ask me any of that. Instead, Operator opened Instacart in a browser tab and began searching for milk in grocery stores located in Des Moines, Iowa.

It’s genuinely surprising: one of the smartest LLMs—capable of solving graduate-level math problems—can still fail at basic human communication.

This is not a minor flaw. LLMs that lack effective communication skills pose challenges across key dimensions: performance, safety, and efficiency. Ask yourself:

  • How can we get satisfactory results if LLMs make assumptions about our preferences?
  • How reliable is it to consult AI on healthcare, legal, or financial decisions?
  • How much time and patience are we expected to waste just trying to get our point across?

The problem runs deeper. We typically evaluate LLMs in simple, sanitized test environments—single-turn prompts with clear, unambiguous instructions. But is that how real communication works?

In real life, solving meaningful problems requires collaboration, iteration, and contextual awareness. Moreover, if humans and LLMs are going to tackle groundbreaking problems together, AI systems can't just passively respond to human requests—they need to actively stimulate human creativity and guide the collaborative process.

That’s why we’re introducing CollabLLM: a framework designed to unlock the potential of human-AI collaboration by enabling LLMs to act as active, collaborative partners rather than passive responders.

Our Breakthrough Approach

The core idea behind CollabLLM is simple: in a multi-turn interaction, what matters most is not how good a single response is—but how it affects the rest of the conversation.

Take this scene from Friends (4:05 in the YouTube clip / 1:42 in the Bilibili clip): Rachel and Joey are talking about dating strategies. Rachel asks a seemingly simple question: "So, where'd you grow up?" Joey immediately mocks her—"That's your move?"—implying the question is naive. But a few turns later, his tone changes. He's genuinely impressed: "Wow!"—because the question led him to open up and connect. The key insight? What matters isn't how a response is judged in the moment, but how it shapes the entire conversation.

Now imagine a model that chooses to ask a clarifying question instead of giving a direct answer. Standard reinforcement learning from human feedback (RLHF) might penalize that—it didn't provide information right away. But if the question helps uncover useful context that improves the conversation downstream, shouldn't it be rewarded?

That's exactly what CollabLLM does. We define a new reward function that measures the causal effect of a model's response on the future trajectory of a conversation. We call this the Multiturn-aware Reward (MR). It evaluates a single model action based on its longer-term impact—not just immediate helpfulness.
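To make the idea concrete, here is a minimal, self-contained sketch of an MR-style estimate: roll the conversation forward with a user simulator, score each simulated future, and average. All the helper names (`simulate_user`, `assistant_followup`, `score_conversation`) are deterministic toy stand-ins for illustration, not the CollabLLM API—in a real pipeline each would be an LLM call.

```python
# Toy sketch of a Multiturn-aware Reward (MR) estimate. The helpers below
# are illustrative stand-ins, not CollabLLM's actual implementation.

def simulate_user(history):
    # Stand-in user simulator: a clarifying question elicits useful context.
    if history[-1]["content"].endswith("?"):
        return {"role": "user", "content": "I live in Seattle and shop at QFC."}
    return {"role": "user", "content": "Thanks."}

def assistant_followup(history):
    # Stand-in policy model continuing the conversation.
    return {"role": "assistant", "content": "Here is your personalized order."}

def score_conversation(history):
    # Stand-in task evaluator: reward conversations that surfaced user context.
    return 1.0 if any("Seattle" in m["content"] for m in history) else 0.0

def multiturn_aware_reward(history, response, horizon=2, n_samples=4):
    """Estimate MR: average task score over simulated futures after `response`."""
    total = 0.0
    for _ in range(n_samples):
        rollout = history + [response]
        for _ in range(horizon):
            rollout.append(simulate_user(rollout))
            rollout.append(assistant_followup(rollout))
        total += score_conversation(rollout)
    return total / n_samples

history = [{"role": "user", "content": "Help me buy groceries on Instacart."}]
clarify = {"role": "assistant",
           "content": "Sure! Where do you live, and which store do you use?"}
direct = {"role": "assistant",
          "content": "Searching for milk in Des Moines, Iowa."}

print(multiturn_aware_reward(history, clarify))  # → 1.0
print(multiturn_aware_reward(history, direct))   # → 0.0
```

The clarifying question earns the higher reward here not because questions are intrinsically good, but because the simulated future it produces scores better on the task metric—exactly the causal, longer-term credit assignment MR is meant to capture.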

Quiz: is asking a question always better than giving an answer? The answer is—not necessarily. It depends entirely on the objective. In most real-world situations, repeatedly asking questions without making progress is inefficient, because the ultimate goal remains unmet. But take the game 20 Questions as an example—where the objective is to guess what someone is thinking by asking a limited number of yes/no questions. In that case, asking questions is essential, and giving an answer too early would break the format and defeat the purpose of the game. This is where Multiturn-aware Reward (MR) comes in: it allows the model to adapt its behavior based on the context, learning when to ask and when to answer—depending entirely on what the task requires.

Now, going back to the Friends example with Rachel and Joey—how do we measure the value of Rachel's question over the course of a conversation? We need two components:
1) A user simulator to generate realistic follow-up responses (e.g., what Joey might say next), and
2) An evaluator to judge whether the interaction is successful—such as whether Joey becomes more romantically engaged.

Fortunately, both parts are quite feasible. First, the model you're training—let's call it "Rachel"—serves as the policy model generating responses. To simulate realistic dialogue, we prompt another model to act as "Joey," a proxy for the user. While inspired by our earlier example, "Joey" can represent any user simulator: a shopper trying to order groceries, a student asking math questions, or a writer seeking feedback. Second, we define task-specific metrics to evaluate success. In the dating example, it might be emotional engagement; in writing, it could be clarity or persuasiveness; in a question-answering task, it might be factual correctness. These evaluation criteria can even be combined—it's entirely up to your application!
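Since the evaluation criteria "can even be combined," here is a hedged sketch of what a combined task evaluator might look like. The metric names, weights, and the keyword check are all illustrative assumptions, not CollabLLM's actual configuration; the point is simply that task success and interaction efficiency can be blended into one score.

```python
# Hypothetical combined evaluator: weighted task success + efficiency.
# Metric choices and weights are illustrative, not CollabLLM's defaults.

def length_penalty(conversation):
    # Fewer turns → more efficient interaction (clamped to [0, 1]).
    return max(0.0, 1.0 - 0.1 * len(conversation))

def contains_answer(conversation, keyword):
    # Toy "task success" check: did the final message mention the keyword?
    return 1.0 if keyword in conversation[-1]["content"] else 0.0

def evaluate(conversation, keyword="42", weights=(0.8, 0.2)):
    """Weighted combination of task success and efficiency."""
    w_task, w_eff = weights
    return (w_task * contains_answer(conversation, keyword)
            + w_eff * length_penalty(conversation))

convo = [
    {"role": "user", "content": "What is 6 * 7?"},
    {"role": "assistant", "content": "6 * 7 = 42."},
]
print(evaluate(convo))  # ≈ 0.96  (0.8 * 1.0 + 0.2 * 0.8)
```

Swapping in a different `evaluate` (emotional engagement, clarity, factual correctness) is all it takes to retarget the reward to a new task, which is what makes the framework task-agnostic.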

With Multiturn-aware Reward in place, the goal becomes straightforward: train the policy model to maximize this reward. In doing so, the model learns to drive the conversation effectively toward the desired outcome—whether that's solving a task, clarifying a request, or building rapport.
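One plausible way to plug an interaction-level reward into existing fine-tuning recipes (the quickstart mentions SFT, DPO, and PPO scripts) is to score sampled candidate responses with MR and build preference pairs from the best and worst. This sketch is an assumption about how such a pipeline could look, not necessarily the exact CollabLLM implementation; `toy_reward` stands in for a real MR estimate.

```python
# Hedged sketch: turning interaction-level reward scores into
# DPO-style preference pairs. Not necessarily CollabLLM's exact pipeline.

def build_preference_pair(history, candidates, reward_fn):
    """Rank candidate responses by reward; pair the best against the worst."""
    scored = sorted(candidates, key=lambda r: reward_fn(history, r), reverse=True)
    return {"prompt": history, "chosen": scored[0], "rejected": scored[-1]}

def toy_reward(history, response):
    # Stand-in for a multiturn-aware reward: prefer clarifying questions.
    return 1.0 if response.endswith("?") else 0.0

pair = build_preference_pair(
    history=[{"role": "user", "content": "Help me plan a trip."}],
    candidates=[
        "Book a flight to Paris.",
        "What dates and budget do you have in mind?",
    ],
    reward_fn=toy_reward,
)
print(pair["chosen"])  # → "What dates and budget do you have in mind?"
```

Because only the reward function changes, the downstream trainer (DPO, PPO, or plain SFT on the chosen responses) stays standard—echoing the point that no massive architectural changes are needed, just a longer-horizon objective.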

After all, you don't need massive changes to build a collaborative model. Just a new way to define the objective—and a longer lens for measuring what matters in a conversation.

Real-World Impact

The applications of collaborative AI are vast and exciting. From working on document editing to solving complex scientific problems, CollabLLM opens up new possibilities for human-AI collaboration.

We've seen remarkable results in our initial testing, with collaborative LLMs outperforming non-collaboratively trained LLMs across various benchmarks. More importantly, users report a more efficient, engaging, and reliable interaction experience when working with the collaborative LLMs.

What's Next?

We're continuously refining our approach, exploring new collaboration patterns. Our goal is to democratize collaborative AI and enable anyone to build more effective AI-powered solutions.

Join us in building the future of collaborative AI. Check out our code, contribute to the project, and help us shape the next generation of AI systems that truly understand the power of working together.