About TurnWise

Building the future of LLM conversation evaluation

Our Mission

TurnWise was born from a simple observation: evaluating multi-turn LLM conversations is hard, and existing tools don't cut it. We're on a mission to provide developers and researchers with powerful, flexible tools to measure, analyze, and improve AI agent conversations at every level—from individual reasoning steps to entire conversation flows.

What We Do

TurnWise is a comprehensive platform for evaluating multi-turn LLM conversations. We provide:

  • Hierarchical Evaluation: Evaluate conversations, messages, or individual reasoning steps
  • Custom Metrics: Create your own evaluation metrics with prompts and structured outputs
  • Rolling Summaries: Automatically maintain compressed summaries of long conversations
  • Evaluation Pipelines: Build reusable workflows for consistent evaluation across datasets
  • Comprehensive Analytics: Track performance and costs, and surface insights at every level
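
Curious what this looks like in code? Below is a minimal sketch of how a custom metric and an evaluation run might be wired up with the Python SDK. The names used here (turnwise.Client, create_metric, evaluate, the level parameter) are illustrative placeholders, not the published API, so check the docs for the real interface.

```python
# Hypothetical sketch only: the turnwise package, Client, and method
# names below are assumptions for illustration, not the published API.
import turnwise

client = turnwise.Client(api_key="YOUR_API_KEY")

# Custom metric: your own prompt plus a structured output schema.
helpfulness = client.create_metric(
    name="helpfulness",
    prompt="Rate how helpful the assistant's reply is, from 1 to 5.",
    schema={"score": "integer", "rationale": "string"},
)

# Hierarchical evaluation: score a short multi-turn conversation
# at the message level ("conversation" and "step" would also work).
result = client.evaluate(
    conversation=[
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ],
    metrics=[helpfulness],
    level="message",
)

for score in result.scores:
    print(score.metric, score.value, score.rationale)
```

The same metric object could then be reused inside an evaluation pipeline, which is how consistent scoring across a whole dataset would work.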

Hey, I'm Julien 👋

So here's the thing. I built TurnWise because I was frustrated. I kept building AI agents and had no good way to actually measure if they were getting better or worse over time. The existing tools? They either didn't handle multi-turn conversations well, or they were way too complicated.

I'm a developer who's been working with ML and AI systems for a while now. After building a bunch of AI products, I realized we needed something better for evaluation. So I built the tool I wish I had from the start.

TurnWise isn't perfect (nothing is), but it's honest, open-source, and built for developers who actually need to ship stuff. If you're building AI agents and want to know if they're actually good, this is for you.

What People Are Saying

"Finally, a tool that actually understands multi-turn conversations. We've been using TurnWise to evaluate our customer support agents and it's been a game-changer. The hierarchical evaluation is exactly what we needed."

Sarah Martinez, ML Engineer @ TechCorp

"The rolling summaries feature saved us so much time. We were hitting token limits constantly, and now we can evaluate really long conversations without issues. Plus, the Python SDK makes integration super easy."

Alex Chen, Founder @ AI Startup

"I love how flexible the metric system is. Being able to create custom evaluation metrics with my own prompts and schemas means I can evaluate exactly what matters for our use case. The UI is clean too, which is rare for dev tools."

Jordan Davis, Research Scientist

"The step-level evaluation is incredible. We can now see exactly where our agents are making mistakes in their reasoning, not just at the message level. This level of granularity is exactly what we needed for debugging."

Ryan Kim, AI Product Manager

Our Values

Developer-First

We build tools that developers actually want to use. Simple APIs, clear documentation, and powerful features without the bloat.

Open & Transparent

TurnWise is open source. We believe in transparency, community contributions, and building in the open.

Quality Focused

We're obsessed with building high-quality evaluation tools that provide real insights, not just metrics for metrics' sake.

Continuous Improvement

The AI landscape evolves rapidly. We're committed to continuously improving TurnWise to meet the changing needs of developers.

Join Us on This Journey

Whether you're evaluating your first LLM conversation or managing thousands, TurnWise is here to help you build better AI applications.