Helicone Guide: Open-Source LLM Observability and Monitoring

Introduction

The rapid shift toward production-grade AI applications has exposed a significant gap in the developer stack: the lack of visibility into Large Language Model (LLM) interactions. Developers often struggle with unpredictable costs, varying latencies, and the “black box” nature of API calls to providers like OpenAI and Anthropic. Helicone emerges as a leading open-source solution to these challenges, providing a high-performance observability layer that requires almost zero configuration. With thousands of stars on GitHub and a growing community of AI engineers, Helicone has become a staple for teams looking to move from prototype to production with confidence. In this guide, we will explore how Helicone provides the critical monitoring infrastructure needed to scale AI products effectively.

What Is Helicone?

Helicone is an open-source LLM observability platform that functions as a smart proxy for your AI traffic. By sitting between your application and your LLM provider, Helicone automatically logs requests, tracks token usage, and monitors response times without requiring complex SDK integrations. The codebase is written primarily in TypeScript, and the proxy is designed to add minimal overhead. The project is licensed under Apache-2.0, making it an attractive choice for enterprise teams that require data sovereignty and the ability to self-host their monitoring infrastructure. Unlike traditional logging tools, Helicone is purpose-built for the unique requirements of generative AI, such as tracking prompt templates and calculating costs across multiple models.

The platform is designed with a “developer-first” philosophy: teams integrate it by changing the API base URL and adding an authentication header, typically a one- or two-line change to the client configuration. This proxy-based approach keeps your application decoupled from the observability logic while still capturing rich metadata for every inference call. Whether you are using OpenAI, Anthropic, or Azure OpenAI, Helicone provides a unified dashboard to visualize your entire AI operation.

Why Helicone Matters

In the current AI landscape, observability is not just a luxury; it is a requirement for operational stability. When an LLM call fails or returns unexpected results, developers need to know exactly what was sent to the model and what the response looked like. Helicone provides this granular visibility, enabling faster debugging and iterative improvement of prompt engineering. Without such a tool, teams often resort to manual logging or building internal dashboards that are difficult to maintain and scale.

Beyond debugging, Helicone addresses the critical issue of cost management. AI inference costs can spiral out of control if not monitored in real time. Helicone breaks down costs by user, by model, and by specific features of your application using custom properties. This allows product managers to understand the unit economics of their AI features and identify opportunities for optimization. Furthermore, Helicone’s edge-caching capabilities can significantly reduce both costs and latency by serving repetitive requests from a global cache instead of hitting the LLM provider every time.
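
The per-user and per-model breakdown described above comes down to multiplying token counts by a per-model price and grouping the results. The following is a minimal sketch of that arithmetic, not Helicone's implementation: the price table, the record fields, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens. Real provider pricing
# differs by model and changes over time.
PRICES_PER_1K = {
    "model-a": {"prompt": 0.50, "completion": 1.50},
    "model-b": {"prompt": 0.01, "completion": 0.03},
}


def request_cost(model, prompt_tokens, completion_tokens):
    """Cost of one request: tokens times the per-1K price for each direction."""
    price = PRICES_PER_1K[model]
    return (
        prompt_tokens * price["prompt"]
        + completion_tokens * price["completion"]
    ) / 1000


def cost_by(records, key):
    """Sum request costs grouped by a record field such as 'user' or 'model'."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += request_cost(
            record["model"],
            record["prompt_tokens"],
            record["completion_tokens"],
        )
    return dict(totals)


# Hypothetical request log entries.
records = [
    {"user": "alice", "model": "model-a", "prompt_tokens": 1000, "completion_tokens": 2000},
    {"user": "bob", "model": "model-b", "prompt_tokens": 1000, "completion_tokens": 1000},
    {"user": "alice", "model": "model-b", "prompt_tokens": 2000, "completion_tokens": 0},
]

per_user = cost_by(records, "user")
per_model = cost_by(records, "model")
```

Grouping by a custom property works the same way: add the property as a field on each record and pass its name as the key.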

Key Features

One-Line Integration: Helicone allows developers to start monitoring by simply changing their API base URL to a Helicone proxy address, requiring zero changes to business logic.
Real-Time Cost Tracking: The platform automatically calculates the cost of every request based on token usage and current model pricing, providing a clear view of your AI spend.
Global Edge Caching: Reduce latency and save money by enabling Helicone’s cache, which stores LLM responses at the edge and serves them for identical subsequent requests.
Custom Properties: Developers can attach custom metadata (like user IDs or session IDs) to requests, allowing for deep filtering and segmenting of performance data in the dashboard.
Prompt Management: Helicone provides tools to version and manage your prompts, making it easy to test new iterations and roll back changes if performance drops.
Request Retries: Automatically handle transient API failures with built-in retry logic, ensuring higher reliability for your end-users without writing custom boilerplate.
User Metrics: Track usage patterns on a per-user basis to identify power users, detect abuse, and understand how different segments interact with your AI features.
Feedback Loops: Integrate user feedback (like thumbs up/down) directly into your logs to create a dataset for fine-tuning and quality evaluation.

How Helicone Compares

When evaluating LLM observability tools, developers often compare Helicone with established players like LangSmith and PromptLayer. The primary differentiator for Helicone is its architectural simplicity and open-source nature. While LangSmith is deeply integrated into the LangChain ecosystem, Helicone remains framework-agnostic, making it easier to use for teams not using LangChain. The following table highlights the core differences between these popular solutions.

Feature	Helicone	LangSmith	PromptLayer
Open Source	Yes (Apache 2.0)	No	No
Integration Method	Proxy or SDK	SDK Only	Proxy or SDK
Self-Hosting	Yes (Docker)	Enterprise plan only	No
Edge Caching	Yes	Limited	Yes

Helicone stands out by offering a freely self-hostable option, which is a critical requirement for companies handling sensitive data or those operating in regulated industries. While LangSmith offers deeper tracing for complex agent chains, Helicone focuses on providing a clean, high-performance gateway that captures what most developers need with significantly less setup effort. PromptLayer is another strong competitor, but its closed-source nature can be a dealbreaker for teams that want full control over their observability stack.

Getting Started: Installation

There are two primary ways to get started with Helicone: using the hosted cloud version for rapid setup, or self-hosting the platform using Docker. Below are the installation steps for the most common environments.

Method 1: Proxy Integration (Easiest)

This method works by routing your API requests through Helicone’s gateway. You don’t need to install any new libraries; simply change the base URL in your existing LLM client.

# For OpenAI Python SDK
from openai import OpenAI

client = OpenAI(
    api_key="your_openai_key",
    base_url="https://oai.hconeai.com/v1",
    default_headers={
        "Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY"
    }
)

Method 2: Self-Hosting with Docker

If you prefer to keep your data within your own infrastructure, you can run Helicone locally using Docker Compose. This requires Docker and Docker Compose to be installed on your machine.

git clone https://github.com/Helicone/helicone.git
cd helicone
docker-compose up

Once the containers are running, you can access the Helicone dashboard at localhost:3000 and point your application traffic to your local proxy instance.
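
One way to switch between the hosted gateway and a self-hosted proxy is to read the base URL from configuration instead of hard-coding it. The sketch below assumes an environment variable named HELICONE_BASE_URL, which is a hypothetical name chosen for this example and not something Helicone defines. The local proxy address depends on your Docker Compose configuration and is not specified in this guide.

```python
import os

# Hosted gateway URL, as used in the proxy integration example in this guide.
HOSTED_GATEWAY = "https://oai.hconeai.com/v1"


def helicone_base_url(env=None):
    """Return the proxy base URL, preferring a self-hosted override.

    HELICONE_BASE_URL is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    # Strip a trailing slash so the URL is in a consistent form.
    return env.get("HELICONE_BASE_URL", HOSTED_GATEWAY).rstrip("/")
```

The returned value can then be passed as base_url when constructing the OpenAI client, so the same application code runs against either deployment.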

How to Use Helicone

Once you have configured the proxy or SDK, using Helicone is largely a passive experience. Every request sent to the LLM will now appear in your Helicone dashboard. However, to get the most value out of the platform, you should leverage custom properties and the caching system. Using custom headers, you can tag requests with specific application context.

For example, if you are building a multi-tenant application, you can tag every request with a Tenant-ID. This allows you to filter your dashboard to see how much each tenant is costing you and whether specific tenants are experiencing higher latency than others. This proactive monitoring helps in identifying performance bottlenecks before they affect the entire user base.
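
The tenant tagging described above can be sketched as a small helper that builds per-request headers. It follows the Helicone-Property-<Name> header pattern used in the examples in this guide. The helper name tenant_headers and the property names passed to it are hypothetical.

```python
def tenant_headers(tenant_id, **properties):
    """Build Helicone custom-property headers for a single request.

    Keyword arguments become additional properties, for example
    app_name="MyAIApp" becomes the header Helicone-Property-App-Name.
    """
    headers = {"Helicone-Property-Tenant-ID": str(tenant_id)}
    for name, value in properties.items():
        # Python keyword names use underscores; header names use hyphens.
        header_name = "Helicone-Property-" + name.replace("_", "-").title()
        headers[header_name] = str(value)
    return headers


# Usage with the OpenAI Python SDK, which accepts per-request headers
# through the extra_headers argument (client creation not shown here):
#
#   client.chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user", "content": "Hello"}],
#       extra_headers=tenant_headers("acme", feature="search"),
#   )
```

Setting the tenant on each request, instead of in default_headers, lets a single client instance serve many tenants.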

Code Examples

The following examples demonstrate how to use advanced Helicone features like caching and custom properties in different programming languages.

Node.js with Custom Properties

import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.hconeai.com/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Property-App-Name": "MyAIApp",
    "Helicone-Property-Environment": "production"
  }
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Explain quantum physics." }]
});

Python with Caching Enabled

from openai import OpenAI

client = OpenAI(
    # api_key is omitted here; the SDK reads OPENAI_API_KEY from the environment
    base_url="https://oai.hconeai.com/v1",
    default_headers={
        "Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY",
        "Helicone-Cache-Enabled": "true"
    }
)

# The first call reaches the provider; identical later calls can be served from Helicone's cache
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
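
To make the idea of serving "identical subsequent requests" concrete, here is a conceptual, in-memory sketch of a response cache keyed on the request body. It illustrates the general technique only. How Helicone actually derives cache keys is not described in this guide, and the class and function names below are hypothetical.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on a hash of the canonicalized request body."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(request_body):
        # Sorting keys makes logically identical bodies hash to the same value.
        canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, request_body, call):
        """Return a cached response, or invoke call(request_body) and store it."""
        cache_key = self.key(request_body)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        response = call(request_body)
        self._store[cache_key] = response
        return response


def fake_provider(request_body):
    """Stand-in for a real LLM call so the sketch runs offline."""
    return {"model": request_body["model"], "answer": "Paris"}
```

Any change to the body, such as a different model or a reworded prompt, produces a different key and therefore a cache miss, which is why caching pays off mainly for repetitive requests.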

Advanced Configuration

Helicone supports several advanced headers that give you fine-grained control over how your traffic is handled. These include rate limiting and request timeouts. For example, you can set a Helicone-RateLimit-Policy header to ensure that a single user cannot exhaust your entire OpenAI quota. This is particularly useful for public-facing AI applications where abuse is a constant risk.
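
As an illustration of the rate limit header, the sketch below assembles a policy string. It assumes the header is spelled Helicone-RateLimit-Policy and that the value follows the form quota;w=window;u=unit;s=segment, which is our understanding of Helicone's documentation and should be checked against the current docs before use. The helper name rate_limit_policy is hypothetical.

```python
def rate_limit_policy(quota, window_seconds, unit=None, segment=None):
    """Format a rate limit policy string.

    Assumed format (verify against Helicone's documentation):
        <quota>;w=<window in seconds>[;u=<unit>][;s=<segment>]
    """
    if quota <= 0 or window_seconds <= 0:
        raise ValueError("quota and window_seconds must be positive")
    parts = [str(quota), f"w={window_seconds}"]
    if unit is not None:
        parts.append(f"u={unit}")
    if segment is not None:
        parts.append(f"s={segment}")
    return ";".join(parts)


# Example: at most 1000 requests per hour, counted per user.
headers = {
    "Helicone-RateLimit-Policy": rate_limit_policy(1000, 3600, segment="user"),
}
```

Building the string in one place keeps policies consistent across services and makes an invalid quota fail early in your own code.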

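Additionally, you can limit what is stored for privacy compliance. Helicone documents Helicone-Omit-Request and Helicone-Omit-Response headers, which tell the proxy not to store the request or response body in its logs. Note that the traffic still passes through the proxy on its way to the provider, so PII (Personally Identifiable Information) that must never leave your environment should be redacted in your application before the request is sent, or the platform should be self-hosted.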
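
As a complement to proxy-side controls, sensitive values can be redacted inside your application before a request is sent, which is the only point at which you fully control what leaves your environment. The following is a minimal sketch that assumes message content is a plain string. The patterns are deliberately simple, will miss many real-world formats, and are not a substitute for a proper PII detection tool. All names here are hypothetical.

```python
import re

# Simple illustrative patterns; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def redact_messages(messages):
    """Return a redacted copy of chat messages without mutating the input."""
    return [{**message, "content": redact(message["content"])} for message in messages]
```

Calling redact_messages on the messages list before passing it to the client means that neither the proxy nor the provider receives the original values.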

Real-World Use Cases

SaaS Cost Optimization: A customer support platform uses Helicone to track which help articles lead to the highest token usage, allowing them to optimize their knowledge base and reduce inference costs by 30%.
Developer Debugging: A startup building an AI coding assistant uses Helicone’s request logs to reproduce subtle bugs where the LLM produces invalid syntax in specific edge cases.
Compliance and Auditing: An enterprise healthcare company self-hosts Helicone to maintain a full audit trail of all AI interactions, ensuring they meet strict regulatory requirements for data logging.
A/B Testing Prompts: A marketing agency uses Helicone’s prompt management to test two different versions of a product description generator, using the dashboard to compare user feedback scores for each version.

Contributing to Helicone

Helicone is an open-source project that thrives on community contributions. The core team encourages developers to report bugs, suggest features, and submit pull requests. If you are interested in contributing, the best place to start is the CONTRIBUTING.md file in the GitHub repository. The project has a clear roadmap and a set of “good first issues” for those new to the codebase.

The codebase is predominantly TypeScript, spanning the dashboard, the backend, and the proxy gateway. This provides opportunities for developers with different interests to contribute. Whether you are improving the UI of the dashboard or optimizing the performance of the proxy gateway, your contributions help make LLM observability accessible to everyone.

Community and Support

The Helicone community is active across several platforms. For technical support and architectural discussions, the Discord server is the primary hub. You can also find the team on Twitter/X, where they share product updates and tips for LLM optimization. Documentation is hosted on a dedicated site, providing comprehensive guides on everything from basic setup to advanced enterprise configurations. If you encounter a bug, the GitHub Issues page is the official place for tracking and resolution.

Conclusion

Helicone is a powerful and flexible tool for developers building in the AI space. Its proxy-based architecture combines ease of use with deep visibility, making it suitable for projects ranging from hobbyist experiments to enterprise-grade applications. By providing real-time logging, cost tracking, and edge caching, Helicone removes much of the guesswork from LLM operations, allowing teams to focus on building great products.

While there are several observability tools on the market, Helicone’s commitment to open-source and its framework-agnostic approach set it apart. Whether you choose to use the hosted cloud version or self-host on your own infrastructure, Helicone provides the peace of mind that comes with knowing exactly how your AI is performing. We recommend starting with the simple proxy integration today to immediately gain insights into your LLM traffic.

Frequently Asked Questions

What is Helicone and what problem does it solve?

Helicone is an open-source LLM observability platform that acts as a proxy between your application and AI providers. It solves the problem of ‘black box’ AI calls by providing detailed logs, cost tracking, and performance monitoring with a single line of code integration.

How do I install Helicone?

The easiest way to get started with Helicone is to change your LLM client’s base URL to Helicone’s proxy (oai.hconeai.com) and pass your Helicone API key in the Helicone-Auth header. Alternatively, you can self-host the entire platform using Docker Compose for full data control.

How does Helicone compare to LangSmith?

Unlike LangSmith, Helicone is open-source and offers a proxy-based integration that is independent of any specific framework. While LangSmith excels at tracing LangChain applications, Helicone provides a broader, high-performance monitoring layer for any LLM provider or SDK.

Is Helicone open source?

Yes, Helicone is fully open-source and licensed under the Apache-2.0 license. This allows developers to inspect the code, contribute to its development, and self-host the platform on their own servers for privacy and security.

Can I use Helicone with Anthropic?

Yes, Helicone supports multiple providers including OpenAI, Anthropic, and Azure OpenAI. You simply route your Anthropic API calls through the Helicone gateway to start capturing logs and cost data immediately.

Does Helicone track costs automatically?

Yes, Helicone automatically calculates the cost of each request by mapping token usage to the specific model’s pricing. You can view these costs in real-time in the dashboard, categorized by user, model, or custom properties.

Can I use Helicone for prompt management?

Helicone includes built-in prompt management features that allow you to version your prompts and track their performance over time. This makes it easy to experiment with different prompt versions and revert to previous ones if quality decreases.

Does Helicone impact latency?

Helicone is designed for high performance with minimal overhead on each request. Furthermore, its edge-caching feature can decrease latency significantly for repetitive requests by serving them from the edge cache instead of calling the LLM provider.

How do I self-host Helicone?

To self-host Helicone, you can use the provided Docker Compose file in the official GitHub repository. This will spin up the database, the dashboard, and the proxy gateway on your local machine or server, keeping all your data in-house.