Comparisons

Honeycomb relies on read-time aggregation over a column store on top of SSDs. This allows us to provide blazing-fast results while sharding well horizontally, without constraints on cardinality or index size.

Many tools currently exist for introspecting data from your various systems; some of them share our strengths, but few of them share our philosophies: fast and mostly right is better than 100% correct, data exploration should feel iterative and flexible, and you should be able to ask questions you couldn’t predict. Modern infrastructure is fluid, dynamic and fast-changing, and good engineering teams will need to ask new questions in a swift, nimble manner. Asking new questions should be a support problem, not an engineering problem.

Let’s compare Honeycomb to a few other types of tools in the observability space, to provide context on how to think about Honeycomb. Some of the most important categories are time series, log aggregators, APM and exception trackers, and request tracing. (We’ve also provided some simple examples with real data!)

Time series databases and point metrics

Hosted or otherwise, time series-based monitoring solutions rely on write-time aggregation of predefined metrics: rolling “increment”s and “decrement”s up into a single number per time bucket. Many modern implementations support tags on metrics, but due to constraints inherent in the implementation of pre-aggregated time series, systems must tightly constrain writes and reads to prevent a combinatorial explosion on writes.
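The write-time model can be sketched as rolling increments into per-bucket counters. This is a minimal illustration of the general technique, not any particular TSDB's implementation:

```python
from collections import defaultdict

# One counter per (metric, time bucket); all per-event detail
# is discarded at write time.
BUCKET_SECONDS = 10
counters = defaultdict(int)

def increment(metric, ts):
    """Roll an event up into its metric's time bucket."""
    bucket = int(ts) - int(ts) % BUCKET_SECONDS
    counters[(metric, bucket)] += 1

# Three requests in the same 10-second window collapse into one number:
for ts in (100, 103, 107):
    increment("api.requests", ts)

print(counters[("api.requests", 100)])  # 3
```

Note what's gone after the write: the count survives, but which clients sent those three requests, and how long each one took, is no longer recoverable.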

Graphite, statsd, Datadog, SignalFx, Prometheus, Ganglia, InfluxDB, and OpenTSDB are good examples of this product family.

Time series is a very mature product space, and time series are great for lots of things. Because they aggregate data at write time, writes scale easily without horizontal sharding, if you’re willing to sacrifice some accuracy and flexibility. And because each data point is keyed to a timestamp, they are naturally convenient to visualize over time.

But time series storage formats essentially store dots in space, each representing some lossy window of time, not unique events with native relationships among all of each event’s attributes. The newer generation of time series tools uses tags to approximate the rich functionality of event-based systems, like search and correlation, with varying degrees of success. If you didn’t predict what you were going to need to search on, the tags will be useless, leaving you with no information at all about a critical period in time when you are experiencing unknown-unknowns, or new behavior.

In contrast, Honeycomb relies on read-time aggregation and an underlying column store to achieve incredibly fast reads on potentially very wide datasets. This approach provides incredible flexibility and context around each individual data point.

For example: take a single request to your API, via HTTP POST, from an iOS client. A fairly common set of questions for an observability system might include: What is the overall request rate? What is the request rate from iOS clients? What does roundtrip latency look like, overall and for POST requests specifically?

With point metrics, each separate question has to be anticipated and defined as a metric ahead of time. A typical statsd-based system might encourage incrementing these few different metrics:

statsd.increment("api.requests")
statsd.increment("api.requests.ios")
statsd.histogram("api.requests.roundtrip_ms", 157)      // expands to Count/Avg/Median/Max/P95
statsd.histogram("api.requests.post.roundtrip_ms", 157)

On the other hand, Honeycomb accepts a single event like the one below, opting to calculate “overall rate of requests/sec” at query time. This gives us the ability to ask any question about rates or latency over these attributes, rather than just the ones predicted in the list above.

{"method":"post", "client":"ios", "endpoint":"/foo/", "roundtrip_ms":157}

This approach allows the user to ask new questions, over the same data, without having to go back to the drawing board. By keeping the metadata for a single request together, it becomes trivial to isolate information like “Well, what sorts of requests are resulting in a 0ms roundtrip_ms?”

Honeycomb is also designed around a model that has been proven to handle the largest sites on the Internet, since the bones of it are heavily inspired by our usage of Scuba at Facebook.

Log management tools and aggregators

The mental model behind log aggregation should feel much more familiar to Honeycomb users. With traditional unstructured logs, each log line preserves all or most of the attributes and metadata for any given event — but when you’re dealing with unstructured data, you usually incur the additional costs of having to store and parse large blobs of text. (So 2006.)

Loggly, ELK, Papertrail, Sumo Logic, and Splunk are good examples of this product family.

Modern log aggregators rely on strict schemas and additional indexes to maintain performance as log lines get wider, and adding additional attributes per event can hurt the performance of all queries on that dataset. You’re heavily incentivized to log less information, capture fewer attributes about each event, and index or search on massively fewer details.

Honeycomb was designed to support the extreme flexibility and compute power demanded by real-world use cases, while maximizing ease of use.

For example: Given the question “What’s the average latency over all requests?” and a (simple) structured log line of the form:

{"method":"post", "client":"ios", "endpoint":"/foo/", "roundtrip_ms":157}

A typical log processing tool reads in the whole line, extracts the roundtrip_ms value of 157, and uses that value to calculate the average overall latency. This all occurs at query time, unless the user predefined an index on the roundtrip_ms attribute, an approach that works fine until a new database_roundtrip_ms attribute is added and queried upon.

Honeycomb, built as a column store from day 1, sidesteps the cost of reading the other attributes and simply goes on its way having read 157. (This optimization is even more pronounced if a single event has tens or hundreds of attributes, as is more and more common in production systems today.)
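A toy column layout makes the cost difference concrete. This illustrates the general column-store technique, not Honeycomb's actual storage format:

```python
# Row store: every query touches every attribute of every line.
rows = [
    {"method": "post", "client": "ios", "endpoint": "/foo/", "roundtrip_ms": 157},
    {"method": "get",  "client": "web", "endpoint": "/foo/", "roundtrip_ms": 93},
]

# Column store: each attribute lives in its own array, so averaging
# roundtrip_ms reads one column and ignores the others entirely.
columns = {
    "method":       ["post", "get"],
    "client":       ["ios", "web"],
    "endpoint":     ["/foo/", "/foo/"],
    "roundtrip_ms": [157, 93],
}

avg_latency = sum(columns["roundtrip_ms"]) / len(columns["roundtrip_ms"])
print(avg_latency)  # 125.0
```

With two attributes the saving is trivial; with hundreds of attributes per event, scanning one column instead of every row is the difference that keeps reads fast.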

And while we do support tailers that convert unstructured, freeform log lines into structured data, a cardinal value of ours is that data must be structured before it hits our API, so it can be efficiently stored and searched on after that. String processing is expensive and not getting any better, so we push that work to the client where it can be prepared near the source.
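A tailer's core job, structuring text near the source, can be sketched with a single regex. The log format and field names here are hypothetical; real agents are considerably more robust:

```python
import json
import re

# Hypothetical access-log format: "METHOD endpoint status durationms"
LINE_RE = re.compile(
    r'^(?P<method>\S+) (?P<endpoint>\S+) (?P<status>\d+) (?P<roundtrip_ms>\d+)ms$'
)

def structure(line):
    """Turn a freeform log line into a structured event before it leaves the host."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # unparseable lines are dropped (or routed elsewhere)
    event = m.groupdict()
    event["status"] = int(event["status"])
    event["roundtrip_ms"] = int(event["roundtrip_ms"])
    return event

print(json.dumps(structure("POST /foo/ 200 157ms")))
```

Doing this on the client is the design choice the paragraph above describes: the expensive string processing happens once, at the edge, instead of at query time in the backend.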

APM (Application Performance Monitoring) tools, exception trackers

Traditional Application Performance Monitoring (APM) tools and exception trackers typically integrate tightly with a particular language or platform, are easy to install, and can be very useful “out of the box”—but trade those benefits off with strict boundaries on how the user can interact with the collected data.

New Relic, Sentry, AppDynamics, Dynatrace, SmartBear, Yeller, and Airbrake are good examples of this product family.

APM tools typically utilize one of the two previous approaches (pre-aggregated metrics or log analysis) to collect and present data in a carefully curated manner, without the ability to customize metrics or visualizations.

Exception trackers, on the other hand, are absolutely critical additions to a developer’s workflow, but they are insanely insufficient for gathering information about non-exceptional cases. Exceptions by definition are only thrown after something bad has happened or after a threshold has been hit; Honeycomb allows for exploration of statistics independent of that trigger.

Honeycomb removes the guard rails and empowers teams to develop their own intuition for their systems. Honeycomb’s connectors provide an easy set of entry points for exploring data, with practically infinite potential for drilling down into fine-grained details within the system. APM tools often make it easy to find the Top 10 of anything; Honeycomb makes it as trivial to find #11 as #100,001.

Distributed systems tracing

Request tracing has been around a long time at places like Google, Facebook, and Twitter, and has lately become more widely available with Lightstep and opentracing.io. Tracing products have a basket of different use cases, such as capturing every element in the stack that is recorded for a particular UUID, or tracing the progress of a unique request ID from end to end.

Dapper, Zipkin, opentracing.io, and Lightstep are good examples of this product family.

This can be useful for debugging unexpected behavior reported by users, for capturing timing information from outside the service’s own perspective (not just self-reported timing), and for debugging the edges and routes between services. In connection with exception traces and other service-level analytics, it can help you track down subtle bugs.

Distributed tracing is absolutely necessary when you’re running something at Google scale. You can’t possibly capture even a percentage of all events, so you sample based on either a known problem (“trace this UUID”) or a small random sample. We are huge fans of sampling, but the fundamental question of what to trace is one of the biggest problems with this model. Another problem is the lack of visibility into services, storage layers, and so on — tracing is great at leading you to the problem, but you largely rely on something else to debug the problem itself.

Tracing is depth-first search; Honeycomb’s approach is breadth-first. You can use either approach to approximate the other. For example, to capture all requests for a given UUID using Honeycomb, you can generate unique request IDs at the load balancer or web server entry point, append the ID as a header to the request, and trace its way through your stack by incrementing suffix integers.
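That request-ID pattern can be sketched as a tiny middleware-style helper. The header name and suffix scheme here are illustrative assumptions; any web stack works the same way:

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # hypothetical header name

def ensure_request_id(headers):
    """Assign a unique ID at the entry point unless an upstream hop already did."""
    if REQUEST_ID_HEADER not in headers:
        headers[REQUEST_ID_HEADER] = uuid.uuid4().hex
    return headers[REQUEST_ID_HEADER]

def child_span_id(parent_id, hop):
    """Append a suffix integer per hop so the request can be followed downstream."""
    return f"{parent_id}.{hop}"

headers = {}
rid = ensure_request_id(headers)
print(child_span_id(rid, 1))  # e.g. "3f2a...1" -- parent ID plus hop suffix
```

If every service emits that ID as a field on its Honeycomb events, filtering on it reconstructs the trace breadth-first, from the event data itself.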

In summary

There’s no such thing as an intrinsically best tool—there’s only something that meets your needs. A critical part of technical judgment is understanding your use case, then selecting a basket of utilities that meet your needs, balanced with resisting sprawl or tool proliferation.

Honeycomb is a new type of tool; it doesn’t fit neatly into any of these existing archetypes, though it shares aspects with all of them. We’d like to think it selects some of the best parts of all categories, but we certainly don’t think it’s always the best thing for everyone. :)

If you have any questions about whether Honeycomb may or may not be a good fit for you, feel free to reach out to support@honeycomb.io, and we’ll gladly help you figure out whether we’re right for each other.