Best Practices

The recommendations presented in this document are exactly that: recommendations. They are not required for Honeycomb use, but they do make for better Honeycomb use.

Create a Dataset For Each Environment

If you plan to collect data from multiple environments (like prod, dev, and staging), we recommend creating a Dataset for each one and naming your Datasets accordingly: prod.queries, dev.queries, and staging.queries, for example. This eliminates the chance of unknowingly mixing prod and dev data and producing bad data as a result.
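One way to enforce this convention is to derive the Dataset name from the running environment rather than hard-coding it. The sketch below is a minimal illustration, assuming your application exposes its environment through an APP_ENV variable (a hypothetical name; substitute whatever your deployment uses):

```python
import os

def dataset_name(base, env=None):
    """Build an environment-prefixed Dataset name such as 'prod.queries'.

    If env is not given, fall back to the APP_ENV environment variable
    (a hypothetical variable name for this sketch), defaulting to 'dev'
    so local runs never write into the prod Dataset by accident.
    """
    env = env or os.environ.get("APP_ENV", "dev")
    return f"{env}.{base}"
```

With this in place, `dataset_name("queries", "prod")` yields `prod.queries`, and code deployed to staging automatically writes to `staging.queries` without any per-environment configuration edits.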

In general, all events in the same Dataset should be considered equivalent either in their frequency and scope, or in the system layer in which they occur. You should separate events into different Datasets when you cannot establish equivalency between them.

You may, for example, find it useful to capture API and batch-processing events in the same Dataset if they share some request_id field. By contrast, events from two different environments with only one differentiator (like the value of some “environment” column) might appear highly similar and, as a result, be more easily confused. Relying on consistent application of some “environment” filter is risky and can create misleading results.

Here is another example, from one of our lovely customers. They put API and web requests in the same Dataset because, for them, an API request is really one type of web request that happens to have more fields. Our customer adds the extra API fields (even though web requests don't have them) because Honeycomb supports sparse data and provides filters that let them look at only web requests or only API requests as needed. Our customer does not, however, want to filter out web requests when looking at something like overall traffic.
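The pattern above can be sketched as a single event-building helper where API-only fields are simply omitted from web events. This is an illustrative sketch, not the customer's actual code; the field names (method, status, path, api_endpoint, api_version) are hypothetical:

```python
def build_event(method, status, path, **api_fields):
    """Build an event for a shared web/API Dataset.

    Web requests carry only the common fields. API requests pass their
    extra fields (e.g. api_endpoint, api_version) via api_fields.
    Absent fields are simply left out of the event -- since Honeycomb
    handles sparse data, web events need not carry empty API columns.
    """
    event = {"method": method, "status": status, "path": path}
    event.update(api_fields)
    return event

# A plain web request: only the common fields are present.
web = build_event("GET", 200, "/home")

# An API request: same common fields, plus API-specific ones.
api = build_event("POST", 201, "/api/v1/users",
                  api_endpoint="users", api_version="v1")
```

Querying overall traffic then covers both event types, while filtering on the presence of an API-only field (api_endpoint, say) narrows results to API requests alone.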

For this same company, SQL queries reside in a different Dataset because SQL queries are not in any way equivalent to API data: There can be multiple (or no) SQL queries for a single API query, for instance.

Categorize Columns

Large Datasets can quickly come to feel unwieldy. Once you have a Dataset with more than 40 columns or so, use naming conventions to categorize your columns: http.method, http.status, and http.url, for example, or server.hostname, server.buildnumber, and so on.
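If your instrumentation gathers fields in nested groups, a small helper can produce these dot-separated column names mechanically. This is a generic sketch (not a Honeycomb SDK feature) showing one way to flatten nested field groups into namespaced columns:

```python
def flatten(fields, prefix=""):
    """Flatten nested field groups into dot-separated column names,
    e.g. {"http": {"method": "GET"}} -> {"http.method": "GET"}."""
    flat = {}
    for key, value in fields.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated namespace prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

columns = flatten({
    "http": {"method": "GET", "status": 200},
    "server": {"hostname": "web-1"},
})
```

Here `columns` contains http.method, http.status, and server.hostname as top-level keys, matching the naming convention described above.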

This practice makes columns easier to find in the Honeycomb UI. It also helps everyone on your team develop a shared understanding of a Dataset sooner. If a column is labeled "status code," for instance, you may know what that means, but the next person may not.

Capturing Events

While Honeycomb aims to be as flexible as possible, investing a bit more care in event construction will let you get the most out of Honeycomb queries: