InfluxDB: a time-series database to go


InfluxDB is an exciting new technology I’m learning at the moment. It’s a specialized database for storing time series. In short, a time series is identified by its name and metric name(s). It consists of a sequence of {timestamp, metric values} records ((Baron Schwartz’s Blog: Time-Series Requirements)), where each record is uniquely identified by its timestamp. The most obvious use cases I see right now are custom system monitoring and the Internet of Things: storing measurements for future analysis ((A list of use cases can also be found in the InfluxDB documentation here.)). It’s open source and written in Go, my language of choice right now.

Because this is my first encounter with such databases, I was eager to learn some background before starting to code my own demo application with InfluxDB. I quickly stumbled upon blog posts from Jim Moiron and Baron Schwartz. The latter’s posts in particular were instantly interesting to me, since I’ve already read High Performance MySQL, where he’s listed as a co-author. ((I used the book while writing my thesis about NewSQL databases.))

I highly recommend reading those blog posts. They helped me understand general concepts I still hadn’t mastered after reading some documentation pages on the InfluxDB website.

InfluxDB

The blog posts above convinced me that InfluxDB is a TSDB to go ((I know you should pick the best tool for the job, but everyone has a “favorite thing of choice”)). I installed it using the official Docker image. I then followed Getting Started from its documentation and was impressed with its query language, which feels like SQL. The database can be accessed via the CLI or the HTTP API.

Key concepts in InfluxDB:

Measurement: conceptually like a table in an RDBMS.
Fields: columns in a measurement. Their values are not limited to just numbers.
Tags: columns in a measurement which, unlike fields, are indexed. Querying by them is faster.
Line protocol: the text format that represents a data point. It consists of a measurement name, optional tags, at least one field and an optional timestamp (if no timestamp is present, the current time is applied).
Field/tag set: the combination of all field/tag values of a point.
Time series: consists of a measurement and a tag set.

Difference between tags and fields: As already mentioned, tags are indexed. When querying by them, not all rows of the measurement are read, so queries perform better. Tags are usually data known before the measurement is taken (server name, animal species for which the measurement is taken, …), while fields are the measurement values taken at a given time.