Grafana
Grafana has become the de facto standard Open Source graphing and visualisation tool for systems monitoring and is used by many of our teams. It is a highly versatile tool, and while it is commonly used with Prometheus, it can also be used to visualise metrics from cloud platform monitoring systems such as Azure Monitor and AWS CloudWatch. Grafana also has a rich plugin system for panel components to help you represent your metrics the way you want.
Over the years, Grafana has added to its core feature set. Though it is still primarily used as described above, it is becoming a broader 'analytics platform' that also supports:
- Alerting
- Log analysis and visualisation
- Annotations (record system events like failures or upgrades on the graphs)
- PDF exporting of dashboards as reports
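As an illustration of the annotations feature, the sketch below prepares a request that records a deployment event through Grafana's HTTP API (`POST /api/annotations`). The host name, token and tag values are hypothetical placeholders, and the request is only built, not sent. Treat it as a minimal sketch under those assumptions, not a definitive integration.

```python
import json
import time
import urllib.request


def build_annotation(text, tags, when_ms=None):
    """Build a JSON body for Grafana's POST /api/annotations endpoint."""
    if when_ms is None:
        when_ms = int(time.time() * 1000)  # the API expects epoch milliseconds
    return {"time": when_ms, "tags": list(tags), "text": text}


def annotation_request(base_url, api_token, annotation):
    """Prepare (but do not send) the HTTP request that records the annotation."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/annotations",
        data=json.dumps(annotation).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values: replace the URL, token and tags with your own.
payload = build_annotation(
    "Deployed release 1.4.2", ["deploy", "payments"], when_ms=1700000000000
)
request = annotation_request(
    "http://grafana.example.internal:3000", "HYPOTHETICAL_TOKEN", payload
)
# To actually send it: urllib.request.urlopen(request)
```

Tagging annotations (here `deploy`) lets a dashboard pick up only the events relevant to it.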
How is it used?
Grafana is purpose built to visualise data through dashboards. These dashboards tend to have two primary use cases, briefly described below. For more information on monitoring and alerting as a subject, please see the Monitoring and Alerting article on the Knowledge Base.
Shared Monitoring Screens
Often, dashboards are purpose built to be viewed on large TVs, usually co-located with the teams supporting the services. This gives the team contextual information on the performance and behaviour of the platforms, and a richer picture than alerting on these metrics alone would provide. These dashboards should be designed with their audience in mind, showing clear metrics and graphs that are easily processed and understood at a glance.
Interactive Dashboards
These are often used by the team when reviewing performance or investigating an issue. They tend to be more detailed and, as a result, usually require users to scroll through the data and adjust filters and time periods to suit their needs.
How do you run it?
Grafana is more often than not deployed into the customer's monitoring stack directly. It runs best as a container but can be deployed onto VMs without much issue. There is also a cloud hosted offering that includes a Prometheus endpoint (details here).
Data Sources
Grafana can query data from a number of data sources, including Prometheus, CloudWatch, InfluxDB and Azure Monitor (full list in the docs). You can also extend Grafana with community-built integrations, or your own bespoke integrations, thanks to its plugin architecture. Grafana does not store the metric data itself; it queries the data source each time the dashboard is refreshed.
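Data sources can be added through the UI, but they can also be registered through Grafana's HTTP API (`POST /api/datasources`). The sketch below builds such a request body for a Prometheus data source. The data source name and Prometheus URL are hypothetical, and only the payload is constructed here.

```python
import json


def prometheus_data_source(name, url, is_default=False):
    """Build a JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's server runs the queries, not the browser
        "isDefault": is_default,
    }


# Hypothetical name and URL, for illustration only.
body = prometheus_data_source(
    "Platform Prometheus", "http://prometheus.example.internal:9090", is_default=True
)
print(json.dumps(body, indent=2))
```

Note that the payload only tells Grafana where the data lives; the metrics themselves stay in Prometheus.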
Extending Grafana
Grafana can be extended in many ways. The main plugin types you can use to customise it are:
- Panel - A visual component used in dashboards that gives you new ways of representing a wide range of data, such as world maps, heatmaps and scrolling feeds, so you can present the information in the best way possible.
- Data source - As described earlier, this lets you extend where Grafana queries data from.
- App - A bundle of panels, data sources, dashboards and new UI pages. Useful when integrating Grafana with other products that require a richer integration.
Check out the official list of plugins for more information.
Performance
Grafana operates on 'live' data: it retrieves the data whenever the page refreshes or reloads. Refreshing can be automatic, which is useful for dashboards running on central monitoring screens.
Each refresh naturally puts load on the back-end data sources, so it's worth keeping the following in mind to avoid saturating those platforms:
- Set the refresh interval to something sensible, every 1 minute for instance.
- When using data sources like Prometheus, make use of recording rules so that common queries are less expensive to return.
- Ideally, filter the data in the queries sent to the back end. Filtering can be done in Grafana, but by then you have already paid the processing and bandwidth cost of returning the full result.
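To get a feel for why the refresh interval matters, the rough estimate below multiplies out how many queries one auto-refreshing dashboard sends per minute. It is a deliberately simplified model: the panel, query and viewer counts are made-up figures, and it assumes every open copy of the dashboard issues its own queries with no caching in between.

```python
def queries_per_minute(panels, queries_per_panel, viewers, refresh_seconds):
    """Estimate back-end queries per minute for one auto-refreshing dashboard."""
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    refreshes_per_minute = 60 / refresh_seconds
    return panels * queries_per_panel * viewers * refreshes_per_minute


# Illustrative figures: 20 panels, 2 queries each, open on 5 screens.
every_5s = queries_per_minute(panels=20, queries_per_panel=2, viewers=5, refresh_seconds=5)
every_1m = queries_per_minute(panels=20, queries_per_panel=2, viewers=5, refresh_seconds=60)
print(every_5s, every_1m)  # 2400.0 200.0

# In the dashboard JSON model the interval is the top-level "refresh" field.
dashboard_settings = {"title": "Payments overview", "refresh": "1m"}
```

Under these assumptions, moving from a 5 second to a 1 minute refresh cuts the query rate by a factor of 12, before any savings from recording rules or tighter filters.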