Consul server monitoring dashboard
This page provides reference information about the Grafana dashboard configuration included in the hashicorp/consul
GitHub repository.
Grafana queries overview
This dashboard provides the following information about service mesh operations.
Raft commit time
Description: This metric measures the time it takes to commit Raft log entries. Stable values are expected for a healthy cluster. High values can indicate issues with resources such as memory, CPU, or disk space.
consul_raft_commitTime
Raft commits per 5 minutes
Description: This metric tracks the rate of Raft log commits emitted by the leader, showing how quickly changes are being applied across the cluster.
rate(consul_raft_apply[5m])
Last contacted leader
Description: Measures the duration since the last contact with the Raft leader. Spikes in this metric can indicate network issues or an unavailable leader, which may affect cluster stability.
consul_raft_leader_lastContact != 0
Election events
Description: Tracks Raft state transitions, which indicate leadership elections. Frequent transitions might suggest cluster instability and require investigation.
rate(consul_raft_state_candidate[1m])
rate(consul_raft_state_leader[1m])
Autopilot health
Description: A boolean metric that shows a value of 1 when Autopilot is healthy and 0 when issues are detected. Ensures that the cluster has sufficient resources and an operational leader.
consul_autopilot_healthy
DNS queries per 5 minutes
Description: This metric tracks the rate of DNS queries per node, bucketed into 5 minute intervals. It helps monitor the query load on Consul’s DNS service.
rate(consul_dns_domain_query_count[5m])
DNS domain query time
Description: Measures the time spent handling DNS domain queries. Spikes in this metric may indicate high contention in the catalog or too many concurrent queries.
consul_dns_domain_query
DNS reverse query time
Description: Tracks the time spent processing reverse DNS queries. Spikes in query time may indicate performance bottlenecks or increased workload.
consul_dns_ptr_query
KV applies per 5 minutes
Description: This metric tracks the rate of key-value store applies over 5 minute intervals, indicating the operational load on Consul’s KV store.
rate(consul_kvs_apply_count[5m])
KV apply time
Description: Measures the time taken to apply updates to the key-value store. Spikes in this metric might suggest resource contention or client overload.
consul_kvs_apply
Transaction apply time
Description: Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transaction operations.
consul_txn_apply
ACL resolves per 5 minutes
Description: This metric tracks the rate of ACL token resolutions over 5 minute intervals. It provides insights into the activity related to ACL tokens within the cluster.
rate(consul_acl_ResolveToken_count[5m])
ACL resolve token time
Description: Measures the time taken to resolve ACL tokens into their associated policies.
consul_acl_ResolveToken
ACL updates per 5 minutes
Description: Tracks the rate of ACL updates over 5 minute intervals. This metric helps monitor changes in ACL configurations over time.
rate(consul_acl_apply_count[5m])
ACL apply time
Description: Measures the time spent applying ACL changes. Spikes in apply time might suggest resource constraints or high operational load.
consul_acl_apply
Catalog operations per 5 minutes
Description: Tracks the rate of register and deregister operations in the Consul catalog, providing insights into the churn of services within the cluster.
rate(consul_catalog_register_count[5m])
rate(consul_catalog_deregister_count[5m])
Catalog operation time
Description: Measures the time taken to complete catalog register or deregister operations. Spikes in this metric may indicate performance issues within the catalog.
consul_catalog_register
consul_catalog_deregister