Metrics

See how to set up metrics.

Metrics are a way to measure the state of your application from within, and they should be built into a microservice architecture from the very beginning. We suggest you start with the basics: define what is important for your team to track in terms of service health and quality of service.

We have standardized on the OpenMetrics format for metrics, a text-based format that is easy to parse and understand. It is derived from the exposition format used by Prometheus, one of the most widely used metrics systems.

Your application's metrics are scraped (pulled) from the /metrics endpoint and stored in Mimir. You query and visualize metrics in Grafana. Enable metrics collection in your Nais manifest.

mermaid
graph LR
  Mimir --GET /metrics--> Pod
  nais.yaml -.register target.-> Mimir
  nais.yaml -.configure.-> Pod
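
To make the scrape flow above concrete, here is a minimal sketch of an application exposing a /metrics endpoint using only the Python standard library. The metric name `myapp_http_requests_total`, the in-memory `METRICS` store, and port 8080 are assumptions made for illustration. In practice you would most likely use a Prometheus client library for your language rather than rendering the text by hand. The sketch emits the classic Prometheus text format; strict OpenMetrics output differs in some details, such as a terminating `# EOF` line.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: metric name -> (type, help text, value).
METRICS = {
    "myapp_http_requests_total": ("counter", "Total HTTP requests handled.", 0),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (mtype, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the scrape path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start a blocking HTTP server; the port is an assumption."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the server, after which a GET request to /metrics returns the rendered text that the scraper would collect.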

All applications that have Prometheus scraping enabled will show up in the default Grafana dashboard. You can also create your own dashboards.

Metric naming

For metric names we use the standard Prometheus naming conventions:

  • Metric names should have a (single-word) application prefix relevant to the domain the metric belongs to.
  • Metric names should be nouns in snake_case; do not use verbs.
  • Metric names should include the unit as a suffix (e.g. _seconds, _bytes) to make interpreting your metrics queries straightforward.
  • Metric names should represent the same logical thing-being-measured across different labels (e.g. the number of HTTP requests, not the number of GET requests, the number of POST requests, etc.)
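
The conventions above can be checked mechanically. The following is a rough sketch of such a check, not an official linter: the function name `check_metric_name` and the suffix list are assumptions, and the list is deliberately incomplete.

```python
import re

# Assumed, non-exhaustive list of accepted suffixes: base units plus the
# conventional _total (counters) and _info suffixes. Extend for your domain.
UNIT_OR_TYPE_SUFFIXES = ("_seconds", "_bytes", "_ratio", "_total", "_info")

# Lowercase snake_case with at least two words, so a prefix is always present.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def check_metric_name(name):
    """Return a list of convention problems; an empty list means none found."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("not snake_case with an application prefix")
    if not name.endswith(UNIT_OR_TYPE_SUFFIXES):
        problems.append("missing unit suffix")
    return problems
```

For example, `check_metric_name("api_request_duration_seconds")` returns an empty list, while `check_metric_name("api_request_duration")` reports the missing unit suffix. The check cannot tell whether a name is a noun or whether the prefix fits the domain; those remain review questions.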

Label naming

Use labels to differentiate the characteristics of the thing that is being measured:

  • api_http_requests_total - differentiate request types by adding an operation label: operation="create|update|delete"
  • api_request_duration_seconds - differentiate request stages by adding a stage label: stage="extract|transform|load"

Do not put the label names in the metric name, as this introduces redundancy and will cause confusion if the respective labels are aggregated away.
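
To illustrate why label names do not belong in the metric name, here is a small sketch that mimics what `sum without (operation)` does in PromQL. The sample values and the `instance` label are made up for the example. After the operation label is aggregated away, the name api_http_requests_total still describes the result, whereas a name with the operation baked in would not.

```python
from collections import defaultdict

# Hypothetical samples of api_http_requests_total: one series per label set.
samples = [
    ({"operation": "create", "instance": "pod-a"}, 120),
    ({"operation": "update", "instance": "pod-a"}, 45),
    ({"operation": "delete", "instance": "pod-a"}, 5),
    ({"operation": "create", "instance": "pod-b"}, 80),
]


def sum_without(samples, label):
    """Sum values after dropping one label, like `sum without (label)`."""
    totals = defaultdict(int)
    for labels, value in samples:
        # The remaining labels identify the aggregated series.
        remaining = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[remaining] += value
    return dict(totals)
```

Calling `sum_without(samples, "operation")` collapses the four series into one total per instance.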

Warning

Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
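
A quick back-of-the-envelope calculation shows how fast this grows. The sketch below computes an upper bound on the number of time series for a single metric as the product of the distinct values per label. The label names and the figure of 10,000 users are assumptions chosen for illustration.

```python
from math import prod


def max_series(label_values):
    """Upper bound on series for one metric: product of values per label."""
    return prod(len(values) for values in label_values.values())


# Two bounded labels with three possible values each.
bounded = {
    "operation": ["create", "update", "delete"],
    "status": ["2xx", "4xx", "5xx"],
}

# The same metric with a hypothetical user_id label added.
unbounded = dict(bounded, user_id=[f"user-{i}" for i in range(10_000)])
```

Here `max_series(bounded)` is 9, while `max_series(unbounded)` is 90,000, and the second number keeps growing with every new user.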

Metric types

You can introduce the metric types with the classic example of counting an ongoing process:

Plaintext

As a developer building metrics into your application, you should have a solid grasp of the semantics of the different metric types, which include:

  • Counter: A cumulative value that only ever increases (it resets to zero when the process restarts). Example: number of requests to this service.
  • Gauge: A value that can go up and down arbitrarily. Example: current number of active connections.
  • Summary: Samples observations and calculates configurable quantiles over them on the client side. Example: the 99th percentile response time of requests.
  • Histogram: Like summaries, histograms can be used to monitor latencies (or other things like request sizes). Unlike summaries, histograms count observations in configurable buckets, so they can be aggregated across instances and quantiles can be calculated at query time. To learn more, read about the difference between histograms and summaries.
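
The semantics of these types can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular client library is implemented: the class and method names are assumptions, there is no thread safety or label support, and Summary is left out for brevity.

```python
import bisect


class Counter:
    """A value that only ever goes up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """A value that can be set freely and move in both directions."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in buckets defined by inclusive upper bounds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.sum = 0.0

    def observe(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return (upper bound, cumulative count) pairs, as exposed on scrape."""
        total, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            out.append((bound, total))
        return out
```

Because the buckets are plain counters, histograms from several pods can be added together before a quantile is estimated, which is not possible with quantiles that were precomputed in each pod.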

Cluster metrics

Nais clusters come with a set of metrics that are available for all applications. Many of these relate to Kubernetes and include metrics like CPU and memory usage, number of pods, etc. You can find a comprehensive list in the kube-state-metrics documentation.

Our ingress controller also exposes metrics about the number of requests, response times, etc. You can find a comprehensive list in our ingress documentation.

Debugging metrics

If you're having trouble with your metrics, use the Explore view in Grafana to test your PromQL queries. Pick the Mimir data source that matches your environment.

If your metrics are not showing up, you can check whether your application is being scraped by querying the up metric for your application in Explore:

promql

1 means scraping works. 0 or no result means there's a problem with the scrape configuration.
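
As a rough illustration of this check, the sketch below builds a selector for the up metric and interprets the values returned for it. The label names `app` and `namespace` are assumptions about how targets are labelled in your environment, and both helper functions are hypothetical. Verify the actual labels in Explore before relying on them.

```python
def up_query(app, namespace=None):
    """Build a PromQL selector for the `up` metric of one application."""
    # Label names are assumed; adjust them to match your targets.
    matchers = [f'app="{app}"']
    if namespace:
        matchers.append(f'namespace="{namespace}"')
    return "up{" + ", ".join(matchers) + "}"


def interpret_up(values):
    """Map the sample values returned for the query to a short diagnosis."""
    if not values:
        return "no result: no scrape target matched these labels"
    if all(v == 1 for v in values):
        return "scraping works"
    return "at least one target could not be scraped"
```

For instance, `up_query("my-app")` yields `up{app="my-app"}`, which you can paste into Explore. The list passed to `interpret_up` holds one value per pod returned by the query.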