Warning: this post is more than 3 months old. The information it contains may no longer be up to date.
Alerting with Prometheus
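Alerting is an extra building block: Prometheus evaluates the alert rules, but sending the notifications is delegated to a separate program, Alertmanager (listening on port 9093 by default). Prometheus therefore has to be told where to find it, in the alerting section of its configuration file (prometheus.yml):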
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
    scheme: http
    timeout: 10s
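Don't forget to reload the Prometheus configuration (on Prometheus 2.x, this endpoint only exists if Prometheus was started with --web.enable-lifecycle; otherwise, send the process a SIGHUP):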
curl -X POST http://localhost:9090/-/reload
Configuring problem detection
The alert rules are stored in /etc/prometheus/rules, one file per alert:
root@server /etc/prometheus/rules # ls
memory up
These files are then declared in prometheus.yml:
# Load and evaluate rules in these files every 'evaluation_interval' seconds.
rule_files:
  - '/etc/prometheus/rules/up'
  - '/etc/prometheus/rules/memory'
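As a variation, and assuming every rule file lives in the same directory, rule_files also accepts glob patterns, so a new file is picked up without editing the list. The evaluation interval mentioned in the comment is set in the global section; the 15s value below is only an example, and these keys should be merged into the existing sections of prometheus.yml rather than duplicated:

```yaml
global:
  # How often alert rules are evaluated (example value)
  evaluation_interval: 15s

rule_files:
  # Every file in this directory is loaded as a rule file
  - '/etc/prometheus/rules/*'
```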
Content of the up file, which fires when a target has been unreachable (failed scrape) for 1 minute:
ALERT InstanceDown
  IF up == 0
  FOR 1m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "Instance {{$labels.instance}} is down",
    description = "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute"
  }
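The ALERT ... IF ... FOR syntax above is the Prometheus 1.x rule format. Prometheus 2.0 replaced it with YAML rule files, so on a 2.x server the same rule would look roughly like the sketch below. The group name up is an arbitrary choice; the expression, duration, labels and annotations are those of the original rule:

```yaml
groups:
  - name: up            # arbitrary group name
    rules:
      - alert: InstanceDown
        expr: up == 0   # the scrape of the target failed
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute"
```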
Content of the memory file, which fires when more than 95% of the RAM (excluding free memory and page cache) has been in use for 10 minutes:
ALERT MemoryUsage
  IF ((node_memory_MemTotal-node_memory_MemFree-node_memory_Cached)/(node_memory_MemTotal)*100) > 95
  FOR 10m
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "Instance {{$labels.instance}} is in danger",
    description = "RAM usage of {{$labels.instance}} has been above 95% for more than 10 minutes"
  }
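For Prometheus 2.x, here is a sketch of the same rule in the YAML format. It also assumes node_exporter 0.16 or later, which renamed the memory metrics with a _bytes suffix (node_memory_MemTotal_bytes, and so on); with an older node_exporter, keep the metric names from the rule above:

```yaml
groups:
  - name: memory        # arbitrary group name
    rules:
      - alert: MemoryUsage
        # Used RAM (total - free - page cache) as a percentage of total RAM
        expr: (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100 > 95
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} is in danger"
          description: "RAM usage of {{ $labels.instance }} has been above 95% for more than 10 minutes"
```

On recent kernels, node_memory_MemAvailable_bytes gives the kernel's own estimate of available memory, so (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 95 is a possible alternative expression.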
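Configuring alert emails
Finally, Alertmanager is told how to send the emails in its own configuration file (alertmanager.yml by default):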
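global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@mon.email'
route:
  receiver: 'team-mails'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
receivers:
- name: 'team-mails'
  email_configs:
  - to: 'ladestination@email.mail'
group_wait is how long Alertmanager waits before sending the first notification for a new group of alerts, group_interval how long it waits before notifying about new alerts added to a group that was already notified, and repeat_interval how long it waits before re-sending a notification for alerts that are still firing.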
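Since the rules set a severity label (page for InstanceDown, warning for MemoryUsage), one possible extension is to route the page alerts to a separate recipient. The sketch below assumes a hypothetical oncall-mails receiver and address. It also disables smtp_require_tls, which Alertmanager enables by default and which makes delivery fail if the local MTA on port 25 does not offer STARTTLS. The match syntax is deprecated in favour of matchers since Alertmanager 0.22 but is still accepted:

```yaml
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@mon.email'
  # Only needed if the local MTA does not support STARTTLS
  smtp_require_tls: false

route:
  receiver: 'team-mails'        # default receiver (warning alerts)
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  routes:
    # Alerts labelled severity="page" go to the hypothetical on-call receiver
    - match:
        severity: page
      receiver: 'oncall-mails'

receivers:
- name: 'team-mails'
  email_configs:
  - to: 'ladestination@email.mail'
- name: 'oncall-mails'          # hypothetical receiver
  email_configs:
  - to: 'astreinte@email.mail'  # hypothetical address
```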