"Prometheus": difference between revisions
The change in this revision concerns the e-mail example in /etc/alertmanager/alertmanager.yml: the "office365" example, followed by "# service alertmanager restart", appears in full in the Alertmanager e-mail section of the revision below.
Note on the "yml" file: keep the alignment consistent and always indent with real spaces, never with tabs.
Revision as of 26 April 2021, 13:07
Prometheus server
36% of the Prometheus server part completed
LXC Alpine
Basic server installation
# apk update && apk upgrade
# apk add prometheus
# rc-update add prometheus default
# service prometheus start
Test by browsing to http://IP_PROMETHEUS:9090:
Easy!
(Optional) Securing the server
Install a local proxy to secure the traffic:
# apk add nginx
# rc-update add nginx default
# service nginx start
Create a .htpasswd for the "admin" user and set its password:
# apk add apache2-utils
# htpasswd -c /etc/nginx/.htpasswd admin
New password:
Prepare the encryption (self-signed certificate):
# apk add openssl
# mkdir -p /root/certs/prometheus/ && cd /root/certs/prometheus
# openssl req \
-x509 \
-newkey rsa:4096 \
-nodes \
-keyout prometheus.key \
-out prometheus.crt
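The openssl req call above prompts interactively for the certificate subject and, without -days, issues a certificate valid for only 30 days (the OpenSSL default). A minimal non-interactive sketch follows; the subject /CN=prometheus.local, the 365-day lifetime and the temporary working directory are assumptions chosen for illustration, not values from this page.

```shell
# Sketch: create a self-signed certificate without prompts, then inspect it.
# Assumptions: the CN, the 365-day validity and the temporary directory.
workdir=$(mktemp -d)

openssl req \
    -x509 \
    -newkey rsa:4096 \
    -nodes \
    -days 365 \
    -subj "/CN=prometheus.local" \
    -keyout "$workdir/prometheus.key" \
    -out "$workdir/prometheus.crt"

# Print the subject and the validity window of the new certificate.
openssl x509 -in "$workdir/prometheus.crt" -noout -subject -dates
```

On the server itself, the same options can be added to the command shown above, keeping /root/certs/prometheus as the output directory.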
Configure the vhost:
- Alpine Linux 3.12
# vi /etc/nginx/conf.d/prometheus.conf
- Alpine Linux 3.13
# vi /etc/nginx/http.d/prometheus.conf
server {
    listen 9191 ssl;
    ssl_certificate /root/certs/prometheus/prometheus.crt;
    ssl_certificate_key /root/certs/prometheus/prometheus.key;
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:9090/;
    }
}
# service nginx restart
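To confirm that the proxy answers over TLS and asks for the password, one option is to query the Prometheus health endpoint through it. This is only a sketch: the helper names are hypothetical, and IP_PROMETHEUS must be replaced by the real address.

```shell
# Sketch: query Prometheus through the nginx proxy on port 9191.
# Assumption: both helper names are made up for this example.

# Build the URL of the Prometheus health endpoint behind the proxy.
proxy_health_url() {
    printf 'https://%s:9191/-/healthy\n' "$1"
}

# -k accepts the self-signed certificate; "-u admin" makes curl prompt for the password.
check_prometheus_proxy() {
    curl -k -u admin "$(proxy_health_url "$1")"
}

# Usage (after "nginx -t" reports a valid configuration):
#   check_prometheus_proxy IP_PROMETHEUS
```

A request without credentials should be rejected by nginx with a 401 status, which is a quick way to check that auth_basic is active.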
Configure Prometheus by adding these three lines (the --web.* options):
# vi /etc/init.d/prometheus
...
command_args="--config.file=$prometheus_config_file \
    --web.listen-address=127.0.0.1:9090 \
    --web.external-url=https://127.0.0.1:9191 \
    --web.route-prefix=/ \
    --storage.tsdb.path=$prometheus_storage_path \
...
# service prometheus restart
* Caching service dependencies ... [ ok ]
* Starting prometheus ... [ ok ]
(Optional) Data retention
By default Prometheus keeps data for 15 days; this can be changed in several ways:
- --storage.tsdb.retention.size [EXPERIMENTAL]: limits the maximum size of stored data (units: B, KB, MB, GB, TB, PB, EB).
- --storage.tsdb.retention.time: limits the maximum retention time.
If you set both, whichever limit is reached first applies.
In this example we change the retention to store up to 5 GB of data:
# vi /etc/init.d/prometheus
Replace the line --storage.tsdb.retention.time=$prometheus_retention_time" with the --storage.tsdb.retention.size line below:
...
    --storage.tsdb.path=$prometheus_storage_path \
    --storage.tsdb.retention.size=5GB"
command_user="prometheus:prometheus"
...
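Both retention limits can also be kept side by side, in which case the first one reached applies. The sketch below builds such a command_args value outside the init script so it can be inspected; the two paths and the 30d value are placeholders, not values taken from this page.

```shell
# Sketch: combine both retention limits in command_args.
# Assumptions: the two paths below and the 30d value are placeholders.
prometheus_config_file=/etc/prometheus/prometheus.yml
prometheus_storage_path=/var/lib/prometheus/data

command_args="--config.file=$prometheus_config_file \
    --storage.tsdb.path=$prometheus_storage_path \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=5GB"

# Show one flag per line to check the result before restarting the service.
# $command_args is left unquoted on purpose so the shell splits it into words.
printf '%s\n' $command_args
```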
Alertmanager
Source, another source, another source, one more
LXC Alpine
Basic installation
# apk add alertmanager
# rc-update add alertmanager default
# service alertmanager start
Browse to http://IP_SERVEUR:9093 to check that it works:
Linking Prometheus and Alertmanager
If "Prometheus" and "Alertmanager" are on the same server:
# vi /etc/prometheus/prometheus.yml
Configure Prometheus by adding the following lines:
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
# service prometheus restart
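Because an indentation mistake in prometheus.yml prevents the service from starting, it can help to validate the file before each restart. The sketch below wraps promtool check config in a small helper; the helper name is hypothetical, and it assumes promtool was installed together with Prometheus, which should be verified on your system.

```shell
# Sketch: validate prometheus.yml before restarting the service.
# Assumption: promtool is installed and on PATH.
check_prometheus_config() {
    config=${1:-/etc/prometheus/prometheus.yml}
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found, skipping validation" >&2
        return 2
    fi
    promtool check config "$config"
}

# Usage:
#   check_prometheus_config && service prometheus restart
```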
You can then open your "Prometheus" server and check that it works:
Securing
Prepare the encryption:
# mkdir -p /root/certs/alertmanager/ && cd /root/certs/alertmanager
# openssl req \
-x509 \
-newkey rsa:4096 \
-nodes \
-keyout alertmanager.key \
-out alertmanager.crt
Configure the vhost:
- Alpine Linux 3.12
# vi /etc/nginx/conf.d/alertmanager.conf
- Alpine Linux 3.13
# vi /etc/nginx/http.d/alertmanager.conf
server {
    listen 9193 ssl;
    ssl_certificate /root/certs/alertmanager/alertmanager.crt;
    ssl_certificate_key /root/certs/alertmanager/alertmanager.key;
    location / {
        auth_basic "alertmanager";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:9093/;
    }
}
# service nginx restart
Configure Alertmanager:
# vi /etc/init.d/alertmanager
Add the following lines (the --web.* options):
...
command_args="--config.file=$alertmanager_config_file \
    --storage.path=$alertmanager_storage_path \
    --web.listen-address=127.0.0.1:9093 \
    --web.external-url=https://127.0.0.1:9193 \
    --web.route-prefix=/ \
    $alertmanager_args"
command_user="prometheus:prometheus"
...
Secured link between Prometheus and Alertmanager
Installation on the same server
No changes needed.
Remote secured Alertmanager
Configure "Prometheus":
Prometheus:/# vi /etc/prometheus/prometheus.yml
...
# Alertmanager configuration
alerting:
  alertmanagers:
  - scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
    - targets:
      - IP_ALERTMANAGER:9193
    basic_auth:
      username: admin
      password: motdepasse
...
Prometheus:/# service prometheus restart
Sending alerts by e-mail
Edit the configuration file:
# vi /etc/alertmanager/alertmanager.yml
Example with "office365"
# alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.office365.com:587'
  smtp_from: 'mail365valide@exemple.net'
  smtp_auth_username: 'mail365valide@exemple.net'
  smtp_auth_password: 'supermotdepasse'
route:
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 10s
  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 30s
  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 30m
  group_by: ['alertname', 'cluster', 'service']
  # default receiver
  receiver: email-me
receivers:
- name: email-me
  email_configs:
  - to: 'monmail@exemple.net'
    send_resolved: true
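YAML does not accept tab characters for indentation, so a quick check before restarting can save a failed start. The helper below is a sketch with a hypothetical name: it rejects a file containing tabs, then runs amtool check-config when amtool is available (assumed here to be installed alongside Alertmanager).

```shell
# Sketch: refuse a YAML file that contains tab characters, then run amtool if present.
# Assumption: amtool is installed alongside Alertmanager.
check_alertmanager_config() {
    config=${1:-/etc/alertmanager/alertmanager.yml}
    tab=$(printf '\t')
    # grep -n prints the offending lines with their line numbers.
    if grep -n "$tab" "$config"; then
        echo "tab characters found in $config, use spaces instead" >&2
        return 1
    fi
    if command -v amtool >/dev/null 2>&1; then
        amtool check-config "$config"
    fi
}

# Usage:
#   check_alertmanager_config && service alertmanager restart
```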
# service alertmanager restart
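Once Alertmanager has restarted, the e-mail route can be exercised without waiting for a real incident by posting a hand-made alert to its HTTP API. This is a sketch under assumptions: the function names, the alert name "TestAlert" and its labels are invented for the test, and Alertmanager is assumed to answer on localhost:9093 with the v2 API available.

```shell
# Sketch: post a hand-made alert to Alertmanager to exercise the e-mail route.
# Assumptions: Alertmanager listens on localhost:9093 and exposes the v2 API;
# the alert name "TestAlert" and its labels are made up for this test.
test_alert_payload() {
    cat <<'EOF'
[
  {
    "labels": {
      "alertname": "TestAlert",
      "severity": "critical"
    },
    "annotations": {
      "summary": "Manual test alert"
    }
  }
]
EOF
}

send_test_alert() {
    url=${1:-http://localhost:9093}
    # "--data @-" makes curl read the request body from standard input.
    test_alert_payload | curl -s -X POST \
        -H 'Content-Type: application/json' \
        --data @- \
        "$url/api/v2/alerts"
}

# Usage (on the Alertmanager host):
#   send_test_alert
```

With the route settings above, the notification should be sent after the group_wait delay of 10s.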
Alerts
A very complete, good-quality source
Prometheus configuration
Edit the Prometheus configuration to enable the "rules.yml" rules file:
Prometheus:/# vi /etc/prometheus/prometheus.yml
...
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /etc/prometheus/rules.yml
  # - "first_rules.yml"
  # - "second_rules.yml"
...
Restart to apply the new configuration:
Prometheus:/# service prometheus restart
Alert examples
Monitoring the exporters
Prometheus:/# vi /etc/prometheus/rules.yml
- alert: PrometheusTargetMissing
  expr: up == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Prometheus target missing (instance {{ $labels.instance }})
    description: "A Prometheus target has disappeared. An exporter might be crashed.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
Prometheus:/# service prometheus restart
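A file loaded through rule_files has to wrap its rules in a groups list, whereas the fragment above shows only the rule itself. The sketch below writes a complete file to a temporary path and validates it when promtool is available; the group name "exporters" and the temporary path are assumptions, not values from this page.

```shell
# Sketch: write a complete rules file (with the "groups" wrapper) and validate it.
# Assumptions: the group name "exporters" and the temporary output path.
rules_file=$(mktemp)

# The quoted 'EOF' keeps the shell from expanding $labels and $value.
cat > "$rules_file" <<'EOF'
groups:
- name: exporters
  rules:
  - alert: PrometheusTargetMissing
    expr: up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Prometheus target missing (instance {{ $labels.instance }})
      description: "A Prometheus target has disappeared. An exporter might be crashed.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
EOF

# promtool ships with Prometheus; skip the check when it is not installed.
if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$rules_file"
fi
```

After a successful check, the content can be copied to /etc/prometheus/rules.yml.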
Grafana
Importing a remote Prometheus server (self-signed)
Export
ProxMox
Prometheus Node Exporter
ProxMox
ProxMox:~# apt install prometheus-node-exporter
Create a configuration file with the listen address and port:
ProxMox:~# echo 'ARGS=--web.listen-address=12.34.56.789:9100' > /etc/prometheus.conf
Enable the configuration file by editing the service unit and adding this line to its [Service] section:
ProxMox:~# vi /lib/systemd/system/prometheus-node-exporter.service
EnvironmentFile=/etc/prometheus.conf
ProxMox:~# systemctl daemon-reload
ProxMox:~# service prometheus-node-exporter restart
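A unit file under /lib/systemd/system belongs to the package and may be overwritten by an upgrade. One alternative is a systemd drop-in, sketched below; the function name and the override.conf file name are choices made for this example, and the approach assumes the packaged unit passes $ARGS to the exporter, as the configuration above relies on.

```shell
# Sketch: add the EnvironmentFile through a systemd drop-in instead of editing the packaged unit.
# Assumption: the drop-in directory follows the usual <unit>.d convention.
write_node_exporter_dropin() {
    dropin_dir=${1:-/etc/systemd/system/prometheus-node-exporter.service.d}
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
EnvironmentFile=/etc/prometheus.conf
EOF
}

# Usage (as root on the Proxmox host):
#   write_node_exporter_dropin
#   systemctl daemon-reload
#   systemctl restart prometheus-node-exporter
```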
Import on the Prometheus server
Prometheus:~# vi /etc/prometheus/prometheus.yml
...
  - job_name: node
    static_configs:
      - targets: [12.34.56.789:9100]
Prometheus:~# service prometheus restart
Check that the exporter is "up" by opening "Status" -> "Targets" on your Prometheus server.
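The same check can be done from the command line through the Prometheus HTTP API. The helper below is a sketch with a hypothetical name; it assumes it is run on the Prometheus host, where the API answers on localhost:9090, and that the job is named "node" as in the configuration above.

```shell
# Sketch: ask the Prometheus HTTP API for the "up" series of the node job.
# Assumption: run on the Prometheus host, where the API answers on localhost:9090.
query_node_up() {
    base=${1:-http://localhost:9090}
    # -G sends the url-encoded data as a query string on a GET request.
    curl -s -G "$base/api/v1/query" --data-urlencode 'query=up{job="node"}'
}

# Usage:
#   query_node_up
```

In the JSON answer, a sample value of "1" means the last scrape of that target succeeded, and "0" means it failed.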
Prometheus PVE Exporter
A good-quality source in French
ProxMox
Create a group and a user with access rights for monitoring:
ProxMox:~# pveum groupadd monitoring -comment 'Monitoring group'
ProxMox:~# pveum aclmod / -group monitoring -role PVEAuditor
ProxMox:~# pveum useradd pve_exporter@pve
ProxMox:~# pveum usermod pve_exporter@pve -group monitoring
ProxMox:~# pveum passwd pve_exporter@pve
Install the exporter:
ProxMox:~# apt-get install python3-pip
ProxMox:~# pip3 install prometheus-pve-exporter
Then create a configuration file:
ProxMox:~# mkdir -p /usr/share/pve_exporter/
ProxMox:~# vi /usr/share/pve_exporter/pve_exporter.yml
default:
    user: pve_exporter@pve
    password: MOTDEPASSE
    verify_ssl: false
Create the systemd unit file:
ProxMox:~# vi /etc/systemd/system/pve_exporter.service
[Unit]
Description=Proxmox VE Prometheus Exporter
After=network.target
Wants=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/share/pve_exporter
ExecStart=/usr/local/bin/pve_exporter /usr/share/pve_exporter/pve_exporter.yml 9221 12.34.56.789

[Install]
WantedBy=multi-user.target
ProxMox:~# systemctl daemon-reload
ProxMox:~# systemctl enable pve_exporter
ProxMox:~# systemctl start pve_exporter
Prometheus
Prometheus:~# vi /etc/prometheus/prometheus.yml
  - job_name: 'pve'
    static_configs:
      - targets:
        - 12.34.56.789:9221  # Proxmox VE node with PVE exporter.
        - 12.34.45.790:9221  # Proxmox VE node with PVE exporter.
    metrics_path: /pve
    params:
      module: [default]
Prometheus:~# service prometheus restart
Check that the exporter is "up" by opening "Status" -> "Targets" on your Prometheus server.
Nvidia exporter
docker run --name NVexport \
    -p IP_EXPOSE:9445:9445 \
    -d --restart=always \
    -e LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
    --volume /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
    --privileged \
    mindprince/nvidia_gpu_prometheus_exporter:0.1