Installing & Configuring Node Exporter-Prometheus-Grafana-AlertManager-Blackbox

Prometheus is a monitoring solution for storing time series data like metrics.

Grafana allows visualizing the data stored in Prometheus (and other sources).

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

Node Exporter

Go to the home folder

cd

Download Node Exporter from the link below

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
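
Before extracting, it can be worth confirming that the download is intact. The Prometheus projects publish a sha256sums.txt file next to each release archive. The sketch below is one way to compare a file against a digest copied from that list; the helper name verify_sha256 is made up for this guide, and it assumes sha256sum from coreutils is installed.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds (exit 0) only when FILE's SHA-256 digest equals EXPECTED_HEX.
# Hypothetical helper for this guide; assumes coreutils' sha256sum.
verify_sha256() {
  actual=$(sha256sum "$1" 2>/dev/null | awk '{ print $1 }')
  [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Usage: copy the digest for the archive from the release's sha256sums.txt.
#   verify_sha256 node_exporter-1.3.1.linux-amd64.tar.gz "<digest from sha256sums.txt>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

The same check applies to the Prometheus, Alertmanager and Blackbox Exporter archives downloaded later.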

extract the downloaded file

tar -xvf node_exporter-1.3.1.linux-amd64.tar.gz

rename the folder

mv node_exporter-1.3.1.linux-amd64 node_exporter

add a system user with no login shell; the services in this guide run as this user

sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus

go to folder

cd node_exporter

move the node_exporter binary to /usr/local/bin/

sudo mv node_exporter /usr/local/bin/

change ownership of the moved file

sudo chown prometheus:prometheus /usr/local/bin/node_exporter

check the version if you need

node_exporter --version

create service file

sudo vim /etc/systemd/system/node_exporter.service

enter the text below into the file and save it

[Unit]
Description=Node exporter to collect machine metrics

[Service]
Restart=always
User=prometheus
ExecStart=/usr/local/bin/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

If you want Node Exporter to listen on a different port, use the text below in the service file instead (9500 is only an example)

[Unit]
Description=Node exporter to collect machine metrics

[Service]
Restart=always
User=prometheus
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9500
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Reload service daemon

sudo systemctl daemon-reload

Start the node_exporter and enable it to run at bootup and check the status

sudo systemctl restart node_exporter.service
sudo systemctl enable node_exporter.service
sudo systemctl status node_exporter.service

Open port 9100 (or the port you set in the service file)

Ubuntu
sudo iptables -A INPUT -p tcp --dport 9100 -j ACCEPT
CentOS or RHEL
sudo iptables -I INPUT -p tcp --dport 9100 -j ACCEPT
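
The same rule is repeated later in this guide for ports 9090 and 3000, so a small loop can generate all of them in one go. This is only a sketch: print_open_port_rules is a hypothetical helper that prints the commands for review and changes nothing by itself. Note that rules added with iptables do not survive a reboot unless they are saved (for example with iptables-save and whatever persistence mechanism your distribution provides).

```shell
# print_open_port_rules PORT...
# Prints one iptables command per TCP port so the rules can be reviewed first.
# Hypothetical helper; it only prints text and changes nothing by itself.
print_open_port_rules() {
  for port in "$@"; do
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
  done
}

# Usage: review the output, then run it as root, for example
#   print_open_port_rules 9100 9090 3000 | sudo sh
```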

Node Exporter will be running on port 9100

http://localhost:9100/ or http://<IP ADDRESS>:9100/
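
To confirm from the command line that metrics are actually being served, one option is to fetch the /metrics page and pick out a single value. The helper below is a minimal sketch (metric_value is a name invented here); it only matches samples without labels, such as node_load5.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose name is exactly NAME (samples with labels are not matched).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (assumes Node Exporter is listening on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load5
```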

You can find the simple installation script on my GitHub page

Prometheus Installation

Download a tar file

wget https://github.com/prometheus/prometheus/releases/download/v2.36.1/prometheus-2.36.1.linux-amd64.tar.gz

extract a file

tar -xvf prometheus-2.36.1.linux-amd64.tar.gz

rename a folder

mv prometheus-2.36.1.linux-amd64 prometheus

move it to /etc/

sudo mv prometheus /etc/prometheus

add a directory for Prometheus to store data

sudo mkdir /etc/prometheus/data

change ownership and permissions

sudo chown -R prometheus:prometheus /etc/prometheus
sudo chmod -R 755 /etc/prometheus

edit the prometheus.yml file

sudo vim /etc/prometheus/prometheus.yml

replace its contents with the config below, which extends the default with Alertmanager, alert rules, Node Exporter, and Blackbox Exporter jobs

global:
  scrape_interval: 15s 
  evaluation_interval: 15s 
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093'
rule_files:
  - alert.rules.yml
scrape_configs:
  - job_name: "prometheus_master"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'Nodes'
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: 'Site Monitoring'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.google.com 
        - https://www.google.co.in ## add your websites here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
  - job_name: 'Internal Service Monitoring via TCP'
    scrape_timeout: 15s
    scrape_interval: 15s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - localhost:9090   ## add your tcp connections here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
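
The relabel_configs in the two probe jobs copy each listed target into the target URL parameter and then point the scrape at the Blackbox Exporter on port 9115, so Prometheus ends up requesting /probe on the exporter rather than contacting the target directly. As a rough illustration, the sketch below builds the equivalent URL by hand so a probe can be tried with curl. probe_url is a hypothetical helper, and it does not URL-encode the target.

```shell
# probe_url MODULE TARGET [EXPORTER]
# Prints the Blackbox Exporter URL that corresponds to one target after
# relabeling. EXPORTER defaults to localhost:9115, as in prometheus.yml.
# The target is not URL-encoded here, which is fine for simple targets only.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "${3:-localhost:9115}" "$1" "$2"
}

# Usage (once the Blackbox Exporter set up later in this guide is running):
#   curl -s "$(probe_url http_2xx https://www.google.com)"
```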

create /etc/prometheus/alert.rules.yml (the file referenced under rule_files) with the rules below

groups:
 - name: Alert

   rules:
   - alert: InstanceDown
     expr: up == 0
     for: 1m
   - alert: LoadAverage5mHigh
     expr: node_load5 >= 5.0
     for: 3m
     annotations:
       summary: "Instance {{ $labels.instance }} - high load average"
       description: "{{ $labels.instance  }} (measured by {{ $labels.job }}) has high load average ({{ $value }}) over 5 minutes."
   - alert: MemoryUsage
     expr: 100 - ((node_memory_MemAvailable_bytes * 100) / node_memory_MemTotal_bytes) >= 95
     for: 5m
     annotations:
       summary: "Instance {{ $labels.instance }} - high memory usage"
       description: "{{ $labels.instance  }} (measured by {{ $labels.job }}) has high memory usage ({{ $value }}) over 5 minutes."
   - alert: DiskUsage
     expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 90
     for: 5m
     annotations:
       summary: "Instance {{ $labels.instance }} - high disk usage"
       description: "{{ $labels.instance  }} (measured by {{ $labels.job }}) has high disk usage ({{ $value }}) over 5 minutes."
   - alert: Host-Unusual-Network-Throughput-In
     expr: sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100
     for: 5m
     labels:
       severity: warning
     annotations:
       summary: Host unusual network throughput in (instance {{ $labels.instance }})
       description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
   - alert: Host-Unusual-Network-Throughput-Out
     expr: sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100
     for: 5m
     labels:
       severity: warning
     annotations:
       summary: Host unusual network throughput out (instance {{ $labels.instance }})
       description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
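
The MemoryUsage rule computes 100 - (available * 100 / total) and fires when the result stays at or above 95 for five minutes. To get a feel for that threshold on a given host, the same arithmetic can be reproduced from /proc/meminfo, which is where Node Exporter reads these values on Linux. This is a sketch for sanity-checking only; mem_used_percent is a name invented here.

```shell
# mem_used_percent [MEMINFO_FILE]
# Prints 100 - (MemAvailable * 100 / MemTotal), the same arithmetic as the
# MemoryUsage rule, using a meminfo-style file (default: /proc/meminfo).
mem_used_percent() {
  LC_ALL=C awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) exit 1
      printf "%.1f\n", 100 - (avail * 100 / total)
    }
  ' "${1:-/proc/meminfo}"
}

# Usage on the monitored host:
#   mem_used_percent
```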

move the prometheus binary to /usr/local/bin/

sudo mv /etc/prometheus/prometheus /usr/local/bin/

create a systemd service file

sudo vim /etc/systemd/system/prometheus.service

add the following

[Unit]
Description=Monitoring system and time series database
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/etc/prometheus/data \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries

ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
LimitNOFILE=8192

[Install]
WantedBy=multi-user.target

reload the daemon

sudo systemctl daemon-reload

start, enable and check the status of the service

sudo systemctl restart prometheus.service
sudo systemctl enable prometheus.service
sudo systemctl status prometheus.service

Open port 9090

Ubuntu
sudo iptables -A INPUT -p tcp --dport 9090 -j ACCEPT
CentOS or RHEL
sudo iptables -I INPUT -p tcp --dport 9090 -j ACCEPT

Prometheus will be running on port 9090

http://localhost:9090/ or http://<IP ADDRESS>:9090/
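
Prometheus can take a moment to become ready after a restart. It exposes /-/healthy and /-/ready endpoints, so a script can wait for it instead of guessing. The sketch below polls a URL with curl; wait_for_http is a hypothetical helper and the one-second interval is an arbitrary choice.

```shell
# wait_for_http URL [TRIES]
# Polls URL once per second until curl gets a non-error HTTP response or
# TRIES attempts (default 30) are used up. Hypothetical helper; assumes curl.
wait_for_http() {
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl -fsS -o /dev/null "$1"; then
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 1
  done
  return 1
}

# Usage (assumes Prometheus on localhost:9090):
#   wait_for_http http://localhost:9090/-/ready 30 && echo "Prometheus is ready"
```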

Grafana Installation

Run the following command for ubuntu

sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/oss/release/grafana_8.5.5_amd64.deb
sudo dpkg -i grafana_8.5.5_amd64.deb

Run the following for the RPM-based servers

wget https://dl.grafana.com/oss/release/grafana-9.3.1-1.x86_64.rpm
sudo yum install grafana-9.3.1-1.x86_64.rpm

start and enable it

sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Open port 3000

Ubuntu
sudo iptables -A INPUT -p tcp --dport 3000 -j ACCEPT
CentOS or RHEL
sudo iptables -I INPUT -p tcp --dport 3000 -j ACCEPT

open in browser

http://localhost:3000/ or http://<IP ADDRESS>:3000/

Now open the login page and enter the credentials

default user: admin
default pass: admin

change the password at first login.

In Grafana add a data source

Settings > Data sources > Add data source, then select the Prometheus data source

Assuming Prometheus is on the same server, add the URL as http://localhost:9090/

Click on save and test
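
As an alternative to the UI steps, Grafana can load data sources from provisioning files at startup; with the deb and rpm packages these normally live in /etc/grafana/provisioning/datasources. The sketch below writes such a file. The helper name and the file name prometheus.yaml are assumptions made for this example, and the URL defaults to the same http://localhost:9090 used above.

```shell
# write_prometheus_datasource DIR [URL]
# Writes a Grafana data source provisioning file named prometheus.yaml into DIR.
# URL defaults to http://localhost:9090 (Prometheus on the same server).
write_prometheus_datasource() {
  cat > "$1/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${2:-http://localhost:9090}
    isDefault: true
EOF
}

# Usage (assumes the default provisioning path of the deb/rpm packages):
#   write_prometheus_datasource /tmp
#   sudo mv /tmp/prometheus.yaml /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```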

Go to Dashboards > Browse > Import, enter 1860 (the Node Exporter Full dashboard), load it, select the Prometheus data source, and import

Enjoy visualizing metrics ….!!!

Setting Up Alertmanager

Download Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz

Extract it

tar -xvf alertmanager-0.24.0.linux-amd64.tar.gz

move the extracted folder to /opt/alertmanager

sudo mv alertmanager-0.24.0.linux-amd64 /opt/alertmanager

change ownership

sudo chown -R prometheus:prometheus /opt/alertmanager

copy the alertmanager and amtool binaries

sudo cp /opt/alertmanager/alertmanager /usr/local/bin/
sudo cp /opt/alertmanager/amtool /usr/local/bin/

Add systemd service

sudo vim /etc/systemd/system/alertmanager.service

Add below content

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
WorkingDirectory=/opt/alertmanager/
ExecStart=/usr/local/bin/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --web.external-url http://0.0.0.0:9093

[Install]
WantedBy=multi-user.target

save and exit

edit the Alertmanager config

sudo vim /opt/alertmanager/alertmanager.yml

Add the config below, replacing the Slack webhook URL and email settings with your own

route:
 group_by: [alertname]
 # If an alert isn't caught by a child route, send it to Slack.
 repeat_interval: 1h
 receiver: slack_general
 routes:

  # Send severity=slack alerts to slack.
  - match:
          #severity: slack
    receiver: slack_general

receivers:
- name: slack_general
  slack_configs:
  - api_url: https://chat.alerting.com/hooks/ksgdfbshdfsjhfw
    channel: '#alerts'
  email_configs:
  - to: 'alert@alerting.com,alerts@alertng.com'
    from: 'alert@alert.com'
    smarthost: smtp.gmail.com:587
    auth_username: "alert@alert.com"
    auth_identity: "alert@alert.com"
    auth_password: "password"
    send_resolved: true

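reload systemd, start and enable Alertmanager, then check both services

sudo systemctl daemon-reload
sudo systemctl restart alertmanager
sudo systemctl enable alertmanager
sudo systemctl status alertmanager
sudo systemctl status prometheus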
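
To check the routing without waiting for a real alert, a test alert can be posted straight to Alertmanager's v2 API. The sketch below only builds the JSON payload; test_alert_json is a hypothetical helper, and the alert name and summary text are made up. If the receiver settings above are correct, the alert should show up in the configured Slack channel and mailbox.

```shell
# test_alert_json NAME SEVERITY
# Prints a one-element JSON array in the shape accepted by Alertmanager's
# v2 API (POST /api/v2/alerts). NAME and SEVERITY must not contain quotes.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}

# Usage (assumes Alertmanager on localhost:9093):
#   amtool check-config /opt/alertmanager/alertmanager.yml
#   test_alert_json ManualTest warning |
#     curl -s -X POST -H 'Content-Type: application/json' \
#       --data @- http://localhost:9093/api/v2/alerts
```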

Setting Up Blackbox Exporter

Download Blackbox Exporter (v0.22.0 is used here)

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz

Extract it

tar -xvf blackbox_exporter-0.22.0.linux-amd64.tar.gz

Copy the blackbox_exporter binary to /usr/local/bin

sudo cp blackbox_exporter-0.22.0.linux-amd64/blackbox_exporter /usr/local/bin/blackbox_exporter

change ownership

sudo chown prometheus:prometheus /usr/local/bin/blackbox_exporter

Remove the downloaded files

rm -rf blackbox_exporter-0.22.0.linux-amd64*

make a new directory

sudo mkdir /etc/blackbox_exporter

Add the config file

sudo vim /etc/blackbox_exporter/blackbox.yml

Add the following

modules:
  http_2xx:
    prober: http
    timeout: 15s
    http: 
      fail_if_not_ssl: true
      fail_if_ssl: false
      method: GET
      no_follow_redirects: false
      preferred_ip_protocol: ip4
      valid_http_versions: 
        - HTTP/1.1
        - HTTP/2.0
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp

Add service file

sudo vim /etc/systemd/system/blackbox_exporter.service

Copy & paste the following

[Unit]
Description=Blackbox Exporter Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox_exporter/blackbox.yml \
  --web.listen-address=0.0.0.0:9115

Restart=always

[Install]
WantedBy=multi-user.target

Change the ownership of the config file

sudo chown prometheus:prometheus /etc/blackbox_exporter/blackbox.yml

Reload systemd, then start and enable the service

sudo systemctl daemon-reload
sudo systemctl start blackbox_exporter
sudo systemctl status blackbox_exporter
sudo systemctl enable blackbox_exporter
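
A quick way to confirm the exporter works is to run one probe by hand and look at the probe_success sample in the response, which is 1 when the probe passed and 0 when it failed. The helper below is a small sketch (probe_ok is a name invented here) that turns this into an exit status.

```shell
# probe_ok
# Reads the output of a Blackbox /probe request on stdin and succeeds only if
# it contains the sample "probe_success 1".
probe_ok() {
  grep -q '^probe_success 1$'
}

# Usage (assumes the exporter on localhost:9115 and the http_2xx module):
#   curl -s 'http://localhost:9115/probe?module=http_2xx&target=https://www.google.com' |
#     probe_ok && echo "probe OK" || echo "probe FAILED"
```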

You're good to GO........!

Note: You can also import the Blackbox Exporter dashboard into Grafana using dashboard ID 7587 with the Prometheus data source