Most of the time you are wondering what is happening inside your application, and there are enough monitoring apps on the market that one of them should meet your needs. However, I decided to go for a DIY approach: it gives you better understanding and control, and it is almost free (well, you still have to pay for the compute). And it can become a new hobby. :)
I looked around for a project and found Prometheus. Its GitHub repository has been downloaded more than any other except Kubernetes, which made it my top candidate.
In this blog post, I'm going to demonstrate how to install Prometheus and how to collect metrics from each server. I will expand on this in future posts.
To get started, you need a server to run the Prometheus engine, plus a Node Exporter on each server you would like to monitor.
Installing Prometheus:
# Create a user to run Prometheus with
$ sudo useradd --no-create-home --shell /bin/false prometheus
# Create the directories following Linux Filesystem Hierarchy
$ sudo mkdir /etc/prometheus /var/lib/prometheus
# Setting the Prometheus user as the owner of those Directories
$ sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
# Download the release tarball (prebuilt binaries, not the source)
$ wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
# Check the SHA256 Checksum (it is a good practice)
$ [ "$(sha256sum prometheus-2.0.0.linux-amd64.tar.gz | awk '{print $1}')" = "e12917b25b32980daee0e9cf879d9ec197e2893924bd1574604eb0f550034d46" ] && echo "yes, they match!" || echo "Noooooo!"
$ tar xvfz prometheus-2.0.0.linux-amd64.tar.gz
$ sudo cp prometheus-2.0.0.linux-amd64/prometheus /usr/local/bin/
$ sudo chown prometheus:prometheus /usr/local/bin/prometheus
$ sudo cp prometheus-2.0.0.linux-amd64/promtool /usr/local/bin/
$ sudo chown prometheus:prometheus /usr/local/bin/promtool
$ sudo cp -r prometheus-2.0.0.linux-amd64/consoles /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/consoles
$ sudo cp -r prometheus-2.0.0.linux-amd64/console_libraries /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
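# Copy the sample config shipped in the tarball before cleaning up
$ sudo cp prometheus-2.0.0.linux-amd64/prometheus.yml /etc/prometheus/
$ sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
$ rm -rf prometheus-2.0.0.linux-amd64.tar.gz prometheus-2.0.0.linux-amd64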
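The checksum comparison used during the download step can be wrapped in a small reusable function. This is only a sketch: the name verify_sha256 is made up for this post, and the demo runs against a throwaway temp file rather than the real tarball.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HASH   (hypothetical helper, not part of Prometheus)
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HASH, non-zero otherwise.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Self-contained demo on a temporary file.
demo_file=$(mktemp)
printf 'hello\n' > "$demo_file"
demo_hash=$(sha256sum "$demo_file" | awk '{print $1}')

if verify_sha256 "$demo_file" "$demo_hash"; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
fi
rm -f "$demo_file"
```

With the real download you would call it with the tarball name and the hash published on the release page, and only run tar if it succeeds.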
To start Prometheus
# The \ at the end of each line tells the shell to ignore the line break, so the command is actually one line; it just reads better this way
$ sudo -u prometheus /usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries &
# The & at the end runs the command as a background process, so you can keep working on the server; without it you would have to press Ctrl-C to get the prompt back, which terminates Prometheus
Prometheus is now running with its default config, and you can have a look by opening http://SERVER_IP:9090 in a browser (replace SERVER_IP with your server's IP address).
So let's move on to the other component, which gathers system information such as memory, CPU, or network usage from the machine. The agent is called Node Exporter, and its installation process is much the same.
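If you would rather check from the command line than from a browser, something like the following should work. It is a sketch under two assumptions: that curl is installed, and that your Prometheus version serves the /-/healthy endpoint. The function names are made up for this post.

```shell
#!/bin/sh
# prom_is_healthy BASE_URL   (hypothetical helper)
# Succeeds when GET BASE_URL/-/healthy returns an HTTP success status.
prom_is_healthy() {
    # -f: treat HTTP errors as failures, -sS: quiet but still report errors,
    # --max-time: give up instead of hanging on an unreachable host.
    curl -fsS --max-time 3 -o /dev/null "$1/-/healthy"
}

# wait_for_prom BASE_URL [ATTEMPTS]   (hypothetical helper)
# Polls once per second until Prometheus answers or the attempts run out.
wait_for_prom() {
    url=$1
    attempts=${2:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if prom_is_healthy "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

On the Prometheus server itself, `wait_for_prom http://localhost:9090 && echo "Prometheus is up"` would be a typical call.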
Installing Node Exporter:
$ sudo useradd --no-create-home --shell /bin/false node_exporter
$ cd /home/$(whoami)
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
# Check the SHA256 Checksum
$ [ "$(sha256sum node_exporter-0.15.2.linux-amd64.tar.gz | awk '{print $1}')" = "1ce667467e442d1f7fbfa7de29a8ffc3a7a0c84d24d7c695cc88b29e0752df37" ] && echo "yes, they match!" || echo "Noooooo!"
$ tar xvfz node_exporter-0.15.2.linux-amd64.tar.gz
$ sudo cp node_exporter-0.15.2.linux-amd64/node_exporter /usr/local/bin
$ sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
$ sudo -u node_exporter /usr/local/bin/node_exporter &
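Node Exporter serves its metrics as plain text over HTTP at /metrics, one sample per line, so you can inspect them by hand. Here is a rough sketch of pulling a single value out of that text. The helper name metric_value is made up, and the sample data is invented for illustration; only node_load1 is meant to be a real metric name.

```shell
#!/bin/sh
# metric_value NAME   (hypothetical helper)
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose metric name is exactly NAME (samples without labels only).
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Invented sample in the text exposition format, for illustration only.
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_load15 0.10'

printf '%s\n' "$sample" | metric_value node_load1   # prints 0.42
```

Against a live exporter, assuming curl is available, the same function can be fed with `curl -s http://localhost:9100/metrics | metric_value node_load1`.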
Node Exporter is now running on port 9100, but Prometheus is not configured to read from it yet, so back to the config:
Prometheus config file
If you remember, there is a file, /etc/prometheus/prometheus.yml, which is used as the config file.
The following is an example for a server that runs both Prometheus and Node Exporter, since you need metrics even for the server Prometheus itself runs on.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
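YAML is picky about indentation, so it is worth validating the file before you point Prometheus at it. The promtool binary installed earlier can do that. The wrapper below is a sketch that assumes the `check config` subcommand of promtool 2.x; the function name is made up.

```shell
#!/bin/sh
# check_prom_config [FILE]   (hypothetical wrapper around promtool)
# Validates a Prometheus config file; defaults to the path used in this post.
check_prom_config() {
    config=${1:-/etc/prometheus/prometheus.yml}
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    # Assumption: promtool 2.x syntax, "promtool check config <file>".
    promtool check config "$config"
}
```

Running `check_prom_config` with no arguments checks /etc/prometheus/prometheus.yml; a non-zero exit status means the file should not be used as is.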
Now if there is another server that you want to monitor, you'll need to install Node Exporter on it and change the config so Prometheus will retrieve data from it:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      # Replace SERVER_IP with the IP address of the other server
      - targets: ['localhost:9100', 'SERVER_IP:9100']
As you can see, targets is an array, so you can add new servers as you go along. Just make sure that you restart Prometheus afterwards.
If you've followed this step by step, restarting Prometheus is a little awkward because we sent it to the background (remember the & at the end of the Prometheus command). So we first need to find the process ID of Prometheus and then send it a TERM signal to stop it.
Finding the process ID (PID)
$ ps aux | grep '[p]rometheus'
In the output, look for the line that shows the command you ran; the second column is the PID. Now you need to send a TERM signal to the process.
$ sudo kill THE_PID
After this, start Prometheus again the same way we did before.
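The stop step can be scripted so that you only start Prometheus again once the old process is really gone. The sketch below uses a made-up function name and demonstrates the idea on a stand-in sleep process rather than on Prometheus itself.

```shell
#!/bin/sh
# stop_and_wait PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Sends TERM to PID, then polls until the process is gone or the timeout hits.
stop_and_wait() {
    pid=$1
    timeout=${2:-10}
    kill "$pid" 2>/dev/null || return 1   # TERM is kill's default signal
    waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demo with a stand-in background process instead of Prometheus.
sleep 300 &
demo_pid=$!
stop_and_wait "$demo_pid" 5 && echo "process $demo_pid stopped"
```

For the real thing you need enough privileges to signal a process owned by the prometheus user, for example from a root shell, and `pgrep -u prometheus -x prometheus` is one way to get the PID without reading ps output. As far as I know, Prometheus also reloads its config when it receives a HUP signal (`sudo kill -HUP THE_PID`), which avoids the restart altogether.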
There is still a long way to go, but for now, this should keep you busy. Have a look around and get familiar with the setup before we move to the next stage.
Happy coding!