For the last few years, I have built a centralized monitoring system based on Prometheus that gathers various metrics across my whole private fleet of servers.
Since writing Prometheus exporters is rather simple, I have written some of them myself:
- lywsd03mmc-exporter, a Prometheus exporter for the LYWSD03MMC BLE thermometer, which monitors my flat’s temperature and air humidity (and tells me when to replace the batteries).
- card10-bme680-exporter, which accesses the environmental sensor of the card10 via serial port for measuring air quality.
- tab-exporter, which exports the number of Firefox tabs I have open (this needs countfftabs).
- I also forked and improved nano-exporter, a very lightweight and zero-dependency Linux version of node_exporter.
Additionally I use the following pre-made exporters:
- node_exporter for monitoring FreeBSD and some other hosts.
- prometheus-nginxlog-exporter for web server metrics.
- chrony_exporter for monitoring my NTP server.
- gpsd-prometheus-exporter for monitoring the GPS signal on my NTP server.
- smokeping_prober for detecting network outages and other problems.
- mtail for extracting metrics out of Postfix logs and other log files.
As you can see, this is quite a lot of different exporters running on different hosts.
A few months ago I decided to rebuild the centralized metrics server on top of VictoriaMetrics and with proper access control.
Why VictoriaMetrics? I tried it for a bit, and it seems to use less RAM and less disk space while supporting long-term storage nicely. It also has better mechanisms for importing and exporting data than Prometheus.
Setting up VictoriaMetrics
Setting up victoria-metrics is very easy. I run it like this:
victoria-metrics -enableTCP6 \
    -storageDataPath=/srv/victoria-metrics \
    -retentionPeriod=99y \
    -httpListenAddr=127.0.0.1:8428 \
    -selfScrapeInterval=20s \
    -promscrape.config=/usr/local/etc/prometheus/prometheus.yaml
Note that IPv6 always needs to be enabled explicitly, hence -enableTCP6.
The prometheus.yaml file is compatible with stock Prometheus.
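For a quick smoke test, VictoriaMetrics exposes a /health endpoint that answers with OK when the instance is up:

curl http://127.0.0.1:8428/health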
I then use Grafana to connect to it, using the Prometheus protocol.
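If you provision Grafana from configuration files, a minimal data source definition could look like this (the file name is my choice, and I assume Grafana runs on the same host as VictoriaMetrics):

# /etc/grafana/provisioning/datasources/victoriametrics.yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    url: http://127.0.0.1:8428
    access: proxy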
Scraping non-public endpoints
I don’t consider most of the above metrics to be particularly private, but they certainly leak metadata (e.g. whether I am at home, or how much mail I get), so I don’t want to publish them on the open Internet for everyone who finds them.
Since Prometheus mainly favors a pull based model, we need to figure out ways to protect the data.
“Obvious” solutions like using mTLS or a maintenance VPN would require reconfiguring many machines and were deemed too much effort.
Essentially, I found three solutions that I will describe in detail:
Hiding metrics behind existing web servers
This is the easiest mechanism when your host already runs a web server: simply use it as a proxy for the metrics, and filter access by IP address or Basic Auth. Since most web servers already use HTTPS these days, you get encryption for free.
A simple nginx configuration to do this would be:
location /metrics {
    proxy_http_version 1.1;
    proxy_pass http://127.0.0.1:9100/metrics;
    access_log off;
    allow 127.0.0.1;
    allow ...;
    deny all;
}
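If you prefer Basic Auth over IP filtering, you could swap the allow/deny lines for something like this (the htpasswd file path is just an example; create it with htpasswd -c from apache2-utils):

    auth_basic "metrics";
    auth_basic_user_file /etc/nginx/htpasswd-metrics;  # example path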
You need to configure the metrics exporter to only listen on localhost.
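For node_exporter, for example, that means binding it to the loopback address:

node_exporter --web.listen-address=127.0.0.1:9100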
Reverse SSH tunnelling
This is a quite elegant solution that provides encryption and flexible configuration, and it can be used when the scrape target doesn’t have a public IP address. OpenSSH provides the -R flag for reverse port forwarding, but most people don’t know it can also be used to run a reverse SOCKS proxy!
For this, I create a separate Unix user on both the scrape target and the server, and assign it an SSH key. Then, the target runs:
ssh -o ServerAliveInterval=15 -o ExitOnForwardFailure=yes -R8083 server.example.com -NT
You should run this using service supervision so it tries to reconnect on network failures.
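For example, a minimal runit run script could look like this (the service name and user are my assumptions; any supervisor with automatic restarts, such as a systemd unit with Restart=always, works just as well):

#!/bin/sh
# Hypothetical runit service, e.g. /etc/sv/metrics-tunnel/run;
# runit restarts the tunnel automatically whenever it exits.
exec chpst -u scrape-user \
    ssh -o ServerAliveInterval=15 -o ExitOnForwardFailure=yes \
        -R8083 server.example.com -NT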
On the server side, you restrict the key so it can only open this port, using /etc/ssh/authorized_keys/scrape-user:
restrict,port-forwarding,permitlisten="8083" ssh-ed25519 ....
Then, the server can use port 8083 as a SOCKS proxy to access the network of the scrape target directly! So you can write a scrape config like:
- job_name: 'nano-exporter-hecate'
  proxy_url: 'socks5://127.0.0.1:8083'
  static_configs:
    - targets: ['127.0.0.1:9100']
      labels:
        instance: 'hecate.home.vuxu.org:9100'
    - targets: ['10.0.0.119:9100']
      labels:
        instance: 'leto.home.vuxu.org:9100'
Here, we use a host in my home network that is always on, and can also safely scrape other hosts in the same LAN. (Note that the IP addresses in targets are resolved relative to the SSH client.)
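From the metrics server, you can quickly check that the tunnel works by sending a request through the SOCKS proxy:

curl --socks5 127.0.0.1:8083 http://10.0.0.119:9100/metrics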
Pushing with vmagent
I used the SSH approach for my notebook as well, but there’s the problem that we lose data when there’s no Internet connection available. I have thus moved my notebook to a solution using vmagent, which is included with VictoriaMetrics.
vmagent scrapes metrics just like VictoriaMetrics (and also supports all the other metrics protocols, but I don’t use them), but it simply forwards everything via the Prometheus remote write protocol, buffering data locally whenever it can’t deliver the metrics right away.
On the server side, we need to provide access to the remote write protocol. Since VictoriaMetrics operates without internal access control, we can use the vmauth gateway to implement Basic Auth over TLS. (Again, you could use an existing HTTPS server and proxy it, but in this case I don’t have an HTTPS server on the metrics host.)
vmauth needs some configuration. First, we create a self-signed certificate (unfortunately, Let’s Encrypt support is limited to the commercial version of VictoriaMetrics):
openssl req -x509 -newkey ed25519 \
    -keyout /usr/local/etc/vmauth/key.pem \
    -out /usr/local/etc/vmauth/cert.pem \
    -sha256 -days 3650 -nodes -subj "/CN=server.example.org" \
    -addext "subjectAltName = DNS:server.example.org"
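To double-check what was generated, you can inspect the certificate’s subject and validity period:

openssl x509 -in /usr/local/etc/vmauth/cert.pem -noout -subject -dates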
I then run it as:
vmauth -enableTCP6 \
    -tls \
    -tlsCertFile=/usr/local/etc/vmauth/cert.pem \
    -tlsKeyFile=/usr/local/etc/vmauth/key.pem \
    -reloadAuthKey=secret \
    -flagsAuthKey=secret \
    -metricsAuthKey=secret \
    -pprofAuthKey=secret \
    -auth.config=/usr/local/etc/vmauth/vmauth.yaml
(I think it’s unfortunate that we now need to add auth keys, as the internal and the forwarded APIs are exposed on the same port…)
The vmauth.yaml file configures who has access:
users:
  - username: "client"
    password: "evenmoresecret"
    url_prefix: "http://localhost:8428/"
Here, localhost:8428 is the VictoriaMetrics instance.
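Since a self-signed certificate is its own CA, the whole setup can be tested end to end with a Prometheus-style query through vmauth (which listens on port 8427 by default):

curl --cacert /usr/local/etc/vmauth/cert.pem -u client:evenmoresecret \
    'https://server.example.org:8427/api/v1/query?query=up'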
Finally, on the scrape target we can now run vmagent:
vmagent -enableTCP6 \
    -promscrape.config=/etc/vmagent/promscrape.yml \
    -httpListenAddr=127.0.0.1:8429 \
    -remoteWrite.url=https://server.example.org:8427/api/v1/write \
    -remoteWrite.label=vmagent=myhostname \
    -remoteWrite.retryMinInterval=30s \
    -remoteWrite.basicAuth.username=client \
    -remoteWrite.basicAuth.passwordFile=/etc/vmagent/passwd \
    -remoteWrite.tlsCAFile=/etc/vmagent/cert.pem
The cert.pem is copied from the server; the password is stored in /etc/vmagent/passwd.
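To create the password file with safe permissions (printf avoids a trailing newline):

printf '%s' 'evenmoresecret' > /etc/vmagent/passwd
chmod 600 /etc/vmagent/passwd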
Note that the vmagent instance is configured locally, so we can again scrape targets that are only reachable from it. We can also adjust the scrape targets without having to touch the metrics server itself.
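The promscrape.yml again uses the Prometheus scrape configuration format; a minimal sketch that scrapes a local node_exporter might look like this:

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['127.0.0.1:9100']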
NP: Godspeed You! Black Emperor—Broken Spires At Dead Kapital