NethServer 8 backup notification

Hi!

Some time ago, I used the script provided by @giacomo in this thread to receive email notifications for module backups.

Put it at /var/lib/nethserver/cluster/events/backup-status-changed/20notify.

#!/bin/bash

# Change the following variables to match your environment
MAIL_FROM="no-reply@nethserver.org"
MAIL_TO="giacomo@nethesis.it"
MAIL_SUBJECT="Backup status changed:"
MAIL_TEMPLATE="The backup status for {BACKUP_NAME} on {MODULE_ID} has changed to {STATUS}. Please check the system for details."

# WARNING - DO NOT EDIT BELOW THIS LINE (unless you know what you're doing)

# Redis command
rdb="redis-cli --raw"

# Read event data from stdin
read -r event_data
if ! echo "$event_data" | jq . >/dev/null 2>&1; then
    echo "Failed to parse JSON input" >&2
    exit 1
fi

# Extract necessary fields from event_data
module_id=$(echo "$event_data" | jq -r '.module_id')
backup_id=$(echo "$event_data" | jq -r '.backup_id')

leader_id=$($rdb hget cluster/environment NODE_ID)
self_id=$NODE_ID

if [[ "$self_id" != "$leader_id" ]]; then
    exit 0 # LEADER ONLY! Do not run this procedure in worker nodes.
fi
backup_name=$($rdb hget "cluster/backup/$backup_id" "name")

errors=$($rdb hget "module/$module_id/backup_status/$backup_id" errors)
if [[ -z "$errors" ]]; then
    echo "INFO: Status unknown, exiting." >&2
    exit 0
fi

if [[ "$errors" == "0" ]]; then
    status="SUCCESS"
else
    status="FAIL"
fi

# Send email
subject="$backup_name ($module_id): $status"
msg="$(echo "$MAIL_TEMPLATE" | sed "s/{BACKUP_NAME}/$backup_name/g; s/{STATUS}/$status/g; s/{MODULE_ID}/$module_id/g")"
echo "$msg" | runagent ns8-sendmail -s "$subject" -f "$MAIL_FROM" "$MAIL_TO"

Then make it executable:

chmod a+x /var/lib/nethserver/cluster/events/backup-status-changed/20notify

Bear in mind: it sends one notification for each module instance in a backup schedule, so every scheduled run produces a separate email per module.

Receiving one email per module was quite annoying for me, so I opted for a “recap mail” solution instead.

  • A single email that includes a summary of the backups of the various modules.

  • In case of failed modules, it highlights them clearly.

It’s almost as simple as @giacomo's solution, but you need to create two files in two different directories:

  1. Create a file in /var/lib/nethserver/cluster/actions/run-backup/90notify (or a similar name with a higher numeric prefix than the existing steps) containing this script:
#!/bin/bash
set -uo pipefail

read -r event_data || true

if ! echo "$event_data" | jq . >/dev/null 2>&1; then
    exit 0
fi

backup_id=$(echo "$event_data" | jq -r '.id // empty')

if [[ -z "$backup_id" ]]; then
    exit 0
fi
#            Python script name
#                     |
#                     V
/usr/local/bin/ns8-backup-recap "$backup_id" || true
exit 0

Then run:

chmod +x /var/lib/nethserver/cluster/actions/run-backup/90notify

(or whatever file name you have chosen)

This will be executed immediately after 50run_backup and 80upload_cluster_backup, which are already present in the run-backup directory.
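
To check where a new step lands, a small sketch like the following lists the step files in the order they would run. It assumes steps are executed in lexical order of their file names, which is what the numeric prefixes suggest. The helper name `step_order` is hypothetical and not part of NS8.

```python
from pathlib import Path


def step_order(action_dir):
    """Return the step file names sorted lexically (the assumed run order)."""
    return sorted(p.name for p in Path(action_dir).iterdir() if p.is_file())


if __name__ == "__main__":
    # Directory taken from the post; adjust it if your step lives elsewhere.
    run_backup_dir = "/var/lib/nethserver/cluster/actions/run-backup"
    if Path(run_backup_dir).is_dir():
        for name in step_order(run_backup_dir):
            print(name)
```

If 90notify is not printed last, pick a higher prefix.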

  2. Then create a file called ns8-backup-recap in /usr/local/bin/ (the path that 90notify calls) containing this script:
#!/usr/bin/env python3

import json
import subprocess
import sys
import time

MAIL_FROM = "MAIL_FROM_PLACEHOLDER"
MAIL_TO = "MAIL_TO_PLACEHOLDER"
MAIL_SUBJECT_PREFIX = "Backup recap"

def run_cmd(cmd, input_text=None, check=True):
    proc = subprocess.run(
        cmd,
        input=input_text,
        text=True,
        capture_output=True
    )
    if check and proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or proc.stdout.strip() or f"command failed: {' '.join(cmd)}")
    return proc

def get_backup_data(backup_id):
    raw = run_cmd(["api-cli", "run", "list-backups"]).stdout
    data = json.loads(raw)
    for backup in data.get("backups", []):
        if str(backup.get("id")) == str(backup_id):
            return backup
    return None

def human_size(num):
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    n = float(num)
    for unit in units:
        if n < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(n)} {unit}"
            return f"{n:.2f} {unit}"
        n /= 1024.0

def fmt_ts(ts):
    if not ts:
        return "-"
    return time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(int(ts)))

def summarize_backup(backup):
    instances = backup.get("instances", [])
    rows = []
    failed_modules = []
    total_size = 0
    total_files = 0
    started = []
    ended = []

    for inst in instances:
        module_id = inst.get("module_id", "_")
        status = inst.get("status") or {}
        success = status.get("success") is True
        state = "SUCCESS" if success else "FAIL"

        if not success:
            failed_modules.append(module_id)

        total_size += int(status.get("total_size", 0) or 0)
        total_files += int(status.get("total_file_count", 0) or 0)

        if status.get("start"):
            started.append(int(status["start"]))
        if status.get("end"):
            ended.append(int(status["end"]))

        rows.append({
            "module_id": module_id,
            "state": state,
            "size": int(status.get("total_size", 0) or 0),
            "files": int(status.get("total_file_count", 0) or 0),
            "snapshots": int(status.get("snapshots_count", 0) or 0),
        })

    overall = "FAIL" if failed_modules else "SUCCESS"
    return {
        "overall": overall,
        "rows": sorted(rows, key=lambda x: x["module_id"]),
        "failed_modules": sorted(failed_modules),
        "total_instances": len(instances),
        "total_size": total_size,
        "total_files": total_files,
        "start": min(started) if started else None,
        "end": max(ended) if ended else None,
    }

def build_subject(backup_name, summary):
    if summary["failed_modules"]:
        return f"{MAIL_SUBJECT_PREFIX}: FAIL - {backup_name} - {', '.join(summary['failed_modules'])}"
    return f"{MAIL_SUBJECT_PREFIX}: SUCCESS - {backup_name}"

def build_body(backup, summary):
    name = backup.get("name", "backup")
    backup_id = backup.get("id", "")
    repository = backup.get("repository", "-")
    retention = backup.get("retention", "-")
    schedule = backup.get("schedule", "-")

    def esc(value):
        if value is None:
            return "-"
        return (
            str(value)
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
        )

    overall_color = "#1f7a1f" if summary["overall"] == "SUCCESS" else "#b42318"
    overall_bg = "#eaf7ea" if summary["overall"] == "SUCCESS" else "#fdecec"

    rows_html = []
    for row in summary["rows"]:
        status_color = "#1f7a1f" if row["state"] == "SUCCESS" else "#b42318"
        status_bg = "#eaf7ea" if row["state"] == "SUCCESS" else "#fdecec"

        rows_html.append(f"""
            <tr>
              <td style="padding:10px 12px;border-bottom:1px solid #e5e7eb;">{esc(row['module_id'])}</td>
              <td style="padding:10px 12px;border-bottom:1px solid #e5e7eb;">
                <span style="display:inline-block;padding:4px 10px;border-radius:999px;font-weight:600;color:{status_color};background:{status_bg};">
                  {esc(row['state'])}
                </span>
              </td>
              <td style="padding:10px 12px;border-bottom:1px solid #e5e7eb;text-align:right;">{esc(human_size(row['size']))}</td>
              <td style="padding:10px 12px;border-bottom:1px solid #e5e7eb;text-align:right;">{esc(row['files'])}</td>
              <td style="padding:10px 12px;border-bottom:1px solid #e5e7eb;text-align:right;">{esc(row['snapshots'])}</td>
            </tr>
        """)

    if summary["failed_modules"]:
        failed_html = "".join(
            f"<li style='margin:4px 0;'>{esc(mod)}</li>"
            for mod in summary["failed_modules"]
        )
        failed_block = f"""
            <div style="margin-top:24px;padding:12px 14px;border-radius:8px;background:#fdecec;color:#b42318;">
              <h3 style="margin:0 0 8px 0;font-size:16px;color:#111827;">Module backup jobs that failed:</h3>
              <ul style="margin:0;padding-left:20px;color:#374151;">
                {failed_html}
              </ul>
            </div>
        """
    else:
        failed_block = """
            <div style="margin-top:24px;padding:12px 14px;border-radius:8px;background:#eaf7ea;color:#1f7a1f;font-weight:600;">
              All module backup jobs completed successfully.
            </div>
        """

    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Backup recap</title>
</head>
<body style="margin:0;padding:24px;background:#f3f4f6;font-family:Trebuchet MS,Segoe UI,sans-serif;color:#111827;">
  <div style="max-width:860px;margin:0 auto;background:#ffffff;border:1px solid #e5e7eb;border-radius:12px;overflow:hidden;">
    <div style="padding:24px 28px;background:#161616;color:#ffffff;">
      <h1 style="margin:0;font-size:24px;line-height:1.2;">Backup status recap for job: {esc(name)}</h1>
      <p style="margin:8px 0 0 0;font-size:14px;color:#d1d5db;">
        Final status recap of backup job: {esc(name)}
      </p>
    </div>

    <div style="padding:24px 28px;">
      <div style="margin-bottom:20px;">
        <span style="display:inline-block;padding:6px 12px;border-radius:999px;font-size:14px;font-weight:700;color:{overall_color};background:{overall_bg};">
          Overall status: {esc(summary['overall'])}
        </span>
        {failed_block}
      </div>

      <table style="width:100%;border-collapse:collapse;margin-bottom:24px;">
        <tr>
          <td style="padding:2px 0;color:#6b7280;width:180px;">Backup name</td>
          <td style="padding:2px 0;font-weight:600;">{esc(name)}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Backup ID</td>
          <td style="padding:2px 0;">{esc(backup_id)}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Repository</td>
          <td style="padding:2px 0;">{esc(repository)}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Schedule</td>
          <td style="padding:2px 0;">{esc(schedule)}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Retention</td>
          <td style="padding:2px 0;">{esc(retention)}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Start time</td>
          <td style="padding:2px 0;">{esc(fmt_ts(summary['start']))}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">End time</td>
          <td style="padding:2px 0;">{esc(fmt_ts(summary['end']))}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Instances</td>
          <td style="padding:2px 0;">{esc(summary['total_instances'])}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Total size</td>
          <td style="padding:2px 0;">{esc(human_size(summary['total_size']))}</td>
        </tr>
        <tr>
          <td style="padding:2px 0;color:#6b7280;">Total files</td>
          <td style="padding:2px 0;">{esc(summary['total_files'])}</td>
        </tr>
      </table>

      <h2 style="margin:0 0 12px 0;font-size:18px;color:#111827;">Module details</h2>

      <table style="width:100%;border-collapse:collapse;border:1px solid #e5e7eb;border-radius:8px;overflow:hidden;">
        <thead>
          <tr style="background:#f9fafb;">
            <th style="text-align:left;padding:12px;border-bottom:1px solid #e5e7eb;">Module</th>
            <th style="text-align:left;padding:12px;border-bottom:1px solid #e5e7eb;">Status</th>
            <th style="text-align:right;padding:12px;border-bottom:1px solid #e5e7eb;">Size</th>
            <th style="text-align:right;padding:12px;border-bottom:1px solid #e5e7eb;">Files</th>
            <th style="text-align:right;padding:12px;border-bottom:1px solid #e5e7eb;">Snapshots</th>
          </tr>
        </thead>
        <tbody>
          {''.join(rows_html)}
        </tbody>
      </table>
    </div>
  </div>
</body>
</html>
"""

def send_mail(subject, body):
    cmd = [
        "runagent", "ns8-sendmail",
        "-s", subject,
        "-f", MAIL_FROM,
        MAIL_TO
    ]
    proc = subprocess.run(cmd, input=body, text=True, capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or proc.stdout.strip() or f"ns8-sendmail failed with exit code {proc.returncode}")

def main():
    if len(sys.argv) < 2:
        sys.exit(0)

    backup_id = sys.argv[1]
    backup = get_backup_data(backup_id)
    if not backup:
        sys.exit(0)

    summary = summarize_backup(backup)
    subject = build_subject(backup.get("name", f"backup-{backup_id}"), summary)
    body = build_body(backup, summary)
    send_mail(subject, body)

if __name__ == "__main__":
    main()

Then run:

chmod +x /usr/local/bin/ns8-backup-recap

(or whatever name you have chosen)
[Be careful: if you change the name, be sure to update it in the 90notify script as well.]

That is everything you need to do.

Operationally it works, with one issue: if the repository is unavailable and the NS8 system encounters an error, the notification isn’t sent. Any ideas on how to integrate it better?
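
One possible direction, sketched under assumptions: catch a failed status lookup and send a short notice instead of exiting silently. In the sketch below, `fetch_backup` and `send_mail` stand in for `get_backup_data` and `send_mail` from the recap script, `build_mail` stands in for the subject and body builders, and `notify_with_fallback` is a hypothetical name. How `list-backups` actually fails when the repository is offline is not verified here, so the broad `except` is deliberate.

```python
def notify_with_fallback(backup_id, fetch_backup, build_mail, send_mail):
    """Send the recap when status data is available, else a short notice.

    fetch_backup(backup_id) -> dict or None; may raise on lookup errors.
    build_mail(backup)      -> (subject, body) tuple.
    send_mail(subject, body) -> None.
    Returns "recap", "fallback" or "skipped" so the caller can log the outcome.
    """
    try:
        backup = fetch_backup(backup_id)
    except Exception as exc:  # assumption: any lookup error means "status unavailable"
        send_mail(
            f"Backup recap: UNKNOWN - backup {backup_id}",
            f"Could not read the status of backup {backup_id}: {exc}",
        )
        return "fallback"

    if not backup:
        # Unknown backup id: stay silent, like the original script does.
        return "skipped"

    subject, body = build_mail(backup)
    send_mail(subject, body)
    return "recap"
```

If the run-backup action aborts before the 90notify step is reached, this fallback never gets a chance to run; that case would need a different hook.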

Another concern is that the wrapper file I’ve created might be removed during updates, given the directory where it lives.

P.S. It’s almost fully AI-generated, so it may contain some major flaws.


Thank you for sharing, very nice work!

Let’s see if I understand correctly: if the backup repository is offline, the list-backups API can’t retrieve the backup status list. Is this the behavior you are experiencing?

This is because we do not keep a local cache. Maybe you could save something more inside 90notify?
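
A minimal sketch of that idea, assuming a JSON file is an acceptable cache: store the last successfully fetched backup metadata, and read it back when the lookup fails. The cache directory and both function names are hypothetical, not part of NS8.

```python
import json
from pathlib import Path


def save_cache(cache_dir, backup_id, backup):
    """Write the backup metadata to <cache_dir>/<backup_id>.json."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{backup_id}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(backup))
    # Rename into place so a reader never sees a half-written file.
    tmp.replace(target)


def load_cache(cache_dir, backup_id):
    """Return the cached metadata, or None if it is missing or unreadable."""
    try:
        return json.loads((Path(cache_dir) / f"{backup_id}.json").read_text())
    except (OSError, ValueError):
        return None
```

The recap script could call `save_cache` after every successful lookup and fall back to `load_cache` when the lookup fails, so that at least the backup name and the module list are available for the notification.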

I know that @davidep is working hard on a full refactor of the backup, and maybe this issue could be solved in the next release. :crossed_fingers:

As for the wrapper being removed during updates: AFAIK this should not happen unless you replace an existing script.
