Disk space issue: NS8 doesn't even start services!

NethServer Version: latest
Module: ALL!

So I installed an NS8 (not my first) for a friend's SOHO, using the pre-made image (Rocky), just for mail services, nothing else (yet). I set up a single email account, connected it to a few mail clients, and told them they could PROGRESSIVELY move older mail (for archiving) to that account.

They did that for a day or so, then they reported to me that their mail clients didn’t seem to sync with the server anymore.
So I went to the web interface only to find it was NOT available!

Then I rebooted NS8 (it is a VM) AND NOW IT DOES NOT BOOT NS8, only Linux: the console prompt says “Rocky Linux 9.5 (Blue Onyx)” but NOT the cluster admin URLs!!!

HELP! What can I do!? I need to at least access the mail that was already transferred to their server!
What happened? Could it have been overwhelmed by some sudden “attack” of mailbox transfers? (Their emails are more than 100GB in total… they may have stopped moving folders progressively at some point.)
How could this destroy NS8 itself???

EDIT:
Note that the server’s qcow image has grown to 37GB and lives on an array that has plenty more space to grow.
Is it possible that it cannot grow further for some reason and is full?
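
If a console login is still possible, a quick generic check (plain GNU coreutils, nothing NS8-specific) would confirm whether the filesystem really filled up:

```shell
# Is the root filesystem out of space? Look at the Use% column.
df -h /

# Which top-level directories are the biggest consumers?
# (-x stays on one filesystem; errors from unreadable dirs are ignored)
du -xsh /* 2>/dev/null | sort -h | tail -n 5
```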

EDIT #2:
Even the network is down. I cannot ping the server, nor can the server ping anything else.

I used NS7 for years and NS8 almost since the beginning. I have stable NS8 installs; I never expected a server to crash within a day and not boot NS8 at all…

At this point I am even looking at just using this qcow as a source image to set up a fresh NS8 and (somehow) get the mailbox out of the “old” one (where “old” means a few days ago…). I will try anything you propose! Just help, please…

The received mails should be located at:

/home/mail1/.local/share/containers/storage/volumes/dovecot-data/_data/

Please check first if the data is still there.

Usually you boot the fresh Rocky image and set up the network, then go to cluster-admin and set up NS8, i.e. create a cluster. If you reboot before you set up NS8, you get a broken network/cluster-admin etc., as you described.
But that never happened when the NS8 was already working.

Maybe you can get some helpful logs using journalctl

  1. I did find the mailbox in the directory you mentioned, and it does have content. Is there a specific file in that tree I should look for, one that “should” be big enough (they probably migrated 10+GB), to verify I have it?
    (EDIT: There seem to be some dot-folders which look like the mailbox folders… They have garbled names, which I assume is how the mail server encodes the Greek folder names - most likely IMAP “modified UTF-7”, where non-ASCII names are stored as base64-like runs wrapped in & and -.)

  2. As I said, this was not my first NS8, or second, or third (and my first NS8 alone was reinstalled maybe 10 times, as I started from the beta). This one was properly set up on the network, had been rebooted at least 2-3 times already, and worked fine. That is why I told them to start migrating mail. As I said, this is not just a broken cluster-admin… I doubt any NS8 services are running at all. Among other services!
    nmtui itself doesn’t even work! It says “Could not contact NetworkManager: Could not connect: Connection refused.”!!!

  3. So if the mailbox survived, how do I import it into a fresh NS8? (Assuming I attach this qcow as a second disk… as I don’t see any other way.)

  4. journalctl has 17k lines and plenty of errors. But since I can only see the console through a web-based shell of the VM (so no “get”/download capability, AFAIK)… I don’t know how to retrieve it…

…to be honest, I am hoping the NS team themselves are triggered by how a fresh, working NS8 with a single “addon” (mail and webtop, with webtop not even used yet) just “crashed” fully… when the only actual interaction users had was migrating mail folders to the single mailbox using their mail clients (one folder at a time)…

To check the sizes:

du -h /home/mail1/.local/share/containers/storage/volumes/dovecot-data/_data

Check if containers are running:

podman ps

Check if the api is working:

api-cli run cluster/get-name

Check network services:

ss -tulpn
  • Install mail app on new NS8
  • Get the right UID and GID on the new NS8:
    root@contabo:~# ls -l /home/mail1/.local/share/containers/storage/volumes/dovecot-data/_data/
    total 36
    drwx------ 4 427779 427780 4096 May 12 17:47 admin
    
  • Put the mail directories and files to the same location on the new NS8
  • Set the owner to the right UID and GID:
    chown -R 427779:427780 /home/mail1/.local/share/containers/storage/volumes/dovecot-data/_data/*
    
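Those steps could be sketched as a small script. This is only a sketch under assumptions: it uses throwaway directories and the current user's IDs so it can be dry-run anywhere without root. On the real system, SRC would be the old disk mounted read-only (e.g. under a hypothetical /mnt/olddisk) and DST the dovecot-data volume path on the new NS8, with the real UID:GID pair taken from `ls -l` as shown above.

```shell
# Stand-ins for the old and new dovecot-data/_data volumes
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Fake a tiny Maildir tree in place of the real mailbox
mkdir -p "$SRC/admin/Maildir/cur"
echo "test message" > "$SRC/admin/Maildir/cur/msg1"

# 1) Copy everything, preserving modes and timestamps
cp -a "$SRC"/. "$DST"/

# 2) Fix ownership. On the real NS8 this would be the IDs from
#    `ls -l` on the new volume (427779:427780 in the example above);
#    here we use our own IDs so the demo needs no root.
chown -R "$(id -u):$(id -g)" "$DST"

ls -lR "$DST"
```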

You could do

journalctl > logdump

to create a file logdump containing all logs.
On the new NS8 you can grab it from the “old” disk.

(don’t know how to multi-quote here)

  1. Size is 27GB, which seems OK.

(screenshot: the filesystem is full - no space left)
…no space??? Could the pre-built qcow have some growth limit (if that is even possible)? Because, as I said, the host has plenty of space.

(screenshot)

  2. For migrating the mailbox: it is my last resort. We will revisit this IF I go that way.

  3. Cannot dump the log because… see above: no space! I guess this is the issue! Is this resolvable, and if so, will it allow NS8 to recover?
    (but I guess we can get to that second part later)

My host is UNRAID (i.e. KVM). As I said, I used the pre-built qcow image, so I didn’t define a size anywhere, and the file grew from its initial size (what was it… 2-3GB?) to around 37GB… maybe this is the size defined in the partition table? Can I edit this somehow?

As you used the Rocky image and the virtual disk is big enough, you can just expand the filesystem inside the VM:

Check devices:

[root@node2 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   40G  0 disk 
├─sda1   8:1    0   99M  0 part /boot/efi
├─sda2   8:2    0 1000M  0 part /boot
├─sda3   8:3    0    4M  0 part 
├─sda4   8:4    0    1M  0 part 
└─sda5   8:5    0 38.9G  0 part /var/lib/containers/storage/overlay
                                /

For /dev/sda5 you’d need to execute the following (remove --dry-run once the output looks ok):

growpart --dry-run /dev/sda 5

After growpart has actually enlarged the partition, the filesystem on it may also need to be grown (e.g. xfs_growfs / for an XFS root), unless something like cloud-init takes care of that on the next boot.

About expanding/resizing a disk, see also Disk usage — NS8 documentation


So I did a qemu-img resize before your post.
I made it 200G.

I booted NS8 and now want to use its own tools to grow the partition (or partitions) to the size I gave qemu-img…

Here is the lsblk output:

(screenshot: lsblk output)

…I see vda5 is the one that “grew” to its limit.
You can also see that vda now shows as 200G.

Is vda5 the one I need to grow again? Or maybe the other partitions too? (And is that even possible if they are not at the end of the virtual disk?)

How do I grow the partition (or partitions) to the full size I defined for the qcow?

…erm… problem…
(screenshot: growpart fails with a “no space left” error)

…maybe I should delete something useless to free some space? (I don’t even know how much space this /tmp step needs.)
Or use some external tool?

EDIT: I used a temporary RAM-based /tmp and this worked.
I am now going to reboot and see what happens…

*** I HAVE NETHSERVER8 BOOTING!!! ***

…now is there any “health check” for NS8?


Yeah, I also found that mounting /tmp could help, see https://stackoverflow.com/questions/59420015/unable-to-growpart-because-no-space-left

mount -o size=10M,rw,nodev,nosuid -t tmpfs tmpfs /tmp

I’d check the nodes page in cluster-admin. Usually everything should work again after “no space left” issues…

Seems to work.
I didn’t connect any clients (everybody is asleep in this timezone right now), but webtop (a good way to test it for the first time on that install) did connect to the important mailbox, and I can see the contents. That’s all that matters.

Thank you very much for your help Markus, appreciated.

…and I hope the system can now handle a single 150GB mailbox…
(yes… don’t ask)


I wonder if NS8 itself could automatically set some triggers (free-space checks) to notify the admin before a catastrophe.
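
Until something like that exists, a minimal sketch of such a check (plain shell, nothing NS8-specific; the 90% threshold is an arbitrary choice) could be cron'd on any node:

```shell
#!/bin/sh
# Hypothetical standalone watchdog: warn when the root filesystem
# crosses a usage threshold. NS8's own metrics app is the proper way.
THRESHOLD=90
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "WARNING: root filesystem at ${usage}%"
else
    echo "OK: root filesystem at ${usage}%"
fi
```

Wiring the echo to mail or another notifier is left out; the point is just that the trigger itself is a one-liner.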

At the very least, someone should update the documentation to mention the default partition sizes of the pre-built images, as a warning, and link to the resize procedure.

In any case it has proven its resiliency…


The metrics app also checks for low disk space, see Metrics and alerts — NS8 documentation
