Speeding up Nethserver on Proxmox

thorsten · December 17, 2020, 2:19pm

Hi,

Last weekend the SSD in my Proxmox server failed. I had used the SSD only for ZIL log and cache, as described in the Proxmox manuals. Prices were low, so I replaced the 128 GB SSD (equivalent to my RAM) with a WD Red 512 GB SSD. So what to do with the extra space?

I am not really firm with Linux, but I dared to play around a little. So I partitioned the SSD and assigned the partitions (GPT or swap) as follows:

128 GB ZIL log
128 GB ZIL cache
128 GB Proxmox swap
32 GB (equivalent to NS VM RAM) formatted as SWAP and hard-attached to the corresponding NS-VM. Within the VM -> attached the device as SWAP within /etc/fstab using UUID.
3 x 16 GB other VM same configuration
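
A minimal sketch of the fstab step mentioned above, assuming the usual six-field `/etc/fstab` layout for swap. The helper name `fstab_swap_line` and the UUID are placeholders for illustration, not values from this setup; inside the VM, the real UUID would come from a tool such as `blkid`.

```python
def fstab_swap_line(uuid: str) -> str:
    """Build an /etc/fstab entry that activates a swap partition by UUID."""
    # Fields: device, mount point, type, options, dump, fsck pass.
    # Swap has no mount point, so the second field is "none".
    return f"UUID={uuid} none swap sw 0 0"


# Placeholder UUID (hypothetical, not taken from the thread).
print(fstab_swap_line("01234567-89ab-cdef-0123-456789abcdef"))
```

Referencing the UUID rather than a device name such as /dev/sdb keeps the entry valid if the device order inside the VM changes.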

From my first experience, the NS VM reboot, DNS requests, the web frontend (Cockpit) and Nextcloud were sped up dramatically. I could get used to this.

I am open to comments, suggestions, ideas or background information on whether this procedure is nonsense or really meaningful. If someone thinks it is meaningful, I will provide a little how-to (as this does not work from the Proxmox web frontend), although this is not a “real NS topic”.

Best regards
Thorsten

Andy_Wismer · December 17, 2020, 3:50pm

@thorsten

Hi Thorsten

I may be mistaken, but I was under the impression that a ZIL-Cache should NOT be used for anything else… (Swap), but I can’t really say how that can affect performance or lifetime of the SSD…

But in any case, the ZIL-Cache should be the first partition on the SSD!

Use flash for caching/logs. If you have only one SSD, use parted or gdisk to create a small partition for the ZIL (ZFS intent log) and a larger one for the L2ARC (ZFS read cache on disk). Make sure that the ZIL is on the first partition.

Other tips:

Enable compression on your Zpool!

My 2 cents
Andy

thorsten · December 17, 2020, 4:01pm

Yes, I think this is a misunderstanding:

The cache and the log got separate and independent partitions for the pool, however those two partitions are located on the same physical SSD together with some more partitions for swap.

See here:

root@vn01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 06:16:08 with 0 errors on Sun Dec 13 06:40:09 2020
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
            sdc2    ONLINE       0     0     0
        logs
          sdd1      ONLINE       0     0     0
        cache
          sdd2      ONLINE       0     0     0

cfdisk /dev/sdd

Disk: /dev/sdd
Size: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Label: gpt, identifier: xxxxxxxx
Device         Start       End   Sectors   Size Type
/dev/sdd1       2048 268437503 268435456   128G Linux filesystem
/dev/sdd2  268437504 536872959 268435456   128G Linux filesystem
/dev/sdd3  536872960 805308415 268435456   128G Linux swap
/dev/sdd4  805308416 838862847  33554432    16G Linux swap
/dev/sdd5  838862848 905971711  67108864    32G Linux swap
Free space 905971712 976773134  70801423  33.8G

with
sdd1 → logs
sdd2 → cache
sdd3 → proxmox swap
sdd4 → added to NS VM as swap (used as internal swap within VM from fstab)
sdd5 → added to some Ubuntu VM as swap (used as NS internal swap within VM from fstab)
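
As a cross-check of the layout above, the sizes can be recomputed from the start and end sectors. This is only a sketch: the 512-byte sector size is inferred from the byte and sector totals in the cfdisk header, and `size_gib` is a helper name made up for this example.

```python
# Totals from the cfdisk header above.
DISK_BYTES = 500107862016
DISK_SECTORS = 976773168
SECTOR_BYTES = DISK_BYTES // DISK_SECTORS  # inferred sector size: 512

# (name, start sector, end sector) copied from the listing above.
LAYOUT = [
    ("sdd1 (logs)", 2048, 268437503),
    ("sdd2 (cache)", 268437504, 536872959),
    ("sdd3 (swap)", 536872960, 805308415),
    ("sdd4 (swap)", 805308416, 838862847),
    ("sdd5 (swap)", 838862848, 905971711),
    ("free space", 905971712, 976773134),
]


def size_gib(start: int, end: int) -> float:
    """Size of an inclusive sector range in GiB."""
    return (end - start + 1) * SECTOR_BYTES / 2**30


for name, start, end in LAYOUT:
    print(f"{name:13s} {size_gib(start, end):6.1f} GiB")
```

Note that cfdisk reports binary units, which is why a disk sold as roughly 500 GB shows up as 465.8 GiB.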

Andy_Wismer · December 17, 2020, 4:12pm

@thorsten

Hi Thorsten

No, the disk layout is clear, and it is what I had understood.

But Proxmox (and others) DO emphasize that the Cache should be the first partition (log can be second). I’m sure they have a reason for this!

I’m a bit worried that having Swap and Zil-Cache on the same SSD may “age” your SSD a bit too fast! I used part of a SSD as cache - it wasn’t a WD-Red, but also a brand name, can’t recall anymore. In any case, due to the usage as Cache and other use, the disk died within 3-4 Months!

My 2 cents
Andy

stephdl · December 17, 2020, 8:13pm

Can we imagine storing data reliably on a ZFS pool of SSDs? I must admit that a spinning disk seems to me like really old technology.

Just a question: currently I have an array of 4 disks in RAID 6 (LVM + software RAID), but in the future (my drives are at about 38000 hours) I hope to move to a full SSD solution.

Andy_Wismer · December 17, 2020, 8:52pm

@stephdl

Salut Stéphane

Nowadays almost everything uses SSDs… The main exception: Large volume data, like Backups are still often on real HDs.

One client has an 8-bay Synology NAS. The first 4 bays hold SSDs in RAID10 for Proxmox, the rest hold 6 TB WD Red disks for backups…
The 6 TB disks (actually, the volume they are in) are synced daily to an identical NAS at the boss’s home, albeit with different disks.
In case the NAS fries during, say, Christmas / New Year, when it can take longer to get spares, the boss can drive home, pick up that NAS and continue working. The data is already on that NAS!

My 2 cents
Andy

stephdl · December 18, 2020, 9:15am

We can read that SSDs need custom settings to extend their lifetime. Is that true? Do you have a magic recipe for Proxmox, or do you think it is already handled by the Linux kernel so that you really do not need to care? (At least on my laptop, I do nothing more than use LVM.)

Andy_Wismer · December 18, 2020, 9:56am

@stephdl

Salut Stéphane

If you installed Linux on a SSD, it should have no problems for a normal Workstation or Notebook. Linux knows how to handle SSDs.
It can become a problem if you just cloned your existing HD (a real rotating disk) over to a SSD. That Linux was not set up on a SSD, and just by cloning it over, it will not have any settings for SSDs.
But this is also usually nothing to worry too much about.

Much more critical are “heavy” use stuff, like Proxmox (any virtualization), Storage (NAS/SAN), or Database & Cache…

Synology often has options on their midrange NAS to put in two SSDs for caching the HDs (mSSDs in the newer ones).
Now, if you use a normal, good SSD, like the Samsung Evo series, you’d expect trouble-free operation… I did this 1.5 years ago.
The disks were dead within 3-4 months!
And these were either SanDisk or Samsung (not no-name disks)!

For really heavy loads, you need so-called enterprise-class SSDs. Top brands are e.g. Toshiba and Hitachi.

The problem with SSDs is the “wear” on the memory cells themselves.
See the Wiki article below:

or

The English version has a bit more info, but I know that you understand French better!

A very Basic sample:

Let’s say we’re talking about a typical 512 GB SSD. A normal Disk will also be sold as 512 GB, but a SSD will often be sold as a 400GB, 460GB or 500 GB SSD, but NOT as a 512, which it actually is. This “spare” space is used for wear levelling - just like old HDs used to move defective blocks to usable space…
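
To put rough numbers on the example above, here is a small sketch that takes the 512 GB raw capacity and the three advertised sizes at face value. The helper name `spare_percent` is made up for this illustration, and real drives will differ in how much spare area they reserve.

```python
RAW_GB = 512  # raw flash capacity assumed in the example above


def spare_percent(raw_gb: float, sold_gb: float) -> float:
    """Share of the raw capacity held back as spare area, in percent."""
    return (raw_gb - sold_gb) / raw_gb * 100


for sold_gb in (400, 460, 500):
    spare_gb = RAW_GB - sold_gb
    print(f"sold as {sold_gb} GB: {spare_gb} GB spare "
          f"({spare_percent(RAW_GB, sold_gb):.1f}% of raw)")
```

The larger the spare share, the more room the controller has to spread writes, which is one reason the smaller advertised size tends to show up on drives aimed at heavy write loads.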

Even in Proxmox, a classic “heavy lifter”, it really also depends what you’re actually using the SSDs for. Storage? Cache? ZFS?
I wouldn’t choose a cheap disk to do my ZIL-Cache!

Hope this answers some of your questions!

Mes deux centimes
Andy

Synology DS1819+ NAS, Storage manager, Disks:

The moment you see something in the “Expected Lifespan” field, your wear is high on the SSDs, and you can expect it to degrade fast!

thorsten · December 28, 2020, 9:29am

@Andy_Wismer

I think this is an important point. Maybe the rule about using the first, second, … partition for logs or cache is derived from old spinning disks. The further outside the most-used partition lies, the less head movement is necessary per unit of data. I guess this became unimportant in the age of SSDs, where speed does not depend on moving parts and there is no difference in sector size between outer and inner diameter…

If I am right, the position of the partition (first, second) matters less on a SSD than on an HDD when it is used for cache / log.

Andy_Wismer · December 28, 2020, 9:44am

@thorsten

In this sense, you’re quite correct. The position of a partition doesn’t matter anymore.

SSDs are very different from HDs with moving parts and a disk. The fact that a disk is like an old vinyl record means that the inside tracks / sections are different from the outer. I’m on purpose not using the term “partitions” here. A disk does not allow true random access, which a SSD can (more or less).

On the other hand, anyone nowadays planning on using a conventional HD for cache/log in ZFS must be driving a horse chariot, not a car… “Somewhat outdated concepts”!!!

However the quality of the material still does matter!

Actually, the one single advantage a HD had over the SSDs of today is the “acoustic feedback” on a disk’s condition. With HDs, you knew it was time when you heard the clicking noise… SSDs die more or less silently. With HDs, you could also check if at least the motor was turning; the centrifugal forces could be felt when holding the disk. This is not possible with SSDs anymore: as they have no moving parts, there’s also no centrifugal force…

However, you do get the best performance (and longest life expectancy) if nothing else is using that SSD, only one partition dedicated to cache or log.
Logs are written sequentially, as things happen. On a dedicated disk, the log is written from start to finish.
If anything else uses that disk, the log is written wherever there’s space, and this tends to use up a disk faster than just logs.

Cache is similar, but differs from logs in that it is not per se sequential.

My 2 cents
Andy

thorsten · December 28, 2020, 11:55am

Andy_Wismer · December 28, 2020, 11:59am

Mike Bombich, the programmer of the best Mac cloning tool (Carbon Copy Cloner), even used that in his older advertisements… “The ominous clicking noise…”

Andy_Wismer · December 28, 2020, 12:14pm

@thorsten

Mind you, the “joke” about acoustic feedback did or does have its practical uses…

A new client for IT:
An almost dying disk with important data (and no backups done by the client before!) - you knew by the noise that the disk was in its last moments…

If you had issues, a common solution (by me) was to put the disk on the table, lift one end up about 1.5-2cm, and let it drop on the table… “Controlled Gravitational Repairs”… This often got the disk working again for a couple of hours, enough to get the data off the disk or clone it!

There is also the term “Percussive Repairs” for things, “Percussive Therapy” for humans!

My 2 cents
Andy

thorsten · December 28, 2020, 12:22pm

Based on a rescue article in the German c’t magazine, I spent several hours with my laptop next to my deep freezer in the cellar. … The HDD was in the deep freezer, the cable ran through the door seal and I sat outside trying to rescue the data … heat is an issue with broken disks … and I managed to recover a lot of files using this approach, but it took time … If I find the article I will post it.

Andy_Wismer · December 28, 2020, 12:35pm

I’ve done that too. In one case had to use a ventilator and a heat sink affixed to the old HD, just to get a clone of that disk! (It worked!).

Often, the excess heat is caused by one or several head crashes, which damage the disk’s magnetic surface and also act as sandpaper (Schmirgelpapier) on the disks/heads. And resistance = heat. And too much heat either causes the heat sensor to trigger, switching off the disk, or makes the situation worse!

“Creative” ideas for a last resort repair!

My 2 cents
Andy