Issues while using Proxmox

I’m posting several problems I have encountered so far while using Proxmox with ZFS (no L2ARC or ZIL) for virtualization, and how I solved them:

My first problem was hardware. I started this project as a trial and, having no funding, used non-professional hardware. My hardware was mostly from 2009 (4x 1TB SATA II disks, 6x 4GiB non-ECC DDR3 1066MHz RAM, an i7 960 CPU), yet as usual Linux managed to find a way to boot and run.

I installed Proxmox using ZFS with default parameters so I could use RAID10, and set this configuration:

# Only swap when running out of RAM
echo 'vm.swappiness = 0' | tee -a /etc/sysctl.conf
# Change swappiness right away
sysctl vm.swappiness=0

Changed the repositories to No-Subscription
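On current Proxmox VE releases this can be done by disabling the enterprise repository and adding the no-subscription one. The file names below are the usual defaults and the Debian codename (`bullseye`) is an assumption; use the one matching your release:

```shell
# Disable the enterprise repository (it requires a subscription)
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list

# Add the no-subscription repository; "bullseye" is a placeholder,
# replace it with the Debian codename of your Proxmox version
echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' \
  > /etc/apt/sources.list.d/pve-no-subscription.list

apt update
```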

Fix receiving mail notifications

dpkg-reconfigure postfix
# Choose [Internet with smarthost]
# Set [System mail name] = [] which is your FQDN.
# Set [SMTP relay host] = [192.168.###.###] which is your relay mail server's IP. You could use its FQDN instead.
# Set [Root and postmaster mail recipient] = [] which is the mail address that will receive Proxmox notifications
# Set [Other destinations to accept mail for] = [,, localhost]
# Choose [No] for [Force synchronous updates on mail queue?]
# Set [Local networks] = [] since you only want to relay mail from localhost
# Choose [Yes] for [Use procmail for local delivery?]
# Set [Mailbox size limit] = [0]
# Set [Local address extension character] = [+]
# Set [Internet protocols to use] = [ipv4]
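After reconfiguring, the result can be verified and exercised with a test message; this is only a sketch (it assumes a `mail` command from mailutils/bsd-mailx is installed):

```shell
# Show the non-default Postfix settings, including the relay host
postconf -n | grep -E 'relayhost|mydestination'

# Send a test mail to root; Postfix should forward it to the
# recipient configured during dpkg-reconfigure
echo "Proxmox mail relay test" | mail -s "Test from $(hostname -f)" root
```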

KSM (Kernel Samepage Merging) tuning

# Edit /etc/ksmtuned.conf
#
# KSM_MONITOR_INTERVAL: Number of seconds ksmtuned should sleep between
# tuning adjustments. Every KSM_MONITOR_INTERVAL seconds ksmtuned adjusts
# how aggressively KSM will search for duplicated pages, based on free memory.
# KSM_SLEEP_MSEC: Milliseconds to sleep between KSM scans, for a 16GB server.
# Smaller servers sleep more, bigger servers sleep less. The actual sleep
# time is calculated as sleep = KSM_SLEEP_MSEC * 16 / Total GB of RAM,
# and the final value is written to /sys/kernel/mm/ksm/sleep_millisecs.
# KSM_NPAGES_BOOST: Amount by which the number of pages to scan is increased
# when the amount of free RAM < threshold (see KSM_THRES_* below).
# KSM_NPAGES_DECAY: Amount by which the number of pages to scan is decreased
# when the amount of free RAM >= threshold (see KSM_THRES_* below).
# KSM_NPAGES_MIN: Minimum number of pages to be scanned at all times.
# KSM_NPAGES_MAX: Maximum number of pages to be scanned at all times.
# KSM_THRES_COEF: Decimal percentage of free RAM; if free memory drops
# below this percentage, KSM will be activated.
# Reload configuration
systemctl reload ksmtuned.service
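As a sanity check, the sleep formula above can be evaluated by hand. This sketch assumes KSM_SLEEP_MSEC=10 and a hypothetical 64GB host; adjust both to your own configuration:

```shell
# Hypothetical values: adjust to your ksmtuned.conf and host RAM
KSM_SLEEP_MSEC=10
total_gb=64   # e.g. from: free -g | awk '/^Mem:/ {print $2}'

# sleep = KSM_SLEEP_MSEC * 16 / Total GB of RAM (integer arithmetic)
sleep_ms=$(( KSM_SLEEP_MSEC * 16 / total_gb ))
echo "sleep_millisecs=${sleep_ms}"
```

On this 64GB example the computed value is 2ms, i.e. a host with more RAM than the 16GB reference is scanned more aggressively.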

After setting this up, I created a NS7 KVM template with 1GiB RAM, 2 CPU cores, a 30GiB HDD with Cache=[No Cache] on VirtIO SCSI, and 1 NIC (eth0) with VirtIO (paravirtualized). Notice the cache type of the HDD: it is important to keep it set to [No Cache]. When I first started using Proxmox I read a lot of best practices that recommended writeback as the HDD cache type, but according to this guide, when you use that type of cache:

This mode causes qemu-kvm to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics, so the host page cache is used and writes are reported to the guest as completed when placed in the host page cache, and the normal page cache management will handle commitment to the storage device. Additionally, the guest’s virtual storage adapter is informed of the writeback cache, so the guest would be expected to send down flush commands as needed to manage data integrity.
Analogous to a raid controller with RAM cache.

In my experience, using writeback cache causes Proxmox (with ZFS and no RAID card) to fill the buffer memory; under high I/O this makes the Proxmox node swap, because the buffer memory and the ZFS ARC fill all the RAM. So I avoid using any cache type other than [No Cache].
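For an existing VM the cache mode can also be changed from the CLI with `qm`; the VM ID, bus and disk volume below are placeholders for illustration only:

```shell
# Inspect the current disk configuration of VM 100 (hypothetical ID)
qm config 100 | grep scsi0

# Set the disk cache mode to "none" ([No Cache] in the web UI);
# "local-zfs:vm-100-disk-0" is a placeholder volume name
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none
```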

Another issue that’s bugging me is NFS: every time I copy files or run a backup to NFS, the cache memory fills up to the total size of the copied files. I’m assuming this might be a bad setup, either in the export on the NFS server or in a mount option on the client (Proxmox).
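One workaround (it does not fix the root cause) is to drop the page cache manually after a large copy; `vm.drop_caches` is a standard Linux facility and only discards clean cached data, so it is safe, though it will also evict useful caches:

```shell
# Flush dirty pages to disk first
sync
# Drop the page cache plus dentries and inodes (value 3)
echo 3 > /proc/sys/vm/drop_caches
```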

I hope my experience helps someone. If you think I can improve this post or have any ideas, please do share.


Thanks for sharing, I like this kind of post. :clap:


I agree with Michael, very interesting post!

Thanks :wink:


I tend to give an NS VM at least 2GB RAM. I tried with only 1GB before, but noticed several unexplainable hiccups (long periods of unresponsiveness and such), especially when RAM usage was at 100%. With 2GB I never had these hiccups.


@robb I would recommend using the statistics module to evaluate those “hiccups”. I just deployed a new KVM guest from my NS7 template (I’m planning to write a how-to explaining some caveats when creating NS7 KVM templates on Proxmox): RAM usage is about 18% (180 MB / 993 MB), overall RAM is at 70% (698 MB / 993 MB), and swap is empty so far.



I use Proxmox a lot, albeit without ZFS. Most of the hardware available to me at home or at clients just doesn’t have enough memory to run VMs and ZFS. ZFS needs 8 GB minimum; 16 GB will run, and 32 GB or more will fly. Most of my Proxmox hosts only have 16 or 32 GB to start with, so ZFS is out of the question, at least for those hosts.

I usually set up a NethServer KVM with 2 GB RAM (to keep the swapfile at that size; I don’t really want major swapping to take place). After setup and reboot, I upgrade the VM to 4 or 8 GB RAM, which is usually sufficient for SME companies or home use.
1 or 2 GB RAM would give problems during updates, backups and any database operations, and the machine became unbearably slow…

My 2 cents!
Andy Wismer


Hi @Andy_Wismer, nice to meet you. Regarding ZFS memory needs, I don’t agree with you: ZFS ARC memory plays the same role as a RAID card’s memory. Better cards have more memory and CPU, and the same principle applies to the ZFS ARC. Unless you want to use memory deduplication, a recommended value is 1GB of RAM for each 1TB of HDD, though more is welcome. An easy way to test this would be a benchmark or a simple dd, like this one:

# Testing write speed (conv=fdatasync flushes to disk before dd reports
# the rate; without it you mostly measure the page cache)
dd if=/dev/zero of=file10GB bs=1M count=10240 conv=fdatasync
# Testing read speed
dd if=file10GB of=/dev/null bs=1M

I recommend first changing to /tmp, or to a folder like /opt, before doing this; that of course assumes your entire system is on ZFS.

You can find that out with:

mount | grep zfs
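Related to the memory discussion, the current and maximum ARC size can also be read from the kernel stats exposed by ZFS on Linux (the path below is the standard OpenZFS one):

```shell
# Current ARC size ("size") and configured maximum ("c_max"), in bytes
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
```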

Another thing to keep in mind is that these tests are an approximation, since each one only writes or only reads, never both, but they will give you an idea. I did a lot of reading, testing and asking people before giving ZFS a try, and I suggest you do the same. I can share my little experience, but I’m no expert, just a young fellow surfing the net to see “how good the waves of ZFS are” :surfing_man: