I’m posting several problems that I have encounter so far when using Proxmox with ZFS (No L2ARC or ZiL) for virtualization and how did I solve them:
My first problem comes to hardware, I started this project as a trail, as such, I used non-professional hardware due to not having any fundings. So my hardware was mostly 2009 (4x1TB SATA II, 6x4GiB non-ECC DDR3 1066MHz RAM, i7 960 CPU), yet as usual Linux manage to find a way to boot and run.
I installed Proxmox using ZFS with default parameters so I could use RAID10. And set this configuration:
# Change swappiness to swap on out of RAM
echo 'vm.swappiness = 0' | tee -a /etc/sysctl.conf
# Change swappiness right away
sysctl vm.swappiness=0
Changed repositories to No-Subscription
Fix recieve mail notifications
dpkg-reconfigure postfix
# Choose [Internet with smarthost]
# Set [System mail name] = [servername.domain.com] which is your FQDN.
# Set [SMTP relay host] = [192.168.###.###] which is your relay mail server ip. You could use his FQDN.
# Set [Root and postmaster mail recipient] = [proxmox@domain.com] which is the mail address to recieve proxmox notifications
# Set [Other destinations to accept mail for] = [servername.domain.com, localhost.domain.com, localhost]
# Choose [No] for [Force synchronous updates on mail queue?]
# Set [Local networks] = [127.0.0.0/8] you only want to relay mails from localhost
# Choose [yes] for [Use procmail for local delivery?]
# Set [Mailbox size limit] = [0]
# Set [Local address extension character] = [+]
# Set [Internet protocols to use] = [ipv4]
# Set [Internet protocols to use] = [ipv4]
KSM (Kernel Samepage Merging) tunning
https://pve.proxmox.com/wiki/Dynamic_Memory_Management
# KSM_MONITOR_INTERVAL: Number of seconds ksmtuned should sleep between tuning adjustments
# Every KSM_MONITOR_INTERVAL seconds ksmtuned adjust how aggressive KSM will
# search for duplicated pages based on free memory.
# KSM_SLEEP_MSEC: Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
# How many Milliseconds to sleep between scans of 16GB of RAM.
# The actual sleep time is calculated as sleep = KSM_SLEEP_MSEC * 16 / Total GB of RAM
# The final sleep value will be written to /sys/kernel/mm/ksm/sleep_millisecs
# KSM_NPAGES_BOOST: Amount to increment the number of pages to scan
# The number of pages to be scanned will be increased by KSM_NPAGES_BOOST
# when the amount of free ram < threshold (see KSM_THRES_* below)
# KSM_NPAGES_DECAY: Amount to decrease the number of pages to scan
# The number of pages to be scanned will be decreased by KSM_NPAGES_DECAY
# when the amount of free ram >= threshold (see KSM_THRES_* below)
# KSM_NPAGES_MIN: Minimum number of pages to be scanned at all times
# KSM_NPAGES_MAX: Maximum number of pages to be scanned at all times
# KSM_THRES_COEF: Decimal percentage of free RAM
# If free memory is less than this percentage KSM will be activated
# KSM_THRES_CONST: Bytes
# If free memory is less than this number KSM will be activated
# Reload configuration
systemctl reload ksmtuned.service
After setting this, I created a NS7 KVM template with 1GiB RAM, 2 CPU cores, 30 GiB HDD Cache=[No Cache] VirtIO SCSI, 1 NIC (eth0) with VirtIO (paravirtualized). Notice the cache type in the HDD, is important to keep the HDD cache to [No Cache], when I first started using Proxmox I read a lot of best practices that mentioned writeback as the HDD cache type, according to this guide, when you use this type of cache:
This mode causes qemu-kvm to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics, so the host page cache is used and writes are reported to the guest as completed when placed in the host page cache, and the normal page cache management will handle commitment to the storage device. Additionally, the guest’s virtual storage adapter is informed of the writeback cache, so the guest would be expected to send down flush commands as needed to manage data integrity.
Analogous to a raid controller with RAM cache.
In my experience, using writeback cache causes Proxmox (with ZFS and no RAID card) to fill the buffer memory, on high I/O operations this causes Proxmox node to swap due to buffer memory and ZFS ARC filling all RAM. So, I avoided using any type of cache rather than [No Cache].
Another issue that’s bugging me is NFS, every time I copy or do a backup into NFS my cache memory fills up to the total size of all copied files. I’m assuming this might be a bad setup on either the export on NFS server or a bad mount option on the client (Proxmox).
I hope my experience help someone, please is you think I can improve this post or have any ideas do share.