Takeaways from 3 years of running Red Hat Satellite with ZFS (ZoL) on RHEL
Red Hat Satellite provides distribution of RPMs and containers for many Red Hat products. That's an over-simplification, but it works well enough for the purpose of this post.
As someone who works with RHOSP and RHEL on a daily basis, I find it convenient to have a local Red Hat Satellite VM to which all of my permanent or temporary RHEL/RHOSP nodes connect.
Why use Red Hat Satellite at home?
There are a few reasons:
- It provides a sort of local Internet cache of RPMs and containers: I might be working on RHOSP 16.2 and need to stand up a temporary RHOSP 13 cloud to assist a customer or work on a BZ. That is usually fine, but it could impact the home bandwidth at the worst possible moment, especially when the rest of the family is taking remote classes.
- It's always faster to cache everything locally, and during mid-day network congestion it has saved the virtual deployments I was launching in my lab.
Why did I want a VM for something like Red Hat Satellite?
Some of my hard requirements for the Red Hat Satellite VM were:
- I wanted it to be a self-contained VM that I could bring over to a customer site on a USB disk.
- Since it had to be a VM, I wanted to be able to copy that VM without too much trouble between my hypervisors.
- Since I wanted everything RHEL7/RHEL8 and RHOSP in that VM, that meant at least 1.5TB of consumed disk space (and possibly more than that available for peak download times).
Back in 2017, when I started that journey, I began with a single VM and a single qcow2 disk file.
This proved slow, inefficient and difficult to manage: with a 500GB qcow2 file, you need double that space just to rsync the qcow2 somewhere else. Also, with a couple million files/links on XFS in a RHEL7 VM, I/O wasn't great. I needed to get creative with storage and increase performance tremendously, and I needed this -inside- the VM to help with my daily activities.
Introducing ZoL (ZFS-on-Linux)
ZFS is a state-of-the-art "filesystem" and a constantly updated Open Source project. The origin of ZFS can be traced back to Solaris and OpenIndiana. Over the years it was ported to many different platforms (BSD, OSX, Linux, etc.) and when Oracle closed down the source, ZoL started its own path.
On RHEL, it is not difficult to add ZoL to your system. There are even binary kmods available which make this convenient.
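As a rough sketch of what that looks like on RHEL 7, the OpenZFS repository provides a kABI-tracking kmod that can be enabled instead of the default DKMS packages. The release RPM name/URL changes with each minor release, so treat the exact strings below as assumptions:

    # Install the OpenZFS repository (example for RHEL 7.9; adjust for your minor release)
    yum install https://zfsonlinux.org/epel/zfs-release.el7_9.noarch.rpm
    # Prefer the pre-built kABI-tracking kmod over the DKMS packages
    yum-config-manager --disable zfs
    yum-config-manager --enable zfs-kmod
    yum install zfs
    # Load the module
    modprobe zfs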
Back to my Red Hat Satellite VM... I wanted decent filesystem performance, instant snapshots and some reliability (checksums, RAIDZ, etc.). I am not going to describe the 3 years of that adventure and how I managed to find an optimal configuration, but I've been using this since the Sat 6.3 days and my Satellite is still alive and kicking (on RHEL 7.9 and Satellite 6.9.7 at the moment).
Carving out storage for the VM
Since I wanted to be able to copy/backup/clone that massive VM and because I had a huge performance requirement, it quickly became evident that I needed to break the storage down into multiple disks. So I maxed out the number of disks allowed by qemu-kvm. Each of these was initially created as a 64GB qcow2 file. Eventually, disk consumption grew and I had to make the virtio disks 96GB, then 128GB and now around 160GB.
Here's what my sat6 VM looks like today:
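For anyone wanting to inspect a similar setup, the attached disks can be listed with virsh (the domain name sat6 is an assumption here):

    # List the qcow2-backed virtio disks attached to the VM
    virsh domblklist sat6 --details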
For performance and consistency reasons, each of these virtio disks was configured with write-through cache and ignored discards:
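In libvirt terms, each disk definition looks roughly like the snippet below (the file path and target device are made up for illustration; the cache and discard attributes are the relevant parts):

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writethrough' discard='ignore'/>
      <source file='/var/lib/libvirt/images/sat6-data01.qcow2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>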
I decided to use multiple raidz1 vdevs of 5 devices each (in case something happens at the host-filesystem level), resulting in the following configuration:
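A sketch of how such a pool could be created (the pool name satpool and the guest device names are assumptions; only two of the raidz1 vdevs are shown):

    # Two raidz1 vdevs of 5 virtio disks each; repeat the pattern for additional vdevs
    zpool create satpool \
      raidz1 vdb vdc vdd vde vdf \
      raidz1 vdg vdh vdi vdj vdk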
As for the ZFS pool:
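Typical pool-wide settings for this kind of workload would look something like this (these particular values are assumptions, not necessarily the ones used on that pool):

    # Let the pool grow automatically when the underlying qcow2s are resized
    zpool set autoexpand=on satpool
    # Inheritable dataset defaults: cheap compression, no atime updates, SA-based xattrs
    zfs set compression=lz4 satpool
    zfs set atime=off satpool
    zfs set xattr=sa satpool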
At the beginning, I only consumed 10 disks but as I started adding repositories into my Satellite VM, I quickly needed more disk space. This was done by adding a few more vdevs and, over time, by making the individual qcow2s larger (which ZoL handles just fine).
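Growing the pool in place follows the usual pattern; a hedged sketch, with paths, sizes and device names purely illustrative:

    # On the hypervisor: grow one backing qcow2 (repeat per disk)
    qemu-img resize /var/lib/libvirt/images/sat6-data01.qcow2 160G
    # Or, while the guest is running:
    #   virsh blockresize sat6 vdb 160G
    # Inside the guest: expand the vdev onto the new space
    # (not needed if autoexpand=on is set on the pool)
    zpool online -e satpool vdb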
The resulting configuration has a single pool and several ZFS filesystems (one for each directory that Red Hat Satellite usually hammers with heavy I/O):
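For illustration, creating those filesystems would look roughly like this (dataset names are assumptions; the mountpoints match directories Red Hat's own sizing guidance flags as I/O-heavy for Satellite 6.9):

    zfs create -o mountpoint=/var/lib/pulp    satpool/pulp
    zfs create -o mountpoint=/var/cache/pulp  satpool/pulp-cache
    zfs create -o mountpoint=/var/lib/mongodb satpool/mongodb
    zfs create -o mountpoint=/var/lib/pgsql   satpool/pgsql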
The benefit of this complexity is apparent in several different areas:
- Content Views promote in just a few minutes.
- The Satellite UI is somewhat faster since it makes heavy use of filesystem I/O.
- Having multiple versions of several large Content Views is not an issue (ZFS handles millions of files/links very well)
- Storage benchmarks (fio, satellite-benchmark) consistently show read and write performance between 800-1000MB/sec with around 150k-170k IOPS (an illustrative fio invocation follows the list):
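As an illustration only (the exact job parameters behind those numbers are not documented, so everything below is an assumption), a mixed random read/write fio job against one of the ZFS-backed directories might look like:

    # 4k mixed random read/write, buffered I/O, 8 jobs for 2 minutes
    fio --name=satellite-randrw --directory=/var/lib/pulp \
        --rw=randrw --bs=4k --size=4G --numjobs=8 --iodepth=32 \
        --ioengine=libaio --runtime=120 --time_based --group_reporting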
Overall, having ZoL inside of my Satellite VM has been a very welcome change. In the past 3 years, I've not run into a single problem -caused- by the fact I was running ZFS as the underlying filesystem for the Satellite software. Rome wasn't built in a day, and the ZFS history of that pool gives an insight into what it took to get there:
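That history comes from the pool's built-in command log (pool name assumed, as above):

    # Every zpool/zfs administrative command ever run against the pool
    zpool history satpool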
Of course, not everything is perfect: in some cases, I've had to wait a few days after a new RHEL minor release came out for the ZoL maintainers to produce updated ZFS kmods for the new release.
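One workaround not covered in the post, but commonly used, is to hold the kernel back until matching kmods are published, for example with the yum versionlock plugin (a sketch for RHEL 7):

    # Pin the currently installed kernel until zfs kmods exist for the new one
    yum install yum-plugin-versionlock
    yum versionlock add kernel
    # Once matching kmods are available, drop the locks and update
    yum versionlock clear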
Sizing Satellite with ZFS
The above VM has 56GB of RAM and 16 vCPUs. At the moment, Satellite 6.9 is using the basic tuning for medium-sized Satellites. Of those 56GB of RAM, 16GB are set aside for the ZFS ARC cache.
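Capping the ARC is done through a module parameter; a minimal sketch for a 16GB cap (the value is in bytes):

    # /etc/modprobe.d/zfs.conf
    # 16 GiB = 16 * 1024^3 bytes
    options zfs zfs_arc_max=17179869184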
Overall, with 20-30 Content Views and close to 600 unique repositories, I consider this VM to be somewhat representative of a medium-sized Satellite at a customer site. It has been quite an experience running that VM.
Useful Links:
- Impact of Disk Speed on Satellite Operations: https://access.redhat.com/solutions/3397771
- Satellite Storage Benchmark: https://github.com/RedHatSatellite/satellite-support/blob/master/storage-benchmark