FreeBSD (and Linux), Podman containers and Large Receive Offload.

I can’t deny I am quite dissatisfied with my current status quo. I’ve been on Linux since 1995, kernel version 0.99p. I’ve advocated Linux since the days when even getting a network adapter to work was difficult. Linux has definitely grown into the foundation of today’s Internet and of most workloads out there.

However, I feel that the community has drifted into unneeded complexity in many areas. Hyperscalers and giant websites certainly have those needs; I’ve worked for many of them, after all. But I felt the need to go back to a much simpler approach, one where the Unix spirit still has its roots. When Eva started talking enthusiastically about ZFS and FreeBSD, I decided to give it a try.

I also have a rather personal view of containers. I like to use them as a way to deliver software, more or less like a package manager. Most of my personal software is packaged in containers so that I can run it anywhere, even from the command line. It was natural that I wanted to build containers on FreeBSD as well.

I considered jails, but I was eager to test both Podman and FreeBSD at the same time. So, stuck at home on a rainy weekend in London, I decided to explore Podman and the differences between running it on Linux and on FreeBSD. I also wondered how Podman compares to Docker on Linux, since I never liked how Docker “messed” with the system.

I decided to use one of my toolchains as an experiment. This container builds the Harbour compiler and the Clipper Database Utility (DBU). I know, writing about xBase/Clipper makes me an old lady, but I still love the text user interface that comes with it. I measured how long it takes to build that container with Podman on Debian Linux 12, with Podman on FreeBSD 13.2 and with Docker on Debian Linux 12.

The environment

I tested both Debian 12 (bookworm) and FreeBSD 13.2 as guests under Proxmox 7.2-3 (KVM). Each VM has 1 vCPU, 2GB of RAM and 100GB of disk space. Network cards and disks were configured with VirtIO. The hardware is a mini PC with an Intel i5-4590T CPU, 16GB of RAM and a Crucial 1TB SSD. The SSD has a rated maximum read/write speed of 540MB/s, which definitely affects the timings. I know I could have better hardware, but this is what I have in my storage room in London and what I could use during the weekend. So … please forgive me. 🙏😇

While I tried to keep the Dockerfiles for Linux and FreeBSD as similar as possible, the comparison undeniably can’t be exactly 1:1. The OSes are different, and so are the add-on packages that need to be installed.

This wasn’t meant to be an academic paper, but rather a way to satisfy my curiosity about how Podman behaves on FreeBSD.

To keep the results as consistent as possible, nothing but the intended workload was running during the tests. I also cached (i.e. pulled) the base OS container images before running the tests, because the images differ in size and pulling them during the build would have put FreeBSD at a significant disadvantage. Look at the image sizes:

Image                   Size
Debian 12 (bookworm)    121 MB
FreeBSD 13.0            1.01 GB

I also tried to avoid any cached object that could invalidate the results: all interim or generated intermediate container images were removed before each test.

The build process

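It’s worth outlining the main steps in the Dockerfile to better understand the following paragraphs: install the required add-on packages, clone the Harbour repository with git, build it with make and make install, and finally compile DBU.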

The (unexpected) results

I timed the build (“time podman build”, or its Docker equivalent) on the three setups, and I couldn’t believe the results:

Setup                  Build time
Podman/Debian 12       15m 16s
Podman/FreeBSD 13.2    1h 25m 6s
Docker/Debian 12       28m 36s

It took well over an hour to build the container on FreeBSD, while it took 15 minutes on Debian. I expected some difference, perhaps twice as long in the worst case, but not THAT much!

I didn’t have any hard data, but my gut feeling was that a lot of time was being spent cloning the original Harbour repository under FreeBSD. So I ran the git clone command directly on each OS, with the following results:

OS              git clone time
Debian 12       5m 27s
FreeBSD 13.2    5m 53s
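For these measurements I simply used the shell’s time command. If you want to script repeated runs, a minimal Python sketch like the following takes the same wall-clock measurement and prints it in the format of the tables here. The helper names time_command and fmt are hypothetical, and the repository URL is deliberately left as a placeholder; this is not a script from the original tests.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError if the command fails
    return time.perf_counter() - start


def fmt(seconds: float) -> str:
    """Format seconds like the tables in this article, e.g. '5m 27s' or '1h 25m 6s'."""
    s = round(seconds)
    h, rem = divmod(s, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}h {m}m {s}s"
    if m:
        return f"{m}m {s}s"
    return f"{s}s"


if __name__ == "__main__":
    # Usage: python time_it.py git clone <repository-url>
    if len(sys.argv) > 1:
        print(fmt(time_command(sys.argv[1:])))
    else:
        print("usage: time_it.py COMMAND [ARGS...]")
```

Like the “real” figure from time, this measures elapsed time, not CPU time, which is what matters when the bottleneck is the network.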

Similar, right??? “What the fork is going on” was my first comment. So I decided to run the inverse test, i.e. run the builds without the git clone step (cloning the repository beforehand):

Setup                  Build time (without git clone)
Podman/Debian 12       8m 52s
Podman/FreeBSD 13.2    8m 56s
Docker/Debian 12       23m 6s

“What the actual fork???” This time, the results for Debian and FreeBSD were very close. That doesn’t make sense, does it? So I decided to invest more time and measure every single step of the build process:

Step            Podman/Debian 12    Podman/FreeBSD 13.2    Docker/Debian 12
git clone       5m 55s              1h 16m 31s             5m 38s
make            7m 33s              6m 20s                 15m 11s
make install    6s                  38s                    6m 47s
Compile DBU     1s                  1s                     2s
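To put these numbers in perspective, the small Python sketch below (figures copied from the tables above) computes how much of each full build the git clone step accounts for, and how much slower it was under Podman on FreeBSD. Keep in mind that the full build times and the per-step times come from different runs, so the percentages are only approximate.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert strings such as '1h 16m 31s' or '6s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)\s*([hms])", duration))


SETUPS = ("Podman/Debian 12", "Podman/FreeBSD 13.2", "Docker/Debian 12")
# Full build times (first results table) and git clone step times (per-step table).
TOTALS = ("15m 16s", "1h 25m 6s", "28m 36s")
GIT_CLONE = ("5m 55s", "1h 16m 31s", "5m 38s")


def clone_share(i: int) -> float:
    """Approximate fraction of setup i's full build spent in git clone."""
    return to_seconds(GIT_CLONE[i]) / to_seconds(TOTALS[i])


if __name__ == "__main__":
    for i, setup in enumerate(SETUPS):
        print(f"{setup}: git clone ~{clone_share(i):.0%} of the build")
    slowdown = to_seconds(GIT_CLONE[1]) / to_seconds(GIT_CLONE[0])
    print(f"git clone under Podman, FreeBSD vs Debian: {slowdown:.1f}x slower")
```

On these figures, git clone is roughly 39% of the Podman/Debian build, about 90% of the Podman/FreeBSD build and about 20% of the Docker/Debian build, and it ran almost 13 times slower under Podman on FreeBSD than on Debian.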

And here’s the catch. As suspected, the git clone command took most of the build time on FreeBSD. But why was git clone performing as expected on the OS itself, i.e. outside Podman?

As I am new to FreeBSD, my debugging skills there are limited. I ran tcpdump and saw some strange PSH packets. I ran truss and saw the git process performing read/write operations. But I wasn’t able to connect the dots.

The turning point

I plucked up the courage and contacted the author of ocijail, the runtime that Podman uses under the hood to run containers on FreeBSD (under Linux, Podman uses crun). It turned out that the author, Doug, is a very nice guy, and he helped me. I also discovered later that he’s the one who ported Podman and buildah to FreeBSD.

Doug ran a series of tests that gave results similar to mine. A key step for him was confirming that running the git clone command through buildah gave a result similar to running it on the bare OS. Using buildah with chroot is basically like running Podman, except for the network stack.

But then he had an intuition: during the tests, he noticed that the network speed under Podman was significantly lower than under buildah, and I confirmed it on my setup. The speed under buildah was 16MB/s in my environment, while under Podman it was an “astonishing” 32Kb/s. Yep, 32Kb/s!

So, it was somehow the network. We both ran Podman with the “--network=host” option to bypass the container network stack (CNI) and use the host’s network stack instead. And … the git clone took 5m 56s, comparable to running the command natively on the host. It was the network.

Finding the culprit

Doug was able to reproduce the problem in his environment. He noticed that packets were “optimised” through Large Receive Offload (LRO) even when they were meant to be routed to the container (or jail). This caused retransmissions (the PSH packets I saw) and very slow performance, as the aggregated packets were too large to be routed to the jail.

If you aren’t familiar with it (I wasn’t), Large Receive Offload (LRO) increases the inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that must be processed.
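To make the idea more concrete, here is a toy Python model of LRO: consecutive, in-order segments of the same TCP flow are merged into one larger buffer, up to a size limit. It is a deliberate simplification, not how FreeBSD’s LRO code is written; the 64KB limit, the 1448-byte segments, the 40-byte header overhead and the documentation-range addresses are all assumptions for illustration. It also hints at why forwarding suffers: the aggregate ends up far larger than a typical 1500-byte MTU.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple     # (src, sport, dst, dport) identifying one TCP stream
    seq: int        # sequence number of the first payload byte
    payload: bytes


def lro_coalesce(segments, max_bytes=65535):
    """Merge consecutive, in-order segments of the same flow, up to max_bytes of payload."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq   # next expected byte
                and len(last.payload) + len(seg.payload) <= max_bytes):
            last.payload += seg.payload  # grow the aggregate instead of queueing a new packet
        else:
            # Start a new aggregate; copy so the caller's segments are not modified.
            merged.append(Segment(seg.flow, seg.seq, seg.payload))
    return merged


if __name__ == "__main__":
    flow = ("192.0.2.10", 443, "192.0.2.20", 50000)
    mss = 1448  # assumed TCP payload per segment
    wire = [Segment(flow, i * mss, b"x" * mss) for i in range(40)]
    stack = lro_coalesce(wire)
    print(f"{len(wire)} segments on the wire -> {len(stack)} packet(s) up the stack")
    size = len(stack[0].payload) + 40  # plus assumed 20-byte IPv4 and 20-byte TCP headers
    print(f"aggregate is {size} bytes; fits a 1500-byte MTU if forwarded: {size <= 1500}")
```

For a host that consumes the data itself, one large buffer instead of 40 small packets is a win; for a host that must forward the data onward, the merged packet no longer fits the next hop.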

FreeBSD enables LRO on an interface automatically (the LRO option), but, in theory, it should be disabled when the host acts as a router. So Doug opened bug #273046.

The workaround is to disable LRO on the interface (with ifconfig’s -lro option). With LRO disabled, Podman on FreeBSD performs comparably to Podman on Linux.
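On FreeBSD, disabling LRO at runtime looks like “ifconfig vtnet0 -lro”, where vtnet0 is the first VirtIO network interface (adjust the name to your system). It can be made persistent by adding -lro to the interface’s ifconfig_ line in /etc/rc.conf, for example “ifconfig_vtnet0="DHCP -lro"” if the interface uses DHCP. The Python sketch below, with hypothetical helper names, checks whether LRO still appears among the enabled options that ifconfig prints; it assumes the usual “options=HEX<FLAG,FLAG,...>” line format.

```python
import re
import subprocess
import sys

# Match the interface's "options=...<...>" line, but not the "nd6 options=..." line.
OPTIONS_RE = re.compile(r"^\s*options=[0-9a-fA-F]+<([^>]*)>", re.MULTILINE)


def parse_options(ifconfig_output: str) -> set[str]:
    """Return the set of enabled option flags listed in ifconfig output."""
    match = OPTIONS_RE.search(ifconfig_output)
    if not match or not match.group(1):
        return set()
    return set(match.group(1).split(","))


def lro_enabled(iface: str) -> bool:
    """Run 'ifconfig IFACE' (FreeBSD) and report whether LRO is currently enabled."""
    out = subprocess.run(["ifconfig", iface], capture_output=True,
                         text=True, check=True).stdout
    return "LRO" in parse_options(out)


if __name__ == "__main__":
    if sys.platform.startswith("freebsd"):
        iface = sys.argv[1] if len(sys.argv) > 1 else "vtnet0"  # assumed interface name
        print(f"LRO enabled on {iface}: {lro_enabled(iface)}")
```

After running “ifconfig vtnet0 -lro”, the check should report False until the next reboot, unless the change is also made in /etc/rc.conf.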

Takeaways

It’s no surprise that Podman is a very nice tool for running containers. It has been around on Linux for a while and, as the numbers above show, it has some performance advantages over Docker: Docker sometimes takes more than twice as long to perform the same task. Even though I don’t have objective data right now, I can tell you from experience that Docker usually messed with the base system, while Podman seems to have less impact on it. On Linux, Podman can even run rootless and can generate a systemd service, which IMHO integrates better with the OS … even if I don’t like the systemd approach that much (ok, kill me now!).

FreeBSD on the server side is mature; to me, it feels like running Linux 10 years ago. It’s fine, but sometimes I miss some Linux commands. ZFS is definitely amazing, especially booting from a mirror, which reminds me of mainframes. Podman on FreeBSD, though, was quite a surprise. I thought I would run into more issues, but instead it works as expected (despite the LRO “bug”).

I think that running containers on FreeBSD, and automating them with Ansible, gives me what I was searching for: something closer to the Unix experience I wanted, plus the flexibility that containers can give, especially for disaster recovery.

P.S. If you are a developer, and you haven’t done it yet, I strongly encourage you to consider Podman as an alternative to Docker. Even in your CI/CD pipelines.

The end (or the beginning)

I haven’t written a tech article in (perhaps) more than two years. My last piece of writing was a book at the beginning of the pandemic. I went through a lot, and only in the last few weeks has my brain allowed me to experiment again and find the energy and the courage to write this article. Perhaps it is not the best one you’ll ever read, but it means the world to me.

First and foremost, I need to thank Eva. She’s the one who inspired me to get back to the keyboard, despite what I was going through. She has been alongside my journey every single day and encouraged me to be myself. She’s not “just” my best friend, but a true partner, despite our distance. Since we first met a year ago, she has become a fundamental part of my life. Eva, ditto what you said to me on 2022-09-27.

Thank you, Doug, for being so nice and helping me explore my curiosity. I owe you a few pints.

I also need to thank Kat (aka “usrbinkat”) for playing an important part in my “tech recovery” (I still have your message printed in my wallet), as well as Ezio, Matteo, Serena, Gianluca, the “sapphic” authors and readers on Twitter and all the people who helped me in this difficult period of my life. And thanks to those who accepted me as I am now.

With love, Tara (aka Gippa)

P.S. A few days after I finished writing this article, I came across this one by Stefano Marinelli, who ran into a similar issue, although related to jails: https://it-notes.dragas.net/2023/08/14/boosting-network-performance-in-freebsds-vnet-jails/

2023-09-07