Amazon Sponsors Year-Long FreeBSD Development and Release Engineering Effort
In November 2023, I took over as FreeBSD release engineering lead, coinciding with the announcement of FreeBSD 14.0. I was also still maintaining the FreeBSD/EC2 platform, which I have done since 2010. By early 2024 the dual role had become overwhelming: release engineering duties crowded out my FreeBSD/EC2 work, feature development stagnated, and a growing number of issues went uninvestigated.
For several years I had discussed with Amazonians the possibility of Amazon sponsoring my FreeBSD/EC2 work, and often heard that they agreed it was important but lacked the budget. In April 2024 I found a contact with available funds, and Amazon agreed to support me for a year via GitHub Sponsors, nominally for 40 hours per month. In practice I devoted closer to 50 hours per month, split evenly between EC2-specific issues, release management, and other release engineering tasks.
During the funded period I managed four FreeBSD releases: 13.4 in September 2024, 14.2 in December 2024, 13.5 in March 2025, and 14.3, which is scheduled for release on June 10, 2025. The work involved coordinating code integration, reviewing merge requests, building and testing images, and addressing release-related issues. The effort per release varied, from 33.5 hours for FreeBSD 13.5 to 79 hours for FreeBSD 14.2. The workload tended to decrease as the releases progressed, since later releases surfaced fewer bugs and issues.
On the FreeBSD/EC2 side, two major features were prioritized. The first was a "power driver" for AWS Graviton instances, so that FreeBSD recognizes and responds to shutdown signals from the EC2 API. Previously FreeBSD ignored these signals, and the instance was forcibly shut down after a few minutes. I resolved this by adding code to interpret the ACPI _AEI object and configure the PL061 GPIO controller accordingly.
A minor issue with GPIO configuration being ignored on Graviton systems was also addressed, using a quirk (ACPI_Q_AEI_NOPULL) that works around the problem until Amazon resolves it.
The second priority was device hotplug on AWS Graviton instances, particularly hot unplug. Several issues turned up across different EC2 instance types. On some Graviton systems, IRQ reservations leaked during PCI attach, leading to kernel panics after repeated EBS volume operations; I added a boot loader setting to disable legacy PCI interrupt routing on EC2. Another bug involved the firmware using PCI device power states incorrectly, which I mitigated with an ACPI quirk (ACPI_Q_CLEAR_PME_ON_DETACH). A kernel panic in FreeBSD's nvme driver during PCIe unplug was reported, and the driver maintainer addressed it. Lastly, "ghost" devices appearing on the PCI bus after an eject were another EC2 bug, which I worked around by adding a 10 ms delay (ACPI_Q_DELAY_BEFORE_EJECT_RESCAN) between the eject signal and the bus rescan.
Beyond these features, I also worked on FreeBSD's boot performance on EC2. In early 2024, boot times on EC2 instances mysteriously slowed by a factor of three, which I traced to the root disk size increasing from 5 GB to 6 GB. Increasing the disk size further, to 8 GB, restored performance. Another bottleneck was kernel entropy seeding on Graviton 2 instances, which was slow because of inefficiencies in how entropy was obtained. Moving the entropy seeding request to the correct place in the boot loader and optimizing entropy collection cut boot times on arm64/base/UFS images from 25 seconds to 8 seconds. ZFS images were also significantly slower to boot, due to a transaction group verification issue; recording a higher transaction group reduced their boot times from 22 seconds to 11 seconds.
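
The hotplug workarounds described above are all gated by ACPI quirk flags. The following Python sketch is only a model of that pattern, not the FreeBSD kernel code (which is written in C): the three flag names come from the text with the ACPI_Q_ prefix dropped, while the bit values, the AcpiQuirk class, and the rescan_after_eject function are hypothetical and chosen purely for illustration.

```python
import enum
import time


class AcpiQuirk(enum.IntFlag):
    """Quirk names taken from the text; the bit values here are made up."""
    AEI_NOPULL = 1 << 0
    CLEAR_PME_ON_DETACH = 1 << 1
    DELAY_BEFORE_EJECT_RESCAN = 1 << 2


# The 10 ms delay mentioned in the text, expressed in seconds.
EJECT_RESCAN_DELAY_S = 0.010


def rescan_after_eject(quirks, rescan, sleep=time.sleep):
    """Rescan the bus after an eject, pausing first if the quirk is set.

    `rescan` is a caller-supplied callable standing in for the real bus
    rescan; `sleep` is injectable so the behaviour can be tested quickly.
    """
    if AcpiQuirk.DELAY_BEFORE_EJECT_RESCAN in quirks:
        sleep(EJECT_RESCAN_DELAY_S)
    rescan()
```

The point of the pattern is that a platform with the bug sets the flag once, and every affected code path checks it, so the workaround can be dropped for a platform by clearing a single bit once the underlying firmware issue is fixed.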
I also expanded the variety of FreeBSD AMIs with two new flavors: small AMIs, which omit debug symbols and other non-essential components, reducing disk usage from 5 GB to 1 GB; and builder AMIs, designed to help users create their own customized AMIs. Managing the larger number of AMIs while keeping storage use in check required extensive clean-up and scripting to remove old images and EBS snapshots, which ultimately freed 336 TB of storage.
Additionally, I tackled broader release engineering challenges: parallelizing release builds, which cut build time from 22 hours to 13 hours, and improving build reproducibility by regularly comparing built AMIs against their originals using diffoscope. Smaller tasks included fixing build breakages, reviewing patches for the ENA driver, and supporting the implementation of OCI containers.
Despite these achievements, the end of Amazon's sponsorship in mid-2025 will limit my capacity to continue this intensive work. FreeBSD releases will continue, but late-landing features might be removed rather than fixed. Other planned improvements, such as automatically growing filesystems when EBS volumes are expanded and better configuration of systems with multiple network interfaces, may stagnate.
Industry insiders commend the contributions made during this sponsored year, noting that the improvements to boot performance and device hotplug are crucial for FreeBSD's viability on modern cloud platforms. The wider variety of AMIs and the streamlined release process have also been beneficial, demonstrating the value of dedicated sponsorship in open source projects. FreeBSD, a robust and widely used operating system, has gained substantial improvements in cloud compatibility, particularly on AWS Graviton instances, thanks to this collaboration. Amazon's support has both advanced FreeBSD's capabilities and highlighted the importance of such partnerships in solving complex technical challenges.
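
As an illustration of the reproducibility checks mentioned above, here is a minimal Python sketch. It assumes the original and the newly built image are both available as local files, which the text does not state. The function names are hypothetical, and the only thing taken from the text is the use of diffoscope, invoked here in its basic two-argument form.

```python
import hashlib
import shutil
import subprocess


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(original, rebuilt):
    """Return True when the two image files are bit-for-bit identical."""
    return sha256_of(original) == sha256_of(rebuilt)


def report_difference(original, rebuilt):
    """Return diffoscope's text report, or None if diffoscope is not installed.

    diffoscope exits non-zero when the inputs differ, so the exit status
    is deliberately not treated as an error here.
    """
    if shutil.which("diffoscope") is None:
        return None
    result = subprocess.run(
        ["diffoscope", str(original), str(rebuilt)],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Hashing first keeps the common case cheap: the slower diffoscope comparison only needs to run when images_match returns False, at which point its report shows where the two builds diverge.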