FCoE Boot from SAN: Understanding the 2TB LUN Size Limitation
In the early days of computing, digital memory and storage were highly expensive. To save space, engineers frequently abbreviated years, so the year 1979 would be stored as '79' instead of '1979'. As the year 2000 approached, people realized that a two-digit field was not sufficient for storing the year, because '00' could mean either 1900 or 2000. This became popularly known as the Y2K problem.
The Y2K problem came to mind while we were investigating a customer-reported issue with a networking product we work on: a Converged Network Adapter (CNA), a single device that combines the functionality of a traditional Network Interface Card (NIC) and a Host Bus Adapter (HBA).
FCoE (Fibre Channel over Ethernet) allows Fibre Channel traffic to run over Ethernet networks, so it inherits Fibre Channel's storage capabilities. One of these is Boot from SAN (BFS), where a server loads its operating system from a LUN (a logical unit exposed by a storage array) on the storage area network instead of from a local disk. BFS is a common architecture in enterprise data centers where FCoE and high-performance block storage access are available.
"Logical Block Addressing (LBA) is the scheme used by operating systems to address and locate data blocks on computer storage devices."
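In other words, an LBA is simply the index of a block on the device, counted from 0. As a minimal sketch, assuming 512-byte logical blocks (a common default; real devices report their block size), the byte offset of a block is its LBA multiplied by the block size. The helper name `lba_to_byte_offset` is hypothetical.

```python
BLOCK_SIZE = 512  # assumed logical block size in bytes

def lba_to_byte_offset(lba: int, block_size: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: byte offset of the block at the given LBA."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return lba * block_size

# LBA 0 is the first block on the device; LBA 2048 starts 1 MiB in.
print(lba_to_byte_offset(0))     # 0
print(lba_to_byte_offset(2048))  # 1048576
```

Because every block is named by a single integer, the width of the field that carries that integer sets a hard ceiling on how much storage can be addressed.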
The SCSI READ CAPACITY command
When a storage initiator (the HBA or CNA) needs to discover the size of a LUN, it sends a SCSI READ CAPACITY command to the storage target. There are two versions:
- READ CAPACITY (10): 32-bit LBA field. With 512-byte blocks, the maximum LUN size is approx. 2TB (2^32 blocks × 512 bytes = 2,199,023,255,552 bytes, i.e. 2 TiB)
- READ CAPACITY (16): 64-bit LBA field. With 512-byte blocks, the maximum LUN size is approx. 9.4 ZB, far beyond any device in practical use
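READ CAPACITY (10) was the original command, defined when 2TB was considered an enormous storage device. The difference matters most at boot time, before the OS is running, when the disk is accessed by firmware drivers. On modern servers that firmware is UEFI (Unified Extensible Firmware Interface), the successor to the legacy BIOS, which initializes the hardware and loads the OS boot loader.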
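The size limits above follow directly from the width of the LBA field: an n-bit field can index 2^n blocks, so the ceiling is 2^n multiplied by the block size. The short calculation below reproduces both figures, assuming 512-byte blocks; `max_capacity_bytes` is a hypothetical name used only for illustration. Passing a 4096-byte block size shows that the READ CAPACITY (10) limit is really a block-count limit rather than a fixed byte count.

```python
BLOCK_SIZE = 512  # assumed logical block size in bytes

def max_capacity_bytes(lba_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: capacity reachable with an LBA field of this width.

    An n-bit field indexes 2**n blocks (LBA 0 through 2**n - 1).
    """
    return (2 ** lba_bits) * block_size

rc10 = max_capacity_bytes(32)
rc16 = max_capacity_bytes(64)
print(f"READ CAPACITY (10): {rc10:,} bytes")          # 2,199,023,255,552 bytes
print(f"READ CAPACITY (16): {rc16 / 10**21:.1f} ZB")  # 9.4 ZB
print(f"READ CAPACITY (10), 4096-byte blocks: {max_capacity_bytes(32, 4096) // 2**40} TiB")  # 16 TiB
```

Strictly, READ CAPACITY (10) can describe one block less than this, because the value 0xFFFFFFFF is reserved as the "too large" signal rather than used as a real last LBA.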
The limitation of the Marvell FastLinQ CNA
The Marvell FastLinQ UEFI FCoE driver supports only the READ CAPACITY (10) command, which uses a 32-bit LBA (Logical Block Address) and therefore caps the addressable LUN size at approximately 2TB. When the boot LUN exceeded this size, the storage controller returned 0xFFFFFFFF as the last LBA, the maximum value representable in 32 bits. This is the storage controller's way of signaling 'this LUN is larger than READ CAPACITY (10) can represent; use READ CAPACITY (16) instead'. Because the driver does not implement READ CAPACITY (16), it had no way to learn the real size: the LUN could not be correctly enumerated, and the boot process failed.
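For comparison, the behavior the SCSI Block Commands (SBC) standard expects from an initiator is to retry with READ CAPACITY (16) whenever READ CAPACITY (10) reports 0xFFFFFFFF. The Python sketch below shows that fallback. It does not talk to real hardware: `send_cdb` is a hypothetical callback standing in for the transport (FCoE, iSCSI, and so on), and `FakeLun` is a made-up target used only to exercise the logic. The command layouts (opcode 0x25 for READ CAPACITY (10); SERVICE ACTION IN (16), opcode 0x9E with service action 0x10, for READ CAPACITY (16)) and the big-endian response fields follow SBC.

```python
import struct
from typing import Callable, Tuple

READ_CAPACITY_10 = 0x25
SERVICE_ACTION_IN_16 = 0x9E
SA_READ_CAPACITY_16 = 0x10
RC10_OVERFLOW = 0xFFFFFFFF  # "capacity too large for READ CAPACITY (10)"

def build_read_capacity_10() -> bytes:
    # 10-byte CDB: opcode followed by zeroed obsolete/reserved fields.
    return bytes([READ_CAPACITY_10]) + bytes(9)

def build_read_capacity_16(alloc_len: int = 32) -> bytes:
    # 16-byte CDB: opcode, service action, 8-byte LBA field (unused here),
    # 4-byte allocation length, 1 reserved byte, 1 control byte.
    return struct.pack(">BBQIBB", SERVICE_ACTION_IN_16, SA_READ_CAPACITY_16,
                       0, alloc_len, 0, 0)

def discover_capacity(send_cdb: Callable[[bytes], bytes]) -> Tuple[int, int]:
    """Return (last_lba, block_length), falling back to READ CAPACITY (16)."""
    # READ CAPACITY (10) response: 4-byte last LBA, 4-byte block length.
    last_lba, block_len = struct.unpack(">II", send_cdb(build_read_capacity_10())[:8])
    if last_lba == RC10_OVERFLOW:
        # LUN is too large for a 32-bit LBA: ask again with a 64-bit field.
        # READ CAPACITY (16) response: 8-byte last LBA, 4-byte block length.
        last_lba, block_len = struct.unpack(">QI", send_cdb(build_read_capacity_16())[:12])
    return last_lba, block_len

class FakeLun:
    """Hypothetical in-memory target, for illustration only."""
    def __init__(self, num_blocks: int, block_len: int = 512):
        self.last_lba = num_blocks - 1
        self.block_len = block_len

    def __call__(self, cdb: bytes) -> bytes:
        if cdb[0] == READ_CAPACITY_10:
            # Report 0xFFFFFFFF if the last LBA does not fit below it.
            return struct.pack(">II", min(self.last_lba, RC10_OVERFLOW), self.block_len)
        if cdb[0] == SERVICE_ACTION_IN_16 and (cdb[1] & 0x1F) == SA_READ_CAPACITY_16:
            # 32 bytes of parameter data; fields after the block length are zeroed.
            return struct.pack(">QI", self.last_lba, self.block_len) + bytes(20)
        raise ValueError("unsupported CDB")

lun = FakeLun(num_blocks=2**33)  # 4 TiB with 512-byte blocks
print(discover_capacity(lun))    # (8589934591, 512)
```

A driver that stops after the first `struct.unpack` sees only 4294967295 (0xFFFFFFFF) for this LUN, which is the situation described above.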
What could be done
- Workaround 1: Keep the boot LUN under 2TB. This is the simplest workaround. The boot LUN does not need to be large; it only needs to hold the OS.
- Workaround 2: Use iSCSI boot instead of FCoE boot, provided the iSCSI boot path supports LUNs larger than 2TB. iSCSI is a storage networking protocol that carries SCSI commands over TCP/IP networks.
- Workaround 3: Use a local boot disk. Booting from a local NVMe or SATA drive avoids this limitation entirely.
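For Workaround 1, a quick way to check whether a planned boot LUN stays within the READ CAPACITY (10) limit is sketched below. It assumes the LUN size is known in bytes and is a whole number of blocks; `fits_read_capacity_10` is a hypothetical helper, not part of any vendor tool. The largest last LBA that READ CAPACITY (10) can report is 0xFFFFFFFE, because 0xFFFFFFFF is reserved to mean "too large".

```python
RC10_MAX_LAST_LBA = 0xFFFFFFFE  # 0xFFFFFFFF is reserved as the overflow signal

def fits_read_capacity_10(size_bytes: int, block_size: int = 512) -> bool:
    """Hypothetical helper: True if READ CAPACITY (10) can describe the LUN."""
    if size_bytes <= 0 or size_bytes % block_size:
        raise ValueError("size must be a positive multiple of the block size")
    last_lba = size_bytes // block_size - 1
    return last_lba <= RC10_MAX_LAST_LBA

GiB = 1024 ** 3
print(fits_read_capacity_10(200 * GiB))   # True
print(fits_read_capacity_10(2048 * GiB))  # False: exactly 2 TiB needs last LBA 0xFFFFFFFF
```

Note the edge case: a LUN of exactly 2 TiB (with 512-byte blocks) already fails the check, so "under 2TB" should be read strictly.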
Takeaways
- A system may support large LUNs at runtime but fail during boot due to firmware constraints.
- Legacy design decisions (such as 32-bit addressing) can silently limit modern systems in unexpected ways.
- Firmware limitations can persist long after OS and hardware have evolved.
- Engineering decisions have long lifetimes — future scale must be kept in mind.