Last week I wrote about our experience setting up a diskless Ubuntu installation in our blade server environment. Some issues were still unsolved at that point; we have since resolved them and finally got Ubuntu to boot. In this post, I’m going to continue from where the last one ended.
The I/O error during the Ubuntu installation was, as expected, caused by an overly complex mapping of the attached storage. We reduced the connections to a single port and mapped it via the QLogic configuration utility to a single controller on the disk array. This way the installer, as well as the boot manager, should be able to see the connection as soon as the server firmware has booted. After this change, the installation was successful in a sense: it completed and didn’t throw any errors. However, after we removed the installation media and restarted, the system failed to boot. It couldn’t find the kernel, or even the boot loader itself. We were left wondering whether the QLogic driver was incompatible with the kernel, GRUB or some other essential component, or whether the server UEFI was not configured properly.
After hours of browsing the documentation, consulting the community, searching on Google and testing various settings, we found a working configuration. It was confusing enough that there were four places to configure the QLogic firmware and two places to configure the boot sequence; on top of that, at least one fibre channel setup utility failed to load. The most important thing was to disable UEFI boot and use only the legacy boot option. This, on the other hand, caused the blade server chassis to report an error. Perhaps it cannot communicate properly with the blade without UEFI boot. We’re going to look into that in the future. Another thing worth knowing is that with UEFI boot disabled, the boot manager configuration doesn’t seem to have any impact, and the boot manager didn’t even recognize the CD drive.
Deep within the UEFI/BIOS settings was another set of options that defines the order in which I/O devices are started. We removed all local disk controllers and set the fibre channel card and the CD drive as the top priorities. Other settings we made, just in case, were enabling automatic login for each connected fibre channel card, booting primarily from a specific card and LUN combination, and disabling the local SAS controller entirely. The IBM support site has a few articles on disabling Optimized Boot in order for boot device changes to work correctly. In legacy mode, however, Optimized Boot is locked in the enabled state. This seems to have no effect on booting, but it made the information misleading.
With this configuration in place, we once again ran the Ubuntu installer, provided the QLogic drivers (which are available for Ubuntu) on a USB thumb drive and hoped for the best. The installation went fine and, for the very first time, Ubuntu actually booted. There was no need for a special boot loader, network boot or other hacks. In hindsight, I think the lack of proper documentation, the mix of an old blade chassis with new blade server hardware, and inconsistent UEFI behaviour were the major causes of the difficulties we had. Of course, this was a completely new case for us and we didn’t have any previous experience with a setup like this.
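If you end up in a similar situation, it can help to confirm from the running system that it really booted in legacy mode and that the fibre channel ports are up. The following is only a minimal sketch of such a check, not something we used during the installation. It assumes the standard Linux sysfs layout, where `/sys/firmware/efi` exists only on UEFI boots and each fibre channel host exposes a `port_state` file under `/sys/class/fc_host`. The function names `boot_mode` and `fc_port_states` are made up for illustration.

```python
from pathlib import Path


def boot_mode(sysfs_root="/sys"):
    """Return 'uefi' if the kernel exposes EFI firmware data, else 'legacy'."""
    # Assumption: /sys/firmware/efi is present only when booted via UEFI.
    efi_dir = Path(sysfs_root) / "firmware" / "efi"
    return "uefi" if efi_dir.is_dir() else "legacy"


def fc_port_states(sysfs_root="/sys"):
    """Map each fibre channel host (e.g. 'host1') to its reported port state."""
    states = {}
    fc_dir = Path(sysfs_root) / "class" / "fc_host"
    if not fc_dir.is_dir():
        # No fibre channel hosts are registered with the kernel.
        return states
    for host in sorted(fc_dir.iterdir()):
        state_file = host / "port_state"
        if state_file.is_file():
            states[host.name] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    print("boot mode:", boot_mode())
    for host, state in fc_port_states().items():
        print(host, state)
```

The `sysfs_root` parameter exists only so the functions can be pointed at a test directory instead of the real `/sys`. On a machine set up like ours you would expect the boot mode to be reported as legacy, but what the port states look like depends on your HBA and SAN configuration.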
Once we get the system fully configured and working with the complex SAN connections, I’m going to write up some guidelines and a more detailed guide on how to build a configuration like ours. It’s going to be more technically oriented, but I think it is knowledge that needs to be shared. I also hope to provide some benchmark results in case you’re evaluating a diskless Linux installation.