Click here to learn
about this Sponsor:
Home  |  News  |  Articles  |  Polls  |  Forum

Keywords: Match:
Benchmarking boot latency on x86
by Gilad Ben-Yossef

Foreword -- In this tutorial whitepaper, the co-founder, CTO, and "chief coffee drinker" of Isreali development firm Codefidence explains how to instrument x86-based systems to measure boot latency. Author Gilad Ben-Yossef supplies helpful code snippets and great advice to help any x86 device or PC boot faster.

Ben-Yossef originally published the article on his Tuxology blog. Codefidence specializes in embedded Linux development and training. Enjoy . . . !


Benchmarking boot latency on x86
by Gilad Ben-Yossef


When it come to our gadgets, we are all hungry for immediate satisfaction, and it is the challenge of embedded developers to meet those needs. There are many tricks one can employ to to achieve boot-time nirvana, but as Knuth taught us, premature optimization is the root of all evil. Therefore, the first step is to measure the boot latency and find out what parts of the boot sequence contribute to it. Otherwise, one can end up wasting time optimizing the wrong code.

Generally speaking, on a 32- or 64-bit x86 platform, the boot process of a Linux system is comprised of the following phases and milestones:
  • Power up -- when the power is set to on

  • BIOS -- including POST (Power On Self Test), device initialization, running of option ROMs, and loading the bootloader from the MBR (Master Boot Record)

  • Bootloader -- loading an operating system (OS) kernel and ancillary data (such as Linux initrd or initramfs) into RAM

  • Kernel initialization -- including CPU, peripherals, and kernel data structures, as well as bring-up of the non-boot cores in the case of multi-core machines

  • First user application, first line of code -- when the first line of user application source code is executed
Unfortunately, measuring the contribution of each of these phases to the overall boot latency is not an easy task. At each phase, different kind of code is executing, including 16 bit-BIOS code, which is part of the machine firmware; 16- or 32-bit bootloader code; 32-bit kernel code; and, user applications. Each is executing in a completely different software environment, and it is hard to find a common ground to compare the time each phase takes.

Luckily, the x86 architecture provides a useful tool: the TSC (Time Stamp Clock) register. Introduced in the original Intel Pentium, the TSC register counts the number of clock ticks from the last processor reset. Reading the current value of the TSC register is done using the RDTSC instruction.

The TSC register provides an accurate hi-res timer, assuming that there are no processor frequency changes (for example via SpeedStep technology), and that one always samples the register of the same core in a multi-core environment. Fortunately, both of these steps are easy to guarantee during the boot phase.

The easy part

As of Linux kernel version 2.6.11, the kernel supports a build-time configuration option called CONFIG_PRINTK_TIMES. This adds a time stamp based on the TSC value to each kernel log message, including during the boot process. Turning on this option is the first step towards a boot latency benchmark, as it provides a breakdown of the latency of the various points within the kernel initialization phase. In addition, turning off CPU frequency scaling OS support is required during configuration of the kernel. Otherwise, the TSC register can be rendered useless.

The PRINTK_TIMES feature can be used to get an understanding of how various kernel subsystems contribute to overall kernel initialization latency, but it only goes so far. Because the printk timestamp counts from the kernel boot and includes only kernel log messages, it does not take into account the latency induced by the BIOS, bootloader code, or the time to the first application source code line being executed. In order to measure these elements, one must add code that queries and prints the TSC value during bootloader execution, as well as during execution of the first user space program.

Instrumenting a user application

Code to query and print the TSC is rather simple. Here is an example taken from Wikipedia's article on RDTSC:

#include stdint.h>

inline uint64_t rdtsc() {
uint32_t lo, hi;
/*
* cpuid will serialize the following rdtsc with respect to all other
* instructions the processor may be handling.
*/
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx");
return (uint64_t)hi << 32 | lo;
}
A call to the rdtsc() function above can be used to measure the boot-time latency contribution of the various phases other than kernel initialization. The function can be called by the bootloader, or by a user program that will run at boot. Here is the user program, which is the easier of the two:

#include stdint.h>
#include stdio.h>

inline uint64_t rdtsc() {
...
}

int main(void) {

printf("TSC: %lld\n", rdtsc());
t1:
goto t1;

return 0;
}
The rdtsc function is the same function provided earlier in this article. Note that an infinite loop has been added at the end of the program. This hack is meant to keep the program from exiting when it has finished, thus making it easier to run it as an "init" program without having the kernel panic when the program exits.

Yet another issue to bear in mind is that on a multi-core board, it is prudent to pin the program to run on the boot CPU using the sched_setaffinity() system call (see this article for the details), as the value of the TSC register of different cores is not consistent, a fact not handled in the code example above."

To build the program, the following one-liner can be used:
gcc timestamp.c -static -o timestamp
After building the program, it is necessary to copy the timestamp program binary onto the root file system of the target board, and instruct the kernel to run the program during boot using the kernel command line parameter "init=/sbin/timestamp". If using syslinux, this can be done by specifying append="init=/sbin/timestamp" in your boot loader configuration file. Doing so instructs the kernel to run the specified program, rather then the default of "/sbin/init", at boot.

Alternatively, one can use the initramfs mechanism of the Linux kernel to include the small user-space program as part of the kernel binary image itself. Either way, it is fairly easy to write a user space program to print the value of the TSC register, and have it run as the first program during boot.

Instrumenting the bootloader

A similar effect can be produced with the boot loader, but it is not nearly as easy. There are quite a few bootloaders that can be used to load Linux on the x86 architecture (e.g. GRUB, LILO, syslinux), and x86 bootloader code can be notoriously complex. It is usually written in 16-bit assembly, and part of it runs under less than ideal conditions.

Luckily, the syslinux bootloader offers an easy way to integrate 32-bit C code that can be run as part of the boot process. It does this via its support for the Comboot API, which enables one to write a small 32-bit C program and integrate it as part of the bootloader phase.

Below is the timestamp program again, this time presented as a Comboot 32 module. It is derived from the hello.c example program supplied as part of the syslinux source code:

#include string.h>
#include stdio.h>
#include console.h>

inline uint64_t rdtsc() {
...
}

int main(int argc, char *argv[])
{
uint32_t lo, hi;

openconsole(&dev_stdcon_r, &dev_stdcon_w);

printf("TSC: %lld\n", rdtsc());

syslinux_run_command("linux");

return 0;
}
The rdstc function is the same one presented earlier in this article. Also, the little comboot32 program is concluded by invoking the syslinux_run_command() comboot 32 API call, which causes the Linux kernel image with the name of "linux" to be executed when the timestamp program ends.

To build the comboot32 program, it is a good idea to look inside the com32/samples/ directory that is part of the syslinux source code and follow the Makefile example provided there.

Having completed the Comboot32 program, named timestamp.c32, the next step is to instruct the syslinux bootloader to run the program at boot-time. The program will then print the value of the TSC register and proceed to load the Linux kernel.

The syslinux bootloader reads a configuration file, called syslinux.cfg, located in same partition into which it is installed, and then executes the command in the file. In this case, the file content will be:

default timestamp.c32

prompt 0
Putting it all together

This is a good time to recap the different stages of the boot process and determine a benchmark plan for each:
  • When the power is turned on, the board firmware runs the BIOS, which initializes the hardware and loads and runs the syslinux bootloader.

  • Syslinux reads the syslinux.cfg file, and according to the commands in it, executes the timestamp.c32 Comboot32 program. Comboot32 prints the current value of the TSC register to the VGA screen and proceeds to instruct syslinux to load the Linux kernel.

  • As the kernel boots, every printk log message sent by it to the kernel log buffer is timestamped, relative to the kernel start time, which is based on the TSC register.

  • Finally, after the kernel initialization has completed, the kernel runs the first user space program. In this case, this is the user program version of the timestamp program, which once again prints the current value of the TSC register and halts.
At the end of this process, the following values should be present:
  • The value of the TSC register when the bootloader runs -- By dividing this value by the number of cycles per second (Hertz) of the machine, one can convert this cycle count to the the number of seconds since the CPU reset up to this point.

  • The time it takes the kernel to finish initialization -- Reflecting the PRINTK_TIMES timestamps in the log messages, this value is already represented in seconds, so no conversion is needed.

  • The final TSC register value when the first user code program runs -- Here, too, one should divide the raw cycle count as reported by the TSC register by the number of cycles per second the machine is running at to obtain the number of seconds.
From these measurements one can easily construct a somewhat crude, but useful, table of the contribution each of the stages has to the overall machine boot latency. These stages are as follows:
  • The latency of the BIOS phase is measured by reading the TSC register at the start of the bootloader code, as recorded by the timestamp.c32 program.

  • The latency of the bootloader actions, including reading the kernel image to RAM and uncompressing the image, can be computed. This is done by subtracting the number of seconds it took the kernel to initialize, as reported by the PRINTK_TIMES feature, from the number of seconds reported by the user-space version of the timestamp program, and the number of seconds recorded at the bootloader launch. In addition, the various timestamps in the log messages throughout the kernel initialization process provide further insight as to the contribution of the various kernel subsystems and device drivers to the overall latency.

  • Finally, the full boot latency -- which is the overall time needed to get the first user program's first line of code running -- is the number of seconds computed from the value of the TSC reported by the user-program version of the timestamp program.
As an example, here are the values from one type of Intel Core-Duo-based board, running at 1.5Ghz:
  • The value reported from the timestamp.c32 program was 90933051222 cycles. Dividing the number of cycles by the number of cycles per second (1,500,000,000 on this board) gives a BIOS latency of 60.6 seconds. (The BIOS on this particular run wasted a long time scanning the IDE bus.)

  • The timestamp provided by the PRINTK_TIMES feature in the kernel log messages indicated it took the kernel 0.44 seconds to initialize.

  • The user program timestamp reported the first line of code ran at 93907247022 cycles, or 62.6 seconds after CPU reset, which is 2.0 seconds since the bootloader started.
If one subtracts the 0.44 seconds it took the kernel to initialize from the over two seconds of latency between bootloader and first user-program line of code, one can conclude that it took the bootloader roughly 1.66 seconds to read the kernel binary into memory from CompactFlash, including decompression time. So, it is clear that a faster storage medium can shave up to 1.66 seconds from the boot process, simply because it will take less time for the bootloader to load the kernel image into memory.

Note that these numbers are not 100 percent accurate. They neglect to attribute some kernel initialization time before the first printk to the kernel initialization phase. In addition, the numbers do not reflect the latency caused by the bootloader code initialization, instead wrongly attributing them to the BIOS phase. However, the numbers are accurate enough to be used as the basis of evaluating the effect of various changes on overall boot-time, as well as to provide a clue as to where optimization efforts need to be focused.

For further information about reducing and measuring boot latency, the embedded Linux wiki entry on BootTime is highly recommended.


This article originally appeared on Tuxology.net, a technology blog maintained by Ben-Yossef. It is reprinted here with permission.



About the author -- Gilad Ben-Yossef is the co-founder and CTO of Codefidence Ltd, and has been helping OEMs make and use free and open source software in commercial products and services since 1998. He is also the co-author of the book "Building Embedded Linux," 2nd Edition, which will be published this summer by O'Reilly. In addition, he is co-founder of Hamakor, an NPO devoted to the promotion of FOSS in Israel, as well as a founding organizer of "August Penguin," an Israeli community FOSS conference.

Gilad is a member of the Israeli chapter of Mensa, the Israeli Information Technology Association and the Israeli chapter of the Internet Society. He holds a B.A. in Computer Science from Tel-Aviv Jaffa Academic College. When not trying to make FOSS software do something the authors never intended, Gilad likes to SCUBA dive, read science fiction, and spend time with his wife Limor and his and two adorable girls, Almog and Yael.




Related Stories:

(Click here for further information)


FUEL Database on MontaVista Linux
Whether building a mobile handset, a car navigation system, a package tracking device, or a home entertainment console, developers need capable software systems, including an operating system, development tools, and supporting libraries, to gain maximum benefit from their hardware platform and to meet aggressive time-to-market goals.

Breaking New Ground: The Evolution of Linux Clustering
With a platform comprising a complete Linux distribution, enhanced for clustering, and tailored for HPC, Penguin Computing¿s Scyld Software provides the building blocks for organizations from enterprises to workgroups to deploy, manage, and maintain Linux clusters, regardless of their size.

Data Monitoring with NightStar LX
Unlike ordinary debuggers, NightStar LX doesn¿t leave you stranded in the dark. It¿s more than just a debugger, it¿s a whole suite of integrated diagnostic tools designed for time-critical Linux applications to reduce test time, increase productivity and lower costs. You can debug, monitor, analyze and tune with minimal intrusion, so you see real execution behavior. And that¿s positively illuminating.

Virtualizing Service Provider Networks with Vyatta
This paper highlights Vyatta's unique ability to virtualize networking functions using Vyatta's secure routing software in service provider environments.

High Availability Messaging Solution Using AXIGEN, Heartbeat and DRBD
This white paper discusses a high-availability messaging solution relying on the AXIGEN Mail Server, Heartbeat and DRBD. Solution architecture and implementation, as well as benefits of using AXIGEN for this setup are all presented in detail.

Understanding the Financial Benefits of Open Source
Will open source pay off? Open source is becoming standard within enterprises, often because of cost savings. Find out how much of a financial impact it can have on your organization. Get this methodology and calculator now, compliments of JBoss.

Embedded Hardware and OS Technology Empower PC-Based Platforms
The modern embedded computer is the jack of all trades appearing in many forms.

Data Management for Real-Time Distributed Systems
This paper provides an overview of the network-centric computing model, data distribution services, and distributed data management. It then describes how the SkyBoard integration and synchronization service, coupled with an implementation of the OMG¿s Data Distribution Service (DDS) standard, can be used to create an efficient data distribution, storage, and retrieval system.

7 Advantages of D2D Backup
For decades, tape has been the backup medium of choice. But, now, disk-to-disk (D2D) backup is gaining in favor. Learn why you should make the move in this whitepaper.

 


Got a HOT tip?   please tell us!
Free weekly newsletter
Enter your email...
Click here for a profile of each sponsor:
PLATINUM SPONSORS
(Become a sponsor)

ADVERTISEMENT
(Advertise here)

Check out the latest Linux powered...

Mobile phones!

MIDs, UMPCs
& tablets

Mobile devices

Other cool
gadgets



BREAKING NEWS

• Nokia alludes to Linux phone plans
• Linux music phone boasts stereo mini-jack
• Nokia spins open Linux home automation gateway
• Qseven module sports Atom
• Open source phone gains "fat" distro
• Android phone launches in China
• Microsoft to bake own phone?
• Tux cashes in on paint
• Linux ported to iPhone
• Blockbuster box runs Linux
• Palm Pilot VM for Linux updated
• Linux-focused smartphone vendor announces cuts
• GCC for kernel hackers
• Linux powers free-to-air (FTA) satellite box
• Linux services firm hires MontaVista exec


Most popular stories -- past 90 days:
• Open source phone goes mass-market
• Tinest Linux system, yet?
• Garmin Nav devices run Gnome Linux
• ARM9 board boots Debian in 0.69 seconds
• Low-cost laptop runs Linpus Linux
• Linux-friendly Beagle fetches $150
• Mini Linux PC breaks $100 barrier
• Open source camera records geotagged video to SATA HDD
• Open set-top box ships
• First $100 laptop runs Linux


DesktopLinux headlines:
• Linux gains FPS
• Ubuntu tops on desktop, but not servers
• Browser swallows OS
• Tutorial reveals desktop Linux security tips
• Fedora 10 first impressions
• 8,000 TuxTop models and counting
• "Moonlight" ready to shine
• Adobe unleashes 64-bit Flash
• Debian Lenny installer arrives
• Ubuntu announces ARM port


Also visit our sister site:


Sign up for LinuxDevices.com's...

news feed

Home  |  News  |  Articles  |  Polls  |  Forum  |  About  |  Contact
 

Ziff Davis Enterprise Home | Contact Us | Advertise | Link to Us | Reprints | Magazine Subscriptions | Newsletters
Tech RSS Feeds | White Papers | ROI Calculators | Tech Podcasts | Tech Video | VARs | Channel News

Baseline | Careers | Channel Insider | CIO Insight | DesktopLinux | DeviceForge | DevSource | eSeminars |
eWEEK | Enterprise Network Security | LinuxDevices | Linux Watch | Microsoft Watch | Mid-market | Networking | PDF Zone |
Publish | Security IT Hub | Strategic Partner | Web Buyer's Guide | Windows for Devices

Developer Shed | Dev Shed | ASP Free | Dev Articles | Dev Hardware | SEO Chat | Tutorialized | Scripts |
Code Walkers | Web Hosters | Dev Mechanic | Dev Archives | igrep

Use of this site is governed by our Terms of Service and Privacy Policy. Except where otherwise specified, the contents of this site are copyright © 1999-2008 Ziff Davis Enterprise Holdings Inc. All Rights Reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Enterprise is prohibited. Linux is a registered trademark of Linus Torvalds. All other marks are the property of their respective owners.