A comparison of hard real-time Linux alternatives
2004-11-19
[Updated Dec. 22, 2004] -- Foreword: This study compares the real-time capabilities of various Linux kernels. It was part of a project to upgrade the control software in water-wave generators at research institutions around the world. The results of the study were used by Akamina for the selection of a new RTOS for the control system upgrade of Canada's largest hydraulics and coastal engineering laboratory, the National Research Council Canadian Hydraulics Centre in Ottawa.

This paper is a revised version of the original whitepaper released on November 19, 2004. This version includes updated data for the Linux 2.4 with LXRT configuration. The hard real-time LXRT task that was originally used to collect the preemption latency data included a printf() call after the call to make the task hard real-time. Since the release of RTAI 24.1.13 and RTAI 3.0x, this has the effect of forcing the task back to soft real-time where, in previous releases, it would not have. The original tests, which showed the LXRT results to be very similar to those for Linux 2.4 alone, are what would be expected for LXRT tasks in soft real-time. Thanks go to Paolo Mantegazza for identifying the circumstances that led to the performance problem. The author chose to re-run the LXRT tests without the print statement rather than taking the alternative approach of setting the LXRTmode to allow it. Consideration is being given to reverting to a non-zero default for LXRTmode in Release 3.2.

Introduction

This paper provides an evaluation of a number of options for creating a hard real-time system based on Linux. A hard real-time system is a system that requires a guaranteed response to specific events within a defined time period. The failure of a hard real-time system to meet these requirements typically results in a severe failure of the system.

This work was carried out as part of an analysis to determine the suitability of using Linux in a hard real-time digital control system. The control loops in this system are executed at a rate of 100 Hz. A fundamental requirement of the system is that all of the control calculations must be completed within the 10 ms window available -- regardless of any other loading on the system.
In order to achieve this, it was felt that the maximum delay from the time the beginning of a control interval is signalled (the event) to the time that the control task is started should be less than 0.5 ms.

There are a number of Linux configurations that are available for real-time. Some of these are based on Linux alone, some use Linux with a sub-kernel, and some use kernel patches to improve the real-time behaviour of Linux. This paper focuses on four of the possible configurations. The options selected are all freely available for download from official websites.

In this evaluation, Linux and any sub-kernels used were treated as black boxes. The approach followed was to configure Linux and program the system for the various options, and then measure results. No attempts were made to investigate the kernel specifics to determine why the configurations responded the way they did, nor were any attempts made to improve the performance using custom patches or custom configurations.

Tests were carried out on a lightly loaded system and on a system under relatively heavy communications loading. Communications loading was used since it was easy to set up, it provided a way to exercise interface hardware and the device driver for the hardware, and it provided a realistic situation where hard real-time and communications messaging must be possible simultaneously.

This evaluation was done on an Intel x86 processor. Some of the options that were tested may not be supported on all processor architectures.

OS Options Evaluated

The four Linux options evaluated include:
The GE Fanuc VMIVME-7700 CPU board was selected for use in the control system. The board is a VMEbus single board computer based on the Intel Celeron CPU running at 650 MHz. The board is populated with 512 MB SDRAM and 128 MB CompactFlash. Other features of the board that were used in the evaluation of the real-time options include:
The video graphic controller and keyboard interface provided a convenient interface to monitor and control the target. The remote Ethernet booting capabilities were used to allow the kernel image to be downloaded to the target from a tftp server running on the host PC. The programmable timer provided a mechanism to obtain an indication of the latencies inherent in the various OS options considered. The driver for the timer was provided by GE Fanuc.

Measurements

An important measure of an operating system's ability to meet the real-time requirements for a control system is the length of time from the instant that the event marking the beginning of a control interval is generated to the time that the control task begins execution. We define this time as the preemption latency. As the preemption latency increases, the time available for the control calculations decreases. The preemption latency is shown in Figure 1.

Figure 1 -- Preemption Latency

The preemption latency is the sum of a number of shorter delays including: hardware response time, interrupt service time and context switch time. We chose to measure the preemption latency and the interrupt latency defined according to the more detailed Figure 2 shown below.

Figure 2 -- Components of Latency

where:
Time values were measured by reading the value of the timer counter. The difference between the initial counter value and the counter value at t1 or t2 indicates the number of clock ticks that have elapsed. Together, the clock rate and the tick count provide all of the information required to measure time. The timer counter was set up as follows:
The counts recorded in the ISR and in the control task were read as early in each procedure as was possible. Although the measured counts include some time spent processing within the procedures, this time was short and the measurements were considered to be representative of the actual interrupt latency and preemption latency.

Time series of preemption latency and interrupt latency were recorded for the OS options indicated above (2.4, 2.6, 2.4 with RTAI and 2.4 with LXRT). Measurements were made for both the loaded condition and the unloaded condition. A total of 1,500,000 samples were collected at a rate of 100 samples per second during each of the eight tests (four OS options; loaded and unloaded).

The samples were collected and a basic data reduction analysis was completed as part of each test. The data reduction analysis consisted of counting the number of times the latency measurements fell into the range of each of a series of bins. For the preemption latency, a series of 2,500 bins was used, with each bin spanning a latency of 2 µs. This allowed the recording of latency measurements of up to 5,000 µs. For the interrupt latency, a series of 2,000 bins was used, with each bin spanning a latency of 0.5 µs, for a maximum latency measurement of 1,000 µs. The final latency bin counts were stored in files for further analysis.

Test Set-up

The components of the test set-up included the development PC, the target hardware and the network connection between the PC and the target. The development PC provided the following:
Figure 3 -- Measurement Set-up

The set-up of the tests involved the following steps:
Process -- Description

comms server and comms client: The comms server and comms client provide communications and processing loading on the target. The comms server process waits for a message to arrive from the comms client, copies the message into an output buffer and sends the output buffer to the client. The messages are blocks of 60,000 bytes of data.

data acq: The data acquisition process collects latency information from the timer. The process waits for a signal to indicate the start of a control interval, reads the current counter value (to determine the preemption latency) and then requests the counter value stored by the driver interrupt handler. The two latency measurements are sent to the file I/O process.

file I/O: The file I/O process waits for messages to arrive at a mailbox. The latency values that are contained within the message are used to update arrays that keep track of the number of times latencies have been measured in the range of each measurement bin. Once the specified number of samples has been collected, the final latency bin counts are written to output files and the process terminates.

Detailed Process Models

The high-level process model shown in Figure 3 does not provide sufficient information to adequately describe the processes that run on the target. In some cases, these processes run in user-space, and in other cases they run in kernel-space. A more detailed description of each of the cases is provided below.

Linux 2.4 and Linux 2.6

The process models for Linux 2.4 and Linux 2.6 are identical. This model is shown in Figure 4, and the components in the model are described in Table 1.

Figure 4 -- Linux 2.4 and Linux 2.6 Process Model
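The bin-counting step carried out by the file I/O process, as described above, can be illustrated with a minimal sketch. This is not the code used in the study. The function names, the assumed 2 MHz counter clock, and the choice to clamp out-of-range values into the last bin are all assumptions made for illustration; only the bin count (2,500) and bin width (2 µs) come from the text.

```python
def ticks_to_us(ticks, clock_hz):
    """Convert a count of timer ticks to microseconds."""
    return ticks * 1_000_000 / clock_hz


def new_histogram(num_bins):
    """Return a list of zeroed bin counters."""
    return [0] * num_bins


def record_latency(bins, latency_us, bin_width_us):
    """Increment the counter of the bin whose range contains latency_us."""
    index = int(latency_us // bin_width_us)
    # Assumption: values beyond the last bin are clamped into it.
    index = min(index, len(bins) - 1)
    bins[index] += 1


# Preemption latency histogram: 2,500 bins of 2 us each (from the text).
preemption_bins = new_histogram(2500)

# Hypothetical reading: 26 ticks of an assumed 2 MHz counter is 13 us,
# which falls in the 12 to 14 us bin (index 6).
record_latency(preemption_bins, ticks_to_us(26, 2_000_000), 2.0)
```

Keeping only bin counts rather than the raw time series means the storage needed is fixed regardless of how many samples are collected.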
Table 1 - Components of the Linux 2.4 and Linux 2.6 Model

Linux 2.4 with RTAI

For the Linux 2.4 with RTAI configuration, the data acquisition and file I/O process was split into a hard real-time process (data acquisition) and soft real-time process (file I/O). The data acquisition process was implemented as a kernel-space process using RTAI, and the file I/O process was implemented as a user-space process. The file I/O process was coded as a soft real-time process that included calls to the RTAI API. This functionality was made possible by LXRT.

RTAI requires special driver modules if the driver is to be used by an RTAI task. In the case of this investigation, the VMIC timer driver was converted into an RTAI compatible driver. This conversion involved the following:
Figure 5 -- Linux 2.4 with RTAI Process Model
Table 2 - Components of the Linux 2.4 with RTAI Model

Linux 2.4 with LXRT

The process model for Linux 2.4 with LXRT is very similar to that of Linux 2.4 with RTAI since both use LXRT for the file I/O user-space process and both rely on the same RTAI timer driver. The essential differences between the models are that the data acquisition process has been pulled into user-space and a new proxy module has been added to kernel-space. The proxy module extends the timer driver API provided by the RTAI module to user-space. Without this proxy module, the data acquisition process would not be able to call the RTAI timer driver API.

The process model for Linux 2.4 with LXRT is shown in Figure 6 below. The figure shows that communication between the user-space data acquisition process and the kernel-space timer driver was through the Proxy API kernel module. We also tried using device file operations between the data acquisition process and the RTAI timer driver to set up the timer/counter and then using a semaphore to allow the data acquisition process to wait for a task switch. The latencies measured using the two different approaches were virtually identical. A description of the components in the model shown in Figure 6 is provided in Table 3.

Figure 6 -- Linux 2.4 with LXRT Process Model
Table 3 - Components of the Linux 2.4 with LXRT Model

The data showed the expected results; the vast majority of latencies measured were confined to a small range. The range for the preemption latency was wider than the range for the interrupt latency -- also, as expected.

For a hard real-time system it is important to understand the distribution of the maximum latencies. To understand this distribution, we chose to look at curves of the cumulative percentage of measurements vs latency. At any point on the cumulative percentage curve, the cumulative percentage value (y-value) is the percentage of measurements that had a latency less than or equal to the latency value (x-value). The latency at which the cumulative percentage curve reaches 100 percent represents the worst-case latency measured. The ideal cumulative percentage curve is one that is steep with a minimal decrease in slope as the curve approaches 100 percent.

The real-time operating system for a control system must be able to guarantee that the control calculations will be executed every control interval and that there will be sufficient time for the control calculations to complete before the control interval ends. For the system for which this analysis was completed, this means that all control calculations must start and end within a 10 ms window. Of the 10 ms available, the preemption latency between the start of the control interval (interrupt generated) and the start of the control calculations must be less than 0.5 ms (500 µs) or 5 percent of the control interval.

Interrupt Latency

The interrupt latency curves are shown in Figures 7, 8 and 9. Figure 7 shows the loaded and unloaded interrupt latencies for all of the OS options evaluated. In general, the interrupt latencies for the tests without communications loading are to the left of the graph and are quite steep.
The notable exception is Linux 2.6 where the curve flattened considerably, ultimately having higher latencies than some of the tests with communications loading.

Figure 7 -- Interrupt Latency Curves

Figure 8 shows the top 2 percent of the cumulative percentage curves for the unloaded tests and Figure 9 shows the top 2 percent for the loaded tests. Note that in Figures 8 and 9, the curves for Linux 2.4 with RTAI and for Linux 2.4 with LXRT are nearly identical. This is expected since the interrupt latencies were measured in the same RTAI timer driver module ISR.

Figure 8 -- Interrupt Latency Without Loading

Under loading (Figure 9), Linux 2.6 continued to exhibit higher latencies than the other options. The other 3 curves (Linux 2.4, 2.4 with RTAI and 2.4 with LXRT) are all similar. Even under loading, the interrupt latency curves for Linux 2.4 with RTAI and Linux 2.4 with LXRT were nearly identical. This too is to be expected since the interrupts are processed in kernel-space and the same RTAI timer driver module was used for both OS options. The difference between RTAI and LXRT was only evident in the preemption latency curves. It is interesting to note that the interrupt latencies with loading for Linux 2.4 were generally lower than those of the other options.

Figure 9 -- Interrupt Latency With Loading

The interrupt latency information is summarised in Table 4. The maximum latency as well as the 99.999 percent latency threshold are shown in the table for all test configurations. The 99.999 percent latency threshold is the latency at which 99.999 percent of the measurements had latencies less than or equal to the threshold. At a sample rate of 100 samples per second, one measurement in 100,000 exceeds this threshold, so we would expect one latency measurement to exceed it roughly every 17 minutes (1,000 seconds).
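The cumulative percentage curve and the 99.999 percent threshold described above can both be derived directly from the stored bin counts. The following is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the threshold is reported as the upper edge of the bin in which the target percentage is reached, and the bin counts in the example are invented purely to exercise the functions. The percentage is passed as a string so that exact rational arithmetic avoids floating-point rounding at the 99.999 percent boundary.

```python
from fractions import Fraction


def cumulative_percent(bins):
    """Return the cumulative percentage of measurements at each bin."""
    total = sum(bins)
    running = 0
    curve = []
    for count in bins:
        running += count
        curve.append(100.0 * running / total)
    return curve


def latency_threshold_us(bins, bin_width_us, percent):
    """Upper edge of the first bin at which `percent` of samples is reached.

    `percent` should be a string such as "99.999" (assumption: exact
    arithmetic is wanted, so floats are avoided).
    """
    target = Fraction(percent)
    total = sum(bins)
    running = 0
    for index, count in enumerate(bins):
        running += count
        if running * 100 >= target * total:
            return (index + 1) * bin_width_us
    raise ValueError("histogram is empty")


def seconds_between_exceedances(percent, sample_rate_hz):
    """Expected time between samples that exceed the threshold."""
    exceed_fraction = 1 - Fraction(percent) / 100
    return float(1 / (exceed_fraction * sample_rate_hz))


# Invented data: 2,000 interrupt-latency bins of 0.5 us each.
bins = [0] * 2000
bins[10] = 99_999   # nearly all samples in one bin
bins[40] = 1        # a single outlier

print(latency_threshold_us(bins, 0.5, "99.999"))
print(seconds_between_exceedances("99.999", 100) / 60)  # minutes
```

With a threshold of 99.999 percent and 100 samples per second, the last function returns 1,000 seconds, which is the "every 17 minutes" figure quoted in the text.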
Table 4 - Interrupt Latencies Summary

Preemption Latency

The preemption latency curves are shown in Figures 10, 11 and 12. Figure 10 shows the loaded and unloaded preemption latencies for all OS options on the same graph. In general, the preemption latencies for the tests without communications loading are to the left of the chart and are quite steep.

Figure 10 -- Preemption Latency Curves

Figure 11 shows the top 3 percent of the cumulative percentage curves for the unloaded tests. The unloaded test curves for Linux 2.4, Linux 2.4 with RTAI and Linux 2.4 with LXRT are all very similar in shape. In each of these 3 cases, over 99 percent of the measurements were confined to a 2 µs range for the configuration. For Linux 2.4 with RTAI the range was 8 to 10 µs; for Linux 2.4 with LXRT the range was 10 to 12 µs; and for Linux 2.4 the range was 12 to 14 µs. Although some of the preemption latencies measured for the unloaded condition with Linux 2.6 were higher, over 97 percent of the measurements were in the same 12 to 14 µs range observed for Linux 2.4. For the 3 percent of the measurements where Linux 2.6 showed higher preemption latencies, these latencies were often 2 to 3 times those of Linux 2.4.

Figure 11 -- Preemption Latency Without Loading

The difference between the interrupt latency and the preemption latency is largely the context switch time. The table below summarises the range of context switch times for the unloaded conditions for each of the 4 OS options. The table is based on the majority of the measurements (97 percent) where the latency times were within the range of a single measurement bin (0.5 µs for the interrupt latency and 2 µs for the preemption latency). The data show that for more than 97 percent of the measurements, the context switch time for Linux 2.4 with RTAI is significantly shorter than for the other configurations.
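Because each latency is known only to within one measurement bin, subtracting the interrupt latency from the preemption latency yields a range of context switch times rather than a single value. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the interrupt latency bin used in the example (5.0 to 5.5 µs) is an invented value for illustration; only the 8 to 10 µs preemption range for Linux 2.4 with RTAI comes from the text.

```python
def context_switch_range(preemption_bin_us, interrupt_bin_us):
    """Bound the context switch time from two latency bins.

    Each argument is a (low, high) tuple in microseconds. The shortest
    possible context switch pairs the lowest preemption latency with the
    highest interrupt latency; the longest pairs the opposite edges.
    """
    pre_low, pre_high = preemption_bin_us
    int_low, int_high = interrupt_bin_us
    return (pre_low - int_high, pre_high - int_low)


# Preemption bin for Linux 2.4 with RTAI, unloaded (from the text).
rtai_preemption = (8.0, 10.0)
# Hypothetical 0.5 us interrupt latency bin, for illustration only.
example_interrupt = (5.0, 5.5)

print(context_switch_range(rtai_preemption, example_interrupt))
```

The width of the resulting range is the sum of the two bin widths (2.5 µs here), which limits how finely context switch times can be compared across configurations.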
Table 5 - Context Switch Times Without Loading (based on 97 percent of the data)

Figure 12 shows the top 20 percent of the latency measurements for the loaded tests. Under loading, the results showed that Linux 2.4 with RTAI was the best of the OS options. This curve had the desired characteristics of being very steep without flattening too much as the curve approached 100 percent. The curve for Linux 2.4 with LXRT was very similar to that of Linux 2.4 with RTAI but the latencies were higher for the top 12 percent of the measurements. The curve for Linux 2.4 showed that the majority of the latencies (97 percent) were quite comparable to those of RTAI and LXRT but the curve then flattened out considerably and approached 100 percent much more slowly. The results for Linux 2.6 are interesting in that its performance was better than Linux 2.4 for latencies above 70 µs (top 1.3 percent of the measurements).

Figure 12 -- Preemption Latency With Loading

Table 6 below summarises the preemption latency data collected. The maximum latency as well as the 99.999 percent latency threshold are shown in the table for all test configurations. The 99.999 percent latency threshold is the latency at which 99.999 percent of the measurements had latencies less than or equal to the threshold. At a sample rate of 100 samples per second, we would expect one latency measurement to exceed this threshold roughly every 17 minutes.
Table 6 - Preemption Latencies Summary

Overall

The results show that among the options tested, the best performance was measured with RTAI, followed closely by LXRT. For these two options, the measured interrupt and preemption latencies for both loaded and unloaded conditions were more consistent and shorter than those of the other options. Both Linux 2.4 with RTAI and Linux 2.4 with LXRT meet the maximum preemption latency requirement of less than 500 µs. Linux 2.6 is the next most suitable option, followed by Linux 2.4.

The data collected provided a single data series collected over 4 hours for each of the 8 cases. If these tests were repeated, or if data were collected for a longer period of time, the maximum latencies would likely be higher. It is the small percentage of measurements where the latencies are much higher that drives the choice of an OS for use in systems requiring hard real-time performance.

Conclusions

Based on the latency measurements made using the hardware described in this paper:
The following resources were used in the work described in this paper:
About the author: Peter Laurich founded Akamina Technologies Inc. in 2004 after starting the software outsourcing initiatives RTS Technologies and BA Technologies. He has been developing software for real-time embedded systems since 1982. Known for his technical leadership and innovation, he worked in a number of key development and management roles in Nortel Networks' Optical organization. Prior to moving to Nortel, Peter developed real-time software for instrumentation and control at the National Research Council's Hydraulics Laboratory. Peter has a Master's degree in Electrical Engineering (Systems) from Carleton University and a Bachelor's degree in Systems Design Engineering from the University of Waterloo.