An update on RTLinux
By Linux Devices


Foreword: The RTLinux dual-kernel operating system was first introduced back in 1995. Today, RTLinux is well known worldwide as a means to gain "hard real-time" performance from a Linux-based system environment. In this article, Victor Yodaiken, Michael Barabanov, and Cort Dougan -- three key figures in the creation, evolution, and maintenance of RTLinux -- summarize the state of RTLinux eight years later. Topics discussed include operating system modularity, real-time, communications, fault-tolerance, and security. (Note that the new term "RTCore" is now used to refer to the small real-time kernel that forms an integral part of the RTLinux operating system.)



Update on the RTLinux/RTCore dual kernel real-time operating system

by Victor Yodaiken, Michael Barabanov, Cort Dougan

Introduction

RTCore is a POSIX 1003.13 PSE51-type real-time kernel: something that looks like a multithreaded POSIX process with its own internal scheduler. RTCore can run a secondary operating system as a thread, using a small virtual machine to keep the secondary system from disabling interrupts. This is a peculiar model (a UNIX process with a UNIX operating system as a thread), but it provides a useful avenue to modularity. RTLinux is RTCore with Linux as the secondary kernel. RTCore BSD is, as one might guess, RTCore with BSD UNIX as the secondary kernel. Real-time applications run as real-time threads and signal handlers, either within the address space of RTCore or within the address spaces of processes belonging to the secondary kernel. Real-time threads are scheduled by the RTCore scheduler without reference to the process scheduler in the secondary operating system. The secondary operating system is the idle thread for the real-time system.



The virtual machine virtualizes the interrupt controller so the secondary kernel can preserve internal synchronization without interfering with real-time processing. Performance is adequate to allow standard PC and single board computers to replace DSPs in many applications. A one millisecond periodic thread running on a 1.2GHz AMD K7 PC shows worst case scheduling jitter of no more than 12 microseconds when the secondary kernel is under heavy load. The same example for a Compaq iPAQ PDA based on a 200MHz StrongARM shows worst case jitter of no more than 32 microseconds.
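
The jitter figures quoted here are worst-case deviations of a periodic thread's wake-up times from its ideal schedule. As a rough illustration of how such a figure could be derived from recorded timestamps, consider the sketch below. The function name, the nanosecond timestamp array, and the idea of logging wake-up times are assumptions made for this example; they are not part of RTCore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an RTCore call): wake_ns[i] is the time, in
 * nanoseconds, at which the i-th activation of a periodic thread actually
 * started. The ideal time of activation i is start_ns + i * period_ns.
 * Returns the largest absolute deviation from that ideal schedule. */
int64_t worst_case_jitter_ns(const int64_t *wake_ns, size_t n,
                             int64_t start_ns, int64_t period_ns)
{
    assert(wake_ns != NULL && period_ns > 0);
    int64_t worst = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t ideal = start_ns + (int64_t)i * period_ns;
        int64_t dev = wake_ns[i] - ideal;
        if (dev < 0)
            dev = -dev;          /* early and late wake-ups both count */
        if (dev > worst)
            worst = dev;
    }
    return worst;
}
```

With a period of 1,000,000 ns (one millisecond), a return value of 12,000 would correspond to the 12 microsecond bound mentioned above.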

Modularity and architecture

The original intent of the RTCore split kernel design was to facilitate development of complex real-time applications needing both precise timing and services that are normally found only in sophisticated, timing-imprecise operating systems. However, a kernel that can monitor and control the operation of a secondary operating system has a wide range of applications, including networking, fault tolerance, and security. In order to preserve the benefits of the original design, we have had to emphasize a design rule: any activity that can go in the secondary system must go in the secondary system. This design rule also appears to work well for recent extensions of the RTCore kernel to handle fault tolerance and security.

Experience with the RTCore kernels shows that there is an alternative to the traditional view of modularity in operating system design (see note 1). RTCore separates components algorithmically as well as by standard functional group. Low latency components can go in the real-time system, while higher latency components are left to the secondary operating system and its processes. An interrupt service handler may be implemented in either system, depending on the purpose of the device -- and that purpose may change dynamically. A disk controller interrupt handler will usually be a component of the secondary operating system, but an A/D device handler will be in the real-time system. On the other hand, a network device may initially be under control of the secondary kernel, but may switch to real-time control. For example, we may let the secondary OS bring the system up, and then switch the network to real-time mode after it completes making connections. Different types of handlers have different implementation constraints even though they are both required to be "fast". For example, even when called from interrupt context, the RTCore semaphore post operation forces an immediate context switch if it wakes a thread with higher priority than the current thread. The result is that if thread A is running and an interrupt causes higher priority thread B to be activated by a semaphore post operation, B will switch in before the interrupt handler returns. The purpose is to minimize the wake-up latency of higher priority threads, but this extremism in defense of low latency is probably out of place for a general purpose operating system.
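
The semaphore behavior described above can be pictured with a toy model. This is only a sketch of the scheduling decision, assuming a single waiter and a numeric priority where a larger number means higher priority; the types and the function `toy_sem_post` are invented for illustration and do not reflect RTCore's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not RTCore code: larger number = higher priority. */
struct toy_thread {
    int priority;
    bool blocked;
};

struct toy_sem {
    int count;
    struct toy_thread *waiter;   /* at most one waiter in this sketch */
};

/* Post the semaphore. Returns the thread that should be running when the
 * call returns: the woken waiter if it outranks `current`, otherwise
 * `current`. The decision is taken inside post itself, so it applies
 * equally when post is called from an interrupt handler. */
struct toy_thread *toy_sem_post(struct toy_sem *s, struct toy_thread *current)
{
    assert(s != NULL && current != NULL);
    struct toy_thread *w = s->waiter;
    if (w == NULL) {
        s->count++;              /* nobody waiting: just record the post */
        return current;
    }
    s->waiter = NULL;
    w->blocked = false;          /* the waiter becomes runnable */
    return (w->priority > current->priority) ? w : current;
}
```

In the scenario from the text, thread A is `current`, thread B is the waiter, and the handler's call returns B, so B runs before the interrupt handler finishes.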

The real-time kernel remains quite clearly separated from the secondary kernel, but there are overlapping or parallel functional systems. As an example, RTCore implements a small in-memory file system for pipes, shared memory, device I/O and networking. The real-time file system is designed with extensive use of simple lock-free algorithms (see note 2) to reduce interrupt disable periods to a minimum. Interaction with the file system of the secondary operating system is needed so that non-real-time processes can read and write fifos connected to real-time threads, and so that non-real-time file systems can be used. The interaction is asymmetric: the real-time file system passes commands to the non-real-time file system and receives queued responses back. When real-time code creates a fifo, it can set a permission bit to request that the other end of the fifo be visible to processes running under the secondary operating system. In this case, the secondary operating system is requested to create the fifo inode in its persistent file system. When a process under the secondary operating system writes data to a shared fifo, the operation has standard POSIX semantics, but is preemptible by the real-time side except during a short interval when buffer space is reserved. When real-time software executes a write command, the operation is non-blocking and non-preemptible by the secondary system, but preemptible on the real-time side.
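
To give a flavor of a non-blocking, lock-free write path, here is a generic single-producer, single-consumer ring buffer using C11 atomics. It is a textbook technique offered as an assumption-laden sketch: it is neither the algorithm of note 2 nor the RTCore fifo implementation, and the names `spsc_ring`, `ring_put`, and `ring_get` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u   /* must be a power of two */

struct spsc_ring {
    _Atomic uint32_t head;        /* next slot to write; producer only */
    _Atomic uint32_t tail;        /* next slot to read; consumer only */
    uint8_t data[RING_SIZE];
};

/* Non-blocking write: returns 1 on success, 0 if the ring is full.
 * The writer never waits and never takes a lock. */
int ring_put(struct spsc_ring *r, uint8_t byte)
{
    assert(r != NULL);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;                                  /* full: caller decides */
    r->data[head & (RING_SIZE - 1)] = byte;
    /* Publish the byte only after it has been stored. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Non-blocking read: returns 1 and stores a byte in *out, or 0 if empty. */
int ring_get(struct spsc_ring *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;                                  /* empty */
    *out = r->data[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

The sketch is only correct with exactly one writer and one reader; it says nothing about how a real fifo handles multiple writers or variable-length records.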

Communication and failover

Nearly every real-time operating system contains or supports some sort of network stack. Since our secondary kernels provide excellent network stacks, we use them for non-real-time operations (see note 3) and add a simple zero-copy networking interface in the real-time system. When network devices are shared with the secondary operating system, the splitting method is used to provide the secondary operating system with a dummy network driver so that non-real-time packets can be passed through its network stacks.

As an example of how the split packet driver can be used, consider a fast router that runs under the real-time kernel and routes packets based on a small table in shared memory. The real-time driver picks up packets, looks for a match in the table, and forwards them back to the network (perhaps to a second device) on a match. When the match fails, the network driver drops the packet down to the secondary operating system stack and through that to a non-real-time routing program. That program may even consult a database system to determine what should be done with the packet and may then update the shared table (the methods of Degermark et al are important here -- see note 4).
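
The fast-path decision in this router example might look roughly like the following. This is a minimal sketch under stated assumptions: the table layout, the exact-match lookup, the 16-entry limit, and the names `route_table`, `fast_route`, and `SLOW_PATH` are all invented for illustration. A linear scan stands in for the compact lookup structures of note 4, and synchronization with the program that updates the table is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOW_PATH (-1)   /* hand the packet to the secondary OS stack */
#define MAX_ROUTES 16

struct route_entry {
    uint32_t dst_addr;   /* exact-match destination address */
    int out_port;        /* index of the device to forward on */
};

/* Assumed to live in memory shared with the non-real-time routing
 * program, which fills it in; the real-time side only reads it. */
struct route_table {
    size_t count;
    struct route_entry entries[MAX_ROUTES];
};

/* Real-time fast path: return the output port on a match, or SLOW_PATH
 * so the driver can drop the packet down to the secondary OS. */
int fast_route(const struct route_table *t, uint32_t dst_addr)
{
    assert(t != NULL && t->count <= MAX_ROUTES);
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].dst_addr == dst_addr)
            return t->entries[i].out_port;
    return SLOW_PATH;
}
```

Keeping the table small and bounded keeps the worst-case lookup time predictable, which is the property the real-time side needs.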



A fast failover method is also easily implemented in this design. Set a periodic thread to generate a "keep alive" packet, say, every 5 milliseconds and to monitor some set of neighbors for such packets on the same schedule. A failure of a packet to arrive should trigger an alarm and cause a probe of the possibly failed site and perhaps a recovery action. Again, the complex components of the recovery can be delegated to processes running under the secondary operating system.
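
The monitoring half of such a keep-alive scheme can be sketched as a function that the periodic thread calls on every cycle. The `neighbor` structure, the timeout parameter, and the function name are assumptions for this example; sending the packets, probing a suspect site, and the recovery action itself are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEEPALIVE_PERIOD_NS 5000000LL   /* 5 ms, as in the text */

struct neighbor {
    int64_t last_seen_ns;   /* arrival time of the last keep-alive */
    int suspect;            /* 1 when a keep-alive is overdue */
};

/* Called once per period by the monitoring thread. Flags every neighbor
 * whose last keep-alive is older than timeout_ns and returns the number
 * of neighbors currently flagged. */
size_t check_neighbors(struct neighbor *nb, size_t n,
                       int64_t now_ns, int64_t timeout_ns)
{
    assert(nb != NULL && timeout_ns > 0);
    size_t suspects = 0;
    for (size_t i = 0; i < n; i++) {
        nb[i].suspect = (now_ns - nb[i].last_seen_ns) > timeout_ns;
        if (nb[i].suspect)
            suspects++;
    }
    return suspects;
}
```

A non-zero return value would be the point at which the real-time thread raises the alarm and hands the complex part of the recovery to a process under the secondary operating system. A timeout of two periods, for instance, tolerates one lost packet before flagging a neighbor.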

We are currently working on applications to the shaping of TCP/IP connections. The real-time kernel can periodically monitor the queues of network packets and the delays at TCP ports and can disable or discard packets belonging to less critical connections. While TCP itself does not provide for prioritization of traffic, the real-time network can transparently impose a prioritizing protocol on top of the TCP implementation of the secondary operating system, watch for timeouts, and even detect certain DoS-type attacks.

As a final networking example, we anticipate being able to take advantage of real-time networking to use communications on clusters efficiently. Anecdotal experience indicates that sufficiently dense clusters suffer from sporadic significant delays when too many sites try to transmit at the same time. Precise timing control allows time-domain multiplexing of bandwidth over communications media like Ethernet, and even dynamic adjustments to bandwidth allocations. It would be interesting to determine whether there are time allocation working sets on clusters.
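
Time-domain multiplexing of a shared medium comes down to each node knowing when its slot begins. The sketch below assumes equal, fixed-length slots assigned round-robin by node number and clocks that are already synchronized across the cluster; both the scheme and the function names are assumptions for illustration, not a description of an existing RTCore facility.

```c
#include <assert.h>
#include <stdint.h>

/* Which node owns the medium at time now_ns, with n_nodes nodes taking
 * turns in slots of slot_ns nanoseconds each. */
int tdm_slot_owner(int64_t now_ns, int64_t slot_ns, int n_nodes)
{
    assert(now_ns >= 0 && slot_ns > 0 && n_nodes > 0);
    return (int)((now_ns / slot_ns) % n_nodes);
}

/* Earliest time >= now_ns at which node_id may transmit: now_ns itself
 * if the node is inside its own slot, otherwise the start of its next
 * slot. */
int64_t tdm_next_tx_ns(int64_t now_ns, int64_t slot_ns,
                       int n_nodes, int node_id)
{
    assert(now_ns >= 0 && slot_ns > 0 && n_nodes > 0);
    assert(node_id >= 0 && node_id < n_nodes);
    int64_t cycle = slot_ns * n_nodes;          /* one full rotation */
    int64_t offset = now_ns % cycle;            /* position in the rotation */
    int64_t my_start = (int64_t)node_id * slot_ns;
    if (offset >= my_start && offset < my_start + slot_ns)
        return now_ns;                          /* already in our slot */
    int64_t wait = my_start - offset;
    if (wait < 0)
        wait += cycle;                          /* our slot has passed */
    return now_ns + wait;
}
```

A periodic real-time thread could sleep until the time returned by `tdm_next_tx_ns` and transmit only within its slot. Dynamic bandwidth adjustment would amount to changing the slot assignment, which this fixed round-robin sketch does not attempt.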

Process Model and Memory Protection

The RTCore process model was originally single process. That is, the RTCore kernel appeared as a single process on a naked machine with applications as threads and signal handlers. The RTCore kernel and its applications all existed within supervisor memory so that we could avoid the costs of memory context switches on system calls and so that all of device space and the data structures of the secondary kernel were available directly. The system extended naturally to SMP systems by considering each processor to have a single RTCore process. While hardware speeds have increased, lowering the penalty of memory context switches, timing precision is never surplus, because any additional precision makes new applications possible.

Two years ago, we were forced by application requirements to break from the original single shared address space to provide protected memory for some threads. Our original view was that the types of failures caught by memory protection hardware would invariably be fatal to a real-time application. If an application fails to stop a 20 ton hydraulic press at the right moment because of a stray pointer, the ability of the operating system to continue execution will not be comforting. However, as is usual with OS designers, we did not understand anything like the full range of applications. Customers using the system to do machine-in-the-loop simulation and manufacturing test reported that they used large, complex, and untrusted simulation codes from multiple suppliers and needed to be able to handle programming errors in these components. If the power supply simulation of a jet engine fails, it is still important to be able to gracefully turn off the fuel pump that is being used as part of the test. Memory protection was a first step towards such capability, but the need to keep latencies low and keep the real-time kernel simple was a complicating factor. The solution was to shift to a multi-process model in which we permit the creation of real-time threads within the memory space of processes belonging to the secondary operating system. These threads operate under the control of a real-time scheduler, but exist within the address space of a host process. When the real-time scheduler runs such a thread, it restores the host process memory map. Host processes must run on locked memory since paging is not compatible with execution of a real-time thread, but the host process can, for example, use the secondary OS I/O facilities to log data produced by its real-time threads.

Note that memory protection and a multi-process model do not overcome the essential "single application" nature of a hard real-time operating system. Time is a shared critical resource and every real-time component must be able to affect the timing of every other one: otherwise we cannot say that the first component has any timing guarantees. Use of watchdogs and resource limits can mitigate the effects, but destroying or sidelining a thread means that its timing constraints have been violated. One of the utilities of the RTCore design is that there is a clear distinction between timing critical components and components that are not, and the system makes sure that components of the second type cannot delay components of the first type. This decoupling facilitates the construction of sophisticated applications where back end processing is deferred to the non-real-time system.

Security

As a final note, security is becoming an issue for real-time systems in industrial and process control, communications, and medical instruments. The same properties that make RTCore useful for real-time make it a valuable security kernel. A small real-time kernel that can even be placed in physically protected memory can be used to monitor the security of the secondary system, for example, by validating encrypted checksums on critical data structures. A watchdog implementation can then validate the integrity of the secondary kernel periodically and generate an encrypted packet for an external monitor. The external monitor may be anything from a computer on a local network or dedicated line to a special device on the local I/O bus, but it must be able to check for arrival of valid packets within specified times. Compromising such a watchdog would require compromising both the security of the secondary kernel and the watchdog in the real-time kernel, and doing so without violating the externally visible timing constraints. Other variations of this type of monitoring are possible as well.
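
The checksum validation step of such a watchdog can be illustrated with a very small sketch. Important assumption: the code uses the non-cryptographic FNV-1a hash purely as a stand-in so that the example stays self-contained; a real monitor of the kind described above would need a keyed or otherwise cryptographically strong checksum. The `guarded_region` structure and function names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash. Stand-in only: NOT cryptographically secure. */
uint32_t fnv1a32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;        /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* A region of secondary-kernel memory the watchdog keeps an eye on. */
struct guarded_region {
    const void *base;
    size_t len;
    uint32_t expected;               /* checksum taken while trusted */
};

/* Record the checksum of a region while it is known to be good. */
void region_seal(struct guarded_region *g, const void *base, size_t len)
{
    assert(g != NULL && base != NULL);
    g->base = base;
    g->len = len;
    g->expected = fnv1a32(base, len);
}

/* Periodic check: 1 if the region still matches its sealed checksum. */
int region_intact(const struct guarded_region *g)
{
    assert(g != NULL);
    return fnv1a32(g->base, g->len) == g->expected;
}
```

A periodic real-time thread could call `region_intact` on each guarded structure and report the result to the external monitor, which in turn checks that reports keep arriving on time.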



Notes . . .
  1. That is the system as a collection of functions or "services". But also note: "we have defined the whole system as a society of sequential processes, progressing at undefined speed ratios" (see note 5)
  2. Following the approach of Massalin (see note 6)
  3. For our purposes, UNIX and other secondary kernels are re-usable modules.
  4. Mikael Degermark, Andrej Brodnik, Svante Carlsson, and Stephen Pink: "Small forwarding tables for fast routing lookups," in SIGCOMM, pages 3-14, 1997.
  5. Edsger W. Dijkstra: "The structure of the 'THE'-multiprogramming system," Comm. ACM, 11(5):341-346, 1968.
  6. Henry Massalin and Calton Pu: "A lock-free multiprocessor OS kernel," Technical Report CUCS-005-91, Columbia University, 1991.



About the authors:
  • Victor Yodaiken, CEO and Co-Founder of FSMLabs, came up with the basic RTLinux technology. Yodaiken began his career in 1983 as one of the chief developers of Auragen's distributed fault-tolerant UNIX and he had an active consulting business before starting FSMLabs. He has also worked in academia, as a professor and department chair at New Mexico Tech, and as a research professor and post-doctoral fellow at the University of Massachusetts in Amherst. Currently he is an adjunct faculty member at the University of New Mexico. Yodaiken is a technical advisor to EMBLIX Japan and is on the board of the Embedded Linux Consortium.

  • Michael Barabanov, Principal Engineer at FSMLabs, was the original implementer of RTLinux as part of a master's project between 1995 and 1997. Barabanov rejoined the project in 1998 and since then he has been working on basic system design and has been instrumental in driving performance and architectural extensions.

  • Cort Dougan, Director of Engineering and Co-Founder of FSMLabs, began working with Linux in 1995 as one of the primary authors of Linux/PowerPC and of an influential paper on optimizing OS performance on that architecture. Dougan was the maintainer for the Linux PowerPC tree for several years and he was a technology consultant for a number of software and hardware companies worldwide prior to starting FSMLabs.



 



