ELJonline:
Building a Minimal Glibc with Componentization

By Linux Devices

Should you use a stripped-down C library to save space, or budget for the full size of glibc to keep compatibility? Now there's a third option: build a custom library from the original sources and get the best of both.



Glibc componentization is a process for building a custom, minimal set of the glibc C libraries that contains only the objects required by a specific executable or group of executables. By minimizing the footprint of the libraries, resource-limited embedded targets can maximize the resources available for applications and storage. This article discusses the feasibility of componentizing glibc and the development of some custom analysis tools. With the help of these tools, it was possible to build test executables successfully, each with a custom minimal version of libc.

Embedded systems typically have tighter resource constraints than desktop computers or servers, although they often are expected to perform similar functions such as serving web pages and storing important information. Therefore, the applications they run use much of the same functionality from the system libraries as their desktop and server counterparts. With a reduced expectation of expandability, it is logical to provide a minimal subset of the same libraries.

Independent embedded versions of the system libraries do exist. While these libraries greatly reduce the footprint, they sacrifice functionality (such as pthreads), do not guarantee complete API compatibility with a complete glibc and must be maintained separately.

There are several advantages to building a minimal library from the source of the complete library. The primary advantage is a guaranteed equivalent API. Because there is only one source tree to maintain, whenever glibc is updated, so are all the minimal libraries built from it. For example, developers don't have to concern themselves with whether or not the embedded library's printf function supports the %f conversion. This enables developers to design applications on a desktop system, with all the amenities it has to offer, and deploy them to an embedded target without worrying about API compatibility. The difficulty of this approach lies in creating a minimal library from such a large source tree without over-complicating the source code. This study investigates the possibility of building a custom libc.so from only the necessary prebuilt object files of a complete glibc build.

When glibc is linked as the final step of the build process, the various objects (1,756 in total) satisfy undefined symbols among themselves. Glibc contains nearly 250,000 implicit dependencies among its objects. With this many dependencies, manually selecting which objects to include would be tedious at best and impossible at worst. To make this task manageable, a MySQL database containing all the object dependencies for all of glibc was implemented. A detailed description of the library analysis tool can be found in the Sidebar "Library Analysis Tool". With this tool, a list of all the objects needed to build a custom library can be generated from the symbols required by a given application set. From the output of this tool, three test executables were successfully built, each with a custom minimal version of glibc. These custom libraries are considerably smaller than the complete versions, as small as 19% of the original size for the simplest case.
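
Selecting the objects an application needs amounts to a transitive closure over the object dependency graph. The sketch below is not the author's MySQL-based tool; it only illustrates the idea, assuming a hypothetical plain-text file with one "object needed_object" pair per line. The function name and the file format are assumptions made for this example.

```sh
#!/bin/sh
# Sketch only: compute every object reachable from a set of starting objects.
# Assumed (hypothetical) input format: one "object needed_object" pair per line.

# usage: object_closure DEPS_FILE START_OBJECT...
object_closure() {
    deps_file=$1
    shift
    printf '%s\n' "$@" | awk -v deps="$deps_file" '
        BEGIN {
            # Load the edge list: edges[a] holds the objects that a needs.
            while ((getline line < deps) > 0) {
                split(line, pair, " ")
                edges[pair[1]] = edges[pair[1]] " " pair[2]
            }
        }
        # Each input line is a starting object; queue it once.
        !($1 in seen) { seen[$1] = 1; queue[count++] = $1 }
        END {
            # Breadth-first walk; the seen[] check also breaks dependency cycles.
            for (i = 0; i < count; i++) {
                n = split(edges[queue[i]], needed, " ")
                for (j = 1; j <= n; j++)
                    if (!(needed[j] in seen)) {
                        seen[needed[j]] = 1
                        queue[count++] = needed[j]
                    }
            }
            for (i = 0; i < count; i++) print queue[i]
        }' | LC_ALL=C sort
}
```

Calling object_closure deps.txt printf.os would print printf.os together with every object it needs, directly or indirectly, one per line.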

Library Analysis Tool

Building Glibc

The first step was to build glibc, understand its build process and note the size of each of its libraries. This analysis was performed on a clean build of a recent version (2.1.3) with the crypt and linuxthreads add-ons. The glibc library set consists of 21 libraries and the dynamic linker (ld.so); Table 1 lists all of them and their respective sizes. Of these 21 libraries, libc is the largest, accounting for nearly 50% of the total size. For this reason, this research focuses on componentizing libc.so, on the reasoning that the other 20 libraries are already sufficiently modular.

Table 1. Original Libraries and Sizes


By default, glibc builds three versions of its libraries: static, shared and profiled. Only the process of building the shared libraries is relevant to this study. This process consists of five steps:

  1. All the object files (.os) are built with the -fPIC flag to gcc, creating position-independent code.
  2. For each directory, a listing of every object from that directory to be linked into libc is created in a stamp.os file.
  3. An archive, libc_pic.a, is created from these lists using ar.
  4. This archive is linked into a single relocatable object, libc_pic.os, with the -r flag to gcc.
  5. The relocatable object is linked into a shared library, libc.so.

Preparing an Application

Prior to building a custom shared library, it is necessary to determine which objects from libc.so will be needed for the target application(s). This is done by compiling and linking the application(s) to the newly built glibc, not the system glibc, and then adding that application to the database managed by the analysis tool. In order to avoid the need to install the newly built glibc, the correct options must be passed at compile time to link against the new library set.
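
The symbols an application requires from its libraries are the undefined symbols in the linked executable. As a rough illustration (not the analysis tool itself), the following sketch reduces nm output to a sorted list of those symbol names. The function name is an assumption, and weak references (type w or v) are deliberately ignored here.

```sh
#!/bin/sh
# Sketch only: reduce `nm` output to the bare names of undefined symbols.
# Reads nm-style lines on stdin, e.g. "         U printf@GLIBC_2.0".
undefined_symbols() {
    awk '
        # Undefined symbols have no address, so the type letter is field 1.
        $1 == "U" {
            sub(/@.*/, "", $2)   # drop any @VERSION suffix
            print $2
        }' | LC_ALL=C sort -u
}

# Typical use (assumes binutils nm is installed):
#   nm -u ./test_printf | undefined_symbols
```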

The sample application, test_printf.c, follows:


#include <stdio.h>

int main(void)
{
    int i;

    /* Print ten numbered lines to exercise printf from the new libc. */
    for (i = 0; i < 10; i++) {
        printf("iteration: %02d\n", i);
    }
    return 0;
}

It is compiled with the commands shown in Listing 1. Note that the system startup files and default libraries are omitted with the -nostdlib and -nostartfiles options. They are replaced with the startup files from the new glibc build (crt1.o, crti.o, crtn.o, etc.), and the newly built libraries are explicitly specified.

Listing 1. Compiling test_printf.c


This application must be executed with the new loader as well (or it will not find the right libraries). The command in Listing 2 specifies the new loader and library path and executes the application. It can be verified that the appropriate libraries are loaded by prepending strace to the previous command and examining the output (the lines starting with open are of interest).

Listing 2. Specifying the New Loader and Library Path and Executing the Application
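
To check which libraries were actually loaded, the strace output can be filtered down to the shared objects that were opened successfully. This is a minimal sketch of that check; the function name is an assumption, and it accepts both the open(...) lines mentioned above and the openat(...) form that newer strace versions print.

```sh
#!/bin/sh
# Sketch only: list the shared objects a traced program opened successfully.
# Reads strace output on stdin; matches both open(...) and openat(...) lines.
opened_libraries() {
    sed -n '/^open/{
        / = -1 /d
        s/^[^"]*"\([^"]*\.so[.0-9]*\)".*$/\1/p
    }'
}

# Typical use (strace writes its trace to stderr):
#   strace <loader command> 2>&1 | opened_libraries
```

Every path printed should point into the new glibc build directory rather than the system library directories.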

The program is then added to the database with the addApplication.pl script:


./addApplication ../projects/testcases/test_printf

Building a Minimal libc.so

A minimal libc.so can be built from any set of applications in the database. The following example uses a single application (test_printf, from above) as the source of required objects. The process consists of five steps:

  1. Generate a list of required object files, libc_objects.master.
  2. Generate a customized set of libc_objects files.
  3. Create an archive, libc_pic.a, from these lists using ar.
  4. Link the archive into a single relocatable object, libc_pic.os, with the -r flag to gcc.
  5. Link the relocatable object into a shared library, libc.so.

This process should be executed in the minilib directory, which contains only the Makefile and associated scripts. The Makefile variable GLIBCPATH must be set to the path where glibc was built; the rest of the process is automated with the make command. The library analysis tool provides a list of the object files that provide the symbols explicitly required by an application, as well as the implicitly required objects. This list, libc_objects.master, is generated by the getAppDeps.pl script and should be copied to the minilib directory.

Running make first executes the script getstamps, which descends into the glibc source directory and recursively copies every stamp.os file to an equivalent tree within the current directory. These stamp.os files are reformatted to list one object per line and are then sorted alphabetically. The reformatted stamp.os files are then joined with libc_objects.master to produce the intersection of the two lists, effectively removing any unnecessary objects. The full path is prepended to each object in the list, and the result is stored in libc_objects (one file per directory). With all the libc_objects files in place, the custom library is ready to be linked.
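
The filtering step for a single directory can be sketched as follows. This is not the article's getstamps script or Makefile. It assumes that stamp.os lists objects separated by whitespace, that both lists name objects the same way (for example, stdio-common/printf.os), and that the build path contains no "|" or "&" characters. The function name and arguments are hypothetical.

```sh
#!/bin/sh
# Sketch only: keep the objects from one directory's stamp.os that also appear
# in the master list, and prefix each survivor with the build path.
# usage: filter_stamp STAMP_FILE MASTER_LIST BUILD_PATH
filter_stamp() {
    stamp=$1 master=$2 build_path=$3
    sorted_stamp=$(mktemp) sorted_master=$(mktemp)

    # One object per line, sorted (join requires sorted input).
    tr -s ' \t' '\n\n' < "$stamp" | sed '/^$/d' | LC_ALL=C sort -u > "$sorted_stamp"
    LC_ALL=C sort -u "$master" > "$sorted_master"

    # Intersection of the two lists, then prepend the path.
    LC_ALL=C join "$sorted_stamp" "$sorted_master" | sed "s|^|$build_path/|"

    rm -f "$sorted_stamp" "$sorted_master"
}
```

Redirecting the output of filter_stamp into that directory's libc_objects file would give a list in the form the later ar step expects.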

The various commands needed to link the final shared library were taken from the glibc make process and modified to account for the new build location and object-list filenames (libc_objects). Linking is done in three steps. First, ar is used to collect all the objects listed in the libc_objects files into one archive with the command in Listing 3.

Listing 3. Linking the Objects in libc_objects into One Archive

Second, the archive is made relocatable:


gcc -nostdlib -nostartfiles -r -o libc_pic.os -Wl,-d -Wl,--whole-archive libc_pic.a

The -r option here generates relocatable output in the file libc_pic.os; -nostdlib and -nostartfiles prevent gcc from linking in the standard system libraries and startup files; -Wl,-d has the linker assign space to common symbols even though the output is relocatable; and --whole-archive instructs the linker to include every object from the archives listed after it (up to a --no-whole-archive, if one is given), not just the objects that satisfy symbols required by the rest of the link.

Finally, the shared library is created, as shown in Listing 4.

Listing 4. The Shared Library

The linker option --version-script acts as a filter for exported symbols, providing complete control over which symbols are exported. Even if a symbol exists in the objects and archives linked into the library, it will not be exported by the final shared library unless it is listed in the version script, libc.map. The -e option forces __libc_main as the library's entry point. The -u option enters the symbol __register_frame as undefined, forcing a link with libgcc.a, which provides this symbol. Finally, -rpath-link specifies the first set of directories to search for shared libraries specified on the command line, such as ld.so. Because these commands were taken from the partly auto-generated commands of the glibc build process, they likely include some unnecessary paths and even unnecessary options.

The resulting library is placed in the top-level directory as libc.so, a nonstripped shared library.

When linking the application, it is possible that the libc_objects.master list is incomplete, resulting in undefined symbol errors. These symbols must be tracked down (using the findsymbol script), and the objects that provide them should be appended to the libc_objects.master list. Running make clean and then make will attempt to rebuild the shared library with the updated object list. In its current state, the library analysis tool provides information on the assumption that a custom version of every library will be built. Because only libc.so is being rebuilt in this example, an application that requires pthreads will use the complete libpthread.so library. If libpthread.so requires something from libc.so that the application itself does not, the providing object must be added manually. There are generally one or two objects that must be added to the list. This manual step should be eliminated in future versions of the analysis tool.
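
Tracking down the object that provides a missing symbol can be approximated with nm. The sketch below is not the article's findsymbol script; it is a hypothetical stand-in that filters nm -A output (one "file:address type symbol" line per symbol) for definitions of a given name.

```sh
#!/bin/sh
# Sketch only (not the article's findsymbol script): report which object
# files define a given symbol. Reads `nm -A` output on stdin, where each
# line looks like "path/printf.os:00000000 T printf".
# usage: providers_of SYMBOL
providers_of() {
    awk -v sym="$1" '
        # Field 2 is the type letter; U marks an undefined reference.
        NF == 3 && $3 == sym && $2 != "U" {
            file = $1
            sub(/:[0-9A-Fa-f]*$/, "", file)   # strip the ":address" suffix
            print file
        }' | LC_ALL=C sort -u
}

# Typical use, with GLIBCPATH pointing at the glibc build directory:
#   nm -A "$GLIBCPATH"/*/*.os 2>/dev/null | providers_of SYMBOL
```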

Testing the Minimal Library

To test the custom library, the application for which it was built must be relinked, using the new library. The new libc.so must be copied into the glibc source tree, replacing the old one. Running make again recompiles the test application, linking to the new minimal library. This analysis tested three test applications, each with unique requirements of libc.so (see Table 2).

Table 2. Test Cases and Minimal Library Statistics
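
Size comparisons such as the "19% of the original size" figure quoted earlier can be reproduced for any pair of libraries with a small helper. This is only a sketch; the function name is an assumption, and it compares raw file sizes in bytes without stripping either library.

```sh
#!/bin/sh
# Sketch only: size of a minimal library as a whole-number percentage
# of the complete one (rounded to the nearest percent).
# usage: size_percent MINIMAL_LIB FULL_LIB   (FULL_LIB must not be empty)
size_percent() {
    small=$(( $(wc -c < "$1") ))
    full=$(( $(wc -c < "$2") ))
    echo $(( (small * 100 + full / 2) / full ))
}
```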


Conclusion

Glibc componentization offers the most customizable libraries, while requiring very little from the developer. Its advantages include rapid development, API consistency and, because the stock glibc source tree is used, none of the maintenance burden of a forked tree. Target devices that are resource limited but will be used for varying tasks (such as PDAs) should consider other options, such as glibc profiling. A profiled version of glibc could be built so that frequently accessed functions are grouped together in pages. Devices with less severe resource restrictions may find that the best solution is simply to use the complete library. This approach allows for future development of new and more functional applications, without the need to redeploy the system libraries as well. Componentization finds its application in very specialized devices where resources are at a premium and the applications they must run are fixed and known prior to deployment.

This process defines dependencies at the object level; it does not offer as high a level of granularity as a system based on symbols could, but it is relatively simple and in no way modifies the glibc source tree. The library could be reduced further by implementing simplified versions of some of the larger components, but this too would require modifying the source code. The test cases show that glibc can be componentized with reasonable granularity at the object level, and although not as fine as at the symbol level, this process is far easier and requires less effort from all parties involved. The process discussed can be used to implement any standards-compliant library proposed by third parties as well as to create completely customized minimal libraries for a specific application set when no standard is appropriate.



About the author: Darren Hart is a 24-year-old senior in Brigham Young University's undergraduate Computer Engineering program. His fields of interest and study include embedded systems and embedded application development as well as operating systems--Linux in particular. He has done three consecutive co-ops with IBM, most recently with the Linux Technology Center where he researched glibc componentization.



Copyright © 2001 Specialized Systems Consultants, Inc. All rights reserved. Embedded Linux Journal Online is a cooperative project of Embedded Linux Journal and LinuxDevices.com.



