Library (computing)

This is an old revision of this page, as edited by CanisRufus (talk | contribs) at 07:54, 4 December 2004 (RedWolf - bypassing redirect: UNIX). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, a library is a collection of subprograms used to develop software. Libraries are distinguished from executables in that they are not independent programs; rather, they are "helper" code that provides services to some other independent program.

Well-known libraries include:

Library linking describes the inclusion of one or more of these software libraries into a new program. There are two main types of linking: static linking and dynamic linking. These are described below.

Static linking

Static linking is linking in which a library is embedded into the program executable at build time, after compilation, by a linker. A linker is a separate utility which takes one or more libraries and object files (previously generated by a compiler or an assembler) and produces an executable file.

One of the biggest disadvantages of static linking is that each executable ends up containing its own copy of the library. When many statically linked programs using the same library are executed simultaneously on the same machine, a great deal of memory can be wasted, as each running program loads its own copy of the library's code and data into memory.

Examples of libraries which are traditionally designed to be statically linked include the ANSI C standard library and the ALIB assembler library. Statically linked libraries predate Fortran; Fortran's I/O was designed to use a preexisting package of I/O routines.

Dynamic linking

Dynamic linking is linking in which a library is loaded by the operating system's loader separately from the executable file, at loadtime or runtime. A library used in this way is called a dynamically linked library.

Most operating systems resolve external dependencies like libraries (called imports) as part of the loading process. For these systems, the executables contain a table called an import directory, which is a variable-length array of imports. Each element in the array contains the name of a library. The loader searches the hard disk for the needed library, loads it into memory at an unpredictable location and updates the executable with the library's location in memory. The executable then uses this information to call functions and access data stored in the library. This type of dynamic linking is called loadtime linking and is used by most operating systems, including Windows and Linux. Loadtime linking is one of the most complex tasks the loader performs while loading an application.
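The loadtime steps described above can be sketched as a toy model. This is an illustration only, not how any real loader is implemented: the library names, the DISK dictionary standing in for the hard disk, and the load_time_link function are all hypothetical, and "memory" is simply a Python dictionary from addresses to functions.

```python
import random

# Hypothetical libraries "on disk": name -> {exported symbol: function}.
DISK = {
    "libmath": {"square": lambda x: x * x, "double": lambda x: 2 * x},
    "libtext": {"shout": lambda s: s.upper() + "!"},
}


def load_time_link(import_directory, memory, rng):
    """Resolve every import before the program starts running.

    import_directory: list of (library name, [symbol names]) entries.
    memory: dict standing in for the address space (address -> code).
    Returns the filled-in import table: (library, symbol) -> address.
    """
    import_table = {}
    next_free = 0x10000
    for lib_name, symbols in import_directory:
        if lib_name not in DISK:
            raise FileNotFoundError(f"cannot find library {lib_name!r}")
        exports = DISK[lib_name]
        # Pick a base address that the program could not have predicted.
        base = next_free + rng.randrange(0, 16) * 0x1000
        addresses = {}
        for offset, (symbol, code) in enumerate(exports.items()):
            memory[base + offset] = code
            addresses[symbol] = base + offset
        # Patch the executable's table with the final locations.
        for symbol in symbols:
            if symbol not in addresses:
                raise LookupError(f"{lib_name} does not export {symbol!r}")
            import_table[(lib_name, symbol)] = addresses[symbol]
        next_free = base + 0x1000
    return import_table


memory = {}
table = load_time_link(
    [("libmath", ["square"]), ("libtext", ["shout"])], memory, random.Random()
)
# The program calls through the table, never through a fixed address.
result = memory[table[("libmath", "square")]](7)
```

After linking, every call costs only one table lookup, which is why the expensive work is done once at load time.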

Other operating systems resolve dependencies at runtime. For these systems, the executable calls an operating system API, passing it the name of a library file, a function number within the library and the function's parameters. The operating system resolves the import immediately and calls the appropriate function on behalf of the application. This type of dynamic linking is called runtime linking. Because the import is resolved again on every call, runtime linking is considerably slower than loadtime linking and can noticeably hurt an executable's performance. As a result, runtime linking is rarely used by modern operating systems.

In dynamic linking the library, commonly referred to as a dynamic link library (DLL) or shared library, is a pre-compiled and linked executable file which is stored separately on the computer's hard disk. It is loaded only when needed by an application. In most cases, multiple applications can use the same copy of the library at the same time and there is no need for the operating system to load multiple instances of the library into memory concurrently. In these cases, the libraries are stateless. That is, any data which must be stored by the library is stored by the application(s) it is serving. For this reason, these dynamic libraries are considered in-process.

One of the largest disadvantages of dynamic linking is that the executables depend on the separately stored libraries in order to function properly. If the library is deleted, moved, renamed or replaced with an incompatible version, the executable could malfunction. On Windows this is commonly known as DLL-hell.

Dynamically linked libraries date back to at least MTS (the Michigan Terminal System), built in the late 1960s. ("A History of MTS", Information Technology Digest, Vol. 5, No. 5)

Dynamic loading

Additionally, a library may be loaded dynamically during the execution of a program, as opposed to when the program is loaded into main memory or started. The loading of the library is thus delayed until it is needed, and if it is never needed, it is never loaded. Such a library is referred to as a dynamically loaded library (DL).

This form of library is typically used for plug-in modules and interpreters needing to load certain functionality on demand.
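The idea of loading on demand can be illustrated at the level of Python modules, which is an analogue of dynamic loading rather than the operating system mechanism itself (on POSIX systems that mechanism is exposed through dlopen and dlsym, on Windows through LoadLibrary and GetProcAddress). The LazyPlugin class below is a hypothetical helper, and the standard json module merely plays the role of a plug-in.

```python
import importlib


class LazyPlugin:
    """Defer loading a module until one of its attributes is first used."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, attribute):
        # Reached only for names not defined on the wrapper itself,
        # that is, for names that belong to the plug-in.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attribute)


plugin = LazyPlugin("json")            # the wrapper has not imported anything yet
was_loaded_before = plugin.loaded
text = plugin.dumps({"a": 1})          # first use triggers the actual load
```

If no attribute of the plug-in is ever used, import_module is never called, which mirrors the "never needed, never loaded" behaviour described above.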

An alternative to dynamic loading is to use some kind of software componentry or remote procedure call.

Naming

  • GNU/Linux, Solaris and BSD variants: libfoo.a and libfoo.so files are placed in directories such as /lib, /usr/lib or /usr/local/lib. The filenames always start with lib, and end with .a (archive, static library) or .so (shared object, dynamically linked library), with an optional interface number. For example, libfoo.so.2 is the second major interface revision of the dynamically linked library libfoo. Old Unix versions would use major and minor library revision numbers (libfoo.so.1.2), while contemporary Unixes use only major revision numbers (libfoo.so.1). Dynamically loaded libraries are placed in /usr/libexec and similar directories.
  • Mac OS X and later: libraries are named libfoo.dylib, with an optional interface number, such as libfoo.2.dylib.
  • Microsoft Windows: *.DLL files are dynamically linked libraries. The interface revisions are encoded in the files, or abstracted away using COM-object interfaces.
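The naming conventions listed above can be sketched as a small classifier. The function name parse_library_filename and the returned tuple layout are illustrative choices, and the patterns cover only the forms mentioned in the list, not every scheme used in practice.

```python
import re

_VERSION = r"(?P<version>\d+(?:\.\d+)*)"

_PATTERNS = [
    # Unix static archive: libfoo.a
    (re.compile(r"^lib(?P<name>.+?)\.a$"), "static"),
    # Unix shared object: libfoo.so, libfoo.so.2, libfoo.so.1.2
    (re.compile(r"^lib(?P<name>.+?)\.so(?:\." + _VERSION + r")?$"), "shared"),
    # Mac OS X: libfoo.dylib, libfoo.2.dylib
    (re.compile(r"^lib(?P<name>.+?)(?:\." + _VERSION + r")?\.dylib$"), "shared"),
    # Windows: foo.dll, in any letter case
    (re.compile(r"^(?P<name>.+)\.dll$", re.IGNORECASE), "shared"),
]


def parse_library_filename(filename):
    """Return (name, kind, version) or None if no convention matches."""
    for pattern, kind in _PATTERNS:
        match = pattern.match(filename)
        if match:
            groups = match.groupdict()
            return groups["name"], kind, groups.get("version")
    return None


parsed = [
    parse_library_filename(f)
    for f in ("libfoo.a", "libfoo.so.2", "libfoo.2.dylib", "FOO.DLL")
]
```

The version is returned as None where the filename carries none, as for static archives and Windows DLLs, whose interface revisions are not encoded in the name.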

Shared library

Libraries can be linked dynamically. In Microsoft Windows, such libraries are called dynamic link libraries, or DLLs. Conventional libraries are often called static libraries to distinguish them from shared libraries.

The term shared library is slightly ambiguous, because it covers at least two different concepts. The first is the sharing of code located on disk by unrelated programs. The second is the sharing of code in memory, when programs execute the same physical page of RAM, mapped into different address spaces. RAM sharing can be accomplished by using position independent code, as in Unix, which leads to a complex but flexible architecture. Alternatively, it can be accomplished with normal, i.e. not position independent, code, as in Microsoft Windows and OS/2, by using various tricks such as pre-mapping the address space and reserving slots for each DLL so that code has a high probability of being shared. Windows DLLs are not shared libraries in the Unix sense. The rest of this article concentrates on aspects common to both variants.

A DLL is a software library (often stored in a file) consisting of a collection of resources or routines that are available to other programs. A program that wants to use these routines is linked (see linker) with the DLL at the time it is actually started or later. Contrast this with a static library, the contents of which are copied into the program when the program is compiled and linked.

A program that links an executable with a DLL at start time or later is called a loader, while copying a static library into a program at build time is done by a linker. However, to link a program against a DLL, and thus make the program request that a particular DLL be loaded when it is started, the linker also needs to look into the DLL to verify that all symbols (routines and variables) used by the program are actually provided by the DLL. This leaves the impression that dynamic linking is performed at compile time, while it actually happens at run time (in most cases, at program start time).
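The build-time check described above can be sketched as follows. The function check_link and its two dictionary arguments are hypothetical; the point is only that the linker verifies symbols and records a dependency, while the actual binding is left to the loader.

```python
def check_link(program_imports, library_exports):
    """Verify at build time that every symbol the program uses is exported.

    program_imports: {library name: set of symbols the program uses}
    library_exports: {library name: set of symbols the library provides}
    Returns the list of libraries to request at start time; no code is copied.
    """
    missing = []
    for library, needed in sorted(program_imports.items()):
        provided = library_exports.get(library, set())
        missing.extend(f"{library}:{sym}" for sym in sorted(needed - provided))
    if missing:
        raise LookupError("undefined symbols: " + ", ".join(missing))
    return sorted(program_imports)


exports = {"libfoo": {"open_foo", "close_foo"}}
dependencies = check_link({"libfoo": {"open_foo"}}, exports)
```

A program that uses a symbol the library does not export is rejected at build time, even though nothing from the library has been bound to the program yet.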

The process of making resources available to other programs is called exporting. Most common forms of exports include procedures (functions, routines, subroutines), variables, and some sorts of static data, e.g. icons. Exported procedures are also called entry points, because invoking them is akin to "entering" the library. In order to allow access to them, the resources receive names, which are written down inside a table, also containing their offsets inside the file. These names (and sometimes, by analogy, the resources they represent) are called symbols. Similarly, the table is called a symbol table.
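As an illustration of a symbol table that maps names to offsets inside a file, the sketch below packs a few named resources into one byte string. The functions build_library and lookup are hypothetical, and storing a size next to each offset is an added assumption so that a resource can be read back without a terminator.

```python
def build_library(resources):
    """Pack named byte resources into one image plus a symbol table."""
    image = bytearray()
    symbol_table = {}
    for name, data in resources.items():
        # Record where this resource starts and how long it is.
        symbol_table[name] = (len(image), len(data))
        image += data
    return bytes(image), symbol_table


def lookup(image, symbol_table, name):
    """Fetch an exported resource by its symbol name."""
    offset, size = symbol_table[name]
    return image[offset:offset + size]


image, symbols = build_library({"greeting": b"hello", "icon": b"\x89PNG"})
icon = lookup(image, symbols, "icon")
```

A client needs only the name: the table turns it into a position inside the library, which is what makes the resource an export.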

In most modern operating systems, shared libraries can be of the same format as the "regular" executables. This has two main advantages: first, only one loader is needed for both, rather than two. Second, executables can also be used as DLLs, if they have a symbol table. Typical executable/DLL formats are ELF (UNIX) and PE (Microsoft Windows). In Windows, the concept was taken one step further, with even system resources such as fonts being bundled in the DLL file format.

Executables are less likely to have a symbol table (it is not mandatory and is usually stripped to save space), as opposed to DLLs, which need one to serve their purpose. In most other respects, the difference between DLLs and executables in modern operating systems is largely a matter of convention, as the other data structures are shared between the two types of files. Both have a record pointing at a main entry point. While an executable's main entry point is used by the operating system to launch it, the operating system uses a DLL's main entry point only when the DLL is loaded by some application, in order to initialize that DLL. In other words, the user of the operating system cannot directly cause the invocation of the main (or indeed any other) entry point of a DLL.

The term DLL is mostly used on Windows and OS/2 products. On the UNIX platform, the term shared library is more commonly used. This is technically justified in view of the different semantics. More explanations are available in the position independent code article.

In some cases, an operating system can become overloaded with different versions of DLLs, which impedes its performance and stability. Such a scenario is known as DLL-hell.

See also