Linker and Loader

Uploaded by

Ishan Shivankar
Copyright
© All Rights Reserved

1 Background on Linkers and Loaders

When you write code in a high-level programming language (like C or C++), it has to go through several
steps before it can actually run on a computer. Here’s an overview of the process:

1. Source Code: This is the human-readable code you write.
2. Compiler: Translates each source file into object code, which is machine code that is not yet
executable because references to other files are still unresolved.
3. Object Code: Each source file typically becomes its own object file, containing compiled code
but not the complete program.
4. Linker and Loader: The linker combines the object files into an executable, and the loader places
that executable in memory and starts it.

2 What is a Linker?
The Linker is a system program that takes one or more object files (output from the compiler) and
combines them into a single, executable program. The main jobs of the linker are:

• Resolve Symbolic References: In code, you often refer to functions or variables that might be
defined in another file. These references are called *symbolic references*. The linker finds where these
symbols (like function names or variables) actually reside in memory and updates the code to point to
the correct locations.
• Combine Object Files: If you have a large program broken into multiple files (like main.c, math.c,
io.c), each file will have its own object file. The linker combines these into one final executable file.
• Address Binding: The linker assigns memory addresses to different parts of the program, such as
where variables and functions will reside in memory during execution.

2.1 Types of Linking


1. Static Linking: All code and libraries needed by the program are combined into a single executable
file during the linking process. This results in larger file sizes but ensures all dependencies are included.
2. Dynamic Linking: Instead of including all code within the executable, dynamic linking uses shared
libraries (.dll files on Windows, .so files on Unix/Linux) that are loaded at runtime. This keeps file
sizes smaller but requires that the libraries be present on the system where the program runs.

2.2 Example of Linking Process


Consider two files:

• main.c (calls a function add from math.c)


• math.c (contains the add function)
The compiler generates two object files, main.o and math.o. The linker:

1. Combines them into one program.
2. Patches each call to add in main.o with the address that add, defined in math.o, receives in the
final program.
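
As a rough illustration, the two files might look like the sketch below, shown as a single listing with comments marking the file boundaries. The function bodies and the caller name sum_of_two_and_three are assumptions made for illustration; the text above only says that main.c calls add from math.c.

```c
#include <assert.h>

/* ---- math.c: compiled on its own into math.o ---- */
/* Definition of add; math.o exports the symbol "add". */
int add(int a, int b)
{
    return a + b;
}

/* ---- main.c: compiled on its own into main.o ---- */
/* Declaration only: when main.c is compiled separately, main.o records
 * "add" as an unresolved symbol for the linker to fill in. */
extern int add(int a, int b);

/* Hypothetical caller; the call site below is what the linker patches. */
int sum_of_two_and_three(void)
{
    int result = add(2, 3);
    assert(result == 5); /* holds once the call reaches the add defined above */
    return result;
}
```

With the two parts kept in separate files, each file compiles without knowing the other's addresses; only the link step connects the call to the definition.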

3 What is a Loader?
Once the linker has created an executable file, the Loader comes into play. The loader is responsible for
loading the program into memory so it can be executed by the CPU. The loader’s tasks include:

• Loading the Program: The loader copies the program’s code from disk into the system’s memory
(RAM).
• Setting Up Memory Space: Allocates memory for the program, including for global variables and
functions. It sets up the *stack* and *heap* for the program’s dynamic memory needs.
• Adjusting Addresses: If the program or its shared libraries end up at addresses other than those
assumed at link time, the loader adjusts the affected addresses. This is known as *relocation*.
• Transfer Control: Finally, the loader hands control of the CPU to the program’s entry point. In a C
program this is usually startup code that then calls the main function.

3.1 Types of Loaders


1. Absolute Loader: Loads the program exactly where specified in the memory. It doesn’t perform any
address modification.
2. Relocating Loader: Adjusts addresses based on where the program is actually loaded into memory.
This is necessary when the load address is not known in advance, for example when several programs
share memory and a program may be placed at a different location on each run.
3. Dynamic Loading: Loads parts of the program only when they are needed, rather than all at once,
which can save memory.
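
A relocating loader can be sketched as a copy followed by a patch pass. The C sketch below assumes a toy memory made of int words and a relocation table listing which words of the image hold addresses. The name load_relocating and this table format are illustrative assumptions, not a real loader interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `image` (n words, prepared for origin `linked_origin`) into `memory`
 * starting at `load_origin`, then add the relocation factor to every word
 * listed in `reloc` (indices of image words that hold addresses).
 * Returns the relocation factor that was applied. */
int load_relocating(int *memory, size_t memory_words,
                    const int *image, size_t n,
                    int linked_origin, int load_origin,
                    const size_t *reloc, size_t n_reloc)
{
    /* The image must fit inside the toy memory. */
    assert(load_origin >= 0 && (size_t)load_origin + n <= memory_words);

    memcpy(&memory[load_origin], image, n * sizeof image[0]);

    int factor = load_origin - linked_origin;
    for (size_t i = 0; i < n_reloc; i++) {
        assert(reloc[i] < n); /* each entry must point inside the image */
        memory[(size_t)load_origin + reloc[i]] += factor;
    }
    return factor;
}
```

In this model an absolute loader is the special case where load_origin equals linked_origin, so the factor is 0 and no word changes.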

3.2 Step-by-Step Workflow: Linker and Loader in Action


1. Code Compilation:
• The source code in each file is compiled into object code.

2. Linking:
• The linker takes all the object files and combines them into one executable.
• It resolves references to variables and functions that are used across files.
• It performs static or dynamic linking as required.

3. Loading:
• The loader loads the executable into memory.
• It allocates memory for program data and prepares any necessary runtime environment.
• For dynamically linked programs, it loads shared libraries and updates memory addresses.
• Finally, it transfers control to the program’s entry point, which eventually calls the main function.

3.3 Importance of Linkers and Loaders


• Modularity: Linkers let programmers break large programs into smaller, manageable files that are
compiled separately and combined later.

• Memory Management: Loaders allow efficient use of memory by loading only the parts of code that
are needed and by supporting dynamic linking.
• Reusability: Linking enables code reuse by allowing libraries to be shared across programs.

4 Binding in Programming and Address Relocation
Binding is a fundamental concept in programming that refers to the process of associating a method or
function call with the actual method or function implementation. The method or function invoked in
response to a call is determined based on when and how this association occurs. Binding can take place at
compile-time (static binding) or at runtime (dynamic binding).

4.1 Types of Binding


1. Static Binding (Early Binding)
2. Dynamic Binding (Late Binding)

Let’s explore both types of binding in detail, and also touch on addressing and relocation, which are
critical in understanding how bindings relate to memory management during execution.

4.2 Static Binding (Early Binding)


Static Binding occurs when the method or function to be invoked is determined at compile-time. This
means that during compilation, the compiler resolves the method call based on the method signature and
the reference type. In Java, static binding applies to methods that cannot be overridden (static, private,
and final methods) and to the choice among overloaded methods.

4.2.1 Characteristics of Static Binding


• Binding Time: Determined during compilation.
• Method Resolution: Based on method signature and reference type.
• Performance: Faster because it is resolved at compile-time.

• Example: Calls to static, private, or final methods, which cannot be overridden (no polymorphism).

4.2.2 Example in Java (Static Binding)

    class Animal {
        // A static method cannot be overridden, so calls to it are bound at compile time.
        public static void sound() {
            System.out.println("Some animal sound");
        }
    }

    public class Main {
        public static void main(String[] args) {
            Animal.sound(); // Static binding happens here
        }
    }

In this example, the call to sound() is bound statically: sound() is a static method, so the compiler
resolves it from the class name Animal alone. By contrast, a call to an ordinary instance method through
a reference (such as a.sound() on an Animal reference) is dispatched at runtime in Java, even when no
subclass overrides the method.

4.3 Dynamic Binding (Late Binding)


Dynamic Binding occurs when the method to be invoked is determined at runtime. This is essential
in cases where method calls can vary depending on the actual object type being referred to, especially in
polymorphic scenarios (method overriding in inheritance). The compiler cannot know which method to
invoke until runtime because the reference could point to different subclass objects.

4.3.1 Characteristics of Dynamic Binding
• Binding Time: Determined during runtime.
• Method Resolution: Based on the actual object type at runtime.

• Performance: Typically slightly slower than static binding, since the target is looked up during
execution.
• Example: Polymorphic method calls (method overriding).

4.3.2 Example in Java (Dynamic Binding)

    class Animal {
        public void sound() {
            System.out.println("Some animal sound");
        }
    }

    class Dog extends Animal {
        @Override
        public void sound() {
            System.out.println("Bark");
        }
    }

    class Cat extends Animal {
        @Override
        public void sound() {
            System.out.println("Meow");
        }
    }

    public class Main {
        public static void main(String[] args) {
            Animal a = new Dog(); // a refers to a Dog object
            a.sound();            // Resolves to Dog's sound() method at runtime

            a = new Cat();        // a now refers to a Cat object
            a.sound();            // Resolves to Cat's sound() method at runtime
        }
    }

In this example, dynamic binding occurs because at runtime, the sound() method call is resolved to
the method of the actual object (either Dog or Cat) that a is pointing to.
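
One common way to implement this kind of run-time dispatch is to route the call through a function pointer stored with the object. The C sketch below mimics the Animal example under that assumption. It illustrates the mechanism only and does not claim to show how a Java virtual machine dispatches calls; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Each object carries a pointer to its implementation of sound(). */
struct animal {
    const char *(*sound)(void);
};

static const char *dog_sound(void) { return "Bark"; }
static const char *cat_sound(void) { return "Meow"; }

/* "Constructors" bind the pointer when the object is created. */
struct animal make_dog(void) { struct animal a = { dog_sound }; return a; }
struct animal make_cat(void) { struct animal a = { cat_sound }; return a; }

/* The call site names no concrete function: the target is read from
 * the object at run time, which is the essence of late binding. */
const char *animal_sound(const struct animal *a)
{
    assert(a != NULL && a->sound != NULL);
    return a->sound();
}
```

Calling animal_sound on a value from make_dog returns "Bark" and on a value from make_cat returns "Meow", even though the calling code is identical in both cases.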

4.4 Addressing and Relocation


In the context of binding, addressing and relocation play crucial roles in memory management and
execution.

4.4.1 Addressing
Addressing refers to the process of associating a memory address with a variable, function, or method. When
a program is compiled and executed, variables, methods, and functions are stored in memory, and each must
have a unique address. During binding, the address of a method or function is determined based on the
reference type or object type (static or dynamic).

• In static binding, the address is determined at compile-time because the method to be called is fixed.
• In dynamic binding, the address is determined at runtime because the method to be called depends
on the actual object type, which is known only during execution.

4.4.2 Relocation
Relocation refers to the process of adjusting addresses when a program is loaded into memory. Programs
are translated and linked for an assumed origin, and their memory locations may change when they are
loaded into memory for execution. Relocation ensures that all memory addresses are updated correctly,
including those that were determined during binding.
• Static binding: The target of each call is fixed at compile-time, so no lookup is needed at runtime. Its
address is still adjusted by ordinary relocation if the program is loaded at a different origin.
• Dynamic binding: During runtime, methods are selected dynamically, and the program must look up
the address of the correct method before invoking it.

4.4.3 Example of Relocation in Dynamic Binding


Consider a program that loads a library at runtime (e.g., a dynamic link library on Windows or a shared
object on Linux). The addresses of functions in that library are not known until the program is running,
so each function call must be mapped to its address at runtime.

    /* handle is assumed to come from an earlier successful dlopen() call */
    void (*func)(void);
    func = (void (*)(void)) dlsym(handle, "myFunction");
    if (func != NULL)
        func(); /* The function address is resolved dynamically at runtime */

Here, the address of myFunction is not fixed. The dynamic loader relocates the library when it is loaded,
and dlsym looks up the address of myFunction during execution.

5 Relocation and Linking Concepts in Program Execution


Relocation is a process in computer systems where a program’s instructions and data, originally written to
run at a specific memory address, are adjusted so that they can run correctly from any location in memory.
This is particularly important when programs are loaded into memory at addresses that are not known until
runtime.
In simpler terms, address-sensitive programs assume that their instructions and data will be loaded at
specific, predefined memory locations. If the program is loaded at a different memory location than expected,
it might fail because the program’s instructions will reference the wrong memory addresses. Relocation
ensures that such programs can run correctly, regardless of where they are loaded in memory.

5.1 Address-Sensitive Program


An address-sensitive program is one that relies on absolute memory addresses for both instructions and
data. These programs assume that specific addresses in memory are reserved for certain instructions and
data. If the program is loaded at an address other than the one it expects, the program may fail.
For example, if a program has an instruction that refers to memory location 500, it assumes that the
instruction or data will always be at that location. If the program is loaded at a different memory address,
such as 700, this reference to memory location 500 will cause an error because the program is now referring
to the wrong address.

5.2 Program Relocation


Program relocation refers to the process of adjusting the memory addresses used by a program so that the
program can be executed correctly from any memory location. This process ensures that when a program is
loaded into a different part of memory, all addresses in the program (such as instruction addresses and data
addresses) are adjusted accordingly.
Relocation is typically handled by the linker and loader:
1. Linker: The linker is responsible for modifying the addresses in the program at link time. If the linked
origin (the address the linker assigns to the program) is different from the translated origin (the address
assumed by the assembler or compiler when it generated the code), the linker performs the relocation.

2. Loader: The loader is responsible for loading the program into memory at runtime. If the load origin
(the actual address where the program is loaded) is different from the linked origin, the loader performs
the relocation. In general, the relocation is performed by the linker, but if necessary, the loader can
also handle it.

5.3 Relocation Factor


To perform relocation, we need to calculate the relocation factor. The relocation factor is the difference
between the linked origin and the translated origin. It tells us how much to adjust the memory addresses
used by the program.
The formula for the relocation factor is:

Relocation Factor = Linked Origin − Translated Origin

• Linked Origin: The address assigned to the program by the linker.
• Translated Origin: The address assumed by the translator (assembler or compiler) when it generated
the code, for example the operand of the START statement.

5.4 Performing Relocation with an Example


Let’s walk through an example of relocation with an assembly program.
Consider the following assembly program:
    Statement               Address   Code
    START 500
    ENTRY TOTAL
    EXTRN MAX, ALPHA
    READ A                  500       09 0 540   ; operand 540 is the address of A (address-sensitive)
    MOVER AREG, ALPHA
    BC ANY, MAX                       06 6 000
    BC LT, LOOP                       06 1 501
    STOP                              00 0 000
    A DS 1                  540
    TOTAL DS 1              541
    END

1. Linking and Address-Sensitivity:

• The instruction READ A refers to address 540, the address of A. This is an address-sensitive
instruction because it assumes that A is located at memory location 540.
• At link time, the program is linked at linked origin = 700.
2. Calculating the Relocation Factor:

• Linked Origin = 700
• Translated Origin = 500 (the origin given by START 500, which the assembler assumed when it
generated the code)
• Relocation Factor = Linked Origin - Translated Origin

Relocation Factor = 700 − 500 = 200

3. Relocation of Address-Sensitive Instructions:

• In the instruction READ A, the address 540 needs to be adjusted.


• The translated address of A is 540 (as per the original program).

• The linked address of A is calculated by adding the relocation factor to the translated address:

Linked Address = 540 + 200 = 740

• This means that the program should now use the address 740 for the symbol A instead of 540.
4. Adjusting All Instructions: All the other address-sensitive instructions in the program must also
be adjusted using the relocation factor.

5.5 Formulae for Relocation


Here are the formulae used in the relocation process:

1. Relocation Factor:

Relocation Factor = Linked Origin − Translated Origin

2. Linked Address (after applying relocation):

Linked Address = Translated Address + Relocation Factor
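
The two formulae translate directly into code. The following C sketch is a minimal illustration that assumes addresses are small non-negative integers, as in the example; the function names are chosen here for illustration.

```c
#include <assert.h>

/* Relocation factor = linked origin - translated origin. */
int relocation_factor(int linked_origin, int translated_origin)
{
    return linked_origin - translated_origin;
}

/* Linked address = translated address + relocation factor. */
int linked_address(int translated_address, int linked_origin, int translated_origin)
{
    /* The address being relocated must lie at or after the program's origin. */
    assert(translated_address >= translated_origin);
    return translated_address + relocation_factor(linked_origin, translated_origin);
}
```

With the values used in this section, relocation_factor(700, 500) gives 200 and linked_address(540, 700, 500) gives 740, matching the hand calculation for the symbol A.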

5.6 Assembly Example Walkthrough


Let’s take the example program provided and relocate all addresses using the relocation factor.

1. Original program addresses:

• READ A uses address 540 (the address of A)
• BC LT, LOOP uses address 501 (the address of LOOP)
• MOVER AREG, ALPHA and BC ANY, MAX refer to the external symbols ALPHA and MAX. Their address
fields hold the placeholder 000, which is not an address inside this program.
2. Relocation Factor = 200
3. Adjusted addresses:
• The address 540 for READ A becomes 540 + 200 = 740
• The address 501 for BC LT, LOOP becomes 501 + 200 = 701
• The 000 fields for ALPHA and MAX are left unchanged by relocation. They are filled in when the linker
resolves the external references.
4. Final relocated program:

    Statement               Address   Code
    START 500
    ENTRY TOTAL
    EXTRN MAX, ALPHA
    READ A                  700       09 0 740   ; operand relocated from 540 to 740
    MOVER AREG, ALPHA
    BC ANY, MAX                       06 6 000   ; external reference, not relocated
    BC LT, LOOP                       06 1 701   ; operand relocated from 501 to 701
    STOP                              00 0 000
    A DS 1                  740
    TOTAL DS 1              741
    END

In this relocated program, every address that refers to a location inside the program has been adjusted by
adding the relocation factor of 200. This ensures that the program executes correctly when it is placed at
the linked origin 700 instead of the translated origin 500.

5.7 Key Steps for Relocation
1. Calculate the Relocation Factor: Subtract the translated origin from the linked origin.
2. Adjust the Addresses: For each address-sensitive instruction, add the relocation factor to the
translated address to obtain the linked address.

3. Recalculate the Memory Layout: Make sure all instructions and data are correctly relocated.
4. Load the Program: The relocated program can now be loaded at the linked origin and executed
correctly. If it is loaded at a different address, the loader repeats the same adjustment using the load
origin.

6 Background on Relocation, Linking, and External References


6.1 Public and External References
• Public Definition: This is a symbol defined in one program unit that is intended to be used by other
program units. Public definitions are declared in the ENTRY statement.
• ENTRY: Lists the symbols that are defined in the program and can be used by other modules. For
example, the symbol TOTAL is a public definition.

• External Reference: An external reference is a reference to a symbol that is not defined within the
current program unit but is defined in another program unit. These references must be resolved during
linking.
• EXTRN: This statement is used to declare external symbols that will be linked later. For example, if
MAX and ALPHA are defined in another program, their references are declared as EXTRN in the current
program.

Example of External References:


In the assembly program, you may see something like:
1 EXTRN MAX , ALPHA

This tells the assembler that the symbols MAX and ALPHA are external references that are defined elsewhere.
The assembler will leave these fields as 0 (or unresolved) and the linker will resolve them later.

6.2 Linking
Linking is the process of combining different program units (modules) and resolving all the external
references. When a program unit references an external symbol (like MAX or ALPHA), the linker will replace
these references with the correct addresses. If the program unit is linked at a certain memory address (called
the link origin), the addresses of the symbols are modified accordingly.
Linking can be performed in two ways:

1. Static Linking: All external references are resolved at link time.
2. Dynamic Linking: External references are resolved at runtime.

Example:
Consider a program Prog-P where MAX is an external symbol, and ALPHA is defined in another program
Prog-Q. When these programs are linked together, the linker resolves MAX and ALPHA by replacing the
references with the correct addresses.
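
Resolving external references amounts to a table lookup followed by a patch. The C sketch below assumes a toy object format in which each external reference names a symbol and the index of the code word to fill in, and each public definition pairs a name with its linked address. The struct layouts and the name resolve_externals are illustrative assumptions, not a real object-file format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct public_def { const char *name; int linked_address; };
struct extern_ref { const char *name; size_t word; }; /* code word to patch */

/* Fill each referenced code word with the linked address of the matching
 * public definition. Returns the number of references left unresolved
 * (0 means every external reference was resolved). */
size_t resolve_externals(int *code, size_t n_words,
                         const struct extern_ref *refs, size_t n_refs,
                         const struct public_def *defs, size_t n_defs)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n_refs; i++) {
        assert(refs[i].word < n_words); /* reference must point into the code */
        int found = 0;
        for (size_t j = 0; j < n_defs && !found; j++) {
            if (strcmp(refs[i].name, defs[j].name) == 0) {
                code[refs[i].word] = defs[j].linked_address;
                found = 1;
            }
        }
        if (!found)
            unresolved++; /* a real linker would report an undefined symbol */
    }
    return unresolved;
}
```

A linear search keeps the sketch short; a linker handling many symbols would more plausibly use a hash table for the public definitions.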

6.3 Assembly Example
Let’s now work through a concrete assembly example to see how these concepts are used in practice.
    Statement               Address   Code
    START 500                                    ; Program origin is address 500
    ENTRY TOTAL                                  ; Public definition
    EXTRN MAX, ALPHA                             ; External references
    READ A                  500       09 0 540   ; Reads into A (address 540)
    MOVER AREG, ALPHA                            ; Moves data from ALPHA into AREG
    BC ANY, MAX                       06 6 000   ; Branch to MAX; address field left as 000
    BC LT, LOOP                       06 1 501   ; Branch to LOOP (address 501) if condition is met
    STOP                              00 0 000   ; Stop program execution
    A DS 1                  540                  ; Allocate 1 word for A
    TOTAL DS 1              541                  ; Allocate 1 word for TOTAL
    END

In this example:

• ENTRY TOTAL: The symbol TOTAL is a public definition, meaning it can be accessed by other
modules.

• EXTRN MAX, ALPHA: These symbols are external references, meaning they are defined in other
program units.
• The assembler will not know the addresses of MAX and ALPHA yet, so it leaves these addresses as 0.

6.4 Linking Example with Address Calculation


Consider the following scenario where two program units (Prog-P and Prog-Q) are linked.

• Prog-P contains an external reference to the symbol ALPHA, which is defined in Prog-Q.
• The program unit Prog-Q has a public definition for ALPHA at a translation-time address of 231.

Now, when Prog-P is linked to Prog-Q, the linker will adjust the addresses to match the link origin and
resolve the external reference.
Let’s assume:

• The link origin of Prog-P is 700.


• The link origin of Prog-Q is 742.

• The address of ALPHA in Prog-Q is 231 (its translation-time address).

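After linking, the link-time address of ALPHA is its translation-time address plus the relocation factor of
Prog-Q. Assuming Prog-Q was translated with origin 200, which is the value consistent with the result
below:

Linked address of ALPHA = Address of ALPHA in Prog-Q + (Link origin of Prog-Q − Translated origin of Prog-Q)
= 231 + (742 − 200) = 773

Thus, the reference to ALPHA in Prog-P will be replaced with 773 during linking.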

6.5 Binary Program and Final Assembly


After linking and resolving all external references, the binary program is generated. A binary program is a
machine-readable format consisting of relocated instructions and resolved references. It is ready for loading
into memory and execution.
Final Linking Command:
The linker uses the following command to generate a binary program from the object modules:

Linker <link origin>, <object module names>, <execution start address>

• <link origin>: The memory address where the program will be loaded.
• <object module names>: The names of the object modules to be linked.
• <execution start address>: The address where the program execution will begin (if not specified,
it’s assumed to be the link origin).
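
The default rule for the execution start address can be captured in a few lines. The C sketch below models the three parameters of the command as a struct; the struct, its field names, and the function execution_start are hypothetical and serve only to illustrate the rule stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of: Linker <link origin>, <modules>, <start> */
struct link_command {
    int link_origin;             /* <link origin> */
    const char *const *modules;  /* <object module names> */
    size_t n_modules;
    int has_start;               /* non-zero if <execution start address> was given */
    int start_address;           /* meaningful only when has_start is non-zero */
};

/* Execution begins at the given start address, or at the link origin
 * when no start address was specified. */
int execution_start(const struct link_command *cmd)
{
    assert(cmd != NULL);
    return cmd->has_start ? cmd->start_address : cmd->link_origin;
}
```

Keeping an explicit has_start flag avoids treating any particular address value as meaning "not specified".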
