Sam Clegg | c94d393 | 2017-11-17 18:14:09 | [diff] [blame] | 1 | The ELF, COFF and Wasm Linkers |
| 2 | ============================== |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 3 | |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 4 | The ELF Linker as a Library |
| 5 | --------------------------- |
| 6 | |
| 7 | You can embed LLD to your program by linking against it and calling the linker's |
Alexandre Ganea | 6f2e92c | 2023-06-19 11:32:34 | [diff] [blame] | 8 | entry point function ``lld::lldMain``. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 9 | |
Nico Weber | efabe42 | 2020-01-10 04:09:48 | [diff] [blame] | 10 | The current policy is that it is your responsibility to give trustworthy object |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 11 | files. The function is guaranteed to return as long as you do not pass corrupted |
| 12 | or malicious object files. A corrupted file could cause a fatal error or SEGV. |
| 13 | That being said, you don't need to worry too much about it if you create object |
| 14 | files in the usual way and give them to the linker. It is naturally expected to |
| 15 | work, or otherwise it's a linker's bug. |
| 16 | |
| 17 | Design |
| 18 | ====== |
| 19 | |
| 20 | We will describe the design of the linkers in the rest of the document. |
| 21 | |
| 22 | Key Concepts |
| 23 | ------------ |
| 24 | |
| 25 | Linkers are fairly large pieces of software. |
| 26 | There are many design choices you have to make to create a complete linker. |
| 27 | |
| 28 | This is a list of design choices we've made for ELF and COFF LLD. |
| 29 | We believe that these high-level design choices achieved a right balance |
| 30 | between speed, simplicity and extensibility. |
| 31 | |
| 32 | * Implement as native linkers |
| 33 | |
| 34 | We implemented the linkers as native linkers for each file format. |
| 35 | |
Sam Clegg | c94d393 | 2017-11-17 18:14:09 | [diff] [blame] | 36 | The linkers share the same design but share very little code. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 37 | Sharing code makes sense if the benefit is worth its cost. |
Sam Clegg | c94d393 | 2017-11-17 18:14:09 | [diff] [blame] | 38 | In our case, the object formats are different enough that we thought the layer |
| 39 | to abstract the differences wouldn't be worth its complexity and run-time |
| 40 | cost. Elimination of the abstract layer has greatly simplified the |
| 41 | implementation. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 42 | |
| 43 | * Speed by design |
| 44 | |
Rui Ueyama | 27a3f01 | 2016-03-14 19:53:52 | [diff] [blame] | 45 | One of the most important things in archiving high performance is to |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 46 | do less rather than do it efficiently. |
| 47 | Therefore, the high-level design matters more than local optimizations. |
| 48 | Since we are trying to create a high-performance linker, |
| 49 | it is very important to keep the design as efficient as possible. |
| 50 | |
| 51 | Broadly speaking, we do not do anything until we have to do it. |
| 52 | For example, we do not read section contents or relocations |
| 53 | until we need them to continue linking. |
| 54 | When we need to do some costly operation (such as looking up |
| 55 | a hash table for each symbol), we do it only once. |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 56 | We obtain a handle (which is typically just a pointer to actual data) |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 57 | on the first operation and use it throughout the process. |
| 58 | |
| 59 | * Efficient archive file handling |
| 60 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 61 | LLD's handling of archive files (the files with ".a" file extension) is |
| 62 | different from the traditional Unix linkers and similar to Windows linkers. |
| 63 | We'll describe how the traditional Unix linker handles archive files, what the |
| 64 | problem is, and how LLD approached the problem. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 65 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 66 | The traditional Unix linker maintains a set of undefined symbols during |
| 67 | linking. The linker visits each file in the order as they appeared in the |
| 68 | command line until the set becomes empty. What the linker would do depends on |
| 69 | file type. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 70 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 71 | - If the linker visits an object file, the linker links object files to the |
| 72 | result, and undefined symbols in the object file are added to the set. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 73 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 74 | - If the linker visits an archive file, it checks for the archive file's |
| 75 | symbol table and extracts all object files that have definitions for any |
| 76 | symbols in the set. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 77 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 78 | This algorithm sometimes leads to a counter-intuitive behavior. If you give |
| 79 | archive files before object files, nothing will happen because when the linker |
| 80 | visits archives, there is no undefined symbols in the set. As a result, no |
| 81 | files are extracted from the first archive file, and the link is done at that |
| 82 | point because the set is empty after it visits one file. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 83 | |
| 84 | You can fix the problem by reordering the files, |
| 85 | but that cannot fix the issue of mutually-dependent archive files. |
| 86 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 87 | Linking mutually-dependent archive files is tricky. You may specify the same |
| 88 | archive file multiple times to let the linker visit it more than once. Or, |
| 89 | you may use the special command line options, `--start-group` and |
| 90 | `--end-group`, to let the linker loop over the files between the options until |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 91 | no new symbols are added to the set. |
| 92 | |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 93 | Visiting the same archive files multiple times makes the linker slower. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 94 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 95 | Here is how LLD approaches the problem. Instead of memorizing only undefined |
| 96 | symbols, we program LLD so that it memorizes all symbols. When it sees an |
| 97 | undefined symbol that can be resolved by extracting an object file from an |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 98 | archive file it previously visited, it immediately extracts the file and links |
| 99 | it. It is doable because LLD does not forget symbols it has seen in archive |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 100 | files. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 101 | |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 102 | We believe that LLD's way is efficient and easy to justify. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 103 | |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 104 | The semantics of LLD's archive handling are different from the traditional |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 105 | Unix's. You can observe it if you carefully craft archive files to exploit |
| 106 | it. However, in reality, we don't know any program that cannot link with our |
| 107 | algorithm so far, so it's not going to cause trouble. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 108 | |
Rui Ueyama | 88daab9 | 2016-03-14 18:43:01 | [diff] [blame] | 109 | Numbers You Want to Know |
| 110 | ------------------------ |
| 111 | |
| 112 | To give you intuition about what kinds of data the linker is mainly working on, |
Rui Ueyama | 27a3f01 | 2016-03-14 19:53:52 | [diff] [blame] | 113 | I'll give you the list of objects and their numbers LLD has to read and process |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 114 | in order to link a very large executable. In order to link Chrome with debug |
| 115 | info, which is roughly 2 GB in output size, LLD reads |
Rui Ueyama | 88daab9 | 2016-03-14 18:43:01 | [diff] [blame] | 116 | |
Rui Ueyama | 27a3f01 | 2016-03-14 19:53:52 | [diff] [blame] | 117 | - 17,000 files, |
| 118 | - 1,800,000 sections, |
| 119 | - 6,300,000 symbols, and |
| 120 | - 13,000,000 relocations. |
Rui Ueyama | 88daab9 | 2016-03-14 18:43:01 | [diff] [blame] | 121 | |
Rui Ueyama | 27a3f01 | 2016-03-14 19:53:52 | [diff] [blame] | 122 | LLD produces the 2 GB executable in 15 seconds. |
Rui Ueyama | 88daab9 | 2016-03-14 18:43:01 | [diff] [blame] | 123 | |
| 124 | These numbers vary depending on your program, but in general, |
| 125 | you have a lot of relocations and symbols for each file. |
| 126 | If your program is written in C++, symbol names are likely to be |
| 127 | pretty long because of name mangling. |
| 128 | |
| 129 | It is important to not waste time on relocations and symbols. |
| 130 | |
| 131 | In the above case, the total amount of symbol strings is 450 MB, |
| 132 | and inserting all of them to a hash table takes 1.5 seconds. |
| 133 | Therefore, if you causally add a hash table lookup for each symbol, |
| 134 | it would slow down the linker by 10%. So, don't do that. |
| 135 | |
| 136 | On the other hand, you don't have to pursue efficiency |
| 137 | when handling files. |
| 138 | |
Sean Silva | 69d65ac | 2017-01-22 03:28:56 | [diff] [blame] | 139 | Important Data Structures |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 140 | ------------------------- |
| 141 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 142 | We will describe the key data structures in LLD in this section. The linker can |
| 143 | be understood as the interactions between them. Once you understand their |
| 144 | functions, the code of the linker should look obvious to you. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 145 | |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 146 | * Symbol |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 147 | |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 148 | This class represents a symbol. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 149 | They are created for symbols in object files or archive files. |
| 150 | The linker creates linker-defined symbols as well. |
| 151 | |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 152 | There are basically three types of Symbols: Defined, Undefined, or Lazy. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 153 | |
| 154 | - Defined symbols are for all symbols that are considered as "resolved", |
| 155 | including real defined symbols, COMDAT symbols, common symbols, |
| 156 | absolute symbols, linker-created symbols, etc. |
| 157 | - Undefined symbols represent undefined symbols, which need to be replaced by |
| 158 | Defined symbols by the resolver until the link is complete. |
| 159 | - Lazy symbols represent symbols we found in archive file headers |
Nico Weber | 6a5d94cc | 2019-01-14 00:45:27 | [diff] [blame] | 160 | which can turn into Defined if we read archive members. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 161 | |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 162 | There's only one Symbol instance for each unique symbol name. This uniqueness |
| 163 | is guaranteed by the symbol table. As the resolver reads symbols from input |
| 164 | files, it replaces an existing Symbol with the "best" Symbol for its symbol |
| 165 | name using the placement new. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 166 | |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 167 | The above mechanism allows you to use pointers to Symbols as a very cheap way |
| 168 | to access name resolution results. Assume for example that you have a pointer |
| 169 | to an undefined symbol before name resolution. If the symbol is resolved to a |
| 170 | defined symbol by the resolver, the pointer will "automatically" point to the |
| 171 | defined symbol, because the undefined symbol the pointer pointed to will have |
| 172 | been replaced by the defined symbol in-place. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 173 | |
| 174 | * SymbolTable |
| 175 | |
| 176 | SymbolTable is basically a hash table from strings to Symbols |
Sylvestre Ledru | b307a0b | 2017-03-02 10:40:24 | [diff] [blame] | 177 | with logic to resolve symbol conflicts. It resolves conflicts by symbol type. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 178 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 179 | - If we add Defined and Undefined symbols, the symbol table will keep the |
| 180 | former. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 181 | - If we add Defined and Lazy symbols, it will keep the former. |
| 182 | - If we add Lazy and Undefined, it will keep the former, |
| 183 | but it will also trigger the Lazy symbol to load the archive member |
| 184 | to actually resolve the symbol. |
| 185 | |
| 186 | * Chunk (COFF specific) |
| 187 | |
| 188 | Chunk represents a chunk of data that will occupy space in an output. |
| 189 | Each regular section becomes a chunk. |
| 190 | Chunks created for common or BSS symbols are not backed by sections. |
| 191 | The linker may create chunks to append additional data to an output as well. |
| 192 | |
| 193 | Chunks know about their size, how to copy their data to mmap'ed outputs, |
| 194 | and how to apply relocations to them. |
| 195 | Specifically, section-based chunks know how to read relocation tables |
| 196 | and how to apply them. |
| 197 | |
| 198 | * InputSection (ELF specific) |
| 199 | |
| 200 | Since we have less synthesized data for ELF, we don't abstract slices of |
| 201 | input files as Chunks for ELF. Instead, we directly use the input section |
| 202 | as an internal data type. |
| 203 | |
| 204 | InputSection knows about their size and how to copy themselves to |
| 205 | mmap'ed outputs, just like COFF Chunks. |
| 206 | |
| 207 | * OutputSection |
| 208 | |
| 209 | OutputSection is a container of InputSections (ELF) or Chunks (COFF). |
| 210 | An InputSection or Chunk belongs to at most one OutputSection. |
| 211 | |
| 212 | There are mainly three actors in this linker. |
| 213 | |
| 214 | * InputFile |
| 215 | |
| 216 | InputFile is a superclass of file readers. |
| 217 | We have a different subclass for each input file type, |
| 218 | such as regular object file, archive file, etc. |
Rui Ueyama | f1f0084 | 2017-10-31 16:07:41 | [diff] [blame] | 219 | They are responsible for creating and owning Symbols and InputSections/Chunks. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 220 | |
| 221 | * Writer |
| 222 | |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 223 | The writer is responsible for writing file headers and InputSections/Chunks to |
| 224 | a file. It creates OutputSections, put all InputSections/Chunks into them, |
| 225 | assign unique, non-overlapping addresses and file offsets to them, and then |
| 226 | write them down to a file. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 227 | |
| 228 | * Driver |
| 229 | |
Sylvestre Ledru | b307a0b | 2017-03-02 10:40:24 | [diff] [blame] | 230 | The linking process is driven by the driver. The driver: |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 231 | |
| 232 | - processes command line options, |
| 233 | - creates a symbol table, |
Sam Clegg | c3e16d6 | 2017-12-05 23:28:04 | [diff] [blame] | 234 | - creates an InputFile for each input file and puts all symbols within into |
| 235 | the symbol table, |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 236 | - checks if there's no remaining undefined symbols, |
| 237 | - creates a writer, |
| 238 | - and passes the symbol table to the writer to write the result to a file. |
| 239 | |
| 240 | Link-Time Optimization |
| 241 | ---------------------- |
| 242 | |
| 243 | LTO is implemented by handling LLVM bitcode files as object files. |
| 244 | The linker resolves symbols in bitcode files normally. If all symbols |
Rui Ueyama | 7a32d9a | 2016-03-22 20:58:15 | [diff] [blame] | 245 | are successfully resolved, it then runs LLVM passes |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 246 | with all bitcode files to convert them to one big regular ELF/COFF file. |
| 247 | Finally, the linker replaces bitcode symbols with ELF/COFF symbols, |
Rui Ueyama | 27a3f01 | 2016-03-14 19:53:52 | [diff] [blame] | 248 | so that they are linked as if they were in the native format from the beginning. |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 249 | |
| 250 | The details are described in this document. |
Sylvestre Ledru | 72fd103 | 2020-03-22 21:42:03 | [diff] [blame] | 251 | https://llvm.org/docs/LinkTimeOptimization.html |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 252 | |
| 253 | Glossary |
| 254 | -------- |
| 255 | |
| 256 | * RVA (COFF) |
| 257 | |
| 258 | Short for Relative Virtual Address. |
| 259 | |
| 260 | Windows executables or DLLs are not position-independent; they are |
| 261 | linked against a fixed address called an image base. RVAs are |
| 262 | offsets from an image base. |
| 263 | |
| 264 | Default image bases are 0x140000000 for executables and 0x18000000 |
| 265 | for DLLs. For example, when we are creating an executable, we assume |
| 266 | that the executable will be loaded at address 0x140000000 by the |
| 267 | loader, so we apply relocations accordingly. Result texts and data |
| 268 | will contain raw absolute addresses. |
| 269 | |
| 270 | * VA |
| 271 | |
| 272 | Short for Virtual Address. For COFF, it is equivalent to RVA + image base. |
| 273 | |
| 274 | * Base relocations (COFF) |
| 275 | |
| 276 | Relocation information for the loader. If the loader decides to map |
| 277 | an executable or a DLL to a different address than their image |
| 278 | bases, it fixes up binaries using information contained in the base |
| 279 | relocation table. A base relocation table consists of a list of |
| 280 | locations containing addresses. The loader adds a difference between |
| 281 | RVA and actual load address to all locations listed there. |
| 282 | |
| 283 | Note that this run-time relocation mechanism is much simpler than ELF. |
| 284 | There's no PLT or GOT. Images are relocated as a whole just |
| 285 | by shifting entire images in memory by some offsets. Although doing |
| 286 | this breaks text sharing, I think this mechanism is not actually bad |
| 287 | on today's computers. |
| 288 | |
| 289 | * ICF |
| 290 | |
| 291 | Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). |
| 292 | |
| 293 | ICF is an optimization to reduce output size by merging read-only sections |
| 294 | by not only their names but by their contents. If two read-only sections |
| 295 | happen to have the same metadata, actual contents and relocations, |
| 296 | they are merged by ICF. It is known as an effective technique, |
| 297 | and it usually reduces C++ program's size by a few percent or more. |
| 298 | |
Nico Weber | d7d98aa | 2017-08-04 20:06:03 | [diff] [blame] | 299 | Note that this is not an entirely sound optimization. C/C++ require |
Rui Ueyama | 02aa766 | 2016-03-12 06:06:40 | [diff] [blame] | 300 | different functions have different addresses. If a program depends on |
| 301 | that property, it would fail at runtime. |
| 302 | |
| 303 | On Windows, that's not really an issue because MSVC link.exe enabled |
| 304 | the optimization by default. As long as your program works |
| 305 | with the linker's default settings, your program should be safe with ICF. |
| 306 | |
| 307 | On Unix, your program is generally not guaranteed to be safe with ICF, |
| 308 | although large programs happen to work correctly. |
| 309 | LLD works fine with ICF for example. |
James Y Knight | 94b9709 | 2019-02-01 17:06:41 | [diff] [blame] | 310 | |
| 311 | Other Info |
| 312 | ---------- |
| 313 | |
| 314 | .. toctree:: |
| 315 | :maxdepth: 1 |
| 316 | |
| 317 | missingkeyfunction |