//===----------------------------------------------------------------------===//
// C Language Family Front-end
//===----------------------------------------------------------------------===//
                                                             Chris Lattner

I. Introduction:

 clang: noun
    1. A loud, resonant, metallic sound.
    2. The strident call of a crane or goose.
    3. C-language front-end toolkit.

 The world needs better compiler tools, tools which are built as libraries. This
 design point allows reuse of the tools in new and novel ways. However, building
 the tools as libraries isn't enough: they must have clean APIs, be as
 decoupled from each other as possible, and be easy to modify/extend. This
 requires clean layering, decent design, and avoiding tying the libraries to a
 specific use. Oh yeah, did I mention that we want the resultant libraries to
 be as fast as possible? :)

 This front-end is built as a component of the LLVM toolkit (which really really
 needs a better name) that can be used with the LLVM backend or independently of
 it. In this spirit, the API has been carefully designed to include the
 following components:

   libsupport  - Basic support library, reused from LLVM.
   libsystem   - System abstraction library, reused from LLVM.
   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files.
   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
                 pragma handling, tokens, and macros.
   libparse    - C99 (for now) parsing and local semantic analysis. This library
                 invokes coarse-grained 'Actions' provided by the client to do
                 stuff (great idea shamelessly stolen from Devkit). ObjC/C90
                 need to be added soon; K&R C and C++ can be added in the
                 future, but are not a high priority.
   libast      - Provides a set of parser actions to build a standardized AST
                 for programs. The AST can be built in two modes: a streamlined
                 mode and a 'complete' mode, which captures *full* location
                 info for every token in the AST. ASTs are 'streamed' out a
                 top-level declaration at a time, allowing clients to use
                 decl-at-a-time processing, build up entire translation units,
                 or even build 'whole program' ASTs depending on how they use
                 the APIs.
   libast2llvm - [Planned] Lower the AST to LLVM IR for optimization & codegen.
   clang       - An example client of the libraries at various levels.

 This front-end has been intentionally built as a stack, making it trivial
 to replace anything below a particular point. For example, if you want a
 preprocessor, you take the Basic and Lexer libraries. If you want an indexer,
 you take those plus the Parser library and provide some actions for indexing.
 If you want a refactoring, static analysis, or source-to-source compiler tool,
 it makes sense to take those plus the AST building library. Finally, if you
 want to use this with the LLVM backend, you'd take these components plus the
 AST to LLVM lowering code.

 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, AST pretty printing
 support, and a whole lot of other things.

 Finally, it should be pointed out that the goal here is to build something that
 is high-quality and industrial-strength: all the obnoxious features of the C
 family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
 prototypes, GCC/MS extensions, etc). It cannot be used if it's not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - Make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
   - Enable metric printing: -stats

 * -parse-noop is the default mode.

 * -E mode gives output nearly identical to GCC, though not all bugs in
   whitespace calculation have been emulated.

 * -parse-print-callbacks doesn't print all callbacks yet.

 * -parse-print-ast isn't complete: it currently prints decls and stuff nested
   in parens. This will improve as more AST nodes are implemented.

 * -fsyntax-only is currently identical to -parse-noop.

III. Current advantages over GCC:

 * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
 * All diagnostics have column numbers, including 'caret diagnostics'.
 * Full diagnostic customization by client (can format diagnostics however they
   like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All supported languages are linked into the same library (no cc1, cc1obj,
   ...).
 * mmap's source code read-only, so it does not dirty the pages like GCC does
   (mem footprint).
 * BSD License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic.
 * Faster than GCC at parsing, lexing, and preprocessing.

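 As a rough illustration of client-controlled diagnostic formatting, the C++
 sketch below renders the same diagnostic two ways: with column and caret, and
 in a terser file:line form. This README does not show the actual
 DiagnosticClient interface, so DiagnosticSketch, DiagnosticClientSketch,
 CaretClient, TerseClient, and render are all made-up names, and the output
 formats are only an assumption about what a client might choose to print.

```cpp
#include <cassert>
#include <string>

// Plain data describing one diagnostic. Field names are illustrative only.
struct DiagnosticSketch {
  std::string file;
  unsigned line;
  unsigned column;         // 1-based column of the problem
  std::string message;
  std::string sourceLine;  // text of the offending source line
};

// Hypothetical client interface: the front-end hands over the data and the
// client decides how (or whether) to present it.
struct DiagnosticClientSketch {
  virtual ~DiagnosticClientSketch() = default;
  virtual std::string render(const DiagnosticSketch &diag) const = 0;
};

// Prints file:line:column, the source line, and a caret under the column.
struct CaretClient : DiagnosticClientSketch {
  std::string render(const DiagnosticSketch &diag) const override {
    assert(diag.column >= 1 && "columns are 1-based in this sketch");
    std::string out = diag.file + ":" + std::to_string(diag.line) + ":" +
                      std::to_string(diag.column) + ": " + diag.message + "\n";
    out += diag.sourceLine + "\n";
    out += std::string(diag.column - 1, ' ') + "^\n";
    return out;
  }
};

// Prints only file:line and the message, with no column and no caret.
struct TerseClient : DiagnosticClientSketch {
  std::string render(const DiagnosticSketch &diag) const override {
    return diag.file + ":" + std::to_string(diag.line) + ": " + diag.message +
           "\n";
  }
};
```

 An IDE or refactoring tool could supply a third client that never produces
 text at all and instead stores the file, line, and column for later use.
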
Future Features:

 * Fine grained diag control within the source (#pragma enable/disable warning).
 * Faster than GCC at AST generation [measure when complete].
 * Better token tracking within macros? (Token came from this line, which is
   a macro argument instantiated here, recursively instantiated here).
 * Fast #import!
 * Dependency tracking: change to header file doesn't recompile every function
   that textually depends on it: recompile only those functions that need it.
 * Defers exposing platform-specific stuff to as late as possible, tracks use of
   platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.


IV. Missing Functionality / Improvements

clang driver:
 * Predefined macros/search paths are hard-coded into the driver.

File Manager:
 * We currently do a lot of stat'ing for files that don't exist, particularly
   when lots of -I paths exist (e.g. see the <iostream> example, check for
   failures in stat in FileManager::getFile). It would be far better to make
   the following changes:
   1. FileEntry contains a sys::Path instead of a std::string for Name.
   2. sys::Path contains timestamp and size, lazily computed. Eliminate these
      from FileEntry.
   3. File UIDs are created on request, not when files are opened.
   These changes make it possible to efficiently have FileEntry objects for
   files that exist on the file system, but have not been used yet.

   Once this is done:
   1. DirectoryEntry gets a boolean value "has read entries". When false, not
      all entries in the directory are in the file mgr; when true, they are.
   2. Instead of stat'ing the file in FileManager::getFile, check to see if
      the dir has been read. If so, fail immediately; if not, read the dir,
      then retry.
   3. Reading the dir uses the getdirentries syscall, creating a FileEntry
      for all files found.

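 The "has read entries" scheme above can be sketched in a few lines of C++.
 This is only a model of the idea, not the FileManager code: FileManagerSketch,
 DirectoryEntrySketch, DirReader, and dirReads are hypothetical names, the
 injected DirReader callback stands in for the getdirentries syscall, and files
 are tracked by name only rather than as FileEntry objects.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-directory record: the flag says whether 'files' is a
// complete listing of the directory.
struct DirectoryEntrySketch {
  bool hasReadEntries = false;
  std::set<std::string> files;
};

class FileManagerSketch {
public:
  // Stand-in for reading a directory; returns the file names it contains.
  using DirReader =
      std::function<std::vector<std::string>(const std::string &)>;

  explicit FileManagerSketch(DirReader reader) : readDir(std::move(reader)) {
    assert(readDir && "a directory reader is required");
  }

  // Returns true if 'name' exists in 'dir'. The directory is read at most
  // once; later misses fail immediately without touching the file system.
  bool getFile(const std::string &dir, const std::string &name) {
    DirectoryEntrySketch &entry = dirs[dir];
    if (!entry.hasReadEntries) {
      for (const std::string &file : readDir(dir))
        entry.files.insert(file);
      entry.hasReadEntries = true;
      ++dirReads;
    }
    return entry.files.count(name) != 0;
  }

  unsigned dirReads = 0;  // number of times a directory was actually read

private:
  DirReader readDir;
  std::map<std::string, DirectoryEntrySketch> dirs;
};
```

 With many -I paths, a header that lives in the last directory costs one
 directory read per search path the first time around, and every later miss in
 those directories is answered from memory instead of by a failed stat.
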
Lexer:
 * Source character mapping. GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support. Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * Know enough about darwin filesystem to search frameworks.
 * #assert/#unassert
 * #line / #file directives
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Consider merging the parser's expression parser into the preprocessor to
   eliminate duplicate code.

Traditional Preprocessor:
 * All.

Parser:
 * C90/K&R modes. Need to get a copy of the C90 spec.
 * __extension__, __attribute__ [currently just skipped and ignored].
 * A lot of semantic analysis is missing.

Parser Actions:
 * All that are missing.
 * Would like to either lazily resolve types [refactoring] or aggressively
   resolve them [c compiler]. Need to know whether something is a type or not
   to compile, but don't need to know what it is.
 * Implement a little devkit-style "indexer".

AST Builder:
 * Implement more nodes as actions are available.
 * Types.
 * Allow the AST Builder to be subclassed. This will allow clients to extend it
   and create their own specialized nodes for specific scenarios. Maybe the
   "full loc info" use case is just one extension.

Fast #Import:
 * All.
 * Get frameworks that don't use #import to do so, e.g.
   DirectoryService, AudioToolbox, CoreFoundation, etc. Why aren't they using
   #import? Because they work in C mode? C has #import.
 * Have the lexer return a token for #import instead of handling it itself.
   - Create a new preprocessor object with no external state (no -D/U options
     from the command line, etc). Alternatively, keep track of exactly which
     external state is used by a #import: declare it somehow.
 * When reading a #import file, keep track of whether we have seen any
   "configuration" macros (and/or which ones). Various cases:
   - Uses of target args (__POWERPC__, __i386): Header has to be parsed
     multiple times, per-target. What about #ifndef checks? How do we know?
   - "Configuration" preprocessor macros not defined: POWERPC, etc. What about
     things like __STDC__ etc? What is and what isn't allowed.
 * Special handling for "umbrella" headers, which just contain #import stmts:
   - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests
     themselves? Foundation.h isn't pure umbrella!
 * Frameworks digests:
   - Can put "digest" of a framework-worth of headers into the framework
     itself. To open AppKit, just mmap
     /System/Library/Frameworks/AppKit.framework/"digest", which provides a
     symbol table in a well defined format. Lazily unstream stuff that is
     needed. Contains declarations, macros, and debug information.
   - System frameworks ship with digests. How do we handle configuration
     information? How do we handle stuff like:
       #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2
     which guards a bunch of decls? Should there be a couple of default
     configs, then have the UI fall back to building/caching its own?
   - GUI automatically builds digests when UI is idle, both of system
     frameworks if they aren't available in the right config, and of app
     frameworks.
   - GUI builds dependence graph of frameworks/digests based on #imports. If a
     digest is out of date, dependent digests are automatically invalidated.

 * New constraints on #import for objc-v3:
   - #imported file must not define non-inline function bodies.
     - Alternatively, they can, and these bodies get compiled/linked *once*
       per app into a dylib. What about building user dylibs?
   - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn.
     - Compiler must detect and reject these cases.
   - #defines defined within a #import have two behaviors:
     - By default, they escape the header. These macros *cannot* be #undef'd
       by other code: this is enforced by the front-end.
     - Optionally, user can specify what macros escape (whitelist) or can use
       #undef.

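 The digest dependence graph mentioned above (a digest going out of date
 invalidates the digests that depend on it) could look something like the C++
 sketch below. Nothing like this exists yet, so DigestGraphSketch, addImport,
 invalidate, and isValid are invented for illustration, and digests are
 reduced to plain names with a valid/invalid state.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each digest is just a name, and an edge records that
// one digest was built from a header that #imports another.
class DigestGraphSketch {
public:
  // Record that 'importer' #imports 'imported'; both are marked valid.
  void addImport(const std::string &importer, const std::string &imported) {
    assert(importer != imported && "self-imports are not modeled here");
    dependents[imported].insert(importer);
    valid.insert(importer);
    valid.insert(imported);
  }

  // Mark 'name' out of date and invalidate everything that depends on it,
  // directly or transitively.
  void invalidate(const std::string &name) {
    std::vector<std::string> worklist{name};
    while (!worklist.empty()) {
      std::string current = worklist.back();
      worklist.pop_back();
      if (valid.erase(current) == 0)
        continue;  // unknown or already invalidated; this also stops cycles
      auto it = dependents.find(current);
      if (it == dependents.end())
        continue;
      for (const std::string &dep : it->second)
        worklist.push_back(dep);
    }
  }

  bool isValid(const std::string &name) const {
    return valid.count(name) != 0;
  }

private:
  std::map<std::string, std::set<std::string>> dependents;
  std::set<std::string> valid;
};
```

 For example, if Cocoa imports AppKit and AppKit imports Foundation,
 invalidating AppKit also invalidates Cocoa but leaves Foundation alone. The
 framework names are only used here as sample data.
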
New language feature: Configuration queries:
  - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or
    some other, better, syntax.
  - Use it to increase the number of "architecture-clean" #import'd files,
    allowing a single index to be used for all fat slices.

Cocoa GUI Front-end:
 * All.
 * Start with very simple "textedit" GUI.
 * Trivial project model: list of files, list of cmd line options.
 * Build simple developer examples.
 * Tight integration with compiler components.
 * Primary advantage: batch compiles, keeping digests in memory, dependency mgmt
   between app frameworks, building code/digests in the background, etc.
 * Interesting idea: http://nickgravgaard.com/elastictabstops/
