//===----------------------------------------------------------------------===//
// C Language Family Front-end
//===----------------------------------------------------------------------===//
                                                             Chris Lattner

I. Introduction:

 clang: noun
    1. A loud, resonant, metallic sound.
    2. The strident call of a crane or goose.
    3. C-language family front-end toolkit.

 The world needs better compiler tools, tools which are built as libraries. This
 design point allows reuse of the tools in new and novel ways. However, building
 the tools as libraries isn't enough: they must have clean APIs, be as
 decoupled from each other as possible, and be easy to modify/extend. This
 requires clean layering, decent design, and avoiding tying the libraries to a
 specific use. Oh yeah, did I mention that we want the resultant libraries to
 be as fast as possible? :)

 This front-end is built as a component of the LLVM toolkit that can be used
 with the LLVM backend or independently of it. In this spirit, the API has been
 carefully designed as the following components:

   libsupport  - Basic support library, reused from LLVM.

   libsystem   - System abstraction library, reused from LLVM.

   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files. This depends on
                 libsupport and libsystem.

   libast      - Provides classes to represent the C AST, the C type system,
                 builtin functions, and various helpers for analyzing and
                 manipulating the AST (visitors, pretty printers, etc). This
                 library depends on libbasic.

   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
                 pragma handling, tokens, and macros. This depends on libbasic.

   libparse    - C (for now) parsing and local semantic analysis. This library
                 invokes coarse-grained 'Actions' provided by the client to do
                 the actual work (e.g. libsema builds ASTs). This depends on
                 liblex.

   libsema     - Provides a set of parser actions to build a standardized AST
                 for programs. ASTs are 'streamed' out a top-level declaration
                 at a time, allowing clients to use decl-at-a-time processing,
                 build up entire translation units, or even build 'whole
                 program' ASTs depending on how they use the APIs. This depends
                 on libast and libparse.

   librewrite  - Fast, scalable rewriting of source code. This operates on
                 the raw syntactic text of source code, allowing a client
                 to insert and delete text in very large source files using
                 the same source location information embedded in ASTs. This
                 is intended to be a low-level API that is useful for
                 higher-level clients and libraries such as code refactoring
                 tools.

   libanalysis - Source-level dataflow analysis useful for performing analyses
                 such as computing live variables. It also includes a
                 path-sensitive "graph-reachability" engine for writing
                 analyses that reason about different possible paths of
                 execution through source code. This is currently being
                 employed to write a set of checks for finding bugs in software.

   libcodegen  - Lowers the AST to LLVM IR for optimization & codegen. Depends
                 on libast.

   clang       - An example driver, client of the libraries at various levels.
                 This depends on all these libraries, and on LLVM VMCore.
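
 The libparse/libsema split can be pictured as a callback interface. The
 parser only recognizes syntax, and it hands each top-level declaration to
 client-provided actions as soon as that declaration is parsed
 (decl-at-a-time streaming). The sketch below is a toy built on that
 assumption. It is not clang's actual parser API. ToyActions, ToyParser,
 DeclRecorder and ActOnTopLevelDecl are hypothetical names, and the toy
 "language" is just `type name;` repeated.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical actions interface: the parser recognizes syntax and
// delegates the actual work to the client through these callbacks.
class ToyActions {
public:
  virtual ~ToyActions() = default;
  // Called once per top-level declaration, right after it is parsed.
  virtual void ActOnTopLevelDecl(const std::string &type,
                                 const std::string &name) = 0;
};

// Hypothetical toy parser for a tiny C subset: "type name;" repeated.
class ToyParser {
public:
  explicit ToyParser(ToyActions &actions) : actions_(actions) {}

  // Returns false on a syntax error; decls before the error were streamed.
  bool ParseTranslationUnit(const std::string &src) {
    src_ = src;
    pos_ = 0;
    while (true) {
      std::string type = NextWord();
      if (type.empty()) return AtEnd();  // only whitespace may remain
      std::string name = NextWord();
      if (name.empty() || !Consume(';')) return false;
      actions_.ActOnTopLevelDecl(type, name);  // stream the decl out now
    }
  }

private:
  void SkipSpace() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_])))
      ++pos_;
  }
  bool AtEnd() { SkipSpace(); return pos_ == src_.size(); }
  std::string NextWord() {
    SkipSpace();
    std::size_t start = pos_;
    while (pos_ < src_.size() &&
           (std::isalnum(static_cast<unsigned char>(src_[pos_])) ||
            src_[pos_] == '_'))
      ++pos_;
    return src_.substr(start, pos_ - start);
  }
  bool Consume(char c) {
    SkipSpace();
    if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
    return false;
  }

  ToyActions &actions_;
  std::string src_;
  std::size_t pos_ = 0;
};

// One possible client: it records declaration names, standing in for
// "building an AST".
class DeclRecorder : public ToyActions {
public:
  std::vector<std::string> names;
  void ActOnTopLevelDecl(const std::string &,
                         const std::string &name) override {
    names.push_back(name);
  }
};
```

 Replacing DeclRecorder with another ToyActions subclass (an indexer, say)
 changes what happens to each declaration without touching the parser.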

 This front-end has been intentionally built as a DAG of libraries, making it
 easy to reuse individual parts or replace pieces if desired. For example, to
 build a preprocessor, you take the Basic and Lexer libraries. If you want an
 indexer, you take those plus the Parser library and provide some actions for
 indexing. If you want a refactoring tool, a static analyzer, or a
 source-to-source compiler, it makes sense to take those plus the AST building
 and semantic analyzer library. Finally, if you want to use this with the LLVM
 backend, you'd take these components plus the AST to LLVM lowering code.
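
 The DAG can be made concrete with a small sketch. It encodes only the
 dependencies stated in the component list above and computes which
 libraries a tool pulls in, in dependency-first order. For example, starting
 from liblex yields the Basic and Lexer libraries plus their support
 libraries. kDeps, Visit and BuildOrder are hypothetical helpers written for
 this README; they are not part of clang's build system. librewrite and
 libanalysis are left out because their dependencies are not listed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Dependency edges exactly as stated in the component list.
const std::map<std::string, std::vector<std::string>> kDeps = {
    {"libsupport", {}},
    {"libsystem", {}},
    {"libbasic", {"libsupport", "libsystem"}},
    {"libast", {"libbasic"}},
    {"liblex", {"libbasic"}},
    {"libparse", {"liblex"}},
    {"libsema", {"libast", "libparse"}},
    {"libcodegen", {"libast"}},
};

// Depth-first post-order: each library is emitted after all of its
// dependencies. No cycle detection is needed because the graph is a DAG.
// kDeps.at() throws std::out_of_range for an unknown library name.
static void Visit(const std::string &lib, std::set<std::string> &seen,
                  std::vector<std::string> &order) {
  if (!seen.insert(lib).second) return;  // already emitted
  for (const std::string &dep : kDeps.at(lib)) Visit(dep, seen, order);
  order.push_back(lib);
}

// Transitive closure of `roots`, in a valid build order.
std::vector<std::string> BuildOrder(const std::vector<std::string> &roots) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  for (const std::string &root : roots) Visit(root, seen, order);
  return order;
}
```

 BuildOrder({"liblex"}) gives libsupport, libsystem, libbasic, liblex (the
 preprocessor case). Adding libcodegen next to libsema covers the LLVM
 backend case.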

 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, and a whole lot of other
 things.

 Finally, it should be pointed out that the goal here is to build something that
 is high-quality and industrial-strength: all the obnoxious features of the C
 family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
 prototypes, GCC/MS extensions, etc). It cannot be used if it is not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
   - Enable metric printing: -stats

 * -fsyntax-only is currently the default mode.

 * -E mode works the same way as in GCC.

 * -Eonly mode does all preprocessing, but does not print the output;
   this is useful for timing the preprocessor.

 * -fsyntax-only is currently partially implemented, lacking some
   semantic analysis (some errors and warnings are not produced).

 * -parse-noop parses code without building an AST. This is useful
   for timing the cost of the parser without including AST building
   time.

 * -parse-ast builds ASTs, but doesn't print them. This is most
   useful for timing AST building vs -parse-noop.

 * -parse-ast-print pretty prints most expression and statement nodes.

 * -parse-ast-check checks that diagnostic messages that are expected
   are reported and that those which are reported are expected.

 * -dump-cfg builds ASTs and then CFGs. CFGs are then pretty-printed.

 * -view-cfg builds ASTs and then CFGs. CFGs are then visualized by
   invoking Graphviz.

   For more information on getting Graphviz to work with clang/LLVM,
   see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
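
 The check done by -parse-ast-check is a two-way comparison: every expected
 diagnostic must be reported, and every reported diagnostic must be expected.
 A minimal sketch of that comparison follows. The `expected-error {{...}}`
 comment marker, the Diag type and the ParseExpectations/CheckDiagnostics
 functions are assumptions made for illustration. They do not describe the
 driver's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a diagnostic reduced to (line number, message).
using Diag = std::pair<unsigned, std::string>;

// Collects expectations written as `expected-error {{message}}` in the
// source. This sketch handles at most one marker per line.
std::vector<Diag> ParseExpectations(const std::string &source) {
  const std::string marker = "expected-error {{";
  std::vector<Diag> expected;
  std::istringstream in(source);
  std::string line;
  for (unsigned lineNo = 1; std::getline(in, line); ++lineNo) {
    std::size_t start = line.find(marker);
    if (start == std::string::npos) continue;
    start += marker.size();
    std::size_t end = line.find("}}", start);
    if (end == std::string::npos) continue;  // malformed marker: ignored
    expected.push_back({lineNo, line.substr(start, end - start)});
  }
  return expected;
}

// Counts mismatches in both directions (expected but missing, plus
// reported but unexpected). Zero means the check passes.
unsigned CheckDiagnostics(const std::vector<Diag> &expected,
                          const std::vector<Diag> &reported) {
  std::map<Diag, int> balance;  // +1 per expectation, -1 per report
  for (const Diag &d : expected) ++balance[d];
  for (const Diag &d : reported) --balance[d];
  unsigned mismatches = 0;
  for (const auto &entry : balance)
    mismatches += static_cast<unsigned>(entry.second < 0 ? -entry.second
                                                         : entry.second);
  return mismatches;
}
```

 The signed count acts as a multiset difference, so a diagnostic that is
 expected twice but reported once is flagged as one mismatch.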


III. Current advantages over GCC:

 * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
 * All diagnostics have column numbers, include 'caret diagnostics', and they
   highlight regions of interesting code (e.g. the LHS and RHS of a binop).
 * Full diagnostic customization by client (can format diagnostics however they
   like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All supported languages are linked into the same library (no cc1,
   cc1obj, ...).
 * mmaps source files read-only, so it does not dirty the pages like GCC does
   (smaller memory footprint).
 * LLVM License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic. Diagnostics are identified by ID.
 * Significantly faster than GCC at semantic analysis, parsing, preprocessing
   and lexing.
 * Defers exposing platform-specific stuff to as late as possible, tracks use of
   platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
 * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
   does not categorize identifiers as types or variables -- this is up to the
   parser to decide.
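
 Without the lexer hack, the lexer emits plain identifier tokens, and the
 parser resolves ambiguities such as `T * x;` using scope information it owns
 itself. The following is a minimal sketch of that division of labor. The
 token kinds, Lex, ClassifyStarStatement, and the use of a plain set of
 typedef names in place of a real scope chain are all hypothetical
 simplifications.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// There is no "type name" token kind: the lexer has no notion of scope,
// so every name is just an identifier.
enum class TokKind { Identifier, Star, Semi, Other };

struct Token {
  TokKind kind;
  std::string text;
};

std::vector<Token> Lex(const std::string &src) {
  std::vector<Token> toks;
  for (std::size_t i = 0; i < src.size();) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) { ++i; continue; }
    if (std::isalpha(c) || c == '_') {
      std::size_t start = i;
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) ||
              src[i] == '_'))
        ++i;
      toks.push_back({TokKind::Identifier, src.substr(start, i - start)});
      continue;
    }
    TokKind k = c == '*' ? TokKind::Star
              : c == ';' ? TokKind::Semi
                         : TokKind::Other;
    toks.push_back({k, std::string(1, src[i])});
    ++i;
  }
  return toks;
}

enum class StmtKind { PointerDecl, Multiply, Unknown };

// The parser consults its own scope data (here just a set of typedef
// names) to decide whether "A * B ;" declares a pointer or multiplies.
StmtKind ClassifyStarStatement(const std::vector<Token> &toks,
                               const std::set<std::string> &typedefNames) {
  if (toks.size() != 4 || toks[0].kind != TokKind::Identifier ||
      toks[1].kind != TokKind::Star || toks[2].kind != TokKind::Identifier ||
      toks[3].kind != TokKind::Semi)
    return StmtKind::Unknown;
  return typedefNames.count(toks[0].text) ? StmtKind::PointerDecl
                                          : StmtKind::Multiply;
}
```

 The same token stream for `T * x;` is classified differently depending
 only on whether the parser knows T as a typedef name. The lexer never has
 to be told about scopes.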

Potential Future Features:

 * Fine-grained diagnostic control within the source (#pragma enable/disable
   warning).
 * Better token tracking within macros? (Token came from this line, which is
   a macro argument instantiated here, recursively instantiated here).
 * Fast #import with a module system.
 * Dependency tracking: a change to a header file doesn't recompile every
   function that textually depends on it; only the functions that need it are
   recompiled. This is also known as 'incremental parsing'.


IV. Missing Functionality / Improvements:

Lexer:
 * Source character mapping. GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support. Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * #assert/#unassert
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Add support for -M*

Traditional Preprocessor:
 * Currently, we have none. :)
