<!-- $Id: EXTENDING.html 198 2002-09-04 01:17:32Z darren $ -->
<html>
<head>
<title>Exuberant Ctags: Adding support for a new language</title>
</head>
<body>

<h1>How to Add Support for a New Language to Exuberant Ctags</h1>

<p>
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own
custom language parser. As an exercise, let us assume that I want to add
support for my new language, <em>Swine</em>, the successor to Perl (i.e. Perl
before Swine &lt;wince&gt;). This language consists of simple definitions of
labels in the form "<code>def my_label</code>". Let us now examine the various
ways to do this.
</p>

<h2>Operational background</h2>

<p>
As ctags considers each file name, it tries to determine the language of the
file by applying the following three tests in order: whether the file extension
has been mapped to a language, whether the file name matches a shell pattern
mapped to a language, and finally whether the file is executable and its first
line specifies an interpreter using the Unix-style "#!" specification (if
supported on the platform). If a language is identified, the file is opened
and the appropriate language parser is called to operate on the currently
open file. The parser works through the file and, whenever it finds an
interesting token, calls a function to define a tag entry.
</p>
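
<p>
The order of the three tests can be illustrated with a small stand-alone
sketch. It is not the code ctags uses: the <code>langMap</code> table, its
entries, and the function <code>guessLanguage()</code> are hypothetical names
invented for this illustration, the "swine" interpreter name is assumed, and
the check that the file is executable is left out.
</p>
<pre>
#include &lt;ctype.h&gt;
#include &lt;fnmatch.h&gt;
#include &lt;string.h&gt;

/* Hypothetical mapping table; ctags keeps its own per-language maps. */
typedef struct {
    const char *language;
    const char *extension;      /* without the leading dot, or NULL */
    const char *pattern;        /* shell pattern for the file name, or NULL */
    const char *interpreter;    /* program named after "#!", or NULL */
} langMap;

static const langMap Maps [] = {
    { "Swine",    "swn", NULL,          "swine" },
    { "Makefile", "mak", "[Mm]akefile", NULL    }
};

/* Returns the language name, or NULL if none of the three tests succeeds. */
static const char *guessLanguage (const char *fileName, const char *firstLine)
{
    const size_t count = sizeof Maps / sizeof Maps [0];
    const char *base = strrchr (fileName, '/');
    const char *ext;
    size_t i;

    base = (base != NULL) ? base + 1 : fileName;
    ext = strrchr (base, '.');

    /* Test 1: the file extension */
    if (ext != NULL)
        for (i = 0 ; i &lt; count ; ++i)
            if (Maps [i].extension != NULL &amp;&amp;
                strcmp (ext + 1, Maps [i].extension) == 0)
                return Maps [i].language;

    /* Test 2: a shell pattern applied to the file name */
    for (i = 0 ; i &lt; count ; ++i)
        if (Maps [i].pattern != NULL &amp;&amp;
            fnmatch (Maps [i].pattern, base, 0) == 0)
            return Maps [i].language;

    /* Test 3: the interpreter named on a "#!" first line */
    if (firstLine != NULL &amp;&amp; strncmp (firstLine, "#!", 2) == 0)
    {
        const char *p = firstLine + 2;
        const char *name;
        size_t length;

        while (*p == ' ' || *p == '\t')
            ++p;
        name = p;
        for ( ; *p != '\0' &amp;&amp; !isspace ((unsigned char) *p) ; ++p)
            if (*p == '/')
                name = p + 1;   /* keep only the last path component */
        length = (size_t) (p - name);
        for (i = 0 ; i &lt; count ; ++i)
            if (Maps [i].interpreter != NULL &amp;&amp;
                strlen (Maps [i].interpreter) == length &amp;&amp;
                strncmp (name, Maps [i].interpreter, length) == 0)
                return Maps [i].language;
    }
    return NULL;
}
</pre>
<p>
With this table, <code>guessLanguage ("src/pen.swn", NULL)</code> is decided by
the extension test, <code>guessLanguage ("Makefile", NULL)</code> by the
pattern test, and a file named <code>oink</code> whose first line is
<code>#!/usr/bin/swine -w</code> by the interpreter test.
</p>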

<h2>Creating a user-defined language</h2>

<p>
The quickest and easiest way to do this is by defining a new language using
the program options. In order to have Swine support available every time I
start ctags, I will place the following lines into the file
<code>$HOME/.ctags</code>, which is read every time ctags starts:

<code>
<pre>
--langdef=swine
--langmap=swine:.swn
--regex-swine=/^def[ \t]+([a-zA-Z0-9_]+)/\1/d,definition/
</pre>
</code>
The first line defines the new language, the second maps a file extension to
it, and the third defines a regular expression that identifies a Swine
definition and generates a tag file entry for it. In that third line,
<code>\1</code> names the tag after the first parenthesized group, and
<code>d,definition</code> gives the one-letter kind and its long name. The
pattern requires at least one blank after <code>def</code>, so that a line
such as <code>default_x</code> does not produce a tag.
</p>
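
<p>
To see what the regular expression on the third option line matches, a pattern
of the same shape can be tried outside ctags. The following is a minimal
sketch, assuming POSIX extended regular expression syntax as provided by
<code>&lt;regex.h&gt;</code>. The helper <code>swineTagName()</code> is a
hypothetical name used only here and is not part of ctags, and the pattern is
written with <code>[ \t]+</code> so that at least one blank must follow
<code>def</code>.
</p>
<pre>
#include &lt;regex.h&gt;
#include &lt;string.h&gt;

/* Copies the text captured by group 1 into `name' (truncating if needed).
 * Returns 1 if the line matches the Swine definition pattern, 0 otherwise.
 */
static int swineTagName (const char *line, char *name, size_t size)
{
    regex_t pattern;
    regmatch_t groups [2];      /* [0] whole match, [1] first group */
    int matched = 0;

    if (size == 0)
        return 0;
    if (regcomp (&amp;pattern, "^def[ \t]+([a-zA-Z0-9_]+)", REG_EXTENDED) != 0)
        return 0;
    if (regexec (&amp;pattern, line, 2, groups, 0) == 0)
    {
        size_t length = (size_t) (groups [1].rm_eo - groups [1].rm_so);
        if (length &gt;= size)
            length = size - 1;
        memcpy (name, line + groups [1].rm_so, length);
        name [length] = '\0';
        matched = 1;
    }
    regfree (&amp;pattern);
    return matched;
}
</pre>
<p>
Called with the line <code>def my_label</code>, the helper reports a match and
stores <code>my_label</code>, which corresponds to the <code>\1</code> in the
option. A line that does not begin with <code>def</code> followed by a blank
is rejected.
</p>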

<h2>Integrating a new language parser</h2>

<p>
Now suppose that I want to truly integrate compiled-in support for Swine into
ctags. First, I create a new module, <code>swine.c</code>, and add one
externally visible function to it, <code>extern parserDefinition
*SwineParser(void)</code>, and add its name to the table in
<code>parsers.h</code>. The job of this parser definition function is to
create an instance of the <code>parserDefinition</code> structure (using
<code>parserNew()</code>) and populate it with information defining how files
of this language are recognized, what kinds of tags it can locate, and the
function used to invoke the parser on the currently open file.
</p>

<p>
The structure <code>parserDefinition</code> allows assignment of the following
fields:

<code>
<pre>
const char *name;               /* name of language */
kindOption *kinds;              /* tag kinds handled by parser */
unsigned int kindCount;         /* size of `kinds' list */
const char *const *extensions;  /* list of default extensions */
const char *const *patterns;    /* list of default file name patterns */
parserInitialize initialize;    /* initialization routine, if needed */
simpleParser parser;            /* simple parser (common case) */
rescanParser parser2;           /* rescanning parser (unusual case) */
boolean regex;                  /* is this a regex parser? */
</pre>
</code>
</p>

<p>
The <code>name</code> field must be set to a non-empty string. Also, unless
<code>regex</code> is set true (see below), either <code>parser</code> or
<code>parser2</code> must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.
</p>

<p>
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented <code>fileGetc()</code>, or the line-oriented
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser
can put back a character using <code>fileUngetc()</code>. How our Swine parser
actually parses the contents of the file is entirely up to the writer of the
parser; it can be as crude or elegant as desired. The existing parsers offer a
variety of examples, from the most complex (c.c) to the simplest (make.c).
</p>

<p>
When the Swine parser identifies an interesting token for which it wants to
add a tag to the tag file, it should create a <code>tagEntryInfo</code>
structure and initialize it by calling <code>initTagEntry()</code>, which
initializes defaults and fills information about the current line number and
the file position of the beginning of the line. After filling in information
defining the current entry (and possibly overriding the file position or other
defaults), the parser passes this structure to <code>makeTagEntry()</code>.
</p>

<p>
Instead of writing a character-oriented parser, it may be possible to specify
regular expressions which define the tags. In this case, instead of defining a
parsing function, <code>SwineParser()</code> sets <code>regex</code> to true
and points <code>initialize</code> to a function which calls
<code>addTagRegex()</code> to install the regular expressions which define its
tags. The regular expressions thus installed are compared against each line
of the input file and generate a specified tag when matched. It is usually
much easier to write a regex-based parser, although such parsers can be slower
(one parser example was 4 times slower). Whether the speed difference matters
to you depends upon how much code you have to parse. It is probably a good
strategy to implement a regex-based parser first, and if it is too slow for
you, then invest the time and effort to write a character-based parser.
</p>

<p>
A regex-based parser is inherently line-oriented (i.e. the entire tag must be
recognizable from looking at a single line) and context-insensitive (i.e. the
generation of the tag is based entirely upon whether the regular expression
matches a single line). However, a regex-based callback mechanism is also
available, installed via the function <code>addCallbackRegex()</code>. This
allows a specified function to be invoked whenever a specific regular
expression is matched, which in turn allows a character-oriented parser to
operate based upon the context of what happened on a previous line (e.g. the
start or end of a multi-line comment). Note that regex callbacks are called
just before the first character of that line is read via either
<code>fileGetc()</code> or <code>fileReadLine()</code>. The effect of this is
that before either of these routines returns, a callback routine may be
invoked because the line matched a regex callback. A callback function to be
installed is defined by these types:

<code>
<pre>
typedef struct {
    size_t start;   /* character index in line where match starts */
    size_t length;  /* length of match */
} regexMatch;

typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);
</pre>
</code>
</p>

<p>
The callback function is passed the line matching the regular expression and
an array of <code>count</code> structures defining the subexpression matches
of the regular expression, starting from \0 (the text matched by the entire
expression).
</p>
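
<p>
How a callback reads the <code>start</code> and <code>length</code> fields can
be tried in isolation. The sketch below is only an illustration under stated
assumptions: it repeats the two types locally so that it compiles without the
ctags headers, and <code>dispatchLine()</code>, <code>rememberName()</code>,
<code>LastName</code> and <code>MAX_GROUPS</code> are hypothetical names that
stand in for what ctags does internally. It uses the POSIX functions from
<code>&lt;regex.h&gt;</code> to produce the match positions.
</p>
<pre>
#include &lt;regex.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Local copies of the callback types so the sketch stands alone. */
typedef struct {
    size_t start;   /* character index in line where match starts */
    size_t length;  /* length of match */
} regexMatch;

typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);

#define MAX_GROUPS 10           /* hypothetical limit on subexpressions */

static char LastName [64];      /* filled in by rememberName() */

/* Example callback: keeps a copy of subexpression \1. */
static void rememberName (const char *line, const regexMatch *matches, unsigned int count)
{
    if (count &gt; 1)
    {
        size_t length = matches [1].length;
        if (length &gt;= sizeof LastName)
            length = sizeof LastName - 1;
        memcpy (LastName, line + matches [1].start, length);
        LastName [length] = '\0';
    }
}

/* Hypothetical dispatcher: matches one line and, on success, converts the
 * POSIX offsets to start/length pairs before invoking the callback.
 * Returns 1 if the callback was invoked, 0 otherwise.
 */
static int dispatchLine (const char *regex, const char *line, regexCallback callback)
{
    regex_t pattern;
    regmatch_t pmatch [MAX_GROUPS];
    regexMatch matches [MAX_GROUPS];
    unsigned int count = 0;
    int invoked = 0;

    if (regcomp (&amp;pattern, regex, REG_EXTENDED) != 0)
        return 0;
    if (regexec (&amp;pattern, line, MAX_GROUPS, pmatch, 0) == 0)
    {
        while (count &lt; MAX_GROUPS &amp;&amp; pmatch [count].rm_so != -1)
        {
            matches [count].start = (size_t) pmatch [count].rm_so;
            matches [count].length = (size_t) (pmatch [count].rm_eo - pmatch [count].rm_so);
            ++count;
        }
        callback (line, matches, count);
        invoked = 1;
    }
    regfree (&amp;pattern);
    return invoked;
}
</pre>
<p>
For the line <code>def my_label # comment</code> and the pattern
<code>^def[ \t]+([a-zA-Z0-9_]+)</code>, the callback receives two entries:
entry 0 covers <code>def my_label</code> and entry 1 covers
<code>my_label</code>.
</p>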

<p>
Lastly, be sure to add the name of the file containing your parser (e.g.
swine.c) to the macro <code>SOURCES</code> in the file <code>source.mak</code>
and an entry for the object file to the macro <code>OBJECTS</code> in the same
file, so that your new module will be compiled into the program.
</p>

<p>
This is all there is to it. All other details are specific to the parser and
how it wants to do its job. There are some support functions which can take
care of commonly needed parsing tasks, such as keyword table lookups (see
keyword.c), which you can make use of if desired (examples of their use can be
found in c.c, eiffel.c, and fortran.c). Almost everything is already taken care
of automatically for you by the infrastructure. Writing the actual parsing
algorithm is the hardest part, but it is not constrained by any need to conform
to anything in ctags other than that mentioned above.
</p>

<p>
There are several different approaches used in the parsers inside <b>Exuberant
Ctags</b> and you can browse through these as examples of how to go about
creating your own.
</p>

<h2>Examples</h2>

<p>
Below you will find several example parsers demonstrating most of the
facilities available. These include three alternative implementations
of a Swine parser, which generate tags for lines beginning with
"<CODE>def</CODE>" followed by some name.
</p>

<code>
<pre>
/***************************************************************************
 * swine.c
 * Character-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include &lt;string.h&gt;     /* to declare strxxx() functions */
#include &lt;ctype.h&gt;      /* to define isxxx() macros */

#include "parse.h"      /* always include */
#include "read.h"       /* to define fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void findSwineTags (void)
{
    vString *name = vStringNew ();
    const unsigned char *line;

    while ((line = fileReadLine ()) != NULL)
    {
        /* Look for a line beginning with "def" followed by name */
        if (strncmp ((const char*) line, "def", (size_t) 3) == 0  &amp;&amp;
            isspace ((int) line [3]))
        {
            const unsigned char *cp = line + 4;
            while (isspace ((int) *cp))
                ++cp;
            while (isalnum ((int) *cp)  ||  *cp == '_')
            {
                vStringPut (name, (int) *cp);
                ++cp;
            }
            vStringTerminate (name);
            makeSimpleTag (name, SwineKinds, K_DEFINE);
            vStringClear (name);
        }
    }
    vStringDelete (name);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    return def;
}
</pre>
</code>

<pre>
<code>
/***************************************************************************
 * swine.c
 * Regex-based parser for Swine
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installSwineRegex (const langType language)
{
    /* tag the name captured by \1 as kind 'd' (definition) */
    addTagRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", "\\1", "d,definition", NULL);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;extensions = extensions;
    def-&gt;initialize = installSwineRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</code>
</pre>

<pre>
/***************************************************************************
 * swine.c
 * Regex callback-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include "parse.h"      /* always include */
#include "read.h"       /* to define fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void definition (const char *const line, const regexMatch *const matches,
                        const unsigned int count)
{
    if (count &gt; 1)    /* should always be true per regex */
    {
        vString *const name = vStringNew ();
        vStringNCopyS (name, line + matches [1].start, matches [1].length);
        makeSimpleTag (name, SwineKinds, K_DEFINE);
        vStringDelete (name);    /* release the temporary string */
    }
}

static void findSwineTags (void)
{
    while (fileReadLine () != NULL)
        ;  /* don't need to do anything here since callback is sufficient */
}

static void installSwine (const langType language)
{
    addCallbackRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", NULL, definition);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    def-&gt;initialize = installSwine;
    return def;
}
</pre>

<pre>
/***************************************************************************
 * make.c
 * Regex-based parser for makefile macros
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installMakefileRegex (const langType language)
{
    addTagRegex (language, "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=", "\\2", "m,macro", "i");
}

/* Create parser definition structure */
extern parserDefinition* MakefileParser (void)
{
    static const char *const patterns [] = { "[Mm]akefile", NULL };
    static const char *const extensions [] = { "mak", NULL };
    parserDefinition* const def = parserNew ("Makefile");
    def-&gt;patterns   = patterns;
    def-&gt;extensions = extensions;
    def-&gt;initialize = installMakefileRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</pre>

</body>
</html>