;;; semantic/lex.el --- Lexical Analyzer builder

;; Copyright (C) 1999-2011 Free Software Foundation, Inc.

;; Author: Eric M. Ludlam <zappo@gnu.org>

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
;;; Commentary:

;; This file handles the creation of lexical analyzers for different
;; languages in Emacs Lisp.  The purpose of a lexical analyzer is to
;; convert a buffer into a list of lexical tokens.  Each token
;; contains the token class (such as 'number, 'symbol, 'IF, etc) and
;; the location in the buffer it was found.  Optionally, a token also
;; contains a string representing what is at the designated buffer
;; location.

;; Tokens are pushed onto a token stream, which is basically a list of
;; all the lexical tokens from the analyzed region.  The token stream
;; is then handed to the grammar which parses the file.

;;; How it works

;; Each analyzer specifies a condition and forms.  These conditions
;; and forms are assembled into a function by `define-lex' that does
;; the lexical analysis.

;; In the lexical analyzer created with `define-lex', each condition
;; is tested for a given point.  When the condition is true, the forms
;; run.

;; The forms can push a lexical token onto the token stream.  The
;; analyzer forms also must move the current analyzer point.  If the
;; analyzer point is moved without pushing a token, then the matched
;; syntax is effectively ignored, or skipped.

;; Thus, starting at the beginning of a region to be analyzed, each
;; condition is tested.  One will match, and a lexical token might be
;; pushed, and the point is moved to the end of the lexical token
;; identified.  At the new position, the process occurs again until
;; the end of the specified region is reached.
;;; How to use semantic-lex

;; To create a lexer for a language, use the `define-lex' macro.

;; The `define-lex' macro accepts a list of lexical analyzers.  Each
;; analyzer is created with `define-lex-analyzer', or one of the
;; derivative macros.  A single analyzer defines a regular expression
;; to match text in a buffer, and a short segment of code to create
;; one lexical token.

;; Each analyzer has a NAME, DOC, a CONDITION, and possibly some
;; FORMS.  The NAME is the name used in `define-lex'.  The DOC
;; describes what the analyzer should do.

;; The CONDITION evaluates the text at the current point in the
;; current buffer.  If CONDITION is true, then the FORMS will be
;; executed.

;; The purpose of the FORMS is to push new lexical tokens onto the
;; list of tokens for the current buffer, and to move point after the
;; matched text.

;; Some macros for creating one analyzer are:

;;   define-lex-analyzer - A generic analyzer associating any style of
;;              condition to forms.
;;   define-lex-regex-analyzer - Matches a regular expression.
;;   define-lex-simple-regex-analyzer - Matches a regular expression,
;;              and pushes the match.
;;   define-lex-block-analyzer - Matches list syntax, and defines
;;              handlers for open/close delimiters.

;; These macros are used by the grammar compiler when lexical
;; information is specified in a grammar:
;;   define-lex-*-type-analyzer - Matches syntax specified in
;;              a grammar, and pushes one token for it.  The * would
;;              be `sexp' for things like lists or strings, and
;;              `string' for things that need to match some special
;;              string, such as "\\." where a literal match is needed.
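;; As a minimal sketch, a lexer could be assembled from the analyzers
;; this library provides.  (The lexer name below is a made-up example;
;; the analyzer names are ones defined by semantic/lex.el.)
;;
;;   (define-lex my-simple-lexer
;;     "A simple example lexer skipping whitespace and newlines."
;;     semantic-lex-ignore-whitespace
;;     semantic-lex-ignore-newline
;;     semantic-lex-number
;;     semantic-lex-symbol-or-keyword
;;     semantic-lex-default-action)
;;
;; Calling (my-simple-lexer (point-min) (point-max)) then returns the
;; token stream for the current buffer.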
;;; Lexical Tables

;; There are tables of different symbols managed in semantic-lex.el.
;; They are:

;;   Lexical keyword table - A table of symbols declared in a grammar
;;           file with the %keyword declaration.
;;           Keywords are used by `semantic-lex-symbol-or-keyword'
;;           to create lexical tokens based on the keyword.

;;   Lexical type table - A table of symbols declared in a grammar
;;           file with the %type declaration.
;;           The grammar compiler uses the type table to create new
;;           lexical analyzers.  These analyzers are then used when
;;           a new lexical analyzer is made for a language.
;;; Lexical Types

;; A lexical type defines a kind of lexical analyzer that will be
;; automatically generated from a grammar file based on some
;; predetermined attributes.  For now these two attributes are
;; recognized:

;; * matchdatatype: defines the kind of lexical analyzer.  That is:
;;
;;   - regexp: a regexp analyzer (see `define-lex-regex-type-analyzer')
;;   - string: a string analyzer (see `define-lex-string-type-analyzer')
;;   - block: a block type analyzer (see `define-lex-block-type-analyzer')
;;   - sexp: a sexp analyzer (see `define-lex-sexp-type-analyzer')
;;   - keyword: a keyword analyzer (see `define-lex-keyword-type-analyzer')

;; * syntax: defines the syntax that matches a syntactic
;;   expression.  When syntax is matched the corresponding type
;;   analyzer is entered and the resulting match data will be
;;   interpreted based on the kind of analyzer (see matchdatatype
;;   above).

;; The following lexical types are predefined:

;;   +-------------+---------------+--------------------------------+
;;   | type        | matchdatatype | syntax                         |
;;   +-------------+---------------+--------------------------------+
;;   | punctuation | string        | "\\(\\s.\\|\\s$\\|\\s'\\)+"    |
;;   | keyword     | keyword       | "\\(\\sw\\|\\s_\\)+"           |
;;   | symbol      | regexp        | "\\(\\sw\\|\\s_\\)+"           |
;;   | string      | sexp          | "\\s\""                        |
;;   | number      | regexp        | semantic-lex-number-expression |
;;   | block       | block         | "\\s(\\|\\s)"                  |
;;   +-------------+---------------+--------------------------------+

;; In a grammar you must use a %type expression to automatically generate
;; the corresponding analyzers of that type.

;; Here is an example to auto-generate punctuation analyzers
;; with 'matchdatatype and 'syntax predefined (see table above):

;;   %type <punctuation> ;; will auto-generate this kind of analyzer

;; It is equivalent to writing:

;;   %type <punctuation> syntax "\\(\\s.\\|\\s$\\|\\s'\\)+" matchdatatype string

;;   ;; Some punctuation tokens based on the type defined above
;;   %token <punctuation> NOT "!"
;;   %token <punctuation> NOTEQ "!="
;;   %token <punctuation> MOD "%"
;;   %token <punctuation> MODEQ "%="
;;; On the Semantic 1.x lexer

;; In semantic 1.x, the lexical analyzer was an all-purpose routine.
;; To boost efficiency, the analyzer is now a series of routines that
;; are constructed at build time into a single routine.  This
;; eliminates unneeded if statements and speeds up the lexer.
(require 'semantic/fw)

;;; Code:

;;; Semantic 2.x lexical analysis
;;
(defun semantic-lex-map-symbols (fun table &optional property)
  "Call function FUN on every symbol in TABLE.
If optional PROPERTY is non-nil, call FUN only on every symbol which
has a PROPERTY value.  FUN receives a symbol as argument."
  (if (arrayp table)
      (mapatoms
       #'(lambda (symbol)
           (if (or (null property) (get symbol property))
               (funcall fun symbol)))
       table)))
;;; Lexical keyword table handling.
;;
;; These keywords are keywords defined for use in a grammar with the
;; %keyword declaration, and are not keywords used in Emacs Lisp.

(defvar semantic-flex-keywords-obarray nil
  "Buffer local keyword obarray for the lexical analyzer.
These keywords are matched explicitly, and converted into special symbols.")
(make-variable-buffer-local 'semantic-flex-keywords-obarray)
(defmacro semantic-lex-keyword-invalid (name)
  "Signal that NAME is an invalid keyword name."
  `(signal 'wrong-type-argument '(semantic-lex-keyword-p ,name)))

(defsubst semantic-lex-keyword-symbol (name)
  "Return keyword symbol with NAME or nil if not found."
  (and (arrayp semantic-flex-keywords-obarray)
       (stringp name)
       (intern-soft name semantic-flex-keywords-obarray)))

(defsubst semantic-lex-keyword-p (name)
  "Return non-nil if a keyword with NAME exists in the keyword table.
Return nil otherwise."
  (and (setq name (semantic-lex-keyword-symbol name))
       (symbol-value name)))

(defsubst semantic-lex-keyword-set (name value)
  "Set value of keyword with NAME to VALUE and return VALUE."
  (set (intern name semantic-flex-keywords-obarray) value))

(defsubst semantic-lex-keyword-value (name)
  "Return value of keyword with NAME.
Signal an error if a keyword with NAME does not exist."
  (let ((keyword (semantic-lex-keyword-symbol name)))
    (if keyword
        (symbol-value keyword)
      (semantic-lex-keyword-invalid name))))

(defsubst semantic-lex-keyword-put (name property value)
  "For keyword with NAME, set its PROPERTY to VALUE."
  (let ((keyword (semantic-lex-keyword-symbol name)))
    (if keyword
        (put keyword property value)
      (semantic-lex-keyword-invalid name))))

(defsubst semantic-lex-keyword-get (name property)
  "For keyword with NAME, return its PROPERTY value."
  (let ((keyword (semantic-lex-keyword-symbol name)))
    (if keyword
        (get keyword property)
      (semantic-lex-keyword-invalid name))))
(defun semantic-lex-make-keyword-table (specs &optional propspecs)
  "Convert keyword SPECS into an obarray and return it.
SPECS must be a list of (NAME . TOKSYM) elements, where:

  NAME is the name of the keyword symbol to define.
  TOKSYM is the lexical token symbol of that keyword.

If optional argument PROPSPECS is non-nil, then interpret it, and
apply those properties.
PROPSPECS must be a list of (NAME PROPERTY VALUE) elements."
  ;; Create the symbol hash table
  (let ((semantic-flex-keywords-obarray (make-vector 13 0))
        spec)
    ;; fill it with stuff
    (while specs
      (setq spec  (car specs)
            specs (cdr specs))
      (semantic-lex-keyword-set (car spec) (cdr spec)))
    ;; Apply all properties
    (while propspecs
      (setq spec      (car propspecs)
            propspecs (cdr propspecs))
      (semantic-lex-keyword-put (car spec) (nth 1 spec) (nth 2 spec)))
    semantic-flex-keywords-obarray))
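;; For illustration, a keyword table for a hypothetical C-like
;; language could be built like this (the keyword strings, token
;; symbols, and summary text below are made-up examples):
;;
;;   (setq semantic-flex-keywords-obarray
;;         (semantic-lex-make-keyword-table
;;          '(("if"    . IF)
;;            ("else"  . ELSE)
;;            ("while" . WHILE))
;;          '(("if" summary "if (CONDITION) STATEMENT"))))
;;
;; Afterwards (semantic-lex-keyword-value "if") returns the token
;; symbol IF, and (semantic-lex-keyword-get "if" 'summary) returns
;; the summary string.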
(defsubst semantic-lex-map-keywords (fun &optional property)
  "Call function FUN on every lexical keyword.
If optional PROPERTY is non-nil, call FUN only on every keyword which
has a PROPERTY value.  FUN receives a lexical keyword as argument."
  (semantic-lex-map-symbols
   fun semantic-flex-keywords-obarray property))

(defun semantic-lex-keywords (&optional property)
  "Return a list of lexical keywords.
If optional PROPERTY is non-nil, return only keywords which have a
PROPERTY set."
  (let (keywords)
    (semantic-lex-map-keywords
     #'(lambda (symbol) (setq keywords (cons symbol keywords)))
     property)
    keywords))
;;; Inline functions:

(defvar semantic-lex-unterminated-syntax-end-function)
(defvar semantic-lex-analysis-bounds)
(defvar semantic-lex-end-point)

(defsubst semantic-lex-token-bounds (token)
  "Fetch the start and end locations of the lexical token TOKEN.
Return a pair (START . END)."
  (if (not (numberp (car (cdr token))))
      (cdr (cdr token))
    (cdr token)))

(defsubst semantic-lex-token-start (token)
  "Fetch the start position of the lexical token TOKEN.
See also the function `semantic-lex-token'."
  (car (semantic-lex-token-bounds token)))

(defsubst semantic-lex-token-end (token)
  "Fetch the end position of the lexical token TOKEN.
See also the function `semantic-lex-token'."
  (cdr (semantic-lex-token-bounds token)))
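;; For illustration, a lexical token is usually of the form
;; (CLASS START . END), e.g. (symbol 10 . 15), so:
;;
;;   (semantic-lex-token-bounds '(symbol 10 . 15)) ;; => (10 . 15)
;;   (semantic-lex-token-start  '(symbol 10 . 15)) ;; => 10
;;   (semantic-lex-token-end    '(symbol 10 . 15)) ;; => 15
;;
;; A token may also carry a string as its second element, as in
;; (IF "if" 10 . 12); `semantic-lex-token-bounds' handles both shapes
;; by checking whether the second element is a number.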
(defsubst semantic-lex-unterminated-syntax-detected (syntax)
  "Inside a lexical analyzer, use this when unterminated syntax was found.
Argument SYNTAX indicates the type of syntax that is unterminated.
The job of this function is to move (point) to a new logical location
so that analysis can continue, if possible."
  (goto-char
   (funcall semantic-lex-unterminated-syntax-end-function
            syntax
            (car semantic-lex-analysis-bounds)
            (cdr semantic-lex-analysis-bounds)))
  (setq semantic-lex-end-point (point)))
;;; Type table handling.
;;
;; The lexical type table manages types that occur in a grammar file
;; with the %type declaration.  Types represent different syntaxes.
;; See code for `semantic-lex-preset-default-types' for the classic
;; types of syntax.

(defvar semantic-lex-types-obarray nil
  "Buffer local types obarray for the lexical analyzer.")
(make-variable-buffer-local 'semantic-lex-types-obarray)
(defmacro semantic-lex-type-invalid (type)
  "Signal that TYPE is an invalid lexical type name."
  `(signal 'wrong-type-argument '(semantic-lex-type-p ,type)))

(defsubst semantic-lex-type-symbol (type)
  "Return symbol with name TYPE or nil if not found."
  (and (arrayp semantic-lex-types-obarray)
       (stringp type)
       (intern-soft type semantic-lex-types-obarray)))

(defsubst semantic-lex-type-p (type)
  "Return non-nil if a symbol with TYPE name exists."
  (and (setq type (semantic-lex-type-symbol type))
       (symbol-value type)))

(defsubst semantic-lex-type-set (type value)
  "Set value of symbol with TYPE name to VALUE and return VALUE."
  (set (intern type semantic-lex-types-obarray) value))

(defsubst semantic-lex-type-value (type &optional noerror)
  "Return value of symbol with TYPE name.
If optional argument NOERROR is non-nil, return nil if a symbol with
TYPE name does not exist.  Otherwise signal an error."
  (let ((sym (semantic-lex-type-symbol type)))
    (if sym
        (symbol-value sym)
      (unless noerror
        (semantic-lex-type-invalid type)))))

(defsubst semantic-lex-type-put (type property value &optional add)
  "For symbol with TYPE name, set its PROPERTY to VALUE.
If optional argument ADD is non-nil, create a new symbol with TYPE
name if it does not already exist.  Otherwise signal an error."
  (let ((sym (semantic-lex-type-symbol type)))
    (unless sym
      (or add (semantic-lex-type-invalid type))
      (semantic-lex-type-set type nil)
      (setq sym (semantic-lex-type-symbol type)))
    (put sym property value)))

(defsubst semantic-lex-type-get (type property &optional noerror)
  "For symbol with TYPE name, return its PROPERTY value.
If optional argument NOERROR is non-nil, return nil if a symbol with
TYPE name does not exist.  Otherwise signal an error."
  (let ((sym (semantic-lex-type-symbol type)))
    (if sym
        (get sym property)
      (unless noerror
        (semantic-lex-type-invalid type)))))
(defun semantic-lex-preset-default-types ()
  "Install useful default properties for well known types."
  (semantic-lex-type-put "punctuation" 'matchdatatype 'string t)
  (semantic-lex-type-put "punctuation" 'syntax "\\(\\s.\\|\\s$\\|\\s'\\)+")
  (semantic-lex-type-put "keyword" 'matchdatatype 'keyword t)
  (semantic-lex-type-put "keyword" 'syntax "\\(\\sw\\|\\s_\\)+")
  (semantic-lex-type-put "symbol" 'matchdatatype 'regexp t)
  (semantic-lex-type-put "symbol" 'syntax "\\(\\sw\\|\\s_\\)+")
  (semantic-lex-type-put "string" 'matchdatatype 'sexp t)
  (semantic-lex-type-put "string" 'syntax "\\s\"")
  (semantic-lex-type-put "number" 'matchdatatype 'regexp t)
  (semantic-lex-type-put "number" 'syntax 'semantic-lex-number-expression)
  (semantic-lex-type-put "block" 'matchdatatype 'block t)
  (semantic-lex-type-put "block" 'syntax "\\s(\\|\\s)"))
(defun semantic-lex-make-type-table (specs &optional propspecs)
  "Convert type SPECS into an obarray and return it.
SPECS must be a list of (TYPE . TOKENS) elements, where:

  TYPE is the name of the type symbol to define.
  TOKENS is a list of (TOKSYM . MATCHER) elements, where:

    TOKSYM is any lexical token symbol.
    MATCHER is a string or regexp a text must match to be such a
    lexical token.

If optional argument PROPSPECS is non-nil, then interpret it, and
apply those properties.
PROPSPECS must be a list of (TYPE PROPERTY VALUE) elements."
  ;; Create the symbol hash table
  (let* ((semantic-lex-types-obarray (make-vector 13 0))
         spec type tokens token alist default)
    ;; fill it with stuff
    (while specs
      (setq spec    (car specs)
            specs   (cdr specs)
            type    (car spec)
            tokens  (cdr spec)
            default nil
            alist   nil)
      (while tokens
        (setq token  (car tokens)
              tokens (cdr tokens))
        (if (cdr token)
            (setq alist (cons token alist))
          (setq token (car token))
          (if default
              (message
               "*Warning* default value of <%s> tokens changed to %S, was %S"
               type default token))
          (setq default token)))
      ;; Ensure the default matching spec is the first one.
      (semantic-lex-type-set type (cons default (nreverse alist))))
    ;; Install useful default types & properties
    (semantic-lex-preset-default-types)
    ;; Apply all properties
    (while propspecs
      (setq spec      (car propspecs)
            propspecs (cdr propspecs))
      ;; Create the type if necessary.
      (semantic-lex-type-put (car spec) (nth 1 spec) (nth 2 spec) t))
    semantic-lex-types-obarray))
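;; For illustration, the grammar compiler builds a type table from
;; %type/%token declarations roughly like this (the token symbols and
;; matcher strings below are made-up examples):
;;
;;   (setq semantic-lex-types-obarray
;;         (semantic-lex-make-type-table
;;          '(("punctuation"
;;             (NOTEQ . "!=")
;;             (NOT   . "!")))
;;          '(("punctuation" matchdatatype string))))
;;
;; (semantic-lex-type-value "punctuation") then returns the default
;; token (nil here, since every entry has a matcher) consed onto the
;; (TOKSYM . MATCHER) alist.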
(defsubst semantic-lex-map-types (fun &optional property)
  "Call function FUN on every lexical type.
If optional PROPERTY is non-nil, call FUN only on every type symbol
which has a PROPERTY value.  FUN receives a type symbol as argument."
  (semantic-lex-map-symbols
   fun semantic-lex-types-obarray property))

(defun semantic-lex-types (&optional property)
  "Return a list of lexical type symbols.
If optional PROPERTY is non-nil, return only type symbols which have
PROPERTY set."
  (let (types)
    (semantic-lex-map-types
     #'(lambda (symbol) (setq types (cons symbol types)))
     property)
    types))
;;; Lexical Analyzer framework settings
;;

(defvar semantic-lex-analyzer 'semantic-flex
  "The lexical analyzer used for a given buffer.
See `semantic-lex' for documentation.
For compatibility with Semantic 1.x it defaults to `semantic-flex'.")
(make-variable-buffer-local 'semantic-lex-analyzer)
(defvar semantic-lex-tokens
  '((bol)
    (charquote)
    (close-paren)
    (comment)
    (newline)
    (open-paren)
    (punctuation)
    (semantic-list)
    (string)
    (symbol)
    (whitespace))
  "An alist of semantic token types.
As of December 2001 (semantic 1.4beta13), this variable is not used in
any code.  The only use is to refer to the doc-string from elsewhere.

The key to this alist is the symbol representing token type that
\\[semantic-flex] returns.  These are

  - bol:           Empty string matching a beginning of line.
                   This token is produced with
                   `semantic-lex-beginning-of-line'.

  - charquote:     String sequences that match `\\s\\+' regexp.
                   This token is produced with `semantic-lex-charquote'.

  - close-paren:   Characters that match `\\s)' regexp.
                   These are typically `)', `}', `]', etc.
                   This token is produced with
                   `semantic-lex-close-paren'.

  - comment:       A comment chunk.  These token types are not
                   produced by default.
                   This token is produced with `semantic-lex-comments'.
                   Comments are ignored with `semantic-lex-ignore-comments'.
                   Comments are treated as whitespace with
                   `semantic-lex-comments-as-whitespace'.

  - newline:       Characters matching `\\s-*\\(\n\\|\\s>\\)' regexp.
                   This token is produced with `semantic-lex-newline'.

  - open-paren:    Characters that match `\\s(' regexp.
                   These are typically `(', `{', `[', etc.
                   If `semantic-lex-paren-or-list' is used,
                   then `open-paren' is not usually generated unless
                   the `depth' argument to \\[semantic-lex] is
                   greater than 0.
                   This token is always produced if the analyzer
                   `semantic-lex-open-paren' is used.

  - punctuation:   Characters matching `\\(\\s.\\|\\s$\\|\\s'\\)'
                   regexp.
                   This token is produced with `semantic-lex-punctuation'.
                   Always specify this analyzer after the comment
                   analyzer.

  - semantic-list: String delimited by matching parentheses, braces,
                   etc. that the lexer skipped over, because the
                   `depth' parameter to \\[semantic-flex] was not high
                   enough.
                   This token is produced with `semantic-lex-paren-or-list'.

  - string:        Quoted strings, i.e., string sequences that start
                   and end with characters matching `\\s\"'
                   regexp.  The lexer relies on `forward-sexp' to
                   find the matching end.
                   This token is produced with `semantic-lex-string'.

  - symbol:        String sequences that match `\\(\\sw\\|\\s_\\)+'
                   regexp.
                   This token is produced with
                   `semantic-lex-symbol-or-keyword'.  Always add this
                   analyzer after `semantic-lex-number', or other
                   analyzers that match its regular expression.

  - whitespace:    Characters that match `\\s-+' regexp.
                   This token is produced with `semantic-lex-whitespace'.")
(defvar semantic-lex-syntax-modifications nil
  "Changes to the syntax table for this buffer.
These changes are active only while the buffer is being flexed.
This is a list where each element has the form:
  (CHAR CLASS)
CHAR is the char passed to `modify-syntax-entry',
and CLASS is the string also passed to `modify-syntax-entry' to define
what syntax class CHAR has.")
(make-variable-buffer-local 'semantic-lex-syntax-modifications)

(defvar semantic-lex-syntax-table nil
  "Syntax table used by lexical analysis.
See also `semantic-lex-syntax-modifications'.")
(make-variable-buffer-local 'semantic-lex-syntax-table)

(defvar semantic-lex-comment-regex nil
  "Regular expression for identifying comment start during lexical analysis.
This may be automatically set when semantic initializes in a mode, but
may need to be overridden for some special languages.")
(make-variable-buffer-local 'semantic-lex-comment-regex)
(defvar semantic-lex-number-expression
  ;; This expression was written by David Ponce for Java, and copied
  ;; here for C and any other similar language.
  (eval-when-compile
    (concat "\\("
            "\\<[0-9]+[.][0-9]+\\([eE][-+]?[0-9]+\\)?[fFdD]?\\>"
            "\\|"
            "\\<[0-9]+[.][eE][-+]?[0-9]+[fFdD]?\\>"
            "\\|"
            "\\<[0-9]+[.][fFdD]\\>"
            "\\|"
            "\\<[0-9]+[.]"
            "\\|"
            "[.][0-9]+\\([eE][-+]?[0-9]+\\)?[fFdD]?\\>"
            "\\|"
            "\\<[0-9]+[eE][-+]?[0-9]+[fFdD]?\\>"
            "\\|"
            "\\<0[xX][0-9a-fA-F]+[lL]?\\>"
            "\\|"
            "\\<[0-9]+[lLfFdD]?\\>"
            "\\)"))
  "Regular expression for matching a number.
If this value is nil, no number extraction is done during lex.
This expression tries to match C and Java like numbers.

DECIMAL_LITERAL:
    [1-9][0-9]*
HEX_LITERAL:
    0[xX][0-9a-fA-F]+
OCTAL_LITERAL:
    0[0-7]*
INTEGER_LITERAL:
    <DECIMAL_LITERAL>[lL]?
  | <HEX_LITERAL>[lL]?
  | <OCTAL_LITERAL>[lL]?
EXPONENT:
    [eE][+-]?[0-9]+
FLOATING_POINT_LITERAL:
    [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
  | [.][0-9]+<EXPONENT>?[fFdD]?
  | [0-9]+<EXPONENT>[fFdD]?
  | [0-9]+<EXPONENT>?[fFdD]")
(make-variable-buffer-local 'semantic-lex-number-expression)
(defvar semantic-lex-depth 0
  "Default lexing depth.
This specifies how many lists to create tokens in.")
(make-variable-buffer-local 'semantic-lex-depth)

(defvar semantic-lex-unterminated-syntax-end-function
  (lambda (syntax syntax-start lex-end) lex-end)
  "Function called when unterminated syntax is encountered.
This should be set to one function.  That function should take three
parameters.  The SYNTAX, or type of syntax which is unterminated.
SYNTAX-START where the broken syntax begins.
LEX-END is where the lexical analysis was asked to end.
This function can be used for languages that can intelligently fix up
broken syntax, or to exit lexical analysis via `throw' or `signal'
when finding unterminated syntax.")
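;; For illustration, a language that wants to abort lexing entirely at
;; the first unterminated construct could install something like the
;; following in a mode hook (hypothetical example; the default handler
;; above simply returns LEX-END):
;;
;;   (setq semantic-lex-unterminated-syntax-end-function
;;         (lambda (syntax syntax-start lex-end)
;;           (throw 'unterminated-syntax syntax)))
;;
;; with a matching `catch' wrapped around the call to the lexer.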
;;; Interactive testing commands

(declare-function semantic-elapsed-time "semantic")

(defun semantic-lex-test (arg)
  "Test the semantic lexer in the current buffer.
If universal argument ARG, then try the whole buffer."
  (interactive "P")
  (require 'semantic)
  (let* ((start (current-time))
         (result (semantic-lex
                  (if arg (point-min) (point))
                  (point-max)))
         (end (current-time)))
    (message "Elapsed Time: %.2f seconds."
             (semantic-elapsed-time start end))
    (pop-to-buffer "*Lexer Output*")
    (require 'pp)
    (erase-buffer)
    (insert (pp-to-string result))
    (goto-char (point-min))))
(defvar semantic-lex-debug nil
  "When non-nil, debug the local lexical analyzer.")

(defun semantic-lex-debug (arg)
  "Debug the semantic lexer in the current buffer.
Argument ARG specifies whether to analyze the whole buffer, or start at point.
While engaged, each token identified by the lexer will be highlighted
in the target buffer.  A description of the current token will be
displayed in the minibuffer.  Press SPC to move to the next lexical token."
  (interactive "P")
  (require 'semantic/debug)
  (let ((semantic-lex-debug t))
    (semantic-lex-test arg)))
(defun semantic-lex-highlight-token (token)
  "Highlight the lexical TOKEN.
TOKEN is a lexical token with a START and END position.
Return the overlay."
  (let ((o (semantic-make-overlay (semantic-lex-token-start token)
                                  (semantic-lex-token-end token))))
    (semantic-overlay-put o 'face 'highlight)
    o))
(defsubst semantic-lex-debug-break (token)
  "Break during lexical analysis at TOKEN."
  (when semantic-lex-debug
    (let ((o nil))
      (unwind-protect
          (progn
            (when token
              (setq o (semantic-lex-highlight-token token)))
            (semantic-read-event
             (format "%S :: SPC - continue" token)))
        (when o
          (semantic-overlay-delete o))))))
;;; Lexical analyzer creation
;;
;; Code for creating a lex function from lists of analyzers.
;;
;; A lexical analyzer is created from a list of individual analyzers.
;; Each individual analyzer specifies a single match, and code that
;; goes with it.
;;
;; Creation of an analyzer assembles these analyzers into a new function
;; with the behaviors of all the individual analyzers.

(defmacro semantic-lex-one-token (analyzers)
  "Calculate one token from the current buffer at point.
Uses locally bound variables from `define-lex'.
Argument ANALYZERS is the list of analyzers being used."
  (cons 'cond (mapcar #'symbol-value analyzers)))
(defvar semantic-lex-end-point nil
  "The end point as tracked through lexical functions.")

(defvar semantic-lex-current-depth nil
  "The current depth as tracked through lexical functions.")

(defvar semantic-lex-maximum-depth nil
  "The maximum depth of parenthesis as tracked through lexical functions.")

(defvar semantic-lex-token-stream nil
  "The current token stream we are collecting.")

(defvar semantic-lex-analysis-bounds nil
  "The bounds of the current analysis.")

(defvar semantic-lex-block-streams nil
  "Streams of tokens inside collapsed blocks.
This is an alist of (ANCHOR . STREAM) elements where ANCHOR is the
start position of the block, and STREAM is the list of tokens in that
block.")

(defvar semantic-lex-reset-hooks nil
  "Abnormal hook used by major-modes to reset lexical analyzers.
Hook functions are called with START and END values for the
current lexical pass.  Should be set with `add-hook', specifying
a LOCAL option.")

;; Stack of nested blocks.
(defvar semantic-lex-block-stack nil)
;;(defvar semantic-lex-timeout 5
;;  "*Number of sections of lexing before giving up.")
(defmacro define-lex (name doc &rest analyzers)
  "Create a new lexical analyzer with NAME.
DOC is a documentation string describing this analyzer.
ANALYZERS are small code snippets of analyzers to use when
building the new NAMED analyzer.  Only use analyzers which
are written to be used in `define-lex'.
Each analyzer should be an analyzer created with `define-lex-analyzer'.
Note: The order in which analyzers are listed is important.
If two analyzers can match the same text, it is important to order the
analyzers so that the one you want to match first occurs first.  For
example, it is good to put a number analyzer in front of a symbol
analyzer which might mistake a number for a symbol."
  `(defun ,name (start end &optional depth length)
     ,(concat doc "\nSee `semantic-lex' for more information.")
     ;; Make sure the state of block parsing starts over.
     (setq semantic-lex-block-streams nil)
     ;; Allow specialty reset items.
     (run-hook-with-args 'semantic-lex-reset-hooks start end)
     ;; Lexing state.
     (let* (;(starttime (current-time))
            (starting-position (point))
            (semantic-lex-token-stream nil)
            (semantic-lex-block-stack nil)
            (tmp-start start)
            (semantic-lex-end-point start)
            (semantic-lex-current-depth 0)
            ;; Use the default depth when not specified.
            (semantic-lex-maximum-depth
             (or depth semantic-lex-depth))
            ;; Bounds needed for unterminated syntax
            (semantic-lex-analysis-bounds (cons start end))
            ;; This entry prevents text properties from
            ;; confusing our lexical analysis.  See Emacs 22 (CVS)
            ;; version of C++ mode with template hack text properties.
            (parse-sexp-lookup-properties nil))
       ;; Maybe REMOVE THIS LATER.
       ;; Trying to find incremental parser bug.
       (when (> end (point-max))
         (error ,(format "%s: end (%%d) > point-max (%%d)" name)
                end (point-max)))
       (with-syntax-table semantic-lex-syntax-table
         (goto-char start)
         (while (and (< (point) end)
                     (or (not length)
                         (<= (length semantic-lex-token-stream) length)))
           (semantic-lex-one-token ,analyzers)
           (when (eq semantic-lex-end-point tmp-start)
             (error ,(format "%s: endless loop at %%d, after %%S" name)
                    tmp-start (car semantic-lex-token-stream)))
           (setq tmp-start semantic-lex-end-point)
           (goto-char semantic-lex-end-point)
           ;;(when (> (semantic-elapsed-time starttime (current-time))
           ;;         semantic-lex-timeout)
           ;;  (error "Timeout during lex at char %d" (point)))
           (semantic-throw-on-input 'lex)
           (semantic-lex-debug-break (car semantic-lex-token-stream))))
       ;; Check that there is no unterminated block.
       (when semantic-lex-block-stack
         (let* ((last (pop semantic-lex-block-stack))
                (blk last))
           (while blk
             (message
              ,(format "%s: `%%s' block from %%S is unterminated" name)
              (car blk) (cadr blk))
             (setq blk (pop semantic-lex-block-stack)))
           (semantic-lex-unterminated-syntax-detected (car last))))
       ;; Return to where we started.
       ;; Do not wrap in protective stuff so that if there is an error
       ;; thrown, the user knows where.
       (goto-char starting-position)
       ;; Return the token stream
       (nreverse semantic-lex-token-stream))))
;;; Collapsed block tokens delimited by any tokens.
;;

(defun semantic-lex-start-block (syntax)
  "Mark the last read token as the beginning of a SYNTAX block."
  (if (or (not semantic-lex-maximum-depth)
          (< semantic-lex-current-depth semantic-lex-maximum-depth))
      (setq semantic-lex-current-depth (1+ semantic-lex-current-depth))
    (push (list syntax (car semantic-lex-token-stream))
          semantic-lex-block-stack)))
842 (defun semantic-lex-end-block (syntax)
843 "Process the end of a previously marked SYNTAX block.
844 That is, collapse the tokens inside that block, including the
845 beginning and end of block tokens, into a high level block token of
846 class SYNTAX.
847 The token at beginning of block is the one marked by a previous call
848 to `semantic-lex-start-block'. The current token is the end of block.
849 The collapsed tokens are saved in `semantic-lex-block-streams'."
850 (if (null semantic-lex-block-stack)
851 (setq semantic-lex-current-depth (1- semantic-lex-current-depth))
852 (let* ((stream semantic-lex-token-stream)
853 (blk (pop semantic-lex-block-stack))
854 (bstream (cdr blk))
855 (first (car bstream))
856 (last (pop stream)) ;; The current token marks the end of block
857 tok)
858 (if (not (eq (car blk) syntax))
859 ;; SYNTAX doesn't match the syntax of the current block in
860 ;; the stack. So we encountered the end of the SYNTAX block
861 ;; before the end of the current one in the stack, which is
862 ;; signaled as unterminated.
863 (semantic-lex-unterminated-syntax-detected (car blk))
864 ;; Move tokens found inside the block from the main stream
865 ;; into a separate block stream.
866 (while (and stream (not (eq (setq tok (pop stream)) first)))
867 (push tok bstream))
868 ;; The token marked as beginning of block was not encountered.
869 ;; This should not happen!
870 (or (eq tok first)
871 (error "Token %S not found at beginning of block `%s'"
872 first syntax))
873 ;; Save the block stream for future reuse, to avoid redoing
874 ;; the lexical analysis of the block content!
875 ;; Anchor the block stream with its start position, so we can
876 ;; use: (cdr (assq start semantic-lex-block-streams)) to
877 ;; quickly retrieve the lexical stream associated with a block.
878 (setcar blk (semantic-lex-token-start first))
879 (setcdr blk (nreverse bstream))
880 (push blk semantic-lex-block-streams)
881 ;; In the main stream, replace the tokens inside the block by
882 ;; a high level block token of class SYNTAX.
883 (setq semantic-lex-token-stream stream)
884 (semantic-lex-push-token
885 (semantic-lex-token
886 syntax (car blk) (semantic-lex-token-end last)))
887 ))))
889 ;;; Lexical token API
891 ;; Functions for accessing parts of a token. Use these functions
892 ;; instead of accessing the list structure directly because the
893 ;; contents of the lexical may change.
895 (defmacro semantic-lex-token (symbol start end &optional str)
896 "Create a lexical token.
897 SYMBOL is a symbol representing the class of syntax found.
898 START and END define the bounds of the token in the current buffer.
899 Optional STR is the string for the token only if the bounds in
900 the buffer do not cover the string they represent. (As from
901 macro expansion.)"
902 ;; This if statement checks the existence of a STR argument at
903 ;; compile time, where STR is some symbol or constant. If the
904 ;; variable STR (runtime) is nil, this will make an incorrect decision.
906 ;; It is like this to maintain the original speed of the compiled
907 ;; code.
908 (if str
909 `(cons ,symbol (cons ,str (cons ,start ,end)))
910 `(cons ,symbol (cons ,start ,end))))
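For illustration, the two token shapes this macro produces (values chosen arbitrarily; this sketch is not part of the original file):

```elisp
;; Token without text: (CLASS START . END)
(semantic-lex-token 'symbol 10 15)        ; => (symbol 10 . 15)
;; Token with text: (CLASS TEXT START . END)
(semantic-lex-token 'symbol 10 15 "foo")  ; => (symbol "foo" 10 . 15)
```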
912 (defun semantic-lex-token-p (thing)
913 "Return non-nil if THING is a semantic lex token.
914 This is an exhaustively robust check."
915 (and (consp thing)
916 (symbolp (car thing))
917 (or (and (numberp (nth 1 thing))
918 (numberp (nthcdr 2 thing)))
919 (and (stringp (nth 1 thing))
920 (numberp (nth 2 thing))
921 (numberp (nthcdr 3 thing)))
922 )))
925 (defun semantic-lex-token-with-text-p (thing)
926 "Return non-nil if THING is a semantic lex token.
927 This is an exhaustively robust check."
928 (and (consp thing)
929 (symbolp (car thing))
930 (= (length thing) 4)
931 (stringp (nth 1 thing))
932 (numberp (nth 2 thing))
933 (numberp (nth 3 thing)))
934 )
936 (defun semantic-lex-token-without-text-p (thing)
937 "Return non-nil if THING is a semantic lex token.
938 This is an exhaustively robust check."
939 (and (consp thing)
940 (symbolp (car thing))
941 (= (length thing) 3)
942 (numberp (nth 1 thing))
943 (numberp (nth 2 thing)))
944 ))
946 (eval-and-compile
948 (defun semantic-lex-expand-block-specs (specs)
949 "Expand block specifications SPECS into a Lisp form.
950 SPECS is a list of (BLOCK BEGIN END) elements where BLOCK, BEGIN, and
951 END are token class symbols that indicate to produce one collapsed
952 BLOCK token from tokens found between BEGIN and END ones.
953 BLOCK must be a non-nil symbol, and at least one of the BEGIN or END
954 symbols must be non-nil too.
955 When BEGIN is non-nil, generate a call to `semantic-lex-start-block'
956 when a BEGIN token class is encountered.
957 When END is non-nil, generate a call to `semantic-lex-end-block' when
958 an END token class is encountered."
959 (let ((class (make-symbol "class"))
960 (form nil))
961 (dolist (spec specs)
962 (when (car spec)
963 (when (nth 1 spec)
964 (push `((eq ',(nth 1 spec) ,class)
965 (semantic-lex-start-block ',(car spec)))
966 form))
967 (when (nth 2 spec)
968 (push `((eq ',(nth 2 spec) ,class)
969 (semantic-lex-end-block ',(car spec)))
970 form))))
971 (when form
972 `((let ((,class (semantic-lex-token-class
973 (car semantic-lex-token-stream))))
974 (cond ,@(nreverse form))))
975 )))
978 (defmacro semantic-lex-push-token (token &rest blockspecs)
979 "Push TOKEN onto the lexical analyzer token stream.
980 Return the lexical analysis current end point.
981 If optional argument BLOCKSPECS is non-nil, it specifies to process
982 collapsed block tokens. See `semantic-lex-expand-block-specs' for
983 more details.
984 This macro should only be called within the bounds of
985 `define-lex-analyzer'. It changes the values of the lexical analyzer
986 variables `token-stream' and `semantic-lex-end-point'. If you need to
987 move `semantic-lex-end-point' somewhere else, just modify this
988 variable after calling `semantic-lex-push-token'."
989 `(progn
990 (push ,token semantic-lex-token-stream)
991 ,@(semantic-lex-expand-block-specs blockspecs)
992 (setq semantic-lex-end-point
993 (semantic-lex-token-end (car semantic-lex-token-stream)))
994 )))
996 (defsubst semantic-lex-token-class (token)
997 "Fetch the class of the lexical token TOKEN.
998 See also the function `semantic-lex-token'."
999 (car token))
1001 (defsubst semantic-lex-token-text (token)
1002 "Fetch the text associated with the lexical token TOKEN.
1003 See also the function `semantic-lex-token'."
1004 (if (stringp (car (cdr token)))
1005 (car (cdr token))
1006 (buffer-substring-no-properties
1007 (semantic-lex-token-start token)
1008 (semantic-lex-token-end token))))
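For example, the accessors above applied to the two token shapes (illustrative values; this sketch is not part of the original file):

```elisp
(semantic-lex-token-class '(symbol 10 . 15))       ; => symbol
(semantic-lex-token-text '(symbol "foo" 10 . 15))  ; => "foo"
```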
1010 (defun semantic-lex-init ()
1011 "Initialize any lexical state for this buffer."
1012 (unless semantic-lex-comment-regex
1013 (setq semantic-lex-comment-regex
1014 (if comment-start-skip
1015 (concat "\\(\\s<\\|" comment-start-skip "\\)")
1016 "\\(\\s<\\)")))
1017 ;; Setup the lexer syntax-table
1018 (setq semantic-lex-syntax-table (copy-syntax-table (syntax-table)))
1019 (dolist (mod semantic-lex-syntax-modifications)
1020 (modify-syntax-entry
1021 (car mod) (nth 1 mod) semantic-lex-syntax-table)))
1023 ;;;###autoload
1024 (define-overloadable-function semantic-lex (start end &optional depth length)
1025 "Lexically analyze text in the current buffer between START and END.
1026 Optional argument DEPTH indicates at what level to scan over entire
1027 lists. The last argument, LENGTH, specifies that `semantic-lex'
1028 should only return LENGTH tokens. The return value is a token stream.
1029 Each element is a list of the form
1030 (symbol start-expression . end-expression)
1031 where SYMBOL denotes the token type.
1032 See `semantic-lex-tokens' variable for details on token types. END
1033 does not mark the end of the text scanned, only the end of the
1034 beginning of text scanned. Thus, if a string extends past END, the
1035 end of the return token will be larger than END. To truly restrict
1036 scanning, use `narrow-to-region'."
1037 (funcall semantic-lex-analyzer start end depth length))
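A sketch of a typical call using the optional arguments (illustrative; the returned token classes depend on the installed analyzer):

```elisp
;; Lex the visible buffer, descending one level into lists, and
;; returning at most 100 tokens.
(semantic-lex (point-min) (point-max) 1 100)
```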
1039 (defsubst semantic-lex-buffer (&optional depth)
1040 "Lex the current buffer.
1041 Optional argument DEPTH is the depth to scan into lists."
1042 (semantic-lex (point-min) (point-max) depth))
1044 (defsubst semantic-lex-list (semlist depth)
1045 "Lex the body of SEMLIST to DEPTH."
1046 (semantic-lex (semantic-lex-token-start semlist)
1047 (semantic-lex-token-end semlist)
1048 depth))
1050 ;;; Analyzer creation macros
1052 ;; An individual analyzer is a condition and code that goes with it.
1054 ;; Created analyzers become variables with the code associated with them
1055 ;; as the symbol value. These analyzers are assembled into a lexer
1056 ;; to create new lexical analyzers.
1058 (defcustom semantic-lex-debug-analyzers nil
1059 "Non-nil means to debug analyzers with syntax protection.
1060 Only in effect if `debug-on-error' is also non-nil."
1061 :group 'semantic
1062 :type 'boolean)
1064 (defmacro semantic-lex-unterminated-syntax-protection (syntax &rest forms)
1065 "For SYNTAX, execute FORMS with protection for unterminated syntax.
1066 If FORMS throws an error, treat this as a syntax problem, and
1067 execute the unterminated syntax code. FORMS should return a position.
1068 Regardless of any error, the cursor should be moved to the end of
1069 the desired syntax, and a position returned.
1070 If `debug-on-error' is set, errors are not caught, so that you can
1071 debug them.
1072 Avoid using a large FORMS since it is duplicated."
1073 `(if (and debug-on-error semantic-lex-debug-analyzers)
1074 (progn ,@forms)
1075 (condition-case nil
1076 (progn ,@forms)
1077 (error
1078 (semantic-lex-unterminated-syntax-detected ,syntax)))))
1079 (put 'semantic-lex-unterminated-syntax-protection
1080 'lisp-indent-function 1)
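This macro is typically used from within an analyzer to compute a token's end position, for example (mirroring the string analyzer later in this file):

```elisp
(save-excursion
  (semantic-lex-unterminated-syntax-protection 'string
    (forward-sexp 1)
    (point)))
```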
1082 (defmacro define-lex-analyzer (name doc condition &rest forms)
1083 "Create a single lexical analyzer NAME with DOC.
1084 When an analyzer is called, point is positioned in the current
1085 buffer at the location to be analyzed.
1086 CONDITION is an expression which returns t if FORMS should be run.
1087 Within the bounds of CONDITION and FORMS, backquote can be used
1088 to evaluate expressions at compile time.
1089 While forms are running, the following variables will be locally bound:
1090 `semantic-lex-analysis-bounds' - The bounds of the current analysis.
1091 of the form (START . END)
1092 `semantic-lex-maximum-depth' - The maximum depth of semantic-list
1093 for the current analysis.
1094 `semantic-lex-current-depth' - The current depth of `semantic-list' that has
1095 been descended.
1096 `semantic-lex-end-point' - End Point after match.
1097 Analyzers should set this to a buffer location if their
1098 match string does not represent the end of the matched text.
1099 `semantic-lex-token-stream' - The token list being collected.
1100 Add new lexical tokens to this list.
1101 Proper action in FORMS is to move the value of `semantic-lex-end-point' to
1102 after the location of the analyzed entry, and to add any discovered tokens
1103 at the beginning of `semantic-lex-token-stream'.
1104 This can be done by using `semantic-lex-push-token'."
1105 `(eval-and-compile
1106 (defvar ,name nil ,doc)
1107 (defun ,name nil)
1108 ;; Do this part separately so that re-evaluation rebuilds this code.
1109 (setq ,name '(,condition ,@forms))
1110 ;; Build a single lexical analyzer function, so the doc for
1111 ;; function help is automatically provided, and perhaps the
1112 ;; function could be useful for testing and debugging one
1113 ;; analyzer.
1114 (fset ',name (lambda () ,doc
1115 (let ((semantic-lex-token-stream nil)
1116 (semantic-lex-end-point (point))
1117 (semantic-lex-analysis-bounds
1118 (cons (point) (point-max)))
1119 (semantic-lex-current-depth 0)
1120 (semantic-lex-maximum-depth
1121 semantic-lex-depth)
1122 )
1123 (when ,condition ,@forms)
1124 semantic-lex-token-stream)))
1125 ))
1127 (defmacro define-lex-regex-analyzer (name doc regexp &rest forms)
1128 "Create a lexical analyzer with NAME and DOC that will match REGEXP.
1129 FORMS are evaluated upon a successful match.
1130 See `define-lex-analyzer' for more about analyzers."
1131 `(define-lex-analyzer ,name
1132 ,doc
1133 (looking-at ,regexp)
1134 ,@forms
1135 ))
1137 (defmacro define-lex-simple-regex-analyzer (name doc regexp toksym
1138 &optional index
1139 &rest forms)
1140 "Create a lexical analyzer with NAME and DOC that match REGEXP.
1141 TOKSYM is the symbol to use when creating a semantic lexical token.
1142 INDEX is the index into the match that defines the bounds of the token.
1143 Index should be a plain integer, and not specified in the macro as an
1144 expression.
1145 FORMS are evaluated upon a successful match BEFORE the new token is
1146 created. It is valid to ignore FORMS.
1147 See `define-lex-analyzer' for more about analyzers."
1148 `(define-lex-analyzer ,name
1149 ,doc
1150 (looking-at ,regexp)
1151 ,@forms
1152 (semantic-lex-push-token
1153 (semantic-lex-token ,toksym
1154 (match-beginning ,(or index 0))
1155 (match-end ,(or index 0))))
1156 ))
1158 (defmacro define-lex-block-analyzer (name doc spec1 &rest specs)
1159 "Create a lexical analyzer NAME for paired delimiters blocks.
1160 It detects a paired delimiters block or the corresponding open or
1161 close delimiter depending on the value of the variable
1162 `semantic-lex-current-depth'. DOC is the documentation string of the lexical
1163 analyzer. SPEC1 and SPECS specify the token symbols and open, close
1164 delimiters used. Each SPEC has the form:
1166 \(BLOCK-SYM (OPEN-DELIM OPEN-SYM) (CLOSE-DELIM CLOSE-SYM))
1168 where BLOCK-SYM is the symbol returned in a block token. OPEN-DELIM
1169 and CLOSE-DELIM are respectively the open and close delimiters
1170 identifying a block. OPEN-SYM and CLOSE-SYM are respectively the
1171 symbols returned in open and close tokens."
1172 (let ((specs (cons spec1 specs))
1173 spec open olist clist)
1174 (while specs
1175 (setq spec (car specs)
1176 specs (cdr specs)
1177 open (nth 1 spec)
1178 ;; build alist ((OPEN-DELIM OPEN-SYM BLOCK-SYM) ...)
1179 olist (cons (list (car open) (cadr open) (car spec)) olist)
1180 ;; build alist ((CLOSE-DELIM CLOSE-SYM) ...)
1181 clist (cons (nth 2 spec) clist)))
1182 `(define-lex-analyzer ,name
1183 ,doc
1184 (and
1185 (looking-at "\\(\\s(\\|\\s)\\)")
1186 (let ((text (match-string 0)) match)
1187 (cond
1188 ((setq match (assoc text ',olist))
1189 (if (or (not semantic-lex-maximum-depth)
1190 (< semantic-lex-current-depth semantic-lex-maximum-depth))
1191 (progn
1192 (setq semantic-lex-current-depth (1+ semantic-lex-current-depth))
1193 (semantic-lex-push-token
1194 (semantic-lex-token
1195 (nth 1 match)
1196 (match-beginning 0) (match-end 0))))
1197 (semantic-lex-push-token
1198 (semantic-lex-token
1199 (nth 2 match)
1200 (match-beginning 0)
1201 (save-excursion
1202 (semantic-lex-unterminated-syntax-protection (nth 2 match)
1203 (forward-list 1)
1204 (point)))
1205 ))))
1207 ((setq match (assoc text ',clist))
1208 (setq semantic-lex-current-depth (1- semantic-lex-current-depth))
1209 (semantic-lex-push-token
1210 (semantic-lex-token
1211 (nth 1 match)
1212 (match-beginning 0) (match-end 0)))))))
1213 )))
1215 ;;; Analyzers
1217 ;; Pre-defined common analyzers.
1219 (define-lex-analyzer semantic-lex-default-action
1220 "The default action when no other lexical actions match text.
1221 This action will just throw an error."
1222 t
1223 (error "Unmatched Text during Lexical Analysis"))
1225 (define-lex-analyzer semantic-lex-beginning-of-line
1226 "Detect and create a beginning of line token (BOL)."
1227 (and (bolp)
1228 ;; Just insert a (bol N . N) token in the token stream,
1229 ;; without moving the point. N is the point at the
1230 ;; beginning of line.
1231 (semantic-lex-push-token (semantic-lex-token 'bol (point) (point)))
1232 nil) ;; CONTINUE
1233 ;; We identify and add the BOL token onto the stream, but since
1234 ;; semantic-lex-end-point doesn't move, we always fail CONDITION, and have no
1235 ;; FORMS body.
1236 nil)
1238 (define-lex-simple-regex-analyzer semantic-lex-newline
1239 "Detect and create newline tokens."
1240 "\\s-*\\(\n\\|\\s>\\)" 'newline 1)
1242 (define-lex-regex-analyzer semantic-lex-newline-as-whitespace
1243 "Detect and create newline tokens.
1244 Use this ONLY if newlines are not whitespace characters (such as when
1245 they are comment end characters) AND when you want whitespace tokens."
1246 "\\s-*\\(\n\\|\\s>\\)"
1247 ;; Language wants whitespaces. Create a token for it.
1248 (if (eq (semantic-lex-token-class (car semantic-lex-token-stream))
1249 'whitespace)
1250 ;; Merge whitespace tokens together if they are adjacent. Two
1251 ;; whitespace tokens may be separated by a comment which is not in
1252 ;; the token stream.
1253 (setcdr (semantic-lex-token-bounds (car semantic-lex-token-stream))
1254 (match-end 0))
1255 (semantic-lex-push-token
1256 (semantic-lex-token
1257 'whitespace (match-beginning 0) (match-end 0)))))
1259 (define-lex-regex-analyzer semantic-lex-ignore-newline
1260 "Detect and ignore newline tokens.
1261 Use this ONLY if newlines are not whitespace characters (such as when
1262 they are comment end characters)."
1263 "\\s-*\\(\n\\|\\s>\\)"
1264 (setq semantic-lex-end-point (match-end 0)))
1266 (define-lex-regex-analyzer semantic-lex-whitespace
1267 "Detect and create whitespace tokens."
1268 ;; catch whitespace when needed
1269 "\\s-+"
1270 ;; Language wants whitespaces. Create a token for it.
1271 (if (eq (semantic-lex-token-class (car semantic-lex-token-stream))
1272 'whitespace)
1273 ;; Merge whitespace tokens together if they are adjacent. Two
1274 ;; whitespace tokens may be separated by a comment which is not in
1275 ;; the token stream.
1276 (progn
1277 (setq semantic-lex-end-point (match-end 0))
1278 (setcdr (semantic-lex-token-bounds (car semantic-lex-token-stream))
1279 semantic-lex-end-point))
1280 (semantic-lex-push-token
1281 (semantic-lex-token
1282 'whitespace (match-beginning 0) (match-end 0)))))
1284 (define-lex-regex-analyzer semantic-lex-ignore-whitespace
1285 "Detect and skip over whitespace tokens."
1286 ;; catch whitespace when needed
1287 "\\s-+"
1288 ;; Skip over the detected whitespace, do not create a token for it.
1289 (setq semantic-lex-end-point (match-end 0)))
1291 (define-lex-simple-regex-analyzer semantic-lex-number
1292 "Detect and create number tokens.
1293 See `semantic-lex-number-expression' for details on matching numbers,
1294 and number formats."
1295 semantic-lex-number-expression 'number)
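As an illustration of the macro above, a hypothetical analyzer for hexadecimal literals could be defined the same way (the name and regexp here are examples, not part of this file):

```elisp
(define-lex-simple-regex-analyzer semantic-lex-example-hex-number
  "Detect and create number tokens for hexadecimal literals."
  "\\<0[xX][0-9a-fA-F]+\\>" 'number)
```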
1297 (define-lex-regex-analyzer semantic-lex-symbol-or-keyword
1298 "Detect and create symbol and keyword tokens."
1299 "\\(\\sw\\|\\s_\\)+"
1300 (semantic-lex-push-token
1301 (semantic-lex-token
1302 (or (semantic-lex-keyword-p (match-string 0)) 'symbol)
1303 (match-beginning 0) (match-end 0))))
1305 (define-lex-simple-regex-analyzer semantic-lex-charquote
1306 "Detect and create charquote tokens."
1307 ;; Character quoting characters (ie, \n as newline)
1308 "\\s\\+" 'charquote)
1310 (define-lex-simple-regex-analyzer semantic-lex-punctuation
1311 "Detect and create punctuation tokens."
1312 "\\(\\s.\\|\\s$\\|\\s'\\)" 'punctuation)
1314 (define-lex-analyzer semantic-lex-punctuation-type
1315 "Detect and create a punctuation type token.
1316 Recognized punctuations are defined in the current table of lexical
1317 types, as the value of the `punctuation' token type."
1318 (and (looking-at "\\(\\s.\\|\\s$\\|\\s'\\)+")
1319 (let* ((key (match-string 0))
1320 (pos (match-beginning 0))
1321 (end (match-end 0))
1322 (len (- end pos))
1323 (lst (semantic-lex-type-value "punctuation" t))
1324 (def (car lst)) ;; default lexical symbol or nil
1325 (lst (cdr lst)) ;; alist of (LEX-SYM . PUNCT-STRING)
1326 (elt nil))
1327 (if lst
1328 ;; Starting with the longest one, check whether the
1329 ;; punctuation string is defined for this language.
1330 (while (and (> len 0) (not (setq elt (rassoc key lst))))
1331 (setq len (1- len)
1332 key (substring key 0 len))))
1333 (if elt ;; Return the punctuation token found
1334 (semantic-lex-push-token
1335 (semantic-lex-token (car elt) pos (+ pos len)))
1336 (if def ;; Return a default generic token
1337 (semantic-lex-push-token
1338 (semantic-lex-token def pos end))
1339 ;; Nothing matched
1340 )))))
1342 (define-lex-regex-analyzer semantic-lex-paren-or-list
1343 "Detect open parenthesis.
1344 Return either a paren token or a semantic list token depending on
1345 `semantic-lex-current-depth'."
1346 "\\s("
1347 (if (or (not semantic-lex-maximum-depth)
1348 (< semantic-lex-current-depth semantic-lex-maximum-depth))
1349 (progn
1350 (setq semantic-lex-current-depth (1+ semantic-lex-current-depth))
1351 (semantic-lex-push-token
1352 (semantic-lex-token
1353 'open-paren (match-beginning 0) (match-end 0))))
1354 (semantic-lex-push-token
1355 (semantic-lex-token
1356 'semantic-list (match-beginning 0)
1357 (save-excursion
1358 (semantic-lex-unterminated-syntax-protection 'semantic-list
1359 (forward-list 1)
1360 (point))
1361 )))))
1364 (define-lex-simple-regex-analyzer semantic-lex-open-paren
1365 "Detect and create an open parenthesis token."
1366 "\\s(" 'open-paren 0 (setq semantic-lex-current-depth (1+ semantic-lex-current-depth)))
1368 (define-lex-simple-regex-analyzer semantic-lex-close-paren
1369 "Detect and create a close paren token."
1370 "\\s)" 'close-paren 0 (setq semantic-lex-current-depth (1- semantic-lex-current-depth)))
1372 (define-lex-regex-analyzer semantic-lex-string
1373 "Detect and create a string token."
1374 "\\s\""
1375 ;; Zing to the end of this string.
1376 (semantic-lex-push-token
1377 (semantic-lex-token
1378 'string (point)
1379 (save-excursion
1380 (semantic-lex-unterminated-syntax-protection 'string
1381 (forward-sexp 1)
1382 (point))
1383 ))))
1385 (define-lex-regex-analyzer semantic-lex-comments
1386 "Detect and create a comment token."
1387 semantic-lex-comment-regex
1388 (save-excursion
1389 (forward-comment 1)
1390 ;; Generate newline token if enabled
1391 (if (bolp) (backward-char 1))
1392 (setq semantic-lex-end-point (point))
1393 ;; Language wants comments or want them as whitespaces,
1394 ;; link them together.
1395 (if (eq (semantic-lex-token-class (car semantic-lex-token-stream)) 'comment)
1396 (setcdr (semantic-lex-token-bounds (car semantic-lex-token-stream))
1397 semantic-lex-end-point)
1398 (semantic-lex-push-token
1399 (semantic-lex-token
1400 'comment (match-beginning 0) semantic-lex-end-point)))))
1402 (define-lex-regex-analyzer semantic-lex-comments-as-whitespace
1403 "Detect comments and create a whitespace token."
1404 semantic-lex-comment-regex
1405 (save-excursion
1406 (forward-comment 1)
1407 ;; Generate newline token if enabled
1408 (if (bolp) (backward-char 1))
1409 (setq semantic-lex-end-point (point))
1410 ;; Language wants comments or want them as whitespaces,
1411 ;; link them together.
1412 (if (eq (semantic-lex-token-class (car semantic-lex-token-stream)) 'whitespace)
1413 (setcdr (semantic-lex-token-bounds (car semantic-lex-token-stream))
1414 semantic-lex-end-point)
1415 (semantic-lex-push-token
1416 (semantic-lex-token
1417 'whitespace (match-beginning 0) semantic-lex-end-point)))))
1419 (define-lex-regex-analyzer semantic-lex-ignore-comments
1420 "Detect and skip over comment tokens."
1421 semantic-lex-comment-regex
1422 (let ((comment-start-point (point)))
1423 (forward-comment 1)
1424 (if (eq (point) comment-start-point)
1425 ;; In this case our start-skip string failed
1426 ;; to work properly. Let's try to move over
1427 ;; whatever whitespace we matched to begin
1428 ;; with.
1429 (skip-syntax-forward "-.'" (point-at-eol))
1430 ;; We may need to back up so newlines or whitespace is generated.
1431 (if (bolp)
1432 (backward-char 1)))
1433 (if (eq (point) comment-start-point)
1434 (error "Strange comment syntax prevents lexical analysis"))
1435 (setq semantic-lex-end-point (point))))
1437 ;;; Comment lexer
1439 ;; Predefined lexers that could be used instead of creating new
1440 ;; analyzers.
1442 (define-lex semantic-comment-lexer
1443 "A simple lexical analyzer that handles comments.
1444 This lexer will only return comment tokens. It is the default lexer
1445 used by `semantic-find-doc-snarf-comment' to snarf up the comment at
1446 point."
1447 semantic-lex-ignore-whitespace
1448 semantic-lex-ignore-newline
1449 semantic-lex-comments
1450 semantic-lex-default-action)
1452 ;;; Test Lexer
1454 (define-lex semantic-simple-lexer
1455 "A simple lexical analyzer that handles simple buffers.
1456 This lexer ignores comments and whitespace, and will return
1457 syntax as specified by the syntax table."
1458 semantic-lex-ignore-whitespace
1459 semantic-lex-ignore-newline
1460 semantic-lex-number
1461 semantic-lex-symbol-or-keyword
1462 semantic-lex-charquote
1463 semantic-lex-paren-or-list
1464 semantic-lex-close-paren
1465 semantic-lex-string
1466 semantic-lex-ignore-comments
1467 semantic-lex-punctuation
1468 semantic-lex-default-action)
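To use one of these lexers, install it as the buffer's analyzer (see `semantic-lex' above); a minimal sketch:

```elisp
;; Install the simple lexer in the current buffer, then lex it.
(setq semantic-lex-analyzer 'semantic-simple-lexer)
(semantic-lex-buffer)
```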
1470 ;;; Analyzers generated from grammar.
1472 ;; Some analyzers are hand written. Analyzers created with these
1473 ;; functions are generated from the grammar files.
1475 (defmacro define-lex-keyword-type-analyzer (name doc syntax)
1476 "Define a keyword type analyzer NAME with DOC string.
1477 SYNTAX is the regexp that matches a keyword syntactic expression."
1478 (let ((key (make-symbol "key")))
1479 `(define-lex-analyzer ,name
1480 ,doc
1481 (and (looking-at ,syntax)
1482 (let ((,key (semantic-lex-keyword-p (match-string 0))))
1483 (when ,key
1484 (semantic-lex-push-token
1485 (semantic-lex-token
1486 ,key (match-beginning 0) (match-end 0)))))))
1487 ))
1489 (defmacro define-lex-sexp-type-analyzer (name doc syntax token)
1490 "Define a sexp type analyzer NAME with DOC string.
1491 SYNTAX is the regexp that matches the beginning of the s-expression.
1492 TOKEN is the lexical token returned when SYNTAX matches."
1493 `(define-lex-regex-analyzer ,name
1494 ,doc
1495 ,syntax
1496 (semantic-lex-push-token
1497 (semantic-lex-token
1498 ,token (point)
1499 (save-excursion
1500 (semantic-lex-unterminated-syntax-protection ,token
1501 (forward-sexp 1)
1502 (point))))))
1503 )
1505 (defmacro define-lex-regex-type-analyzer (name doc syntax matches default)
1506 "Define a regexp type analyzer NAME with DOC string.
1507 SYNTAX is the regexp that matches a syntactic expression.
1508 MATCHES is an alist of lexical elements used to refine the syntactic
1509 expression.
1510 DEFAULT is the default lexical token returned when no MATCHES."
1511 (if matches
1512 (let* ((val (make-symbol "val"))
1513 (lst (make-symbol "lst"))
1514 (elt (make-symbol "elt"))
1515 (pos (make-symbol "pos"))
1516 (end (make-symbol "end")))
1517 `(define-lex-analyzer ,name
1518 ,doc
1519 (and (looking-at ,syntax)
1520 (let* ((,val (match-string 0))
1521 (,pos (match-beginning 0))
1522 (,end (match-end 0))
1523 (,lst ,matches)
1524 ,elt)
1525 (while (and ,lst (not ,elt))
1526 (if (string-match (cdar ,lst) ,val)
1527 (setq ,elt (caar ,lst))
1528 (setq ,lst (cdr ,lst))))
1529 (semantic-lex-push-token
1530 (semantic-lex-token (or ,elt ,default) ,pos ,end))))
1531 )))
1532 `(define-lex-simple-regex-analyzer ,name
1533 ,doc
1534 ,syntax ,default)
1535 ))
1537 (defmacro define-lex-string-type-analyzer (name doc syntax matches default)
1538 "Define a string type analyzer NAME with DOC string.
1539 SYNTAX is the regexp that matches a syntactic expression.
1540 MATCHES is an alist of lexical elements used to refine the syntactic
1541 expression.
1542 DEFAULT is the default lexical token returned when no MATCHES."
1543 (if matches
1544 (let* ((val (make-symbol "val"))
1545 (lst (make-symbol "lst"))
1546 (elt (make-symbol "elt"))
1547 (pos (make-symbol "pos"))
1548 (end (make-symbol "end"))
1549 (len (make-symbol "len")))
1550 `(define-lex-analyzer ,name
1551 ,doc
1552 (and (looking-at ,syntax)
1553 (let* ((,val (match-string 0))
1554 (,pos (match-beginning 0))
1555 (,end (match-end 0))
1556 (,len (- ,end ,pos))
1557 (,lst ,matches)
1558 ,elt)
1559 ;; Starting with the longest one, check whether a lexical
1560 ;; value matches a token defined for this language.
1561 (while (and (> ,len 0) (not (setq ,elt (rassoc ,val ,lst))))
1562 (setq ,len (1- ,len)
1563 ,val (substring ,val 0 ,len)))
1564 (when ,elt ;; Adjust token end position.
1565 (setq ,elt (car ,elt)
1566 ,end (+ ,pos ,len)))
1567 (semantic-lex-push-token
1568 (semantic-lex-token (or ,elt ,default) ,pos ,end))))
1569 )))
1570 `(define-lex-simple-regex-analyzer ,name
1571 ,doc
1572 ,syntax ,default)
1573 ))
1575 (defmacro define-lex-block-type-analyzer (name doc syntax matches)
1576 "Define a block type analyzer NAME with DOC string.
1578 SYNTAX is the regexp that matches block delimiters, typically the
1579 open (`\\\\s(') and close (`\\\\s)') parenthesis syntax classes.
1581 MATCHES is a pair (OPEN-SPECS . CLOSE-SPECS) that defines blocks.
1583 OPEN-SPECS is a list of (OPEN-DELIM OPEN-TOKEN BLOCK-TOKEN) elements
1584 where:
1586 OPEN-DELIM is a string: the block open delimiter character.
1588 OPEN-TOKEN is the lexical token class associated to the OPEN-DELIM
1589 delimiter.
1591 BLOCK-TOKEN is the lexical token class associated to the block
1592 that starts at the OPEN-DELIM delimiter.
1594 CLOSE-SPECS is a list of (CLOSE-DELIM CLOSE-TOKEN) elements where:
1596 CLOSE-DELIM is a string: the block end delimiter character.
1598 CLOSE-TOKEN is the lexical token class associated to the
1599 CLOSE-DELIM delimiter.
1601 Each element in OPEN-SPECS must have a corresponding element in
1602 CLOSE-SPECS.
1604 The lexer will return a BLOCK-TOKEN token when the value of
1605 `semantic-lex-current-depth' is greater than or equal to the maximum
1606 depth of parenthesis tracking (see also the function `semantic-lex').
1607 Otherwise it will return OPEN-TOKEN and CLOSE-TOKEN tokens.
1609 TO DO: Put the following in the developer's guide and just put a
1610 reference here.
1612 In the grammar:
1614 The value of a block token must be a string that contains a readable
1615 sexp of the form:
1617 \"(OPEN-TOKEN CLOSE-TOKEN)\"
1619 OPEN-TOKEN and CLOSE-TOKEN represent the block delimiters, and must be
1620 lexical tokens of respectively `open-paren' and `close-paren' types.
1621 Their value is the corresponding delimiter character as a string.
1623 Here is a small example to analyze a parenthesis block:
1625 %token <block> PAREN_BLOCK \"(LPAREN RPAREN)\"
1626 %token <open-paren> LPAREN \"(\"
1627 %token <close-paren> RPAREN \")\"
1629 When the lexer encounters the open-paren delimiter \"(\":
1631 - If the maximum depth of parenthesis tracking is not reached (that
1632 is, current depth < max depth), it returns a (LPAREN start . end)
1633 token, then continues analysis inside the block. Later, when the
1634 corresponding close-paren delimiter \")\" is encountered, it
1635 returns a (RPAREN start . end) token.
1637 - If the maximum depth of parenthesis tracking is reached (current
1638 depth >= max depth), it returns the whole parenthesis block as
1639 a (PAREN_BLOCK start . end) token."
  (let* ((val (make-symbol "val"))
         (lst (make-symbol "lst"))
         (elt (make-symbol "elt")))
    `(define-lex-analyzer ,name
       ,doc
       (and
        (looking-at ,syntax) ;; "\\(\\s(\\|\\s)\\)"
        (let ((,val (match-string 0))
              (,lst ,matches)
              ,elt)
          (cond
           ((setq ,elt (assoc ,val (car ,lst)))
            (if (or (not semantic-lex-maximum-depth)
                    (< semantic-lex-current-depth semantic-lex-maximum-depth))
                (progn
                  (setq semantic-lex-current-depth (1+ semantic-lex-current-depth))
                  (semantic-lex-push-token
                   (semantic-lex-token
                    (nth 1 ,elt)
                    (match-beginning 0) (match-end 0))))
              (semantic-lex-push-token
               (semantic-lex-token
                (nth 2 ,elt)
                (match-beginning 0)
                (save-excursion
                  (semantic-lex-unterminated-syntax-protection (nth 2 ,elt)
                    (forward-list 1)
                    (point)))))))
           ((setq ,elt (assoc ,val (cdr ,lst)))
            (setq semantic-lex-current-depth (1- semantic-lex-current-depth))
            (semantic-lex-push-token
             (semantic-lex-token
              (nth 1 ,elt)
              (match-beginning 0) (match-end 0))))
           ))))))
;;; Lexical Safety
;;
;; The semantic lexers, unlike other lexers, can throw errors on
;; unbalanced syntax.  Since editing is all about changing text
;; we need to provide a convenient way to protect against syntactic
;; inequalities.
(defmacro semantic-lex-catch-errors (symbol &rest forms)
  "Using SYMBOL, execute FORMS catching lexical errors.
If FORMS results in a call to the parser that throws a lexical error,
the error will be caught here without the buffer's cache being thrown
out of date.
If there is an error, the syntax that failed is returned.
If there is no error, then the last value of FORMS is returned."
  (let ((ret (make-symbol "ret"))
        (syntax (make-symbol "syntax"))
        (start (make-symbol "start"))
        (end (make-symbol "end")))
    `(let* ((semantic-lex-unterminated-syntax-end-function
             (lambda (,syntax ,start ,end)
               (throw ',symbol ,syntax)))
            ;; Delete the below when semantic-flex is fully retired.
            (semantic-flex-unterminated-syntax-end-function
             semantic-lex-unterminated-syntax-end-function)
            (,ret (catch ',symbol
                    (save-excursion
                      ,@forms
                      nil))))
       ;; Great Sadness.  Assume that FORMS execute within the
       ;; confines of the current buffer only!  Mark this thing
       ;; unparseable iff the special symbol was thrown.  This
       ;; will prevent future calls from parsing, but will allow
       ;; them to still return the cache.
       (when ,ret
         ;; Leave this message off.  If an APP using this fcn wants
         ;; a message, they can do it themselves.  This cleans up
         ;; problems with the idle scheduler obscuring useful data.
         ;;(message "Buffer not currently parsable (%S)." ,ret)
         (semantic-parse-tree-unparseable))
       ,ret)))
(put 'semantic-lex-catch-errors 'lisp-indent-function 1)
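
;; For illustration only (this sketch is not part of the original
;; source): a typical use of `semantic-lex-catch-errors' guards a
;; full-buffer lex, returning either the token stream or the symbol
;; of the syntax that failed.  The guard symbol `lex-guard' below is
;; an arbitrary, hypothetical name.
;;
;;   (semantic-lex-catch-errors lex-guard
;;     (semantic-lex (point-min) (point-max)))
;;
;; Because the macro binds `semantic-lex-unterminated-syntax-end-function'
;; to throw to the guard symbol, unterminated strings or lists abort
;; the lex instead of signaling an error.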
;;; Interfacing with edebug
;;
(add-hook
 'edebug-setup-hook
 #'(lambda ()

     (def-edebug-spec define-lex
       (&define name stringp (&rest symbolp)))

     (def-edebug-spec define-lex-analyzer
       (&define name stringp form def-body))

     (def-edebug-spec define-lex-regex-analyzer
       (&define name stringp form def-body))

     (def-edebug-spec define-lex-simple-regex-analyzer
       (&define name stringp form symbolp [ &optional form ] def-body))

     (def-edebug-spec define-lex-block-analyzer
       (&define name stringp form (&rest form)))

     (def-edebug-spec semantic-lex-catch-errors
       (symbolp def-body))))
;;; Compatibility with Semantic 1.x lexical analysis
;;
;; NOTE: DELETE THIS SOMEDAY SOON

(semantic-alias-obsolete 'semantic-flex-start 'semantic-lex-token-start "23.2")
(semantic-alias-obsolete 'semantic-flex-end 'semantic-lex-token-end "23.2")
(semantic-alias-obsolete 'semantic-flex-text 'semantic-lex-token-text "23.2")
(semantic-alias-obsolete 'semantic-flex-make-keyword-table 'semantic-lex-make-keyword-table "23.2")
(semantic-alias-obsolete 'semantic-flex-keyword-p 'semantic-lex-keyword-p "23.2")
(semantic-alias-obsolete 'semantic-flex-keyword-put 'semantic-lex-keyword-put "23.2")
(semantic-alias-obsolete 'semantic-flex-keyword-get 'semantic-lex-keyword-get "23.2")
(semantic-alias-obsolete 'semantic-flex-map-keywords 'semantic-lex-map-keywords "23.2")
(semantic-alias-obsolete 'semantic-flex-keywords 'semantic-lex-keywords "23.2")
(semantic-alias-obsolete 'semantic-flex-buffer 'semantic-lex-buffer "23.2")
(semantic-alias-obsolete 'semantic-flex-list 'semantic-lex-list "23.2")
;; This simple scanner uses the syntax table to generate a stream of
;; simple tokens of the form:
;;
;;   (SYMBOL START . END)
;;
;; Where SYMBOL is the type of thing it is.  START and END mark that
;; object's boundary.

(defvar semantic-flex-tokens semantic-lex-tokens
  "An alist of semantic token types.
See variable `semantic-lex-tokens'.")
(defvar semantic-flex-unterminated-syntax-end-function
  (lambda (syntax syntax-start flex-end) flex-end)
  "Function called when unterminated syntax is encountered.
This should be set to one function.  That function should take three
parameters.  The SYNTAX, or type of syntax which is unterminated.
SYNTAX-START where the broken syntax begins.
FLEX-END is where the lexical analysis was asked to end.
This function can be used for languages that can intelligently fix up
broken syntax, or to exit lexical analysis via `throw' or `signal'
when finding unterminated syntax.")
(defvar semantic-flex-extensions nil
  "Buffer local extensions to the lexical analyzer.
This should contain an alist with a key of a regex and a data element of
a function.  The function should both move point, and return a lexical
token of the form:
  (TYPE START . END)
nil is also a valid return value.
TYPE can be any type of symbol, as long as it doesn't occur as a
nonterminal in the language definition.")
(make-variable-buffer-local 'semantic-flex-extensions)
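
;; A hypothetical example entry (the regex, token type, and handler
;; below are invented for illustration, not part of the original
;; source): lex shebang lines as a single 'shebang token.  Per the
;; docstring of `semantic-flex-extensions', the handler must move
;; point past the matched text and return a (TYPE START . END) token.
;;
;;   (setq semantic-flex-extensions
;;         '(("#!" . (lambda ()
;;                     (let ((start (match-beginning 0)))
;;                       (end-of-line)
;;                       (cons 'shebang (cons start (point))))))))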
(defvar semantic-flex-syntax-modifications nil
  "Changes to the syntax table for this buffer.
These changes are active only while the buffer is being flexed.
This is a list where each element has the form:
  (CHAR CLASS)
CHAR is the char passed to `modify-syntax-entry',
and CLASS is the string also passed to `modify-syntax-entry' to define
what syntax class CHAR has.")
(make-variable-buffer-local 'semantic-flex-syntax-modifications)
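
;; For example (illustrative only, not from the original source), a
;; mode where `_' should be treated as a symbol constituent while
;; flexing could set:
;;
;;   (setq semantic-flex-syntax-modifications '((?_ "_")))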
(defvar semantic-ignore-comments t
  "Default comment handling.
The value t means to strip comments when flexing; nil means
to keep comments as part of the token stream.")
(make-variable-buffer-local 'semantic-ignore-comments)

(defvar semantic-flex-enable-newlines nil
  "When flexing, report 'newlines as syntactic elements.
Useful for languages where the newline is a special case terminator.
Only set this on a per mode basis, not globally.")
(make-variable-buffer-local 'semantic-flex-enable-newlines)

(defvar semantic-flex-enable-whitespace nil
  "When flexing, report 'whitespace as syntactic elements.
Useful for languages where the syntax is whitespace dependent.
Only set this on a per mode basis, not globally.")
(make-variable-buffer-local 'semantic-flex-enable-whitespace)

(defvar semantic-flex-enable-bol nil
  "When flexing, report beginning of lines as syntactic elements.
Useful for languages like Python which are indentation sensitive.
Only set this on a per mode basis, not globally.")
(make-variable-buffer-local 'semantic-flex-enable-bol)

(defvar semantic-number-expression semantic-lex-number-expression
  "See variable `semantic-lex-number-expression'.")
(make-variable-buffer-local 'semantic-number-expression)

(defvar semantic-flex-depth 0
  "Default flexing depth.
This specifies how many lists to create tokens in.")
(make-variable-buffer-local 'semantic-flex-depth)
(defun semantic-flex (start end &optional depth length)
  "Using the syntax table, do something roughly equivalent to flex.
Semantically check between START and END.  Optional argument DEPTH
indicates at what level to scan over entire lists.
The return value is a token stream.  Each element is a list of the
form (symbol start-expression . end-expression) where SYMBOL denotes
the token type.
See `semantic-flex-tokens' variable for details on token types.
END does not mark the end of the text scanned, only the end of the
beginning of text scanned.  Thus, if a string extends past END, the
end of the return token will be larger than END.  To truly restrict
scanning, use `narrow-to-region'.
The last argument, LENGTH, specifies that `semantic-flex' should only
return LENGTH tokens."
  (message "`semantic-flex' is an obsolete function.  Use `define-lex' to create lexers.")
  (if (not semantic-flex-keywords-obarray)
      (setq semantic-flex-keywords-obarray [ nil ]))
  (let ((ts nil)
        (pos (point))
        (ep nil)
        (curdepth 0)
        (cs (if comment-start-skip
                (concat "\\(\\s<\\|" comment-start-skip "\\)")
              (concat "\\(\\s<\\)")))
        (newsyntax (copy-syntax-table (syntax-table)))
        (mods semantic-flex-syntax-modifications)
        ;; Use the default depth if it is not specified.
        (depth (or depth semantic-flex-depth)))
    ;; Update the syntax table
    (while mods
      (modify-syntax-entry (car (car mods)) (car (cdr (car mods))) newsyntax)
      (setq mods (cdr mods)))
    (with-syntax-table newsyntax
      (goto-char start)
      (while (and (< (point) end) (or (not length) (<= (length ts) length)))
        (cond
         ;; catch beginning of lines when needed.
         ;; Must be done before catching any other tokens!
         ((and semantic-flex-enable-bol
               (bolp)
               ;; Just insert a (bol N . N) token in the token stream,
               ;; without moving the point.  N is the point at the
               ;; beginning of line.
               (setq ts (cons (cons 'bol (cons (point) (point))) ts))
               nil)) ;; CONTINUE
         ;; special extensions, includes whitespace, nl, etc.
         ((and semantic-flex-extensions
               (let ((fe semantic-flex-extensions)
                     (r nil))
                 (while fe
                   (if (looking-at (car (car fe)))
                       (setq ts (cons (funcall (cdr (car fe))) ts)
                             r t
                             fe nil
                             ep (point)))
                   (setq fe (cdr fe)))
                 (if (and r (not (car ts))) (setq ts (cdr ts)))
                 r)))
         ;; catch newlines when needed
         ((looking-at "\\s-*\\(\n\\|\\s>\\)")
          (if semantic-flex-enable-newlines
              (setq ep (match-end 1)
                    ts (cons (cons 'newline
                                   (cons (match-beginning 1) ep))
                             ts))))
         ;; catch whitespace when needed
         ((looking-at "\\s-+")
          (if semantic-flex-enable-whitespace
              ;; Language wants whitespaces, link them together.
              (if (eq (car (car ts)) 'whitespace)
                  (setcdr (cdr (car ts)) (match-end 0))
                (setq ts (cons (cons 'whitespace
                                     (cons (match-beginning 0)
                                           (match-end 0)))
                               ts)))))
         ;; numbers
         ((and semantic-number-expression
               (looking-at semantic-number-expression))
          (setq ts (cons (cons 'number
                               (cons (match-beginning 0)
                                     (match-end 0)))
                         ts)))
         ;; symbols
         ((looking-at "\\(\\sw\\|\\s_\\)+")
          (setq ts (cons (cons
                          ;; Get info on if this is a keyword or not
                          (or (semantic-lex-keyword-p (match-string 0))
                              'symbol)
                          (cons (match-beginning 0) (match-end 0)))
                         ts)))
         ;; Character quoting characters (i.e., \n as newline)
         ((looking-at "\\s\\+")
          (setq ts (cons (cons 'charquote
                               (cons (match-beginning 0) (match-end 0)))
                         ts)))
         ;; Open parens, or semantic-lists.
         ((looking-at "\\s(")
          (if (or (not depth) (< curdepth depth))
              (progn
                (setq curdepth (1+ curdepth))
                (setq ts (cons (cons 'open-paren
                                     (cons (match-beginning 0) (match-end 0)))
                               ts)))
            (setq ts (cons
                      (cons 'semantic-list
                            (cons (match-beginning 0)
                                  (save-excursion
                                    (condition-case nil
                                        (forward-list 1)
                                      ;; This case makes flex robust
                                      ;; to broken lists.
                                      (error
                                       (goto-char
                                        (funcall
                                         semantic-flex-unterminated-syntax-end-function
                                         'semantic-list
                                         start end))))
                                    (setq ep (point)))))
                      ts))))
         ;; Close parens
         ((looking-at "\\s)")
          (setq ts (cons (cons 'close-paren
                               (cons (match-beginning 0) (match-end 0)))
                         ts))
          (setq curdepth (1- curdepth)))
         ;; String initiators
         ((looking-at "\\s\"")
          ;; Zing to the end of this string.
          (setq ts (cons (cons 'string
                               (cons (match-beginning 0)
                                     (save-excursion
                                       (condition-case nil
                                           (forward-sexp 1)
                                         ;; This case makes flex
                                         ;; robust to broken strings.
                                         (error
                                          (goto-char
                                           (funcall
                                            semantic-flex-unterminated-syntax-end-function
                                            'string
                                            start end))))
                                       (setq ep (point)))))
                         ts)))
         ;; comments
         ((looking-at cs)
          (if (and semantic-ignore-comments
                   (not semantic-flex-enable-whitespace))
              ;; If the language doesn't deal with comments nor
              ;; whitespaces, ignore them here.
              (let ((comment-start-point (point)))
                (forward-comment 1)
                (if (eq (point) comment-start-point)
                    ;; In this case our start-skip string failed
                    ;; to work properly.  Let's try and move over
                    ;; whatever white space we matched to begin
                    ;; with.
                    (skip-syntax-forward "-.'" (point-at-eol))
                  ;;(forward-comment 1)
                  ;; Generate newline token if enabled
                  (if (and semantic-flex-enable-newlines
                           (bolp))
                      (backward-char 1)))
                (if (eq (point) comment-start-point)
                    (error "Strange comment syntax prevents lexical analysis"))
                (setq ep (point)))
            (let ((tk (if semantic-ignore-comments 'whitespace 'comment)))
              (save-excursion
                (forward-comment 1)
                ;; Generate newline token if enabled
                (if (and semantic-flex-enable-newlines
                         (bolp))
                    (backward-char 1))
                (setq ep (point)))
              ;; Language wants comments or want them as whitespaces,
              ;; link them together.
              (if (eq (car (car ts)) tk)
                  (setcdr (cdr (car ts)) ep)
                (setq ts (cons (cons tk (cons (match-beginning 0) ep))
                               ts))))))
         ;; punctuation
         ((looking-at "\\(\\s.\\|\\s$\\|\\s'\\)")
          (setq ts (cons (cons 'punctuation
                               (cons (match-beginning 0) (match-end 0)))
                         ts)))
         ;; unknown token
         (t
          (error "What is that?")))
        (goto-char (or ep (match-end 0)))
        (setq ep nil)))
    ;; maybe catch the last beginning of line when needed
    (and semantic-flex-enable-bol
         (= (point) end)
         (bolp)
         (setq ts (cons (cons 'bol (cons (point) (point))) ts)))
    (goto-char pos)
    ;;(message "Flexing muscles...done")
    (nreverse ts)))
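
;; An illustrative call (obsolete API; prefer `define-lex' lexers).
;; With the default flexing depth of 0, lexing a buffer containing
;; "foo(bar)" from the beginning would yield a stream such as:
;;
;;   (semantic-flex (point-min) (point-max))
;;   ;; => ((symbol 1 . 4) (semantic-list 4 . 9))
;;
;; because the open paren is beyond the scan depth and so the whole
;; list is returned as one 'semantic-list token.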
(provide 'semantic/lex)

;; Local variables:
;; generated-autoload-file: "loaddefs.el"
;; generated-autoload-load-name: "semantic/lex"
;; End:

;;; semantic/lex.el ends here