1 Unlimited macro string literal length and single-quoted strings
3 Available as a patch:
5 http://sourceforge.net/tracker/?func=detail&atid=311005&aid=1598271&group_id=11005
6 [ 1598271 ] Unlimited macro string literal length, single-quoted strings
7 macroStringLiterals.diff 2006-11-21
9 String literals are scanned twice, firstly to calculate their space
10 requirements, secondly to read their contents into allocated memory.
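The measure-then-fill idea can be sketched in isolation. This is a minimal illustration, not the patch's code: `emit_without_underscores` and `two_pass_copy` are hypothetical names, and plain `malloc` stands in for the allocator the patch uses. The same routine runs twice, first with no output buffer so that it only counts, then with a buffer of exactly the measured size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies src while dropping every '_' character.
   When out is NULL it only counts; otherwise it also writes.
   Returns the number of bytes needed, including the terminating NUL. */
static size_t emit_without_underscores(const char *src, char *out)
{
    size_t len = 0;
    for (; *src != '\0'; ++src) {
        if (*src == '_')
            continue;           /* transformation changes the length */
        if (out)
            out[len] = *src;
        ++len;
    }
    if (out)
        out[len] = '\0';
    return len + 1;
}

/* Pass 1 measures, pass 2 fills an exactly-sized buffer.
   Returns NULL if the allocation fails; the caller frees the result. */
static char *two_pass_copy(const char *src)
{
    size_t need = emit_without_underscores(src, NULL);
    char *buf = malloc(need);
    if (buf != NULL)
        emit_without_underscores(src, buf);
    return buf;
}
```

Because both passes share one code path, the measured size cannot drift from what the second pass writes, which is what removes the need for a fixed-size buffer.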
12 Separate string literals that follow one another are combined into one,
13 avoiding run-time concatenation of the pieces. Also single-quoted string
14 literals are allowed, within which backslash ('\') has no special
15 meaning (so you can't include a single-quote in a single-quoted string).
17 Note that a double-quoted string can be continued over multiple lines
18 by ending each line but the last with a backslash, as in C.
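The effect of such a continuation on the string content can be shown with a small stand-alone helper. This is only a sketch: `elide_continuations` is a hypothetical name, and it removes every backslash-newline pair unconditionally, whereas a real scanner would first consume an escaped backslash ("\\") and so would not treat its second backslash as the start of a continuation.

```c
#include <stddef.h>

/* Hypothetical helper: removes every backslash-newline pair from s in
   place and returns the new length (excluding the terminating NUL). */
static size_t elide_continuations(char *s)
{
    char *r = s;    /* read position */
    char *w = s;    /* write position, never ahead of r */

    while (*r != '\0') {
        if (r[0] == '\\' && r[1] == '\n')
            r += 2;             /* drop the pair, join the two lines */
        else
            *w++ = *r++;
    }
    *w = '\0';
    return (size_t)(w - s);
}
```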
20 2006-11-21:
22 Fixed adjacent string literal merging to allow concatenation with "".
24 ---
26 doc/help.etx | 41 +++++--
27 source/parse.y | 314 ++++++++++++++++++++++++++++++++++-----------------------
28 2 files changed, 221 insertions(+), 134 deletions(-)
30 diff --quilt old/doc/help.etx new/doc/help.etx
31 --- old/doc/help.etx
32 +++ new/doc/help.etx
33 @@ -1975,15 +1975,16 @@ Macro Language
34 are executed together conditionally, such as the body of a loop, are
35 surrounded by curly braces "{}".
37 Blank lines and comments are also allowed. Comments begin with a "#" and end
38 with a newline, and can appear either on a line by themselves, or at the end
39 - of a statement.
40 + of a statement line.
42 Statements which are too long to fit on a single line may be split across
43 several lines, by placing a backslash "\" character at the end of each line
44 - to be continued.
45 + to be continued. Note that a comment with a backslash at the end is treated
46 + as a continuation in this way too.
49 3>Data Types
51 The NEdit macro language recognizes only three data types, dynamic character
52 @@ -2001,16 +2002,18 @@ Macro Language
53 a = -1
54 b = 1000
56 4>Character String Constants
58 - Character string constants are enclosed in double quotes. For example:
59 + Character string constants are enclosed in single or double quotes, but the
60 + start and end quotes must be the same character. For example:
62 a = "a string"
63 - dialog("Hi there!", "OK")
64 + dialog('Hi there!', "OK")
66 - Strings may also include C-language style escape sequences:
67 + A double-quoted string may also include C-language style escape
68 + sequences:
70 \\ Backslash \t Tab \f Form feed
71 \" Double quote \b Backspace \a Alert
72 \n Newline \r Carriage return \v Vertical tab
74 @@ -2024,32 +2027,48 @@ Macro Language
75 explicit newlines, and also buffers its output on a per-line basis:
77 t_print("a = " a "\n")
79 Other characters can be expressed as backslash-escape sequences in macro
80 - strings. The format is the same as for regular expressions, described in the
81 - paragraphs headed "Octal and Hex Escape Sequences" of the section
82 - "Metacharacters_", except that an octal escape sequence can start with any
83 - octal digit, not just 0, so the single character string "\0033" is the same
84 - as "\33", "\x1B" and "\e" (for an ASCII version of NEdit).
85 + double-quoted strings. The format is the same as for regular expressions,
86 + described in the paragraphs headed "Octal and Hex Escape Sequences" of the
87 + section "Metacharacters_", except that an octal escape sequence can start with
88 + any octal digit, not just 0, so the single character string "\0033" is the
89 + same as "\33", "\x1B" and "\e" (for an ASCII version of NEdit).
91 Note that if you want to define a regular expression in a macro string,
92 you need to "double-up" the backslashes for the metacharacters with
93 special meaning in regular expressions. For example, the expression
95 (?N(\s|/\*(?n(?:(?!\*/).)*)\*/|//.*\n|\n)+)
97 which matches whitespace or C/C++/Java-style comments, should be written as
98 - a macro string as
99 + a macro double-quoted string as
101 "(?N(\\s|/\\*(?n(?:(?!\\*/).)*)\\*/|//.*\n|\n)+)"
103 (The "\n"s towards the end add literal newline characters to the string. The
104 regular expression interpretation treats the newlines as themselves. It can
105 also interpret the sequence "\\n" as a newline, although the macro string here
106 would then contain a literal backslash followed by a lowercase `N'.)
108 + Alternatively, if you don't need special escapes or a single quote
109 + (apostrophe) in your string (true for this example), just turn the expression
110 + into a single-quoted string, as
112 + '(?N(\s|/\*(?n(?:(?!\*/).)*)\*/|//.*\n|\n)+)'
114 + Neighboring string literals (separated by whitespace or line continuations)
115 + are combined before use, as if by the concatenation operation. For example
117 + "The backslash '" '\' "' is an " \
118 + 'escape only in "double-quoted" strings' "\n"
120 + is treated as a single string ending with a newline character, looking like
122 + The backslash '\' is an escape only in "double-quoted" strings
125 3>Variables
127 Variable names must begin either with a letter (local variables), or a $
128 (global variables). Beyond the first character, variables may also contain
129 diff --quilt old/source/parse.y new/source/parse.y
130 --- old/source/parse.y
131 +++ new/source/parse.y
132 @@ -44,10 +44,11 @@ static int yylex(void);
133 int yyparse(void);
134 static int follow(char expect, int yes, int no);
135 static int follow2(char expect1, int yes1, char expect2, int yes2, int no);
136 static int follow_non_whitespace(char expect, int yes, int no);
137 static Symbol *matchesActionRoutine(char **inPtr);
138 +static int scanString(void);
140 static char *ErrMsg;
141 static char *InPtr;
142 extern Inst *LoopStack[]; /* addresses of break, cont stmts */
143 extern Inst **LoopStackPtr; /* to fill at the end of a loop */
144 @@ -488,42 +489,43 @@ Program *ParseMacro(char *expr, char **m
145 *msg = "";
146 *stoppedAt = InPtr;
147 return prog;
151 -static int yylex(void)
152 +static char skipWhitespace(void)
154 - int i, len;
155 - Symbol *s;
156 - static DataValue value = {NO_TAG, {0}};
157 - static char escape[] = "\\\"ntbrfave";
158 -#ifdef EBCDIC_CHARSET
159 - static char replace[] = "\\\"\n\t\b\r\f\a\v\x27"; /* EBCDIC escape */
160 -#else
161 - static char replace[] = "\\\"\n\t\b\r\f\a\v\x1B"; /* ASCII escape */
162 -#endif
164 /* skip whitespace, backslash-newline combinations, and comments, which are
165 all considered whitespace */
166 for (;;) {
167 if (*InPtr == '\\' && *(InPtr + 1) == '\n')
168 InPtr += 2;
169 else if (*InPtr == ' ' || *InPtr == '\t')
170 InPtr++;
171 - else if (*InPtr == '#')
172 + else if (*InPtr == '#') {
173 + InPtr++;
174 while (*InPtr != '\n' && *InPtr != '\0') {
175 /* Comments stop at escaped newlines */
176 if (*InPtr == '\\' && *(InPtr + 1) == '\n') {
177 InPtr += 2;
178 break;
180 InPtr++;
181 - } else
184 + else
185 break;
187 + return *InPtr;
190 +static int yylex(void)
192 + int len;
193 + Symbol *s;
194 + static DataValue value = {NO_TAG, {0}};
196 + skipWhitespace();
198 /* return end of input at the end of the string */
199 if (*InPtr == '\0') {
200 return 0;
202 @@ -578,119 +580,14 @@ static int yylex(void)
204 yylval.sym = s;
205 return SYMBOL;
208 - /* Process quoted strings with embedded escape sequences:
209 - For backslashes we recognise hexadecimal values with initial 'x' such
210 - as "\x1B"; octal value (upto 3 oct digits with a possible leading zero)
211 - such as "\33", "\033" or "\0033", and the C escapes: \", \', \n, \t, \b,
212 - \r, \f, \a, \v, and the added \e for the escape character, as for REs.
213 - Disallow hex/octal zero values (NUL): instead ignore the introductory
214 - backslash, eg "\x0xyz" becomes "x0xyz" and "\0000hello" becomes
215 - "0000hello". */
217 - if (*InPtr == '\"') {
218 - char string[MAX_STRING_CONST_LEN], *p = string;
219 - char *backslash;
220 - InPtr++;
221 - while (*InPtr != '\0' && *InPtr != '\"' && *InPtr != '\n') {
222 - if (p >= string + MAX_STRING_CONST_LEN) {
223 - InPtr++;
224 - continue;
226 - if (*InPtr == '\\') {
227 - backslash = InPtr;
228 - InPtr++;
229 - if (*InPtr == '\n') {
230 - InPtr++;
231 - continue;
233 - if (*InPtr == 'x') {
234 - /* a hex introducer */
235 - int hexValue = 0;
236 - const char *hexDigits = "0123456789abcdef";
237 - const char *hexD;
238 - InPtr++;
239 - if (*InPtr == '\0' ||
240 - (hexD = strchr(hexDigits, tolower(*InPtr))) == NULL) {
241 - *p++ = 'x';
243 - else {
244 - hexValue = hexD - hexDigits;
245 - InPtr++;
246 - /* now do we have another digit? only accept one more */
247 - if (*InPtr != '\0' &&
248 - (hexD = strchr(hexDigits,tolower(*InPtr))) != NULL){
249 - hexValue = hexD - hexDigits + (hexValue << 4);
250 - InPtr++;
252 - if (hexValue != 0) {
253 - *p++ = (char)hexValue;
255 - else {
256 - InPtr = backslash + 1; /* just skip the backslash */
259 - continue;
261 - /* the RE documentation requires \0 as the octal introducer;
262 - here you can start with any octal digit, but you are only
263 - allowed up to three (or four if the first is '0'). */
264 - if ('0' <= *InPtr && *InPtr <= '7') {
265 - if (*InPtr == '0') {
266 - InPtr++; /* octal introducer: don't count this digit */
268 - if ('0' <= *InPtr && *InPtr <= '7') {
269 - /* treat as octal - first digit */
270 - char octD = *InPtr++;
271 - int octValue = octD - '0';
272 - if ('0' <= *InPtr && *InPtr <= '7') {
273 - /* second digit */
274 - octD = *InPtr++;
275 - octValue = (octValue << 3) + octD - '0';
276 - /* now do we have another digit? can we add it?
277 - if value is going to be too big for char (greater
278 - than 0377), stop converting now before adding the
279 - third digit */
280 - if ('0' <= *InPtr && *InPtr <= '7' &&
281 - octValue <= 037) {
282 - /* third digit is acceptable */
283 - octD = *InPtr++;
284 - octValue = (octValue << 3) + octD - '0';
287 - if (octValue != 0) {
288 - *p++ = (char)octValue;
290 - else {
291 - InPtr = backslash + 1; /* just skip the backslash */
294 - else { /* \0 followed by non-digits: go back to 0 */
295 - InPtr = backslash + 1; /* just skip the backslash */
297 - continue;
299 - for (i=0; escape[i]!='\0'; i++) {
300 - if (escape[i] == *InPtr) {
301 - *p++ = replace[i];
302 - InPtr++;
303 - break;
306 - /* if we get here, we didn't recognise the character after
307 - the backslash: just copy it next time round the loop */
309 - else {
310 - *p++= *InPtr++;
313 - *p = '\0';
314 - InPtr++;
315 - yylval.sym = InstallStringConstSymbol(string);
316 - return STRING;
317 + /* Process quoted strings */
319 + if (*InPtr == '\"' || *InPtr == '\'') {
320 + return scanString();
323 /* process remaining two character tokens or return single char as token */
324 switch(*InPtr++) {
325 case '>': return follow('=', GE, GT);
326 @@ -781,10 +678,181 @@ static Symbol *matchesActionRoutine(char
327 *inPtr = c;
328 return s;
332 +** Process quoted string literals. These can be in single or double quotes.
333 +** A sequence of string literals separated by whitespace (see skipWhitespace())
334 +** are read as a single string.
336 +** Double-quoted string literals allow embedded escape sequences:
337 +** For backslashes we recognise hexadecimal values with initial 'x' such
338 +** as "\x1B"; octal value (up to 3 oct digits with a possible leading zero)
339 +** such as "\33", "\033" or "\0033", and the C escapes: \", \', \n, \t, \b,
340 +** \r, \f, \a, \v, and the added \e for the escape character, as for REs.
341 +** We disallow hex/octal zero values (NUL): instead ignore the introductory
342 +** backslash, eg "\x0xyz" becomes "x0xyz" and "\0000hello" becomes "0000hello".
343 +** An escaped newline is elided, and the string content continues on the next
344 +** source line.
346 +static int scanString(void)
348 +# define SCANSTRING_WRITE_TO_STRING(p, len, val) \
349 + do { char mc = (val); if (p) { *p++ = mc; } else { ++len; } } while (0)
351 + /* scan the string twice: once to get its size, then again to build it */
352 + char *startPtr = InPtr;
353 + char *p = NULL, *string = NULL;
354 + int len, scan, i;
355 + char stopper, first_stopper = *startPtr;
356 + char *backslash;
357 + int handleBackslash;
359 + static char escape[] = "\\\"ntbrfave";
360 +#ifdef EBCDIC_CHARSET
361 + static char replace[] = "\\\"\n\t\b\r\f\a\v\x27"; /* EBCDIC escape */
362 +#else
363 + static char replace[] = "\\\"\n\t\b\r\f\a\v\x1B"; /* ASCII escape */
364 +#endif
366 + if (first_stopper != '\"' && first_stopper != '\'')
367 + return yyerror("expected a string");
369 + for (scan = 0; scan < 2; ++scan)
371 + InPtr = startPtr;
372 + stopper = first_stopper;
373 + handleBackslash = (stopper == '\"');
374 + len = 0;
375 + InPtr++;
376 + while (*InPtr != '\0' && *InPtr != '\n') {
377 + if (*InPtr == stopper) {
378 + char *endPtr = InPtr++;
379 + skipWhitespace();
380 + /* is this followed by another string literal? */
381 + if (*InPtr == '\"' || *InPtr == '\'') {
382 + stopper = *InPtr++; /* add it to the end of the first */
383 + handleBackslash = (stopper == '\"');
385 + else {
386 + InPtr = endPtr; /* no further string: restore position */
387 + break;
390 + else if (handleBackslash && *InPtr == '\\') {
391 + backslash = InPtr;
392 + InPtr++;
393 + if (*InPtr == '\n') { /* allows newline to be skipped */
394 + InPtr++;
395 + continue;
397 + if (*InPtr == 'x') {
398 + /* a hex introducer */
399 + int hexValue = 0;
400 + const char *hexDigits = "0123456789abcdef";
401 + const char *hexD;
402 + InPtr++;
403 + if (*InPtr == '\0')
404 + break;
405 + if ((hexD = strchr(hexDigits, tolower(*InPtr))) == NULL) {
406 + SCANSTRING_WRITE_TO_STRING(p, len, 'x');
408 + else {
409 + hexValue = hexD - hexDigits;
410 + InPtr++;
411 + if (*InPtr == '\0')
412 + break;
413 + /* now do we have another digit? only accept one more */
414 + if ((hexD = strchr(hexDigits,tolower(*InPtr))) != NULL){
415 + hexValue = hexD - hexDigits + (hexValue << 4);
416 + InPtr++;
418 + if (hexValue != 0) {
419 + SCANSTRING_WRITE_TO_STRING(p, len, (char)hexValue);
421 + else {
422 + InPtr = backslash + 1; /* just skip the backslash */
425 + continue;
427 + /* the RE documentation requires \0 as the octal introducer;
428 + here you can start with any octal digit, but you are only
429 + allowed up to three (or four if the first is '0'). */
430 + if ('0' <= *InPtr && *InPtr <= '7') {
431 + if (*InPtr == '0') {
432 + InPtr++; /* octal introducer: don't count this digit */
434 + if ('0' <= *InPtr && *InPtr <= '7') {
435 + /* treat as octal - first digit */
436 + char octD = *InPtr++;
437 + int octValue = octD - '0';
438 + if ('0' <= *InPtr && *InPtr <= '7') {
439 + /* second digit */
440 + octD = *InPtr++;
441 + octValue = (octValue << 3) + octD - '0';
442 + /* now do we have another digit? can we add it?
443 + if value is going to be too big for char (greater
444 + than 0377), stop converting now before adding the
445 + third digit */
446 + if ('0' <= *InPtr && *InPtr <= '7' &&
447 + octValue <= 037) {
448 + /* third digit is acceptable */
449 + octD = *InPtr++;
450 + octValue = (octValue << 3) + octD - '0';
453 + if (octValue != 0) {
454 + SCANSTRING_WRITE_TO_STRING(p, len, (char)octValue);
456 + else {
457 + InPtr = backslash + 1; /* just skip the backslash */
460 + else { /* \0 followed by non-digits: go back to 0 */
461 + InPtr = backslash + 1; /* just skip the backslash */
463 + continue;
465 + /* check for a valid c-style escape character */
466 + for (i = 0; escape[i] != '\0'; i++) {
467 + if (escape[i] == *InPtr) {
468 + SCANSTRING_WRITE_TO_STRING(p, len, replace[i]);
469 + InPtr++;
470 + break;
473 + /* if we get here, we didn't recognise the character after
474 + the backslash: just copy it next time round the loop */
476 + else {
477 + SCANSTRING_WRITE_TO_STRING(p, len, *InPtr++);
480 + /* terminate the string content */
481 + SCANSTRING_WRITE_TO_STRING(p, len, '\0');
482 + if (*InPtr == stopper) {
483 + if (!p) {
484 + /* this was the size measurement and validation */
485 + p = string = AllocString(len);
487 + else {
488 + /* OK: string now contains our string text */
489 + InPtr++; /* skip past stopper */
490 + yylval.sym = InstallStringConstSymbol(string);
491 + return STRING;
494 + else {
495 + /* failure: end quote doesn't match start quote */
496 + break;
499 + return yyerror("unterminated string");
503 ** Called by yacc to report errors (just stores for returning when
504 ** parsing is aborted. The error token action is to immediate abort
505 ** parsing, so this message is immediately reported to the caller
506 ** of ParseExpr)