vimrc: use the 'diff' syntax highlighting; 'gitdiff' does not exist
[msysgit.git] / bin / s2p
blob31c5ab8d12c6bd9281b1b253a16d4f541ccf9675
1 #!/usr/bin/perl
2 eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
3 if $running_under_some_shell;
4 my $startperl;
5 my $perlpath;
6 ($startperl = <<'/../') =~ s/\s*\z//;
7 #!/usr/bin/perl
8 /../
9 ($perlpath = <<'/../') =~ s/\s*\z//;
10 /usr/bin/perl
11 /../
13 $0 =~ s/^.*?(\w+)[\.\w]*$/$1/;
15 # (p)sed - a stream editor
16 # History: Aug 12 2000: Original version.
17 # Mar 25 2002: Rearrange generated Perl program.
19 use strict;
20 use integer;
21 use Symbol;
23 =head1 NAME
25 psed - a stream editor
27 =head1 SYNOPSIS
29 psed [-an] script [file ...]
30 psed [-an] [-e script] [-f script-file] [file ...]
32 s2p [-an] [-e script] [-f script-file]
34 =head1 DESCRIPTION
36 A stream editor reads the input stream consisting of the specified files
37 (or standard input, if none are given), processes is line by line by
38 applying a script consisting of edit commands, and writes resulting lines
39 to standard output. The filename `C<->' may be used to read standard input.
41 The edit script is composed from arguments of B<-e> options and
42 script-files, in the given order. A single script argument may be specified
43 as the first parameter.
45 If this program is invoked with the name F<s2p>, it will act as a
46 sed-to-Perl translator. See L<"sed Script Translation">.
48 B<sed> returns an exit code of 0 on success or >0 if an error occurred.
50 =head1 OPTIONS
52 =over 4
54 =item B<-a>
56 A file specified as argument to the B<w> edit command is by default
57 opened before input processing starts. Using B<-a>, opening of such
58 files is delayed until the first line is actually written to the file.
60 =item B<-e> I<script>
62 The editing commands defined by I<script> are appended to the script.
63 Multiple commands must be separated by newlines.
65 =item B<-f> I<script-file>
67 Editing commands from the specified I<script-file> are read and appended
68 to the script.
70 =item B<-n>
72 By default, a line is written to standard output after the editing script
73 has been applied to it. The B<-n> option suppresses automatic printing.
75 =back
77 =head1 COMMANDS
79 B<sed> command syntax is defined as
81 Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
83 with whitespace being permitted before or after addresses, and between
84 the function character and the argument. The I<address>es and the
85 address inverter (C<!>) are used to restrict the application of a
86 command to the selected line(s) of input.
88 Each command must be on a line of its own, except where noted in
89 the synopses below.
91 The edit cycle performed on each input line consist of reading the line
92 (without its trailing newline character) into the I<pattern space>,
93 applying the applicable commands of the edit script, writing the final
94 contents of the pattern space and a newline to the standard output.
95 A I<hold space> is provided for saving the contents of the
96 pattern space for later use.
98 =head2 Addresses
100 A sed address is either a line number or a pattern, which may be combined
101 arbitrarily to construct ranges. Lines are numbered across all input files.
103 Any address may be followed by an exclamation mark (`C<!>'), selecting
104 all lines not matching that address.
106 =over 4
108 =item I<number>
110 The line with the given number is selected.
112 =item B<$>
114 A dollar sign (C<$>) is the line number of the last line of the input stream.
116 =item B</>I<regular expression>B</>
118 A pattern address is a basic regular expression (see
119 L<"Basic Regular Expressions">), between the delimiting character C</>.
120 Any other character except C<\> or newline may be used to delimit a
121 pattern address when the initial delimiter is prefixed with a
122 backslash (`C<\>').
124 =back
126 If no address is given, the command selects every line.
128 If one address is given, it selects the line (or lines) matching the
129 address.
131 Two addresses select a range that begins whenever the first address
132 matches, and ends (including that line) when the second address matches.
133 If the first (second) address is a matching pattern, the second
134 address is not applied to the very same line to determine the end of
135 the range. Likewise, if the second address is a matching pattern, the
136 first address is not applied to the very same line to determine the
137 begin of another range. If both addresses are line numbers,
138 and the second line number is less than the first line number, then
139 only the first line is selected.
142 =head2 Functions
144 The maximum permitted number of addresses is indicated with each
145 function synopsis below.
147 The argument I<text> consists of one or more lines following the command.
148 Embedded newlines in I<text> must be preceded with a backslash. Other
149 backslashes in I<text> are deleted and the following character is taken
150 literally.
152 =over 4
154 =cut
156 my %ComTab;
157 my %GenKey;
158 #--------------------------------------------------------------------------
159 $ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
161 =item [1addr]B<a\> I<text>
163 Write I<text> (which must start on the line following the command)
164 to standard output immediately before reading the next line
165 of input, either by executing the B<N> function or by beginning a new cycle.
167 =cut
169 #--------------------------------------------------------------------------
170 $ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
172 =item [2addr]B<b> [I<label>]
174 Branch to the B<:> function with the specified I<label>. If no label
175 is given, branch to the end of the script.
177 =cut
179 #--------------------------------------------------------------------------
180 $ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
181 { print <<'TheEnd'; } $doPrint = 0; goto EOS;
183 ### continue OK => next CYCLE;
185 =item [2addr]B<c\> I<text>
187 The line, or range of lines, selected by the address is deleted.
188 The I<text> (which must start on the line following the command)
189 is written to standard output. With an address range, this occurs at
190 the end of the range.
192 =cut
194 #--------------------------------------------------------------------------
195 $ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
196 { $doPrint = 0;
197 goto EOS;
200 ### continue OK => next CYCLE;
202 =item [2addr]B<d>
204 Deletes the pattern space and starts the next cycle.
206 =cut
208 #--------------------------------------------------------------------------
209 $ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
210 { s/^.*\n?//;
211 if(length($_)){ goto BOS } else { goto EOS }
214 ### continue OK => next CYCLE;
216 =item [2addr]B<D>
218 Deletes the pattern space through the first embedded newline or to the end.
219 If the pattern space becomes empty, a new cycle is started, otherwise
220 execution of the script is restarted.
222 =cut
224 #--------------------------------------------------------------------------
225 $ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
227 =item [2addr]B<g>
229 Replace the contents of the pattern space with the hold space.
231 =cut
233 #--------------------------------------------------------------------------
234 $ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
236 =item [2addr]B<G>
238 Append a newline and the contents of the hold space to the pattern space.
240 =cut
242 #--------------------------------------------------------------------------
243 $ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
245 =item [2addr]B<h>
247 Replace the contents of the hold space with the pattern space.
249 =cut
251 #--------------------------------------------------------------------------
252 $ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
254 =item [2addr]B<H>
256 Append a newline and the contents of the pattern space to the hold space.
258 =cut
260 #--------------------------------------------------------------------------
261 $ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
263 =item [1addr]B<i\> I<text>
265 Write the I<text> (which must start on the line following the command)
266 to standard output.
268 =cut
270 #--------------------------------------------------------------------------
271 $ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
273 =item [2addr]B<l>
275 Print the contents of the pattern space: non-printable characters are
276 shown in C-style escaped form; long lines are split and have a trailing
277 `C<\>' at the point of the split; the true end of a line is marked with
278 a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
279 BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
280 octal number for all other non-printable characters.
282 =cut
284 #--------------------------------------------------------------------------
285 $ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
286 { print $_, "\n" if $doPrint;
287 printQ() if @Q;
288 $CondReg = 0;
289 last CYCLE unless getsARGV();
290 chomp();
294 =item [2addr]B<n>
296 If automatic printing is enabled, write the pattern space to the standard
297 output. Replace the pattern space with the next line of input. If
298 there is no more input, processing is terminated.
300 =cut
302 #--------------------------------------------------------------------------
303 $ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
304 { printQ() if @Q;
305 $CondReg = 0;
306 last CYCLE unless getsARGV( $h );
307 chomp( $h );
308 $_ .= "\n$h";
312 =item [2addr]B<N>
314 Append a newline and the next line of input to the pattern space. If
315 there is no more input, processing is terminated.
317 =cut
319 #--------------------------------------------------------------------------
320 $ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
322 =item [2addr]B<p>
324 Print the pattern space to the standard output. (Use the B<-n> option
325 to suppress automatic printing at the end of a cycle if you want to
326 avoid double printing of lines.)
328 =cut
330 #--------------------------------------------------------------------------
331 $ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
332 { if( /^(.*)/ ){ print $1, "\n"; } }
335 =item [2addr]B<P>
337 Prints the pattern space through the first embedded newline or to the end.
339 =cut
341 #--------------------------------------------------------------------------
342 $ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
343 { print $_, "\n" if $doPrint;
344 last CYCLE;
348 =item [1addr]B<q>
350 Branch to the end of the script and quit without starting a new cycle.
352 =cut
354 #--------------------------------------------------------------------------
355 $ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
357 =item [1addr]B<r> I<file>
359 Copy the contents of the I<file> to standard output immediately before
360 the next attempt to read a line of input. Any error encountered while
361 reading I<file> is silently ignored.
363 =cut
365 #--------------------------------------------------------------------------
366 $ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
368 =item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
370 Substitute the I<replacement> string for the first substring in
371 the pattern space that matches the I<regular expression>.
372 Any character other than backslash or newline can be used instead of a
373 slash to delimit the regular expression and the replacement.
374 To use the delimiter as a literal character within the regular expression
375 and the replacement, precede the character by a backslash (`C<\>').
377 Literal newlines may be embedded in the replacement string by
378 preceding a newline with a backslash.
380 Within the replacement, an ampersand (`C<&>') is replaced by the string
381 matching the regular expression. The strings `C<\1>' through `C<\9>' are
382 replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
383 To get a literal `C<&>' or `C<\>' in the replacement text, precede it
384 by a backslash.
386 The following I<flags> modify the behaviour of the B<s> command:
388 =over 8
390 =item B<g>
392 The replacement is performed for all matching, non-overlapping substrings
393 of the pattern space.
395 =item B<1>..B<9>
397 Replace only the n-th matching substring of the pattern space.
399 =item B<p>
401 If the substitution was made, print the new value of the pattern space.
403 =item B<w> I<file>
405 If the substitution was made, write the new value of the pattern space
406 to the specified file.
408 =back
410 =cut
412 #--------------------------------------------------------------------------
413 $ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
415 =item [2addr]B<t> [I<label>]
417 Branch to the B<:> function with the specified I<label> if any B<s>
418 substitutions have been made since the most recent reading of an input line
419 or execution of a B<t> function. If no label is given, branch to the end of
420 the script.
423 =cut
425 #--------------------------------------------------------------------------
426 $ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
428 =item [2addr]B<w> I<file>
430 The contents of the pattern space are written to the I<file>.
432 =cut
434 #--------------------------------------------------------------------------
435 $ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
437 =item [2addr]B<x>
439 Swap the contents of the pattern space and the hold space.
441 =cut
443 #--------------------------------------------------------------------------
444 $ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
445 =item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
447 In the pattern space, replace all characters occuring in I<string1> by the
448 character at the corresponding position in I<string2>. It is possible
449 to use any character (other than a backslash or newline) instead of a
450 slash to delimit the strings. Within I<string1> and I<string2>, a
451 backslash followed by any character other than a newline is that literal
452 character, and a backslash followed by an `n' is replaced by a newline
453 character.
455 =cut
457 #--------------------------------------------------------------------------
458 $ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
460 =item [1addr]B<=>
462 Prints the current line number on the standard output.
464 =cut
466 #--------------------------------------------------------------------------
467 $ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
469 =item [0addr]B<:> [I<label>]
471 The command specifies the position of the I<label>. It has no other effect.
473 =cut
475 #--------------------------------------------------------------------------
476 $ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
477 $ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
478 # ';' to avoid warning on empty {}-block
480 =item [2addr]B<{> [I<command>]
482 =item [0addr]B<}>
484 These two commands begin and end a command list. The first command may
485 be given on the same line as the opening B<{> command. The commands
486 within the list are jointly selected by the address(es) given on the
487 B<{> command (but may still have individual addresses).
489 =cut
491 #--------------------------------------------------------------------------
492 $ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
494 =item [0addr]B<#> [I<comment>]
496 The entire line is ignored (treated as a comment). If, however, the first
497 two characters in the script are `C<#n>', automatic printing of output is
498 suppressed, as if the B<-n> option were given on the command line.
500 =back
502 =cut
504 use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
506 my $useDEBUG = exists( $ENV{PSEDDEBUG} );
507 my $useEXTBRE = $ENV{PSEDEXTBRE} || '';
508 $useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
510 my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
511 my $doOpenWrite = 1; # open w command output files at start (-a => 0)
512 my $svOpenWrite = 0; # save $doOpenWrite
514 # lower case $0 below as a VMSism. The VMS build procedure creates the
515 # s2p file traditionally in upper case on the disk. When VMS is in a
516 # case preserved or case sensitive mode, $0 will be returned in the exact
517 # case which will be on the disk, and that is not predictable at this time.
519 my $doGenerate = lc($0) eq 's2p';
521 # Collected and compiled script
523 my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code, $Func );
524 $Code = '';
526 ##################
527 # Compile Time
529 # Labels
531 # Error handling
533 sub Warn($;$){
534 my( $msg, $loc ) = @_;
535 $loc ||= '';
536 $loc .= ': ' if length( $loc );
537 warn( "$0: $loc$msg\n" );
540 $labNum = 0;
541 sub newLabel(){
542 return 'L_'.++$labNum;
545 # safeHere: create safe here delimiter and modify opcode and argument
547 sub safeHere($$){
548 my( $codref, $argref ) = @_;
549 my $eod = 'EOD000';
550 while( $$argref =~ /^$eod$/m ){
551 $eod++;
553 $$codref =~ s/TheEnd/$eod/e;
554 $$argref .= "$eod\n";
557 # Emit: create address logic and emit command
559 sub Emit($$$$$$){
560 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
561 my $cond = '';
562 if( defined( $addr1 ) ){
563 if( defined( $addr2 ) ){
564 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
565 } else {
566 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
568 $cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
571 if( $opcode eq '' ){
572 $Code .= "$cond$arg\n";
574 } elsif( $opcode =~ s/-X-/$arg/e ){
575 $Code .= "$cond$opcode\n";
577 } elsif( $opcode =~ /TheEnd/ ){
578 safeHere( \$opcode, \$arg );
579 $Code .= "$cond$opcode$arg";
581 } else {
582 $Code .= "$cond$opcode\n";
587 # Write (w command, w flag): store pathname
589 sub Write($$$$$$){
590 my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
591 $wFiles{$path} = '';
592 Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
596 # Label (: command): label definition
598 sub Label($$$$$$){
599 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
600 my $rc = 0;
601 $lab =~ s/\s+//;
602 if( length( $lab ) ){
603 my $h;
604 if( ! exists( $Label{$lab} ) ){
605 $h = $Label{$lab}{name} = newLabel();
606 } else {
607 $h = $Label{$lab}{name};
608 if( exists( $Label{$lab}{defined} ) ){
609 my $dl = $Label{$lab}{defined};
610 Warn( "duplicate label $lab (first defined at $dl)", $fl );
611 $rc = 1;
614 $Label{$lab}{defined} = $fl;
615 $Code .= "$h:;\n";
617 $rc;
620 # BeginBlock ({ command): push block start
622 sub BeginBlock($$$$$$){
623 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
624 push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
625 Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
628 # EndBlock (} command): check proper nesting
630 sub EndBlock($$$$$$){
631 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
632 my $rc;
633 my $jcom = pop( @BlockStack );
634 if( defined( $jcom ) ){
635 $rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
636 } else {
637 Warn( "unexpected `}'", $fl );
638 $rc = 1;
640 $rc;
643 # Branch (t, b commands): check or create label, substitute default
645 sub Branch($$$$$$){
646 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
647 $lab =~ s/\s+//; # no spaces at end
648 my $h;
649 if( length( $lab ) ){
650 if( ! exists( $Label{$lab} ) ){
651 $h = $Label{$lab}{name} = newLabel();
652 } else {
653 $h = $Label{$lab}{name};
655 push( @{$Label{$lab}{used}}, $fl );
656 } else {
657 $h = 'EOS';
659 $opcode =~ s/XXX/$h/e;
660 Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
663 # Change (c command): is special due to range end watching
665 sub Change($$$$$$){
666 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
667 my $kwd = $negated ? 'unless' : 'if';
668 if( defined( $addr2 ) ){
669 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
670 if( ! $negated ){
671 $addr1 = '$icnt = ('.$addr1.')';
672 $opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
674 } else {
675 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
677 safeHere( \$opcode, \$arg );
678 $Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
683 # Comment (# command): A no-op. Who would've thought that!
685 sub Comment($$$$$$){
686 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
687 ### $Code .= "# $arg\n";
692 sub stripRegex($$){
693 my( $del, $sref ) = @_;
694 my $regex = $del;
695 print "stripRegex:$del:$$sref:\n" if $useDEBUG;
696 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E(\s*)}{}s ){
697 my $sl = $2;
698 $regex .= $1.$sl.$del;
699 if( length( $sl ) % 2 == 0 ){
700 return $regex;
702 $regex .= $3;
704 undef();
707 # stripTrans: take a <del> terminated string from y command
708 # honoring and cleaning up of \-escaped <del>'s
710 sub stripTrans($$){
711 my( $del, $sref ) = @_;
712 my $t = '';
713 print "stripTrans:$del:$$sref:\n" if $useDEBUG;
714 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E}{}s ){
715 my $sl = $2;
716 $t .= $1;
717 if( length( $sl ) % 2 == 0 ){
718 $t .= $sl;
719 $t =~ s/\\\\/\\/g;
720 return $t;
722 chop( $sl );
723 $t .= $sl.$del.$3;
725 undef();
728 # makey - construct Perl y/// from sed y///
730 sub makey($$$){
731 my( $fr, $to, $fl ) = @_;
732 my $error = 0;
734 # Ensure that any '-' is up front.
735 # Diagnose duplicate contradicting mappings
736 my %tr;
737 for( my $i = 0; $i < length($fr); $i++ ){
738 my $fc = substr($fr,$i,1);
739 my $tc = substr($to,$i,1);
740 if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
741 Warn( "ambiguous translation for character `$fc' in `y' command",
742 $fl );
743 $error++;
745 $tr{$fc} = $tc;
747 $fr = $to = '';
748 if( exists( $tr{'-'} ) ){
749 ( $fr, $to ) = ( '-', $tr{'-'} );
750 delete( $tr{'-'} );
751 } else {
752 $fr = $to = '';
754 # might just as well sort it...
755 for my $fc ( sort keys( %tr ) ){
756 $fr .= $fc;
757 $to .= $tr{$fc};
759 # make embedded delimiters and newlines safe
760 $fr =~ s/([{}])/\$1/g;
761 $to =~ s/([{}])/\$1/g;
762 $fr =~ s/\n/\\n/g;
763 $to =~ s/\n/\\n/g;
764 return $error ? undef() : "{ y{$fr}{$to}; }";
767 ######
768 # makes - construct Perl s/// from sed s///
770 sub makes($$$$$$$){
771 my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
773 # make embedded newlines safe
774 $regex =~ s/\n/\\n/g;
775 $subst =~ s/\n/\\n/g;
777 my $code;
778 # n-th occurrence
780 if( length( $nmatch ) ){
781 $code = <<TheEnd;
782 { \$n = $nmatch;
783 while( --\$n && ( \$s = m ${regex}g ) ){}
784 \$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
785 \$CondReg ||= \$s;
786 TheEnd
787 } else {
788 $code = <<TheEnd;
789 { \$s = s ${regex}${subst}s${global};
790 \$CondReg ||= \$s;
791 TheEnd
793 if( $print ){
794 $code .= ' print $_, "\n" if $s;'."\n";
796 if( defined( $path ) ){
797 $wFiles{$path} = '';
798 $code .= " _w( '$path' ) if \$s;\n";
799 $GenKey{'w'} = 1;
801 $code .= "}";
804 =head1 BASIC REGULAR EXPRESSIONS
806 A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
807 of I<atoms>, for matching parts of a string, and I<bounds>, specifying
808 repetitions of a preceding atom.
810 =head2 Atoms
812 The possible atoms of a BRE are: B<.>, matching any single character;
813 B<^> and B<$>, matching the null string at the beginning or end
814 of a string, respectively; a I<bracket expressions>, enclosed
815 in B<[> and B<]> (see below); and any single character with no
816 other significance (matching that character). A B<\> before one
817 of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
818 after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
819 becomes an atom and establishes the target for a I<backreference>,
820 consisting of the substring that actually matches the enclosed atoms.
821 Finally, B<\> followed by one of the digits B<0> through B<9> is a
822 backreference.
824 A B<^> that is not first, or a B<$> that is not last does not have
825 a special significance and need not be preceded by a backslash to
826 become literal. The same is true for a B<]>, that does not terminate
827 a bracket expression.
829 An unescaped backslash cannot be last in a BRE.
831 =head2 Bounds
833 The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
834 atom; B<\{>I<count>B<\}>, specifying that many repetitions;
835 B<\{>I<minimum>B<,\}>, giving a lower limit; and
836 B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
837 bound.
839 A bound appearing as the first item in a BRE is taken literally.
841 =head2 Bracket Expressions
843 A I<bracket expression> is a list of characters, character ranges
844 and character classes enclosed in B<[> and B<]> and matches any
845 single character from the represented set of characters.
847 A character range is written as two characters separated by B<-> and
848 represents all characters (according to the character collating sequence)
849 that are not less than the first and not greater than the second.
850 (Ranges are very collating-sequence-dependent, and portable programs
851 should avoid relying on them.)
853 A character class is one of the class names
855 alnum digit punct
856 alpha graph space
857 blank lower upper
858 cntrl print xdigit
860 enclosed in B<[:> and B<:]> and represents the set of characters
861 as defined in ctype(3).
863 If the first character after B<[> is B<^>, the sense of matching is
864 inverted.
866 To include a literal `C<^>', place it anywhere else but first. To
867 include a literal 'C<]>' place it first or immediately after an
868 initial B<^>. To include a literal `C<->' make it the first (or
869 second after B<^>) or last character, or the second endpoint of
870 a range.
872 The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
873 match the null string at the beginning and end of a word respectively.
874 (Note that neither is identical to Perl's `\b' atom.)
876 =head2 Additional Atoms
878 Since some sed implementations provide additional regular expression
879 atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
880 the following backslash escapes:
882 =over 4
884 =item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
886 =item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
888 =item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
890 =item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
892 =item B<\y> Match the empty string at a word boundary.
894 =item B<\B> Match the empty string between any two either word or non-word characters.
896 =back
898 To enable this feature, the environment variable PSEDEXTBRE must be set
899 to a string containing the requested characters, e.g.:
900 C<PSEDEXTBRE='E<lt>E<gt>wW'>.
902 =cut
904 #####
905 # bre2p - convert BRE to Perl RE
907 sub peek(\$$){
908 my( $pref, $ic ) = @_;
909 $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
912 sub bre2p($$$){
913 my( $del, $pat, $fl ) = @_;
914 my $led = $del;
915 $led =~ tr/{([</})]>/;
916 $led = '' if $led eq $del;
918 $pat = substr( $pat, 1, length($pat) - 2 );
919 my $res = '';
920 my $bracklev = 0;
921 my $backref = 0;
922 my $parlev = 0;
923 for( my $ic = 0; $ic < length( $pat ); $ic++ ){
924 my $c = substr( $pat, $ic, 1 );
925 if( $c eq '\\' ){
926 ### backslash escapes
927 my $nc = peek($pat,$ic);
928 if( $nc eq '' ){
929 Warn( "`\\' cannot be last in pattern", $fl );
930 return undef();
932 $ic++;
933 if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
934 $res .= "\\$del";
936 } elsif( $nc =~ /([[.*\\n])/ ){
937 ## check for \-escaped magics and \n:
938 ## \[ \. \* \\ \n stay as they are
939 $res .= '\\'.$nc;
941 } elsif( $nc eq '(' ){ ## \( => (
942 $parlev++;
943 $res .= '(';
945 } elsif( $nc eq ')' ){ ## \) => )
946 $parlev--;
947 $backref++;
948 if( $parlev < 0 ){
949 Warn( "unmatched `\\)'", $fl );
950 return undef();
952 $res .= ')';
954 } elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
955 my $endpos = index( $pat, '\\}', $ic );
956 if( $endpos < 0 ){
957 Warn( "unmatched `\\{'", $fl );
958 return undef();
960 my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
961 $ic = $endpos + 1;
963 if( $res =~ /^\^?$/ ){
964 $res .= "\\{$rep\}";
965 } elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
966 my $min = $1;
967 my $com = $2 || '';
968 my $max = $3;
969 if( length( $max ) ){
970 if( $max < $min ){
971 Warn( "maximum less than minimum in `\\{$rep\\}'",
972 $fl );
973 return undef();
975 } else {
976 $max = '';
978 # simplify some
979 if( $min == 0 && $max eq '1' ){
980 $res .= '?';
981 } elsif( $min == 1 && "$com$max" eq ',' ){
982 $res .= '+';
983 } elsif( $min == 0 && "$com$max" eq ',' ){
984 $res .= '*';
985 } else {
986 $res .= "{$min$com$max}";
988 } else {
989 Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
990 return undef();
993 } elsif( $nc =~ /^[1-9]$/ ){
994 ## \1 .. \9 => \1 .. \9, but check for a following digit
995 if( $nc > $backref ){
996 Warn( "invalid backreference ($nc)", $fl );
997 return undef();
999 $res .= "\\$nc";
1000 if( peek($pat,$ic) =~ /[0-9]/ ){
1001 $res .= '(?:)';
1004 } elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
1005 ## extensions - at most <>wWyB - not in POSIX
1006 if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1007 $res .= '\\b(?<=\\W)';
1008 } elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1009 $res .= '\\b(?=\\W)';
1010 } elsif( $nc eq 'y' ){ ## \y => \b
1011 $res .= '\\b';
1012 } else { ## \B, \w, \W remain the same
1013 $res .= "\\$nc";
1015 } elsif( $nc eq $led ){
1016 ## \<closing bracketing-delimiter> - keep '\'
1017 $res .= "\\$nc";
1019 } else { ## \ <char> => <char> ("as if `\' were not present")
1020 $res .= $nc;
1023 } elsif( $c eq '.' ){ ## . => .
1024 $res .= $c;
1026 } elsif( $c eq '*' ){ ## * => * but \* if there's nothing preceding it
1027 if( $res =~ /^\^?$/ ){
1028 $res .= '\\*';
1029 } elsif( substr( $res, -1, 1 ) ne '*' ){
1030 $res .= $c;
1033 } elsif( $c eq '[' ){
1034 ## parse []: [^...] [^]...] [-...]
1035 my $add = '[';
1036 if( peek($pat,$ic) eq '^' ){
1037 $ic++;
1038 $add .= '^';
1040 my $nc = peek($pat,$ic);
1041 if( $nc eq ']' || $nc eq '-' ){
1042 $add .= $nc;
1043 $ic++;
1045 # check that [ is not trailing
1046 if( $ic >= length( $pat ) - 1 ){
1047 Warn( "unmatched `['", $fl );
1048 return undef();
1050 # look for [:...:] and x-y
1051 my $rstr = substr( $pat, $ic+1 );
1052 if( $rstr =~ /^((?:\[:\(\w+|[><]\):\]|[^]-](?:-[^]])?)*)/ ){
1053 my $cnt = $1;
1054 $ic += length( $cnt );
1055 $cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1056 # try some simplifications
1057 my $red = $cnt;
1058 if( $red =~ s/0-9// ){
1059 $cnt = $red.'\d';
1060 if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1061 $cnt = $red.'\w';
1064 $add .= $cnt;
1066 # POSIX 1003.2 has this (optional) for begin/end word
1067 $add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1068 $add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1072 ## may have a trailing `-' before `]'
1073 if( $ic < length($pat) - 1 &&
1074 substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1075 $ic += length( $1 );
1076 $add .= $1;
1077 # another simplification
1078 $add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1079 $res .= $add;
1080 } else {
1081 Warn( "unmatched `['", $fl );
1082 return undef();
1085 } elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1086 $res .= "\\$c";
1088 } elsif( $c eq ']' ){ ## unmatched ] is not magic
1089 $res .= ']';
1091 } elsif( $c =~ /[|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1092 $res .= "\\$c";
1094 } elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1095 $res .= length( $res ) ? '\\^' : '^';
1097 } elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1098 $res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1100 } else {
1101 $res .= $c;
1105 if( $parlev ){
1106 Warn( "unmatched `\\('", $fl );
1107 return undef();
1110 # final cleanup: eliminate raw HTs
1111 $res =~ s/\t/\\t/g;
1112 return $del . $res . ( $led ? $led : $del );
1116 #####
1117 # sub2p - convert sed substitution to Perl substitution
1119 sub sub2p($$$){
1120 my( $del, $subst, $fl ) = @_;
1121 my $led = $del;
1122 $led =~ tr/{([</})]>/;
1123 $led = '' if $led eq $del;
1125 $subst = substr( $subst, 1, length($subst) - 2 );
1126 my $res = '';
1128 for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1129 my $c = substr( $subst, $ic, 1 );
1130 if( $c eq '\\' ){
1131 ### backslash escapes
1132 my $nc = peek($subst,$ic);
1133 if( $nc eq '' ){
1134 Warn( "`\\' cannot be last in substitution", $fl );
1135 return undef();
1137 $ic++;
1138 if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1139 $res .= '\\' . $nc;
1140 } elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1141 $res .= '${' . $nc . '}';
1142 } else { ## everything else (includes &): omit \
1143 $res .= $nc;
1145 } elsif( $c eq '&' ){ ## & => $&
1146 $res .= '$&';
1147 } elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1148 $res .= '\\' . $c;
1149 } else {
1150 $res .= $c;
1154 # final cleanup: eliminate raw HTs
1155 $res =~ s/\t/\\t/g;
1156 return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1160 sub Parse(){
1161 my $error = 0;
1162 my( $pdef, $pfil, $plin );
1163 for( my $icom = 0; $icom < @Commands; $icom++ ){
1164 my $cmd = $Commands[$icom];
1165 print "Parse:$cmd:\n" if $useDEBUG;
1166 $cmd =~ s/^\s+//;
1167 next unless length( $cmd );
1168 my $scom = $icom;
1169 if( exists( $Defined{$icom} ) ){
1170 $pdef = $Defined{$icom};
1171 if( $pdef =~ /^ #(\d+)/ ){
1172 $pfil = 'expression #';
1173 $plin = $1;
1174 } else {
1175 $pfil = "$pdef l.";
1176 $plin = 1;
1178 } else {
1179 $plin++;
1181 my $fl = "$pfil$plin";
1183 # insert command as comment in gnerated code
1185 $Code .= "# $cmd\n" if $doGenerate;
1187 # The Address(es)
1189 my( $negated, $naddr, $addr1, $addr2 );
1190 $naddr = 0;
1191 if( $cmd =~ s/^(\d+)\s*// ){
1192 $addr1 = "$1"; $naddr++;
1193 } elsif( $cmd =~ s/^\$\s*// ){
1194 $addr1 = 'eofARGV()'; $naddr++;
1195 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1196 my $del = $1;
1197 my $regex = stripRegex( $del, \$cmd );
1198 if( defined( $regex ) ){
1199 $addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1200 $naddr++;
1201 } else {
1202 Warn( "malformed regex, 1st address", $fl );
1203 $error++;
1204 next;
1207 if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1208 if( $cmd =~ s/^(\d+)\s*// ){
1209 $addr2 = "$1"; $naddr++;
1210 } elsif( $cmd =~ s/^\$\s*// ){
1211 $addr2 = 'eofARGV()'; $naddr++;
1212 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1213 my $del = $1;
1214 my $regex = stripRegex( $del, \$cmd );
1215 if( defined( $regex ) ){
1216 $addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1217 $naddr++;
1218 } else {
1219 Warn( "malformed regex, 2nd address", $fl );
1220 $error++;
1221 next;
1223 } else {
1224 Warn( "invalid address after `,'", $fl );
1225 $error++;
1226 next;
1230 # address modifier `!'
1232 $negated = $cmd =~ s/^!\s*//;
1233 if( defined( $addr1 ) ){
1234 print "Parse: addr1=$addr1" if $useDEBUG;
1235 if( defined( $addr2 ) ){
1236 print ", addr2=$addr2 " if $useDEBUG;
1237 # both numeric and addr1 > addr2 => eliminate addr2
1238 undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1239 $addr2 =~ /^\d+$/ && $addr1 > $addr2;
1242 print 'negated' if $useDEBUG && $negated;
1243 print " command:$cmd\n" if $useDEBUG;
1245 # The Command
1247 if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1248 my $h = substr( $cmd, 0, 1 );
1249 Warn( "unknown command `$h'", $fl );
1250 $error++;
1251 next;
1253 my $key = $1;
1255 my $tabref = $ComTab{$key};
1256 $GenKey{$key} = 1;
1257 if( $naddr > $tabref->[0] ){
1258 Warn( "excess address(es)", $fl );
1259 $error++;
1260 next;
1263 my $arg = '';
1264 if( $tabref->[1] eq 'str' ){
1265 # take remainder - don't care if it is empty
1266 $arg = $cmd;
1267 $cmd = '';
1269 } elsif( $tabref->[1] eq 'txt' ){
1270 # multi-line text
1271 my $goon = $cmd =~ /(.*)\\$/;
1272 if( length( $1 ) ){
1273 Warn( "extra characters after command ($cmd)", $fl );
1274 $error++;
1276 while( $goon ){
1277 $icom++;
1278 if( $icom > $#Commands ){
1279 Warn( "unexpected end of script", $fl );
1280 $error++;
1281 last;
1283 $cmd = $Commands[$icom];
1284 $Code .= "# $cmd\n" if $doGenerate;
1285 $goon = $cmd =~ s/\\$//;
1286 $cmd =~ s/\\(.)/$1/g;
1287 $arg .= "\n" if length( $arg );
1288 $arg .= $cmd;
1290 $arg .= "\n" if length( $arg );
1291 $cmd = '';
1293 } elsif( $tabref->[1] eq 'sub' ){
1294 # s///
1295 if( ! length( $cmd ) ){
1296 Warn( "`s' command requires argument", $fl );
1297 $error++;
1298 next;
1300 if( $cmd =~ s{^([^\\\n])}{} ){
1301 my $del = $1;
1302 my $regex = stripRegex( $del, \$cmd );
1303 if( ! defined( $regex ) ){
1304 Warn( "malformed regular expression", $fl );
1305 $error++;
1306 next;
1308 $regex = bre2p( $del, $regex, $fl );
1310 # a trailing \ indicates embedded NL (in replacement string)
1311 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1312 $icom++;
1313 if( $icom > $#Commands ){
1314 Warn( "unexpected end of script", $fl );
1315 $error++;
1316 last;
1318 $cmd .= $Commands[$icom];
1319 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1322 my $subst = stripRegex( $del, \$cmd );
1323 if( ! defined( $regex ) ){
1324 Warn( "malformed substitution expression", $fl );
1325 $error++;
1326 next;
1328 $subst = sub2p( $del, $subst, $fl );
1330 # parse s/// modifier: g|p|0-9|w <file>
1331 my( $global, $nmatch, $print, $write ) =
1332 ( '', '', 0, undef );
1333 while( $cmd =~ s/^([gp0-9])// ){
1334 $1 eq 'g' ? ( $global = 'g' ) :
1335 $1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1337 $write = $1 if $cmd =~ s/w\s*(.*)$//;
1338 ### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1339 if( $global && length( $nmatch ) || length( $nmatch ) > 1 ){
1340 Warn( "conflicting flags `$global$nmatch'", $fl );
1341 $error++;
1342 next;
1345 $arg = makes( $regex, $subst,
1346 $write, $global, $print, $nmatch, $fl );
1347 if( ! defined( $arg ) ){
1348 $error++;
1349 next;
1352 } else {
1353 Warn( "improper delimiter in s command", $fl );
1354 $error++;
1355 next;
1358 } elsif( $tabref->[1] eq 'tra' ){
1359 # y///
1360 # a trailing \ indicates embedded newline
1361 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1362 $icom++;
1363 if( $icom > $#Commands ){
1364 Warn( "unexpected end of script", $fl );
1365 $error++;
1366 last;
1368 $cmd .= $Commands[$icom];
1369 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1371 if( ! length( $cmd ) ){
1372 Warn( "`y' command requires argument", $fl );
1373 $error++;
1374 next;
1376 my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1377 if( $d eq '\\' ){
1378 Warn( "`\\' not valid as delimiter in `y' command", $fl );
1379 $error++;
1380 next;
1382 my $fr = stripTrans( $d, \$cmd );
1383 if( ! defined( $fr ) || ! length( $cmd ) ){
1384 Warn( "malformed `y' command argument", $fl );
1385 $error++;
1386 next;
1388 my $to = stripTrans( $d, \$cmd );
1389 if( ! defined( $to ) ){
1390 Warn( "malformed `y' command argument", $fl );
1391 $error++;
1392 next;
1394 if( length($fr) != length($to) ){
1395 Warn( "string lengths in `y' command differ", $fl );
1396 $error++;
1397 next;
1399 if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1400 $error++;
1401 next;
1406 # $cmd must be now empty - exception is {
1407 if( $cmd !~ /^\s*$/ ){
1408 if( $key eq '{' ){
1409 # dirty hack to process command on '{' line
1410 $Commands[$icom--] = $cmd;
1411 } else {
1412 Warn( "extra characters after command ($cmd)", $fl );
1413 $error++;
1414 next;
1418 # Make Code
1420 if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1421 $tabref->[3], $arg, $fl ) ){
1422 $error++;
1426 while( @BlockStack ){
1427 my $bl = pop( @BlockStack );
1428 Warn( "start of unterminated `{'", $bl );
1429 $error++;
1432 for my $lab ( keys( %Label ) ){
1433 if( ! exists( $Label{$lab}{defined} ) ){
1434 for my $used ( @{$Label{$lab}{used}} ){
1435 Warn( "undefined label `$lab'", $used );
1436 $error++;
1441 exit( 1 ) if $error;
1445 ##############
1446 #### MAIN ####
1447 ##############
1449 sub usage(){
1450 print STDERR "Usage: sed [-an] command [file...]\n";
1451 print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1454 ###################
1455 # Here we go again...
1457 my $expr = 0;
1458 while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1459 my $opt = $1;
1460 my $arg = $2;
1461 shift( @ARGV );
1462 if( $opt eq 'e' ){
1463 if( length( $arg ) ){
1464 push( @Commands, split( "\n", $arg ) );
1465 } elsif( @ARGV ){
1466 push( @Commands, shift( @ARGV ) );
1467 } else {
1468 Warn( "option -e requires an argument" );
1469 usage();
1470 exit( 1 );
1472 $expr++;
1473 $Defined{$#Commands} = " #$expr";
1474 next;
1476 if( $opt eq 'f' ){
1477 my $path;
1478 if( length( $arg ) ){
1479 $path = $arg;
1480 } elsif( @ARGV ){
1481 $path = shift( @ARGV );
1482 } else {
1483 Warn( "option -f requires an argument" );
1484 usage();
1485 exit( 1 );
1487 my $fst = $#Commands + 1;
1488 open( SCRIPT, "<$path" ) || die( "$0: $path: could not open ($!)\n" );
1489 my $cmd;
1490 while( defined( $cmd = <SCRIPT> ) ){
1491 chomp( $cmd );
1492 push( @Commands, $cmd );
1494 close( SCRIPT );
1495 if( $#Commands >= $fst ){
1496 $Defined{$fst} = "$path";
1498 next;
1500 if( $opt eq '-' && $arg eq '' ){
1501 last;
1503 if( $opt eq 'h' || $opt eq '?' ){
1504 usage();
1505 exit( 0 );
1507 if( $opt eq 'n' ){
1508 $doAutoPrint = 0;
1509 } elsif( $opt eq 'a' ){
1510 $doOpenWrite = 0;
1511 } else {
1512 Warn( "illegal option `$opt'" );
1513 usage();
1514 exit( 1 );
1516 if( length( $arg ) ){
1517 unshift( @ARGV, "-$arg" );
1521 # A singleton command may be the 1st argument when there are no options.
1523 if( @Commands == 0 ){
1524 if( @ARGV == 0 ){
1525 Warn( "no script command given" );
1526 usage();
1527 exit( 1 );
1529 push( @Commands, split( "\n", shift( @ARGV ) ) );
1530 $Defined{0} = ' #1';
1533 print STDERR "Files: @ARGV\n" if $useDEBUG;
1535 # generate leading code
1537 $Func = <<'[TheEnd]';
1539 # openARGV: open 1st input file
1541 sub openARGV(){
1542 unshift( @ARGV, '-' ) unless @ARGV;
1543 my $file = shift( @ARGV );
1544 open( ARG, "<$file" )
1545 || die( "$0: can't open $file for reading ($!)\n" );
1546 $isEOF = 0;
1549 # getsARGV: Read another input line into argument (default: $_).
1550 # Move on to next input file, and reset EOF flag $isEOF.
1551 sub getsARGV(;\$){
1552 my $argref = @_ ? shift() : \$_;
1553 while( $isEOF || ! defined( $$argref = <ARG> ) ){
1554 close( ARG );
1555 return 0 unless @ARGV;
1556 my $file = shift( @ARGV );
1557 open( ARG, "<$file" )
1558 || die( "$0: can't open $file for reading ($!)\n" );
1559 $isEOF = 0;
1564 # eofARGV: end-of-file test
1566 sub eofARGV(){
1567 return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1570 # makeHandle: Generates another file handle for some file (given by its path)
1571 # to be written due to a w command or an s command's w flag.
1572 sub makeHandle($){
1573 my( $path ) = @_;
1574 my $handle;
1575 if( ! exists( $wFiles{$path} ) || $wFiles{$path} eq '' ){
1576 $handle = $wFiles{$path} = gensym();
1577 if( $doOpenWrite ){
1578 if( ! open( $handle, ">$path" ) ){
1579 die( "$0: can't open $path for writing: ($!)\n" );
1582 } else {
1583 $handle = $wFiles{$path};
1585 return $handle;
1588 # printQ: Print queued output which is either a string or a reference
1589 # to a pathname.
1590 sub printQ(){
1591 for my $q ( @Q ){
1592 if( ref( $q ) ){
1593 # flush open w files so that reading this file gets it all
1594 if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1595 open( $wFiles{$$q}, ">>$$q" );
1597 # copy file to stdout: slow, but safe
1598 if( open( RF, "<$$q" ) ){
1599 while( defined( my $line = <RF> ) ){
1600 print $line;
1602 close( RF );
1604 } else {
1605 print $q;
1608 undef( @Q );
1611 [TheEnd]
1613 # generate the sed loop
1615 $Code .= <<'[TheEnd]';
1616 sub openARGV();
1617 sub getsARGV(;\$);
1618 sub eofARGV();
1619 sub printQ();
1621 # Run: the sed loop reading input and applying the script
1623 sub Run(){
1624 my( $h, $icnt, $s, $n );
1625 # hack (not unbreakable :-/) to avoid // matching an empty string
1626 my $z = "\000"; $z =~ /$z/;
1627 # Initialize.
1628 openARGV();
1629 $Hold = '';
1630 $CondReg = 0;
1631 $doPrint = $doAutoPrint;
1632 CYCLE:
1633 while( getsARGV() ){
1634 chomp();
1635 $CondReg = 0; # cleared on t
1636 BOS:;
1637 [TheEnd]
1639 # parse - avoid opening files when doing s2p
1641 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1642 if $doGenerate;
1643 Parse();
1644 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1645 if $doGenerate;
1647 # append trailing code
1649 $Code .= <<'[TheEnd]';
1650 EOS: if( $doPrint ){
1651 print $_, "\n";
1652 } else {
1653 $doPrint = $doAutoPrint;
1655 printQ() if @Q;
1658 exit( 0 );
1660 [TheEnd]
1663 # append optional functions, prepend prototypes
1665 my $Proto = "# prototypes\n";
1666 if( $GenKey{'l'} ){
1667 $Proto .= "sub _l();\n";
1668 $Func .= <<'[TheEnd]';
1669 # _l: l command processing
1671 sub _l(){
1672 my $h = $_;
1673 my $mcpl = 70;
1674 # transform non printing chars into escape notation
1675 $h =~ s/\\/\\\\/g;
1676 if( $h =~ /[^[:print:]]/ ){
1677 $h =~ s/\a/\\a/g;
1678 $h =~ s/\f/\\f/g;
1679 $h =~ s/\n/\\n/g;
1680 $h =~ s/\t/\\t/g;
1681 $h =~ s/\r/\\r/g;
1682 $h =~ s/\e/\\e/g;
1683 $h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1685 # split into lines of length $mcpl
1686 while( length( $h ) > $mcpl ){
1687 my $l = substr( $h, 0, $mcpl-1 );
1688 $h = substr( $h, $mcpl );
1689 # remove incomplete \-escape from end of line
1690 if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1691 $h = $1 . $h;
1693 print $l, "\\\n";
1695 print "$h\$\n";
1698 [TheEnd]
1701 if( $GenKey{'r'} ){
1702 $Proto .= "sub _r(\$);\n";
1703 $Func .= <<'[TheEnd]';
1704 # _r: r command processing: Save a reference to the pathname.
1706 sub _r($){
1707 my $path = shift();
1708 push( @Q, \$path );
1711 [TheEnd]
1714 if( $GenKey{'t'} ){
1715 $Proto .= "sub _t();\n";
1716 $Func .= <<'[TheEnd]';
1717 # _t: t command - condition register test/reset
1719 sub _t(){
1720 my $res = $CondReg;
1721 $CondReg = 0;
1722 $res;
1725 [TheEnd]
1728 if( $GenKey{'w'} ){
1729 $Proto .= "sub _w(\$);\n";
1730 $Func .= <<'[TheEnd]';
1731 # _w: w command and s command's w flag - write to file
1733 sub _w($){
1734 my $path = shift();
1735 my $handle = $wFiles{$path};
1736 if( ! $doOpenWrite && ! defined( fileno( $handle ) ) ){
1737 open( $handle, ">$path" )
1738 || die( "$0: $path: cannot open ($!)\n" );
1740 print $handle $_, "\n";
1743 [TheEnd]
1746 $Code = $Proto . $Code;
1748 # magic "#n" - same as -n option
1750 $doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1752 # eval code - check for errors
1754 print "Code:\n$Code$Func" if $useDEBUG;
1755 eval $Code . $Func;
1756 if( $@ ){
1757 print "Code:\n$Code$Func";
1758 die( "$0: internal error - generated incorrect Perl code: $@\n" );
1761 if( $doGenerate ){
1763 # write full Perl program
1766 # bang line, declarations, prototypes
1767 print <<TheEnd;
1768 #!$perlpath -w
1769 eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1770 if 0;
1771 \$0 =~ s/^.*?(\\w+)\[\\.\\w+\]*\$/\$1/;
1773 use strict;
1774 use Symbol;
1775 use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1776 \$doAutoPrint \$doOpenWrite \$doPrint };
1777 \$doAutoPrint = $doAutoPrint;
1778 \$doOpenWrite = $doOpenWrite;
1779 TheEnd
1781 my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1782 if( $wf ne "''" ){
1783 print <<TheEnd;
1784 sub makeHandle(\$);
1785 for my \$p ( $wf ){
1786 exit( 1 ) unless makeHandle( \$p );
1788 TheEnd
1791 print $Code;
1792 print "Run();\n";
1793 print $Func;
1794 exit( 0 );
1796 } else {
1798 # execute: make handles (and optionally open) all w files; run!
1799 for my $p ( keys( %wFiles ) ){
1800 exit( 1 ) unless makeHandle( $p );
1802 Run();
1806 =head1 ENVIRONMENT
1808 The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1809 See L<"Additional Atoms">.
1811 =head1 DIAGNOSTICS
1813 =over 4
1815 =item ambiguous translation for character `%s' in `y' command
1817 The indicated character appears twice, with different translations.
1819 =item `[' cannot be last in pattern
1821 A `[' in a BRE indicates the beginning of a I<bracket expression>.
1823 =item `\' cannot be last in pattern
1825 A `\' in a BRE is used to make the subsequent character literal.
1827 =item `\' cannot be last in substitution
1829 A `\' in a subsitution string is used to make the subsequent character literal.
1831 =item conflicting flags `%s'
1833 In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1834 multiple n-th occurrence flags are specified. Note that only the digits
1835 `1' through `9' are permitted.
1837 =item duplicate label %s (first defined at %s)
1839 =item excess address(es)
1841 The command has more than the permitted number of addresses.
1843 =item extra characters after command (%s)
1845 =item illegal option `%s'
1847 =item improper delimiter in s command
1849 The BRE and substitution may not be delimited with `\' or newline.
1851 =item invalid address after `,'
1853 =item invalid backreference (%s)
1855 The specified backreference number exceeds the number of backreferences
1856 in the BRE.
1858 =item invalid repeat clause `\{%s\}'
1860 The repeat clause does not contain a valid integer value, or pair of
1861 values.
1863 =item malformed regex, 1st address
1865 =item malformed regex, 2nd address
1867 =item malformed regular expression
1869 =item malformed substitution expression
1871 =item malformed `y' command argument
1873 The first or second string of a B<y> command is syntactically incorrect.
1875 =item maximum less than minimum in `\{%s\}'
1877 =item no script command given
1879 There must be at least one B<-e> or one B<-f> option specifying a
1880 script or script file.
1882 =item `\' not valid as delimiter in `y' command
1884 =item option -e requires an argument
1886 =item option -f requires an argument
1888 =item `s' command requires argument
1890 =item start of unterminated `{'
1892 =item string lengths in `y' command differ
1894 The translation table strings in a B<y> commanf must have equal lengths.
1896 =item undefined label `%s'
1898 =item unexpected `}'
1900 A B<}> command without a preceding B<{> command was encountered.
1902 =item unexpected end of script
1904 The end of the script was reached although a text line after a
1905 B<a>, B<c> or B<i> command indicated another line.
1907 =item unknown command `%s'
1909 =item unterminated `['
1911 A BRE contains an unterminated bracket expression.
1913 =item unterminated `\('
1915 A BRE contains an unterminated backreference.
1917 =item `\{' without closing `\}'
1919 A BRE contains an unterminated bounds specification.
1921 =item `\)' without preceding `\('
1923 =item `y' command requires argument
1925 =back
1927 =head1 EXAMPLE
1929 The basic material for the preceding section was generated by running
1930 the sed script
1932 #no autoprint
1933 s/^.*Warn( *"\([^"]*\)".*$/\1/
1934 t process
1936 :process
1937 s/$!/%s/g
1938 s/$[_[:alnum:]]\{1,\}/%s/g
1939 s/\\\\/\\/g
1940 s/^/=item /
1943 on the program's own text, and piping the output into C<sort -u>.
1946 =head1 SED SCRIPT TRANSLATION
1948 If this program is invoked with the name F<s2p> it will act as a
1949 sed-to-Perl translator. After option processing (all other
1950 arguments are ignored), a Perl program is printed on standard
1951 output, which will process the input stream (as read from all
1952 arguments) in the way defined by the sed script and the option setting
1953 used for the translation.
1955 =head1 SEE ALSO
1957 perl(1), re_format(7)
1959 =head1 BUGS
1961 The B<l> command will show escape characters (ESC) as `C<\e>', but
1962 a vertical tab (VT) in octal.
1964 Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1966 The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1967 is "the last pattern used, at run time". This deviates from the Perl
1968 interpretation, which will re-use the "last last successfully executed
1969 regular expression". Since keeping track of pattern usage would create
1970 terribly cluttered code, and differences would only appear in obscure
1971 context (where other B<sed> implementations appear to deviate, too),
1972 the Perl semantics was adopted. Note that common usage of this feature,
1973 such as in C</abc/s//xyz/>, will work as expected.
1975 Collating elements (of bracket expressions in BREs) are not implemented.
1977 =head1 STANDARDS
1979 This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1980 definition of B<sed>, and is compatible with the I<OpenBSD>
1981 implementation, except where otherwise noted (see L<"BUGS">).
1983 =head1 AUTHOR
1985 This Perl implementation of I<sed> was written by Wolfgang Laun,
1986 I<Wolfgang.Laun@alcatel.at>.
1988 =head1 COPYRIGHT and LICENSE
1990 This program is free and open software. You may use, modify,
1991 distribute, and sell this program (and any modified variants) in any
1992 way you wish, provided you do not restrict others from doing the same.
1994 =cut