<!--
   DTD for the format specification files
   Sergio Ortiz 2005.05.13
-->
<!ELEMENT format (options,rules)>
<!ATTLIST format name CDATA #REQUIRED>
<!--
   'format' is the root element containing the whole format specification
   file. The attribute 'name' specifies the name of the format.
-->
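<!--
   Illustrative sketch, not part of the original DTD: the overall shape
   of a document that would validate against the 'format' declaration
   above. The name "example" is a hypothetical value, and the '...'
   placeholders stand for the children declared further down.

   <format name="example">
     <options> ... </options>
     <rules> ... </rules>
   </format>
-->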
<!ELEMENT options (largeblocks,input,output,tag-name,escape-chars,space-chars,case-sensitive)>
<!--
   General options of the format
-->
<!ELEMENT largeblocks EMPTY>
<!ATTLIST largeblocks size CDATA #REQUIRED>
<!--
   The attribute 'size' defines the maximum size, in bytes, of
   inline format blocks
-->
<!ELEMENT input EMPTY>
<!ATTLIST input zip-path CDATA #IMPLIED>
<!ATTLIST input encoding CDATA #REQUIRED>
<!--
   Reserved for future extensions
-->
<!ELEMENT output EMPTY>
<!ATTLIST output zip-path CDATA #IMPLIED>
<!ATTLIST output encoding CDATA #REQUIRED>
<!--
   Reserved for future extensions
-->
<!ELEMENT tag-name EMPTY>
<!ATTLIST tag-name regexp CDATA #REQUIRED>
<!--
   The attribute 'regexp' defines (with a _flex_ regular expression) how
   to extract the tag name from a whole tag
-->
<!ELEMENT escape-chars EMPTY>
<!ATTLIST escape-chars regexp CDATA #REQUIRED>
<!--
   The attribute 'regexp' defines (with a _flex_ regular expression) the
   set of characters to be escaped with a preceding backslash '\'
-->
<!ELEMENT space-chars EMPTY>
<!ATTLIST space-chars regexp CDATA #REQUIRED>
<!--
   The attribute 'regexp' defines (with a _flex_ regular expression) the
   set of space characters
-->
<!ELEMENT case-sensitive EMPTY>
<!ATTLIST case-sensitive value (yes|no) #REQUIRED>
<!--
   The attribute 'value' is set to 'yes' if case is relevant in the
   specification of the format. Otherwise it is set to 'no'
-->
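<!--
   Illustrative sketch, not part of the original DTD: an 'options' block
   with its children in the order the content model requires. Every
   attribute value here (block size, encodings, regular expressions) is
   an assumed example value, not a recommended setting.

   <options>
     <largeblocks size="8192"/>
     <input encoding="UTF-8"/>
     <output encoding="UTF-8"/>
     <tag-name regexp="[a-zA-Z]+"/>
     <escape-chars regexp="[\[\]\^\$\\]"/>
     <space-chars regexp="[ \n\t\r]"/>
     <case-sensitive value="no"/>
   </options>
-->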
<!ELEMENT rules (format-rule|replacement-rule)+>
<!--
   Groups the rules for processing format and the rules for replacing
   expressions with characters that are part of the text
-->
<!ELEMENT format-rule (tag|(begin,end))>
<!ATTLIST format-rule type (comment|empty|open|close) #IMPLIED>
<!ATTLIST format-rule eos (yes|no) #IMPLIED>
<!ATTLIST format-rule priority CDATA #REQUIRED>
<!--
   Format rule parent element. It may include either a 'tag' element or
   a pair of elements 'begin', 'end'. In the first case, the matched
   element is considered to be part of the format. In the second case,
   the 'begin' and 'end' elements are considered to enclose a block of
   format. The attribute 'eos' (end of sentence) is set to 'yes' if the
   rule marks an end of sentence (a dot) in the text being processed
   ('no' by default). The attribute 'priority' sets the order of
   precedence of the rule
-->
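<!--
   Illustrative sketch, not part of the original DTD: the two shapes a
   'format-rule' may take. The patterns and priorities are assumed
   example values for an HTML-like format; '&lt;' and '&gt;' are written
   as they would appear in the XML file, and the double quotes inside
   each pattern are flex literal-string quotes.

   A single-tag rule that also marks an end of sentence:

   <format-rule type="close" eos="yes" priority="2">
     <tag regexp='"&lt;/p&gt;"'/>
   </format-rule>

   A rule whose 'begin' and 'end' delimiters enclose a block of format:

   <format-rule eos="no" priority="1">
     <begin regexp='"&lt;pre&gt;"'/>
     <end regexp='"&lt;/pre&gt;"'/>
   </format-rule>
-->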
<!ELEMENT tag EMPTY>
<!ATTLIST tag regexp CDATA #REQUIRED>
<!--
   Defines an element that is part of the format by the pattern specified
   as the value of the 'regexp' attribute
-->
<!ELEMENT begin EMPTY>
<!ATTLIST begin regexp CDATA #REQUIRED>
<!--
   The attribute 'regexp' is the regular expression that detects the
   beginning delimiter of a block of format
-->
<!ELEMENT end EMPTY>
<!ATTLIST end regexp CDATA #REQUIRED>
<!--
   The attribute 'regexp' is the regular expression that detects the
   ending delimiter of a block of format
-->
<!ELEMENT replacement-rule (replace+)>
<!ATTLIST replacement-rule regexp CDATA #REQUIRED>
<!--
   Root element for a replacement rule. The attribute 'regexp' is the
   general regular expression that detects the elements to replace
-->
<!ELEMENT replace EMPTY>
<!ATTLIST replace source CDATA #REQUIRED>
<!ATTLIST replace target CDATA #REQUIRED>
<!ATTLIST replace prefer (yes|no) #IMPLIED>
<!--
   Replacement rule. The 'source' is a string of one or more characters.
   The 'target' MUST be a single character. The 'prefer' attribute, when
   set to 'yes', defines the preferred reverse translation of the
   replacement.
-->
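<!--
   Illustrative sketch, not part of the original DTD: a 'replacement-rule'
   for a hypothetical format that writes special characters as '%name'
   sequences. The pattern and the source strings are assumed example
   values. Two sources map to the same single-character target, and
   'prefer' is assumed here to select which source is used when the
   target character is translated back.

   <replacement-rule regexp='"%"[a-z]+'>
     <replace source="%percent" target="%"/>
     <replace source="%at" target="@" prefer="yes"/>
     <replace source="%commat" target="@"/>
   </replacement-rule>
-->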