1 \input texinfo @c -*-texinfo-*-
3 @setfilename psgml-api.info
5 @c @setchapternewpage odd
7 @c $Id: psgml-api.texi,v 1.2 1999/12/03 17:29:39 lenst Exp $
12 * PSGML-API: (psgml-api). PSGML, the API documentation.
18 Documentation for PSGML, a major mode for SGML.
20 Copyright 1994 Lennart Staflin
22 Permission is granted to make and distribute verbatim
23 copies of this manual provided the copyright notice and
24 this permission notice are preserved on all copies.
27 Permission is granted to process this file through TeX
28 and print the results, provided the printed document
29 carries a copying permission notice identical to this
30 one except for the removal of this paragraph (this
31 paragraph not being relevant to the printed manual).
34 Permission is granted to copy and distribute modified
35 versions of this manual under the conditions for
36 verbatim copying, and provided that the entire
37 resulting derived work is distributed under the terms
38 of a permission notice identical to this one.
40 Permission is granted to copy and distribute
41 translations of this manual into another language,
42 under the above conditions for modified versions,
43 except that this permission notice may be stated in a
44 translation approved by the Free Software Foundation.
50 @title Internals of PSGML
51 @author Lennart Staflin
53 @c The following two commands
54 @c start the copyright page.
56 @vskip 0pt plus 1filll
57 Copyright @copyright{} 1994 Lennart Staflin
61 Permission is granted to make and distribute verbatim
62 copies of this manual provided the copyright notice and
63 this permission notice are preserved on all copies.
66 Permission is granted to process this file through TeX
67 and print the results, provided the printed document
68 carries a copying permission notice identical to this
69 one except for the removal of this paragraph (this
70 paragraph not being relevant to the printed manual).
73 Permission is granted to copy and distribute modified
74 versions of this manual under the conditions for
75 verbatim copying, and provided that the entire
76 resulting derived work is distributed under the terms
77 of a permission notice identical to this one.
79 Permission is granted to copy and distribute
80 translations of this manual into another language,
81 under the above conditions for modified versions,
82 except that this permission notice may be stated in a
83 translation approved by the Free Software Foundation.
86 @node Top, Types, (dir), (dir)
87 @comment node-name, next, previous, up
93 * Types:: Types and operations
95 * Implementation:: Implementation notes
98 --- The Detailed Node Listing ---
102 * element:: The element structure
103 * attribute:: Attribute Types
104 * parser state:: Parser state
106 * entities:: Entities
109 @node Types, Hooks, Top, Top
110 @comment node-name, next, previous, up
111 @chapter Types and operations
114 Names of element types, attributes and entities should be treated as far
115 as possible as a real type. In versions prior to 1.0 names are
116 represented by lisp symbols but in 1.0 they are strings.
118 Perhaps I should make a @file{psgml-api.el} that defines some functions
119 to deal with names. Then it would be possible to write code that works
124 * element:: The element structure
125 * attribute:: Attribute Types
126 * parser state:: Parser state
128 * entities:: Entities
131 @node element, attribute, Types, Types
132 @comment node-name, next, previous, up
133 @section The element structure
135 @deftp {Data type} element
136 The basic data type representing the element structure is the Element (this
137 happens to be a node in the parse tree).
141 @subsection Mapping buffer positions to elements
143 @defun sgml-find-context-of pos
144 Return the element current at buffer position @var{pos}. If @var{pos}
145 is in markup, @code{sgml-markup-type} will be a symbol identifying the
146 markup type. It will be @code{nil} if @var{pos} is outside markup.
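
As a rough illustration, the following sketch describes the context at a
buffer position. It assumes that PSGML is loaded and that the buffer is
in PSGML's SGML mode; @code{my-sgml-context-summary} is a hypothetical
name and not part of PSGML.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-context-summary (pos)
  "Return a string describing the element and markup type at POS."
  (let ((element (sgml-find-context-of pos)))
    ;; `sgml-markup-type' is set as a side effect of the call above.
    (format "%s%s"
            (sgml-element-gi element)
            (if sgml-markup-type
                (format " (inside %s)" sgml-markup-type)
              ""))))
@end example

Calling @code{(my-sgml-context-summary (point))} returns the general
identifier of the current element, followed by the markup type when
point is inside markup.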
149 @defun sgml-find-element-of pos
150 Return the element containing the character at buffer position @var{pos}.
154 @subsection Functions operating on elements
156 @defun sgml-element-name element
157 Returns the name of the element. (obsolete)
160 @defun sgml-element-gi element
161 Return the general identifier (string) of @var{element}.
164 @defun sgml-element-level element
165 Returns the level of @var{element} in the element structure. The
166 document element is level 1.
169 @subsubsection Structure
171 @defun sgml-top-element
172 Return the document element.
175 @defun sgml-off-top-p element
176 True if @var{element} is the pseudo element above the document element.
179 These functions return other related elements, or possibly @code{nil}.
181 @defun sgml-element-content element
182 First element in content of @var{element}, or nil.
185 @defun sgml-element-next element
Next sibling of @var{element}. To loop through all subelements of an
element, @code{el}, you could write:
@example
(let ((c (sgml-element-content el)))
  (while c
    <<Do something with c>>
    (setq c (sgml-element-next c))))
@end example
197 @defun sgml-element-parent element
198 Parent of @var{element}.
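
The structure functions can be combined to walk upwards in the tree.
The sketch below collects the general identifiers on the path from the
document element down to a given element. It assumes PSGML is loaded;
@code{my-sgml-element-path} is a hypothetical helper, not part of PSGML.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-element-path (element)
  "Return the GIs from the document element down to ELEMENT as a list."
  (let ((path nil))
    ;; Climb until there is no parent left or we reach the pseudo
    ;; element above the document element.
    (while (and element (not (sgml-off-top-p element)))
      (push (sgml-element-gi element) path)
      (setq element (sgml-element-parent element)))
    path))
@end example

A typical call would be
@code{(my-sgml-element-path (sgml-find-element-of (point)))}.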
204 @defun sgml-element-stag-optional element
205 Return true if the start-tag of @var{element} is omissible.
208 @defun sgml-element-etag-optional element
209 Return true if the end-tag of @var{element} is omissible.
212 @defun sgml-element-stag-len element
213 Return the length of the start-tag of @var{element}. If the start-tag
214 has been omitted the length is 0.
217 @defun sgml-element-etag-len element
218 Return the length of the end-tag of @var{element}. If the end-tag
219 has been omitted the length is 0.
222 @defun sgml-element-net-enabled element
223 Return true, if @var{element} or some parent of the element has null end
224 tag (NET) enabled. Return @code{t}, if it is @var{element} that has NET
229 @subsubsection Positions
These functions relate an element to positions in the buffer.
233 @defun sgml-element-start element
234 Position of start of @var{element}.
237 @defun sgml-element-end element
238 Position after @var{element}.
241 @defun sgml-element-stag-end element
242 Position after start-tag of @var{element}.
245 @defun sgml-element-etag-start element
246 Position before end-tag of @var{element}.
250 @subsubsection Attributes
252 @defun sgml-element-attlist element
253 Return the attribute declaration list for @var{element}.
256 @defun sgml-element-attribute-specification-list element
257 Return the attribute specification list for @var{element}.
260 @defun sgml-element-attval element attribute
261 Return the value of the @var{attribute} in @var{element}, string or nil.
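
As an illustration, the following sketch gathers the value of one
attribute from every child of an element. It assumes PSGML is loaded and
that the attribute name is given as a string in the letter case PSGML
uses internally; @code{my-sgml-children-attvals} is a hypothetical
helper.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-children-attvals (element attribute)
  "Return the value of ATTRIBUTE for each child of ELEMENT.
The list is in document order; unspecified values appear as nil."
  (let ((child (sgml-element-content element))
        (result nil))
    (while child
      (push (sgml-element-attval child attribute) result)
      (setq child (sgml-element-next child)))
    (nreverse result)))
@end example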
265 @subsubsection Misc technical
268 @defun sgml-element-data-p element
269 True if @var{element} can contain data characters.
272 @defun sgml-element-mixed element
273 True if @var{element} has mixed content.
276 @defun sgml-element-eltype element
279 @defun sgml-element-empty element
280 True if @var{element} is empty.
283 @defun sgml-element-excludes element
286 @defun sgml-element-includes element
289 @defun sgml-element-model element
290 Declared content or content model of @var{element}.
293 @defun sgml-element-context-string element
294 Return string describing context of @var{element}.
297 @c ----------------------------------------------------------------------
298 @node attribute, parser state, element, Types
299 @comment node-name, next, previous, up
300 @section Attribute Types
Basic types for attributes are names and values. (See the note about
names in @ref{Types}.) Attribute values (attval) are represented by Lisp
strings.
306 @subsection Attribute Declaration List Type
308 @deftp {Data type} attlist attdecl*
309 This is the result of the ATTLIST declarations in the DTD.
All attribute declarations for an element make up that element's attlist.
314 @defun sgml-lookup-attdecl name attlist
315 Return attribute declaration (attdecl) for attribute @var{name} in
316 attribute declaration list @var{attlist}.
319 @defun sgml-attribute-with-declared-value attlist declared-value
320 Find the first attribute in @var{attlist} that has @var{declared-value}.
324 @subsection Attribute Declaration Type
326 @deftp {Data type} attdecl name declared-value default-value
327 This is the representation of an individual attribute declaration
328 contained in an ATTLIST declaration.
331 @defun sgml-make-attdecl name declared-value default-value
335 @defun sgml-attdecl-name attdecl
336 Returns the name of an attribute declaration.
339 @defun sgml-attdecl-declared-value attdecl
340 Returns the declared-value of attribute declaration @var{attdecl}.
@defun sgml-attdecl-default-value attdecl
344 Returns the default-value of attribute declaration @var{attdecl}.
348 @subsection Declared Value Type
350 @deftp {Data type} declared-value (token-group | notation | simple)
351 A declared value of an SGML attribute can be of different kinds. If the
352 declared value is a token group there is an associated list of name
353 tokens. For notation there is also a list of associated names, the
354 allowed notation names. The other declared values are represented by the
355 type name as a lisp symbol.
356 @c token-group = nametoken+
357 @c notation = nametoken+
361 @defun sgml-declared-value-token-group declared-value
362 Return the name token group for the @var{declared-value}.
363 This applies to name token groups. For other declared values nil is
367 @defun sgml-declared-value-notation declared-value
368 Return the list of notation names for the @var{declared-value}.
369 This applies to notation declared value. For other declared values
374 @subsection Default Value Type
376 @deftp {Data type} default-value (required | implied | conref | specified )
377 @c implied, conref = constant symbol
378 @c specified = (fixed | normal)
379 @c fixed, normal = attval
There are several kinds of default values. The @var{required},
@var{implied}, and @var{conref} kinds have no associated information.
The @var{specified} kind has an associated attribute value and can be
either @code{fixed} or @code{normal}.
386 @defun sgml-make-default-value type &optional attval
389 @defun sgml-default-value-attval default-value
390 Return the actual default value of the declared @var{default-value}.
391 The actual value is a string. Return @code{nil} if no actual value.
394 @defun sgml-default-value-type-p type default-value
Return true if @var{default-value} is of @var{type}, where @var{type}
is a symbol, one of @code{required}, @code{implied}, @code{conref}, or
401 @subsection Attribute Specification Type
403 @deftp {Data type} attspec name attval
404 This is the result of parsing an attribute specification.
407 @defun sgml-make-attspec name attval
408 Create an attspec from @var{name} and @var{attval}.
409 Special case, if @var{attval} is @code{nil} this is an implied attribute.
412 @defun sgml-attspec-name attspec
413 Return the name of the attribute specified by @var{attspec}.
416 @defun sgml-attspec-attval attspec
417 Return the value (attval) of attribute specification @var{attspec}.
418 If @var{attspec} is @code{nil}, @code{nil} is returned.
422 @subsection Attribute Specification List Type
424 @deftp {Data type} asl attspec*
425 This is the result of parsing an attribute specification list.
428 @defun sgml-lookup-attspec name asl
429 Return the attribute specification for attribute with @var{name} in the
430 attribute specification list @var{asl}. If the attribute is unspecified
431 @code{nil} is returned.
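
For illustration, the sketch below converts an attribute specification
list into an association list. It assumes that an asl is an ordinary
Lisp list of attspecs, as the type description above suggests;
@code{my-sgml-asl-to-alist} is a hypothetical helper, not part of PSGML.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-asl-to-alist (asl)
  "Convert the attribute specification list ASL to an alist.
Each entry has the form (NAME . ATTVAL)."
  (mapcar (lambda (attspec)
            (cons (sgml-attspec-name attspec)
                  (sgml-attspec-attval attspec)))
          asl))
@end example

It could be applied to the result of
@code{sgml-element-attribute-specification-list}.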
436 @c ------------------------------------------------------------------
437 @node parser state, DTD, attribute, Types
438 @comment node-name, next, previous, up
439 @section Parser state
The parser state that needs to be kept between commands is
stored in a buffer-local variable. Some global variables are
initialised from this variable when parsing starts.
445 @defvar sgml-buffer-parse-state
446 The state of the parser that is kept between commands. The value of
447 this variable is of type pstate.
450 @deftp {Data type} pstate
454 @defun sgml-pstate-dtd pstate
455 The document type information (dtd) for the parser.
459 @c ------------------------------------------------------------------
460 @node DTD, entities, parser state, Types
461 @comment node-name, next, previous, up
464 @deftp {Data type} dtd
465 Represents what PSGML knows about the DTD.
468 @defun sgml-dtd-doctype dtd
469 The document type name.
472 @defun sgml-dtd-eltypes dtd
473 The table of element types.
476 @defun sgml-dtd-entities dtd
477 The table of declared general entities (entity-table).
480 @defun sgml-dtd-parameters dtd
481 The table of declared parameter entities (entity-table).
484 @defun sgml-dtd-shortmaps dtd
485 The list of short reference maps.
488 @defun sgml-dtd-notations dtd
494 @c ------------------------------------------------------------------
495 @node entities, , DTD, Types
496 @comment node-name, next, previous, up
499 @deftp {Data type} entity
500 An entity has the following properties:
504 The name of the entity (a string). This is either the name of a
505 declared entity (general or parameter) or the doctype name if it is the
506 implicit entity referred to by the doctype declaration.
509 This is a symbol. It is @code{text} if it is a text entity, other
510 values are @code{cdata}, @code{ndata}, @code{sdata}, @code{sgml} or
514 This is the text of the entity, either a string or an external
519 Operations on entities
521 @defun sgml-make-entity name type text
525 @defun sgml-entity-name entity
526 The name of the entity.
529 @defun sgml-entity-type entity
530 The type of the entity.
533 @defun sgml-entity-text entity
534 The text of the entity.
537 @defun sgml-entity-insert-text entity
538 Insert the text of the entity into the current buffer at point.
541 @defun sgml-entity-data-p entity
True if @var{entity} is a data entity, that is, not of type @code{text}.
546 @deftp {Data type} entity-table
547 A table of entities that can be referenced by entity name.
550 @defun sgml-lookup-entity name entity-table
The entity named @var{name} in the table @var{entity-table}. If no
552 such entity exists, @code{nil} is returned.
555 @defun sgml-entity-declare name entity-table type text
556 Create an entity from @var{name}, @var{type} and @var{text}; and enter
557 the entity into the table @var{entity-table}.
560 @defun sgml-entity-completion-table entity-table
561 Make a completion table from the @var{entity-table}.
564 @defun sgml-map-entities fn entity-table &optional collect
565 Apply the function @var{fn} to all entities in @var{entity-table}. If
566 @var{collect} is @code{t}, the results of the applications are collected
567 in a list and returned.
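
As a sketch of how the entity operations fit together, the following
function lists the names of all data entities in an entity table. It
assumes PSGML is loaded; @code{my-sgml-data-entity-names} is a
hypothetical helper.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-data-entity-names (entity-table)
  "Return the names of the data entities in ENTITY-TABLE."
  (delq nil
        (sgml-map-entities
         (lambda (entity)
           ;; Keep the name of data entities, nil for text entities.
           (and (sgml-entity-data-p entity)
                (sgml-entity-name entity)))
         entity-table
         t)))                   ; collect the results in a list
@end example

The table of general entities for the current buffer could be obtained
with
@code{(sgml-dtd-entities (sgml-pstate-dtd sgml-buffer-parse-state))}.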
572 @c ------------------------------------------------------------------
573 @node Hooks, Implementation, Types, Top
574 @comment node-name, next, previous, up
577 @defvar sgml-open-element-hook
578 The hook run by @code{sgml-open-element}.
These functions are called with two arguments, the first argument is
580 the opened element and the second argument is the attribute specification
581 list. It is probably best not to refer to the content or the end-tag of
585 @defvar sgml-close-element-hook
586 The hook run by @code{sgml-close-element}. These functions are invoked
587 with @code{sgml-current-tree} bound to the element just parsed.
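
A minimal sketch of a function for this hook follows. It records the
general identifier of every element the parser closes. The names
@code{my-sgml-closed-gis} and @code{my-sgml-record-closed-element} are
hypothetical, and the sketch assumes the hook functions are called
without arguments.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defvar my-sgml-closed-gis nil
  "GIs of the elements closed so far, most recent first.")

(defun my-sgml-record-closed-element ()
  "Record the GI of the element the parser has just closed."
  ;; `sgml-current-tree' is bound to that element while the hook runs.
  (push (sgml-element-gi sgml-current-tree) my-sgml-closed-gis))

(add-hook 'sgml-close-element-hook #'my-sgml-record-closed-element)
@end example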
590 @defvar sgml-doctype-parsed-hook
591 This hook is called after the doctype has been parsed.
592 It can be used to load any additional information into the DTD structure.
595 @defvar sgml-sysid-resolve-functions
596 This variable should contain a list of functions.
597 Each function should take one argument, the system identifier of an entity.
598 If the function can handle that identifier, it should insert the text
599 of the entity into the current buffer at point and return t. If the
600 system identifier is not handled the function should return nil.
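
The following sketch shows what such a function might look like. It
handles system identifiers of the invented form @samp{local:NAME} by
inserting a file from a fixed directory. The @samp{local:} prefix, the
variable @code{my-sgml-local-directory} and the function name are all
assumptions made for this example.

@example
(defvar my-sgml-local-directory "~/sgml/"
  "Directory searched for `local:' system identifiers.")

(defun my-sgml-resolve-local-sysid (sysid)
  "Insert the file named by a `local:NAME' SYSID at point.
Return t if SYSID was handled and nil otherwise."
  (when (string-match "\\`local:\\(.+\\)\\'" sysid)
    (let ((file (expand-file-name (match-string 1 sysid)
                                  my-sgml-local-directory)))
      (when (file-readable-p file)
        (insert-file-contents file)
        t))))

(add-hook 'sgml-sysid-resolve-functions #'my-sgml-resolve-local-sysid)
@end example

Returning @code{nil} for identifiers the function does not recognise
leaves them to the other functions in the list.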
615 *** sgml-new-attribute-list-function
616 This hook is run when a new element is inserted to construct the
617 attribute specification list. The default function prompts for the
621 @c ------------------------------------------------------------------
622 @node Implementation, Index, Hooks, Top
623 @comment node-name, next, previous, up
624 @chapter Implementation notes
626 @section Data Types and Operations
628 @subsection Element Type
630 @deftp {Data type} eltype
631 Data type representing the information about an element type. An
632 @code{eltype} has information from @samp{ELEMENT} and @samp{ATTLIST}
633 declarations. It can also store data for the application.
636 The element types are symbols in a special oblist. The oblist is the
table of element types. The symbol's name is the GI, its value is used
638 to store three flags and the function definition holds the content
639 model. Other information about the element type is stored on the
642 @defun sgml-eltype-name et
643 The name (a string) of the element type @var{et}.
646 @defun sgml-eltype-appdata et prop
647 Get application data from element type @var{et} with name @var{prop}.
648 @var{prop} should be a symbol, reserved names are: flags, model, attlist,
649 includes, excludes, conref-regexp, mixed, stag-optional, etag-optional.
651 This function can be used as a place in @code{setf}, @code{push} and
652 other functions from the CL library.
655 @defun sgml-eltype-all-miscdata eltype
656 A list of all data properties for eltype except for flags, model,
657 includes and excludes. This function filters the property list of
658 @var{eltype}. Used when saving the parsed DTD.
661 @defun sgml-eltype-set-all-miscdata eltype miscdata
662 Append the @var{miscdata} data properties to the properties of
666 @defun sgml-eltype-attlist et
The attribute declaration list (attlist) for the element type @var{et}.
670 @defun sgml-eltype-completion-table eltypes
671 Make a completion table from a list, @var{eltypes}, of element types.
674 @defun sgml-eltype-stag-optional et
675 True if the element type @var{et} has optional start-tag.
678 @defun sgml-eltype-etag-optional et
679 True if the element type @var{et} has optional end-tag.
682 @defun sgml-eltype-excludes et
683 The list of excluded element types for element type @var{et}.
686 @defun sgml-eltype-includes et
687 The list of included element types for element type @var{et}.
690 @defun sgml-eltype-flags et
691 Contains three flags as a number. The flags are stag-optional,
692 etag-optional and mixed.
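
To illustrate how three flags can be packed into one number, here is a
small sketch that decodes such a value. The bit assignment (1 for
stag-optional, 2 for etag-optional, 4 for mixed) is the one listed for
the saved DTD format later in this chapter; that the value in memory
uses the same layout is an assumption, and
@code{my-decode-eltype-flags} is a hypothetical helper.

@example
(defun my-decode-eltype-flags (flags)
  "Decode the number FLAGS into a property list of booleans.
Assumes 1 means stag-optional, 2 etag-optional and 4 mixed."
  (list :stag-optional (/= 0 (logand flags 1))
        :etag-optional (/= 0 (logand flags 2))
        :mixed         (/= 0 (logand flags 4))))
@end example

With this definition @code{(my-decode-eltype-flags 6)} evaluates to
@code{(:stag-optional nil :etag-optional t :mixed t)}.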
695 @defun sgml-eltype-mixed et
696 True if element type @var{et} has mixed content.
699 @defun sgml-eltype-model et
700 The content model of element type @var{et}. The content model is either
701 the start state in the DFA for the content model or a symbol identifying
705 @defun sgml-eltype-shortmap et
706 The name of the shortmap associated with element type @var{et}. This
can also be the symbol @code{empty} (if declared with a @samp{<!USEMAP
gi #EMPTY>}) or @code{nil} (if no associated map).
712 @defun sgml-eltype-token et
713 Return a token for the element type @var{et}.
716 @defun sgml-eltypes-in-state state tree
717 List of element types valid in @var{state} and @var{tree}.
723 The DTD data type is realised as a lisp vector using @code{defstruct}.
725 There are two additional fields for internal use: dependencies and
728 @defun sgml-dtd-dependencies dtd
729 The list of files used to create this DTD.
732 @defun sgml-dtd-merged dtd
733 The pair (@var{file} . @var{merged-dtd}), if the DTD has had a
734 precompiled dtd merged into it. @var{File} is the file containing the
735 compiled DTD and @var{merged-dtd} is the DTD loaded from that file.
739 @subsection Element and Tree
741 @deftp {Data Type} tree
This is the data type for the nodes in the tree built by the parser.
745 The tree nodes are represented as lisp vectors, using @code{defstruct}
746 to define basic operations.
748 The Element data type is a view of the tree built by the parser.
751 @section Parsing model
753 PSGML uses finite state machines and a stack to parse SGML. Every
754 element type has an associated DFA (deterministic finite automaton).
755 This DFA is constructed from the content model.
757 SGML restricts the allowed content models in such a way that it is
758 easy to directly construct a DFA.
To be able to determine when a start-tag can be omitted, the DFA needs to
contain some more information than a traditional DFA. In PSGML a DFA
762 has a set of states and two sets of edges. The edges are associated
763 with tokens (corresponding to SGML's primitive content tokens). I call
764 these moves. One set of moves, the @dfn{optional moves}, represents
765 optional tokens. I call the other set @dfn{required moves}. The
correspondence to the SGML definitions is: if there is precisely one
required move from a state, then the associated token is required.
A state is final if there is no required move from that state.
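
The idea of two sets of moves can be sketched in a few lines of Lisp.
This is only a model of the description above and not PSGML's actual
representation; the structure @code{my-state} and the functions built on
it are hypothetical.

@example
(require 'cl-lib)

;; A state holds two alists mapping a token to the next state.
(cl-defstruct (my-state (:constructor my-state-create))
  opts    ; optional moves
  reqs)   ; required moves

(defun my-state-final-p (state)
  "True if STATE is final, that is, it has no required moves."
  (null (my-state-reqs state)))

(defun my-state-required-token (state)
  "Return the token that is required in STATE, or nil.
A token is required when it labels the only required move."
  (let ((reqs (my-state-reqs state)))
    (and reqs (null (cdr reqs)) (car (car reqs)))))

(defun my-state-next (state token)
  "Return the state reached from STATE by a move on TOKEN, or nil."
  (cdr (or (assq token (my-state-opts state))
           (assq token (my-state-reqs state)))))

;; DFA for the content model (a, b?):
;;   s0 --a (required)--> s1 --b (optional)--> s2
(defvar my-s2 (my-state-create))
(defvar my-s1 (my-state-create :opts (list (cons 'b my-s2))))
(defvar my-s0 (my-state-create :reqs (list (cons 'a my-s1))))
@end example

In this model @code{(my-state-required-token my-s0)} is @code{a}, so
@code{my-s0} is not final, while @code{my-s1} and @code{my-s2} are final
because neither has a required move.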
770 The SGML construct @samp{(...&...&...)} (@dfn{AND-group}) is another
771 problem. There is a simple translation to sequence- and or-connectors.
For example, @samp{(a & b & c)} can be translated to:
775 ((a, ((c, b) | (b, c))) |
776 (b, ((a, c) | (c, a))) |
777 (c, ((a, b) | (b, a))) )
780 But this grows too fast to be of direct practical use. PSGML represents
781 an AND-group with one DFA for every (SGML) token in the group. During
782 parsing of an AND-group there is a pointer to a state in one of the
783 group's DFAs, and a list of the DFAs for the tokens not yet satisfied.
784 Most of this is hidden by the primitives for the state type. The parser
785 only sees states in a DFA and moves.
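
The growth of the simple translation can be made concrete with a short
sketch. The function @code{my-and-group-expand} builds the nested
translation as a Lisp form, using the symbols @code{seq} and @code{or}
for the connectors, and @code{my-and-group-orderings} counts the
orderings it enumerates. Both are hypothetical helpers written for this
example; they are not how PSGML handles AND-groups.

@example
(defun my-and-group-expand (tokens)
  "Translate an AND-group over TOKENS into nested seq and or groups."
  (if (null (cdr tokens))
      (car tokens)
    (cons 'or
          (mapcar (lambda (token)
                    ;; TOKEN first, then the rest in any order.
                    (list 'seq token
                          (my-and-group-expand (remove token tokens))))
                  tokens))))

(defun my-and-group-orderings (n)
  "Return the number of orderings enumerated for N tokens (N factorial)."
  (if (<= n 1)
      1
    (* n (my-and-group-orderings (1- n)))))
@end example

For instance, @code{(my-and-group-expand '(a b))} evaluates to
@code{(or (seq a b) (seq b a))}, and @code{(my-and-group-orderings 8)}
is 40320, which shows why the direct translation is impractical for
larger groups.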
788 @section Entity manager
790 @defun sgml-push-to-entity entity &optional ref-start type
791 Set current buffer to a buffer containing the entity @var{entity}.
792 @var{entity} can also be a file name. Optional argument @var{ref-start}
should be the start point of the entity reference. Optional argument
@var{type} overrides the entity type in entity lookup.
798 @defun sgml-pop-entity
799 Should be called after a @code{sgml-push-to-entity} (or similar).
800 Restore the current buffer to the buffer that was current when the push
801 to this buffer was made.
804 @defun sgml-push-to-string string
805 Create an entity from @var{string} and push it on the top of the entity
806 stack. After this the current buffer will be a scratch buffer containing
807 the text of the new entity with point at the first character.
809 Use @code{sgml-pop-entity} to exit from this buffer.
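
Since every push has to be matched by a pop, it can be convenient to
wrap the pair in a helper. The sketch below does this with
@code{unwind-protect}, so that the entity is popped even if an error is
signalled. It assumes PSGML is loaded;
@code{my-sgml-call-with-string-entity} is a hypothetical name.

@example
(require 'psgml-parse nil t)   ; load PSGML's parser if it is installed

(defun my-sgml-call-with-string-entity (string function)
  "Call FUNCTION with STRING pushed as the current entity.
The entity is popped again even if FUNCTION signals an error.
Return the value of FUNCTION."
  (sgml-push-to-string string)
  (unwind-protect
      (funcall function)
    (sgml-pop-entity)))
@end example

@var{function} is called with no arguments, in the scratch buffer that
holds the text of the new entity.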
815 @section Parser functions
This makes sure that the buffer has a DTD and sets the global variables
needed by the parsing routines. One such variable is @code{sgml-dtd-info},
which contains the DTD (type dtd).
824 @defun sgml-parse-to goal &optional extra-cond quiet
825 This is the low level interface to the parser.
827 Parse until (at least) @var{goal}, a buffer position. Optional argument
828 @var{extra-cond} should be a function. This function is called in the
parser loop, and the loop is exited if the function returns @code{t}. If the third
argument @var{quiet} is non-@code{nil}, no "@samp{Parsing...}" message
835 @defun sgml-reparse-buffer shortref-fun
836 Reparse the buffer and let @var{shortref-fun} take care of short
837 references. @var{shortref-fun} is called with the entity as
838 argument and @code{sgml-markup-start} pointing to start of short
839 reference and point pointing to the end.
843 @section Saved DTD Format
847 S-expression --dependencies--,
852 S-expression --shortref maps--,
853 S-expression --notations--
855 Elements = Counted Sequence of S-expression --element type name--,
856 Counted Sequence of Element type description
858 File version = "(sgml-saved-dtd-version 5)
865 [10] --end of line marker--)*
867 Element type description = S-expression --Misc info--,
869 OF [0-7] --Flags 1:stag-opt, 2:etag-opt, 4:mixed--,
870 Content specification,
871 Token list --includes--,
872 Token list --excludes--
873 OF [128] --Flag undefined element--
875 Content specification = CASE
881 OF [128] --model follows--,
882 Model --nodes in the finite state automaton--
884 Model = Counted Sequence of Node
890 Normal State = Moves --moves for optional tokens--,
891 Moves --moves for required tokens--
893 Moves = Counted Sequence of (Token,
896 And Node = [255] --signals an AND node--,
897 Number --next state (node number)--,
898 Counted Sequence of Model --set of models--
900 Token = Number --index in list of elements--
903 OF [0-250] --Small number 0--250--
904 OF [251-255] --Big number, first octet--,
905 OCTET --Big number, second octet--
907 Token list = Counted Sequence of Token
Parameter entities = S-expression --internal representation of parameter entities--
911 General entities = S-expression --internal representation of general entities--
913 Document type name = S-expression --name of document type as a string--
917 Counted Sequence = Number_a --length of sequence--,
925 @c ------------------------------------------------------------------
926 @node Index, , Implementation, Top
927 @comment node-name, next, previous, up