.TH lt-proc 1 2006-03-23 "" ""
.SH NAME
lt-proc \- lexical processing tool, part of the lexical processing
modules and tools (\fBlttoolbox\fR)
.PP
This tool is part of the Apertium machine translation
architecture: \fBhttp://apertium.sf.net\fR.
.SH SYNOPSIS
.B lt-proc
[
.B \-a \fR|
.B \-g \fR|
.B \-n \fR|
.B \-d \fR|
.B \-p \fR|
.B \-s \fR|
.B \-v \fR|
.B \-h
] fst_file [input_file [output_file]]
.PP
.B lt-proc
[
.B \-\-analysis \fR|
.B \-\-generation \fR|
.B \-\-non\-marked\-gen \fR|
.B \-\-debugged\-gen \fR|
.B \-\-post\-generation \fR|
.B \-\-sao \fR|
.B \-\-version \fR|
.B \-\-help
] fst_file [input_file [output_file]]
.SH DESCRIPTION
.B lt-proc
is the application responsible for providing the following lexical
processing functions:
.RS
\(bu \fImorphological analyser\fR (option \fB\-a\fR)
.PP
\(bu \fImorphological generator\fR (option \fB\-g\fR)
.PP
\(bu \fImorphological generator\fR without marks (option \fB\-n\fR)
.PP
\(bu \fImorphological generator\fR with debugging information (option \fB\-d\fR)
.PP
\(bu \fIpost-generator\fR (option \fB\-p\fR)
.RE
.PP
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by \fBlt\-comp\fR(1).
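.PP
As a rough illustration of the calling convention shown in the
synopsis, the following Python sketch assembles an \fBlt-proc\fR
command line and can optionally run it. The helper names
(\fIbuild_command\fR, \fIrun_lt_proc\fR) and the mode labels are
hypothetical and not part of lttoolbox. The sketch also assumes that
text is read from standard input and written to standard output when
no \fIinput_file\fR is given.
.PP
.nf
```python
import subprocess

# Short options taken from the SYNOPSIS; the mode names are labels
# invented for this sketch.
MODES = {
    "analysis": "-a",
    "generation": "-g",
    "non-marked-gen": "-n",
    "debugged-gen": "-d",
    "post-generation": "-p",
}


def build_command(mode, fst_file, input_file=None, output_file=None):
    """Return the argv list: lt-proc OPTION fst_file [input_file [output_file]]."""
    if output_file is not None and input_file is None:
        raise ValueError("output_file requires input_file")
    argv = ["lt-proc", MODES[mode], fst_file]
    if input_file is not None:
        argv.append(input_file)
    if output_file is not None:
        argv.append(output_file)
    return argv


def run_lt_proc(mode, fst_file, text):
    """Feed text on standard input (an assumption) and return the output."""
    result = subprocess.run(
        build_command(mode, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```
.fi
.PP
With a compiled dictionary at hand (the file name \fIes.bin\fR is
only a placeholder), a call such as
\fIrun_lt_proc("analysis", "es.bin", "cantaba")\fR would return the
analyser output for that word.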
.PP
Note that some characters
(`\fB[\fR', `\fB]\fR', `\fB$\fR', `\fB^\fR', `\fB/\fR', `\fB+\fR') are
\fIspecial\fR characters used for formatting and encapsulation. They
must be escaped when they are to be used literally. For instance, text
enclosed in `\fB[\fR'...`\fB]\fR' is ignored, and a lexical unit is
delimited as `\fB^\fR...\fB$\fR'.
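.PP
The escaping rule can be sketched in Python as follows. This is a
minimal illustration and not code from lttoolbox: the function names
are hypothetical, and the escape character is assumed to be a
backslash, which this page does not state explicitly.
.PP
.nf
```python
# Special characters listed in the DESCRIPTION.
SPECIAL = "[]$^/+"

# Assumption: a backslash is the escape character. It is written as
# chr(92) so that no literal backslash appears in this listing.
ESCAPE = chr(92)


def escape_literal(text):
    """Prefix each special character (and the escape itself) with ESCAPE."""
    out = []
    for ch in text:
        if ch in SPECIAL or ch == ESCAPE:
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def unescape_literal(text):
    """Inverse of escape_literal: drop one ESCAPE before each character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == ESCAPE:
            # Keep the character that follows the escape verbatim.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```
.fi
.PP
Under these assumptions, text passed through \fIescape_literal\fR
before processing and through \fIunescape_literal\fR afterwards is
restored unchanged.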
.SH OPTIONS
.TP
.B \-a, \-\-analysis
Tokenizes the text into surface forms (lexical units as they appear in
texts) and delivers, for each surface form, one or more lexical forms
consisting of lemma, lexical category and morphological inflection
information. Tokenization is not straightforward due to the existence,
on the one hand, of contractions, and, on the other hand, of
multi-word lexical units. For contractions, the system reads in a
single surface form and delivers the corresponding sequence of lexical
forms. Multi-word surface forms are analysed in a left-to-right,
longest-match fashion. Multi-word surface forms may be invariable
(such as a multi-word preposition or conjunction) or inflected (for
example, in Spanish, \fI"echaban de menos"\fR, \(dqthey missed\(dq, is a
form of the imperfect indicative tense of the verb \fI"echar de
menos"\fR, \(dqto miss\(dq). Limited support for some kinds of
discontinuous multi-word units is also available. Analysis of
single-word surface forms produces output such as
\fI"cantar"\fR \-> `\fI^cantar/cantar<vblex><inf>$\fR' or
\fI"cantaba"\fR \->
`\fI^cantaba/cantar<vblex><pii><p1><sg>/cantar<vblex><pii><p3><sg>$\fR'.
.TP
.B \-g, \-\-generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
.TP
.B \-n, \-\-non\-marked\-gen
Morphological generation (like \fB\-g\fR) but without unknown-word
marks (asterisk `*').
.TP
.B \-d, \-\-debugged\-gen
Morphological generation (like \fB\-g\fR) but with additional tag
information for tracing errors in the transfer and generator
dictionaries.
.TP
.B \-p, \-\-post\-generation
Performs orthographic operations such as contractions and
apostrophations. The post-generator is usually \fIdormant\fR (it simply
copies the input to the output) until a special \fIalarm\fR symbol
contained in some target-language surface forms \fIwakes\fR it up to
perform a particular string transformation if necessary; then it goes
back to sleep.
.TP
.B \-s, \-\-sao
Processes input in the format of the \fIorthoepikon\fR (previously
`\fIsao\fR') annotation system: \fBhttp://orthoepikon.sf.net\fR.
.TP
.B \-v, \-\-version
Display the version number.
.TP
.B \-h, \-\-help
Display this help.
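.PP
The analyser output shown under \fB\-a\fR can be taken apart with a
few lines of Python. The sketch below is only an illustration based on
the two examples given there. The function name \fIparse_analysis\fR
is hypothetical, and the code assumes a single unit with no escaped
special characters and no `\fB+\fR'-joined lexical forms.
.PP
.nf
```python
def parse_analysis(unit):
    """Split '^surface/analysis1/analysis2$' into its parts."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    # The first field is the surface form; the rest are lexical forms.
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma, _, tag_part = analysis.partition("<")
        # tag_part looks like 'vblex><pii><p1><sg>' or is empty.
        tags = tag_part.rstrip(">").split("><") if tag_part else []
        parsed.append((lemma, tags))
    return surface, parsed
```
.fi
.PP
For the second example under \fB\-a\fR this yields the surface form
\fIcantaba\fR and two lexical forms with lemma \fIcantar\fR, which
differ only in their person tag (\fIp1\fR versus \fIp3\fR).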
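.SH FILES
.TP
.B fst_file
The compiled dictionary, as generated by \fBlt\-comp\fR(1).
.TP
.B input_file
The text to process.
.TP
.B output_file
The file to which the result is written.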
.SH SEE ALSO
.I lt-expand\fR(1),
.I lt-comp\fR(1),
.I apertium-tagger\fR(1),
.I apertium-translator\fR(1),
.I apertium\fR(1).
.SH BUGS
Lots of... lurking in the dark and waiting for you!
.SH AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights
reserved.
 127 reserved.