README

  sparse (spärs), adj., spars-er, spars-est.
        1. thinly scattered or distributed: "a sparse population"
        2. thin; not thick or dense: "sparse hair"
        3. scanty; meager.
        4. semantic parse
        [ from Latin: spars(us) scattered, past participle of
          spargere 'to sparge' ]

        Antonym: abundant

Sparse is a semantic parser of source files: it's neither a compiler
(although it could be used as a front-end for one) nor a preprocessor
(although it contains a preprocessing phase).

It is meant to be a small - and simple - library.  Scanty and meager,
and partly because of that easy to use.  It has one mission in life:
create a semantic parse tree for some arbitrary user for further
analysis.  It's not a tokenizer, nor is it some generic context-free
parser.  In fact, context (semantics) is what it's all about - figuring
out not just what the groupings of tokens are, but what the _types_ are
that those groupings imply.
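
To see why context matters, here is a small illustration in standard C.
It is not sparse code, and the names T, as_declaration, as_expression
and check_both_readings are made up for this example.  The same three
tokens "T * x" are a pointer declaration when T names a type and a
multiplication when T names a variable, so a parser cannot group them
correctly without knowing what T means at that point.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;			/* at file scope, T names a type */

/* Here "T * x" starts a declaration: x is a pointer to int. */
static size_t as_declaration(void)
{
	T * x = NULL;
	return sizeof(*x);	/* sizeof does not evaluate *x */
}

/* Here a local variable hides the typedef, so "T * x" is a product. */
static int as_expression(void)
{
	int T = 6, x = 7;
	return T * x;
}

/* Both readings of the same token sequence, side by side. */
static void check_both_readings(void)
{
	assert(as_declaration() == sizeof(int));
	assert(as_expression() == 42);
}
```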

And no, it doesn't use lex and yacc (or flex and bison).  In my personal
opinion, using lex/yacc tends to end up with you just fighting the
assumptions the tools make.

The parsing is done in four phases:

 - full-file tokenization
 - pre-processing (which can cause another tokenization phase of another
   file)
 - semantic parsing
 - lazy type evaluation

Note the "full file" part.  Partly for efficiency, but mostly for ease
of use, there are no "partial results".  The library completely parses
one whole source file, and builds up the _complete_ parse tree in
memory.

Also note the "lazy" in the type evaluation.  The semantic parsing
itself will know which symbols are typedefs (required for parsing C
correctly), but it will not have calculated what the details of the
different types are.  That will be done only on demand, as the back-end
requires the information.
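
As a rough sketch of what "lazy" means here, assume a toy symbol
structure that is NOT sparse's real struct symbol (toy_symbol,
toy_evaluate, toy_evaluations and every field below are hypothetical).
A type's size is worked out the first time somebody asks for it, cached,
and never worked out at all for symbols nobody looks at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type node; NOT sparse's real struct symbol. */
struct toy_symbol {
	const char *name;
	struct toy_symbol *base;	/* underlying type, NULL for a basic type */
	int count;			/* element count (1 for a plain typedef) */
	int bit_size;			/* known up front only for basic types */
	int evaluated;			/* set once bit_size is valid */
};

/* Number of nodes whose size was actually worked out. */
static int toy_evaluations;

/* Work out the size on first request, then reuse the cached value. */
static int toy_evaluate(struct toy_symbol *sym)
{
	assert(sym != NULL);
	if (!sym->evaluated) {
		if (sym->base)
			sym->bit_size = sym->count * toy_evaluate(sym->base);
		sym->evaluated = 1;
		toy_evaluations++;
	}
	return sym->bit_size;
}
```

Asking for the size of an array of four typedef'd 32-bit ints walks the
chain once; asking again does no further work, and a symbol that is
never asked about stays unevaluated.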

This means that a user of the library will literally just need to do

        struct token *token;
        int fd = open(filename, O_RDONLY);
        struct symbol_list *list = NULL;

        if (fd < 0)
                exit_with_complaint();

        // Initialize parse symbols
        init_symbols();

        // Tokenize the input stream
        token = tokenize(filename, fd, NULL);

        // Pre-process the stream
        token = preprocess(token);

        // Parse the resulting C code
        translation_unit(token, &list);

        // Evaluate the types now if we want to
        // Or leave it until later.
        symbol_iterate(list, evaluate_symbol, NULL);

and they are now done - having a full C parse of the file they opened.
The library doesn't need any more setup, and once done does not impose
any more requirements.  The user is free to do whatever they want with
the parse tree that got built up, and need not worry about the library
ever again.  There is no extra state, there are no parser callbacks,
there is only the parse tree that is described by the header files.

The library also contains (as example users) a few clients that do the
preprocessing, parsing and type evaluation and just print out the
results.  These clients were written to verify and debug the library,
and also as trivial examples of what you can do with the parse tree once
it is formed, so that users can see how the tree is organized.