descriptionnone
homepage URLhttp://www.clsp.jhu.edu/workshops/ws10/groups/msgismt/
ownerws10smt@clsp.jhu.edu
last changeTue, 22 Jun 2010 04:28:25 +0000 (22 00:28 -0400)
content tags
add:
README
cdec is a fast decoder.

SPEED COMPARISON
------------------------------------------------------------------------------

Here is a comparison with a couple of other decoders doing SCFG decoding:

  Decoder  Lang.   BLEU    Run-Time        Memory
  cdec     c++     31.47   0.37 sec/sent   1.0-1.1GB
  Joshua   Java    31.55   2.34 sec/sent   4.0-4.8GB
  Hiero    Python  31.22   27.2 sec/sent   1.7-1.9GB

The maximum number of pops from candidate heap at each node is k=30, no other
pruning, 3gm LM, Chinese-English translation task.


GETTING STARTED
------------------------------------------------------------------------------

See the BUILDING file for instructions on how to build the software.  To
explore the decoder's features, the best way to get started is to look
at cdec's command line options or to have a look at the test cases in
the tests/system_tests/ directory.  Each of these can be run with a command
like ./cdec -c cdec.ini -i input.txt -w weights .  The files should be
self explanatory.


EXTRACTING A SYNCHRONOUS GRAMMAR / PHRASE TABLE
------------------------------------------------------------------------------
cdec does not include code for generating grammars.  To build these, you will
need to write your own software or use an existing package like Joshua, Hiero,
or Moses.


OPTIMIZING / TRAINING MODELS
------------------------------------------------------------------------------
cdec does include code for optimizing models, according to a number of
training criteria, including training models as CRFs (with latent derivation
variables), MERT (over hypergraphs) to opimize BLEU, TER, etc.

Eventually, I will provide documentation for this.


ALIGNMENT / SYNCHRONOUS PARSING / CONSTRAINED DECODING
------------------------------------------------------------------------------
cdec can be used as an aligner.  For examples, see the test cases.


COPYRIGHT AND LICENSE
------------------------------------------------------------------------------
Copyright (c) 2009 by Chris Dyer <redpony@gmail.com>

See the file LICENSE.txt for the licensing terms that this software is
released under. This software also includes the file m4/boost.m4 which is
licensed under the LGPL v3, for more information refer to the comments
in that file.
shortlog
2010-06-22 Adam Lopeztestmaster
2010-06-15 chris dyeraccidental commit
2010-06-15 Chris DyerMerge branch 'master' of github.com:redpony/cdec
2010-06-15 Chris Dyerscore_grammar fixes
2010-06-15 chris dyerflag for counters
2010-06-15 chris dyercheck for existence of test set in filter script
2010-06-14 Phil BlunsomAdded function for printing phrases spans for each...
2010-06-04 Vladimir Eidelmanupdated README and header of filter and scoring
2010-05-28 Chris Dyermerge vlad's filtering code
2010-05-28 Vladimir Eidelmangrammar scoring and filtering
2010-05-27 Chris DyerMerge branch 'master' of github.com:redpony/cdec
2010-05-27 Chris Dyerhandle rule contexts
2010-05-27 Chris Dyercontext extractor with example
2010-05-20 chris dyerfix nasty sorting bug in dist-vest.pl, couple other...
2010-05-18 Chris DyerMerge branch 'master' of github.com:redpony/cdec
2010-05-18 Chris Dyersupport unidirectional extraction
...
heads
13 years ago master