1 voc.txt was generated from a dump of the Lithuanian wikipedia by the script
2 wikipedia-most-common-words like so:
4 WikiExtractor.py ltwiki-latest-pages-articles.xml.bz2
5 cat text/*/*|./wikipedia-most-common-words 10 latin > voc.txt
7 The dump used was dated May 3rd 2018.
9 output.txt was generated from voc.txt by running it through the stemmer.
11 Wikipedia is licensed as: https://creativecommons.org/licenses/by-sa/3.0/