privacore-open-source-searc.../tokenizer
2018-07-26 17:31:36 +02:00
.gitignore tokenizer: first shot at something that appears to work 2018-03-09 16:24:39 +01:00
ligature_decomposition.cpp work-in-progress: new tokenizer 2018-03-01 16:38:19 +01:00
ligature_decomposition.h work-in-progress: new tokenizer 2018-03-01 16:38:19 +01:00
Makefile Use same optimization in unicode+tokenizer as in the main executable 2018-03-26 16:58:25 +02:00
tokenizer2.cpp bugfix: tokenizer possessive-s fix was referring to possibly reallocated memory 2018-07-26 15:12:57 +02:00
tokenizer3.cpp tokenizer: first shot at something that appears to work 2018-03-09 16:24:39 +01:00
tokenizer4.cpp tokenizer compilation fix 2018-07-26 17:31:36 +02:00
tokenizer5.cpp Updated superscript/subscript html handling 2018-03-06 11:38:00 +01:00
tokenizer_unittest.cpp tokenizer: combining mark removal for Italian 2018-06-13 16:56:59 +02:00
tokenizer_util.cpp Updated superscript/subscript html handling 2018-03-06 11:38:00 +01:00
tokenizer_util.h Updated superscript/subscript html handling 2018-03-06 11:38:00 +01:00
tokenizer.cpp tokenizer: keep track of whether a token is from phase 1 or phase 2 2018-03-20 14:37:56 +01:00
tokenizer.h tokenizer: Handle slash-abbreviations as a single token 2018-04-03 16:00:55 +02:00
xml_tokenizer_unittest.cpp Fixed soft-hyphen unittest in Xml tokenizer 2018-03-19 15:32:30 +01:00