Commit Graph

  • 01a780f85a Wiki: removed last use of Words Ivan Skytte Jørgensen 2018-03-09 16:35:31 +01:00
  • 5ad589527c XmlDoc: removed last use of Words Ivan Skytte Jørgensen 2018-03-09 16:32:54 +01:00
  • 3fd158c397 tokenizer: first shot at something that appears to work Ivan Skytte Jørgensen 2018-03-09 16:24:39 +01:00
  • 94f4964648 Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-09 14:58:47 +01:00
  • 953e59a8eb Title: flags&0x01 was not used. Ivan Skytte Jørgensen 2018-03-09 14:58:21 +01:00
  • 45c7602724 Use StackBuf<> in Title.cpp Ivan Skytte Jørgensen 2018-03-09 14:50:07 +01:00
  • da70112bf6 Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-09 13:53:43 +01:00
  • 90e00d35d7 More trace info in Phrases.cpp Ivan Skytte Jørgensen 2018-03-09 13:53:34 +01:00
  • 8a607068ba Merge branch 'master' into dev-language Ai Lin Chia 2018-03-09 10:25:51 +01:00
  • 14d5397768 Suppress more clang++ warnings Ai Lin Chia 2018-03-09 10:25:14 +01:00
  • 773c6cbcfc Cater for languages not in gb (eg: nynorsk/bokmaal) Ai Lin Chia 2018-03-09 10:24:22 +01:00
  • 27b8e2b038 Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-08 18:14:50 +01:00
  • 17583f3860 Cleanup in Synonyms.* Ivan Skytte Jørgensen 2018-03-08 18:13:58 +01:00
  • 3cf1927b72 Made Synonyms::m_ptr private Ivan Skytte Jørgensen 2018-03-08 18:03:45 +01:00
  • 3571ce3220 Eliminated Synonyms::m_words Ivan Skytte Jørgensen 2018-03-08 17:58:07 +01:00
  • e85c93fb53 Fix clang++ warning: ISO C++11 does not allow conversion from string literal to 'char *' Ai Lin Chia 2018-03-07 15:54:03 +01:00
  • 0214e2395c Modify logging Ai Lin Chia 2018-03-06 15:25:51 +01:00
  • c3d9c60826 Merge branch 'tokenizer' of github.com:privacore/open-source-search-engine into tokenizer Ivan Skytte Jørgensen 2018-03-06 14:19:52 +01:00
  • 51f35aa2e8 Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-06 14:19:37 +01:00
  • d6f0ac060f Simplified if() in Summary by removing superfluous condition Ivan Skytte Jørgensen 2018-03-06 14:19:18 +01:00
  • 7bef1ee08b Updated superscript/subscript html handling Ivan Skytte Jørgensen 2018-03-06 11:38:00 +01:00
  • 478637da6a tokenizer: Preliminary work on html sub/superscript handling Ivan Skytte Jørgensen 2018-03-06 00:54:24 +01:00
  • 6fcaa13fcd Fix compilation error Ai Lin Chia 2018-03-05 17:47:34 +01:00
  • 6547e0eba9 Get query language externally Ai Lin Chia 2018-03-05 17:47:12 +01:00
  • ef9bd1f417 Change default port for query language to 8078 Ai Lin Chia 2018-03-05 17:30:32 +01:00
  • b6496af6bd Add bestEffort flag Ai Lin Chia 2018-03-05 17:29:48 +01:00
  • 3460b6b225 Add more logs Ai Lin Chia 2018-03-05 17:29:14 +01:00
  • cd92aad2b1 Merge branch 'master' into dev-language Ai Lin Chia 2018-03-05 15:46:32 +01:00
  • f8a93fc70b Fix gzip_id2 check Ai Lin Chia 2018-03-05 15:09:46 +01:00
  • 1a565404c8 Use m_format directly instead of tmpFormat Ai Lin Chia 2018-03-05 13:48:55 +01:00
  • e7a5a14bd5 Remove unused variable, use/set once variable, commented out code Ai Lin Chia 2018-03-05 12:58:00 +01:00
  • db3d15d99a Remove use of tmp variable that is only used once Ai Lin Chia 2018-03-05 12:07:24 +01:00
  • 7512171dc7 Move include to top of file Ai Lin Chia 2018-03-05 12:07:06 +01:00
  • bd8d98b9f1 Merge branch 'master' into dev-language Ai Lin Chia 2018-03-05 11:31:18 +01:00
  • 9000deffd8 Remove unused variable Ai Lin Chia 2018-03-05 11:30:39 +01:00
  • 7e35793974 tokenizer: Added another TODO Ivan Skytte Jørgensen 2018-03-04 19:47:56 +01:00
  • 8c2c896475 Added two TODOs Ivan Skytte Jørgensen 2018-03-04 01:42:45 +01:00
  • c980a5ba20 tokenizer: added phase-2 xml tokenizer (incomplete, but works) Ivan Skytte Jørgensen 2018-03-02 19:00:54 +01:00
  • 9591eb7978 tokenizer: phase-1 XML tokenizer added + tests Ivan Skytte Jørgensen 2018-03-02 18:46:03 +01:00
  • 34197dd1f6 More const in Xml Ivan Skytte Jørgensen 2018-03-02 18:01:52 +01:00
  • f494ee4edf Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-02 17:36:43 +01:00
  • e9d8ad8117 Moved BACKBIT/BACKBITCOMP from Words.h to nodeid_t.h Ivan Skytte Jørgensen 2018-03-02 17:36:31 +01:00
  • 41bb518844 tokenizer: handle superscript/subscript Ivan Skytte Jørgensen 2018-03-02 16:22:16 +01:00
  • 783a0470f9 Fix logging so we can differentiate between different clients Ai Lin Chia 2018-03-02 16:01:23 +01:00
  • d763c70812 Initial implementation of QueryLanguage client Ai Lin Chia 2018-03-02 15:54:48 +01:00
  • 53483298d0 Code style changes Ai Lin Chia 2018-03-02 15:34:24 +01:00
  • 2b9e76a009 Changed log message & add/remove comments Ai Lin Chia 2018-03-02 15:33:22 +01:00
  • fbb51086b6 Merge branch 'master' into dev-language Ai Lin Chia 2018-03-02 15:29:11 +01:00
  • cb7c8e4bc5 tokenizer: minor bugfix of telephone number recognition Ivan Skytte Jørgensen 2018-03-02 15:13:39 +01:00
  • 329127d060 Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-03-02 14:57:24 +01:00
  • 1fd9093792 tokenizer: recognize phone numbers Ivan Skytte Jørgensen 2018-03-02 14:56:14 +01:00
  • bfecac92c8 Remove no-op SearchInput::m_queryCharset Ai Lin Chia 2018-03-02 14:43:12 +01:00
  • 9b4bb76813 Rename Query::set2 to Query::set Ai Lin Chia 2018-03-02 14:09:16 +01:00
  • 3e10232096 tokenizer: handle word hyphenation Ivan Skytte Jørgensen 2018-03-02 14:04:11 +01:00
  • 655f5fd0bc Fix possessive-s non-apostrophes Ivan Skytte Jørgensen 2018-03-01 18:17:56 +01:00
  • f66fe3ec40 minor stuff Ivan Skytte Jørgensen 2018-03-01 18:17:30 +01:00
  • 530502722f work-in-progress: new tokenizer Ivan Skytte Jørgensen 2018-03-01 16:38:19 +01:00
  • 8bb2f39837 combining-mark decomposition+composition Ivan Skytte Jørgensen 2018-03-01 16:18:51 +01:00
  • 4cc7c8dbd3 Handle multi-layer decompositions when generating unicode_combining_mark_decomposition.dat Ivan Skytte Jørgensen 2018-03-01 16:06:08 +01:00
  • d8a27c7948 Fix clang++ warning: unused exception parameter 'e' Ai Lin Chia 2018-02-28 11:58:38 +01:00
  • 38558a690e Fix clang++ warning: 'FxClient' has virtual functions but non-virtual destructor Ai Lin Chia 2018-02-28 11:58:01 +01:00
  • 4868783d25 Merge branch 'master' into dev-language Ai Lin Chia 2018-02-28 11:46:31 +01:00
  • 464e8b99d7 Fix clang++ warning: not a Doxygen trailing comment Ai Lin Chia 2018-02-28 11:45:05 +01:00
  • bd588d63c1 Fix clang++ warning: private field 'm_docIdSplitNumber' is not used Ai Lin Chia 2018-02-28 11:42:10 +01:00
  • 4a39f1c22c Fix clang++ warning: no newline at end of file Ai Lin Chia 2018-02-28 11:40:20 +01:00
  • 9d4f0f5b32 Fix clang++ warning: comparison of constant -1 with expression of type 'parameter_object_type_t' is always false Ai Lin Chia 2018-02-28 11:39:58 +01:00
  • b76d46db33 Fix clang++ warning: 'ResultOverride' defined as a class here but previously declared as a struct Ai Lin Chia 2018-02-28 11:30:06 +01:00
  • b5a3499c4f Fix clang++ warning: function 'signature_verification_failed' could be declared with attribute 'noreturn' Ai Lin Chia 2018-02-28 11:26:38 +01:00
  • 3f14127783 Fix clang++ warning: no previous extern declaration for non-static variable 'g_TermCheckList' Ai Lin Chia 2018-02-28 11:07:23 +01:00
  • b39e925d82 Fix clang++ warning: no newline at end of file Ai Lin Chia 2018-02-28 11:06:20 +01:00
  • bcbc7ba79c Fix clang++ warning: comparison of constant 139 with expression of type 'char' is always false Ai Lin Chia 2018-02-28 11:00:54 +01:00
  • a84010913a Fix clang++ warning: unannotated fall-through between switch labels Ai Lin Chia 2018-02-28 10:59:20 +01:00
  • 4a92c79f7e Fix clang++ warning: xyz has virtual functions but non-virtual destructor Ai Lin Chia 2018-02-28 10:56:41 +01:00
  • cd784bcde9 Fix clang++ warning: assigning field to itself Ai Lin Chia 2018-02-28 10:53:53 +01:00
  • 1553f1ba5f Split out generic server communication from UrlRealtimeClassification into FxClient Ai Lin Chia 2018-02-27 15:24:18 +01:00
  • cc8322899f Merge branch 'master' into tokenizer Ivan Skytte Jørgensen 2018-02-27 14:50:59 +01:00
  • f967d9d428 Split out nodeid_t typedef to separate include file Ivan Skytte Jørgensen 2018-02-27 14:50:28 +01:00
  • ca5b5fcd1d Merge branch 'master' into dev-language Ai Lin Chia 2018-02-27 11:59:12 +01:00
  • 8e09a0efc7 Add pathpartial to UrlMatchList Ai Lin Chia 2018-02-27 11:54:54 +01:00
  • 0f95410032 Add pathparam to UrlMatchList Ai Lin Chia 2018-02-27 11:48:16 +01:00
  • 4fb49d608c Make unicode_is_ignorable.dat Ivan Skytte Jørgensen 2018-02-26 17:44:30 +01:00
  • 2e5ecde0c3 Added unicode decomposition map for combining-mark removal Ivan Skytte Jørgensen 2018-02-26 15:46:22 +01:00
  • dfa252b12b Added .gitignore file to unicode/ subdir Ivan Skytte Jørgensen 2018-02-26 15:43:18 +01:00
  • ee38537667 More constness in matches and Title Ivan Skytte Jørgensen 2018-02-26 14:43:05 +01:00
  • 995bd58113 More const in Matches and Title Ivan Skytte Jørgensen 2018-02-26 14:31:30 +01:00
  • 54d43fcb8b Removed unnecessary casts in hash.h Ivan Skytte Jørgensen 2018-02-26 14:23:17 +01:00
  • ef67e96ff9 Moved getCharacterLanguage() and getFieldValue() from Words.cpp to XmlDoc where their only use is Ivan Skytte Jørgensen 2018-02-26 13:44:37 +01:00
  • 40b520382b Removed stop-word methods from Words class and put them into the only two callers instead Ivan Skytte Jørgensen 2018-02-26 13:33:26 +01:00
  • df43b5b247 Changed Words::isUpper() to call is_upper_utf8_string() Ivan Skytte Jørgensen 2018-02-26 13:23:42 +01:00
  • 1bcbe8f127 Removed unused #define Ivan Skytte Jørgensen 2018-02-26 13:21:29 +01:00
  • afb00a9022 unicode: updated ucdata/unicode_wordchars.dat Ivan Skytte Jørgensen 2018-02-26 12:12:13 +01:00
  • ed0b8fd0ae unicode: wordchars include superscript/subscript and other numbers Ivan Skytte Jørgensen 2018-02-25 17:16:08 +01:00
  • 9147b36f7b Merge branch 'master' into dev-language Ai Lin Chia 2018-02-23 16:28:01 +01:00
  • e491f3800d Fix commit 6f4ad67e05 Ai Lin Chia 2018-02-23 14:22:53 +01:00
  • 1b1ef87049 Renamed m_firstSent -> m_firstSentence and m_nextSent -> m_nextSentence Ivan Skytte Jørgensen 2018-02-23 13:53:03 +01:00
  • 796dd9d1a1 Remove leftover code from tagdb cache (now completely removed) Ai Lin Chia 2018-02-23 12:04:35 +01:00
  • 6f4ad67e05 Use SiteGetter instead of using host Ai Lin Chia 2018-02-23 12:00:52 +01:00
  • 2e32ba7517 Add timing log for linkinfo Ai Lin Chia 2018-02-23 11:40:13 +01:00
  • e8f23fb250 Remove unused/set only variables Ai Lin Chia 2018-02-23 10:50:35 +01:00
  • 63ee2e897d Fix compilation error from fe95e110d5 Ai Lin Chia 2018-02-23 09:52:42 +01:00