Commit Graph

  • 3abdc4d271 Added clarifying comment about the auxiliary verb for 'at være' Ivan Skytte Jørgensen 2017-12-05 13:48:53 +01:00
  • 91b7def617 More fixes for docreindex Ai Lin Chia 2017-12-05 12:49:52 +01:00
  • 5fdb14a066 word variations: also add 'allé' when seeing 'alle' (could be 'Bangs Allé') Ivan Skytte Jørgensen 2017-12-04 16:31:40 +01:00
  • 0ae306ae31 word variations: bugfix 'René' generation Ivan Skytte Jørgensen 2017-12-04 16:23:15 +01:00
  • 78c9382348 Add json tag response Ai Lin Chia 2017-12-04 15:22:42 +01:00
  • 1556e87fbe Add json response for get?page=1 Ai Lin Chia 2017-12-04 14:55:46 +01:00
  • 6fc94fbf2c Fix json response Ai Lin Chia 2017-12-04 14:55:27 +01:00
  • 1789782d6a Remove commented out code and unused variable Ai Lin Chia 2017-12-04 14:55:03 +01:00
  • a44144fe9c Fix string compare check Ai Lin Chia 2017-12-04 14:54:34 +01:00
  • 630a3c9a76 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-12-04 14:53:13 +01:00
  • 1c9b89a147 Changed QueryWord::opcode to a real enumeration Ivan Skytte Jørgensen 2017-12-04 14:53:03 +01:00
  • 011ca6ab2c Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-12-04 13:03:12 +01:00
  • 8a640c7693 Removed several now-unused PosdbTable fields Ivan Skytte Jørgensen 2017-12-04 13:02:59 +01:00
  • b77ccfa338 Removed unused range-based operators (gbmin/gbmax/gbsortbyfloat etc). Ivan Skytte Jørgensen 2017-12-04 12:59:05 +01:00
  • 0720bafd26 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-12-01 18:56:59 +01:00
  • b47884890d Removed write-only QueryTerm::m_ks Ivan Skytte Jørgensen 2017-12-01 18:52:46 +01:00
  • 2520a07fe5 Removed unused QueryWord::m_hardCount and QueryTerm::m_hardCount Ivan Skytte Jørgensen 2017-12-01 18:42:01 +01:00
  • 603c724d9e Removed QueryWord/Term::m_implicitBits/m_matchesExplicitBits/m_explicitBit and MAX_EXPLICIT_BITS Ivan Skytte Jørgensen 2017-12-01 18:35:21 +01:00
  • 706790e78c Add http interface to docprocess for ease of testing Ai Lin Chia 2017-12-01 17:59:55 +01:00
  • 4b19bac731 Remove unused function Ai Lin Chia 2017-12-01 17:59:03 +01:00
  • 5884516cd2 Check result of dynamic_cast Ai Lin Chia 2017-12-01 17:46:35 +01:00
  • 5f6a47ade7 Add more trace log for XmlDoc Ai Lin Chia 2017-12-01 17:45:52 +01:00
  • f8d61d5012 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-12-01 16:57:59 +01:00
  • 4da0a2ce61 Ensure extradoc URL does not exceed our URL size limit Ivan Skytte Jørgensen 2017-12-01 16:56:19 +01:00
  • ec896b0153 Removed superfluous null check (coverity) Ivan Skytte Jørgensen 2017-12-01 16:50:03 +01:00
  • 68a7d01b7b Check return value from hashinit() Ivan Skytte Jørgensen 2017-12-01 16:45:32 +01:00
  • 9a5b52fa8a Removed superfluous null-check (coverity) Ivan Skytte Jørgensen 2017-12-01 16:41:47 +01:00
  • 5cf9308a20 word variations: danish: past<->past variations Ivan Skytte Jørgensen 2017-12-01 15:06:51 +01:00
  • c999e065ce more logging of word variations Ivan Skytte Jørgensen 2017-12-01 15:06:27 +01:00
  • 58400aa48b Support word variations generating single-word variations for bigrams Ivan Skytte Jørgensen 2017-12-01 14:17:32 +01:00
  • c930c625d1 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-12-01 12:45:25 +01:00
  • 55308013c3 Add overridable TitleRec version Ai Lin Chia 2017-12-01 12:28:31 +01:00
  • c1b760ddee Remove unused variable Ai Lin Chia 2017-12-01 12:24:35 +01:00
  • 79ba0d61cc Merge branch 'master' into dev-robots Ai Lin Chia 2017-12-01 11:40:02 +01:00
  • c50cfc9bef Initialize DocProcess::m_lastModifiedTime Ai Lin Chia 2017-12-01 11:30:01 +01:00
  • 2a40d28154 Fix DocRebuild Ai Lin Chia 2017-11-30 17:31:30 +01:00
  • b64b4f222a Make sure we actually replace titlerec with new version Ai Lin Chia 2017-11-30 16:31:17 +01:00
  • c469c802bb Better documentation/comments about word variations in danish regarding accent-acute Ivan Skytte Jørgensen 2017-11-30 16:03:18 +01:00
  • aff4b46e9e fix typo in Parms.cpp Ivan Skytte Jørgensen 2017-11-30 15:52:13 +01:00
  • 729682c93f word variations: made verb spelling variations separately configurable Ivan Skytte Jørgensen 2017-11-30 15:51:28 +01:00
  • d80d638b4a bugfix word variation: removed copy-pasta regarding bigram synonyms Ivan Skytte Jørgensen 2017-11-30 15:15:59 +01:00
  • 9be350d909 Revert "Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid)" Ai Lin Chia 2017-11-30 14:48:26 +01:00
  • 3a518868a9 Allow startup without *.sto files Ivan Skytte Jørgensen 2017-11-30 14:31:30 +01:00
  • 59dbe4aac5 word variations: danish: special cases for acute accent (René, ...allé, ...) Ivan Skytte Jørgensen 2017-11-30 14:16:13 +01:00
  • 4465c993e0 Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid) Ai Lin Chia 2017-11-30 14:01:53 +01:00
  • 413a9d8562 Only disallow doc that has noindex & nofollow (separate flow from noindex & follow) Ai Lin Chia 2017-11-30 13:59:54 +01:00
  • 7931ddfb83 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-30 13:35:52 +01:00
  • 6193d41bb7 Add more files to gitignore Ai Lin Chia 2017-11-30 12:35:47 +01:00
  • d63cd95416 Merge branch 'master' into dev-docprocess Ai Lin Chia 2017-11-30 12:06:27 +01:00
  • c9026de7cb Split out processDoc logic to processDocItem in each child class Ai Lin Chia 2017-11-30 11:28:04 +01:00
  • 1b03151a0b bugfix utf8Decode(): sequences longer than 2 bytes were treated as two-byte sequences Ivan Skytte Jørgensen 2017-11-29 16:09:06 +01:00
  • 8ac525b0f6 Fix segfault when an unknown docid is present Ai Lin Chia 2017-11-29 12:25:44 +01:00
  • 55c6d733e4 Code style changes & remove comments Ai Lin Chia 2017-11-28 16:51:32 +01:00
  • 8b84c8d80e Add logTrace to XmlDoc::loadFromOldTitleRec Ai Lin Chia 2017-11-28 16:13:56 +01:00
  • e52968acfa word variations: danish: strip é of acute accent Ivan Skytte Jørgensen 2017-11-28 15:58:50 +01:00
  • c33d8aba21 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-28 15:49:56 +01:00
  • 976f0f61a2 utf8Decode(): don't decode invalid utf8 Ivan Skytte Jørgensen 2017-11-28 15:49:41 +01:00
  • bce905e7bf Fill in spider request for DocProcess Ai Lin Chia 2017-11-28 15:41:10 +01:00
  • 5a6d741054 word variations: danish: bolle-å vs. double-a Ivan Skytte Jørgensen 2017-11-28 14:39:01 +01:00
  • 427a9ec778 word variations: Use lowercase when consulting STO lexicon Ivan Skytte Jørgensen 2017-11-28 14:07:28 +01:00
  • 3d5318c77a word variations seems to work now Ivan Skytte Jørgensen 2017-11-27 16:51:50 +01:00
  • 05cfb6381a Merge branch 'sto' of github.com:privacore/open-source-search-engine into sto Ivan Skytte Jørgensen 2017-11-27 16:39:50 +01:00
  • edc370bddc Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-27 16:39:30 +01:00
  • 0917a63eae Fix term weights when some earlier terms had no matches Ivan Skytte Jørgensen 2017-11-27 16:39:17 +01:00
  • b38cc63c1f Make sure we don't remove pending file before we're really done Ai Lin Chia 2017-11-27 14:45:57 +01:00
  • 688800fa3a Fix deadlock when shutting down Ai Lin Chia 2017-11-27 13:07:39 +01:00
  • e70af7050c Removed tmp. debug log Ivan Skytte Jørgensen 2017-11-27 13:01:44 +01:00
  • b59ce43479 Fix segfault when reload is called with wrong parameters Ai Lin Chia 2017-11-27 11:50:18 +01:00
  • 58e052d8a7 Initial implementation of DocProcess with DocDelete, DocRebuild, DocReindex Ai Lin Chia 2017-11-27 11:03:24 +01:00
  • 26e04ada57 word variations support: danish: noun: singular->plural and plural->singular Ivan Skytte Jørgensen 2017-11-24 17:45:32 +01:00
  • 019ef59de0 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-24 17:18:50 +01:00
  • b2d409100b Merge branch 'master' into sqlite Ivan Skytte Jørgensen 2017-11-24 17:18:37 +01:00
  • bea9662ba8 word variations support: danish: noun: definite->indefinite Ivan Skytte Jørgensen 2017-11-24 17:13:09 +01:00
  • 2c64859421 Made wordvariation stuff configurable. made wordvariation-da handle indefinite->definite variations Ivan Skytte Jørgensen 2017-11-24 16:43:19 +01:00
  • 007535e073 Fix unit test Ai Lin Chia 2017-11-24 15:31:25 +01:00
  • e0c474a6cd Add timeout to test steps Ai Lin Chia 2017-11-24 13:51:01 +01:00
  • 670ff8d9e4 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-24 14:34:47 +01:00
  • 978a986b0d Changed default argument to Query::set2() to non-default Ivan Skytte Jørgensen 2017-11-24 14:27:02 +01:00
  • 36fd2c2615 Made lang-spec word variations configurable Ivan Skytte Jørgensen 2017-11-24 14:19:50 +01:00
  • 0fd05add68 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-24 14:00:06 +01:00
  • 2ebfda46bc Removed Speller::test() Ivan Skytte Jørgensen 2017-11-24 13:59:00 +01:00
  • 7bac917df4 Renamed Query::*queryExpansion to wiktionaryWordVariations Ivan Skytte Jørgensen 2017-11-24 13:52:11 +01:00
  • 5c1f58b0bd Renamed m_queryExpansion to m_wiktionaryWordVariations Ivan Skytte Jørgensen 2017-11-24 13:44:28 +01:00
  • 9a0ad484fb Moved config m_wiktionaryWordVariations from 'search controls' to new page 'word variations' Ivan Skytte Jørgensen 2017-11-24 13:35:07 +01:00
  • c5cd8f21a1 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-24 12:47:26 +01:00
  • f9ee20cae9 Renamed m_queryExpansion to m_wiktionaryWordVariations Ivan Skytte Jørgensen 2017-11-24 12:47:11 +01:00
  • 1df27bd90a Split noindex to check nofollow as well Ai Lin Chia 2017-11-24 12:35:53 +01:00
  • 1472ffe999 Make sure we process all robots meta tags (if there are multiple) Ai Lin Chia 2017-11-24 12:26:46 +01:00
  • f438dea29e Filter word variations on threshold and duplication Ivan Skytte Jørgensen 2017-11-24 12:15:04 +01:00
  • ca35d36ab6 Experimental support for STO Danish word variations based on STO lexicon Ivan Skytte Jørgensen 2017-11-23 17:54:07 +01:00
  • cf22910ce5 Fix compilation error Ai Lin Chia 2017-11-23 17:04:54 +01:00
  • d7c9104ce2 More criteria for dump_unwanted tool Ai Lin Chia 2017-11-23 17:02:30 +01:00
  • 96521646fc Make sure we really don't index canonical url Ai Lin Chia 2017-11-23 16:48:34 +01:00
  • 19eb4a6a6e added todo Ivan Skytte Jørgensen 2017-11-23 16:11:35 +01:00
  • 4324191bdf Added word variation support (hardcoded test implementation for now) Ivan Skytte Jørgensen 2017-11-23 16:03:47 +01:00
  • 149e3d82f3 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-23 14:48:41 +01:00
  • 050b86f9db Made QueryWord::m_word const Ivan Skytte Jørgensen 2017-11-23 14:48:32 +01:00
  • 85aef1fabc SafeBuf::utf8Encode2(): const input Ivan Skytte Jørgensen 2017-11-23 14:47:35 +01:00
  • b46d381d40 Merge branch 'master' into sto Ivan Skytte Jørgensen 2017-11-23 14:10:02 +01:00
  • 352de0328b const Ivan Skytte Jørgensen 2017-11-23 14:09:52 +01:00