Commit Graph

  • 3cee3d0923 Fix RdbBucketsTest seg fault (missing return statements), call pytest instead of py.test in system test Makefile master Zachary D. Rowitsch 2023-12-03 23:55:58 -0500
  • d21bf841ab Fix BitsTest to use a getter method instead of trying to access the private class member Zachary D. Rowitsch 2023-12-03 19:52:45 -0500
  • d3f98f87de Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) Zachary D. Rowitsch 2023-12-03 19:52:06 -0500
  • ea443ad739 Update generate_entities.py to run in python 3 Zachary D. Rowitsch 2023-12-03 19:51:02 -0500
  • dac28de673 Update README.md br-bitberry 2018-11-08 13:40:03 +0100
  • 86f0ca70a5 Revert "Fixed long time bug regarding startup on clean DB" Ivan Skytte Jørgensen 2018-10-12 15:39:30 +0200
  • d9a05046ac Fixed long time bug regarding startup on clean DB Ivan Skytte Jørgensen 2018-10-12 13:51:40 +0200
  • 6e30dee3d3 Disable gcc deprecation warnings in places where we can't don anything about it Ivan Skytte Jørgensen 2018-10-09 15:41:00 +0200
  • ec09ad81be Use openssl's ERR_remove_thread_state() or nothing at all depending on version Ivan Skytte Jørgensen 2018-10-09 15:40:23 +0200
  • 5d5c31351e Size file/pathbuffers correctly Ivan Skytte Jørgensen 2018-10-09 15:39:31 +0200
  • c1ecafc6be Made Profiler.cpp compile with newer glibc Ivan Skytte Jørgensen 2018-10-09 14:46:21 +0200
  • a9b55147fb Removed obsolete comment about gigabits Ivan Skytte Jørgensen 2018-10-04 15:41:24 +0200
  • e6ffafaef8 Select best of og:title, og_site_name, <title> and meta title Ivan Skytte Jørgensen 2018-09-06 16:52:15 +0200
  • a4f2f59c0c Fix typos in comment Ivan Skytte Jørgensen 2018-09-06 14:42:00 +0200
  • 5d671061a3 bugfix 183b99c925: Bits::setInLinkBits() was coredumping Ivan Skytte Jørgensen 2018-09-06 13:11:24 +0200
  • 183b99c925 const for Bits:.setInLinkBits() Ivan Skytte Jørgensen 2018-09-04 16:18:29 +0200
  • 23607d7afe Changed wbit_t from uint32_t to uint8_t Ivan Skytte Jørgensen 2018-09-04 16:08:46 +0200
  • 990d84fb7e Clean up Bits class Ivan Skytte Jørgensen 2018-09-04 15:17:30 +0200
  • 11e3037110 Closed off Bits member access so we know what can be changed freely Ivan Skytte Jørgensen 2018-09-04 14:50:38 +0200
  • 0e92640cf1 Detect a few cases when generating bigram across '.' shouldn't be done Ivan Skytte Jørgensen 2018-09-04 13:05:24 +0200
  • 3af0f2a971 Optimized avoid-cookies-warning in summary generation Ivan Skytte Jørgensen 2018-08-31 16:11:29 +0200
  • 62b091b8ca Support dynamic TLD list Ivan Skytte Jørgensen 2018-08-31 13:32:06 +0200
  • 3a0c267cf7 Call initializeDomains() after g_conf has been initialized (need working-dir) Ivan Skytte Jørgensen 2018-08-31 13:30:50 +0200
  • 6306057a69 Made a const version of getDomainOfIp() to avoid const casts elsewhere Ivan Skytte Jørgensen 2018-08-30 14:34:57 +0200
  • 995c82a1d4 Copied Url::set() into Docid.cpp Ivan Skytte Jørgensen 2018-08-30 14:15:13 +0200
  • a28cea7250 More unittest for domain/TLD/url Ivan Skytte Jørgensen 2018-08-30 13:41:38 +0200
  • 835bef157f Fix compliation of filter_titledb Ivan Skytte Jørgensen 2018-08-28 15:09:12 +0200
  • 16f3fd001d Fix compliation of validate_rdbindex Ivan Skytte Jørgensen 2018-08-28 15:09:00 +0200
  • 7a3eace43f fix compilation of print_urlinfo Ivan Skytte Jørgensen 2018-08-28 15:05:41 +0200
  • 95005b0877 Fix compilation of dump_wordcount Ivan Skytte Jørgensen 2018-08-28 15:02:27 +0200
  • a79135e861 Fix compilation of clean_url.cpp and dump_unwanted.cpp Ivan Skytte Jørgensen 2018-08-28 14:58:25 +0200
  • 202d3f77cd #include cleanup of Titledb.h Ivan Skytte Jørgensen 2018-08-27 14:37:55 +0200
  • 23fc5d0e23 Moved Titledb::...ProbableDocId... methods to separate namespace Ivan Skytte Jørgensen 2018-08-27 14:03:00 +0200
  • e4217c584d Disabled cookie-warning check in defaultsummary due to performance problem Ivan Skytte Jørgensen 2018-08-31 12:10:55 +0200
  • e4b643221e Removed non-functional ban-all-these-domains functionality in PageResults Ivan Skytte Jørgensen 2018-08-30 16:49:50 +0200
  • cdf6f51fbf Removed udplicated cookie-warnign excerpts Ivan Skytte Jørgensen 2018-08-30 16:10:40 +0200
  • f176986cb7 Removed bogus/obsolete comments from Domains.* Ivan Skytte Jørgensen 2018-08-27 15:39:47 +0200
  • db43287908 Removed unused Url::getIp() method Ivan Skytte Jørgensen 2018-08-27 14:35:30 +0200
  • 2012f06911 Msg22: use Titledb::getProbablyDocid(Url) Ivan Skytte Jørgensen 2018-08-24 16:00:54 +0200
  • 8b335971fa Removed non-const pointer-returning methods from Url class Ivan Skytte Jørgensen 2018-08-24 13:37:43 +0200
  • e90f690c0d Removed non-const version of Url::getHost() Ivan Skytte Jørgensen 2018-08-24 13:13:09 +0200
  • d14ecfc86a Removed unused method Titledb::isLocal(Url*) Ivan Skytte Jørgensen 2018-08-24 12:38:33 +0200
  • 262d1645a6 Added w more start-up tests for Url::getDomain() and getDomFast() Ivan Skytte Jørgensen 2018-08-23 15:44:26 +0200
  • 3402f4885e Removed obsolete comments from Titledb.cpp Ivan Skytte Jørgensen 2018-08-23 15:32:37 +0200
  • f0052e3528 Added forgotten change to Makefile (getProbableDocIdTest.o) Ivan Skytte Jørgensen 2018-08-23 15:29:48 +0200
  • 0a03703055 Added unittest for Titledb::getProbableDocId() Ivan Skytte Jørgensen 2018-08-23 12:38:49 +0200
  • 0b82af748f Workaround for renamed RdbBase files while Msg3 job is queued Ivan Skytte Jørgensen 2018-08-20 15:59:29 +0200
  • acf4ad07ea Reverted back to static list of 1st/2nd/3rd-level TLDs Ivan Skytte Jørgensen 2018-08-16 13:40:01 +0200
  • e383fdc5d9 added missing tokenizer include dir so we can build tools again.. Brian Rasmusson 2018-08-11 12:36:42 +0200
  • c39cab4ea0 added missing utf8_fast.h Brian Rasmusson 2018-08-09 20:51:23 +0200
  • bd094bb002 #include cleanup in fctypes.* Ivan Skytte Jørgensen 2018-08-07 15:44:56 +0200
  • a3c725cc19 Removed freestanding dequote() Ivan Skytte Jørgensen 2018-08-07 15:19:40 +0200
  • 9603adbacd Removed 'strip' parameter from stripHtml() Ivan Skytte Jørgensen 2018-08-07 14:59:12 +0200
  • 23edbd55d6 Better comments on gettimeofdayInMilliseconds() and getTime() Ivan Skytte Jørgensen 2018-08-07 14:47:08 +0200
  • e515e92dae Removed local/global time distinction Ivan Skytte Jørgensen 2018-08-07 14:38:37 +0200
  • dd9602333d Removed unused get...Time... functions Ivan Skytte Jørgensen 2018-08-07 14:24:48 +0200
  • 7b9f77db7d Avoid generating sumamries that are actualyl cookie warnings Ivan Skytte Jørgensen 2018-08-06 16:17:28 +0200
  • 7dced7c45c Removed leftover comment-markers and @@-logging from lemma branch Ivan Skytte Jørgensen 2018-08-03 14:29:15 +0200
  • a6cabd4ea1 word variation: don't eliminate duplicates if they are for different base words (or base word instances) Ivan Skytte Jørgensen 2018-08-03 14:26:39 +0200
  • 0004941414 Merge duplicated synonyms in query (eg. if wiktionary wordvartions and lemma produce the same synonyms of a query word Ivan Skytte Jørgensen 2018-08-03 14:20:11 +0200
  • 7a67a8049e word variations: identify genitive of street names Ivan Skytte Jørgensen 2018-08-03 13:20:29 +0200
  • 8853a156ac bugfix/workaround for bigram hashes Ivan Skytte Jørgensen 2018-08-02 13:14:44 +0200
  • c1de250ddc Reject summaries that contain some html junk Ivan Skytte Jørgensen 2018-07-31 13:20:14 +0200
  • e80b24ddfc Fix use-after-realloc bug in PageRoot Ivan Skytte Jørgensen 2018-07-30 15:32:44 +0200
  • 8234d4232f Fixed compilation of unittests Ivan Skytte Jørgensen 2018-07-30 14:37:24 +0200
  • f6e8d992ae Moved serialization/deserialization functions to separate file Ivan Skytte Jørgensen 2018-07-30 13:07:02 +0200
  • 268ae31095 Removed unused function saftenTags() Ivan Skytte Jørgensen 2018-07-30 12:43:17 +0200
  • 86c12d668d Made setInjectionRequestFromParms() local to PageInject.cpp Ivan Skytte Jørgensen 2018-07-30 11:12:04 +0200
  • 11b18d9c68 Removed unused function is_urlchar() Ivan Skytte Jørgensen 2018-07-30 10:36:15 +0200
  • bf2c274c02 Moved getNumWords() to only module that uses it Ivan Skytte Jørgensen 2018-07-30 10:35:21 +0200
  • 6fae3a6f77 Update urlmatchlist.txt.example Ai Lin Chia 2018-07-26 17:44:05 +0200
  • e3840cb100 Remove unused UrlMatchCriterias Ai Lin Chia 2018-07-26 17:38:03 +0200
  • 43ee356e10 tokenizer compilation fix Ivan Skytte Jørgensen 2018-07-26 17:31:36 +0200
  • beeddcf35d Got rid of gb-include.h Ivan Skytte Jørgensen 2018-07-26 17:29:51 +0200
  • 2c393c0deb Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-07-26 17:02:12 +0200
  • 95f7e8b79c Moved 'gbmemcpy' macro to separate file Ivan Skytte Jørgensen 2018-07-26 17:01:42 +0200
  • 9dec2f6ccd Fix warning: catching polymorphic type ‘class std::bad_alloc’ by value Ai Lin Chia 2018-07-26 17:00:39 +0200
  • fb04f7d884 Add some warnings for g++ 8 Ai Lin Chia 2018-07-26 16:59:53 +0200
  • 53b9973d2f Changed calls to gbmemcpy() where it was obvious if memcpy or memmove were applicable Ivan Skytte Jørgensen 2018-07-26 16:19:54 +0200
  • 1636426d66 Fix rare (harmless) bug on memcpy Ivan Skytte Jørgensen 2018-07-26 15:29:55 +0200
  • 3bcc51f1e2 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-07-26 15:13:06 +0200
  • 6af453e3df bugfix tokenizer posessive-s fix was refering to possible reallocated memory Ivan Skytte Jørgensen 2018-07-26 15:12:57 +0200
  • 0fb0a600cf memcpy.->memmove Ivan Skytte Jørgensen 2018-07-26 14:34:06 +0200
  • 3c8b332c0c Merge branch 'master' into dev-acceptlang Ai Lin Chia 2018-07-26 11:09:47 +0200
  • 27c98acb8d Make logDebugTcpBuf work again Ai Lin Chia 2018-07-25 14:09:56 +0200
  • 061e93804e Use host instead of url to get country tld Ai Lin Chia 2018-07-25 14:03:24 +0200
  • 3d3a192faf Change CountryLanguage config filename Ai Lin Chia 2018-07-25 13:48:42 +0200
  • 9d9a71f688 Add new file to makefile (missed from previous commit) Ai Lin Chia 2018-07-25 13:43:56 +0200
  • be3751c686 Initial implementation of CountryLanguage Ai Lin Chia 2018-07-25 13:35:21 +0200
  • aadfab89c0 Spelling fix unicode readme Ivan Skytte Jørgensen 2018-07-24 17:21:54 +0200
  • 6aa7d01098 Removed UCPropTable.* (forgotten removal from unicode-branch merge) Ivan Skytte Jørgensen 2018-07-24 17:09:06 +0200
  • 1ff1c807a1 Remove commented out code Ai Lin Chia 2018-07-24 16:46:30 +0200
  • 4d77aea58f Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-07-24 14:37:53 +0200
  • c839df797d Don't require lexicon_da.sto Ivan Skytte Jørgensen 2018-07-24 14:37:45 +0200
  • 9ef5e66087 Merge branch 'master' into dev-urlmatchlist Ai Lin Chia 2018-07-24 12:19:26 +0200
  • b4e7364f6e Make numCandidatePhrases match better (helps until automated tests are fixed) Ivan Skytte Jørgensen 2018-07-23 16:45:26 +0200
  • 1582e8244e Got rid of PTRFMT/PTRTYPE (except in Mem.cpp), and use %p instead Ivan Skytte Jørgensen 2018-07-23 15:24:16 +0200
  • c4261cda0a Add hostid to eventlog Ai Lin Chia 2018-07-23 14:57:09 +0200
  • 2ec4810768 Handle bigrams from two highfreqterms. Workaroudn for not using the bigram at all Ivan Skytte Jørgensen 2018-07-23 14:18:21 +0200
  • 82e01e787a Log eventlog when invalid urlmatchlist is detected Ai Lin Chia 2018-07-23 11:28:20 +0200