Commit Graph

  • b4b993c06f Revert "Fix read of uninitialized data in Words::set()". This commit is causing a coredump when a 'string literal' is passed to Words::set. Reverting it until a better fix can be found. Ai Lin Chia 2015-11-18 19:30:04 +01:00
  • f7275a746c Fully initialize StatKey and StatData so we don't write uninitialized bytes to the Rdb Ivan Skytte Jørgensen 2015-11-18 14:41:56 +01:00
  • 7fd6ffb6df Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-18 13:34:54 +01:00
  • 413352e09e Fix read of uninitialized data in Words::set() Ivan Skytte Jørgensen 2015-11-18 13:31:24 +01:00
  • 5e568b1a50 Added option to force merge of linkdb, and removed abort call in Mem.cpp Brian Rasmusson 2015-11-18 12:50:19 +01:00
  • 8f83834af5 Initialize TopTree::m_pickRight Ivan Skytte Jørgensen 2015-11-18 12:24:28 +01:00
  • c4d692234f Initialize HashTableX::m_ds Ivan Skytte Jørgensen 2015-11-18 12:13:50 +01:00
  • 48c38a7bc4 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-17 16:17:27 +01:00
  • 2b3b840e4a Initialize PostQueryRerank::m_cvtUrl Ivan Skytte Jørgensen 2015-11-17 16:16:59 +01:00
  • d2e8694c82 NUL-terminate rewritten URL Ivan Skytte Jørgensen 2015-11-17 16:10:04 +01:00
  • 3672ec2b8e Fix uninitialized variable read if timestamp is zero Ivan Skytte Jørgensen 2015-11-17 14:36:35 +01:00
  • cc53fd1c4b fix core from bogus url some more. Matt 2015-11-16 12:51:18 -07:00
  • b7f21d1018 fix core dump from empty url Matt 2015-11-16 12:08:16 -07:00
  • 70aa78828f Merged gigablast 296651d4160ee7c9946103c0e4e0576c412d36cd Matt 2015-11-16 09:53:40 -07:00
  • 5c75f23fba Initialize QueryTerm::m_numAlnumWordsInSynonym Ivan Skytte Jørgensen 2015-11-17 11:42:07 +01:00
  • 88b3af1271 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-16 14:01:04 +01:00
  • df5a4bff71 Use unsigned types for bitfields Ivan Skytte Jørgensen 2015-11-16 13:57:22 +01:00
  • 8c19d9efdc Clear out padding bytes in Msg22Request structure. Ivan Skytte Jørgensen 2015-11-16 13:39:43 +01:00
  • 329cc8619e Merged noquery/nospider fixes from gigablast Zak Betz 2015-11-13 15:03:02 -07:00
  • 87df5122ed Make sure we output valid json (remove some control characters, remove invalid utf-8 characters) Ai Lin Chia 2015-11-13 17:55:44 +01:00
  • dbf1ab6efe Remove commented-out code from Scraper.cpp Ivan Skytte Jørgensen 2015-11-13 11:41:49 +01:00
  • 9db08e0981 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-13 11:05:05 +01:00
  • dd21abf042 Clear Msg39Request in reset() so we don't send uninitialized bytes over the network Ivan Skytte Jørgensen 2015-11-13 11:04:09 +01:00
  • 8b1ee00927 Merge remote-tracking branch 'origin/master' Brian Rasmusson 2015-11-12 16:36:18 +01:00
  • eedcd6b871 Added option to use readable timestamps in logging Brian Rasmusson 2015-11-12 16:16:39 +01:00
  • 8d75ea4bea Add make dist target Ai Lin Chia 2015-11-12 16:01:57 +01:00
  • d11c7895cb Fix supported_charsets so we can compile/execute Ai Lin Chia 2015-11-12 15:17:42 +01:00
  • 49dd0e77da Makefile cleanup, remove target with missing source Ai Lin Chia 2015-11-12 15:15:44 +01:00
  • d43bc2d92b Add const for some Json encode/decode methods (used in unit test) Ai Lin Chia 2015-11-12 12:10:39 +01:00
  • 64874b91ee Remove old commented-out code Ai Lin Chia 2015-11-12 11:08:26 +01:00
  • e82e5fbe19 Merge branch 'master' of github.com:privacore/open-source-search-engine Ai Lin Chia 2015-11-11 17:32:20 +01:00
  • acaa1d4480 Use isSaneUtf8Char instead of getUtf8CharSize because the latter returns 1 for both ascii and invalid UTF-8 Ai Lin Chia 2015-11-11 17:31:11 +01:00
  • 5f990cef3e Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-11 17:22:36 +01:00
  • a1f5b083f3 Removed probing of non-existent field Msg39::rerankRuleset Ivan Skytte Jørgensen 2015-11-11 17:22:26 +01:00
  • c91eadc5bf Merge branch 'master' of github.com:privacore/open-source-search-engine Ai Lin Chia 2015-11-11 17:11:09 +01:00
  • 2060be357e Move existing JSON test into unit test Ai Lin Chia 2015-11-11 17:10:32 +01:00
  • 1a056ba028 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-11 16:28:56 +01:00
  • 3af7449f5b Bugfix operator new[] and delete[] Ivan Skytte Jørgensen 2015-11-11 16:26:43 +01:00
  • f934e8ea73 Split main into separate test file Ai Lin Chia 2015-11-11 15:55:07 +01:00
  • 559d8652c7 Use libz64.a static library again Ai Lin Chia 2015-11-11 14:52:18 +01:00
  • 3d5d119735 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-11 14:43:42 +01:00
  • 9e84af293d Resurrect libz.h etc. due to binary incompatibilities Ivan Skytte Jørgensen 2015-11-11 14:43:31 +01:00
  • 6ab1df41a2 Ignore 'dirty' third-party modules Ai Lin Chia 2015-11-11 13:57:50 +01:00
  • 68a08a59ae Fix compilation error from removal of libz64.a Ai Lin Chia 2015-11-11 13:57:25 +01:00
  • 95304a56e5 Merge branch 'master' of github.com:privacore/open-source-search-engine Ai Lin Chia 2015-11-11 13:51:27 +01:00
  • fa80fc7dea Remove reference to static libraries in Makefile Ivan Skytte Jørgensen 2015-11-11 13:50:22 +01:00
  • 0e3c1c5a04 Merge branch 'master' of github.com:privacore/open-source-search-engine Ai Lin Chia 2015-11-11 13:38:59 +01:00
  • 7900ddd85d Single character UTF-8 is valid UTF-8 as well and thus we should encode it Ai Lin Chia 2015-11-11 13:37:29 +01:00
  • 6c6c270498 Removed static libraries Ivan Skytte Jørgensen 2015-11-11 13:31:30 +01:00
  • 1357827cce Reworked timestamps, signal handlers, and profiling. Ivan Skytte Jørgensen 2015-11-11 13:25:53 +01:00
  • de7f677ea6 Merge branch 'master' of github.com:privacore/open-source-search-engine Ai Lin Chia 2015-11-10 16:38:48 +01:00
  • 1e853cebf4 Fix merge conflicts Ai Lin Chia 2015-11-10 16:38:29 +01:00
  • 949f725552 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-10 16:03:47 +01:00
  • 31030fd87b Removed global variable g_cpuUsage Ivan Skytte Jørgensen 2015-11-10 15:59:30 +01:00
  • a1038fbf66 Added additional unit test from cherry-picked commit Ai Lin Chia 2015-11-10 14:53:59 +01:00
  • c4f033d990 fix url.cpp Matt 2015-11-10 00:29:42 -07:00
  • eccc43b2a7 normalize utf8 url paths into url encoded sequences. Matt 2015-11-09 13:54:32 -07:00
  • 41e237e499 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-10 14:25:41 +01:00
  • 4725f8eb4b Move -Wl, parameter to LIBS macro instead of CPPFLAGS Ivan Skytte Jørgensen 2015-11-10 14:13:21 +01:00
  • 681b2c580f - Fix coredump when spidering "http://www.refworld.org.ru/docid/538f2bb24.html" due to invalid URL ("http://undocs.org/ru/A/C.3/68/\vSR.48") - Add google test as submodule - Move existing URL unit test into separate sub-directory Ai Lin Chia 2015-11-10 12:30:51 +01:00
  • ec41884050 Don't use sscanf() on potentially non-nul-terminated buffer Ivan Skytte Jørgensen 2015-11-09 15:42:24 +01:00
  • e73962b2ea Don't call strcpy() with src and dst being the same pointer. This happens during startup: Collectiondb::loadAllCollRecs() -> Collectiondb::addExistingColl() -> CollectionRec::load() -> strcpy() Ivan Skytte Jørgensen 2015-11-09 15:28:51 +01:00
  • e7be10182b Don't read uninitialized members of XmlNode Ivan Skytte Jørgensen 2015-11-09 15:23:59 +01:00
  • c7491ddc3e Use cld2 for language detection on query Ai Lin Chia 2015-11-05 15:43:21 +01:00
  • 995e3ec405 Remove reference to addtest in Makefile Ai Lin Chia 2015-11-05 12:14:34 +01:00
  • de5efda4ee Make QUICKPOLL macro a no-op. The QUICKPOLL macro is called excessively, e.g. for each character in documents. With the searches and merging and sorting being done by background threads I'm not even sure the macro is needed anymore. Making it a no-op gains approximately 2% performance. Ivan Skytte Jørgensen 2015-11-05 12:13:50 +01:00
  • 7e8933b8d5 Add third party module (cld2) Ai Lin Chia 2015-11-05 12:12:31 +01:00
  • 4455904b63 Removed forward declarations of no longer existing globals g_map_is_wspace/g_map_is_ascii3 Ivan Skytte Jørgensen 2015-11-05 12:10:51 +01:00
  • 5be1d9dae4 Remove uncompiled/unused files Ai Lin Chia 2015-11-05 12:01:47 +01:00
  • c45af99717 Fix coredump when using add URL Ai Lin Chia 2015-11-05 11:50:06 +01:00
  • 4f71098963 Optimize fctypes.h macros is_ascii(), is_ascii9(), is_ascii3(), is_wspace_a() by using simple conditionals instead of table lookups Ivan Skytte Jørgensen 2015-11-05 11:42:28 +01:00
  • 9d240694f0 Only execute first-pass filtering on first pass Ivan Skytte Jørgensen 2015-11-05 11:31:55 +01:00
  • fdec133b44 Merge remote-tracking branch 'upstream/master' Ai Lin Chia 2015-11-10 14:03:24 +01:00
  • 72a7dba89f - Fix coredump when spidering "http://www.refworld.org.ru/docid/538f2bb24.html" due to invalid URL ("http://undocs.org/ru/A/C.3/68/\vSR.48") - Add google test as submodule - Move existing URL unit test into separate sub-directory Ai Lin Chia 2015-11-10 12:30:51 +01:00
  • 423ff20365 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-11-09 15:53:33 +01:00
  • 64999c96f3 Don't use sscanf() on potentially non-nul-terminated buffer Ivan Skytte Jørgensen 2015-11-09 15:42:24 +01:00
  • 1b5dc0227f Don't call strcpy() with src and dst being the same pointer. This happens during startup: Collectiondb::loadAllCollRecs() -> Collectiondb::addExistingColl() -> CollectionRec::load() -> strcpy() Ivan Skytte Jørgensen 2015-11-09 15:28:51 +01:00
  • 66bc9ec3f0 Don't read uninitialized members of XmlNode Ivan Skytte Jørgensen 2015-11-09 15:23:59 +01:00
  • ca98d20173 Use cld2 for language detection on query Ai Lin Chia 2015-11-05 15:43:21 +01:00
  • da7fc8084f Remove reference to addtest in Makefile Ai Lin Chia 2015-11-05 12:14:34 +01:00
  • 2832184ffc Make QUICKPOLL macro a no-op. The QUICKPOLL macro is called excessively, e.g. for each character in documents. With the searches and merging and sorting being done by background threads I'm not even sure the macro is needed anymore. Making it a no-op gains approximately 2% performance. Ivan Skytte Jørgensen 2015-11-05 12:13:50 +01:00
  • 576fb5b19b Add third party module (cld2) Ai Lin Chia 2015-11-05 12:12:31 +01:00
  • 45ff10d8d2 Removed forward declarations of no longer existing globals g_map_is_wspace/g_map_is_ascii3 Ivan Skytte Jørgensen 2015-11-05 12:10:51 +01:00
  • eaeb641680 Remove uncompiled/unused files Ai Lin Chia 2015-11-05 12:01:47 +01:00
  • e4a121114e Fix coredump when using add URL Ai Lin Chia 2015-11-05 11:50:06 +01:00
  • 950291a64a Optimize fctypes.h macros is_ascii(), is_ascii9(), is_ascii3(), is_wspace_a() by using simple conditionals instead of table lookups Ivan Skytte Jørgensen 2015-11-05 11:42:28 +01:00
  • 5acfb94f41 Only execute first-pass filtering on first pass Ivan Skytte Jørgensen 2015-11-05 11:31:55 +01:00
  • 95d70b110e fix bug in rebuild pipeline. need to merge the files lest we max the # of files out. Matt 2015-11-03 11:12:39 -07:00
  • 488db03f60 do not send summary requests to non queryable hosts Matt 2015-10-22 11:46:13 -06:00
  • 75b72cc233 fix add url seg fault Matt 2015-10-14 13:57:47 -06:00
  • e57e3481b4 fix innerloop strangeness when counting keys in buckets Matt 2015-10-14 13:52:42 -06:00
  • 3e19d43aa5 fix core Matt 2015-10-14 12:03:12 -06:00
  • a4901431be a couple little fixes to pass smokes Matt 2015-10-14 11:53:05 -06:00
  • c37ab2697e Merge branch 'ia' into testing Matt 2015-10-12 10:40:16 -06:00
  • 08877b6334 Merge branch 'diffbot-testing' into testing Matt 2015-10-12 10:39:35 -06:00
  • d045138c22 Merge branch 'ia' into ia-zak Matt 2015-10-10 14:15:46 -06:00
  • 4d7d5b12a2 Merge branch 'diffbot-testing' into ia Matt 2015-10-10 14:15:36 -06:00
  • fa691bf06c also fix for numbers like for facet termlists Matt 2015-10-10 14:15:09 -06:00
  • 43c8949841 Merge branch 'ia' into ia-zak Matt 2015-10-10 14:07:15 -06:00
  • 298ae7e7b2 Merge branch 'diffbot-testing' into ia Matt 2015-10-10 14:05:27 -06:00