Commit Graph

  • 3286b00952 Fix error in Jenkinsfile Ai Lin Chia 2018-01-31 11:10:18 +01:00
  • 68de07dde4 Merge branch 'master' into sqlite Ai Lin Chia 2018-01-31 10:48:00 +01:00
  • 11e809196c Use same branch name for pywebtest if available Ai Lin Chia 2018-01-31 10:47:29 +01:00
  • 70f8867ca4 Multicast was incrementing wrong statistic (sub-array overflow) Ivan Skytte Jørgensen 2018-01-30 16:08:56 +01:00
  • 82cb44d31b Add SpiderRequest before SpiderReply in XmlDoc::getMetalist. Batch up SpiderColl::addSpiderRequest/addSpiderReply so we don't add spider request/reply for entries that are not committed to db. Sort records before adding to SpiderColl so we always add SpiderReply before SpiderRequest. Ai Lin Chia 2018-01-30 16:00:53 +01:00
  • 5d2accdc4c Remove unused SpiderdbRdbSqliteBridge::addRecord function Ai Lin Chia 2018-01-30 15:26:32 +01:00
  • 730dfacb6d Merge branch 'master' into sqlite Ivan Skytte Jørgensen 2018-01-30 14:41:22 +01:00
  • 37e6c88c47 Removed mid-file #include Ivan Skytte Jørgensen 2018-01-30 14:23:52 +01:00
  • 36ce231742 Merge branch 'master' into sqlite Ai Lin Chia 2018-01-30 11:12:21 +01:00
  • eb89a888c5 Fix bug where column m_replyFlags instead of m_requestFlags was updated with requestFlagBits Ai Lin Chia 2018-01-30 10:21:16 +01:00
  • 59a69bf469 Made more functions private in PageResults.cpp Ivan Skytte Jørgensen 2018-01-29 16:33:00 +01:00
  • 07db55fec6 Removed unused Msg40::printSearchResult9() Ivan Skytte Jørgensen 2018-01-29 16:27:53 +01:00
  • 4b866f61c2 Enforce g_conf.m_msg40_msg39_timeout while (re-)sending msg3a/msg39 Ivan Skytte Jørgensen 2018-01-29 16:17:27 +01:00
  • 9bb06c8d7c Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-01-29 15:25:20 +01:00
  • 419e0f0baf Temporarily allow non-empty redirect results (old titledb records) Ivan Skytte Jørgensen 2018-01-29 15:25:01 +01:00
  • 289c2cdbcf Split Msg4 trace log to Msg4In, Msg4Out, Msg4Out data Ai Lin Chia 2018-01-29 14:30:18 +01:00
  • c070d7c3ae Log query_id when msg40 starts, redoes, and ends. And also in msg39 start (debug only) Ivan Skytte Jørgensen 2018-01-29 14:19:30 +01:00
  • 10231557ee Removed fwd-decl of ucCombiningClass() which was not implemented Ivan Skytte Jørgensen 2018-01-26 18:31:58 +01:00
  • b6864e7091 Made g_ucKDIndex local/private Ivan Skytte Jørgensen 2018-01-26 18:17:26 +01:00
  • f0ca1bdb3c Moved calls to unicode-proptables reset() from Process to Unicode.* Ivan Skytte Jørgensen 2018-01-26 18:14:44 +01:00
  • 52b942b478 Moved ucIsWordChar() from fctypes to UnicodeProperties. Ivan Skytte Jørgensen 2018-01-26 18:07:48 +01:00
  • a54a29b056 bugfix: reverted bogus test code in main() Ivan Skytte Jørgensen 2018-01-26 18:04:33 +01:00
  • 74e6d04e8b Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-01-26 17:57:18 +01:00
  • 3d45e85a8c Moved calculateChecksum() from fctypes to UnicodeProperties, which is the only caller Ivan Skytte Jørgensen 2018-01-26 17:57:11 +01:00
  • 44cfac7f46 Don't loop infinitely during shutdown Ai Lin Chia 2018-01-26 17:29:29 +01:00
  • 02a127a5d2 Add check to see if ts is nullptr before using it Ai Lin Chia 2018-01-26 16:38:11 +01:00
  • f13aff5162 Merge branch 'master' into sqlite Ai Lin Chia 2018-01-26 11:41:34 +01:00
  • c838f659d6 Merge branch 'master' into dev-tagrec Ai Lin Chia 2018-01-26 11:36:00 +01:00
  • 36a953e5c2 Removed unused g_ucScriptNames Ivan Skytte Jørgensen 2018-01-25 16:48:47 +01:00
  • f60566aafb Merge branch 'master' into dev-tagrec Ai Lin Chia 2018-01-25 14:43:09 +01:00
  • ca90e298b9 Use different Msg8a to get tagrec Ai Lin Chia 2018-01-25 14:13:44 +01:00
  • d83dba7385 Don't need to refresh tagrec for msg20. We're not using manualban anymore Ai Lin Chia 2018-01-25 13:57:15 +01:00
  • 48779b3023 Use getCurrentTagRec for XmlDoc::getOutlinkTagRecVector Ai Lin Chia 2018-01-24 17:21:17 +01:00
  • ec8d6957ef fix log introduced in d4770bbf2d Brian Rasmusson 2018-01-24 17:16:55 +01:00
  • fbb8fe384c Revert "Use currentUrl instead of firstUrl for tagrec" Ai Lin Chia 2018-01-24 17:10:12 +01:00
  • e2e4b669a2 Revert "Revert "We should use tagrec based on current url to get site tag instead of tagrec based on first url"" Ai Lin Chia 2018-01-24 17:10:02 +01:00
  • 94d62deccc Simplify code Ai Lin Chia 2018-01-24 17:00:32 +01:00
  • 25defea50c Use gr instead of m_tagRec directly Ai Lin Chia 2018-01-24 16:48:46 +01:00
  • f240d00895 Remove manualban logic (we now use urlblacklist instead) Ai Lin Chia 2018-01-24 16:48:15 +01:00
  • 4502af99b7 Remove commented out code Ai Lin Chia 2018-01-24 16:13:07 +01:00
  • d4770bbf2d Log sub-site detected in SiteGetter Ivan Skytte Jørgensen 2018-01-24 15:51:28 +01:00
  • 505f04603d Merge branch 'master' into dev-tagrec Ai Lin Chia 2018-01-24 14:57:28 +01:00
  • fa53559b1e Add explicitKeywords to json response as well Ai Lin Chia 2018-01-24 14:57:06 +01:00
  • 7f96ef2510 Removed unused logic in SiteGetter Ivan Skytte Jørgensen 2018-01-24 14:50:23 +01:00
  • f6bbd83818 Removed unused ucDigitValue() Ivan Skytte Jørgensen 2018-01-24 13:15:24 +01:00
  • dcfe64be26 Merge branch 'master' into dev-tagrec Ai Lin Chia 2018-01-24 12:33:04 +01:00
  • d49384234a Add conf to decide if we want to reuse tagrec data from titledb during a rebuild Ai Lin Chia 2018-01-24 11:12:27 +01:00
  • dd232d9712 Assume that query instance 'instantly' complete scan for repair Ai Lin Chia 2018-01-23 15:56:22 +01:00
  • 428d3e5d46 Simplify code Ai Lin Chia 2018-01-23 15:45:49 +01:00
  • ecd90a9961 only trigger rebuild on spider instances, and no more splitting of records between twins Ai Lin Chia 2018-01-23 15:44:41 +01:00
  • 22d105fe38 Don't recalculate isAdult flag when only rebuilding spiderdb Ai Lin Chia 2018-01-23 14:41:48 +01:00
  • 67b5149773 Remove commented out code Ai Lin Chia 2018-01-23 12:37:54 +01:00
  • 37ba06d6d0 Don't spider links when m_rebuildAddOutlinks is disabled for repair Ai Lin Chia 2018-01-22 17:21:56 +01:00
  • 495740c738 Removed unused local variable Ivan Skytte Jørgensen 2018-01-22 16:13:02 +01:00
  • ab65af04cb Removed special case for collection 'GLOBAL-INDEX' (whatever that was) Ivan Skytte Jørgensen 2018-01-22 16:00:36 +01:00
  • 8698efca36 Don't spider links when rebuilding spiderdb & m_rebuildAddOutlinks is not set to true Ai Lin Chia 2018-01-22 15:59:25 +01:00
  • 1888db1e35 Merge branch 'master' into sqlite Ivan Skytte Jørgensen 2018-01-22 15:41:16 +01:00
  • 417d04fd50 Removed all mention of ping servers Ivan Skytte Jørgensen 2018-01-22 15:15:56 +01:00
  • a68f28c02a Fix trace logs Ai Lin Chia 2018-01-22 14:24:40 +01:00
  • 6bc3608cc2 bugfix Words tokenizer Ivan Skytte Jørgensen 2018-01-22 12:39:09 +01:00
  • ad00cc9ad6 Don't include anchor & query parameter as part of site Ai Lin Chia 2018-01-22 11:54:18 +01:00
  • d31c92bbf7 Cater for redirection from filtering html document (javascript redirect) Ai Lin Chia 2018-01-19 17:38:16 +01:00
  • f5707c99e3 UnicodeProperties.h: comment on ucScriptLatin enum value Ivan Skytte Jørgensen 2018-01-19 17:17:50 +01:00
  • 045b33e6a8 Removed always-false Words::m_hasTags Ivan Skytte Jørgensen 2018-01-19 17:01:19 +01:00
  • 8c807405b3 Use currentUrl instead of firstUrl for tagrec Ai Lin Chia 2018-01-19 16:18:08 +01:00
  • aec712eea8 Revert "We should use tagrec based on current url to get site tag instead of tagrec based on first url" Ai Lin Chia 2018-01-19 16:17:40 +01:00
  • 28e0afb019 Show explicit keywords on docid info page(s) Ivan Skytte Jørgensen 2018-01-19 16:15:08 +01:00
  • 6b0d9b1f7d Handle deleted explicit keywords Ivan Skytte Jørgensen 2018-01-19 15:48:56 +01:00
  • 41663915b8 Explicit padding to avoid uninitialized bytes in network traffic Ivan Skytte Jørgensen 2018-01-19 14:50:42 +01:00
  • c5305d1b20 Removed debuglog left over in FxExplicitKeywords.cpp Ivan Skytte Jørgensen 2018-01-19 14:30:43 +01:00
  • ab214632e7 Support extra keywords/terms on pages Ivan Skytte Jørgensen 2018-01-19 14:24:01 +01:00
  • 3be653ac2e Add more logs Ai Lin Chia 2018-01-19 14:13:49 +01:00
  • 903f494cec Add verify tagrec Ai Lin Chia 2018-01-19 12:58:23 +01:00
  • 8932cfa01e Don't continue if tagrec is null Ai Lin Chia 2018-01-19 11:48:09 +01:00
  • 7d30ff012f Don't use m_baseTagRec when url is blocked. Set it to nullptr instead Ai Lin Chia 2018-01-19 11:39:57 +01:00
  • f4ff307356 Code style changes Ai Lin Chia 2018-01-19 11:11:34 +01:00
  • d114f7771a Simplify code Ai Lin Chia 2018-01-19 11:11:19 +01:00
  • 956fb1b002 Fix sitehash search Ai Lin Chia 2018-01-18 13:11:43 +01:00
  • 461f14dc5a Fix bug where host is not treated as hex Ai Lin Chia 2018-01-18 11:42:48 +01:00
  • fabc0d5b72 More logs Ai Lin Chia 2018-01-18 11:37:06 +01:00
  • ea65f2e1a9 Enable logs Ai Lin Chia 2018-01-18 10:58:51 +01:00
  • ed98b2df79 More logs for DocProcess Ai Lin Chia 2018-01-17 18:35:21 +01:00
  • 933f826084 First ip must be valid so just get it from latest xmldoc Ai Lin Chia 2018-01-16 16:52:53 +01:00
  • 80cb8cb956 Fix logging for verify_spiderdb Ai Lin Chia 2018-01-16 16:38:03 +01:00
  • fb45f5e6ad Add verify_spiderdb tool Ai Lin Chia 2018-01-16 16:01:47 +01:00
  • fda39e658f Fix size of SpiderRequest Ai Lin Chia 2018-01-16 12:50:35 +01:00
  • 39ff6843b8 Fix setting of SpiderRequest Ai Lin Chia 2018-01-16 12:37:29 +01:00
  • ba75b694b9 Don't spawn out for HTML when it's not http status 200 Ai Lin Chia 2018-01-16 11:29:16 +01:00
  • d44e535862 Fix segfault in XmlDoc::isFirstUrlCanonical due to unhandled 'blocked' condition for getCanonicalUrl Ai Lin Chia 2018-01-16 10:47:33 +01:00
  • a963de6b9e added more trace log to XmlDoc Brian Rasmusson 2018-01-12 23:44:24 +01:00
  • a4bcb95a5b bugfix 4d140d2504 (not enough code was removed) Ivan Skytte Jørgensen 2018-01-12 17:10:22 +01:00
  • 13de19977c Log differently when it's a real fake sreq firstIP, and when it's different but not fake Ai Lin Chia 2018-01-12 16:11:24 +01:00
  • 5651bcc22f Fix admin/spiderdb json output Ai Lin Chia 2018-01-11 12:08:36 +01:00
  • 2fd2d52877 Merge branch 'master' of github.com:privacore/open-source-search-engine Ivan Skytte Jørgensen 2018-01-12 15:27:22 +01:00
  • 4d140d2504 Posdbtable: BF_NUMBER is no longer set nor needed Ivan Skytte Jørgensen 2018-01-12 15:27:18 +01:00
  • 39937833d2 Resurrect /admin/status (it is useful for automated tests) Ivan Skytte Jørgensen 2018-01-12 15:24:24 +01:00
  • 1b3ee614fe Cleanup after gbsortbyint/gbrevsortbyint are no longer supported Ivan Skytte Jørgensen 2018-01-12 15:04:24 +01:00
  • bdb70c3460 Removed support for 'gbsortbyint:' and 'gbrevsortbyint:' Ivan Skytte Jørgensen 2018-01-12 14:47:55 +01:00
  • 32fe677fa1 Removed 'secsback' cgi parameter Ivan Skytte Jørgensen 2018-01-12 14:13:01 +01:00
  • fa787843d3 Removed obsolete comment Ivan Skytte Jørgensen 2018-01-11 18:12:48 +01:00