Commit Graph

  • ef195980d3 Treat meta content as HTML instead of plain text Ai Lin Chia 2016-02-15 15:45:52 +01:00
  • 4d13cc2202 Explicit size_t casts to fix warnings Ivan Skytte Jørgensen 2016-02-15 13:27:04 +01:00
  • 10e75802f7 Use sizeof() instead of relying on a #define Ivan Skytte Jørgensen 2016-02-15 13:25:40 +01:00
  • 621addec06 Removed double-#define Ivan Skytte Jørgensen 2016-02-15 13:23:42 +01:00
  • cac6a57820 Fix reference to local stack variable Ivan Skytte Jørgensen 2016-02-15 13:19:15 +01:00
  • 80b8f1bb91 Changed Msg39::m_timeout (currently unused) to milliseconds Ivan Skytte Jørgensen 2016-02-15 12:12:06 +01:00
  • abd465f5a4 Removed unused members Msg39::m_nowUTC and Msg40::m_nowUTC Ivan Skytte Jørgensen 2016-02-15 11:40:22 +01:00
  • 048c51f930 Added options to emulate fx_country, fx_blang and fx_fetld to search result page Brian Rasmusson 2016-02-14 22:34:01 +01:00
  • fb20c5c15c Rolled back 76c972cf67 as it caused core dumps on the 'page info' page due to missing tag names Brian Rasmusson 2016-02-14 19:45:05 +01:00
  • da70357d21 Merge branch 'master' of https://github.com/privacore/open-source-search-engine Brian Rasmusson 2016-02-14 18:17:22 +01:00
  • da0f18c95a A bit more trace log added Brian Rasmusson 2016-02-14 18:17:19 +01:00
  • 3d5d7e04b0 Fix coredump message wrt. timeout validation in UdpServer::sendRequest() Ivan Skytte Jørgensen 2016-02-12 17:31:53 +01:00
  • 57ead244f2 Fix timeout in Images.cpp Ivan Skytte Jørgensen 2016-02-12 17:29:17 +01:00
  • 1c9472e38e Fix timeout in Msg51::sendRequest() Ivan Skytte Jørgensen 2016-02-12 17:28:10 +01:00
  • aa96d2961c Changed UdpSlot timeout from seconds to milliseconds. Ivan Skytte Jørgensen 2016-02-12 17:25:07 +01:00
  • a33ba463a0 More thorough valgrind checks in Msg20 and UdpSlot::sendDatagramOrAck() Ivan Skytte Jørgensen 2016-02-12 15:08:11 +01:00
  • e9b62240aa More thorough valgrind checks in msg4 storeRec() / addMetaList2() Ivan Skytte Jørgensen 2016-02-12 15:07:34 +01:00
  • 6237a68556 valgrinding: Initialize LinkInfo instances Ivan Skytte Jørgensen 2016-02-11 16:16:11 +01:00
  • 73eaaeabed valgrind: allow questionable construct in Url::set() Ivan Skytte Jørgensen 2016-02-11 13:55:14 +01:00
  • 76122543e5 thanks for the bug fix, ivan! Matt 2016-02-09 10:38:46 -07:00
  • f662aed2af More differentiated trace log Brian Rasmusson 2016-02-10 14:49:51 +01:00
  • badf208efe Added tracing option for repairs Brian Rasmusson 2016-02-09 16:19:10 +01:00
  • c4c5c28512 Added c'tor to Inlink so we don't store uninitialized bytes in linkdb Ivan Skytte Jørgensen 2016-02-09 15:31:39 +01:00
  • b8871d7e73 valgrinding: more suppressions Ivan Skytte Jørgensen 2016-02-09 14:49:50 +01:00
  • d867efddb9 valgrinding: add sanity checks when reading bytes from the internet Ivan Skytte Jørgensen 2016-02-09 14:49:08 +01:00
  • a7818e978a valgrinding: Mark bytes from SSL_read() as defined Ivan Skytte Jørgensen 2016-02-09 14:48:16 +01:00
  • 011e09e95a Initialize HashTableX::m_vals in case the hashtable is used for unique posdb entries Ivan Skytte Jørgensen 2016-02-09 13:51:03 +01:00
  • 4e61bae633 Removed Links::linksToGigablast() and associated members Ivan Skytte Jørgensen 2016-02-09 13:22:40 +01:00
  • 9b68a04984 Ensure hash value arrays are set even when hashing the redirect URL Ivan Skytte Jørgensen 2016-02-09 13:19:53 +01:00
  • c89fcfc5ec Removed static s_pid variable from log.* Ivan Skytte Jørgensen 2016-02-05 17:04:47 +01:00
  • b2bd45abb7 Removed static variable s_pid from UdpServer.cpp Ivan Skytte Jørgensen 2016-02-05 17:03:57 +01:00
  • ac50e7d4d4 Initialize Links::m_doQuickSet Ivan Skytte Jørgensen 2016-02-09 11:57:53 +01:00
  • cba792a97c bugfix add-url reference to free'd memory Ivan Skytte Jørgensen 2016-02-08 17:28:52 +01:00
  • 865b869e16 Only accept valid character entity references terminated with semicolon Ivan Skytte Jørgensen 2016-02-08 16:45:34 +01:00
  • 2435c2d11c Log size of documents when spidering Ivan Skytte Jørgensen 2016-02-08 15:22:51 +01:00
  • e7d03b0e60 Changed UdpSlot::m_incoming from char to bool Ivan Skytte Jørgensen 2016-02-08 13:32:36 +01:00
  • ac68d01c6b Changed some default values, rolled-back checkbox 'fix' Brian Rasmusson 2016-02-05 16:47:44 +01:00
  • 331e66266c bugfix incorrect check on collection number (always false) Ivan Skytte Jørgensen 2016-02-05 15:19:13 +01:00
  • 3dcba0627e Join Msg3a::getDocIds() and msg3a::gotCacheReply() (tail call) Ivan Skytte Jørgensen 2016-02-05 14:59:16 +01:00
  • e53a1ff7bc Removed decl. of unimplemented Msg3a methods: gotReranked/gotClusterRecs/gotClusterLevels Ivan Skytte Jørgensen 2016-02-05 14:55:26 +01:00
  • 9aff3c6c6c Removed seo-related stuff from Msg3a Ivan Skytte Jørgensen 2016-02-05 14:52:45 +01:00
  • 9564127645 Made globals static in Msg3a.cpp Ivan Skytte Jørgensen 2016-02-05 14:40:51 +01:00
  • dbc12ebc80 Removed unused callbacks from Msg3a.cpp Ivan Skytte Jørgensen 2016-02-05 14:37:13 +01:00
  • 60d94b8ba1 Removed Msg39Request::m_useSeoResultsCache and related code Ivan Skytte Jørgensen 2016-02-05 14:33:31 +01:00
  • b0412f7ee3 Removed unused methods from XmlDoc Ivan Skytte Jørgensen 2016-02-05 14:27:56 +01:00
  • 0bbcefb7b5 HashTableX: more constness Ivan Skytte Jørgensen 2016-02-05 10:59:55 +01:00
  • a4c0c752e4 Fix: possible double free appchecker 2016-02-05 16:11:53 +03:00
  • 1fce015cab Minor spider code rearrangements Brian Rasmusson 2016-02-04 21:43:17 +01:00
  • df810c7d10 Cleanup of Msg39 (private members, friend declarations, etc) Ivan Skytte Jørgensen 2016-02-04 17:44:32 +01:00
  • 2946ed654e Removed even more unused members from Msg39 Ivan Skytte Jørgensen 2016-02-04 17:17:29 +01:00
  • c472a824fb Removed 'hack' members of Msg39 Ivan Skytte Jørgensen 2016-02-04 17:09:58 +01:00
  • 9705538f32 Removed unused Msg39 members m_topScore50/m_topDocId50 Ivan Skytte Jørgensen 2016-02-04 16:58:22 +01:00
  • 203ebde2be Suppress valgrind warning until we figure out why siteterm is hashed twice Ivan Skytte Jørgensen 2016-02-04 16:36:23 +01:00
  • d7d771d15c Avoid hashing the same URL twice eventually causing complaints from valgrind Ivan Skytte Jørgensen 2016-02-04 16:26:59 +01:00
  • 641274bb06 Avoid double-:// in index Ivan Skytte Jørgensen 2016-02-04 15:43:26 +01:00
  • 35b5906747 Fix hostname/url checks Ivan Skytte Jørgensen 2016-02-04 15:22:51 +01:00
  • 81e3edc13d Fix TimeZone.* Ivan Skytte Jørgensen 2016-02-04 15:06:47 +01:00
  • 95b3a6273d Changed Multicast::send() parameter timeout to milliseconds Ivan Skytte Jørgensen 2016-02-04 13:57:17 +01:00
  • 7b9898f453 fix coredump when filtering weird summaries Ivan Skytte Jørgensen 2016-02-04 13:47:49 +01:00
  • 60efc4230e Removed unused members XmlDoc::m_tempMsg25Page/m_tempMsg25Site Ivan Skytte Jørgensen 2016-02-02 14:21:38 +01:00
  • 4e17858ec9 Removed now unused methods XmlDoc::getAllInlinks/lookupTitles/gotLinkerTitle Ivan Skytte Jørgensen 2016-02-02 14:18:43 +01:00
  • 854f250060 Simplify and centralize check for if a host is on the intranet/internal net Ivan Skytte Jørgensen 2016-02-02 13:59:01 +01:00
  • ee6b16512d Simplify logic for selecting datagram size Ivan Skytte Jørgensen 2016-02-02 11:40:56 +01:00
  • fb9c56c156 Use symbolic constants for return values from ip_distance() Ivan Skytte Jørgensen 2016-02-02 11:34:37 +01:00
  • 82befe9ed9 Removed unused UdpSlot::m_firstReadTime Ivan Skytte Jørgensen 2016-02-01 16:54:45 +01:00
  • bffd2437db Removed unused and dubious debug feature Ivan Skytte Jørgensen 2016-02-01 16:50:27 +01:00
  • 95531bd1bb Initialize m_startTime in more cases when it will be used (found with valgrind) Ivan Skytte Jørgensen 2016-02-01 16:26:55 +01:00
  • 61b8dad74c Added more structured tests for IP location/distance Ivan Skytte Jørgensen 2016-02-01 15:33:40 +01:00
  • c58a73e633 Removed effectively unused members SpiderReply::m_sentToDiffbotThisTime and SpiderReply::m_hadDiffbotError Ivan Skytte Jørgensen 2016-02-01 14:42:19 +01:00
  • ff592b7a4a Fixed wrong hashes on TermInfo page for 'gbsortby' numbers Brian Rasmusson 2016-02-01 14:20:08 +01:00
  • 0bad5645d3 Split Msg12 from Spider.cpp Brian Rasmusson 2016-02-01 12:06:17 +01:00
  • da8734a848 More Spider splitting Brian Rasmusson 2016-01-30 21:29:32 +01:00
  • 33ca485b53 Removed unused code from Spider.cpp Brian Rasmusson 2016-01-30 20:18:07 +01:00
  • 6cee130646 Separated Doledb from Spider.cpp Brian Rasmusson 2016-01-30 20:11:15 +01:00
  • 8463f36179 Split Spider.cpp into Spider, SpiderColl and SpiderLoop. More to come.. Brian Rasmusson 2016-01-30 19:41:54 +01:00
  • 61f4b27aeb Don't replace '>' & '<' to '|' when converting from HTML entities Ai Lin Chia 2016-01-29 19:18:22 +01:00
  • 30dab477c6 Added differentiated trace log to Spider and RdbBase Brian Rasmusson 2016-01-29 16:43:03 +01:00
  • c3e98959eb Removed global variable 'g_r' Ivan Skytte Jørgensen 2016-01-29 16:15:07 +01:00
  • 3bae6aa5f6 Removed constant-value SearchInput::m_numTopicGroups (always 1) Ivan Skytte Jørgensen 2016-01-29 16:01:31 +01:00
  • 063464f786 Removed unused SearchInput members m_sbuf3 and m_cookieBuf Ivan Skytte Jørgensen 2016-01-29 15:53:40 +01:00
  • 057381eb3b Removed unused SearchInput members m_catId and m_isRTL Ivan Skytte Jørgensen 2016-01-29 15:44:23 +01:00
  • bcc1c34e10 Disable old hack regarding all-terms that indirectly disabled OR-logic Ivan Skytte Jørgensen 2016-01-29 15:37:18 +01:00
  • 29d54e1fb8 Removed unused member SearchInput::m_q2 Ivan Skytte Jørgensen 2016-01-29 15:06:27 +01:00
  • 3fc3d48c01 Remove conversion of "1<super>st</super>" into "1st" Ai Lin Chia 2016-01-29 12:52:09 +01:00
  • e33c1eb029 Cater for if mismatched attribute is at the start of tag Ai Lin Chia 2016-01-29 12:39:02 +01:00
  • 323b656353 Add more unhandled scenarios comments Ai Lin Chia 2016-01-28 17:41:29 +01:00
  • 6f30c454d1 Separated trace log for RdbMap and BigFile Brian Rasmusson 2016-01-29 12:32:26 +01:00
  • 3dd2eab5b3 removed excessive debug log from SafeBuf Brian Rasmusson 2016-01-29 11:42:29 +01:00
  • 3c374bc37c Removed references to un-implemented checkRegex() Ivan Skytte Jørgensen 2016-01-29 11:37:50 +01:00
  • d61a298c0d Removed most of diffbot references in XmlDoc.* Ivan Skytte Jørgensen 2016-01-29 11:35:05 +01:00
  • e4b68baac4 Auto-merge interval changed from 2 to 60 seconds Brian Rasmusson 2016-01-29 10:04:58 +01:00
  • ae51dac033 Removed hardcoded references to dead sites Ivan Skytte Jørgensen 2016-01-28 18:17:58 +01:00
  • 9063a23bf6 Removed hardcoded logic in Msg13 concerning flurbit website Ivan Skytte Jørgensen 2016-01-28 18:12:35 +01:00
  • 62459eb0ae Removed functionally unused method XmlDoc::getIsBinary() Ivan Skytte Jørgensen 2016-01-28 17:43:38 +01:00
  • 47b779e659 Removed diffbot stuff from XmlDoc::print() Ivan Skytte Jørgensen 2016-01-28 14:36:43 +01:00
  • 2ea354df6d Removed some unused members of XmlDoc Ivan Skytte Jørgensen 2016-01-28 17:31:57 +01:00
  • 9d718cdfcf Removed code in XmlDoc::getLinkInfo2() after unconditional early return Ivan Skytte Jørgensen 2016-01-28 17:25:30 +01:00
  • 26d43c0245 Try to work around invalid quotes in meta tags Ai Lin Chia 2016-01-28 17:18:03 +01:00
  • 03a649d022 Added trace log Brian Rasmusson 2016-01-28 17:11:42 +01:00
  • 9c0efbeb9a Removed diffbotReply parameter from XmlDoc::injectDoc() - it was always unused in non-diffbot installations Ivan Skytte Jørgensen 2016-01-28 14:03:44 +01:00