Commit Graph

  • 97e336fd07 Remove commented out code Ai Lin Chia 2017-02-16 11:26:42 +01:00
  • 2cf1140d7b Use logError instead of log Ai Lin Chia 2017-02-16 11:26:25 +01:00
  • 3c8a5135cd Renamed Query::m_stackBuf to something sensible Ivan Skytte Jørgensen 2017-02-15 22:40:26 +01:00
  • 62d96d5fdf Encapsulate Query more Ivan Skytte Jørgensen 2017-02-15 22:39:06 +01:00
  • 2578f360e9 Renamed Query::m_osb to something sensible Ivan Skytte Jørgensen 2017-02-15 22:34:23 +01:00
  • c70e4ac681 Renamed Query::m_sb to something sensible and changed it to a SmallBuf<> Ivan Skytte Jørgensen 2017-02-15 22:26:41 +01:00
  • 2e15a02a9a More encapsulation of Query Ivan Skytte Jørgensen 2017-02-15 22:19:52 +01:00
  • 0cd90b5cad Encapsulated Query::m_orig Ivan Skytte Jørgensen 2017-02-15 22:18:18 +01:00
  • c44221dbb5 Modify urlblocklist.txt into an example file Ai Lin Chia 2017-02-15 11:41:53 +01:00
  • 2a59c6a951 Fix confusing comment + variable Ivan Skytte Jørgensen 2017-02-14 16:28:06 +01:00
  • dd76c517bb Fix casts and comments in Posdb::getTermFreq() Ivan Skytte Jørgensen 2017-02-14 16:23:20 +01:00
  • 9387a7d180 getTermFreqWeight: Swap y_min/y_max in call to scale_linear() so common terms have less weight Ivan Skytte Jørgensen 2017-02-14 16:11:52 +01:00
  • 7d1ad4e542 Fix slow shutdown of UrlRealtimeClassification Ivan Skytte Jørgensen 2017-02-14 15:44:49 +01:00
  • 7bb060caff RdbBuckets::deleteNode() needs mutex lock too Ivan Skytte Jørgensen 2017-02-14 13:36:17 +01:00
  • c2839dd7a3 Merge branch 'master' into nomerge2 Ivan Skytte Jørgensen 2017-02-14 11:33:14 +01:00
  • e0dde65209 More const Ivan Skytte Jørgensen 2017-02-14 11:32:24 +01:00
  • 8304ab6f19 Removed obsolete comment Ivan Skytte Jørgensen 2017-02-14 11:30:08 +01:00
  • f3a46cf325 added more unwanted domains with auto-generated data and another unwanted one Brian Rasmusson 2017-02-13 22:08:16 +01:00
  • afed10ceb1 added two more unwanted domains with auto-generated data Brian Rasmusson 2017-02-13 21:57:30 +01:00
  • 9502efb1b3 Fixed no longer correct comments Ivan Skytte Jørgensen 2017-02-13 16:59:55 +01:00
  • 8da6618685 If a word in a query is a high-freq-term then ignore it if possible Ivan Skytte Jørgensen 2017-02-13 16:44:34 +01:00
  • d59c1e1d33 Argh. Undid editor error Ivan Skytte Jørgensen 2017-02-13 14:25:05 +01:00
  • 812e570f81 Simplified logic for query word synonym lookup Ivan Skytte Jørgensen 2017-02-13 12:53:28 +01:00
  • b11038fc66 Turned word/phrase ignore reason into an enum Ivan Skytte Jørgensen 2017-02-13 12:26:23 +01:00
  • 6eb14aab03 More constness in Query Ivan Skytte Jørgensen 2017-02-13 11:37:10 +01:00
  • 709f85b039 Duplicated parm check: just lookup and check instead of check-then-lookup Ivan Skytte Jørgensen 2017-02-12 15:02:25 +01:00
  • 7224d41700 Check return value from fcntl(...nonblock) Ivan Skytte Jørgensen 2017-02-12 14:51:28 +01:00
  • 55564d9720 encoding fix in StopWords.cpp (da/de). More bugs remain Ivan Skytte Jørgensen 2017-02-10 17:28:46 +01:00
  • 05bf8044d5 Refine logging for URL classification Ivan Skytte Jørgensen 2017-02-10 15:37:38 +01:00
  • 46311f0a7a Support per-URL classification timeout Ivan Skytte Jørgensen 2017-02-10 15:24:55 +01:00
  • 8b1da79ea1 Support limiting the number of outstanding URL classification requests Ivan Skytte Jørgensen 2017-02-10 14:12:28 +01:00
  • 76a0d8d6a1 Initialize Conf::m_urlClassificationServerName/m_urlClassificationServerPort (harmless) Ivan Skytte Jørgensen 2017-02-10 14:06:15 +01:00
  • 039069c303 typo in log Ivan Skytte Jørgensen 2017-02-10 14:00:07 +01:00
  • ef223a0001 Renamed internal class in UrlRealtimeClassification.cpp Ivan Skytte Jørgensen 2017-02-10 13:59:34 +01:00
  • 50ce6bb326 Set thread name of URL classification comm. thread Ivan Skytte Jørgensen 2017-02-09 15:40:22 +01:00
  • 194f25d8c9 Merge branch 'master' into nomerge2 Ivan Skytte Jørgensen 2017-02-09 15:37:11 +01:00
  • d389cc8ab7 Log when URLs are filtered out because they are classified as malicious Ivan Skytte Jørgensen 2017-02-09 15:36:06 +01:00
  • eef755ee6b Make URL realtime classification configurable Ivan Skytte Jørgensen 2017-02-09 15:30:44 +01:00
  • ffd641e8a0 Tone down development-log for URL classification Ivan Skytte Jørgensen 2017-02-09 15:06:35 +01:00
  • c1abb8be85 Do realtime URL classification Ivan Skytte Jørgensen 2017-02-09 15:03:20 +01:00
  • 0c41ef63f2 Deleted Msg3a serialization/deserialization (no longer used) Ivan Skytte Jørgensen 2017-02-09 14:41:07 +01:00
  • 3e63488dc5 Deleted Msg40 serialization/deserialization (no longer used) Ivan Skytte Jørgensen 2017-02-09 14:39:57 +01:00
  • fe31c5ccc1 goto -> for() loop in Msg40::mergeDocIdsIntoBaseMsg3a() Ivan Skytte Jørgensen 2017-02-09 12:52:39 +01:00
  • b0fc67f057 goto -> for() loop in Msg40::mergeDocIdsIntoBaseMsg3a() Ivan Skytte Jørgensen 2017-02-09 12:52:39 +01:00
  • 38cc1456c5 Unwrapped log strings in Msg40.cpp Ivan Skytte Jørgensen 2017-02-09 12:42:56 +01:00
  • 77d0ea538f Unwrapped log strings in Msg40.cpp Ivan Skytte Jørgensen 2017-02-09 12:42:56 +01:00
  • ac875d9e15 Merge branch 'master' into nomerge2 Ivan Skytte Jørgensen 2017-02-09 12:19:55 +01:00
  • 4ca6783a51 Fix coverity warning of uninitialized pointer Ai Lin Chia 2017-02-08 15:05:55 +01:00
  • 12ece72356 Add comment Ai Lin Chia 2017-02-08 13:59:33 +01:00
  • e3b8ed9999 Fix regex for url block list (add start anchor, fix subdomain regex, fix block all subdomain but www regex) Ai Lin Chia 2017-02-08 13:55:53 +01:00
  • 8949f82a40 Add more test scenarios for url block Ai Lin Chia 2017-02-08 13:53:24 +01:00
  • de352ce9e5 Drop blocked url from spiderdb Ai Lin Chia 2017-02-08 13:50:18 +01:00
  • 727aa51bc3 Split Msg40::gotSummary() into two in preparation for realtime url classification Ivan Skytte Jørgensen 2017-02-07 16:55:49 +01:00
  • 83fd61c73a Fix log message Ai Lin Chia 2017-02-07 16:48:02 +01:00
  • eecbd491af Fix unittest Ai Lin Chia 2017-02-07 16:38:43 +01:00
  • d98962bda2 UrlBlockList now uses GbRegex(pcre) Ai Lin Chia 2017-02-07 16:38:14 +01:00
  • 1def0b382a Add thin wrapper around pcre.h Ai Lin Chia 2017-02-07 16:37:41 +01:00
  • 484d4e6ae7 Added real-time URL classification (not used yet from msg40) Ivan Skytte Jørgensen 2017-02-07 16:22:49 +01:00
  • 97b7572c2f Added real-time URL classification (not used yet from msg40) Ivan Skytte Jørgensen 2017-02-07 16:17:27 +01:00
  • adc20e3129 Fix timing bug in parallel make of Version.o Ivan Skytte Jørgensen 2017-02-07 16:00:03 +01:00
  • 4792e29548 Removed effectively dead code for msg40 caching Ivan Skytte Jørgensen 2017-02-06 17:09:37 +01:00
  • 6df8b3b5d1 More constness in msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-06 16:50:58 +01:00
  • e9280ec1c2 Remove master branch limitation for travis. We probably want to build when .travis.yml file is present Ai Lin Chia 2017-02-06 16:18:57 +01:00
  • 9c97fbd0cc Revert "new UrlBlockList temporarily disabled". Ai Lin Chia 2017-02-06 15:54:04 +01:00
  • 728046f472 Make sure warning suppression work with custom version of g++ (eg: on travis) Ai Lin Chia 2017-02-06 15:44:27 +01:00
  • 5c329eed42 Fix makefile to allow for custom CXX in version check Ai Lin Chia 2017-02-06 15:26:27 +01:00
  • b28970555f Make travis use g++5 Ai Lin Chia 2017-02-06 15:20:09 +01:00
  • b27ec95e30 Use shrink_to_fit instead Ai Lin Chia 2017-02-06 15:09:12 +01:00
  • 1470188e18 Removed hardcoded filters in msg40 for URLs with special sequences in them. Ivan Skytte Jørgensen 2017-02-06 15:18:48 +01:00
  • aa91a1b495 Put <a> href value in quotes Ivan Skytte Jørgensen 2017-02-06 15:17:40 +01:00
  • 589c450fcb bugfix cdataEncode() to handle inputs shorter than 3 bytes Ivan Skytte Jørgensen 2017-02-06 15:07:17 +01:00
  • 33267b6ad2 Simplify logic in SafeBuf::safeUtf8ToJSON() by using a switch instead of series of if() Ivan Skytte Jørgensen 2017-02-06 12:30:11 +01:00
  • 41904c3a49 Avoid nulterm hack Ivan Skytte Jørgensen 2017-02-06 12:19:09 +01:00
  • d5424a997f Try to use g++ 4.9 for travis Ai Lin Chia 2017-02-06 11:48:14 +01:00
  • 18adcc4602 Add unit test for empty cookie attribute value Ai Lin Chia 2017-02-06 11:25:44 +01:00
  • 09875a62e0 Avoid nulterm hack, again Ivan Skytte Jørgensen 2017-02-06 11:19:05 +01:00
  • c46322d808 Avoid nulterm hack Ivan Skytte Jørgensen 2017-02-06 11:17:24 +01:00
  • fd1b7d15c4 Merge branch 'master' into nomerge2 Ivan Skytte Jørgensen 2017-02-05 23:18:50 +01:00
  • 830a76bc45 Use :: operator for static methods Ivan Skytte Jørgensen 2017-02-05 23:08:30 +01:00
  • 489f401966 Removed this!=NULL check in LinkInfo::getNextInlink() Ivan Skytte Jørgensen 2017-02-05 22:52:07 +01:00
  • a6d4e936fa Make return statement clearer Ivan Skytte Jørgensen 2017-02-05 22:37:37 +01:00
  • b7e55c6609 Removed unused parameter 'coll' from makeLinkInfo() Ivan Skytte Jørgensen 2017-02-05 22:36:55 +01:00
  • 004eb2003a constness in Linkdb and Msg25 Ivan Skytte Jørgensen 2017-02-05 22:33:57 +01:00
  • 4eb1d01361 make double->unsigned conversion explicit Ivan Skytte Jørgensen 2017-02-05 19:21:33 +01:00
  • 4d71850394 Protect RdbBuckets with a mutex Ivan Skytte Jørgensen 2017-02-03 16:42:44 +01:00
  • 5377c56fcf Dropped local shortcut variable Ivan Skytte Jørgensen 2017-02-03 14:46:03 +01:00
  • 5268f4e9f5 Simplify logic a bit about streaming results in Msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-03 14:41:12 +01:00
  • 9f1dbd51a1 Dropped local shortcut variable Ivan Skytte Jørgensen 2017-02-03 14:34:13 +01:00
  • 3cc393e1b8 Simplify logic a bit about streaming results in Msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-03 14:12:48 +01:00
  • 8741e39bf0 Fix indentation Ivan Skytte Jørgensen 2017-02-03 14:03:29 +01:00
  • a9d3ed48fb Removed rambling and mostly incorrect comment Ivan Skytte Jørgensen 2017-02-03 13:43:28 +01:00
  • 643ca7e307 Dropped local shortcut variable Ivan Skytte Jørgensen 2017-02-03 13:41:01 +01:00
  • b839c912fb Don't check for m_si==NULL in Msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-03 13:31:26 +01:00
  • d5e5d101ef Dropped local shortcut variable Ivan Skytte Jørgensen 2017-02-03 14:46:03 +01:00
  • 959ffbee0a Simplify logic a bit about streaming results in Msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-03 14:41:12 +01:00
  • 285f8acf93 Dropped local shortcut variable Ivan Skytte Jørgensen 2017-02-03 14:34:13 +01:00
  • d5777114f3 Simplify logic a bit about streaming results in Msg40::gotSummary() Ivan Skytte Jørgensen 2017-02-03 14:12:48 +01:00
  • ef22f6eb86 Fix indentation Ivan Skytte Jørgensen 2017-02-03 14:03:29 +01:00
  • 5d6952d564 added tracelog to RdbBuckets after tracing down memmove problem Brian Rasmusson 2017-02-03 14:01:26 +01:00
  • 75cf382717 Removed rambling and mostly incorrect comment Ivan Skytte Jørgensen 2017-02-03 13:43:28 +01:00