Commit Graph

  • 7d812ceb84 Remove all ending whitespace (including line break) Ai Lin Chia 2017-10-02 11:50:57 +02:00
  • cb062e1a0a Fix bug where ending spaces are not stripped Ai Lin Chia 2017-10-02 11:18:50 +02:00
  • 59aa23f133 avoid buffer overrun in getPhrase Brian Rasmusson 2017-10-01 18:04:56 +02:00
  • 4af9c0c586 We still need to sort to use set_difference Ai Lin Chia 2017-09-29 14:55:15 +02:00
  • 68817afae4 Separate differences whether it's old / new Ai Lin Chia 2017-09-29 14:48:51 +02:00
  • bb400aafcb Fix url unit test Ai Lin Chia 2017-09-29 11:57:01 +02:00
  • 0aead90647 Use two specific versions to generate links to avoid dumping links for parameters that are stripped in v124 Ai Lin Chia 2017-09-29 11:35:30 +02:00
  • 2dfe26e5a3 Dump the difference between two versions Ai Lin Chia 2017-09-29 11:29:53 +02:00
  • fed73f1bf8 First implementation of dump_badlinks (doesn't actually check for bad links yet. only dump all links) Ai Lin Chia 2017-09-29 10:50:03 +02:00
  • af7d835b50 Fix coverity warning (Resource leak) Ai Lin Chia 2017-09-28 17:35:20 +02:00
  • 4bd1ecf25a Merge branch 'dev-url' Ai Lin Chia 2017-09-28 17:16:07 +02:00
  • 4649553635 Make stripping of tabs/cr/lf work Ai Lin Chia 2017-09-28 17:03:32 +02:00
  • 1e71ec1201 Removed defunct member SpiderRequest::m_parentPrevSpiderTime Ivan Skytte Jørgensen 2017-09-28 16:21:54 +02:00
  • 8d1aeb55d3 Removed SpiderRequest/SpiderReply::m_isPingServer Ivan Skytte Jørgensen 2017-09-28 16:17:28 +02:00
  • a5c1cfaab0 Removed SpiderRequest::m_isWWWSubdomain Ivan Skytte Jørgensen 2017-09-28 16:08:29 +02:00
  • 358fe1ec56 Fix segfault from previous commit Ai Lin Chia 2017-09-28 16:06:04 +02:00
  • 469e214920 Strip tabs & newline from Url Ai Lin Chia 2017-09-28 15:48:55 +02:00
  • 539657d785 Enable Url validity check Ai Lin Chia 2017-09-28 12:12:59 +02:00
  • 0bc327ac51 Fix memory leak from RdbCache Ai Lin Chia 2017-09-28 12:06:06 +02:00
  • 8c333894b3 Fix valgrind error of reading uninitialized bytes (struct padding) Ai Lin Chia 2017-09-28 11:25:10 +02:00
  • bdc123190b Fix error from AddressSanitizer: stack-buffer-overflow Ai Lin Chia 2017-09-27 13:55:33 +02:00
  • 0fcf73cc70 Use serializeMsg/deserializeMsg instead Ai Lin Chia 2017-09-27 12:32:51 +02:00
  • b67384ddae First implementation of adding error code to HTTP response cache Ai Lin Chia 2017-09-27 12:02:34 +02:00
  • fc2bf1a205 Add Url validity check (currently disabled) Ai Lin Chia 2017-09-26 22:24:42 +02:00
  • 4d12dcc85d Moved Rdb initialization from main to Rdb.cpp in new function initialiseAllPrimaryRdbs() Ivan Skytte Jørgensen 2017-09-26 16:29:49 +02:00
  • d03334fd1b Removed 'h9' local variable Ivan Skytte Jørgensen 2017-09-26 16:03:03 +02:00
  • d7d38b8347 Removed commented-out code from main() Ivan Skytte Jørgensen 2017-09-26 16:00:08 +02:00
  • 4000f250da Removed static variable 's_recoveryLevel' from main.cpp Ivan Skytte Jørgensen 2017-09-26 15:42:26 +02:00
  • c86036d60a Abort if cmdline command is unknown Ivan Skytte Jørgensen 2017-09-26 15:36:30 +02:00
  • 898c37d1ed More stuff for dumpcsv Ivan Skytte Jørgensen 2017-09-26 15:05:10 +02:00
  • 1676365bb9 Fix gcc version check for fedora Ai Lin Chia 2017-09-26 13:39:42 +02:00
  • d07ce5c4df Made spider status #defines into strongly typed enum Ivan Skytte Jørgensen 2017-09-26 13:29:36 +02:00
  • 0e9cec3133 Changed getSpiderStatusMsg() from taking a SafeBuf* to a simple char** Ivan Skytte Jørgensen 2017-09-25 17:17:08 +02:00
  • fb691c97b1 Moved definition of UrlLock structure from Spider.h to SpiderLoop.cpp Ivan Skytte Jørgensen 2017-09-25 17:04:42 +02:00
  • 9a05e19e63 dumpcsv: m_isPingServer was printed twice Ivan Skytte Jørgensen 2017-09-25 15:35:33 +02:00
  • 22c6f52d4c Implemented 'gb dumpcsv s' Ivan Skytte Jørgensen 2017-09-25 15:20:57 +02:00
  • 469e2f1559 Made UrlRealtimeClassification finalization slightly safer Ivan Skytte Jørgensen 2017-09-25 14:16:50 +02:00
  • 676abf6650 Stop countdomains() from coredumping (still doesn't work though) Ivan Skytte Jørgensen 2017-09-25 13:48:34 +02:00
  • 70bc95d0c9 Simplified looping in dumpSpiderDb() Ivan Skytte Jørgensen 2017-09-25 12:50:27 +02:00
  • 3bdd79188d unwrap log lines in RdbCache.cpp so grepping for log text is easier Ivan Skytte Jørgensen 2017-09-22 16:47:10 +02:00
  • c4eff1c12c Show count of waitingtree nodes with spidertimems==0 Ivan Skytte Jørgensen 2017-09-22 16:22:24 +02:00
  • 4381f6a417 Set explicit dependency for Pages.o because the .d file is generated during compilation Ivan Skytte Jørgensen 2017-09-22 14:01:15 +02:00
  • 524d4c0d91 Fix GCC version checks. Was resulting in 'syntax error' from bc Ivan Skytte Jørgensen 2017-09-22 13:58:26 +02:00
  • fa61eea9d3 Changed spiderdblookup to use a bit of css. One step at a time Ivan Skytte Jørgensen 2017-09-21 17:18:49 +02:00
  • 3ca38a8cbb Removed old compatibility paths (/index.php, /cgi/0.cg, /search.csv, ...) Ivan Skytte Jørgensen 2017-09-21 16:32:02 +02:00
  • faa2b8777e Removed width limit on warning boxes so they don't take up so much vertical space Ivan Skytte Jørgensen 2017-09-21 16:20:22 +02:00
  • 47383aa267 Fix bug with checking for about: javascript: urls Ai Lin Chia 2017-09-21 16:07:43 +02:00
  • f2a0aec0f3 Fix length, we should compare 11 instead of 6 Ai Lin Chia 2017-09-21 16:00:19 +02:00
  • b85c30180c Added more trace log to track down where m_prevErrorCode=55 comes from when requesting spiderdb records remotely Ivan Skytte Jørgensen 2017-09-21 15:49:51 +02:00
  • a48c70ec4a fix commented-out safePrintf (random hotkey in kdevelop?) Ivan Skytte Jørgensen 2017-09-21 15:22:27 +02:00
  • cf95663c67 Code style changes Ai Lin Chia 2017-09-21 14:35:03 +02:00
  • 17d8b232d9 Skip expansion of iframe with src starting with about: & javascript: Ai Lin Chia 2017-09-21 13:58:34 +02:00
  • 5c062d5249 admui:spiderdblookup: don't show mm_discoveryTime (always garbage). Show more flag fields Ivan Skytte Jørgensen 2017-09-21 14:33:59 +02:00
  • dba5f9f2b4 href= value should be in quotes Ivan Skytte Jørgensen 2017-09-21 14:21:02 +02:00
  • de074de6c7 admui:spiderdblookup: show shard and host number Ivan Skytte Jørgensen 2017-09-21 13:52:32 +02:00
  • 85ca55c5bd Use dumpfullversion only for g++ 7 Ai Lin Chia 2017-09-21 12:17:31 +02:00
  • e98d4cb022 Use fullversion instead of version Ai Lin Chia 2017-09-21 12:07:08 +02:00
  • e6d1711e07 Clear ptr passed in to avoid using old value Ai Lin Chia 2017-09-20 16:58:59 +02:00
  • 5d313d04b9 We don't need to call reset after declaring SpiderRequest/SpiderReply. It's now called in the constructor Ai Lin Chia 2017-09-20 12:31:56 +02:00
  • fb71be8f25 Use KEYNEG instead of checking manually Ai Lin Chia 2017-09-20 12:07:32 +02:00
  • 69d0f06077 Code style changes Ai Lin Chia 2017-09-20 12:07:15 +02:00
  • 8c65d3c4f4 Enable more warnings for g++ 7.1 Ai Lin Chia 2017-09-20 10:50:51 +02:00
  • b5f0691840 Merge conditions Ai Lin Chia 2017-09-19 15:59:25 +02:00
  • e66a473bcb spiderdb-lookup: format timestamps more nicely Ivan Skytte Jørgensen 2017-09-19 16:03:45 +02:00
  • cf7257b53b Enable install of file to a range of hosts Ai Lin Chia 2017-09-19 15:53:34 +02:00
  • 2349c8edf1 msg0 should not treat spiderdb reads as local if the host is not a spider host Ivan Skytte Jørgensen 2017-09-19 15:45:48 +02:00
  • d99815d902 Fix double-free in spiderdb-lookup Ivan Skytte Jørgensen 2017-09-19 15:30:24 +02:00
  • 6b97482e77 First shot at spiderdb-lookup UI Ivan Skytte Jørgensen 2017-09-19 15:11:32 +02:00
  • c1b2d98a9b Added overloads to Spiderdb::makeFirstKey/makeLastKey() so urlHash28 can be specified Ivan Skytte Jørgensen 2017-09-19 13:59:08 +02:00
  • 561c18b54f #include cleanup in Tagdb.h Ivan Skytte Jørgensen 2017-09-19 13:03:52 +02:00
  • a416c88cb9 tag: more const Ivan Skytte Jørgensen 2017-09-19 13:00:50 +02:00
  • b8abe1b367 We don't support comments for UrlMatchHostList Ai Lin Chia 2017-09-19 12:51:33 +02:00
  • 4dcc57ebb6 More logs Ai Lin Chia 2017-09-19 12:49:35 +02:00
  • 2d5155d9a0 Removed unused SiteGetter::m_timestamp Ivan Skytte Jørgensen 2017-09-19 12:41:23 +02:00
  • eea0db20fd Don't force a merge when we're resuming. It will cause a second unnecessary merge to start Ai Lin Chia 2017-09-19 12:31:02 +02:00
  • 62fbbd432e Modify log lines Ai Lin Chia 2017-09-19 12:30:47 +02:00
  • 838645b9ae Don't expand iframe url if src is blocked Ai Lin Chia 2017-09-19 12:13:18 +02:00
  • bc80ae86cd Remove config files after test Ai Lin Chia 2017-09-19 11:52:37 +02:00
  • 681b89610b Use pthread_cond_timedwait instead of sleep so we can shutdown without waiting for sleep to complete Ai Lin Chia 2017-09-19 11:19:14 +02:00
  • 02a9cef7be Don't index words between iframe tags Ai Lin Chia 2017-09-18 18:56:46 +02:00
  • 958dfb0df1 Add SpiderdbHostDelete feature Ai Lin Chia 2017-09-18 16:20:12 +02:00
  • ec5f57908b Increase Mem::m_memtablesize (we're running out of slots when using std::set or sparse_hash_set with large dataset) Ai Lin Chia 2017-09-18 13:14:21 +02:00
  • 88db632af9 Make s_lastModifiedTime static, as it should have been Ai Lin Chia 2017-09-18 13:13:57 +02:00
  • 9a44b558cd Remove unused include Ai Lin Chia 2017-09-15 16:09:26 +02:00
  • 7fb8e24200 Add sparsepp as third-party project Ai Lin Chia 2017-09-15 16:08:55 +02:00
  • f044dade0a Removed unused members from WebPage Ivan Skytte Jørgensen 2017-09-18 16:06:18 +02:00
  • 4d1c7844e9 Made WebPage::m_use_post more typesafe, and found a bug/missing code by that Ivan Skytte Jørgensen 2017-09-18 15:45:10 +02:00
  • a49354f6d2 Removed obsolete comment Ivan Skytte Jørgensen 2017-09-18 15:27:07 +02:00
  • d3b705b82e Removed 'datedb date' from titledb dump page Ivan Skytte Jørgensen 2017-09-18 14:57:58 +02:00
  • f789228d98 Removed a few unused css styles Ivan Skytte Jørgensen 2017-09-18 14:47:26 +02:00
  • aad1cc71f0 Removed last traces of name+token support (used by crawlbot/diffbot) Ivan Skytte Jørgensen 2017-09-18 14:35:28 +02:00
  • f15a2838bb Moved httprequest::getString() call in Collectiondb::getDefaultColl() to caller point Ivan Skytte Jørgensen 2017-09-18 14:27:27 +02:00
  • f3143660ec aarrghhh. why is git pedantic about merges Ivan Skytte Jørgensen 2017-09-15 14:49:01 +02:00
  • 3c6e49e3c6 Fix potential infinite loop when multiple canonical links are found Ai Lin Chia 2017-09-15 12:05:51 +02:00
  • 22f9f84ec8 Make sure we store empty document for simplified redirect & non-canonical urls Ai Lin Chia 2017-09-14 13:53:20 +02:00
  • 3414b9e0b6 We're only canonical url if it's present Ai Lin Chia 2017-09-14 10:58:53 +02:00
  • c0d36f9ec0 Rename EDOCBLOCKEDSHLICONTENT to EDOCBLOCKEDSHLIBCONTENT Ai Lin Chia 2017-09-12 16:45:20 +02:00
  • 1aa397212e Various bug fixes on canonical url - canonical url with base url - canonical url that redirects Ai Lin Chia 2017-09-12 16:37:28 +02:00
  • 6e55e99389 Fix left-over debug log for shlib content blocking Ivan Skytte Jørgensen 2017-09-12 16:27:53 +02:00
  • 7b6ba45c27 wantedcheck shlib: check single content, example with cellery Ivan Skytte Jørgensen 2017-09-12 16:24:40 +02:00
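
Commits 4af9c0c586 and 68817afae4 in the list above refer to sorting before calling set_difference and to reporting old and new differences separately. The sketch below only illustrates why the sort is needed: std::set_difference expects both input ranges to be sorted. It is not the repository's code. The helper name linksOnlyIn and the example URLs are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the repository): returns the entries
// present in `from` but absent from `other`.
// std::set_difference requires both input ranges to be sorted, so the
// by-value copies are sorted before the call.
std::vector<std::string> linksOnlyIn(std::vector<std::string> from,
                                     std::vector<std::string> other) {
    std::sort(from.begin(), from.end());
    std::sort(other.begin(), other.end());

    std::vector<std::string> result;
    std::set_difference(from.begin(), from.end(),
                        other.begin(), other.end(),
                        std::back_inserter(result));
    return result;
}
```

Calling the helper twice with the arguments swapped gives the two directions separately: linksOnlyIn(oldLinks, newLinks) yields links that exist only in the old version, and linksOnlyIn(newLinks, oldLinks) yields links that exist only in the new one.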