
598 Commits

SHA1 Message Date
81f72b13e0 Encapsulate RdbTree 2016-08-31 11:13:31 +02:00
af48ba7e17 Remove always true dedup from Rdb::init 2016-08-31 11:13:31 +02:00
b0d66c7eb4 Remove always true isTreeBalanced from Rdb::init 2016-08-31 11:13:31 +02:00
dcb2977f3c Remove unused m_preloadCache, m_biasDiskPageCache from Rdb 2016-08-31 11:13:31 +02:00
01c655dd5b Remove unused loadFromDiskCache & pc from Rdb::init 2016-08-31 11:13:31 +02:00
230e552393 Remove unused maxCacheMem & maxCacheNodes from Rdb::init 2016-08-31 11:13:31 +02:00
e849524fa4 Constness, and removed unused functions in Spider.* 2016-08-25 16:50:18 +02:00
08ecbe08da make declaration+definition of static functions in Spider.cpp consistent 2016-08-25 16:42:35 +02:00
cbe99cbbb4 put State11 in anonymous namespace 2016-08-25 16:40:31 +02:00
171916b0f5 Removed unused function doesStringContainPattern() 2016-08-16 00:13:10 +02:00
13840a06f7 Add constness to some tld/domain/url functions 2016-08-11 12:58:51 +02:00
a9f1d37cbc #include cleanup of PingServer.h (XmlDoc.h was including it for no reason) 2016-08-10 13:07:40 +02:00
5c73e9e611 More thread safety: gmtime() -> gmtime_r(), asctime() -> asctime_r() 2016-08-09 16:44:39 +02:00
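The thread-safety change in 5c73e9e611 can be illustrated with a minimal sketch. gmtime() and asctime() return pointers to static storage shared by every thread, so concurrent calls can overwrite each other's results; the POSIX re-entrant variants gmtime_r() and asctime_r() write into buffers owned by the caller instead. The helper name `formatUtc` is hypothetical and is not taken from the codebase.

```cpp
#include <string>
#include <time.h>   // POSIX gmtime_r(), asctime_r()

// Hypothetical helper: format a UTC timestamp without using the static
// buffers that gmtime()/asctime() share between threads.
std::string formatUtc(time_t t) {
    struct tm tmBuf;                      // caller-owned broken-down time
    if (gmtime_r(&t, &tmBuf) == nullptr) {
        return std::string();             // conversion failed
    }
    char buf[26];                         // asctime_r() needs at least 26 bytes
    if (asctime_r(&tmBuf, buf) == nullptr) {
        return std::string();
    }
    std::string s(buf);
    if (!s.empty() && s.back() == '\n') { // asctime format ends with '\n'
        s.pop_back();
    }
    return s;
}
```

For example, `formatUtc(0)` yields `"Thu Jan  1 00:00:00 1970"`, asctime's fixed-width format with the trailing newline stripped. Because each call uses only stack buffers, two threads can call it concurrently without interfering.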
e6f510c594 Removed Msg5::m_addToCache (and associated parameter in getList()) 2016-08-04 12:36:38 +02:00
4e4e3371e6 Log function will now return void instead of a boolean 2016-08-01 18:12:10 +02:00
8e4254f52c Removed default parameter values from Msg5::getList() (variant ) 2016-08-01 13:49:48 +02:00
d6796099bb Removed default parameter values from Msg5::getList() (variant ) 2016-08-01 13:34:10 +02:00
3c2773a928 Removed gbstrlen() 2016-07-28 17:04:35 +02:00
gbstrlen() just checked for NULL and called gbshutdownAbort(). Dereferencing NULL on modern platforms causes a SIGSEGV, which is caught by our signal handler and... gbshutdownAbort() is called. So gbstrlen() was superfluous and complicated static analysis.
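The reasoning behind removing gbstrlen() can be sketched as follows. This is a hypothetical reconstruction, not the removed code: `checkedStrlen` and `plainLength` are invented names, and `std::abort()` stands in for the project's gbshutdownAbort(). For non-NULL input both functions return the same length; for NULL input the process ends either way, explicitly in the wrapper, or in practice through SIGSEGV and the signal handler for plain strlen().

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical NULL-checking wrapper of the kind the commit removed.
// std::abort() stands in for the project's gbshutdownAbort().
static std::size_t checkedStrlen(const char *s) {
    if (s == nullptr) {
        std::abort();          // explicit shutdown on a NULL argument
    }
    return std::strlen(s);     // otherwise identical to strlen()
}

// After the cleanup, call sites use strlen() directly. Passing NULL is
// undefined behaviour; on typical platforms it faults with SIGSEGV,
// which a process-wide signal handler can turn into a shutdown.
static std::size_t plainLength(const char *s) {
    return std::strlen(s);
}
```

The extra branch in the wrapper buys nothing beyond what the signal handler already provides, which is why dropping it simplifies the code that static analyzers have to reason about.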
27d90d89d1 Fix compilation error 2016-07-26 15:30:58 +02:00
907bd15003 Only cleanup spiderdb when we're compiling with PRIVACORE_SAFE_VERSION (tmp code to clear out unwanted tld) 2016-07-26 15:19:01 +02:00
89a247c110 Modify log line to be more similar so we can grep all urls that are unwanted for indexing 2016-07-26 15:19:01 +02:00
37d203f540 Add privacore tld blacklist 2016-07-26 15:19:01 +02:00
fd6e8cbb21 Remove more qa specific code 2016-07-26 15:19:01 +02:00
35d4becbbc Remove g_stats.addSpiderPoint and related spider statistics 2016-07-25 17:03:51 +02:00
e9eee473dc Remove diffbot specific code. -> maxcrawl, maxprocess, crawl pattern, crawl done notification, etc... 2016-07-22 13:59:55 +02:00
594c6194fd Remove unused regex structure & related safebuf 2016-07-22 13:59:55 +02:00
570838d7f6 Remove m_isCustomCrawl & logic surrounding it 2016-07-22 13:59:55 +02:00
14c0f874dd Better encapsulation of Msg5 2016-07-22 13:40:55 +02:00
d9b0e10335 #include cleanup 2016-07-11 12:48:53 +02:00
df8e9f9d17 Remove code to clean out existing spiderdb records that should not be inserted anymore. Spiderdb on all shards have been merged. 2016-07-08 15:51:40 +02:00
d6da804e5d Remove some EDIFFBOT* errors. g_errno is never set to them 2016-06-29 16:48:50 +02:00
fad7c45282 Remove Msg12 2016-06-29 12:34:23 +02:00
76dc08637f Code style changes 2016-06-29 12:05:02 +02:00
81e3af7cc3 Fix compilation warnings 2016-06-22 15:22:40 +02:00
f3f5eefcb6 First batch of changes streamlining emergency shutdown code 2016-06-20 12:30:26 +02:00
62df7d2808 Remove diffbot specific matchesucp & matchesupp url filters 2016-06-10 14:16:27 +02:00
4b5320a98a Rename lists to a more accurate name (spiderRequests) 2016-06-09 18:11:01 +02:00
491a79ecae Remove commented out code 2016-06-09 16:28:13 +02:00
a17f6b38f1 Simplify logic (remove goto) 2016-06-09 16:16:49 +02:00
6fc8f508cd Replace class Link with stl list 2016-06-09 16:04:08 +02:00
dd9f4063fd Code style changes & remove commented out code 2016-06-09 11:14:44 +02:00
4ef94d7e28 Clean up unwanted existing spiderdb records (that are not inserted anymore) 2016-06-08 16:12:59 +02:00
3d29f7ac87 Remove isnewoutlink 2016-06-08 12:09:39 +02:00
994e0777b5 Remove redundant ismedia url filter. We're not inserting media urls into spiderdb anymore 2016-06-08 11:32:55 +02:00
7e432a14a1 Remove menuoutlink 2016-06-07 23:15:17 +02:00
4c99ef3d2e Remove wasparentsindexed 2016-06-07 23:07:20 +02:00
52f7bd856a Remove parentlang support from url filters 2016-06-07 13:35:05 +02:00
3f234a483d Remove samedom/samesite/samehost 2016-06-07 13:35:05 +02:00
dbfd6c2e68 Remove support for isparentpingserver from url filters. 2016-06-03 16:31:30 +02:00
e10bd93482 Remove support for isparentsitemap from url filters. 2016-06-03 16:31:29 +02:00