81f72b13e0
Encapsulate RdbTree
2016-08-31 11:13:31 +02:00
af48ba7e17
Remove always true dedup from Rdb::init
2016-08-31 11:13:31 +02:00
b0d66c7eb4
Remove always true isTreeBalanced from Rdb::init
2016-08-31 11:13:31 +02:00
dcb2977f3c
Remove unused m_preloadCache, m_biasDiskPageCache from Rdb
2016-08-31 11:13:31 +02:00
01c655dd5b
Remove unused loadFromDiskCache & pc from Rdb::init
2016-08-31 11:13:31 +02:00
230e552393
Remove unused maxCacheMem & maxCacheNodes from Rdb::init
2016-08-31 11:13:31 +02:00
e849524fa4
constness; and removed unused functions in Spider.*
2016-08-25 16:50:18 +02:00
08ecbe08da
make declaration+definition of static functions in Spider.cpp consistent
2016-08-25 16:42:35 +02:00
cbe99cbbb4
put State11 in anonymous namespace
2016-08-25 16:40:31 +02:00
171916b0f5
Removed unused function doesStringContainPattern()
2016-08-16 00:13:10 +02:00
13840a06f7
Add constness to some tld/domain/url functions
2016-08-11 12:58:51 +02:00
a9f1d37cbc
#include cleanup of PingServer.h (XmlDoc.h was including it for no reason)
2016-08-10 13:07:40 +02:00
5c73e9e611
More thread safety: gmtime() -> gmtime_r(), asctime() -> asctime_r()
2016-08-09 16:44:39 +02:00
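A minimal sketch of the kind of change the commit above describes, assuming a POSIX platform. The helper name `formatUtc` is hypothetical and does not come from the codebase. `gmtime()` and `asctime()` return pointers to shared static storage, so concurrent callers can overwrite each other's results; the `_r` variants write into buffers supplied by the caller.

```cpp
#include <string>
#include <time.h>

// Hypothetical helper (not from the codebase): formats a timestamp as UTC
// using only the reentrant POSIX variants.
std::string formatUtc(time_t t) {
    struct tm tmBuf;   // caller-owned storage instead of gmtime()'s static buffer
    char buf[26];      // asctime_r() requires a buffer of at least 26 bytes

    if (gmtime_r(&t, &tmBuf) == nullptr) {
        return std::string();
    }
    if (asctime_r(&tmBuf, buf) == nullptr) {
        return std::string();
    }
    return std::string(buf);  // includes the trailing newline added by asctime_r()
}
```

Each thread works on its own `tmBuf` and `buf`, so no locking is needed around the conversion.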
e6f510c594
Removed Msg5::m_addToCache (and associated parameter in getList())
2016-08-04 12:36:38 +02:00
4e4e3371e6
Log function will now return void instead of a boolean
2016-08-01 18:12:10 +02:00
8e4254f52c
Removed default parameter values from Msg5::getList() (variant #2)
2016-08-01 13:49:48 +02:00
d6796099bb
Removed default parameter values from Msg5::getList() (variant #1)
2016-08-01 13:34:10 +02:00
3c2773a928
Removed gbstrlen()
gbstrlen() just checked for NULL and called gbshutdownAbort(). Dereferencing NULL on modern platforms causes a SIGSEGV, which is caught by our signal handler, and ... gbshutdownAbort() is called. So gbstrlen() was superfluous and complicated static analysis.
2016-07-28 17:04:35 +02:00
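To illustrate the reasoning in the commit above, here is a rough sketch of what such a wrapper might have looked like. The name `checkedStrlen`, its signature, and the use of `std::abort()` as a stand-in for gbshutdownAbort() are all assumptions, not the original code.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical reconstruction of a gbstrlen()-style wrapper (assumed shape,
// not the original source): abort on NULL, otherwise defer to strlen().
static inline int checkedStrlen(const char *s) {
    if (s == nullptr) {
        std::abort();  // stand-in for gbshutdownAbort()
    }
    return static_cast<int>(std::strlen(s));
}
```

For valid input the wrapper and plain `strlen()` agree, so call sites can use `strlen()` directly, which static analyzers already understand.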
27d90d89d1
Fix compilation error
2016-07-26 15:30:58 +02:00
907bd15003
Only cleanup spiderdb when we're compiling with PRIVACORE_SAFE_VERSION (tmp code to clear out unwanted tld)
2016-07-26 15:19:01 +02:00
89a247c110
Modify log line to be more similar so we can grep all urls that are unwanted for indexing
2016-07-26 15:19:01 +02:00
37d203f540
Add privacore tld blacklist
2016-07-26 15:19:01 +02:00
fd6e8cbb21
Remove more qa specific code
2016-07-26 15:19:01 +02:00
35d4becbbc
Remove g_stats.addSpiderPoint and related spider statistics
2016-07-25 17:03:51 +02:00
e9eee473dc
Remove diffbot specific code. -> maxcrawl, maxprocess, crawl pattern, crawl done notification, etc...
2016-07-22 13:59:55 +02:00
594c6194fd
Remove unused regex structure & related safebuf
2016-07-22 13:59:55 +02:00
570838d7f6
Remove m_isCustomCrawl & logic surrounding it
2016-07-22 13:59:55 +02:00
14c0f874dd
Better encapsulation of Msg5
2016-07-22 13:40:55 +02:00
d9b0e10335
#include cleanup
2016-07-11 12:48:53 +02:00
df8e9f9d17
Remove code to clean out existing spiderdb records that should not be inserted anymore. Spiderdb on all shards has been merged.
2016-07-08 15:51:40 +02:00
d6da804e5d
Remove some EDIFFBOT* errors. g_errno is never set to them
2016-06-29 16:48:50 +02:00
fad7c45282
Remove Msg12
2016-06-29 12:34:23 +02:00
76dc08637f
Code style changes
2016-06-29 12:05:02 +02:00
81e3af7cc3
Fix compilation warnings
2016-06-22 15:22:40 +02:00
f3f5eefcb6
First batch of changes streamlining emergency shutdown code
2016-06-20 12:30:26 +02:00
62df7d2808
Remove diffbot specific matchesucp & matchesupp url filters
2016-06-10 14:16:27 +02:00
4b5320a98a
Rename lists to a more accurate name (spiderRequests)
2016-06-09 18:11:01 +02:00
491a79ecae
Remove commented out code
2016-06-09 16:28:13 +02:00
a17f6b38f1
Simplify logic (remove goto)
2016-06-09 16:16:49 +02:00
6fc8f508cd
Replace class Link with stl list
2016-06-09 16:04:08 +02:00
dd9f4063fd
Code style changes & remove commented out code
2016-06-09 11:14:44 +02:00
4ef94d7e28
Clean up unwanted existing spiderdb records (that are not inserted anymore)
2016-06-08 16:12:59 +02:00
3d29f7ac87
Remove isnewoutlink
2016-06-08 12:09:39 +02:00
994e0777b5
Remove redundant ismedia url filter. We're not inserting media urls into spiderdb anymore
2016-06-08 11:32:55 +02:00
7e432a14a1
Remove menuoutlink
2016-06-07 23:15:17 +02:00
4c99ef3d2e
Remove wasparentsindexed
2016-06-07 23:07:20 +02:00
52f7bd856a
Remove parentlang support from url filters
2016-06-07 13:35:05 +02:00
3f234a483d
Remove samedom/samesite/samehost
2016-06-07 13:35:05 +02:00
dbfd6c2e68
Remove support for isparentpingserver from url filters.
2016-06-03 16:31:30 +02:00
e10bd93482
Remove support for isparentsitemap from url filters.
2016-06-03 16:31:29 +02:00