248 Commits

Author SHA1 Message Date
Ivan Skytte Jørgensen
f6e8d992ae Moved serialization/deserialization functions to separate file 2018-07-30 13:07:02 +02:00
Ivan Skytte Jørgensen
beeddcf35d Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Ivan Skytte Jørgensen
53b9973d2f Changed calls to gbmemcpy() where it was obvious if memcpy or memmove were applicable 2018-07-26 16:19:54 +02:00
Ai Lin Chia
81cd4ac0be Don't retry again if we've already tried it once 2018-06-03 21:34:49 +02:00
Ai Lin Chia
822ef732c9 Log both original url & redirected url 2018-06-03 21:28:04 +02:00
Ai Lin Chia
c4a4fe47b2 Fix segfault when ts is null 2018-06-01 20:26:03 +02:00
Ai Lin Chia
850b7962f8 Log url that is being 'detected' to be retried to proxy 2018-05-31 16:55:09 +02:00
Ai Lin Chia
07e41027da More fixes for redirects proxy 2018-05-31 15:57:46 +02:00
Ai Lin Chia
c206e0c191 When using getLocation base url needs to be set 2018-05-31 15:29:00 +02:00
Ai Lin Chia
253a7cf736 Add msg when returning true 2018-05-31 12:46:44 +02:00
Ai Lin Chia
dcb7aa46ee Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
Ai Lin Chia
a990000fe4 Retry when redirected url match urlretryproxylist 2018-05-31 11:04:42 +02:00
Ai Lin Chia
c6c4c2ba2a Add urlproxylist.txt to decide if a url should be spidered using a proxy or not 2018-05-30 12:21:24 +02:00
Ivan Skytte Jørgensen
9dee894cae bugfix: Use mfree() instead of delete for copied-out data from RdbCache 2018-05-18 16:38:51 +02:00
Ai Lin Chia
cc0613481f Remove hop count (not stored in sqlite based spiderdb) 2018-02-02 15:50:03 +01:00
Ai Lin Chia
02a127a5d2 Add check to see if ts is nullptr before using it 2018-01-26 16:38:11 +01:00
Ivan Skytte Jørgensen
9a5b52fa8a Removed superfluous null-check (coverity) 2017-12-01 16:41:47 +01:00
Ai Lin Chia
3058981dae Make sure EDOCBADCONTENTTYPE doesn't return as EDOCTOOBIG 2017-11-20 15:22:47 +01:00
Ai Lin Chia
c3fe374641 Initial bug fix of too big non html doc 2017-11-19 22:32:19 +01:00
Ai Lin Chia
563670d76f Remove commented out code 2017-11-19 16:09:40 +01:00
Ivan Skytte Jørgensen
32d68c0322 Removed 'dataKeySize' parameter from RdbCache::init() 2017-10-27 16:35:18 +02:00
Ivan Skytte Jørgensen
0c81208e1c RdbCache: removed default values for 3 parameters to init()
The default values made it difficult to see where changes have an effect.
2017-10-27 16:29:38 +02:00
Ivan Skytte Jørgensen
96c749d79b Removed unused 'useHalfKeys' parameter+member from RdbCache 2017-10-20 15:03:05 +02:00
Ivan Skytte Jørgensen
59b28df56e Removed unused 'supportLists' parameter+member from RdbCache 2017-10-20 14:41:16 +02:00
Brian Rasmusson
3f48a6e425 fix crash when viewing Page Info in a setup with no spider hosts. getHostIdWithSpideringEnabled shut down if no spider host was found - now a new function param specifies if a spider host is required or not 2017-10-09 11:25:17 +02:00
Ai Lin Chia
0bc327ac51 Fix memory leak from RdbCache 2017-09-28 12:06:06 +02:00
Ai Lin Chia
8c333894b3 Fix valgrind error of reading uninitialized bytes (struct padding) 2017-09-28 11:25:10 +02:00
Ai Lin Chia
0fcf73cc70 Use serializeMsg/deserializeMsg instead 2017-09-27 12:32:51 +02:00
Ai Lin Chia
b67384ddae First implementation of adding error code to HTTP response cache 2017-09-27 12:02:34 +02:00
Ivan Skytte Jørgensen
dba5f9f2b4 href= value should be in quotes 2017-09-21 14:21:02 +02:00
Ai Lin Chia
5d313d04b9 We don't need to call reset after declaring SpiderRequest/SpiderReply. It's now called in the constructor 2017-09-21 12:11:51 +02:00
Ai Lin Chia
c0d36f9ec0 Rename EDOCBLOCKEDSHLICONTENT to EDOCBLOCKEDSHLIBCONTENT 2017-09-12 16:45:20 +02:00
Ivan Skytte Jørgensen
6e55e99389 Fix left-over debug log for shlib content blocking 2017-09-12 16:27:57 +02:00
Ivan Skytte Jørgensen
7b6ba45c27 wantedcheck shlib: check single content, example with cellery 2017-09-12 16:24:40 +02:00
Ivan Skytte Jørgensen
6a5a1b4f9e Detect Wordfence captcha 2017-09-08 14:12:18 +02:00
Ivan Skytte Jørgensen
72a58f1b2b Keep statistics on crawl bans 2017-09-08 13:18:45 +02:00
Ivan Skytte Jørgensen
d1e8fededa Append detected blocks/captchas to crawlban.* files 2017-09-08 12:28:26 +02:00
Ivan Skytte Jørgensen
7a979593ab Explain crawl-ban better 2017-09-08 12:28:26 +02:00
Ivan Skytte Jørgensen
1bbe5e6d77 Detect Distil networks captcha-blocks 2017-09-07 16:20:06 +02:00
Ivan Skytte Jørgensen
9f1c2f80ac Detect blocked-by-cloudflare and avoid deleting the already-indexed document 2017-09-07 14:21:37 +02:00
Ivan Skytte Jørgensen
99e6b7bdd9 Msg13.cpp: move #includes to top of file 2017-09-05 15:18:13 +02:00
Ai Lin Chia
cc0967510b Add logging when loop callback hit time threshold. Remove some unused function, remove undefined function (only defined in header) 2017-05-30 12:12:32 +02:00
Ai Lin Chia
5ced4d237a Reset Msg13::m_replyBufSize & Msg13::m_replyBufAllocSize when Msg13::m_replyBuf size is set to NULL 2017-05-17 12:31:04 +02:00
Ai Lin Chia
22b567617c Reset m_readBufMaxSize & m_readBufSize whenever m_readBuf is set to NULL 2017-05-17 12:20:54 +02:00
Ivan Skytte Jørgensen
7db9c4354e Removed non-reentrant version of iptoa()
Mass-change. In many places it could have been done in a better way (e.g. calculate a nice name for the UdpSlot peer once and not for every log line).
2017-05-10 17:54:00 +02:00
Ivan Skytte Jørgensen
45ad44939a Catch std::bad_alloc and not '...' 2017-05-07 20:51:33 +02:00
Ai Lin Chia
0f0e92ea0f Fix infinite loop in commit fb1e1ac611 2017-04-12 16:46:29 +02:00
Ivan Skytte Jørgensen
4027028bbd Fix comments where an ancient mass-replace had changed 'int' and 'long' to int32_t in comments 2017-04-11 14:18:45 +02:00
Ivan Skytte Jørgensen
fb1e1ac611 goto -> for() 2017-04-11 14:12:45 +02:00
Ivan Skytte Jørgensen
f69b76bb20 goto -> for() 2017-04-11 13:36:05 +02:00