Ivan Skytte Jørgensen
|
f6e8d992ae
|
Moved serialization/deserialization functions to separate file
|
2018-07-30 13:07:02 +02:00 |
|
Ivan Skytte Jørgensen
|
beeddcf35d
|
Got rid of gb-include.h
|
2018-07-26 17:29:51 +02:00 |
|
Ivan Skytte Jørgensen
|
53b9973d2f
|
Changed calls to gbmemcpy() where it was obvious if memcpy or memmove were applicable
|
2018-07-26 16:19:54 +02:00 |
|
Ai Lin Chia
|
81cd4ac0be
|
Don't retry again if we've already tried it once
|
2018-06-03 21:34:49 +02:00 |
|
Ai Lin Chia
|
822ef732c9
|
Log both original url & redirected url
|
2018-06-03 21:28:04 +02:00 |
|
Ai Lin Chia
|
c4a4fe47b2
|
Fix segfault when ts is null
|
2018-06-01 20:26:03 +02:00 |
|
Ai Lin Chia
|
850b7962f8
|
Log url that is being 'detected' to be retried to proxy
|
2018-05-31 16:55:09 +02:00 |
|
Ai Lin Chia
|
07e41027da
|
More fixes for redirects proxy
|
2018-05-31 15:57:46 +02:00 |
|
Ai Lin Chia
|
c206e0c191
|
When using getLocation base url needs to be set
|
2018-05-31 15:29:00 +02:00 |
|
Ai Lin Chia
|
253a7cf736
|
Add msg when returning true
|
2018-05-31 12:46:44 +02:00 |
|
Ai Lin Chia
|
dcb7aa46ee
|
Add g_contentRetryProxyList & rename BlockList to MatchList
|
2018-05-31 12:44:20 +02:00 |
|
Ai Lin Chia
|
a990000fe4
|
Retry when redirected url match urlretryproxylist
|
2018-05-31 11:04:42 +02:00 |
|
Ai Lin Chia
|
c6c4c2ba2a
|
Add urlproxylist.txt to decide if a url should be spidered using a proxy or not
|
2018-05-30 12:21:24 +02:00 |
|
Ivan Skytte Jørgensen
|
9dee894cae
|
bugfix: Use mfree() instead of delete for copied-out data from RdbCache
|
2018-05-18 16:38:51 +02:00 |
|
Ai Lin Chia
|
cc0613481f
|
Remove hop count (not stored in sqlite based spiderdb)
|
2018-02-02 15:50:03 +01:00 |
|
Ai Lin Chia
|
02a127a5d2
|
Add check to see if ts is nullptr before using it
|
2018-01-26 16:38:11 +01:00 |
|
Ivan Skytte Jørgensen
|
9a5b52fa8a
|
Removed superfluous null-check (coverity)
|
2017-12-01 16:41:47 +01:00 |
|
Ai Lin Chia
|
3058981dae
|
Make sure EDOCBADCONTENTTYPE doesn't return as EDOCTOOBIG
|
2017-11-20 15:22:47 +01:00 |
|
Ai Lin Chia
|
c3fe374641
|
Initial bug fix of too big non html doc
|
2017-11-19 22:32:19 +01:00 |
|
Ai Lin Chia
|
563670d76f
|
Remove commented out code
|
2017-11-19 16:09:40 +01:00 |
|
Ivan Skytte Jørgensen
|
32d68c0322
|
Removed 'dataKeySize' parameter from RdbCache::init()
|
2017-10-27 16:35:18 +02:00 |
|
Ivan Skytte Jørgensen
|
0c81208e1c
|
RdbCache: removed default values for 3 parameters to init()
The default values made it difficult to see where changes have an effect.
|
2017-10-27 16:29:38 +02:00 |
|
Ivan Skytte Jørgensen
|
96c749d79b
|
Removed unused 'useHalfKeys' parameter+member from RdbCache
|
2017-10-20 15:03:05 +02:00 |
|
Ivan Skytte Jørgensen
|
59b28df56e
|
Removed unused 'supportLists' parameter+member from RdbCache
|
2017-10-20 14:41:16 +02:00 |
|
Brian Rasmusson
|
3f48a6e425
|
fix crash when viewing Page Info in a setup with no spider hosts. getHostIdWithSpideringEnabled shut down if no spider host was found - now a new function param specifies if a spider host is required or not
|
2017-10-09 11:25:17 +02:00 |
|
Ai Lin Chia
|
0bc327ac51
|
Fix memory leak from RdbCache
|
2017-09-28 12:06:06 +02:00 |
|
Ai Lin Chia
|
8c333894b3
|
Fix valgrind error of reading uninitialized bytes (struct padding)
|
2017-09-28 11:25:10 +02:00 |
|
Ai Lin Chia
|
0fcf73cc70
|
Use serializeMsg/deserializeMsg instead
|
2017-09-27 12:32:51 +02:00 |
|
Ai Lin Chia
|
b67384ddae
|
First implementation of adding error code to HTTP response cache
|
2017-09-27 12:02:34 +02:00 |
|
Ivan Skytte Jørgensen
|
dba5f9f2b4
|
href= value should be in quotes
|
2017-09-21 14:21:02 +02:00 |
|
Ai Lin Chia
|
5d313d04b9
|
We don't need to call reset after declaring SpiderRequest/SpiderReply. It's now called in the constructor
|
2017-09-21 12:11:51 +02:00 |
|
Ai Lin Chia
|
c0d36f9ec0
|
Rename EDOCBLOCKEDSHLICONTENT to EDOCBLOCKEDSHLIBCONTENT
|
2017-09-12 16:45:20 +02:00 |
|
Ivan Skytte Jørgensen
|
6e55e99389
|
Fix left-over debug log for shlib content blocking
|
2017-09-12 16:27:57 +02:00 |
|
Ivan Skytte Jørgensen
|
7b6ba45c27
|
wantedcheck shlib: check single content, example with cellery
|
2017-09-12 16:24:40 +02:00 |
|
Ivan Skytte Jørgensen
|
6a5a1b4f9e
|
Detect Wordfence captcha
|
2017-09-08 14:12:18 +02:00 |
|
Ivan Skytte Jørgensen
|
72a58f1b2b
|
Keep statistics on crawl bans
|
2017-09-08 13:18:45 +02:00 |
|
Ivan Skytte Jørgensen
|
d1e8fededa
|
Append detected blocks/captchas to crawlban.* files
|
2017-09-08 12:28:26 +02:00 |
|
Ivan Skytte Jørgensen
|
7a979593ab
|
Explain crawl-ban better
|
2017-09-08 12:28:26 +02:00 |
|
Ivan Skytte Jørgensen
|
1bbe5e6d77
|
Detect Distil networks captcha-blocks
|
2017-09-07 16:20:06 +02:00 |
|
Ivan Skytte Jørgensen
|
9f1c2f80ac
|
Detect blocked-by-cloudflare and avoid deleting the already-indexed document
|
2017-09-07 14:21:37 +02:00 |
|
Ivan Skytte Jørgensen
|
99e6b7bdd9
|
Msg13.cpp: move #includes to top of file
|
2017-09-05 15:18:13 +02:00 |
|
Ai Lin Chia
|
cc0967510b
|
Add logging when loop callback hits time threshold. Remove some unused functions, remove undefined function (only declared in header)
|
2017-05-30 12:12:32 +02:00 |
|
Ai Lin Chia
|
5ced4d237a
|
Reset Msg13::m_replyBufSize & Msg13::m_replyBufAllocSize when Msg13::m_replyBuf is set to NULL
|
2017-05-17 12:31:04 +02:00 |
|
Ai Lin Chia
|
22b567617c
|
Reset m_readBufMaxSize & m_readBufSize whenever m_readBuf is set to NULL
|
2017-05-17 12:20:54 +02:00 |
|
Ivan Skytte Jørgensen
|
7db9c4354e
|
Removed non-reentrant version of iptoa()
Mass-change. In many places it could have been done in a better way (e.g. calculate nice name for UdpSlot peer once and not for every log line).
|
2017-05-10 17:54:00 +02:00 |
|
Ivan Skytte Jørgensen
|
45ad44939a
|
Catch std::bad_alloc and not '...'
|
2017-05-07 20:51:33 +02:00 |
|
Ai Lin Chia
|
0f0e92ea0f
|
Fix infinite loop in commit fb1e1ac611
|
2017-04-12 16:46:29 +02:00 |
|
Ivan Skytte Jørgensen
|
4027028bbd
|
Fix comments where an ancient mass-replace had changed 'int' and 'long' to int32_t in comments
|
2017-04-11 14:18:45 +02:00 |
|
Ivan Skytte Jørgensen
|
fb1e1ac611
|
goto -> for()
|
2017-04-11 14:12:45 +02:00 |
|
Ivan Skytte Jørgensen
|
f69b76bb20
|
goto -> for()
|
2017-04-11 13:36:05 +02:00 |
|