bad4eac887
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
804d838cd9
Msg13Request: use offsetof() instead of pointer calculations
2017-03-10 12:56:32 +01:00
2788f518c4
#include cleanup in Msg13.*
2016-11-11 15:51:30 +01:00
fd6e8cbb21
Remove more qa specific code
2016-07-26 15:19:01 +02:00
570838d7f6
Remove m_isCustomCrawl & logic surrounding it
2016-07-22 13:59:55 +02:00
4cd54ec39e
Include RdbCache.h only in files where needed. Use fwd-decl elsewhere.
2016-07-18 16:22:06 +02:00
feafdf1dca
Partial fix conversion from string literal to 'char *' for XmlDoc
2016-05-31 11:26:01 +02:00
8dab0db467
Remove extra semicolons
2016-05-19 18:37:26 +02:00
6116fc502f
Remove always true m_forwardDownloadRequest variable
2016-05-10 16:15:32 +02:00
5e75b2c607
Remove unused variable & commented out code
2016-05-10 16:15:32 +02:00
8e559558e4
Marked a portion of global variables and functions as static
...
Found with Flexelint. Some of them were not used at all.
2016-04-27 11:50:07 +02:00
ef3b8a343b
Removed explicit m_buf[0] from Msg13Request
2016-04-04 13:43:09 +02:00
ab0b9d03ea
Standardize header guards
2016-03-08 22:16:02 +01:00
166f7a80a0
1-bit bit fields must be unsigned
...
Using int32_t or any signed type leads to undefined behaviour.
In gcc's case the possible values are actually 0 and -1
2016-03-01 14:51:00 +01:00
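A minimal sketch of the 1-bit bit-field issue described in the commit above. The struct and function names are hypothetical and are not taken from Msg13.h. It assumes gcc or clang on a two's-complement target, where a signed 1-bit field consists of a sign bit only.

```cpp
// Hypothetical structs for illustration only; not the real Msg13Request.
struct SignedFlag {
    signed int flag : 1;    // one bit, signed: can represent 0 and -1
};

struct UnsignedFlag {
    unsigned int flag : 1;  // one bit, unsigned: can represent 0 and 1
};

// Store a runtime value into the signed field and read it back.
int storeSigned(int v) {
    SignedFlag s{};
    s.flag = v;             // storing 1 does not fit; gcc yields -1
    return s.flag;
}

// Same round trip through the unsigned field.
int storeUnsigned(unsigned v) {
    UnsignedFlag u{};
    u.flag = v;             // 0 and 1 are both representable
    return u.flag;
}
```

With the signed field, a check such as `if (req.flag == 1)` would never be true after storing 1, which is the kind of surprise the commit avoids by making the fields unsigned.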
cd095e66bc
Remove FORMAT_PROCOG & related codes. Remove more scraping code/google detection code
2016-01-05 12:17:17 +01:00
7fcc2ab4e1
in the sockets table page,
...
show url download requests that are queued up to prevent
hammering an ip. also show the first 500 bytes of the send buf
in the http server sockets table.
2015-08-25 09:34:45 -07:00
86800a0656
if a root/seed url has no outlinks, assumed banned.
2015-05-04 14:23:28 -07:00
1825f6bd27
retry download if was in the twitchy table
...
at start of download, and not using proxies at all.
2015-04-30 16:06:13 -07:00
6d8bb19962
checkpoint for auto proxy logic
2015-04-30 13:28:57 -07:00
6fc83566e2
more fixes
2015-02-02 14:06:38 -08:00
c15bd53e52
added support for supplying basic proxy authorization
...
to spider proxies. username:password@1.2.3.4:80
2015-02-02 13:23:38 -08:00
8e315504a2
fix empty rdbcache bug of not enough buf mem.
2014-11-27 13:17:00 -08:00
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
65800b65cf
fix so diffbot doesn't timeout due
...
to large floater/proxy backoff crawl delay.
append &timeout=MAXCRAWLDELAY to diffbot api url.
2014-10-07 14:32:38 -07:00
c2f98a81b6
fix floater bug from reading hashtable off disk.
...
force use floaters if ! useRobots and is diffbot crawl.
2014-09-26 15:30:42 -07:00
6a28250e94
get qa test working after nyt bug fix
2014-08-06 16:00:25 -07:00
947be58f10
Merge branch 'diffbot-testing' into testing
...
Conflicts:
HttpRequest.cpp
Msg13.cpp
XmlDoc.cpp
2014-08-05 17:19:53 -07:00
cc1ceaaac2
fix nyt.com cookie redir bug.
...
fixed bug when POSTing injection request with multipart/form-data.
2014-08-05 17:04:11 -07:00
05fcef9651
more vote infusion and squid proxy fixes.
2014-07-09 14:57:58 -07:00
ea90e7f755
more fixes for sectiondb markup code
2014-06-12 13:05:45 -07:00
7d452a766c
completed squid proxy simulation code
2014-06-09 12:42:05 -07:00
965d992f98
Merge branch 'diffbot-testing' into diffbot-matt
...
Conflicts:
Msg13.cpp
2014-06-06 15:14:41 -07:00
3f2dcda4e1
got new floater/proxy logic compiling.
2014-06-06 15:11:51 -07:00
ce7294e9a9
more mem leak fixes for fake
...
bulk job empty http replies
2014-06-05 20:09:12 -07:00
ee5af6b30e
more spider proxy fixes
2014-06-02 14:59:15 -07:00
ca450e6bbd
using msg55 when done downloading through a proxy to record
...
stats for load balancing on host #0
2014-06-02 13:48:33 -07:00
b6e5424e32
do not download bulkjob urls in crawlbot.
...
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
0f3374e3f3
measure crawl delay by default from
...
start of each download now. it is
a parm in msg13request.
2013-11-26 14:07:28 -08:00
e8065a0f0a
enforce crawl delay perfectly.
2013-11-22 18:26:34 -08:00
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00