8e40e41aa7
make a note if this future time bug happens again
Matt Wells
2015-09-04 13:25:18 -07:00
790d6a3b5f
revert removal of pausing spiders off if too many udp slots in use. added new spider request corruption detection.
Matt Wells
2015-09-04 13:19:03 -07:00
c653b0989c
undo some possible adverse changes
Matt Wells
2015-09-04 11:31:43 -07:00
e7f1c75855
Add logic to limit the number of msg7s to 100 per host; beyond that we drop the requests.
Zak Betz
2015-09-03 22:17:16 -06:00
adac0033d9
remove a couple unused parms
Matt Wells
2015-09-03 20:41:40 -07:00
f157be1742
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-09-03 20:28:47 -07:00
611eaeca8e
added 'separate disk reads' parm so we can allow spider disk threads to compete with query disk threads. that way ppl constantly doing queries won't slow the spiders but their queries might be slower.
Matt Wells
2015-09-03 20:27:54 -07:00
26404443a8
if disk thread took 0 ms then put *>*XMB/s in thread table
Matt
2015-09-03 14:50:56 -06:00
d9a8a16751
unit fix
Matt
2015-09-03 14:30:39 -06:00
1fa7dc7cea
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt
2015-09-03 14:28:17 -06:00
6cf69399e5
handle bad host sending us a c1 request so we dont core
Matt
2015-09-03 14:27:33 -06:00
7ca4e753aa
little core fix
Matt Wells
2015-09-03 12:52:42 -07:00
ad95058fd5
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt
2015-09-03 12:01:15 -06:00
b50f36cd73
a whole new threads stack
Matt
2015-09-03 11:59:43 -06:00
254eba2a37
fix core
Matt Wells
2015-09-02 15:26:43 -07:00
a38baaac0c
spider time speed ups for crawlbot jobs
Matt Wells
2015-09-02 15:03:46 -07:00
e92aefebcf
only add docs indexed stats on host #0 statsdb
Matt Wells
2015-09-02 13:56:25 -07:00
8d69b6a867
profiler fix
Matt Wells
2015-09-02 13:45:13 -07:00
1cce6d510e
do not be so quick to say crawling is paused because a shard is down.
Matt Wells
2015-09-02 13:39:40 -07:00
be1d81a0db
Merge branch 'diffbot-testing' into diffbot
Matt Wells
2015-09-02 09:27:47 -07:00
129c9d65db
fix default hosts.conf generation
Matt Wells
2015-09-02 09:26:03 -07:00
1dc68de0fa
more fixes to pausing spiders if too many incoming udp slots. raised limit from 200 to 300.
Matt Wells
2015-09-02 07:43:18 -07:00
aeb5039470
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt
2015-09-02 07:21:44 -06:00
a1a38bd2b2
fix attempt merge some more
Matt
2015-09-02 07:21:32 -06:00
98b5c05c84
do not construct waiting tree if too many incoming udp requests
Matt Wells
2015-09-01 13:22:27 -07:00
88ddfb14da
if 200+ incoming udp slots then pause spidering on that host. should make it so calling getUrlFilterNum() will not slow down the whole network as much.
Matt Wells
2015-09-01 12:58:39 -07:00
586fe15cb2
parm updates
Matt
2015-09-01 09:42:18 -06:00
4d9c35098d
update new parm
Matt
2015-09-01 09:32:53 -06:00
5a7b01585d
Work on graph axis autoscaling.
Zak Betz
2015-08-31 23:19:28 -06:00
de51769e5a
add switch to turn off site num inlink computation and just use sitelinks.txt for speed
Matt
2015-08-31 22:29:51 -06:00
0d7b465f17
fix coll getting starved by other coll on max ips limitation.
Matt Wells
2015-08-31 15:03:05 -07:00
9de719b050
Merge branch 'testing' into diffbot-testing
Matt
2015-08-31 14:07:16 -06:00
c803e0906e
fix </script> tag detection stuff again.
Matt
2015-08-31 14:06:44 -06:00
efa93aad18
prevent double ./gb start calls from messing things up.
Matt
2015-08-31 11:13:33 -06:00
ddf4ae2240
More testing on nospider, noquery. Add flags to make the nospider and noquery visible on hosts page.
Zak Betz
2015-08-31 10:47:19 -06:00
994bdbdd54
fix logging deadlock bug.
Matt
2015-08-31 09:56:34 -06:00
e373f28728
update hosts.conf generation. removed old stuff.
Matt
2015-08-31 09:29:28 -06:00
744cd54131
Merge branch 'ia' into ia-zak
Matt
2015-08-31 09:14:27 -06:00
792f12587e
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-08-29 14:28:25 -07:00
cbf01ab77c
add download new urls.csv link to crawlbot page
Matt Wells
2015-08-29 14:28:01 -07:00
e3526fdacb
bring back flush disk writes parm for experimenting with.
Matt
2015-08-28 22:43:45 -06:00
b6b31c7be0
fixes for comments in script tags.
Matt
2015-08-28 18:07:50 -06:00
8299197cca
comments in <script> tags are convoluted. deal with all four types and their precedence issues. all of this is to find the proper end of the </script> and not a </script> or <script> that is being printed out in the javascript in the <script> tag.
Matt
2015-08-28 16:31:22 -06:00
a7222dcf3f
nothing really.
Matt
2015-08-28 13:32:40 -06:00
7418d6e9e2
better parsing of <script> tags. now we use single and double quotes and comments so we ignore '</script>' or '<script>' if in a writeln statement, or comment, etc.
Matt
2015-08-28 11:45:43 -06:00
b7e4ab9848
fix having <script> tags in a <script> tag if it is in single or double quotes. ignore escaped quotes.
Matt
2015-08-28 09:08:39 -06:00
41268aeba7
Changes to script to copy to back twins.
Zak Betz
2015-08-28 09:06:16 -06:00
3e47276950
update makefile for 32 bit compilation
Matt
2015-08-25 15:01:27 -06:00
c766e40357
set g_errno to ENOCOLLREC if getRdbBase() returns null.
Matt Wells
2015-08-25 11:41:17 -07:00
35a3ce14ad
fix infinite loop when coll rec is deleted during a merge.
Matt Wells
2015-08-25 11:15:02 -07:00
962f8672bf
fix getting xml status msg
Matt
2015-08-25 11:27:07 -06:00
1badb8cd07
fix up hammer queue table print out on sockets pages. make crawl delay link to the robots.txt.
Matt
2015-08-25 11:07:54 -06:00
ea2c2d7190
show read buf of http sockets as well as the send buf in the tool tip.
Matt
2015-08-25 10:53:16 -06:00
7fcc2ab4e1
in the sockets table page, show url download requests that are queued up to prevent hammering an ip. also show the first 500 bytes of the send buf in the http server sockets table.
Matt Wells
2015-08-25 09:34:45 -07:00
c7546cc646
save a malloc in bigfile
Matt Wells
2015-08-25 08:37:53 -07:00
9180c97a2c
by default do not make static gb any more, not even on debian/ubuntu. we were not always detecting redhat installs correctly like on aws.
Matt Wells
2015-08-25 08:33:48 -07:00
e140b001d8
try merging 1000 collections per call to preserve cpu
Matt Wells
2015-08-25 08:25:55 -07:00
01d77ee220
make umsg00 electric fence code SPECIAL
Matt
2015-08-24 18:37:30 -06:00
289c6b90cf
ssl connect calls malloc
Matt
2015-08-24 18:12:28 -06:00
3db2ce8e24
errnotest.cpp fix
Matt
2015-08-24 16:22:11 -06:00
f12d3ffd01
use useElectricFence var for clarity
Matt
2015-08-24 14:35:23 -06:00
9c686a40d3
fix realloc core from special umsg00 electric fence code
Matt
2015-08-24 14:25:04 -06:00
76bfe4a8ba
elec free is delayed.
Matt
2015-08-24 14:08:56 -06:00
49e9f5a827
fixes for umsg00 electric fence. take out catdb/statsdb merging attempts.
Matt
2015-08-24 11:35:33 -06:00
65f61351ee
efence only on umsg00 buffers. fix BigFile so we dont realloc file buf which could change file ptrs that are engaged in outstanding reads, and go back to using file ptrs again.
Matt Wells
2015-08-24 09:23:40 -07:00
a4bfbb31f8
fix save prevention when coring in malloc/free.
Matt Wells
2015-08-23 11:51:46 -07:00
a5a9820441
ignore tagdb tag rec bad recsize core. do not save conf if crawlbot and not host id 0 and cored in mem function, otherwise it just hangs and gb can't restart.
Matt Wells
2015-08-23 09:40:11 -07:00
493a816be8
cut down on the mallocs for BigFile::m_baseFilename safebuf
Matt Wells
2015-08-22 17:30:09 -07:00
0d320acebf
do not access BigFile::m_fileBuf when in thread, it might have been reallocated or closed, etc. just use getCloseCount_r().
Matt Wells
2015-08-22 16:32:25 -07:00
e252dfb088
Add docs per second stat. Fix auto update on statsdb graph. Add Stat toggles for statsdb graph. Add a unit test for indexing an array in metadata.
Zak Betz
2015-08-22 12:05:20 -06:00
bb16341f51
try to fix core dumps. not sure how mem is getting corrupted.
Matt Wells
2015-08-22 08:52:28 -07:00
0f7910125b
make it so we can still save coll.conf on malloc/free cores. do not call RdbMap::reduceMemFootprint() on maps that are from files being merged into and we're resuming the killed merge at startup.
Matt Wells
2015-08-21 18:07:07 -07:00
23c7862892
added more quickpolls.
Matt Wells
2015-08-21 16:52:23 -07:00
035d232673
fix another core in crawlbot
Matt Wells
2015-08-21 14:30:13 -07:00
74ec812959
try to fix core from adding a file that already exists. just return an error now. hopefully merge will try again later. also core if you try to write recs to an rdbmap that has already had its memory footprint reduced so we can find that overrun bug.
Matt Wells
2015-08-21 14:00:40 -07:00
ad695c001a
fix stolen fd bug.
Matt Wells
2015-08-20 20:07:30 -07:00
d4468c8d66
File::m_filename as a safebuf was causing problems. reverted to char buf of fixed length. only autosave coll.confs if there are not too many or we are host #0. otherwise it blocks too long.
Matt Wells
2015-08-19 14:30:18 -07:00
1b19d53286
give safebuf buffer for File::m_filename[]. easier to save if core in malloc/free and less mallocs in general.
Matt Wells
2015-08-19 10:14:11 -07:00
adbec58f41
fix core from asking for too many docids
Matt
2015-08-19 08:53:39 -07:00
6c14d659b8
move 2nd occurrence of same collnum_t collection id on the same shard to the trash/ subdir. put call to syncParmsWithHost0 in a sleep loop in case host #0 has error, although the timeout is really high.
Matt Wells
2015-08-18 18:59:01 -07:00
9642947136
fix so host #0 will delete then re-add collections that use the same collnum but have a different name. fixed some unlabelled safebufs. fix core when deleting collnum from tree/buckets that is higher than Collectiondb.m_numRecs. fix File::m_filename safebufs that were not freed on exit.
Matt Wells
2015-08-18 14:09:16 -07:00
dd9b4e0ca2
fix little core
Matt Wells
2015-08-17 15:04:16 -07:00
30693c3cf7
use setBuf() func instead
Matt Wells
2015-08-16 22:19:30 -07:00
28644f127e
fix problem of saving rdbmap when coring in a malloc/free.
Matt Wells
2015-08-16 22:14:53 -07:00
be1ebfbcd0
do not execute backtrace function if core was in Mem.cpp basically otherwise we don't save state.
Matt Wells
2015-08-16 20:29:14 -07:00
3a67480b63
for BigFile::m_fileBuf array of Files make sure to clear it for files that do not exist so File::m_calledSet is false on them. so BigFile::getFile(j) returns a File ptr whose m_calledSet is false if the file does not exist on disk. and BigFile::removePart(j) sets ((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
Matt Wells
2015-08-16 19:40:08 -07:00