1dc68de0fa
more fixes to pausing spiders if too many incoming udp slots. raised limit from 200 to 300.
Matt Wells
2015-09-02 07:43:18 -07:00
aeb5039470
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt
2015-09-02 07:21:44 -06:00
a1a38bd2b2
fix attempt merge some more
Matt
2015-09-02 07:21:32 -06:00
98b5c05c84
do not construct waiting tree if too many udp incoming requests
Matt Wells
2015-09-01 13:22:27 -07:00
88ddfb14da
if 200+ incoming udp slots then pause spidering on that host. should make it so calling getUrlFilterNum() will not slow down the whole network as much.
Matt Wells
2015-09-01 12:58:39 -07:00
586fe15cb2
parm updates
Matt
2015-09-01 09:42:18 -06:00
4d9c35098d
update new parm
Matt
2015-09-01 09:32:53 -06:00
5a7b01585d
Work on graph axis autoscaling.
Zak Betz
2015-08-31 23:19:28 -06:00
de51769e5a
add switch to turn off site num inlink computation and just use sitelinks.txt for speed
Matt
2015-08-31 22:29:51 -06:00
0d7b465f17
fix coll getting starved by other coll on max ips limitation.
Matt Wells
2015-08-31 15:03:05 -07:00
9de719b050
Merge branch 'testing' into diffbot-testing
Matt
2015-08-31 14:07:16 -06:00
c803e0906e
fix </script> tag detection stuff again.
Matt
2015-08-31 14:06:44 -06:00
efa93aad18
prevent double ./gb start calls from messing things up.
Matt
2015-08-31 11:13:33 -06:00
ddf4ae2240
More testing on nospider, noquery. Add flags to make the nospider and noquery visible on hosts page.
Zak Betz
2015-08-31 10:47:19 -06:00
994bdbdd54
fix logging deadlock bug.
Matt
2015-08-31 09:56:34 -06:00
e373f28728
update hosts.conf generation. removed old stuff.
Matt
2015-08-31 09:29:28 -06:00
744cd54131
Merge branch 'ia' into ia-zak
Matt
2015-08-31 09:14:27 -06:00
792f12587e
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-08-29 14:28:25 -07:00
cbf01ab77c
add download new urls.csv link to crawlbot page
Matt Wells
2015-08-29 14:28:01 -07:00
e3526fdacb
bring back flush disk writes parm for experimenting with.
Matt
2015-08-28 22:43:45 -06:00
b6b31c7be0
fixes for comments in script tags.
Matt
2015-08-28 18:07:50 -06:00
8299197cca
comments in <script> tags are a convolution. deal with all four types and their precedence issues. all of this is to find the proper end of the </script> and not a </script> or <script> that is being printed out in the javascript in the <script> tag.
Matt
2015-08-28 16:31:22 -06:00
a7222dcf3f
nothing really.
Matt
2015-08-28 13:32:40 -06:00
7418d6e9e2
better parsing of <script> tags. now we use single and double quotes and comments so we ignore '</script>' or '<script>' if in a writeln statement, or comment, etc.
Matt
2015-08-28 11:45:43 -06:00
b7e4ab9848
fix having <script> tags in a <script> tag if it is in single or double quotes. ignore escaped quotes.
Matt
2015-08-28 09:08:39 -06:00
41268aeba7
Changes to script to copy to back twins.
Zak Betz
2015-08-28 09:06:16 -06:00
3e47276950
update makefile for 32 bit compilation
Matt
2015-08-25 15:01:27 -06:00
c766e40357
set g_errno to ENOCOLLREC if getRdbBase() returns null.
Matt Wells
2015-08-25 11:41:17 -07:00
35a3ce14ad
fix infinite loop when coll rec is deleted during a merge.
Matt Wells
2015-08-25 11:15:02 -07:00
962f8672bf
fix getting xml status msg
Matt
2015-08-25 11:27:07 -06:00
1badb8cd07
fix up hammer queue table print out on sockets pages. make crawl delay link to the robots.txt.
Matt
2015-08-25 11:07:54 -06:00
ea2c2d7190
show read buf of http sockets as well as the send buf in the tool tip.
Matt
2015-08-25 10:53:16 -06:00
7fcc2ab4e1
in the sockets table page, show url download requests that are queued up to prevent hammering an ip. also show the first 500 bytes of the send buf in the http server sockets table.
Matt Wells
2015-08-25 09:34:45 -07:00
c7546cc646
save a malloc in bigfile
Matt Wells
2015-08-25 08:37:53 -07:00
9180c97a2c
by default do not make static gb any more, not even on debian/ubuntu. we were not always detecting redhat installs correctly like on aws.
Matt Wells
2015-08-25 08:33:48 -07:00
e140b001d8
try merging 1000 collections per call to preserve cpu
Matt Wells
2015-08-25 08:25:55 -07:00
01d77ee220
make umsg00 electric fence code SPECIAL
Matt
2015-08-24 18:37:30 -06:00
289c6b90cf
ssl connect calls malloc
Matt
2015-08-24 18:12:28 -06:00
3db2ce8e24
errnotest.cpp fix
Matt
2015-08-24 16:22:11 -06:00
f12d3ffd01
use useElectricFence var for clarity
Matt
2015-08-24 14:35:23 -06:00
9c686a40d3
fix realloc core from special umsg00 electric fence code
Matt
2015-08-24 14:25:04 -06:00
76bfe4a8ba
elec free is delayed.
Matt
2015-08-24 14:08:56 -06:00
49e9f5a827
fixes for umsg00 electric fence. take out catdb/statsdb merging attempts.
Matt
2015-08-24 11:35:33 -06:00
65f61351ee
efence only on umsg00 buffers. fix BigFile so we don't realloc file buf which could change file ptrs that are engaged in outstanding reads, and go back to using file ptrs again.
Matt Wells
2015-08-24 09:23:40 -07:00
a4bfbb31f8
fix save prevention when coring in malloc/free.
Matt Wells
2015-08-23 11:51:46 -07:00
a5a9820441
ignore tagdb tag rec bad recsize core. do not save conf if crawlbot and not host id 0 and cored in mem function, otherwise it just hangs and gb can't restart.
Matt Wells
2015-08-23 09:40:11 -07:00
493a816be8
cut down on the mallocs for BigFile::m_baseFilename safebuf
Matt Wells
2015-08-22 17:30:09 -07:00
0d320acebf
do not access BigFile::m_fileBuf when in thread, it might have been reallocated or closed, etc. just use getCloseCount_r().
Matt Wells
2015-08-22 16:32:25 -07:00
e252dfb088
Add docs per second stat. Fix auto update on statsdb graph. Add Stat toggles for statsdb graph. Add a unit test for indexing an array in metadata.
Zak Betz
2015-08-22 12:05:20 -06:00
bb16341f51
try to fix core dumps. not sure how mem is getting corrupted.
Matt Wells
2015-08-22 08:52:28 -07:00
0f7910125b
make it so we can still save coll.conf on malloc/free cores. do not call RdbMap::reduceMemFootprint() on maps that are from files being merged into and we're resuming the killed merge at startup.
Matt Wells
2015-08-21 18:07:07 -07:00
23c7862892
added more quickpolls.
Matt Wells
2015-08-21 16:52:23 -07:00
035d232673
fix another core in crawlbot
Matt Wells
2015-08-21 14:30:13 -07:00
74ec812959
try to fix core from adding a file that already exists. just return an error now. hopefully merge will try again later. also core if you try to write recs to an rdbmap that has already had its memory footprint reduced so we can find that overrun bug.
Matt Wells
2015-08-21 14:00:40 -07:00
ad695c001a
fix stolen fd bug.
Matt Wells
2015-08-20 20:07:30 -07:00
d4468c8d66
File::m_filename as a safebuf was causing problems. reverted to char buf of fixed length. only autosave coll.confs if there are not too many or we are host #0. otherwise it blocks too long.
Matt Wells
2015-08-19 14:30:18 -07:00
1b19d53286
give safebuf buffer for File::m_filename[]. easier to save if core in malloc/free and less mallocs in general.
Matt Wells
2015-08-19 10:14:11 -07:00
adbec58f41
fix core from asking for too many docids
Matt
2015-08-19 08:53:39 -07:00
6c14d659b8
move 2nd occurrence of same collnum_t collection id on the same shard to the trash/ subdir. put call to syncParmsWithHost0 in a sleep loop in case host #0 has error, although the timeout is really high.
Matt Wells
2015-08-18 18:59:01 -07:00
9642947136
fix so host #0 will delete then re-add collections that use the same collnum but have a different name. fixed some unlabelled safebufs. fix core when deleting collnum from tree/buckets that is higher than Collectiondb.m_numRecs. fix File::m_filename safebufs that were not freed on exit.
Matt Wells
2015-08-18 14:09:16 -07:00
dd9b4e0ca2
fix little core
Matt Wells
2015-08-17 15:04:16 -07:00
30693c3cf7
use setBuf() func instead
Matt Wells
2015-08-16 22:19:30 -07:00
28644f127e
fix problem of saving rdbmap when coring in a malloc/free.
Matt Wells
2015-08-16 22:14:53 -07:00
be1ebfbcd0
do not execute backtrace function if core was in Mem.cpp, basically. otherwise we don't save state.
Matt Wells
2015-08-16 20:29:14 -07:00
3a67480b63
for BigFile::m_fileBuf array of Files make sure to clear it for files that do not exist so File::m_calledSet is false on them. so BigFile::getFile(j) returns a File ptr whose m_calledSet is false if the file does not exist on disk. and BigFile::removePart(j) sets ((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
Matt Wells
2015-08-16 19:40:08 -07:00
63c7752734
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-08-16 17:14:33 -07:00
e671be17ca
fix log msg
Matt Wells
2015-08-16 17:14:21 -07:00
b709f736f4
show max mem alloc slots in pagestats.cpp
Matt
2015-08-16 17:32:47 -06:00
ffa6c09c74
fix BigFile::addPart(n) when adding parts out of order.
Matt Wells
2015-08-16 15:13:59 -07:00
f8fb266844
fix new merging algo.
Matt Wells
2015-08-16 10:11:21 -07:00
178721d35b
speed up getFileSize() by using stat() func again. despam logs at startup. do not perm check every coll dir, only first 100, on startup to make things faster.
Matt Wells
2015-08-15 22:21:15 -07:00
bff643b555
use a linked list of merge candidates to make attemptMergeAll() much much faster.
Matt
2015-08-15 19:26:37 -06:00
d9422d8b0e
get rid of limits on file sizes. dynamically allocate file names and fixed-size File array in BigFile class. should save gigabytes of memory in many-collection systems with 1+ million files or so.
Matt
2015-08-14 20:14:50 -06:00
f7f577cf98
the new disk page cache. temporarily disabled.
Matt
2015-08-14 15:52:24 -06:00
3213858545
Merge branch 'diffbot-testing' into diffbot
Matt
2015-08-14 13:08:48 -06:00
0d2aa33afb
undo #define thing
Matt
2015-08-14 13:08:11 -06:00
a1ed368d82
bring back max mem control into master controls. it's useful to limit per process mem usage to prevent oom killer because we can't save if we get killed. overhaul diskpagecache to just use rdbcache. much simpler and faster, but disabled for now until debugged more. reduce min files to merge for crawlbot collections so they stay more tightly merged to conserve fds and mem. improved logDebugDisk msgs. overhauled File.cpp fd pool. now it is way faster and doesn't use any extra mem. much simpler too. although could be sped up a little by using a linked list, but probably is not significant enough to warrant doing right now. increase mem ptr table from 3M to 8M slots. should really make dynamic though. fix core from null msg20s[0]->m_r. only call attemptMergeAll once every 60 seconds really. do not attempt merge if already merging.
Matt
2015-08-14 12:58:54 -06:00