1dc68de0famore fixes to pausing spiders if too many incoming udp slots. raised limit from 200 to 300.
Matt Wells
2015-09-02 07:43:18 -07:00
aeb5039470Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt
2015-09-02 07:21:44 -06:00
a1a38bd2b2fix attempt merge some more
Matt
2015-09-02 07:21:32 -06:00
98b5c05c84do not construct waiting tree if too many udp incoming requests
Matt Wells
2015-09-01 13:22:27 -07:00
88ddfb14daif 200+ incoming udp slots then pause spidering on that host. should make it so calling getUrlFilterNum() will not slow down the whole network as much.
Matt Wells
2015-09-01 12:58:39 -07:00
586fe15cb2parm updates
Matt
2015-09-01 09:42:18 -06:00
4d9c35098dupdate new parm
Matt
2015-09-01 09:32:53 -06:00
5a7b01585dWork on graph axis autoscaling.
Zak Betz
2015-08-31 23:19:28 -06:00
de51769e5aadd switch to turn off site num inlink computation and just use sitelinks.txt for speed
Matt
2015-08-31 22:29:51 -06:00
0d7b465f17fix coll getting starved by other coll on max ips limitation.
Matt Wells
2015-08-31 15:03:05 -07:00
9de719b050Merge branch 'testing' into diffbot-testing
Matt
2015-08-31 14:07:16 -06:00
c803e0906efix </script> tag detection stuff again.
Matt
2015-08-31 14:06:44 -06:00
efa93aad18prevent double ./gb start calls from messing things up.
Matt
2015-08-31 11:13:33 -06:00
ddf4ae2240More testing on nospider, noquery. Add flags to make the nospider and noquery visible on hosts page.
Zak Betz
2015-08-31 10:47:19 -06:00
994bdbdd54fix logging deadlock bug.
Matt
2015-08-31 09:56:34 -06:00
e373f28728update hosts.conf generation. removed old stuff.
Matt
2015-08-31 09:29:28 -06:00
744cd54131Merge branch 'ia' into ia-zak
Matt
2015-08-31 09:14:27 -06:00
792f12587eMerge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-08-29 14:28:25 -07:00
cbf01ab77cadd download new urls.csv link to crawlbot page
Matt Wells
2015-08-29 14:28:01 -07:00
e3526fdacbbring back flush disk writes parm for experimenting with.
Matt
2015-08-28 22:43:45 -06:00
b6b31c7be0fixes for comments in script tags.
Matt
2015-08-28 18:07:50 -06:00
8299197ccacomments in <script> tags are a convolution. deal with all four types and their precedence issues. all of this is to find the proper end of the </script> and not a </script> or <script> that is being printed out in the javascript in the <script> tag.
Matt
2015-08-28 16:31:22 -06:00
a7222dcf3fnothing really.
Matt
2015-08-28 13:32:40 -06:00
7418d6e9e2better parsing of <script> tags. now we use single and double quotes and comments so we ignore '</script>' or '<script>' if in a writeln statement, or comment, etc.
Matt
2015-08-28 11:45:43 -06:00
b7e4ab9848fix having <script> tags in a <script> tag if it is in single or double quotes. ignore escaped quotes.
Matt
2015-08-28 09:08:39 -06:00
41268aeba7Changes to script to copy to back twins.
Zak Betz
2015-08-28 09:06:16 -06:00
3e47276950update makefile for 32 bit compilation
Matt
2015-08-25 15:01:27 -06:00
60c4c5c437Add nospider and noquery options.
Zak Betz
2015-08-25 13:48:20 -06:00
c766e40357set g_errno to ENOCOLLREC if getRdbBase() returns null.
Matt Wells
2015-08-25 11:41:17 -07:00
35a3ce14adfix infinite loop when coll rec is deleted during a merge.
Matt Wells
2015-08-25 11:15:02 -07:00
962f8672bffix getting xml status msg
Matt
2015-08-25 11:27:07 -06:00
1badb8cd07fix up hammer queue table print out on sockets pages. make crawl delay link to the robots.txt.
Matt
2015-08-25 11:07:54 -06:00
ea2c2d7190show read buf of http sockets as well as the send buf in the tool tip.
Matt
2015-08-25 10:53:16 -06:00
7fcc2ab4e1in the sockets table page, show url download requests that are queued up to prevent hammering an ip. also show the first 500 bytes of the send buf in the http server sockets table.
Matt Wells
2015-08-25 09:34:45 -07:00
c7546cc646save a malloc in bigfile
Matt Wells
2015-08-25 08:37:53 -07:00
9180c97a2cby default do not make static gb any more, not even on debian/ubuntu. we were not always detecting redhat installs correctly like on aws.
Matt Wells
2015-08-25 08:33:48 -07:00
e140b001d8try merging 1000 collections per call to preserve cpu
Matt Wells
2015-08-25 08:25:55 -07:00
01d77ee220make umsg00 electric fence code SPECIAL
Matt
2015-08-24 18:37:30 -06:00
289c6b90cfssl connect calls malloc
Matt
2015-08-24 18:12:28 -06:00
3db2ce8e24errnotest.cpp fix
Matt
2015-08-24 16:22:11 -06:00
f12d3ffd01use useElectricFence var for clarity
Matt
2015-08-24 14:35:23 -06:00
9c686a40d3fix realloc core from special umsg00 electric fence code
Matt
2015-08-24 14:25:04 -06:00
76bfe4a8baelec free is delayed.
Matt
2015-08-24 14:08:56 -06:00
49e9f5a827fixes for umsg00 electric fence. take out catdb/statsdb merging attempts.
Matt
2015-08-24 11:35:33 -06:00
65f61351eeefence only on umsg00 buffers. fix BigFile so we don't realloc file buf which could change file ptrs that are engaged in outstanding reads, and go back to using file ptrs again.
Matt Wells
2015-08-24 09:23:40 -07:00
a4bfbb31f8fix save prevention when coring in malloc/free.
Matt Wells
2015-08-23 11:51:46 -07:00
a5a9820441ignore tagdb tag rec bad recsize core. do not save conf if crawlbot and not host id 0 and cored in mem function, otherwise it just hangs and gb can't restart.
Matt Wells
2015-08-23 09:40:11 -07:00
493a816be8cut down on the mallocs for BigFile::m_baseFilename safebuf
Matt Wells
2015-08-22 17:30:09 -07:00
0d320acebfdo not access BigFile::m_fileBuf when in thread, it might have been reallocated or closed, etc. just use getCloseCount_r().
Matt Wells
2015-08-22 16:32:25 -07:00
e252dfb088Add docs per second stat. Fix auto update on statsdb graph. Add Stat toggles for statsdb graph. Add a unit test for indexing an array in metadata.
Zak Betz
2015-08-22 12:05:20 -06:00
bb16341f51try to fix core dumps. not sure how mem is getting corrupted.
Matt Wells
2015-08-22 08:52:28 -07:00
0f7910125bmake it so we can still save coll.conf on malloc/free cores. do not call RdbMap::reduceMemFootprint() on maps that are from files being merged into and we're resuming the killed merge at startup.
Matt Wells
2015-08-21 18:07:07 -07:00
23c7862892added more quickpolls.
Matt Wells
2015-08-21 16:52:23 -07:00
035d232673fix another core in crawlbot
Matt Wells
2015-08-21 14:30:13 -07:00
74ec812959try to fix core from adding a file that already exists. just return an error now. hopefully merge will try again later. also core if you try to write recs to an rdbmap that has already had its memory footprint reduced so we can find that overrun bug.
Matt Wells
2015-08-21 14:00:40 -07:00
ad695c001afix stolen fd bug.
Matt Wells
2015-08-20 20:07:30 -07:00
d4468c8d66File::m_filename as a safebuf was causing problems. reverted to char buf of fixed length. only autosave coll.confs if there are not too many or we are host #0. otherwise it blocks too long.
Matt Wells
2015-08-19 14:30:18 -07:00
1b19d53286give safebuf buffer for File::m_filename[]. easier to save if core in malloc/free and less mallocs in general.
Matt Wells
2015-08-19 10:14:11 -07:00
adbec58f41fix core from asking for too many docids
Matt
2015-08-19 08:53:39 -07:00
6c14d659b8move 2nd occurrence of same collnum_t collection id on the same shard to the trash/ subdir. put call to syncParmsWithHost0 in a sleep loop in case host #0 has error, although the timeout is really high.
Matt Wells
2015-08-18 18:59:01 -07:00
9642947136fix so host #0 will delete then re-add collections that use the same collnum but have a different name. fixed some unlabelled safebufs. fix core when deleting collnum from tree/buckets that is higher than Collectiondb.m_numRecs. fix File::m_filename safebufs that were not freed on exit.
Matt Wells
2015-08-18 14:09:16 -07:00
dd9b4e0ca2fix little core
Matt Wells
2015-08-17 15:04:16 -07:00
30693c3cf7use setBuf() func instead
Matt Wells
2015-08-16 22:19:30 -07:00
28644f127efix problem of saving rdbmap when coring in a malloc/free.
Matt Wells
2015-08-16 22:14:53 -07:00
be1ebfbcd0do not execute backtrace function if core was in Mem.cpp basically otherwise we don't save state.
Matt Wells
2015-08-16 20:29:14 -07:00
3a67480b63for BigFile::m_fileBuf array of Files make sure to clear it for files that do not exist so File::m_calledSet is false on them. so BigFile::getFile(j) returns a File ptr whose m_calledSet is false if the file does not exist on disk. and BigFile::removePart(j) sets ((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
Matt Wells
2015-08-16 19:40:08 -07:00
63c7752734Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2015-08-16 17:14:33 -07:00
e671be17cafix log msg
Matt Wells
2015-08-16 17:14:21 -07:00
b709f736f4show max mem alloc slots in pagestats.cpp
Matt
2015-08-16 17:32:47 -06:00
ffa6c09c74fix BigFile::addPart(n) when adding parts out of order.
Matt Wells
2015-08-16 15:13:59 -07:00
f8fb266844fix new merging algo.
Matt Wells
2015-08-16 10:11:21 -07:00
178721d35bspeed up getFileSize() by using stat() func again. despam logs at startup. do not perm check every coll dir, only first 100, on startup to make things faster.
Matt Wells
2015-08-15 22:21:15 -07:00
bff643b555use a linked list of merge candidates to make attemptMergeAll() much much faster.
Matt
2015-08-15 19:26:37 -06:00
d9422d8b0eget rid of limits on file sizes. dynamically allocate file names and fixed-size File array in BigFile class. should save gigabytes of memory in many-collection systems with 1+ million files or so.
Matt
2015-08-14 20:14:50 -06:00
f7f577cf98the new disk page cache. temporarily disabled.
Matt
2015-08-14 15:52:24 -06:00
3213858545Merge branch 'diffbot-testing' into diffbot
Matt
2015-08-14 13:08:48 -06:00
0d2aa33afbundo #define thing
Matt
2015-08-14 13:08:11 -06:00
a1ed368d82bring back max mem control into master controls. it's useful to limit per process mem usage to prevent oom killer because we can't save if we get killed. overhaul diskpagecache to just use rdbcache. much simpler and faster, but disabled for now until debugged more. reduce min files to merge for crawlbot collections so they stay more tightly merged to conserve fds and mem. improved logDebugDisk msgs. overhauled File.cpp fd pool. now it is way faster and doesn't use any extra mem. much simpler too. although could be sped up a little by using a linked list, but probably is not significant enough to warrant doing right now. increase mem ptr table from 3M to 8M slots. should really make dynamic though. fix core from null msg20s[0]->m_r. only call attemptMergeAll once every 60 seconds really. do not attempt merge if already merging.
Matt
2015-08-14 12:58:54 -06:00