657d669ec8exclude events and seo functionality. most people want this for web search so it should be a non-issue.
mwells
2013-09-08 17:07:42 -0600
34b6d3e74afixed some cores. brought in fixes from old repo.
mwells
2013-09-08 16:16:13 -0600
dcf45dd69ddump out doledb to disk when it has more than 50,000 negative keys to avoid positive/negative key annihilation delays.
mwells
2013-09-08 15:09:54 -0600
03706131fedocumentation updates in Spider.h.
Matt Wells
2013-09-08 13:42:02 -0700
54c9353dbdtry to fix core from g_inSigHandler being set. it should never be set since we do not use real time signals any more
mwells
2013-09-08 12:34:37 -0600
0581f86265fix core from calling a gettime related function from a pthread when a signal handler from the main thread was in use and POSSIBLY in the same function when the signal went off. different threads should be able to access that function just fine i'd imagine.
mwells
2013-09-06 15:39:53 -0600
7aa81abf91use the "onsite" keyword in your url filters instead of this "only spider links from same host" switch to keep things simpler.
mwells
2013-09-06 09:37:17 -0600
c58df10155fix major bug causing spiders not to work.
Matt Wells
2013-09-04 11:01:24 -0700
91c4e768b1more family filter fixes
mwells
2013-09-01 18:28:49 -0600
aaf333c46ctry to get family filter (&ff=1) working again to filter out adult search results.
mwells
2013-09-01 18:22:38 -0600
afbd1e2b96fix core from trying to get the time while in a sig handler. getTime() is not async safe.
mwells
2013-09-01 12:55:22 -0600
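
A minimal sketch of the usual way around this kind of core, assuming a POSIX system: the handler only sets a flag, and the main loop does the time lookup later. The names `g_alarmPending`, `onAlarm`, `installAlarmHandler` and `serviceAlarm` are hypothetical, and the standard `time()` call stands in for whatever getTime() does internally, which this log does not show.

```cpp
#include <signal.h>
#include <time.h>

// Hypothetical flag: the only state the handler is allowed to touch.
static volatile sig_atomic_t g_alarmPending = 0;

// Signal handler: record that the signal fired and return immediately.
// No time lookups, no printf, no malloc in here.
static void onAlarm(int) { g_alarmPending = 1; }

// Install the handler for SIGALRM. Returns true on success.
bool installAlarmHandler() {
    struct sigaction sa = {};
    sa.sa_handler = onAlarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, nullptr) == 0;
}

// Called from the main loop, outside of any signal handler.
// Returns the current time if a signal was pending, else 0.
time_t serviceAlarm() {
    if (!g_alarmPending) return 0;
    g_alarmPending = 0;
    return time(nullptr); // safe here: we are in normal context
}
```

The handler does the bare minimum, so it cannot re-enter a function that the interrupted code was already in the middle of.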
93dfb0cfd4fix for the "spiders stuck" fix.
mwells
2013-08-31 11:25:26 -0600
af46945403show more info when dumping doledb.
mwells
2013-08-31 10:55:05 -0600
9696c7936aMerge branch 'master' into diffbot
Matt Wells
2013-08-30 16:33:00 -0700
94e6492916removed MAX_COLL_RECS so we can have unlimited collections. the only limit now is sizeof(collnum_t), which is 16 bits with 15 bits for the non-negative range, so about 32k collections. can always expand this so we can have more than 32k collections.
Matt Wells
2013-08-30 16:20:38 -0700
900bbf8fbatry to fix the bug of the spiders kinda getting stuck and not spidering to their max potential because of doledb record annihilations at the top of the spider priority queue in spiderdb of SpiderRequests. was causing lots of re-reads in Msg5.cpp of doledb, like over 300 rounds, very slow.
mwells
2013-08-29 21:59:02 -0600
2e9c8f7c6eMerge branch 'master' of github.com:gigablast/open-source-search-engine
mwells
2013-08-29 21:17:46 -0600
84fae9a3c6Fix issue of reading spiderrequests from doledb at the very first key in spiderdb. causes lots of positive/negative key annihilations. we end up re-reading like 300 times in some cases just to get a url from a doledb priority.
mwells
2013-08-29 21:16:59 -0600
ca2a024d04fixed up thread/spider log msgs. fixed core from calling fprintf in alarm signal missed quickpoll handler.
mwells
2013-08-29 21:15:42 -0600
e925012dcechange a couple of possible reserved names in C++ to non-reserved names. #define _ADDRESS_H_ to _GB_ADDRESS_H_ etc.
mwells
2013-08-28 22:59:01 -0600
82ee2dfed7fix cores when spider is unzipping gzipped web pages.
mwells
2013-08-28 22:49:22 -0600
80179525c1when using pthreads block SIGIO so it does not silently kill the gb process. we no longer have a handler for it because it was bogging down the cpu: it went off every time a udp datagram was sent/received and it seemed to have a ton of overhead with it. SIGIO used to be sent when the signal queue was full so we'd resort to polling the file descriptors, so i'm not sure how this will affect us. also updated Threads.cpp to use getpidtid() instead of getpid() so we get the thread id, not the process id, when using pthreads. using pthreads is now default behaviour even though they suck. we used to use clone() but the newer stuff doesn't allow us to override errno_location anymore.
mwells
2013-08-21 15:01:26 -0600
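
A minimal sketch of blocking SIGIO under pthreads, assuming a POSIX system where SIGIO is defined (Linux, BSD). The helper names `blockSigio` and `isSigioBlocked` are hypothetical and are not taken from Threads.cpp or Loop.cpp.

```cpp
#include <signal.h>
#include <pthread.h>

// Block SIGIO in the calling thread. Threads created afterwards inherit
// this mask, so calling it early in main() covers the whole process.
// Returns 0 on success, or an error number on failure.
int blockSigio() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr);
}

// Returns true if SIGIO is currently blocked in the calling thread.
// Passing a null new-set pointer only queries the current mask.
bool isSigioBlocked() {
    sigset_t cur;
    sigemptyset(&cur);
    if (pthread_sigmask(SIG_BLOCK, nullptr, &cur) != 0) return false;
    return sigismember(&cur, SIGIO) == 1;
}
```

With no handler installed the default action for SIGIO is to terminate the process, which is why blocking it matters once the handler is gone.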
6332de2dafadded link to compare.html comparison to SOLR into documentation.
mwells
2013-08-21 13:14:17 -0600
37a6549a58updates to developer.html developer documentation. removed a lot of obsolete information. still needs more work.
mwells
2013-08-21 13:09:55 -0600
8971d9b932comment out urldb from developer.html since it is no longer used.
mwells
2013-08-21 08:59:51 -0600
6cf0497c2cadded a little posdb documentation to developer.html. posdb replaced indexdb as the new index because it has word position info as well as word field info.
mwells
2013-08-21 08:40:28 -0600
a2a57addd9try fixing the cpu being slammed in the sigiohandler. seems like the meaning of the signals might have changed in the kernel, etc. over the years. fixed Loop.cpp.
mwells
2013-08-20 14:12:44 -0600
a270a9bc91updated README.md to reference compare.html
mwells
2013-08-19 17:20:30 -0600
7d3cc672c8use ./gb blaster -u <fileofurls> to just inject urls, but use -i to also add the outlinks to spiderdb.
mwells
2013-08-19 16:33:27 -0600
95a020574cset spiderlinks=1 when doing ./gb blaster -i <fileofurls> to index/inject a file of urls so that we add the outlinks to spiderdb. this will slow things down a little since we will have to do a dns lookup of the subdomain of each outlink, unless it is cached.
mwells
2013-08-19 16:15:58 -0600
72d7e42497added a quick start note to admin.html.
mwells
2013-08-19 15:34:07 -0600
24af21394ddns ip fix in gb.conf.
mwells
2013-08-19 15:25:37 -0600
e9297df240listen on DNS port 5998 not 6000. 6000 seemed to cause issues on a particular install for some reason.
mwells
2013-08-19 15:02:27 -0600
eb4758b565fix init error when injecting file of urls.
mwells
2013-08-19 13:34:47 -0600
2c83b96ba4Added support for 'gb blaster -i <fileofurls> <maxThreads>' to inject/index a file of urls. Committing older work for compare.html that shows differences between gigablast and solr, but has a lot of blanks.
mwells
2013-08-19 13:26:46 -0600
5facc7d859add injection timing stat point to compare.html
mwells
2013-08-17 11:06:24 -0600
4092177e5fadded injectme3 file and documentation into compare.html to describe how to inject a file of concatenated HTML documents into gb. Still have to find out how to do that in SOLR and elasticsearch for comparison.
mwells
2013-08-17 11:02:26 -0600
c0e1216022stub for compare.html
Matt Wells
2013-08-17 09:07:48 -0700
f7f377a1f7fixed a core dump in proxy.cpp. make doubly sure protocol is not 1.1: since we have keep-alives disabled we need to force protocol to 1.0.
Matt Wells
2013-08-17 08:58:28 -0700
410604c388minor edits
Matt Wells
2013-08-17 08:48:07 -0700
5bac648cc9start up the gigablast blog again.
Matt Wells
2013-08-17 08:44:32 -0700
5a2cb35e6cadded gb.conf.txt and hosts.conf.txt for display from admin.html.
Matt Wells
2013-08-11 00:45:27 -0700
e42afac9d8admin.html documentation updates.
Matt Wells
2013-08-11 00:32:53 -0700
834128a076Fixed heap breaches caused by our built-in electric fence code from death queries. Use HTTP/1.0 not 1.1 since we disabled keep-alive support a long time ago.
Matt Wells
2013-08-10 09:51:14 -0700
9b94e0feacfix core from huge death query.
Matt Wells
2013-08-09 21:05:38 -0700
ef333f8937fix bug in user accounting system summary stats recs. when sumbuf was reallocated the hashtable of ptrs could be invalidated, so use offsets, not ptrs, in the hash table.
Matt Wells
2013-08-09 16:40:12 -0700
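
A minimal sketch of the offsets-not-pointers idea from the commit above, assuming records live in one growable buffer. `StatRec`, `StatTable` and their members are hypothetical names for illustration and are not the actual accounting code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one summary stats record.
struct StatRec {
    int64_t queries;
};

// Hypothetical table: records sit in a growable buffer, and the hash
// table maps a user name to the record's offset (index) in that buffer.
// Storing StatRec* instead would dangle as soon as the buffer grows.
class StatTable {
public:
    void addQuery(const std::string &user) { rec(user).queries++; }
    int64_t queries(const std::string &user) { return rec(user).queries; }

private:
    // Resolve the offset to a reference at the moment of use only.
    StatRec &rec(const std::string &user) {
        auto it = m_offsets.find(user);
        if (it == m_offsets.end()) {
            m_buf.push_back(StatRec{0}); // may reallocate and move all recs
            it = m_offsets.emplace(user, m_buf.size() - 1).first;
        }
        assert(it->second < m_buf.size());
        return m_buf[it->second]; // offset stays valid after reallocation
    }

    std::vector<StatRec> m_buf;                        // the growable buffer
    std::unordered_map<std::string, size_t> m_offsets; // name -> offset
};
```

An offset is relative to the start of the buffer, so it survives the buffer moving; the cost is one extra add on each lookup.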