5e0a53b909
minor print change
mwells
2013-08-31 10:57:36 -06:00
af46945403
show more info when dumping doledb.
mwells
2013-08-31 10:55:05 -06:00
9696c7936a
Merge branch 'master' into diffbot
Matt Wells
2013-08-30 16:33:00 -07:00
94e6492916
removed MAX_COLL_RECS so we can have unlimited collections, now limited only by sizeof(collnum_t), which is 16 bits, of which 15 are usable for non-negative values. can always expand this so we can have more than 32k collections.
Matt Wells
2013-08-30 16:20:38 -07:00
f6bcaeb76a
minor fix.
mwells
2013-08-30 00:16:30 -06:00
900bbf8fba
try to fix the bug of the spiders kinda getting stuck and not spidering to their max potential because of doledb record annihilations at the top of the spider priority queue in spiderdb of SpiderRequests. was causing lots of re-reads in Msg5.cpp of doledb, like over 300 rounds, very slow.
mwells
2013-08-29 21:59:02 -06:00
2e9c8f7c6e
Merge branch 'master' of github.com:gigablast/open-source-search-engine
mwells
2013-08-29 21:17:46 -06:00
84fae9a3c6
Fix issue of reading spiderrequests from doledb at the very first key in spiderdb. causes lots of positive/negative key annihilations. we end up re-reading like 300 times in some cases just to get a url from a doledb priority.
mwells
2013-08-29 21:16:59 -06:00
ca2a024d04
fixed up thread/spider log msgs. fixed core from calling fprintf in alarm signal missed quickpoll handler.
mwells
2013-08-29 21:15:42 -06:00
e925012dce
change a couple of possible reserved names in C++ to non-reserved names. #define _ADDRESS_H_ to _GB_ADDRESS_H_ etc.
mwells
2013-08-28 22:59:01 -06:00
82ee2dfed7
fix cores when spider is unzipping gzipped web pages.
mwells
2013-08-28 22:49:22 -06:00
80179525c1
when using pthreads, block SIGIO so it does not silently kill the gb process. we no longer have a handler for it because it was bogging down the cpu: it went off every time a udp datagram was sent/received and seemed to carry a ton of overhead. SIGIO used to be sent when the signal queue was full so we'd resort to polling the file descriptors, so i'm not sure how this will affect us. also updated Threads.cpp to use getpidtid() instead of getpid() so we get the thread id, not the process id, when using pthreads. using pthreads is now default behaviour even though they suck. we used to use clone() but the newer stuff doesn't allow us to override errno_location anymore.
mwells
2013-08-21 15:01:26 -06:00
6332de2daf
added link to compare.html comparison to SOLR into documentation.
mwells
2013-08-21 13:14:17 -06:00
37a6549a58
updates to developer.html developer documentation. removed a lot of obsolete information. still needs more work.
mwells
2013-08-21 13:09:55 -06:00
8971d9b932
comment out urldb from developer.html since it is no longer used.
mwells
2013-08-21 08:59:51 -06:00
6cf0497c2c
added a little posdb documentation to developer.html. posdb replaced indexdb as the new index because it has word position info as well as word field info.
mwells
2013-08-21 08:40:28 -06:00
a2a57addd9
try fixing the cpu being slammed in the sigiohandler. seems like the meaning of signals might have changed in the kernel, etc. over the years. fixed Loop.cpp.
mwells
2013-08-20 14:12:44 -06:00
a270a9bc91
updated README.md to reference compare.html
mwells
2013-08-19 17:20:30 -06:00
7d3cc672c8
use ./gb blaster -u <fileofurls> to just inject urls, but use -i to also add the outlinks to spiderdb.
mwells
2013-08-19 16:33:27 -06:00
95a020574c
set spiderlinks=1 when doing ./gb blaster -i <fileofurls> to index/inject a file of urls so that we add the outlinks to spiderdb. this will slow things down a little since we will have to do a dns lookup of the subdomain of each outlink, unless it is cached.
mwells
2013-08-19 16:15:58 -06:00
72d7e42497
added a quick start note to admin.html.
mwells
2013-08-19 15:34:07 -06:00
24af21394d
dns ip fix in gb.conf.
mwells
2013-08-19 15:25:37 -06:00
e9297df240
listen on DNS port 5998 not 6000. 6000 seemed to cause issues on a particular install for some reason.
mwells
2013-08-19 15:02:27 -06:00
71aa03ab5d
little admin.html update.
mwells
2013-08-19 13:45:43 -06:00
eb4758b565
fix init error when injecting file of urls.
mwells
2013-08-19 13:34:47 -06:00
2c83b96ba4
Added support for 'gb blaster -i <fileofurls> <maxThreads>' to inject/index a file of urls. Committing older work for compare.html that shows differences between gigablast and solr, but has a lot of blanks.
mwells
2013-08-19 13:26:46 -06:00
5facc7d859
add injection timing stat point to compare.html
mwells
2013-08-17 11:06:24 -06:00
4092177e5f
added injectme3 file and documentation into compare.html to describe how to inject a file of concatenated HTML documents into gb. Still have to find out how to do that in SOLR and elasticsearch for comparison.
mwells
2013-08-17 11:02:26 -06:00
c0e1216022
stub for compare.html
Matt Wells
2013-08-17 09:07:48 -07:00
f7f377a1f7
fixed a core dump in proxy.cpp. make doubly sure protocol is not 1.1: since we have keep-alives disabled, we need to force protocol to 1.0.
Matt Wells
2013-08-17 08:58:28 -07:00
410604c388
minor edits
Matt Wells
2013-08-17 08:48:07 -07:00
5bac648cc9
start up the gigablast blog again.
Matt Wells
2013-08-17 08:44:32 -07:00
5a2cb35e6c
added gb.conf.txt and hosts.conf.txt for display from admin.html.
Matt Wells
2013-08-11 00:45:27 -07:00
e42afac9d8
admin.html documentation updates.
Matt Wells
2013-08-11 00:32:53 -07:00
834128a076
Fixed heap breaches caused by our built-in electric fence code from death queries. Use HTTP/1.0 not 1.1 since we disabled keep-alive support a long time ago.
Matt Wells
2013-08-10 09:51:14 -07:00
9b94e0feac
fix core from huge death query.
Matt Wells
2013-08-09 21:05:38 -07:00
ef333f8937
fix bug in user accounting system summary stats recs. when sumbuf was reallocated, the hashtable of ptrs could be invalidated. so use offsets, not ptrs, in the hash table.
Matt Wells
2013-08-09 16:40:12 -07:00
4f4047a3ad
new Make.depend.
mwells
2013-08-09 17:13:45 -06:00
dbefaec10d
Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
Matt Wells
2013-08-09 16:04:00 -07:00
77325fe5fa
Fixed a couple mostly proxy-related cores.
Matt Wells
2013-08-09 16:03:48 -07:00
f15beeb297
Removed another unused variable (tidy make output)
Steve Cook
2013-08-09 12:37:25 -06:00
5a5050d8d9
extend copyright line in LICENSE file.
Matt Wells
2013-08-09 09:05:09 -07:00
14688f1a7b
remove more temp files
Matt Wells
2013-08-09 08:54:45 -07:00
c386f0d1f0
remove temp file.
Matt Wells
2013-08-09 08:53:36 -07:00
2c7caf0653
Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
Matt Wells
2013-08-09 08:52:23 -07:00
76cf68f7b1
Fixed some bugs.
Matt Wells
2013-08-09 08:52:15 -07:00
fd88c6c3b2
add contact info to README.md.
mwells
2013-08-08 15:31:20 -06:00