Commit Graph

  • 657d669ec8 exclude events and seo functionality. most people want this for web search so it should be a non-issue. mwells 2013-09-08 17:07:42 -0600
  • 34b6d3e74a fixed some cores. brought in fixes from old repo. mwells 2013-09-08 16:16:13 -0600
  • dcf45dd69d dump out doledb to disk when it has more than 50,000 negative keys to avoid positive/negative key annihilation delays. mwells 2013-09-08 15:09:54 -0600
  • 03706131fe documentation updates in Spider.h. Matt Wells 2013-09-08 13:42:02 -0700
  • 54c9353dbd try to fix core from g_inSigHandler being set. it should never be set since we do not use real-time signals anymore. mwells 2013-09-08 12:34:37 -0600
  • 0581f86265 fix core from calling a gettime related function from a pthread when a signal handler from the main thread was in use and POSSIBLY in the same function when the signal went off. different threads should be able to access that function just fine i'd imagine. mwells 2013-09-06 15:39:53 -0600
  • 7aa81abf91 use the "onsite" keyword in your url filters instead of this "only spider links from same host" switch to keep things simpler. mwells 2013-09-06 09:37:17 -0600
  • c58df10155 fix major bug causing spiders not to work. Matt Wells 2013-09-04 11:01:24 -0700
  • 91c4e768b1 more family filter fixes mwells 2013-09-01 18:28:49 -0600
  • aaf333c46c try to get family filter (&ff=1) working again to filter out adult search results. mwells 2013-09-01 18:22:38 -0600
  • afbd1e2b96 fix core from trying to get the time while in a sig handler. getTime() is not async-signal-safe. mwells 2013-09-01 12:55:22 -0600
  • 93dfb0cfd4 fix for the "spiders stuck" fix. mwells 2013-08-31 11:25:26 -0600
  • 5e0a53b909 minor print change mwells 2013-08-31 10:57:36 -0600
  • af46945403 show more info when dumping doledb. mwells 2013-08-31 10:55:05 -0600
  • 9696c7936a Merge branch 'master' into diffbot Matt Wells 2013-08-30 16:33:00 -0700
  • 94e6492916 removed MAX_COLL_RECS so we can have unlimited collections. the only limit now is sizeof(collnum_t), which is 16 bits, i.e. 15 bits of non-negative values (about 32k collections). we can always expand this if we need more than 32k collections. Matt Wells 2013-08-30 16:20:38 -0700
  • f6bcaeb76a minor fix. mwells 2013-08-30 00:16:30 -0600
  • 900bbf8fba try to fix the bug of the spiders kinda getting stuck and not spidering to their max potential because of doledb record annihilations at the top of the spider priority queue of SpiderRequests. this was causing lots of re-reads of doledb in Msg5.cpp, over 300 rounds in some cases, which was very slow. mwells 2013-08-29 21:59:02 -0600
  • 2e9c8f7c6e Merge branch 'master' of github.com:gigablast/open-source-search-engine mwells 2013-08-29 21:17:46 -0600
  • 84fae9a3c6 Fix issue of reading SpiderRequests from doledb at the very first key in spiderdb. this causes lots of positive/negative key annihilations; in some cases we end up re-reading around 300 times just to get a url from a doledb priority. mwells 2013-08-29 21:16:59 -0600
  • ca2a024d04 fixed up thread/spider log msgs. fixed core from calling fprintf in the alarm signal handler for missed quickpolls. mwells 2013-08-29 21:15:42 -0600
  • e925012dce change a couple of possibly reserved names in C++ to non-reserved names, e.g. #define _ADDRESS_H_ becomes _GB_ADDRESS_H_. mwells 2013-08-28 22:59:01 -0600
  • 82ee2dfed7 fix cores when spider is unzipping gzipped web pages. mwells 2013-08-28 22:49:22 -0600
  • 80179525c1 when using pthreads, block SIGIO so it does not silently kill the gb process. we no longer have a handler for it because it was bogging down the cpu: it went off every time a udp datagram was sent/received and seemed to carry a ton of overhead. SIGIO used to be sent when the signal queue was full so we'd resort to polling the file descriptors, so i'm not sure how this will affect us. also updated Threads.cpp to use getpidtid() instead of getpid() so that, when using pthreads, we get the thread id rather than the process id. using pthreads is now the default behaviour even though they suck. we used to use clone(), but the newer stuff no longer allows us to override errno_location. mwells 2013-08-21 15:01:26 -0600
  • 6332de2daf added a link to compare.html, the comparison to SOLR, in the documentation. mwells 2013-08-21 13:14:17 -0600
  • 37a6549a58 updates to developer.html developer documentation. removed a lot of obsolete information. still needs more work. mwells 2013-08-21 13:09:55 -0600
  • 8971d9b932 comment out urldb in developer.html since it is no longer used. mwells 2013-08-21 08:59:51 -0600
  • 6cf0497c2c added a little posdb documentation to developer.html. posdb replaced indexdb as the new index because it has word position info as well as word field info. mwells 2013-08-21 08:40:28 -0600
  • a2a57addd9 try fixing the cpu being slammed in the sigiohandler. it seems like the meaning of signals might have changed in the kernel, etc. over the years. fixed Loop.cpp. mwells 2013-08-20 14:12:44 -0600
  • a270a9bc91 updated README.md to reference compare.html mwells 2013-08-19 17:20:30 -0600
  • 7d3cc672c8 use ./gb blaster -u <fileofurls> to just inject urls, but use -i to also add the outlinks to spiderdb. mwells 2013-08-19 16:33:27 -0600
  • 3550bf2d8a compare.html update. mwells 2013-08-19 16:21:01 -0600
  • 95a020574c set spiderlinks=1 when doing ./gb blaster -i <fileofurls> to index/inject a file of urls so that we add the outlinks to spiderdb. this will slow things down a little since we will have to do a dns lookup of the subdomain of each outlink, unless it is cached. mwells 2013-08-19 16:15:58 -0600
  • 72d7e42497 added a quick start note to admin.html. mwells 2013-08-19 15:34:07 -0600
  • 24af21394d dns ip fix in gb.conf. mwells 2013-08-19 15:25:37 -0600
  • e9297df240 listen on DNS port 5998 not 6000. 6000 seemed to cause issues on a particular install for some reason. mwells 2013-08-19 15:02:27 -0600
  • 71aa03ab5d little admin.html update. mwells 2013-08-19 13:45:43 -0600
  • eb4758b565 fix init error when injecting file of urls. mwells 2013-08-19 13:34:47 -0600
  • 2c83b96ba4 Added support for 'gb blaster -i <fileofurls> <maxThreads>' to inject/index a file of urls. Committing older work for compare.html that shows differences between gigablast and solr, but has a lot of blanks. mwells 2013-08-19 13:26:46 -0600
  • 5facc7d859 add injection timing stat point to compare.html mwells 2013-08-17 11:06:24 -0600
  • 4092177e5f added injectme3 file and documentation into compare.html to describe how to inject a file of concatenated HTML documents into gb. Still have to find out how to do that in SOLR and elasticsearch for comparison. mwells 2013-08-17 11:02:26 -0600
  • c0e1216022 stub for compare.html Matt Wells 2013-08-17 09:07:48 -0700
  • f7f377a1f7 fixed a core dump in proxy.cpp. make doubly sure the protocol is not 1.1: since we have keep-alives disabled, we need to force the protocol to 1.0. Matt Wells 2013-08-17 08:58:28 -0700
  • 410604c388 minor edits Matt Wells 2013-08-17 08:48:07 -0700
  • 5bac648cc9 start up the gigablast blog again. Matt Wells 2013-08-17 08:44:32 -0700
  • 5a2cb35e6c added gb.conf.txt and hosts.conf.txt for display from admin.html. Matt Wells 2013-08-11 00:45:27 -0700
  • e42afac9d8 admin.html documentation updates. Matt Wells 2013-08-11 00:32:53 -0700
  • 834128a076 Fixed heap breaches caused by our built-in electric fence code from death queries. Use HTTP/1.0, not 1.1, since we disabled keep-alive support a long time ago. Matt Wells 2013-08-10 09:51:14 -0700
  • 651b899453 oops, wrong sign direction. mwells 2013-08-09 22:14:13 -0600
  • 9b94e0feac fix core from huge death query. Matt Wells 2013-08-09 21:05:38 -0700
  • ef333f8937 fix bug in user accounting system summary stats recs. when sumbuf was reallocated, the pointers stored in the hash table could be invalidated, so store offsets, not ptrs, in the hash table. Matt Wells 2013-08-09 16:40:12 -0700
  • 4f4047a3ad new Make.depend. mwells 2013-08-09 17:13:45 -0600
  • dbefaec10d Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-08-09 16:04:00 -0700
  • 77325fe5fa Fixed a couple mostly proxy-related cores. Matt Wells 2013-08-09 16:03:48 -0700
  • f15beeb297 Removed another unused variable (tidy make output) Steve Cook 2013-08-09 12:37:25 -0600
  • 5a5050d8d9 extend copyright line in LICENSE file. Matt Wells 2013-08-09 09:05:09 -0700
  • 14688f1a7b remove more temp files Matt Wells 2013-08-09 08:54:45 -0700
  • c386f0d1f0 remove temp file. Matt Wells 2013-08-09 08:53:36 -0700
  • 2c7caf0653 Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-08-09 08:52:23 -0700
  • 76cf68f7b1 Fixed some bugs. Matt Wells 2013-08-09 08:52:15 -0700
  • fd88c6c3b2 add contact info to README.md. mwells 2013-08-08 15:31:20 -0600
  • 0b94b31fbc Fix potential core issue in proxy. mwells 2013-08-08 15:14:36 -0600
  • 7d54efef09 remove file addsinprogress.dat mwells 2013-08-08 14:45:16 -0600
  • be7aab78b7 Fixed bugs with running a proxy. Added more comments into hosts.conf. mwells 2013-08-08 14:41:38 -0600
  • d805696040 Merge branch 'master' of github.com:gigablast/open-source-search-engine mwells 2013-08-05 13:07:36 -0600
  • 1b03e77db3 README.md update. mwells 2013-08-05 13:07:09 -0600
  • 1883e392e9 Merge branch 'master' of github.com:gigablast/open-source-search-engine Steve Cook 2013-08-05 10:32:59 -0600
  • 6ac55d3b75 Update Msg13.cpp Gigablast 2013-08-05 10:34:08 -0600
  • 6f64568ef8 A bit of html cleanup and added <pre> style. Steve Cook 2013-08-05 10:31:52 -0600
  • b002233b02 Updated some of the content and put some comments about possibly removing internal gigablast notes. Steve Cook 2013-08-05 10:17:41 -0600
  • 1466b186dc Commented out two unused variables (dt, zadj), to remove compiler warnings. Steve Cook 2013-08-05 10:16:57 -0600
  • d35d337f84 Merge pull request #1 from coconutpilot/master Gigablast 2013-08-05 04:47:47 -0700
  • 78ec7b5ee7 Merge 0783c7395e into f6e560c1f4 David Sparks 2013-08-04 23:26:47 -0700
  • 0783c7395e Copied these global vars from main.cpp to fix compilation error David Sparks 2013-08-04 22:37:01 -0700
  • f6e560c1f4 Initial file population. Matt Wells 2013-08-02 13:12:24 -0700
  • d43acab215 Initial commit Gigablast 2013-08-02 13:07:43 -0700