Commit Graph

  • d14cb2d5b0 fix debug log msgs. Matt 2015-02-17 19:15:43 -07:00
  • 2488c1a338 added proper write callback registration into TcpServer.cpp so we only register write callbacks when a non-blocking write does not write all the bytes requested of it, or when a connection does not complete. also fixed up the sslHandshake() function which calls SSL_connect(). Matt 2015-02-16 14:48:39 -07:00
  • db4fcb30f8 limit downloaded doc size to something under the MAX_DGRAMS limit so msg13 won't core trying to send the reply back. Matt 2015-02-16 09:43:39 -07:00
  • c9b4dc66a8 show ignored query words in xml and json. show prettier in html. Matt 2015-02-13 17:34:31 -08:00
  • cd9c158199 loop.cpp cleanups. make it so non-linux os will break out of the select() loop eventually even if select() only gets EINTRs all the time. so we can process shutdown cmd. save ips.txt again for qatest123 qa collection. do not use winnerlist cache when we have 'sitepages' url filter expression. it messes it up. Matt 2015-02-13 12:07:10 -08:00
  • b891f2ff22 format updates for qa tool Matt 2015-02-12 17:19:14 -08:00
  • 596a674c61 fixes for rebuilding the active list in SpiderLoop class. Matt Wells 2015-02-12 17:00:38 -08:00
  • 24eac820d5 fixed bad deletenode call causing dups in winnertree. Matt 2015-02-12 16:12:23 -08:00
  • 579a08d287 fixed link overflow logic. Matt 2015-02-12 15:03:01 -08:00
  • 735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() Matt 2015-02-12 14:23:16 -08:00
  • 415c96fc56 added overflow checks to ensure we don't have more than 10M unique urls for a given "firstip" queued up to be spidered in spiderdb that have never been spidered. should prevent us from having 20GB spiderdbs for spidering those sites that essentially have an infinite # of urls, black hole sites, that seem to be plaguing crawls. Matt 2015-02-12 13:41:40 -08:00
  • c8fb1af5c4 added tree mem reclaimer for doledb since it is now a tree-only rdb. Matt 2015-02-12 12:12:25 -08:00
  • 04cc8adbdd fix &admin=0 so it works again Matt 2015-02-12 11:16:34 -08:00
  • c009430b6c more fixes for new spider updates Matt 2015-02-11 21:54:36 -08:00
  • b12913ed83 only add urls we should spider to our own doledbtree Matt 2015-02-11 19:27:28 -08:00
  • 9ea53ed89e bug fixes. spidering seems to work somewhat again. Matt 2015-02-11 19:23:36 -08:00
  • 30a77dd422 checkpoint on massive spidering speed ups. Matt 2015-02-11 17:55:28 -08:00
  • f6723ddaa3 new much faster spider. cache the winner tree basically. TODO: need to update cache if new spiderrequests are added that should be in the cached winner tree. Matt 2015-02-10 21:27:21 -08:00
  • 5c31dbda9a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-10 21:07:02 -07:00
  • 5b538e7cee fix core in linkdb logic mwells 2015-02-10 21:06:47 -07:00
  • 7909df5b5e Merge branch 'diffbot' into diffbot-testing Matt Wells 2015-02-10 12:21:29 -08:00
  • acbf4c582f show sigpipes and sigios for help debugging Matt Wells 2015-02-10 12:20:32 -08:00
  • 12cdc7c9d4 more spider speed ups based on profiler data. added Rdb::getCollNumTotalRecs() function. Matt 2015-02-10 12:00:04 -08:00
  • 4c7ee42dd9 speed up spiderDoledUrls() loop calling of gettimeofdayInMillisecondsSynced() using g_clockNeedsUpdate logic. Matt 2015-02-10 11:47:53 -08:00
  • 18d449c681 show pause message before next round to start msg. Matt Wells 2015-02-09 16:14:59 -08:00
  • 01687fcb0e fix gb thrutest disk tests Matt Wells 2015-02-09 10:29:08 -08:00
  • 1cdd3da6ee updated readme.md Matt 2015-02-08 22:57:49 -08:00
  • 9bb0dfd2f1 update installation instructions for redhat/fedora Matt 2015-02-08 22:17:13 -08:00
  • c60acf5f40 updates to makefile for bldg packages Matt 2015-02-08 21:24:15 -08:00
  • 8edc83f085 Merge branch 'testing' Matt 2015-02-08 21:09:59 -08:00
  • b40ee75187 fix core from certain queries mwells 2015-02-08 22:06:28 -07:00
  • 201b234217 use STDERR_FILENO not 2. Matt 2015-02-08 20:02:05 -08:00
  • b78fb5a40f fix log to /dev/stderr on redhat Matt 2015-02-08 19:57:22 -08:00
  • 2c0e0a57e6 Merge branch 'testing' Matt 2015-02-08 19:44:37 -08:00
  • 5eeeaef446 do not compile redhat's gb with -static. even if we yum install the static libs there's still problems. Matt 2015-02-08 19:43:32 -08:00
  • 5e752e78ef add 'more from this site' link back to results. mwells 2015-02-08 18:13:48 -07:00
  • 53bfd960c5 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-08 16:05:17 -07:00
  • bccdd6b65a fix site cluster by default parm bug mwells 2015-02-08 16:05:04 -07:00
  • 047a469552 fix bug with sitelinks.txt Matt 2015-02-07 15:24:19 -08:00
  • 4918ce1090 faq updates Matt 2015-02-07 15:14:11 -08:00
  • 8edc382888 fix makefile for pkg builds mwells 2015-02-07 15:37:27 -07:00
  • 8fff54621c doc updates Matt 2015-02-07 12:13:40 -08:00
  • 67a143864c take out add gigablast to your browser's search engines for now Matt 2015-02-07 12:10:43 -08:00
  • 9327ebf61f take out FEED link for now Matt 2015-02-07 12:09:21 -08:00
  • afbe35c5a9 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-02-07 12:07:52 -08:00
  • 580736d766 support arc injections Matt 2015-02-07 12:07:42 -08:00
  • aff7e49db2 fix case bug mwells 2015-02-06 19:55:45 -07:00
  • 85b244337c fix parm out of band core. fix hostdb conf symlink bug. Matt Wells 2015-02-06 15:35:00 -08:00
  • f2a87358e6 try to speed up threads more Matt Wells 2015-02-05 15:00:18 -08:00
  • 6c1c2c66c4 added dstart to gb -h help menu Matt 2015-02-05 12:39:13 -08:00
  • 9f22e268a2 try to fix crawlbot nightly smoke tests Matt 2015-02-05 12:29:43 -08:00
  • e9c36d1f75 comment update Matt 2015-02-05 10:35:52 -08:00
  • b0f81b848c fix flush bug mwells 2015-02-04 10:13:34 -07:00
  • e426877eea make a note of obscure condition Matt 2015-02-03 19:45:18 -08:00
  • 3e1cc9a450 fix bug of parms being set seemingly at random. Matt 2015-02-03 17:52:44 -08:00
  • 76ec7f3a4a add # of tcp connections to hosts table Matt 2015-02-03 14:14:17 -08:00
  • a6435bb210 miscellaneous spider/injection speedups. Matt 2015-02-03 14:04:53 -08:00
  • f70d533525 make threads enabled for disk the default setting now that creating threads should be much faster. Matt 2015-02-03 13:43:13 -08:00
  • 93fce690d6 more speedups. do not call sigprocmask in main thread before pthread_create(). instead call pthread_sigmask() from thread like we were doing already for SIGINT. Matt 2015-02-03 13:39:23 -08:00
  • 3badbb69f4 fix injection bug Matt 2015-02-03 13:00:47 -08:00
  • cbf913522d make pthread_create() and pthread_join() faster by supplying our own guarded stack. Matt 2015-02-03 11:58:44 -08:00
  • 7e4d39b870 update proxy control desc. Matt 2015-02-02 14:20:11 -08:00
  • 6fc83566e2 more fixes Matt 2015-02-02 14:06:38 -08:00
  • 739b296cf2 fix proxy bugs Matt 2015-02-02 13:29:52 -08:00
  • c15bd53e52 added support for supplying basic proxy authorization to spider proxies. username:password@1.2.3.4:80 Matt 2015-02-02 13:23:38 -08:00
  • e2bfa8ef52 fix core Matt 2015-02-02 09:28:21 -07:00
  • 784eb53727 show index body parm mwells 2015-02-01 20:49:27 -07:00
  • 3a146dddc0 fix scoring when doing facets or numeric termlists. Matt 2015-02-01 09:14:28 -07:00
  • 79a1d632cd need to have sitelinks.txt present in dir. Matt 2015-01-31 22:58:05 -07:00
  • b2a030f202 fix scoring bug in serps Matt 2015-01-31 18:45:36 -07:00
  • c0332d4381 fix qa Matt 2015-01-31 18:42:31 -07:00
  • 4a642fd2e9 log less Matt 2015-01-31 18:20:27 -07:00
  • e8948cea65 Merge branch 'diffbot-testing' into testing Matt 2015-01-31 15:35:02 -07:00
  • f4ca6d8cd4 try ddomain only urls with www. when looking up in sitelinks.txt Matt 2015-01-31 15:33:37 -07:00
  • 0c3ad724f8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-01-31 15:18:30 -07:00
  • cad1d3d076 added support for sitelinks.txt file Matt 2015-01-31 15:18:06 -07:00
  • 39beee8b22 more timeout fixes Matt 2015-01-31 09:51:15 -07:00
  • 0d0951284d fix core when host is down for > 1000 secs while spidering Matt 2015-01-31 09:22:24 -07:00
  • 3a003908ea time link info fetching Matt 2015-01-31 08:53:13 -07:00
  • 62df34bacc fix bug of gbsortbyint:gbspiderdate based query not working Matt Wells 2015-01-30 15:53:06 -08:00
  • ae95b30559 more tagdb lookup fixes Matt 2015-01-29 22:13:58 -07:00
  • e40425e0ee dont drop tagdb lookups Matt 2015-01-29 22:04:53 -07:00
  • 2019474e4e fix Matt 2015-01-29 21:48:16 -07:00
  • 7b19110433 no limit to tagdb lookups even if niceness 1 in udpserver.cpp so we do not discard them if we got a lot of udp slots in use Matt 2015-01-29 21:38:10 -07:00
  • 2fc7d464a1 try to handle those quick tagdb lookups first. Matt 2015-01-29 20:55:02 -07:00
  • 1eb9fdc658 fix some cores. fix debug log linkdb stuff. Matt 2015-01-29 19:42:29 -07:00
  • a87b582145 little fix Matt 2015-01-29 19:26:15 -07:00
  • e81b3a19ea fix time_t bugs. Matt 2015-01-29 19:08:27 -07:00
  • c9702d768c handle sigquit abrt sys to see if those are why gb exits abruptly Matt Wells 2015-01-29 09:39:10 -08:00
  • d91909c355 label safebuf mem Matt Wells 2015-01-29 10:35:06 -07:00
  • 0a3d26f893 added support for &nf=50 to limit to top 50 facets. Matt Wells 2015-01-29 10:34:22 -07:00
  • 2cb37042d2 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2015-01-29 09:48:38 -07:00
  • 507c607b39 return ENOPERM on certain pages if not master/coll admin. Matt Wells 2015-01-29 09:46:48 -07:00
  • f76f0c77d8 fix core while streaming and getting EPIPE. but it seems like firefox and firefox only has a bug in it when streaming json with high start values like &s=20000&q=type:json Matt Wells 2015-01-28 19:53:38 -08:00
  • 2501292516 fix core Matt Wells 2015-01-26 17:16:51 -08:00
  • b6115dee52 minor date update Matt 2015-01-25 22:41:13 -07:00
  • 0abcbf0ff0 doc updates Matt 2015-01-25 22:40:10 -07:00
  • bdc7625e11 64 bit package builds Matt 2015-01-25 22:33:37 -07:00
  • daa46dcec5 build fixes Matt 2015-01-25 21:50:35 -07:00
  • 3b1d3a64c8 Merge branch 'testing' Matt 2015-01-25 20:36:07 -07:00