Commit Graph

  • b891f2ff22 format updates for qa tool Matt 2015-02-12 17:19:14 -08:00
  • 596a674c61 fixes for rebuilding the active list in SpiderLoop class. Matt Wells 2015-02-12 17:00:38 -08:00
  • 24eac820d5 fixed bad deletenode call causing dups in winnertree. Matt 2015-02-12 16:12:23 -08:00
  • 579a08d287 fixed link overflow logic. Matt 2015-02-12 15:03:01 -08:00
  • 735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() Matt 2015-02-12 14:23:16 -08:00
  • 415c96fc56 added overflow checks to ensure we don't have more than 10M unique urls for a given "firstip" queued up to be spidered in spiderdb that have never been spidered. should prevent us from having 20GB spiderdbs for spidering those sites that essentially have an infinite # of urls, black hole sites, that seem to be plaguing crawls. Matt 2015-02-12 13:41:40 -08:00
  • c8fb1af5c4 added tree mem reclaimer for doledb since it is now a tree-only rdb. Matt 2015-02-12 12:12:25 -08:00
  • 04cc8adbdd fix &admin=0 so it works again Matt 2015-02-12 11:16:34 -08:00
  • c009430b6c more fixes for new spider updates Matt 2015-02-11 21:54:36 -08:00
  • b12913ed83 only add urls we should spider to our own doledbtree Matt 2015-02-11 19:27:28 -08:00
  • 9ea53ed89e bug fixes. spidering seems to work somewhat again. Matt 2015-02-11 19:23:36 -08:00
  • 30a77dd422 checkpoint on massive spidering speed ups. Matt 2015-02-11 17:55:28 -08:00
  • f6723ddaa3 new much faster spider. cache the winner tree basically. TODO: need to update cache if new spiderrequests are added that should be in the cached winner tree. Matt 2015-02-10 21:27:21 -08:00
  • 5c31dbda9a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-10 21:07:02 -07:00
  • 5b538e7cee fix core in linkdb logic mwells 2015-02-10 21:06:47 -07:00
  • 7909df5b5e Merge branch 'diffbot' into diffbot-testing Matt Wells 2015-02-10 12:21:29 -08:00
  • acbf4c582f show sigpipes and sigios for help debugging Matt Wells 2015-02-10 12:20:32 -08:00
  • 12cdc7c9d4 more spider speed ups based on profiler data. added Rdb::getCollNumTotalRecs() function. Matt 2015-02-10 12:00:04 -08:00
  • 4c7ee42dd9 speed up spiderDoledUrls() loop calling of gettimeofdayInMillisecondsSynced() using g_clockNeedsUpdate logic. Matt 2015-02-10 11:47:53 -08:00
  • 18d449c681 show pause message before next round to start msg. Matt Wells 2015-02-09 16:14:59 -08:00
  • 01687fcb0e fix gb thrutest disk tests Matt Wells 2015-02-09 10:29:08 -08:00
  • 1cdd3da6ee updated readme.md Matt 2015-02-08 22:57:49 -08:00
  • 9bb0dfd2f1 update installation instructions for redhat/fedora Matt 2015-02-08 22:17:13 -08:00
  • c60acf5f40 updates to makefile for bldg packages Matt 2015-02-08 21:24:15 -08:00
  • 8edc83f085 Merge branch 'testing' Matt 2015-02-08 21:09:59 -08:00
  • b40ee75187 fix core from certain queries mwells 2015-02-08 22:06:28 -07:00
  • 201b234217 use STDERR_FILENO not 2. Matt 2015-02-08 20:02:05 -08:00
  • b78fb5a40f fix log to /dev/stderr on redhat Matt 2015-02-08 19:57:22 -08:00
  • 2c0e0a57e6 Merge branch 'testing' Matt 2015-02-08 19:44:37 -08:00
  • 5eeeaef446 do not compile redhat's gb with -static. even if we yum install the static libs there's still problems. Matt 2015-02-08 19:43:32 -08:00
  • 5e752e78ef add 'more from this site' link back to results. mwells 2015-02-08 18:13:48 -07:00
  • 53bfd960c5 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-08 16:05:17 -07:00
  • bccdd6b65a fix site cluster by default parm bug mwells 2015-02-08 16:05:04 -07:00
  • 047a469552 fix bug with sitelinks.txt Matt 2015-02-07 15:24:19 -08:00
  • 4918ce1090 faq updates Matt 2015-02-07 15:14:11 -08:00
  • 8edc382888 fix makefile for pkg builds mwells 2015-02-07 15:37:27 -07:00
  • 8fff54621c doc updates Matt 2015-02-07 12:13:40 -08:00
  • 67a143864c take out add gigablast to your browser's search engines for now Matt 2015-02-07 12:10:43 -08:00
  • 9327ebf61f take out FEED link for now Matt 2015-02-07 12:09:21 -08:00
  • afbe35c5a9 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-02-07 12:07:52 -08:00
  • 580736d766 support arc injections Matt 2015-02-07 12:07:42 -08:00
  • aff7e49db2 fix case bug mwells 2015-02-06 19:55:45 -07:00
  • 85b244337c fix parm out of band core. fix hostdb conf symlink bug. Matt Wells 2015-02-06 15:35:00 -08:00
  • f2a87358e6 try to speed up threads more Matt Wells 2015-02-05 15:00:18 -08:00
  • 6c1c2c66c4 added dstart to gb -h help menu Matt 2015-02-05 12:39:13 -08:00
  • 9f22e268a2 try to fix crawlbot nightly smoke tests Matt 2015-02-05 12:29:43 -08:00
  • e9c36d1f75 comment update Matt 2015-02-05 10:35:52 -08:00
  • b0f81b848c fix flush bug mwells 2015-02-04 10:13:34 -07:00
  • e426877eea make a note of obscure condition Matt 2015-02-03 19:45:18 -08:00
  • 3e1cc9a450 fix bug of parms being set at seemingly random. Matt 2015-02-03 17:52:44 -08:00
  • 76ec7f3a4a add # of tcp connections to hosts table Matt 2015-02-03 14:14:17 -08:00
  • a6435bb210 miscellaneous spider/injection speedups. Matt 2015-02-03 14:04:53 -08:00
  • f70d533525 make threads enabled for disk the default setting now that creating threads should be much faster. Matt 2015-02-03 13:43:13 -08:00
  • 93fce690d6 more speedups. do not call sigprocmask in main thread before pthread_create(). instead call pthread_sigmask() from thread like we were doing already for SIGINT. Matt 2015-02-03 13:39:23 -08:00
  • 3badbb69f4 fix injection bug Matt 2015-02-03 13:00:47 -08:00
  • cbf913522d make pthread_create() and pthread_join() faster by supplying our own guarded stack. Matt 2015-02-03 11:58:44 -08:00
  • 7e4d39b870 update proxy control desc. Matt 2015-02-02 14:20:11 -08:00
  • 6fc83566e2 more fixes Matt 2015-02-02 14:06:38 -08:00
  • 739b296cf2 fix proxy bugs Matt 2015-02-02 13:29:52 -08:00
  • c15bd53e52 added support for supplying basic proxy authorization to spider proxies. username:password@1.2.3.4:80 Matt 2015-02-02 13:23:38 -08:00
  • e2bfa8ef52 fix core Matt 2015-02-02 09:28:21 -07:00
  • 784eb53727 show index body parm mwells 2015-02-01 20:49:27 -07:00
  • 3a146dddc0 fix scoring when doing facets or numeric termlists. Matt 2015-02-01 09:14:28 -07:00
  • 79a1d632cd need to have sitelinks.txt present in dir. Matt 2015-01-31 22:58:05 -07:00
  • b2a030f202 fix scoring bug in serps Matt 2015-01-31 18:45:36 -07:00
  • c0332d4381 fix qa Matt 2015-01-31 18:42:31 -07:00
  • 4a642fd2e9 log less Matt 2015-01-31 18:20:27 -07:00
  • e8948cea65 Merge branch 'diffbot-testing' into testing Matt 2015-01-31 15:35:02 -07:00
  • f4ca6d8cd4 try ddomain only urls with www. when looking up in sitelinks.txt Matt 2015-01-31 15:33:37 -07:00
  • 0c3ad724f8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-01-31 15:18:30 -07:00
  • cad1d3d076 added support for sitelinks.txt file Matt 2015-01-31 15:18:06 -07:00
  • 39beee8b22 more timeout fixes Matt 2015-01-31 09:51:15 -07:00
  • 0d0951284d fix core when host is down for > 1000 secs while spidering Matt 2015-01-31 09:22:24 -07:00
  • 3a003908ea time link info fetching Matt 2015-01-31 08:53:13 -07:00
  • 62df34bacc fix bug of gbsortbyint:gbspiderdate based query not working Matt Wells 2015-01-30 15:53:06 -08:00
  • ae95b30559 more tagdb lookup fixes Matt 2015-01-29 22:13:58 -07:00
  • e40425e0ee dont drop tagdb lookups Matt 2015-01-29 22:04:53 -07:00
  • 2019474e4e fix Matt 2015-01-29 21:48:16 -07:00
  • 7b19110433 no limit to tagdb lookups even if niceness 1 in udpserver.cpp so we do not discard them if we got a lot of udp slots in use Matt 2015-01-29 21:38:10 -07:00
  • 2fc7d464a1 try to handle those quick tagdb lookups first. Matt 2015-01-29 20:55:02 -07:00
  • 1eb9fdc658 fix some cores. fix debug log linkdb stuff. Matt 2015-01-29 19:42:29 -07:00
  • a87b582145 little fix Matt 2015-01-29 19:26:15 -07:00
  • e81b3a19ea fix time_t bugs. Matt 2015-01-29 19:08:27 -07:00
  • c9702d768c handle sigquit abrt sys to see if those are why gb exits abruptly Matt Wells 2015-01-29 09:39:10 -08:00
  • d91909c355 label safebuf mem Matt Wells 2015-01-29 10:35:06 -07:00
  • 0a3d26f893 added support for &nf=50 to limit to top 50 facets. Matt Wells 2015-01-29 10:34:22 -07:00
  • 2cb37042d2 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2015-01-29 09:48:38 -07:00
  • 507c607b39 return ENOPERM on certain pages if not master/coll admin. Matt Wells 2015-01-29 09:46:48 -07:00
  • f76f0c77d8 fix core while streaming and getting EPIPE. but it seems like firefox and firefox only has a bug in it when streaming json with high start values like &s=20000&q=type:json Matt Wells 2015-01-28 19:53:38 -08:00
  • 2501292516 fix core Matt Wells 2015-01-26 17:16:51 -08:00
  • b6115dee52 minor date update Matt 2015-01-25 22:41:13 -07:00
  • 0abcbf0ff0 doc updates Matt 2015-01-25 22:40:10 -07:00
  • bdc7625e11 64 bit package builds Matt 2015-01-25 22:33:37 -07:00
  • daa46dcec5 build fixes Matt 2015-01-25 21:50:35 -07:00
  • 3b1d3a64c8 Merge branch 'testing' Matt 2015-01-25 20:36:07 -07:00
  • ec55540432 fix gb dump sitelinks Matt Wells 2015-01-25 19:33:31 -08:00
  • 7c4a625779 fix dumping tagdb for sitelinks.txt Matt Wells 2015-01-25 18:04:15 -08:00
  • 9b6c56f447 fixes Matt 2015-01-25 18:52:18 -07:00
  • 1ef3932b32 use ./gb dump z main 0 -1 1 to generate sitelinks.txt Matt 2015-01-25 18:45:40 -07:00
  • e7a12fc2e5 fix printing stack trace on core for 64bit gb Matt 2015-01-25 10:52:13 -07:00