Commit Graph

  • b80a70a6fd fix for https urls through proxies using newly updated tcp/loop code. Matt Wells 2015-02-21 09:25:54 -0800
  • cc98589da3 Merge branch 'diffbot-testing' Matt Wells 2015-02-20 08:18:30 -0700
  • 0c8e3b8e62 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-20 08:13:20 -0700
  • ada18e648b try to fix core in reclaiming doledb mem mwells 2015-02-20 08:11:49 -0700
  • 856823e862 fix qa test some. Matt 2015-02-19 20:18:30 -0700
  • ac4bc2842f Merge branch 'diffbot' into diffbot-testing Matt Wells 2015-02-19 18:09:43 -0800
  • 480072274d emergency proxy fixes Matt Wells 2015-02-19 12:49:42 -0800
  • f15e7fbaf4 show total link overflows in spiderdb in the page of stats. Matt 2015-02-18 19:18:38 -0700
  • 860ff24227 do not cache winner list if the # of requests from the IP is less than 25k about. Matt 2015-02-18 19:14:06 -0700
  • dbaff2dfb8 insert collection name for spider status. Matt 2015-02-18 15:18:39 -0700
  • 18f8eddadd fix core from shutting down the server. Matt 2015-02-18 09:41:40 -0700
  • c894146e29 redhat cores in xmldoc.o if compiled with -O3 so use -O2 Matt 2015-02-18 08:32:07 -0700
  • 913102c48c Merge branch 'testing' Matt 2015-02-18 07:26:01 -0700
  • e7d0687f4c Merge branch 'diffbot-matt' into diffbot-testing mwells 2015-02-17 22:50:09 -0700
  • 2bcbfd6852 fix double close/destroy bug. mwells 2015-02-17 22:49:44 -0700
  • cce2308bcf Merge branch 'diffbot-matt' into diffbot-testing Matt Wells 2015-02-17 20:16:49 -0800
  • ce8c97e8e2 fix log spam Matt 2015-02-17 20:55:36 -0700
  • ef99aabf4d try to fix qainject1 core in qa.cpp Matt 2015-02-17 20:17:59 -0700
  • dce8d9f930 fix qa bug of not resetting s_i. fix tcpserver.cpp bug of destroying a streaming socket after what is really not the final write. Matt 2015-02-17 20:10:13 -0700
  • d14cb2d5b0 fix debug log msgs. Matt 2015-02-17 19:15:43 -0700
  • 2488c1a338 added proper write callback registration into TcpServer.cpp so we only register write callbacks when a non-blocking write does not write all the bytes requested of it, or when a connection does not complete. also fixed up the sslHandshake() function which calls SSL_connect(). Matt 2015-02-16 14:48:39 -0700
  • db4fcb30f8 limit downloaded doc size to something under the MAX_DGRAMS limit so msg13 won't core trying to send the reply back. Matt 2015-02-16 09:43:39 -0700
  • c9b4dc66a8 show ignored query words in xml and json. show prettier in html. Matt 2015-02-13 17:34:31 -0800
  • cd9c158199 loop.cpp cleanups. make it so non-linux os will break out of the select() loop eventually even if select() only gets EINTRs all the time. so we can process shutdown cmd. save ips.txt again for qatest123 qa collection. do not use winnerlist cache when we have 'sitepages' url filter expression. it messes it up. Matt 2015-02-13 12:07:10 -0800
  • b891f2ff22 format updates for qa tool Matt 2015-02-12 17:19:14 -0800
  • 596a674c61 fixes for rebuilding the active list in SpiderLoop class. Matt Wells 2015-02-12 17:00:38 -0800
  • 24eac820d5 fixed bad deletenode call causing dups in winnertree. Matt 2015-02-12 16:12:23 -0800
  • 579a08d287 fixed link overflow logic. Matt 2015-02-12 15:03:01 -0800
  • 735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() Matt 2015-02-12 14:23:16 -0800
  • 415c96fc56 added overflow checks to ensure we don't have more than 10M unique urls for a given "firstip" queued up to be spidered in spiderdb that have never been spidered. should prevent us from having 20GB spiderdbs for spidering those sites that essentially have an infinite # of urls, black hole sites, that seem to be plaguing crawls. Matt 2015-02-12 13:41:40 -0800
  • c8fb1af5c4 added tree mem reclaimer for doledb since it is now a tree-only rdb. Matt 2015-02-12 12:12:25 -0800
  • 04cc8adbdd fix &admin=0 so it works again Matt 2015-02-12 11:16:34 -0800
  • c009430b6c more fixes for new spider updates Matt 2015-02-11 21:54:36 -0800
  • b12913ed83 only add urls we should spider to our own doledbtree Matt 2015-02-11 19:27:28 -0800
  • 9ea53ed89e bug fixes. spidering seems to work somewhat again. Matt 2015-02-11 19:23:36 -0800
  • 30a77dd422 checkpoint on massive spidering speed ups. Matt 2015-02-11 17:55:28 -0800
  • f6723ddaa3 new much faster spider. cache the winner tree basically. TODO: need to update cache if new spiderrequests are added that should be in the cached winner tree. Matt 2015-02-10 21:27:21 -0800
  • 5c31dbda9a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-10 21:07:02 -0700
  • 5b538e7cee fix core in linkdb logic mwells 2015-02-10 21:06:47 -0700
  • 7909df5b5e Merge branch 'diffbot' into diffbot-testing Matt Wells 2015-02-10 12:21:29 -0800
  • acbf4c582f show sigpipes and sigios for help debugging Matt Wells 2015-02-10 12:20:32 -0800
  • 12cdc7c9d4 more spider speed ups based on profiler data. added Rdb::getCollNumTotalRecs() function. Matt 2015-02-10 12:00:04 -0800
  • 4c7ee42dd9 speed up spiderDoledUrls() loop calling of gettimeofdayInMillisecondsSynced() using g_clockNeedsUpdate logic. Matt 2015-02-10 11:47:53 -0800
  • 18d449c681 show pause message before next round to start msg. Matt Wells 2015-02-09 16:14:59 -0800
  • 01687fcb0e fix gb thrutest disk tests Matt Wells 2015-02-09 10:29:08 -0800
  • 1cdd3da6ee updated readme.md Matt 2015-02-08 22:57:49 -0800
  • 9bb0dfd2f1 update installation instructions for redhat/fedora Matt 2015-02-08 22:17:13 -0800
  • c60acf5f40 updates to makefile for bldg packages Matt 2015-02-08 21:24:15 -0800
  • 8edc83f085 Merge branch 'testing' Matt 2015-02-08 21:09:59 -0800
  • b40ee75187 fix core from certain queries mwells 2015-02-08 22:06:28 -0700
  • 201b234217 use STDERR_FILENO not 2. Matt 2015-02-08 20:02:05 -0800
  • b78fb5a40f fix log to /dev/stderr on redhat Matt 2015-02-08 19:57:22 -0800
  • 2c0e0a57e6 Merge branch 'testing' Matt 2015-02-08 19:44:37 -0800
  • 5eeeaef446 do not compile redhat's gb with -static. even if we yum install the static libs there's still problems. Matt 2015-02-08 19:43:32 -0800
  • 5e752e78ef add 'more from this site' link back to results. mwells 2015-02-08 18:13:48 -0700
  • 53bfd960c5 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing mwells 2015-02-08 16:05:17 -0700
  • bccdd6b65a fix site cluster by default parm bug mwells 2015-02-08 16:05:04 -0700
  • 047a469552 fix bug with sitelinks.txt Matt 2015-02-07 15:24:19 -0800
  • 4918ce1090 faq updates Matt 2015-02-07 15:14:11 -0800
  • 8edc382888 fix makefile for pkg builds mwells 2015-02-07 15:37:27 -0700
  • 8fff54621c doc updates Matt 2015-02-07 12:13:40 -0800
  • 67a143864c take out add gigablast to your browser's search engines for now Matt 2015-02-07 12:10:43 -0800
  • 9327ebf61f take out FEED link for now Matt 2015-02-07 12:09:21 -0800
  • afbe35c5a9 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-02-07 12:07:52 -0800
  • 580736d766 support arc injections Matt 2015-02-07 12:07:42 -0800
  • aff7e49db2 fix case bug mwells 2015-02-06 19:55:45 -0700
  • 85b244337c fix parm out of band core. fix hostdb conf symlink bug. Matt Wells 2015-02-06 15:35:00 -0800
  • f2a87358e6 try to speed up threads more Matt Wells 2015-02-05 15:00:18 -0800
  • 6c1c2c66c4 added dstart to gb -h help menu Matt 2015-02-05 12:39:13 -0800
  • 9f22e268a2 try to fix crawlbot nightly smoke tests Matt 2015-02-05 12:29:43 -0800
  • e9c36d1f75 comment update Matt 2015-02-05 10:35:52 -0800
  • b0f81b848c fix flush bug mwells 2015-02-04 10:13:34 -0700
  • e426877eea make a note of obscure condition Matt 2015-02-03 19:45:18 -0800
  • 3e1cc9a450 fix bug of parms being set at seemingly random. Matt 2015-02-03 17:52:44 -0800
  • 76ec7f3a4a add # of tcp connections to hosts table Matt 2015-02-03 14:14:17 -0800
  • a6435bb210 miscellaneous spider/injection speedups. Matt 2015-02-03 14:04:53 -0800
  • f70d533525 make threads enabled for disk the default setting now that creating threads should be much faster. Matt 2015-02-03 13:43:13 -0800
  • 93fce690d6 more speedups. do not call sigprocmask() in main thread before pthread_create(). instead call pthread_sigmask() from thread like we were doing already for SIGINT. Matt 2015-02-03 13:39:23 -0800
  • 3badbb69f4 fix injection bug Matt 2015-02-03 13:00:47 -0800
  • cbf913522d make pthread_create() and pthread_join() faster by supplying our own guarded stack. Matt 2015-02-03 11:58:44 -0800
  • 7e4d39b870 update proxy control desc. Matt 2015-02-02 14:20:11 -0800
  • 6fc83566e2 more fixes Matt 2015-02-02 14:06:38 -0800
  • 739b296cf2 fix proxy bugs Matt 2015-02-02 13:29:52 -0800
  • c15bd53e52 added support for supplying basic proxy authorization to spider proxies. username:password@1.2.3.4:80 Matt 2015-02-02 13:23:38 -0800
  • e2bfa8ef52 fix core Matt 2015-02-02 09:28:21 -0700
  • 784eb53727 show index body parm mwells 2015-02-01 20:49:27 -0700
  • 3a146dddc0 fix scoring when doing facets or numeric termlists. Matt 2015-02-01 09:14:28 -0700
  • 79a1d632cd need to have sitelinks.txt present in dir. Matt 2015-01-31 22:58:05 -0700
  • b2a030f202 fix scoring bug in serps Matt 2015-01-31 18:45:36 -0700
  • c0332d4381 fix qa Matt 2015-01-31 18:42:31 -0700
  • 4a642fd2e9 log less Matt 2015-01-31 18:20:27 -0700
  • e8948cea65 Merge branch 'diffbot-testing' into testing Matt 2015-01-31 15:35:02 -0700
  • f4ca6d8cd4 try ddomain only urls with www. when looking up in sitelinks.txt Matt 2015-01-31 15:33:37 -0700
  • 0c3ad724f8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-01-31 15:18:30 -0700
  • cad1d3d076 added support for sitelinks.txt file Matt 2015-01-31 15:18:06 -0700
  • 39beee8b22 more timeout fixes Matt 2015-01-31 09:51:15 -0700
  • 0d0951284d fix core when host is down for > 1000 secs while spidering Matt 2015-01-31 09:22:24 -0700
  • 3a003908ea time link info fetching Matt 2015-01-31 08:53:13 -0700
  • 62df34bacc fix bug of gbsortbyint:gbspiderdate based query not working Matt Wells 2015-01-30 15:53:06 -0800
  • ae95b30559 more tagdb lookup fixes Matt 2015-01-29 22:13:58 -0700