Commit Graph

  • fbb7e4fe44 fix coll passwords link Matt 2014-10-28 20:39:30 -0600
  • a92ff328ae Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-10-28 18:25:32 -0700
  • 254ad6895a to fix OOM core Matt Wells 2014-10-28 18:25:16 -0700
  • 9a12709e39 unfix Matt Wells 2014-10-27 20:31:06 -0700
  • e458d4b9f7 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-10-27 20:23:21 -0700
  • 561f493e11 check for "urls" as well as "seeds" cgi parms before creating collection. Matt Wells 2014-10-27 19:59:59 -0700
  • 5eb51b2d9e fix OOM condition a bit Matt 2014-10-27 16:17:37 -0600
  • a8771f4e95 add 2nd collection passwords link mwells 2014-10-27 11:21:23 -0600
  • d9e8f7e465 Merge pull request #26 from miketung/diffbot-testing Gigablast 2014-10-24 13:46:34 -0600
  • 1dbfb55a8e User needs to specify seeds to create a crawlbot crawl. This fixes 2110. Mike Tung 2014-10-21 18:10:50 -0700
  • 4d4faf3429 added query debug msgs for allocating TopNodes in TopTree. Matt Wells 2014-10-15 14:06:43 -0700
  • 8d8570950b fix EXTREMELY slow unlinks and file renames on morpheus because of sync() cmd. Matt Wells 2014-10-14 11:01:38 -0700
  • d933a3f038 fix &n=10000000 fix some more Matt Wells 2014-10-14 10:04:09 -0700
  • 625c5e5790 fix squid proxy balancing code some more. fix death queries when msg2 had docs wanted of 0 because all termlists were empty. Matt Wells 2014-10-14 08:04:23 -0700
  • 4914098e5c Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-10-13 20:51:42 -0700
  • 393cd90c63 remove log debug msg Matt Wells 2014-10-13 20:51:22 -0700
  • 1f9e38057c Merge pull request #24 from miketung/diffbot-testing Gigablast 2014-10-13 20:50:24 -0600
  • f14552e194 Remove mobile user-agents to prevent fetching mobile version of page. Mike Tung 2014-10-13 19:36:34 -0700
  • ef0e37d5a4 remove debug prints Matt Wells 2014-10-13 17:05:39 -0700
  • bc4750b28c distribute diffbot proxy requests over all hosts. Matt Wells 2014-10-13 16:48:09 -0700
  • 18d3b210ce added some debug logic. (will remove shortly) fix for mike's commit. Matt Wells 2014-10-13 16:36:43 -0700
  • 843e96a759 Merge pull request #23 from miketung/diffbot-testing Gigablast 2014-10-13 16:53:52 -0600
  • 9ad0f84d03 Enable facets for site Mike Tung 2014-10-13 15:41:49 -0700
  • 162e89b2d5 return error if client tries to use https for squid proxy right now. Matt Wells 2014-10-10 07:47:23 -0700
  • f75985d19e Merge branch 'diffbot-testing' into testing mwells 2014-10-10 08:00:44 -0600
  • 9cc2ab0362 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing mwells 2014-10-10 08:00:22 -0600
  • 033a8b80a0 fix core if json item has column not in table when dumping json items as csv. Matt Wells 2014-10-10 07:00:11 -0700
  • 5c7fc3b083 fix OOM for large &n=1000000000 values when searching. just alloc for the docids found, not the docids asked for. Matt Wells 2014-10-09 11:35:35 -0700
  • f483fccc2e if no crawl regex, and it has a crawl pattern consisting of only negative patterns then restrict to domains of seeds mwells 2014-10-09 11:15:33 -0600
  • 8bb3545b71 emergency fixes for out of sockets core and get proxy request timing out causing spider to hang bug. Matt Wells 2014-10-09 07:20:04 -0700
  • 12c01566ca Merge ad640e888e into 487c6e0037 miketung 2014-10-08 17:27:53 +0000
  • 487c6e0037 typo proxyAuth Matt Wells 2014-10-07 19:29:40 -0700
  • ad640e888e Typo proxyAuth Mike Tung 2014-10-07 18:01:24 -0700
  • f5d56b3640 fix json being messed up when doc was banned. Matt Wells 2014-10-07 15:49:10 -0700
  • b0974b81fe make it 500 ms Matt Wells 2014-10-07 14:44:20 -0700
  • 4bdd496db0 reduce delay per banned proxy from 2s to 1s Matt Wells 2014-10-07 14:43:36 -0700
  • 20e6983ac6 fix oops Matt Wells 2014-10-07 14:39:47 -0700
  • 65800b65cf fix so diffbot doesn't timeout due to large floater/proxy backoff crawl delay. append &timeout=MAXCRAWLDELAY to diffbot api url. Matt Wells 2014-10-07 14:32:38 -0700
  • ce61090b52 more fixes for federated search going OOM Matt Wells 2014-10-07 11:03:10 -0700
  • 23922df703 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing mwells 2014-10-07 11:50:40 -0600
  • a503a5c1e4 fix to do federated search over colls sequentially to avoid OOM. mwells 2014-10-07 11:50:14 -0600
  • 7d498c8518 fix oops Matt Wells 2014-10-06 22:35:32 -0700
  • c8af0b82a4 do not gen images if custom crawl Matt Wells 2014-10-06 22:04:29 -0700
  • 3bba6881fa fix gblocal.conf bug Matt Wells 2014-10-06 21:35:54 -0700
  • 37e02dc831 allow for query delete of diffbot json child docs mwells 2014-10-06 18:11:08 -0600
  • 10370d6419 Merge branch 'diffbot-testing' into testing mwells 2014-10-06 17:26:59 -0600
  • 016cadfe30 Merge pull request #21 from miketung/diffbot-testing Gigablast 2014-10-06 16:58:16 -0600
  • 837974e0ec Valid JSON output for showinput=1. Mike Tung 2014-10-05 21:15:08 -0700
  • e1b9271e2a nothing mwells 2014-10-03 14:41:09 -0700
  • 2de70798fb backwards compatible hack since we change the parm object type of master passwords/ips. mwells 2014-10-03 14:36:58 -0700
  • 092d8484f8 fix gbsortbyINT:gbsitenuminlinks mwells 2014-10-02 18:46:06 -0700
  • 7b58e2bd94 added 'gb dump' usage back to gb -h. mwells 2014-10-02 16:49:08 -0700
  • dc5b1408bc threads update for more warning msgs and to save thread enabled status to gb.conf. Matt Wells 2014-10-02 16:11:51 -0700
  • 2256c1fdba disable threads for disk and intersect/merge by default for better performance since pthreads suck. Matt Wells 2014-10-02 15:24:40 -0700
  • 6a650a3856 fixed ssl CONNECT reply from being truncated. after receiving the CONNECT reply of ok we had to reset the total bytes to read to 0 to avoid using the old value and truncating the follow up reply of the actual webserver response. Matt Wells 2014-10-02 14:47:55 -0700
  • b7e31af499 fix pageget.cpp Matt Wells 2014-10-02 12:45:58 -0700
  • 7df7fbe721 support the CONNECT for gb squid proxy Matt Wells 2014-10-02 12:36:43 -0700
  • 4e7152b487 fix more bugs in squid proxy implementation. force squid proxy stack to use floaters. mwells 2014-10-02 11:54:50 -0700
  • 42b891219d several fixes for floater proxy through squid proxy. gb needs to act like squid for the rendering machines so it can do crawl delay backoff and load balancing over the floaters. mwells 2014-10-02 02:08:38 -0700
  • 00a09104ca Merge branch 'diffbot-testing' into testing mwells 2014-10-01 21:27:18 -0700
  • 8f96ba0187 tell diffbot to use gigablast host #0 as its proxy and diffbot will in turn send the request to one of the spider proxies in its list. this way it can do its load balancing etc. algo. mwells 2014-10-01 21:25:39 -0700
  • 3a0d60d6b9 remove graphix Matt Wells 2014-10-01 20:00:35 -0700
  • 7d2ee6df37 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-10-01 19:55:45 -0700
  • abd75e5eca remove size limit on coll.conf Matt Wells 2014-10-01 19:55:24 -0700
  • c7a5073139 added sorting by site # inlinks/pop to menu for testing mwells 2014-10-01 12:09:43 -0700
  • 854767e074 add example for gbsortby:sitenuminlinks into syntax page mwells 2014-10-01 12:07:01 -0700
  • 0075dfee84 fix nytimes.com cookie/redir bug again. mwells 2014-10-01 11:53:47 -0700
  • 145341dcb3 import patch from diffbot-testing mwells 2014-10-01 11:36:02 -0700
  • 1d22e9525a fixes to pass smoke tests. Matt Wells 2014-10-01 11:33:39 -0700
  • 65840d969e update to spider proxy choose set logic mwells 2014-10-01 10:00:24 -0700
  • 7d4c4e8db1 update spider proxy logic. mwells 2014-10-01 09:26:41 -0700
  • 603b350e09 misnomer mwells 2014-09-30 17:40:02 -0700
  • b4ca812ef8 added parm to reset proxy stats in table. erases all our knowledge/stats for each proxy. mwells 2014-09-30 17:38:59 -0700
  • 3ae773a1f3 Merge branch 'testing' into diffbot-testing Matt Wells 2014-09-30 16:22:37 -0700
  • 83111cb5c1 fixes before smoke testing Matt Wells 2014-09-30 16:22:18 -0700
  • 23d26e26ba Merge branch 'testing' into diffbot-testing Matt Wells 2014-09-30 16:02:07 -0700
  • ce56fb93ab fix qa test so we can roll out proxy code. mwells 2014-09-30 15:40:02 -0700
  • 2af806993b update proxy algo so not all proxies get cutoff at once. mwells 2014-09-30 13:08:35 -0700
  • 98ce40967f more collection swapping fixes Matt Wells 2014-09-29 21:52:58 -0700
  • 8c6d216a14 lots of fixes for collection swapping. Matt Wells 2014-09-29 20:16:39 -0700
  • 7275765fbb get collection/root login system working mwells 2014-09-29 19:56:31 -0700
  • e3dbeafa5f more updates to cloud code mwells 2014-09-29 18:28:36 -0700
  • cfb2ab7e82 fix core when deleting collection that is not swapped out. Matt Wells 2014-09-29 14:00:10 -0700
  • bca24fb0e6 fix collection swap logic a bunch. seems to work now. mwells 2014-09-29 13:05:20 -0700
  • 257a7e3c10 first stab at swapping out collection recs to save memory when # of collections is high mwells 2014-09-29 11:37:05 -0700
  • 46290fa52f new password systems. individual collection passwords/accessIps. mwells 2014-09-28 18:59:49 -0700
  • 66dcf61fd7 fix empty summary related core. added \n to scoring info on serps to make diffs in qa.cpp simpler. mwells 2014-09-28 14:31:56 -0700
  • 235d69571b hid indexbody parm. show pure xml in cached page if it's xml. do not show summaries for xml/json docs in the serps, pointless. fix hashSections(). mwells 2014-09-28 13:47:54 -0700
  • a8c5d6a46e fix gbfacetstr: operator for xml docs mwells 2014-09-28 12:09:04 -0700
  • 2366776da3 fix parsing inconsistency bug from fixing the hashing gblang:de etc. mwells 2014-09-28 11:43:02 -0700
  • 7d3bcd7672 1 spider out at a time for qa test consistency mwells 2014-09-28 11:00:31 -0700
  • 7a0f9fe370 fix support for indexing xml docs. no longer use hacks gbxmltitle and gbxmllinks. no longer convert html entities for xml docs using hacks since we have XmlDoc::hashXmlFields() function. added qaxml() qa test to test xml doc indexing and searching. ignore <?xml> tag when generating xml tag compound name. mwells 2014-09-28 10:43:41 -0700
  • 52c88aee94 index xml docs properly like we do json mwells 2014-09-28 09:20:16 -0700
  • a2beb23d87 added Xml::getCompoundName() Matt Wells 2014-09-28 08:39:46 -0700
  • 47eecf4165 msie img border fix Matt Wells 2014-09-27 21:49:48 -0700
  • 8308915654 render tabs cleaner for MSIE Matt Wells 2014-09-27 21:41:15 -0700
  • 903e53a239 fix login cookie for msie Matt Wells 2014-09-27 21:13:14 -0700
  • 8e6365f476 minor fixes in docs mwells 2014-09-27 20:26:21 -0700
  • 675d21df0e fix so gblang:en gblang:"zh_cn" terms work mwells 2014-09-27 19:35:30 -0700
  • 4b611edf8d link to add gigablast to browser search bar mwells 2014-09-27 19:21:00 -0700