Commit Graph

  • f75985d19e Merge branch 'diffbot-testing' into testing mwells 2014-10-10 08:00:44 -06:00
  • 9cc2ab0362 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing mwells 2014-10-10 08:00:22 -06:00
  • 033a8b80a0 fix core if json item has column not in table when dumping json items as csv. Matt Wells 2014-10-10 07:00:11 -07:00
  • 5c7fc3b083 fix OOM for large &n=1000000000 values when searching. just alloc for the docids found, not the docids asked for. Matt Wells 2014-10-09 11:35:35 -07:00
  • f483fccc2e if no crawl regex, and it has a crawl pattern consisting of only negative patterns then restrict to domains of seeds mwells 2014-10-09 11:15:33 -06:00
  • 8bb3545b71 emergency fixes for out of sockets core and get proxy request timing out causing spider to hang bug. Matt Wells 2014-10-09 07:20:04 -07:00
  • 487c6e0037 typo proxyAuth Matt Wells 2014-10-07 19:29:40 -07:00
  • f5d56b3640 fix json being messed up when doc was banned. Matt Wells 2014-10-07 15:49:10 -07:00
  • b0974b81fe make it 500 ms Matt Wells 2014-10-07 14:44:20 -07:00
  • 4bdd496db0 reduce delay per banned proxy from 2s to 1s Matt Wells 2014-10-07 14:43:36 -07:00
  • 20e6983ac6 fix oops Matt Wells 2014-10-07 14:39:47 -07:00
  • 65800b65cf fix so diffbot doesn't timeout due to large floater/proxy backoff crawl delay. append &timeout=MAXCRAWLDELAY to diffbot api url. Matt Wells 2014-10-07 14:32:38 -07:00
  • ce61090b52 more fixes for federated search going OOM Matt Wells 2014-10-07 11:03:10 -07:00
  • 23922df703 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing mwells 2014-10-07 11:50:40 -06:00
  • a503a5c1e4 fix to do federated search over colls sequentially to avoid OOM. mwells 2014-10-07 11:50:14 -06:00
  • 7d498c8518 fix oops Matt Wells 2014-10-06 22:35:32 -07:00
  • c8af0b82a4 do not gen images if custom crawl Matt Wells 2014-10-06 22:04:29 -07:00
  • 3bba6881fa fix gblocal.conf bug Matt Wells 2014-10-06 21:35:54 -07:00
  • 37e02dc831 allow for query delete of diffbot json child docs mwells 2014-10-06 18:11:08 -06:00
  • 10370d6419 Merge branch 'diffbot-testing' into testing mwells 2014-10-06 17:26:59 -06:00
  • 016cadfe30 Merge pull request #21 from miketung/diffbot-testing Gigablast 2014-10-06 16:58:16 -06:00
  • 837974e0ec Valid JSON output for showinput=1. Mike Tung 2014-10-05 21:15:08 -07:00
  • e1b9271e2a nothing mwells 2014-10-03 14:41:09 -07:00
  • 2de70798fb backwards compatible hack since we change the parm object type of master passwords/ips. mwells 2014-10-03 14:36:58 -07:00
  • 092d8484f8 fix gbsortbyINT:gbsitenuminlinks mwells 2014-10-02 18:46:06 -07:00
  • 7b58e2bd94 added 'gb dump' usage back to gb -h. mwells 2014-10-02 16:49:08 -07:00
  • dc5b1408bc threads update for more warning msgs and to save thread enabled status to gb.conf. Matt Wells 2014-10-02 16:11:51 -07:00
  • 2256c1fdba disable threads for disk and intersect/merge by default for better performance since pthreads suck. Matt Wells 2014-10-02 15:24:40 -07:00
  • 6a650a3856 fixed ssl CONNECT reply from being truncated. after receiving the CONNECT reply of ok we had to reset the total bytes to read to 0 to avoid using the old value and truncating the follow up reply of the actual webserver response. Matt Wells 2014-10-02 14:47:55 -07:00
  • b7e31af499 fix pageget.cpp Matt Wells 2014-10-02 12:45:58 -07:00
  • 7df7fbe721 support the CONNECT for gb squid proxy Matt Wells 2014-10-02 12:36:43 -07:00
  • 4e7152b487 fix more bugs in squid proxy implementation. force squid proxy stack to use floaters. mwells 2014-10-02 11:54:50 -07:00
  • 42b891219d several fixes for floater proxy through squid proxy. gb needs to act like squid for the rendering machines so it can do crawl delay backoff and load balancing over the floaters. mwells 2014-10-02 02:08:38 -07:00
  • 00a09104ca Merge branch 'diffbot-testing' into testing mwells 2014-10-01 21:27:18 -07:00
  • 8f96ba0187 tell diffbot to use gigablast host #0 as its proxy and diffbot will in turn send the request to one of the spider proxies in its list. this way it can do its load balancing etc. algo. mwells 2014-10-01 21:25:39 -07:00
  • 3a0d60d6b9 remove graphix Matt Wells 2014-10-01 20:00:35 -07:00
  • 7d2ee6df37 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-10-01 19:55:45 -07:00
  • abd75e5eca remove size limit on coll.conf Matt Wells 2014-10-01 19:55:24 -07:00
  • c7a5073139 added sorting by site # inlinks/pop to menu for testing mwells 2014-10-01 12:09:43 -07:00
  • 854767e074 add example for gbsortby:sitenuminlinks into syntax page mwells 2014-10-01 12:07:01 -07:00
  • 0075dfee84 fix nytimes.com cookie/redir bug again. mwells 2014-10-01 11:53:47 -07:00
  • 145341dcb3 import patch from diffbot-testing mwells 2014-10-01 11:36:02 -07:00
  • 1d22e9525a fixes to pass smoke tests. Matt Wells 2014-10-01 11:33:39 -07:00
  • 65840d969e update to spider proxy choose set logic mwells 2014-10-01 10:00:24 -07:00
  • 7d4c4e8db1 update spider proxy logic. mwells 2014-10-01 09:26:41 -07:00
  • 603b350e09 misnomer mwells 2014-09-30 17:40:02 -07:00
  • b4ca812ef8 added parm to reset proxy stats in table. erases all our knowledge/stats for each proxy. mwells 2014-09-30 17:38:59 -07:00
  • 3ae773a1f3 Merge branch 'testing' into diffbot-testing Matt Wells 2014-09-30 16:22:37 -07:00
  • 83111cb5c1 fixes before smoke testing Matt Wells 2014-09-30 16:22:18 -07:00
  • 23d26e26ba Merge branch 'testing' into diffbot-testing Matt Wells 2014-09-30 16:02:07 -07:00
  • ce56fb93ab fix qa test so we can roll out proxy code. mwells 2014-09-30 15:40:02 -07:00
  • 2af806993b update proxy algo so not all proxies get cutoff at once. mwells 2014-09-30 13:08:35 -07:00
  • 98ce40967f more collection swapping fixes Matt Wells 2014-09-29 21:52:58 -07:00
  • 8c6d216a14 lots of fixes for collection swapping. Matt Wells 2014-09-29 20:16:39 -07:00
  • 7275765fbb get collection/root login system working mwells 2014-09-29 19:56:31 -07:00
  • e3dbeafa5f more updates to cloud code mwells 2014-09-29 18:28:36 -07:00
  • cfb2ab7e82 fix core when deleting collection that is not swapped out. Matt Wells 2014-09-29 14:00:10 -07:00
  • bca24fb0e6 fix collection swap logic a bunch. seems to work now. mwells 2014-09-29 13:05:20 -07:00
  • 257a7e3c10 first stab at swapping out collection recs to save memory when # of collections is high mwells 2014-09-29 11:37:05 -07:00
  • 46290fa52f new password systems. individual collection passwords/accessIps. mwells 2014-09-28 18:59:49 -07:00
  • 66dcf61fd7 fix empty summary related core. added \n to scoring info on serps to make diffs in qa.cpp simpler. mwells 2014-09-28 14:31:56 -07:00
  • 235d69571b hid indexbody parm. show pure xml in cached page if it's xml. do not show summaries for xml/json docs in the serps, pointless. fix hashSections(). mwells 2014-09-28 13:47:54 -07:00
  • a8c5d6a46e fix gbfacetstr: operator for xml docs mwells 2014-09-28 12:09:04 -07:00
  • 2366776da3 fix parsing inconsistency bug from fixing the hashing gblang:de etc. mwells 2014-09-28 11:43:02 -07:00
  • 7d3bcd7672 1 spider out at a time for qa test consistency mwells 2014-09-28 11:00:31 -07:00
  • 7a0f9fe370 fix support for indexing xml docs. no longer use hacks gbxmltitle and gbxmllinks. no longer convert html entities for xml docs using hacks since we have XmlDoc::hashXmlFields() function. added qaxml() qa test to test xml doc indexing and searching. ignore <?xml> tag when generating xml tag compound name. mwells 2014-09-28 10:43:41 -07:00
  • 52c88aee94 index xml docs properly like we do json mwells 2014-09-28 09:20:16 -07:00
  • a2beb23d87 added Xml::getCompoundName() Matt Wells 2014-09-28 08:39:46 -07:00
  • 47eecf4165 msie img border fix Matt Wells 2014-09-27 21:49:48 -07:00
  • 8308915654 render tabs cleaner for MSIE Matt Wells 2014-09-27 21:41:15 -07:00
  • 903e53a239 fix login cookie for msie Matt Wells 2014-09-27 21:13:14 -07:00
  • 8e6365f476 minor fixes in docs mwells 2014-09-27 20:26:21 -07:00
  • 675d21df0e fix so gblang:en gblang:"zh_cn" terms work mwells 2014-09-27 19:35:30 -07:00
  • 4b611edf8d link to add gigablast to browser search bar mwells 2014-09-27 19:21:00 -07:00
  • 8ffa4fe24e doc updates mwells 2014-09-27 17:49:49 -07:00
  • 0267e865b8 minor fixes mwells 2014-09-27 17:01:16 -07:00
  • afd41676d2 bring back meta tag display in results again. added qa tests for advanced search and api parms. various api parm fixes and hides. do not do test url on proxies if test url empty. mwells 2014-09-27 15:54:55 -07:00
  • 9d738cdb8b more advanced search fixes mwells 2014-09-27 11:51:37 -07:00
  • 6de7a3f6b3 get advanced search working again mwells 2014-09-27 11:12:47 -07:00
  • 6c94cfceef add <omitCount> stuff. fix getDocIds() recalls when too many results invisible (dedupped etc.). show in xml, json, html. provide link in html to show omitted results. mwells 2014-09-27 09:56:23 -07:00
  • 8c53ce2a79 undo hashtab change. too much overhead. mwells 2014-09-27 08:39:22 -07:00
  • 6f5b646084 make zobrist values the same across cygwin and apple by explicitly listing them in hashtab.cpp mwells 2014-09-26 23:19:35 -07:00
  • e10f6cdd61 cygwin fixes mwells 2014-09-26 23:04:16 -07:00
  • f81bb6d072 added floater coll override switch. mwells 2014-09-26 21:28:04 -07:00
  • c2f98a81b6 fix floater bug from reading hashtable off disk. force use floaters if ! useRobots and is diffbot crawl. mwells 2014-09-26 15:30:42 -07:00
  • a7bb1c59a3 Merge branch 'diffbot-testing' into testing mwells 2014-09-26 14:25:08 -07:00
  • ad385c35a7 nothing Matt Wells 2014-09-26 13:41:12 -07:00
  • c85df203a0 fix for sitenuminlinks logic so we do not overwrite global-index imported site pop info in tagdb Matt Wells 2014-09-26 13:36:12 -07:00
  • 23836055c2 default ask for gzip to on mwells 2014-09-26 12:07:12 -07:00
  • f436da283a added status count to collection nav bar key at the bottom. mwells 2014-09-26 11:13:25 -07:00
  • 3b87161e83 added http server compression (gzip) stats. remove import link. mwells 2014-09-26 11:06:38 -07:00
  • b4c4688a43 Merge branch 'testing' of git@github.com:gigablast/open-source-search-engine into testing Matt Wells 2014-09-26 08:02:38 -07:00
  • 96d776e790 donations link mwells 2014-09-26 08:12:32 -07:00
  • dcb44e6db0 try new pdftohtml binary Matt Wells 2014-09-26 08:02:17 -07:00
  • 376c48a764 Merge branch 'master' into testing mwells 2014-09-26 07:50:14 -07:00
  • 7bbe406e1b version update mwells 2014-09-26 08:52:09 -06:00
  • e636af8473 fix importing some more mwells 2014-09-25 22:40:13 -06:00
  • 26e201d14c Merge branch 'master' of github.com:gigablast/open-source-search-engine mwells 2014-09-25 21:52:38 -06:00
  • 29f928a71e import fixes mwells 2014-09-25 20:48:34 -07:00
  • f2c907f72a Merge branch 'master' of github.com:gigablast/open-source-search-engine mwells 2014-09-25 21:41:50 -06:00