Commit Graph

  • 102199fcd1 fix the fix Matt 2015-10-10 14:01:11 -0600
  • 2fac15049e fix halfstopwikibigram bug Matt 2015-10-10 13:15:09 -0600
  • 8763fd6d78 make gb shutdown easier (./gb stop) Matt 2015-10-10 11:12:02 -0700
  • f52147bf5d we were allocating too many nodes in top tree. tone that down. fix bug with verify writes being turned on then off. Matt 2015-10-09 14:30:57 -0600
  • 0578bcd271 Merge b71099bccf into 260864b364 Ivan Skytte Jørgensen 2015-10-09 10:29:17 +0000
  • b71099bccf fix logging deadlock bug. Matt 2015-08-31 09:56:34 -0600
  • 0048ab6be8 fix right Matt 2015-10-08 13:42:42 -0700
  • 02140ce00f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-10-08 13:41:34 -0700
  • edb9b91fb1 fix blaster2 Matt 2015-10-08 13:41:23 -0700
  • 1708d0608c some fixes for detecting corrupted injection requests. seems to be very common. Matt 2015-10-07 21:47:10 -0600
  • dfe1d6c7b3 move a parm down Matt 2015-10-07 16:23:14 -0600
  • a1e876becc add <docScore> to serps Matt 2015-10-07 11:31:45 -0600
  • b1f93ff6be fix compiler error Matt 2015-10-07 10:05:53 -0600
  • db82c56026 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-10-07 10:05:10 -0600
  • b33a60ca5e remove log spam. Zak Betz 2015-10-07 10:05:06 -0600
  • 0f453d5cdf Merge branch 'ia-zak' into testing Matt 2015-10-07 10:02:38 -0600
  • dec683387a Merge branch 'ia-zak' into testing Matt 2015-10-07 10:02:13 -0600
  • 16db36252c Merge branch 'diffbot-testing' into testing Matt 2015-10-07 10:02:06 -0600
  • 46315ac7a3 fix multiple title rec gbcapture date bug Matt 2015-10-07 09:30:38 -0600
  • a3de262ebe Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-10-07 08:56:08 -0600
  • 45744d74f3 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into warc-stream Zak Betz 2015-10-07 08:46:07 -0600
  • a9410738ae fix permissions bug when creating directories, need to put in user/group execute bit. Matt 2015-10-07 08:26:27 -0600
  • 4600ce0816 fix threads from freezing up just because pthread_create() had an error. need to return the thread stack. Matt 2015-10-07 07:50:44 -0600
  • 836aa1756d fix threads from freezing up just because pthread_create_thread() had errors. need to return the thread stack to the linked list of thread stacks. Matt 2015-10-07 07:49:34 -0600
  • 6d90ea2e5f try to launch threads even if none need cleanup. hopefully fixes thread freeze. Matt 2015-10-07 07:29:37 -0600
  • cee5d8922a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-10-05 18:48:06 -0600
  • 5b605624ca fix core dump average": 3.677707812426755e+26, Matt 2015-10-05 18:47:11 -0600
  • a77c9be5b8 Merge branch 'diffbot-testing' into diffbot-sam Matt 2015-10-05 17:32:21 -0600
  • df1c7f6e0f update qa.cpp syntax test to do &n=100 for gbssStatusCode:0 query Matt 2015-10-05 17:31:35 -0600
  • 97b9c99bec correcting facet min=0 sam 2015-10-05 16:09:52 -0700
  • 49ec9c99fd Don't restart all items when forcing a list of items into injector. Zak Betz 2015-10-05 15:33:09 -0600
  • 9b785a1522 allow more than 2gb of mem to be allocated to hold resulting docids. Matt 2015-10-05 09:35:44 -0700
  • 757a44b149 fix facets when doing > 1 split and first split termlist is empty. Matt 2015-10-05 10:05:08 -0600
  • 21b71226a6 remove bad fix Matt 2015-10-05 09:33:02 -0600
  • c947252fee Add gbcapturedate to individual doc's metadata when injecting warcs. Zak Betz 2015-10-04 01:53:54 -0600
  • 39214a9dc6 Merge branch 'diffbot-testing' into testing Matt 2015-10-02 19:26:15 -0600
  • 42cdd5b382 fix msg20 getsummary core Matt Wells 2015-10-02 12:34:08 -0700
  • e4adc99c0c fix empty winner tree bug. try to improve rdbcache promotion logic for all caches. -O2 on spider.cpp. Matt Wells 2015-10-02 12:16:48 -0700
  • 9daaa4d5af Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2015-10-01 19:37:12 -0700
  • 9178d67b2f fix churn bug in winnerlistcache in spider.cpp so do not add the dolebuf list of spiderrequests back into the cache, but just modify the "jump" in the first 4 bytes of the cached record. because when we re-added it back to the cache it created too much churn and we'd lose cached records unnecessarily. Matt Wells 2015-10-01 19:35:34 -0700
  • 6becb55a2b Stream warcs instead of downloading them and unzipping them on disk. Zak Betz 2015-09-30 22:25:59 -0600
  • a8e3e4b269 if metadata is already in the old xmldoc::ptr_metadata then do not re-add it. Matt 2015-09-30 21:46:07 -0600
  • e06ae06c23 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-09-30 15:27:07 -0600
  • a31c7f5fc8 exiting msg Matt 2015-09-30 15:26:58 -0600
  • 06aea41611 show spiderdb scan progress in spider queue for the collection Matt Wells 2015-09-30 13:38:22 -0700
  • b97546f98c do not expand about:blank iframes. Matt Wells 2015-09-30 09:36:04 -0700
  • cb4bbe8892 Merge branch 'ia' into ia-zak Matt 2015-09-30 07:58:31 -0600
  • d4c677170f index metadata on EDOCUNCHANGED errors, and append new meta data to XmlDoc::ptr_metadata. Matt 2015-09-30 07:57:40 -0600
  • 1b0350932d Merge remote-tracking branch 'upstream/master' Ivan Skytte Jørgensen 2015-09-29 13:18:53 +0200
  • 67fc339953 prevent out of mem core. actually trying to alloc more than 2GB for search result stuff. Matt 2015-09-26 21:34:07 -0700
  • f0a2f86200 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-09-25 08:09:25 -0700
  • 55993e58d5 fix cores on gi #0 Matt 2015-09-25 08:09:05 -0700
  • 2721256c0d show ip port of bad host Matt Wells 2015-09-25 07:47:21 -0700
  • 68a3679fae exit if can not load auth/internetarchive.yml file Matt 2015-09-25 08:33:25 -0600
  • 83ac18fff4 Merge branch 'master' into testing Matt 2015-09-25 08:25:19 -0600
  • e2fad81227 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing Matt 2015-09-25 08:24:54 -0600
  • 3ce6c7d941 Merge branch 'ia-zak' into testing Matt 2015-09-25 08:24:12 -0600
  • 1e8f656d30 Merge branch 'diffbot-testing' into ia-zak Matt 2015-09-25 08:23:42 -0600
  • 27f259ca3b Merge branch 'master' of github.com:isj-privacore/open-source-search-engine Ivan Skytte Jørgensen 2015-09-25 16:12:25 +0200
  • bfca214a0b Test on XML node type correctly (was accessing ->m_tagNameLen which had undefined value if the nodes weren't tags) Ivan Skytte Jørgensen 2015-09-25 12:00:32 +0200
  • 1bf17b9ada Merge pull request #1 from isj-privacore/text_node_type Ivan Skytte Jørgensen 2015-09-25 12:05:27 +0200
  • 12638fc5d8 Test on XML node type correctly (was accessing ->m_tagNameLen which had undefined value if the nodes weren't tags) Ivan Skytte Jørgensen 2015-09-25 12:00:32 +0200
  • 93943a0cab some pages legitimately have no outlinks, no need to think they were banned. Matt Wells 2015-09-24 14:01:23 -0700
  • 6454dad6bf Revert "ignore real root, just use seeds, to detect if banned." Matt Wells 2015-09-24 13:29:20 -0700
  • cb60f68e72 ignore real root, just use seeds, to detect if banned. Matt 2015-09-24 14:15:29 -0600
  • 260864b364 urgent fix for core dumps for some queries that have long termlists. Matt 2015-09-24 11:49:50 -0600
  • 268b21d552 reduce log spam Matt Wells 2015-09-24 11:37:16 -0600
  • 9be3f9310e fix annoying core dump for some queries in Posdb.cpp Matt 2015-09-24 11:34:02 -0600
  • 8a0461b82f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt 2015-09-24 09:10:37 -0600
  • d92b153090 added 'verify writes' switch to track down data corruption Matt 2015-09-24 09:10:20 -0600
  • 76f4b321fb Support building on SuSE/OpenSuSE with dynamic linking Ivan Skytte Jørgensen 2015-09-24 15:35:28 +0200
  • bf31cb4b6e Test on XML node type correctly (was accessing ->m_tagNameLen which had undefined value if the nodes weren't tags) Ivan Skytte Jørgensen 2015-09-24 15:23:32 +0200
  • faedea4a9f Fix repeating label. Zak Betz 2015-09-24 01:33:51 -0600
  • ce29993951 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-09-24 01:08:56 -0600
  • ae44f295e7 Fix some bad html in statsdb graph. Zak Betz 2015-09-24 01:08:40 -0600
  • 025d7e6e30 fix statsdb plot graph breach Matt 2015-09-23 19:53:29 -0600
  • 3dcaf414db report bytes saved to disk. if thread crashes try to dump core. Matt Wells 2015-09-23 15:40:30 -0700
  • 98744889e2 do not core if no collrec for msg20 summary request Matt Wells 2015-09-23 14:39:13 -0700
  • ba8ebc7794 Revert "data corruption fixes" Matt Wells 2015-09-23 14:38:17 -0700
  • 27172945c7 data corruption fixes Matt Wells 2015-09-23 14:34:52 -0700
  • 2fde3ac5bc call umask() to fix gb process umask so files created are group writable Matt 2015-09-22 12:23:33 -0600
  • f9442ac5cd Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing Zak Betz 2015-09-21 16:46:36 -0600
  • 16b6e44bd1 Show utf8 url in page results. Zak Betz 2015-09-21 16:44:40 -0600
  • 100888d691 fix file/dir creation permissions bugs Matt 2015-09-21 12:44:41 -0600
  • 74cde33a3a just use the user's umask val for all file/dir creation Matt 2015-09-21 11:33:38 -0600
  • ce7b06fc4d all files made are now group writable. if you don't like that then you can make a special group and set the directory just group writable for that group using chmod g+s <dir>. Matt 2015-09-21 11:19:34 -0600
  • eefbe95ce9 Merge branch 'diffbot-testing' into ia-zak Matt 2015-09-21 10:13:29 -0600
  • 786ba76d10 Merge branch 'ia-zak' into testing Matt 2015-09-21 10:12:58 -0600
  • 5aaa08d81a Merge branch 'ia-zak' of github.com:gigablast/open-source-search-engine into ia-zak Matt 2015-09-21 10:12:38 -0600
  • 55169be6fc Warc injector update. Zak Betz 2015-09-21 09:31:59 -0600
  • 83190e3bbc Make punycoded urls printable. Zak Betz 2015-09-21 09:17:40 -0600
  • 7d8e7b83a3 Bugfix: Let spidered pages have the charset detected instead of using whatever charset/encoding is in GigablastRequest Ivan Skytte Jørgensen 2015-09-21 14:29:40 +0200
  • 5635695666 oom prevention Matt 2015-09-20 21:42:46 -0600
  • d6d5d10a15 prevent core from bad root title rec Matt Wells 2015-09-20 08:26:00 -0700
  • 69a3cb0999 fix corrupt tag with corrupt root title buf from coring Matt Wells 2015-09-17 21:33:58 -0700
  • 13e0ba7bff fix bug of having a meta redirect tag in <script> tags. we have to use Xml class to make sure it is a legit refresh tag. Matt 2015-09-16 11:03:38 -0600
  • 58e9f56015 never let any diffbot error prevent us from retrying a url in subsequent crawl rounds. Matt 2015-09-16 10:00:11 -0600
  • bcdecc63c6 expose "urlip" injection parm to provide ip of url being injected to save gigablast from an ip lookup if you want. Matt 2015-09-16 09:43:15 -0600
  • d11761cfd9 update graph key Matt 2015-09-15 15:48:40 -0600
  • b2f7e72d8a fix core from showing graph to two users Matt 2015-09-15 15:46:38 -0600