Commit Graph

  • 80ca589751 Removed explicit m_buf[0] from InjectionRequest Ivan Skytte Jørgensen 2016-04-04 14:27:56 +02:00
  • aa89c75565 Removed explicit m_buf[0] from Msg25Request Ivan Skytte Jørgensen 2016-04-04 14:23:57 +02:00
  • 8d8552b027 Removed explicit m_buf[0] from Msg13Request Ivan Skytte Jørgensen 2016-04-04 13:43:09 +02:00
  • 8eba33f359 Removed explicit m_buf[0] from Msg39Request/Msg39Reply Ivan Skytte Jørgensen 2016-04-04 12:36:32 +02:00
  • 1a2693dc12 Removed explicit m_buf [0] member from Msg20Request/Msg20Reply Ivan Skytte Jørgensen 2016-04-04 12:26:25 +02:00
  • e006d52d09 Use general serialize/deserialize functions in Msg20Request Ivan Skytte Jørgensen 2016-04-04 12:07:03 +02:00
  • 85e1874de5 Use general serialization/deserialization functions in msg20 Ivan Skytte Jørgensen 2016-04-04 11:57:31 +02:00
  • 36027e9ddb Removed unnecessary clearing in Msg20Request::reset() Ivan Skytte Jørgensen 2016-04-04 10:42:22 +02:00
  • 014ed8d7dc Issue 168. Partial fix of segfault on startup. CompunixAu 2021-06-17 16:27:33 +00:00
  • 6213d0e335 Merge pull request #145 from gigablast/master testing Gigablast 2021-05-09 10:33:05 -06:00
  • 5b3f7677c0 Merge pull request #175 from onlyjob/codespell Gigablast 2021-05-09 10:28:31 -06:00
  • 4d80cc4f95 Merge pull request #180 from onlyjob/build Gigablast 2021-05-09 10:23:56 -06:00
  • 670b8647ee L:image-file-has-conflicting-name, fixed file name inconsistencies (Closes: #173) Dmitry Smirnov 2021-05-09 16:37:45 +10:00
  • d9fe286116 make: sync FLAGS for 32bit architectures Dmitry Smirnov 2021-05-09 14:54:20 +10:00
  • 83d7ebf4c4 make: "gb" target to build "gb.pem" if required. Dmitry Smirnov 2021-05-09 14:56:44 +10:00
  • 857a62d3ba make: moved build instructions to README. Dmitry Smirnov 2021-05-09 15:01:08 +10:00
  • 64cf6411b2 make: parametrise "-O2" and remove very unstable "-O3". Dmitry Smirnov 2021-05-09 14:49:59 +10:00
  • ff6fe922d9 make: build and check "gb.pem"; updated expired "gb.pem" (Closes: #178). Dmitry Smirnov 2021-05-09 11:06:57 +10:00
  • b1ace63607 codespell: spelling corrections Dmitry Smirnov 2021-05-06 01:52:55 +10:00
  • 9bf4fd2e63 Merge pull request #170 from onlyjob/master Gigablast 2021-05-04 23:55:20 -06:00
  • 50d2cf9bc1 Removed obsolete private libiconv (Closes: #167) Dmitry Smirnov 2021-05-05 10:49:15 +10:00
  • c124eda914 cleanup: remove local zlib. All distros provide zlib1g-dev. Dmitry Smirnov 2021-05-05 10:26:56 +10:00
  • 3a55b74050 cleanup: removed useless local binaries (libgcc.a libc.a) Dmitry Smirnov 2021-05-05 10:24:45 +10:00
  • e18d2396a6 Removed private OpenSSL [hygiene,FTBFS]. All distros provide OpenSSL. Dmitry Smirnov 2021-05-05 10:20:11 +10:00
  • 7a2bca9649 Compile with "-std=c++98" to fix FTBFS (Closes: #164) Dmitry Smirnov 2021-05-05 10:07:55 +10:00
  • 9146f05574 Merge pull request #157 from shijuraj/master Gigablast 2020-05-04 08:55:25 -06:00
  • c1e5f9fd7f Merge pull request #1 from gigablast/master Shijuraj J 2019-09-21 08:24:58 +05:30
  • 4a943f1c79 Merge pull request #136 from vonbetz/master Gigablast 2017-06-02 11:32:56 -06:00
  • f10fdada73 Fix infinite loop on malformed proxy. Zak Betz 2017-06-02 11:28:58 -06:00
  • f9d607665b forgot comma. diffbot-testing Matt Wells 2017-02-24 10:01:26 -08:00
  • 6eb802054b fix some corruption in spider data after deleting a collection. Matt Wells 2017-02-24 09:59:28 -08:00
  • 88120414b0 import more stuck job fixes. Matt Wells 2016-11-16 10:59:55 -08:00
  • f1b6f73719 empty &seeds= fix to not reset seeds Matt Wells 2016-11-16 10:34:25 -08:00
  • 53ee1039b8 import final hung crawl fix into old gb. xor the firstip into the doledb key this time. seems to avoid all collisions now so we don't overwrite nodes in the doledb tree. Matt Wells 2016-11-07 09:11:27 -08:00
  • 3d248732d0 fix to shut up app checker. Matt 2016-11-04 17:28:26 -06:00
  • c0b2cdb60a hide the verify disk writes parm, seems to be causing cores when activated. and shouldn't really need to be used. is for debugging disk issues. Matt 2016-11-04 17:09:15 -06:00
  • 4b889e0ddd quick fix Matt Wells 2016-11-01 11:39:15 -07:00
  • 93d5752ab7 import fix for jobs hanging from pro. Matt Wells 2016-11-01 11:18:40 -07:00
  • 1542f7c57f do not save doledb on exit to prevent corruption being propagated and in case we change default spider priorities in the url filters code which could cause hung jobs. Matt Wells 2016-10-24 14:23:13 -07:00
  • d5bc775ce5 fix errCount bug in url filters from errCount wrapping over to negative numbers. Matt 2016-09-20 17:02:51 -06:00
  • f23daf3e5e more fixes for 'zeroing out' code. Matt Wells 2016-09-14 10:32:01 -07:00
  • 93f878fbca wow this thing is really being persnickety Matt Wells 2016-09-14 10:18:48 -07:00
  • 5d54fde09c more zerout fixes Matt Wells 2016-09-14 10:10:03 -07:00
  • 16ec2a6963 fix stack related bug. Matt Wells 2016-09-14 10:00:18 -07:00
  • 4ac26f9c9b rotate log file at 1GB. Matt Wells 2016-09-14 09:37:23 -07:00
  • 8a5891f2ec fix memory leak in Parms.cpp Matt 2016-09-12 13:41:07 -06:00
  • e9f0d067be zero out docs that we do not process to save disk space. record content hash in 'zeroed out' content so we preserve it for deduping. Matt Wells 2016-08-30 11:06:59 -07:00
  • 616bfeea86 show corrupt collection numbers for spiderdb corrupted recs. Matt Wells 2016-08-15 12:55:22 -07:00
  • 7e77b900a6 fix a core from corrupted spider request in doledb in rdb::reclaimMemFromDeletedTreeNodes() Matt Wells 2016-06-21 11:54:09 -07:00
  • b03571c1a1 fix infinite loop from corrupt spider request Matt Wells 2016-06-06 08:06:20 -07:00
  • 14e2f1f579 make trash subdir in case missing. Matt Wells 2016-05-30 19:37:03 -07:00
  • bd5618f1b7 fix spider request corruption detection in doledb Matt Wells 2016-05-28 08:56:36 -07:00
  • fe21fb1cc6 fix spider detection of corrupted requests. fix deduping so it doesn't core on docid based spider requests. Matt Wells 2016-05-28 08:40:59 -07:00
  • 7bd2344f41 increase regular page download timeout from 30 seconds to 60 seconds to accommodate some slower websites. Matt Wells 2016-05-18 10:05:02 -07:00
  • 494f7ca645 reduce diffbot timeout from 18000 secs to 240 secs Matt Wells 2016-05-18 09:55:15 -07:00
  • b0e015b97d fix dns lookup bug that was causing us to get incorrect ips sometimes. Matt Wells 2016-05-17 11:57:21 -07:00
  • 77dc78d122 fix empty file bug again Matt Wells 2016-05-13 14:49:54 -07:00
  • 39e621f655 trash files of length 0 that are holding up a merge. if we can't merge files we end up stockpiling them and things get slow fast. Matt Wells 2016-05-13 13:21:43 -07:00
  • dfca68ec46 Merge pull request #99 from vonbetz/regexdebug Gigablast 2016-05-13 10:06:24 -06:00
  • c89d963d46 Merge pull request #100 from vonbetz/tldfixes Gigablast 2016-05-13 10:01:01 -06:00
  • d4dc85bf18 Fixes for new tlds. They can now contain '-' and numbers. Fix punycode url encoding: set max length before encoding each url chunk. Zak Betz 2016-05-12 16:04:21 -06:00
  • 0074f2ec73 Merge branch 'diffbot-testing' of https://github.com/gigablast/open-source-search-engine into regexdebug Zak Betz 2016-05-11 14:07:41 -06:00
  • 0ee1e6164c Add input validation to regexs before crawlbot collections are created. Add ./gb egrep command to test regexes. Zak Betz 2016-05-11 14:03:45 -06:00
  • a246289238 Merge pull request #96 from vonbetz/diffbot-testing Gigablast 2016-05-11 09:32:21 -06:00
  • 89f3344be5 Updated tld list with the most current list. Zak Betz 2016-05-04 11:17:51 -06:00
  • 8c3eacc338 fix cores from XmlDoc::getLinkInfo1() returning -1 because of its call to getFirstIp() presumably. Matt Wells 2016-04-18 09:55:00 -07:00
  • c5de65a78a more core dump fixes concerning -1 being returned for XmlDoc::getLinkInfo1() Matt Wells 2016-04-17 18:50:23 -07:00
  • 65856e3b6a tell malloc to trim 100MB at a time to prevent kernel destroying the cpu by defraging/compacting memory. fix core in Title.cpp. Matt Wells 2016-04-17 09:16:03 -07:00
  • 95a3a261db fix so host 8 doesn't jam things up so much. host 8 was too busy merging a large spiderdb for blouartinfo and unable to tend to other smaller merges and therefore the # of files was getting out of hand causing slowdowns. so merge spiderdb much less aggressively. Matt 2016-04-15 13:34:41 -06:00
  • 165f724fd7 thanks to isj for the puny code fixes Matt 2016-04-06 11:10:42 -06:00
  • 74cfde3e53 fix calling doneSendingNotification() with a just-freed memory ptr bug. Matt 2016-04-06 10:53:08 -06:00
  • d2983747b8 fix sending back reply that has some stuff on the stack that it references when XmDoc::getMsg20Reply() returns. thanks to isj for this fix. Matt 2016-04-06 10:43:43 -06:00
  • 8891100c2a fix add url on root page to set collnum properly. fix Summary::getBestWindow() underrun bug. Matt 2016-04-06 10:31:04 -06:00
  • 70ca2fe48c update ./gb -h desc for ./gb inject. Matt 2016-04-05 21:06:38 -06:00
  • f5d0045b43 Merge pull request #82 from vonbetz/testing Gigablast 2016-03-29 13:12:56 -06:00
  • 3c140b87aa Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing Zak Betz 2016-03-29 12:42:05 -06:00
  • cf7ec13de6 Fix international domain printing bug. Zak Betz 2016-03-29 12:41:34 -06:00
  • 33e76af1d1 Merge branch 'testing' Matt 2016-03-29 04:11:30 -06:00
  • 816d69b34c a lot of bug fixes thanks to isj. Matt 2016-03-29 04:08:17 -06:00
  • 5072e851b7 fix misspelling Matt 2016-03-28 17:26:40 -06:00
  • 5935619eb2 hack on parentUrlDocId to the json object dump of diffbot objects. Matt 2016-03-28 12:39:48 -06:00
  • cab6d5c519 fix keysize==8 bug in keycmp Matt 2016-03-28 09:17:01 -06:00
  • b65a16caee Merge branch 'diffbot-testing' into testing Matt 2016-03-22 16:25:21 -06:00
  • 3c743a7d0e allow more docids to be downloaded/served in search results. Matt Wells 2016-03-22 15:24:33 -07:00
  • 04a8433256 show gbssParentDocId in status doc for children docs, like diffbot object docs. Matt Wells 2016-03-22 09:00:10 -07:00
  • 483d69d7f7 added httprequest debug line Matt Wells 2016-03-21 14:46:10 -07:00
  • 136d23816c fix hashbang properly Matt Wells 2016-03-21 09:29:55 -07:00
  • 48398d0cd7 Merge branch 'diffbot-testing' into testing Matt 2016-03-20 23:14:26 -06:00
  • 136b8842db fix more data corruption bugs. hopefully will dump out all the collections this time and not leave any in the tree, otherwise, especially if there are a lot left behind, they get corrupted. Matt Wells 2016-03-20 21:04:01 -07:00
  • 61ef806dea hash bang fix. detect more corruption. don't dump titledb and spiderdb at same time, seems to reduce corruption in rdbmem. Matt Wells 2016-03-20 12:50:43 -07:00
  • fc495a5bf5 fix dump core when collection deleted while dumping Matt Wells 2016-03-18 06:46:38 -07:00
  • 8922b8e69c Merge branch 'diffbot-testing' into testing Matt 2016-03-17 14:31:22 -06:00
  • 56bde4c3ef fix the data corruption fix Matt Wells 2016-03-17 13:22:56 -07:00
  • 8bc653c31c after dump completes scan tree to ensure all nodes reference secondary mem ptr so they don't get their data overwritten. Matt Wells 2016-03-17 10:09:49 -07:00
  • 0caf345850 if running ./gb start and another gb is already bound on the port then quickly exit(0) and have the bash keep alive loop exit the loop based on that return value. we can't use ./cleanexit file because it doesn't get remove and will mess up the main process that is running. Matt Wells 2016-03-16 16:56:48 -07:00
  • 36fdbf2f5a rename log files in the gb main.cpp code not in the bash loop. do not rename the log file if failed to start gb because socket was already bound. prevents us from double starts moving the log file, which is annoying. Matt Wells 2016-03-16 16:08:08 -07:00
  • a2e8a3a1fd use ./cleanexit file to ensure gb doesn't restart after a graceful exit in the bash keep alive loop. Matt Wells 2016-03-16 14:57:19 -07:00
  • 7396e57660 show docids of corrupted title recs found. show key range of each dump to disk. fix 'sentToDiffbot' bug for unchanged docs in status docs. make sure firstKeyInQueue is set properly from current key, so reset list ptr before doing that in RdbDump.cpp. Matt Wells 2016-03-16 13:53:08 -07:00
  • 5e8c47adfd Merge branch 'diffbot-testing' into testing Matt 2016-03-16 01:14:37 -06:00
  • 1faff50f5a if msg22a never called to get docid, then error out. Matt Wells 2016-03-16 00:14:02 -07:00