Commit Graph

  • 5f0ebb4aef fix stack overflow Matt Wells 2014-02-13 00:01:49 -08:00
  • 5db23c2eec fix infinite loop core thing. Matt Wells 2014-02-12 21:43:23 -08:00
  • a9737ea97d Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-02-12 21:20:01 -08:00
  • d42e2377e7 return json download as search results now. all smokes have passed. Matt Wells 2014-02-12 21:19:32 -08:00
  • 8bb17de3c5 pass smoketest: TestOnlyProcessIfNew.testNotUpdatedContent so diffbot reply will not be updated in the index if it is unchanged thereby keeping lastCrawlTimeUTC the same. Matt Wells 2014-02-12 18:42:14 -08:00
  • 25eae3da39 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-02-12 13:21:57 -08:00
  • 0e48bbcea9 fix a core from bad return values Matt Wells 2014-02-12 13:21:30 -08:00
  • ca4aafa8a6 added host disk usage redbox and stats. Matt Wells 2014-02-12 09:47:44 -07:00
  • eb044c765c remove login link on root pages. add hand cursor to logout link. Matt Wells 2014-02-12 00:47:58 -07:00
  • e5408d6596 minor fix Matt Wells 2014-02-12 00:37:54 -07:00
  • 68a14de031 security admin fixes Matt Wells 2014-02-12 00:36:09 -07:00
  • 3b0a571cea fix security system to actually work now Matt Wells 2014-02-12 00:06:00 -07:00
  • 609a344a57 fix counting bug in array parms Matt Wells 2014-02-11 22:28:04 -07:00
  • 9a76ff2531 minor parm updates Matt Wells 2014-02-11 20:50:36 -07:00
  • 51d514f276 use supplied mime if supplied when injecting Matt Wells 2014-02-11 13:02:30 -08:00
  • c9be18615c more parm saving fixes Matt Wells 2014-02-10 22:04:22 -07:00
  • 2efbb602df fix saving parms bug Matt Wells 2014-02-10 21:52:29 -07:00
  • 953b7c558d parm updates Matt Wells 2014-02-10 21:45:03 -07:00
  • 69fa6662bc EDOCUNCHANGED fixes for diffbot Matt Wells 2014-02-10 16:23:39 -08:00
  • 44a9e08d38 fix EDOCUNCHANGED logic. Matt Wells 2014-02-10 14:56:22 -08:00
  • debd9089e8 better logging msg when updating parm. Matt Wells 2014-02-10 11:29:24 -08:00
  • c041d47a0c html formatting updates Matt Wells 2014-02-10 00:15:04 -07:00
  • b309d84245 html updates Matt Wells 2014-02-09 23:19:43 -07:00
  • 9f0d2ad82e parm updates Matt Wells 2014-02-09 23:05:36 -07:00
  • cdf2550136 more parm fixes Matt Wells 2014-02-09 22:51:16 -07:00
  • c2c3fe993c parm fixes for basic pages Matt Wells 2014-02-09 22:25:08 -07:00
  • d2b473e554 checkpoint Matt Wells 2014-02-09 19:09:44 -07:00
  • 91ea5384a6 formatting changes Matt Wells 2014-02-09 16:57:39 -07:00
  • ecdd167d9b code checkpoint Matt Wells 2014-02-09 16:41:43 -07:00
  • f420bd2769 checkpoint Matt Wells 2014-02-09 15:09:48 -07:00
  • c9ef525338 code checkpoint Matt Wells 2014-02-09 12:55:45 -07:00
  • 6c9a44367f code checkpoint Matt Wells 2014-02-09 12:38:40 -07:00
  • e60576c8eb another code checkpoint Matt Wells 2014-02-08 22:57:30 -07:00
  • 156b50240a code checkpoint Matt Wells 2014-02-08 16:24:33 -07:00
  • e593b6e1de basic controls code checkpoint. Matt Wells 2014-02-08 15:10:06 -07:00
  • dabd691626 basic admin controls page structure Matt Wells 2014-02-08 00:34:45 -07:00
  • fc47c18aec new printadmintop functionality. Matt Wells 2014-02-07 23:08:04 -07:00
  • b634d06287 fix some cores. use olddoc contenthash for msg13 call for EDOCUNCHANGED errors. Matt Wells 2014-02-07 18:28:09 -08:00
  • 252d24dc2a fix core of page spiders Matt Wells 2014-02-07 10:46:10 -08:00
  • 573a04bccd fix bug in gbminint. Matt Wells 2014-02-06 21:36:47 -08:00
  • edef3acf37 remove buggy line Matt Wells 2014-02-06 21:19:37 -08:00
  • b3453c248e take out buggy statement. Matt Wells 2014-02-06 21:16:30 -08:00
  • 7b42d2848d formatting fixes Matt Wells 2014-02-06 21:06:31 -08:00
  • 2d4af1aefe index numbers as integers too, not just floats so we can sort by spider date without losing 128 seconds of resolution. Matt Wells 2014-02-06 20:57:54 -08:00
  • 63e95c3b2d show lastSpidered time at end of json item. it's a float so we should probably store it as an int as well. we lose 128 seconds of resolution. Matt Wells 2014-02-06 18:56:38 -08:00
  • 8d534b8ed8 many more fixes for streaming mode Matt Wells 2014-02-06 18:21:22 -08:00
  • 874311ae52 fixes for streaming mode. Matt Wells 2014-02-06 16:28:42 -08:00
  • 5787b15884 Merge branch 'diffbot' into diffbot-testing Matt Wells 2014-02-06 15:26:21 -08:00
  • 8f6a4ee9b6 do not save collrecs all the time. stop superfluously setting m_needsSave. try to stop evaluating crawls that have completed because of lack of urls. we still need to fix it so if they change url filters so that more urls become available, that we retry! Matt Wells 2014-02-06 15:27:49 -08:00
  • 845611ae1b &stream=1 stream mode fixes. Matt Wells 2014-02-06 15:23:53 -08:00
  • 4cfe69a96f minor link updates Matt Wells 2014-02-06 14:41:33 -08:00
  • f9dbd64056 get streaming time sliced results working Matt Wells 2014-02-06 14:25:44 -08:00
  • 106077c163 fix spiderrequest deduping some more Matt Wells 2014-02-06 09:47:18 -08:00
  • 4029b0b937 more faster spider fixes. tried to fix corrupt rdbcache. Matt Wells 2014-02-06 09:25:27 -08:00
  • 9145d89e3f raise spiderdb minfilestomerge from 2 to 3 to reduce merging since we allow many urls in doledb for the same firstip now Matt Wells 2014-02-05 19:35:19 -08:00
  • 203cdc5f99 delete from winnertable when deleting from winnertree Matt Wells 2014-02-05 19:12:33 -08:00
  • 25e7ba5ef8 fix too many spiders out per ip some more Matt Wells 2014-02-05 17:11:45 -08:00
  • 2842350e6d gb.conf spiders back on Matt Wells 2014-02-05 16:59:06 -08:00
  • 5c8b9af1d3 fix rdbcache corruption from -O2 compile bug. fix too many spiders per ip bug! Matt Wells 2014-02-05 16:58:21 -08:00
  • 951e9d5068 wait 180 secs for diffbot reply Matt Wells 2014-02-05 15:46:26 -08:00
  • c60dcf4ecb show userobots for bulk jobs Matt Wells 2014-02-05 15:45:39 -08:00
  • d9f0d57c0c core fixes. csv fixes. Matt Wells 2014-02-05 14:56:22 -08:00
  • ecc10c2cb9 dup cache fixes. do not add dups to spiderdb either. Matt Wells 2014-02-05 14:09:35 -08:00
  • 7806a8a68c fix excessive dupcache deduping. Matt Wells 2014-02-05 13:41:15 -08:00
  • c159f80f05 MAX_WINNER_NODES back to 40. Matt Wells 2014-02-05 13:25:04 -08:00
  • 9c26b85c2f fixed contenthash32 logic for json objects. fixed hashing of numbers/bools for json objects. added m_dupCache to reduce spiderrequests added to spiderdb. do not add urls to waitingtree if ufn is obviously filtered/banned. do not spider spiderrequest from doledb if maxoutperip would be violated. Matt Wells 2014-02-05 13:22:03 -08:00
  • d86c7b8fbb do not store 40 urls in doledb if firstip does not have that many urls to begin with. it's better to just store one url in doledb for small domains. Matt Wells 2014-02-04 20:39:46 -08:00
  • bda134268e added winnertable to avoid dups in winnertree. Matt Wells 2014-02-04 20:09:43 -08:00
  • 053a9b9a0d spiders seem to be working somewhat now. Matt Wells 2014-02-04 18:23:37 -08:00
  • 189999509b code checkpoint. time slicing, faster spider code compiling. now needs debug. Matt Wells 2014-02-04 17:34:43 -08:00
  • 7f4d3205e5 streaming results code checkpoint. Matt Wells 2014-02-04 17:05:43 -08:00
  • 3312400fee checkpoint for faster spider code. Matt Wells 2014-02-04 16:15:27 -08:00
  • 20c31dcc78 Merge branch 'master' into diffbot-slicing Matt Wells 2014-02-04 12:28:43 -08:00
  • d2cebad8e7 spidercoll deletion fixes. Matt Wells 2014-02-04 12:28:05 -08:00
  • 9ded8fa091 faster spiders checkpoint Matt Wells 2014-02-04 12:26:42 -08:00
  • 258e3cba0d fix maxtocrawl limit thing Matt Wells 2014-02-04 09:25:27 -07:00
  • 17fff243f9 add connectips back. call them adminIps this time. if your ip is on the list then you have admin access. cookie tokens will come later/soon. Matt Wells 2014-02-03 20:47:48 -07:00
  • d3b498a057 time slice checkpoint Matt Wells 2014-02-03 19:17:58 -08:00
  • 5ea852dac3 fix core when thread fails to spawn. Matt Wells 2014-02-03 07:27:32 -07:00
  • b46da4c192 prevent msg20/tagdb lookup socket jam up. throttle back max outstanding msg20s (summary generations) based on used udp sockets. Matt Wells 2014-02-03 07:09:29 -07:00
  • 56adb2ee8c nomenclature. url filters -> spider scheduler Matt Wells 2014-02-02 17:00:11 -07:00
  • 10235bb840 fix add url and cached page getting Matt Wells 2014-02-02 16:49:31 -07:00
  • 7bf8a2ac49 do not let glibc do malloc checks, we do that. Matt Wells 2014-02-02 13:41:59 -07:00
  • 4be68fdaa6 set safebuf::m_buf to null in destructor Matt Wells 2014-02-02 12:16:11 -07:00
  • 0df697e56a fix keep alive loop code to bail out if fails to bind to socket as well as quick cores. Matt Wells 2014-02-02 12:11:18 -07:00
  • f58a94a8cc fix diffbot url bug Matt Wells 2014-02-02 11:53:10 -07:00
  • 93021b2f13 Merge branch 'diffbot' Matt Wells 2014-02-01 11:31:00 -07:00
  • 095c47f181 Merge branch 'diffbot' Matt Wells 2014-02-01 11:28:31 -07:00
  • 4346fcee29 added recovery mode display in hosts table Matt Wells 2014-02-01 10:16:46 -08:00
  • 4d2eafe39b added some repair logic for 0001.dat files. turn off spiderdb disk cache for now. Matt Wells 2014-02-01 10:14:25 -08:00
  • 10d0e9f52b Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2014-01-31 14:54:23 -08:00
  • 392d043bd8 undo canonical deduping. added dump round stats when uploading json files. Matt Wells 2014-01-31 14:53:49 -08:00
  • 6e9b4f8ca2 fix core Matt Wells 2014-01-30 22:03:12 -07:00
  • e8a6d8f345 fix another core from freeing wrong byte sized crawl info reply. Matt Wells 2014-01-30 20:16:41 -08:00
  • 09fd98c95b Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2014-01-30 19:57:07 -08:00
  • 7107f730d0 fix another core from deleting a coll and deleting a spidercoll in progress. Matt Wells 2014-01-30 19:56:43 -08:00
  • 4a1ad74f79 test fix for keep alive infinite loop bug. Matt Wells 2014-01-30 14:16:16 -08:00
  • 83e291f12b fix infinite keep alive restart bug some more Matt Wells 2014-01-30 14:12:32 -08:00
  • 03aa7842d0 do not enter into an infinite keep alive restart loop. Matt Wells 2014-01-30 14:40:03 -07:00
  • 40f373c9e0 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2014-01-30 13:11:48 -08:00
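
Note on commits 2d4af1aefe and 63e95c3b2d: both mention losing 128 seconds of resolution when a spider date is stored as a float. That figure is what one would expect if the value is a 32-bit IEEE 754 float holding a Unix timestamp, which is an assumption here; the log does not state the width. A 32-bit float has a 24-bit significand, so for values between 2^30 and 2^31 (which covers timestamps from 2004 to 2038) adjacent representable values are 2^7 = 128 apart. The sketch below is illustrative Python, not code from the repository, and the timestamp and helper names are made up for the example.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def float32_spacing(value):
    """Gap between adjacent float32 values near `value` (normal range only)."""
    exponent = int(value).bit_length() - 1  # floor(log2(value))
    return 2 ** (exponent - 23)             # 23 explicit significand bits


# Hypothetical spider time in early February 2014 (not taken from the repo).
spider_time = 1391741874

stored = int(to_float32(spider_time))
print("spacing near timestamp:", float32_spacing(spider_time))
print("stored as float32:     ", stored)
print("error in seconds:      ", spider_time - stored)

# Two crawls a minute apart can collapse to the same float32 value,
# which is why sorting by spider date needs the integer form as well.
a = 1391741824
b = a + 60
print("distinct as ints:  ", a != b)
print("distinct as floats:", to_float32(a) != to_float32(b))
```

Storing the same value as a 32-bit integer keeps one-second resolution, which matches the stated motivation for indexing numbers as integers too.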