Commit Graph

  • 7df2111ceb fixed 'gb inject titledb-DIR newhosts.conf' command for populating an index from titledb files in DIR and transmitting to appropriate host in newhosts.conf. also prettied up the gb -h output to use a formatting function. Matt Wells 2014-01-02 01:20:08 -0700
  • 935a4faccf fixed './gb inject titledb newhosts.conf' You have to be in working directory of the instance whose cached pages (titlerecs) you want to inject into the new cluster defined by newhosts.conf. Matt Wells 2014-01-01 22:04:26 -0700
  • b7e9b78c21 hash gbparenturl: for getting json objects for the specified url in the search results. Matt Wells 2013-12-31 10:21:08 -0800
  • d77ddc19c3 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-31 09:46:47 -0800
  • 5619d2a2c8 fix initializing status msg error Matt Wells 2013-12-31 09:46:35 -0800
  • 1919ad7f95 gb.conf spiders enabled. Matt Wells 2013-12-31 09:22:46 -0800
  • 471fc7a50a fixed core from deleting a non-existent crawl. it tried to add it ... Matt Wells 2013-12-30 10:53:45 -0800
  • 71982c9919 fix bad csv output Matt Wells 2013-12-30 10:39:45 -0800
  • f92f190176 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-29 14:51:33 -0800
  • a70b280206 nothing Matt Wells 2013-12-29 14:51:24 -0800
  • c0447de3a1 watch out for NULL "base" after a coll delete. Matt Wells 2013-12-29 01:32:40 -0800
  • 70fc63985b nothing Matt Wells 2013-12-28 20:32:28 -0800
  • 1c044235be count EFAKEFIRSTIP errors when spidering as page download attempts. should fix a couple smoke tests. Matt Wells 2013-12-27 19:25:51 -0800
  • 6aac48e487 fix crawl delay wait queue logic. if coll already exists trying to add, let it be. don't error out. Matt Wells 2013-12-27 14:35:51 -0800
  • 5cdb73bc70 fix spider core Matt Wells 2013-12-27 15:28:44 -0700
  • d8a9a3f4e3 fix parm sync code some more. added localhosts.conf to the 'gb install' dist. Matt Wells 2013-12-27 14:00:37 -0800
  • bff0083538 ensured robots.txt redirects are cached as well Matt Wells 2013-12-27 13:01:01 -0800
  • 534c9cf9db fix parm sync core Matt Wells 2013-12-27 12:09:46 -0800
  • 958becbdf0 fix parm checksum for syncing parms. was not using gbstrlen() for strings. Matt Wells 2013-12-27 11:56:20 -0800
  • 0181a32311 fix array count syncing. fix parms that were not syncing. Matt Wells 2013-12-26 11:51:20 -0800
  • 100af585a6 parm sync fixes Matt Wells 2013-12-26 11:20:19 -0800
  • 93d62a1f9e Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-26 09:34:47 -0800
  • 9b5e3016df fix hosts.conf Matt Wells 2013-12-26 09:34:35 -0800
  • 141a76c322 try localhosts.conf before hosts.conf Matt Wells 2013-12-26 09:32:22 -0800
  • 7624a3db0a if url is manually added and it is simplifiedredirect then re-add with the same manually added bit set in the new spider request, otherwise seed url might not get spidered since it might not match the regex. Matt Wells 2013-12-26 08:58:56 -0800
  • 048b715962 if coll is deleted or reset in the middle of a dump or merge then stop the dump/merge with ENOCOLLREC error. avoid calling "base->" functions since it could be NULL if deleted. Matt Wells 2013-12-25 17:12:09 -0800
  • f9d7b9dbc7 fix core Matt Wells 2013-12-23 18:50:46 -0800
  • 8537a02008 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-23 10:31:00 -0800
  • 6cc69106c2 fix hosts.conf Matt Wells 2013-12-23 10:30:45 -0800
  • 3acd6a08d5 add the true spider request when retrying to spider a fake-ip spider request. add a EFAKEIP error reply for the fake ip request. prevents us double spidering the same url. Matt Wells 2013-12-23 10:27:42 -0800
  • 11d6d5ad6a Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-23 09:30:52 -0800
  • 2ac8ff2952 compile regex so it's case dependent Matt Wells 2013-12-23 09:30:35 -0800
  • b0d77a834a do not spider fake ips requests, just re-add them with the right firstip Matt Wells 2013-12-20 12:22:02 -0800
  • 6f2e552bcd fix core in linked list of msg13requests in case one gets freed Matt Wells 2013-12-20 11:26:46 -0800
  • 5fcfff6729 fixes for spiders getting stuck Matt Wells 2013-12-19 20:04:06 -0700
  • 4c7ce819b9 fix core dump Matt Wells 2013-12-19 18:39:29 -0800
  • c2f8445a70 expand reg ex shortcuts like \d to [0-9] Matt Wells 2013-12-19 18:31:37 -0800
  • 261f4feb9b fixed cdata parsing issue Matt Wells 2013-12-19 16:04:53 -0800
  • 3092dcecaa rebuild url filters and regexes at startup Matt Wells 2013-12-19 15:56:27 -0800
  • 99099505d8 call regfree before changing regex Matt Wells 2013-12-19 15:32:26 -0800
  • 7f70e4e887 fix regex logic Matt Wells 2013-12-19 15:19:18 -0800
  • aad12f9fe3 minor print format fix Matt Wells 2013-12-19 14:30:56 -0800
  • ef5decb0b8 more fixing stuck spiders Matt Wells 2013-12-19 14:17:22 -0800
  • 32db83ae47 try to fix spiders from petering out. reset doledb next keys and empty flags every 3 minutes. Matt Wells 2013-12-19 13:31:14 -0800
  • cb111a1efa fix doledb empty logic Matt Wells 2013-12-19 13:06:35 -0800
  • d2f9dcf8e0 revert last commit. not needed Matt Wells 2013-12-19 10:33:04 -0700
  • 784b6900cd more spider fixes Matt Wells 2013-12-19 10:29:01 -0700
  • d5f63888a3 Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-12-19 10:15:05 -0700
  • 56461ee795 fix spidering getting stuck bug Matt Wells 2013-12-19 10:14:50 -0700
  • a440e1cbf5 update admin link on root page and documentation for url filters mwells 2013-12-18 19:51:50 -0700
  • a0ceade641 fix oom doleiptable using too much mem so bulk job went oom Matt Wells 2013-12-18 17:20:53 -0800
  • e93cfe8ac6 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-18 15:57:39 -0800
  • 58ce15a7f3 fix big post of 70MB of urls Matt Wells 2013-12-18 15:57:10 -0800
  • 894ced5b08 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-12-18 15:29:54 -0800
  • 356836812d wtf i did not modify these files. Matt Wells 2013-12-18 15:26:50 -0800
  • 60dddfc669 final fixes for parms Matt Wells 2013-12-18 15:22:54 -0800
  • 6f0137889b fixes for getUrlFilterNum so it looks at "hadReply" bit in SpiderRequest when getting diffbot api url. Matt Wells 2013-12-18 14:05:41 -0800
  • 7170c4f0ea rebuild url filters was not getting called when some relevant parms were updated. Matt Wells 2013-12-18 13:24:38 -0800
  • 1b5057ad42 log cleanups mostly. took out disk page cache, kinda buggy... need to fix at some point. Matt Wells 2013-12-18 10:57:18 -0800
  • 2ffad5d835 fix cores Matt Wells 2013-12-17 17:35:07 -0800
  • 2f2333abd1 parmdb fixes Matt Wells 2013-12-17 14:53:33 -0800
  • 33ba8070b5 more bug fixes parmdb Matt Wells 2013-12-17 13:09:05 -0800
  • 31e16c972d fix restart crawl Matt Wells 2013-12-17 11:17:33 -0800
  • 39a0b7f85e parm updates Matt Wells 2013-12-17 10:53:12 -0800
  • d03028ea93 bulk api post truncation fix Matt Wells 2013-12-17 10:03:46 -0800
  • 2cd53386ad parm updates Matt Wells 2013-12-17 09:51:08 -0800
  • 523d32a2ea parmdb updates Matt Wells 2013-12-16 18:13:38 -0800
  • ad4a4415d0 fix pauseCrawl Matt Wells 2013-12-16 17:21:59 -0800
  • 3f19ece776 parmdb updates Matt Wells 2013-12-16 17:07:15 -0800
  • 617a0ff76e parmdb fixes Matt Wells 2013-12-16 16:04:43 -0800
  • 6c652c1cc6 more parmdb fixes Matt Wells 2013-12-16 15:39:24 -0800
  • 9a65febd9e parmdb updates Matt Wells 2013-12-16 14:35:27 -0800
  • 1fe91cad2f parmdb updates Matt Wells 2013-12-16 14:10:39 -0800
  • 9b080ff89c more parmdb bug fixes Matt Wells 2013-12-16 13:36:31 -0800
  • 9be1ab6323 more parmdb fixes Matt Wells 2013-12-16 12:20:13 -0800
  • a727fb10e6 parmdb fixes for checkboxes. use radio buttons. Matt Wells 2013-12-16 11:41:43 -0800
  • 9cb99f7621 Merge branch 'diffbot' into diffbot-testing Matt Wells 2013-12-16 11:06:11 -0800
  • 0615acff17 zero out url filters checkboxes on submit Matt Wells 2013-12-16 11:03:40 -0800
  • 2b10a3327d Merge branch 'master' into diffbot Matt Wells 2013-12-16 10:49:40 -0800
  • 22eb06e54d a few bugfixes imported from neo github subdir Matt Wells 2013-12-16 10:49:13 -0800
  • 660f43cec7 fix bugs of pthreads junk not being async safe. we were calling fprintf from a signal handler (interrupt) while fprintf was currently in progress and the pthread junk did not like that. Matt Wells 2013-12-15 11:41:41 -0700
  • 06f67db16b forgot to unlock thread lock Matt Wells 2013-12-15 10:43:34 -0700
  • 0d976e5a7f include pthread.h Matt Wells 2013-12-15 10:40:50 -0700
  • 7cad5df43e try to fix core from pthreads logging msgs. Matt Wells 2013-12-15 10:38:18 -0700
  • 39c8a9a1d7 Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-12-14 15:20:04 -0700
  • f9f73dae65 fixed core from null json Matt Wells 2013-12-14 15:19:52 -0700
  • 4fdc781a27 fix spiders sticking when coll is immediately deleted after seeding a url. Matt Wells 2013-12-14 10:52:41 -0700
  • 777dfb9713 fix round from incrementing while spiders out. Matt Wells 2013-12-13 20:14:34 -0700
  • b7a96a0a1d minor update Matt Wells 2013-12-13 19:47:11 -0700
  • a8d03b0634 more parmdb bug fixes Matt Wells 2013-12-12 13:57:19 -0800
  • 463e6ced54 Merge branch 'diffbot' into diffbot-testing Matt Wells 2013-12-12 13:02:50 -0800
  • abcbfa7a60 Merge branch 'master' into diffbot Matt Wells 2013-12-12 13:02:12 -0800
  • 7b768d4b86 Merge branch 'diffbot' into diffbot-testing Matt Wells 2013-12-12 13:01:49 -0800
  • 16e91375f4 bring in changes from live beta from ~/github. limit spiders to 50, not 500 to prevent oom. resume killed merges that had num files shrunk even if down to one file. show collnum in spider queue. remove back-to-back whitespace, and make all space a ' ' for getting the doc checksum for deduping. Matt Wells 2013-12-12 12:58:58 -0800
  • 33d4b92544 Merge branch 'diffbot' into diffbot-testing Matt Wells 2013-12-12 12:51:43 -0800
  • a13114605a more parm overhaul fixes Matt Wells 2013-12-12 12:44:54 -0800
  • d85dbfb8e7 do not use safebuf in thread Matt Wells 2013-12-12 10:15:02 -0700
  • 3f8c6378b3 parmdb fixes mwells 2013-12-10 17:45:34 -0800
  • ead1112ea9 some parm overhaul bug fixes mwells 2013-12-10 17:06:27 -0800
  • 76bb3d05e1 clean up logging so i can see what's going on mwells 2013-12-10 16:41:30 -0800