Commit Graph

  • 9489ce6832 now show json items in csv with aligned columns. use search requests as the way to export data now. Matt Wells 2013-11-20 10:45:10 -0800
  • cbc1303a2a make performance table taller. we are losing graphical data still. Matt Wells 2013-11-20 10:10:40 -0700
  • 5baf6a95d4 handle a bunch of oom conditions that caused core. found using oom tester. mwells 2013-11-20 10:14:02 -0700
  • 46a683a904 label the bigger safebuf chunks of mem so we can see a better breakdown of mem on the stats page, not just a big "SafeBuf" allocation. mwells 2013-11-19 23:53:40 -0700
  • ff8491c50e set g_errno in getCollRec() Matt Wells 2013-11-19 15:49:32 -0800
  • b467f70782 fix hosts.conf Matt Wells 2013-11-19 15:41:03 -0800
  • ec4d77f00a make waiting trees grow dynamically to save space. was taking like 1.5GB of ram for like 100 collections or so. Matt Wells 2013-11-19 15:23:25 -0800
  • c669f8c138 fix file descriptor leak in Dir class. try to fix core from Thread getting SIGALRM. try to set NOFILES to 1024 at startup in case more are allowed. Matt Wells 2013-11-19 13:41:56 -0800
  • 35d22bd9aa fix json parser Matt Wells 2013-11-19 09:44:42 -0800
  • 879cd588e0 use -DPTHREADS not _PTHREADS_ Matt Wells 2013-11-19 00:49:43 -0800
  • e909b85638 Merge branch 'master' into diffbot Matt Wells 2013-11-19 00:45:49 -0800
  • 9c62ab362c Revert "use scp not rcp for administrative cmds" Matt Wells 2013-11-19 00:19:21 -0700
  • 7490748139 errno test update mwells 2013-11-19 00:10:10 -0700
  • cec1918fb9 minor change to errno tester comment mwells 2013-11-18 23:18:06 -0700
  • 81a907f073 Merge branch 'master' of github.com:gigablast/open-source-search-engine mwells 2013-11-18 23:14:52 -0700
  • 339fc9d1de added errno thread tester. mwells 2013-11-18 23:14:36 -0700
  • 69df8df18e Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-11-18 22:38:20 -0700
  • aa3a847f17 put some old stuff back until we figure out errno more. Matt Wells 2013-11-18 22:38:12 -0700
  • 64910ee991 fix oops mwells 2013-11-18 22:32:00 -0700
  • 4e71bc0698 use pthreads again until we can verify the stability of the new clone approach. Matt Wells 2013-11-18 22:23:38 -0700
  • a8ffc6e50b indicate diffbot processing errors in the urls csv Matt Wells 2013-11-18 17:38:14 -0800
  • 25dd764dac Merge branch 'master' into diffbot Matt Wells 2013-11-18 16:59:33 -0800
  • 7d3b52fb3a if intersect thread takes forever was causing msg5 reads to block forever and spider round was getting incremented. fixed a few bugs around that issue. Matt Wells 2013-11-18 16:20:30 -0800
  • 2e317df2d2 Merge branch 'testing' Matt Wells 2013-11-18 15:53:57 -0700
  • f85a953a34 fix core dump Matt Wells 2013-11-18 15:53:30 -0700
  • dbcf4630ff show crawl delay in current urls table Matt Wells 2013-11-18 14:31:01 -0800
  • 8d9f000f11 make getNumSpidersOutPerIp() specific to a coll so another coll does not prevent a coll from populating its own waiting tree. Matt Wells 2013-11-18 14:13:28 -0800
  • 3df310d3ec take out -lpthread. don't need it. Matt Wells 2013-11-17 22:25:19 -0700
  • cc1d117e55 use scp not rcp for administrative cmds like './gb installgb' most ppl do not have rcp on their system any more. Matt Wells 2013-11-17 20:49:38 -0700
  • 5022ea4d6e try ditching pthreads and using straight-up errno. it seems perhaps each clone() gets its own copy of errno now? Matt Wells 2013-11-17 19:43:20 -0700
  • 91279ff475 committing an abandoned asyncio project. mwells 2013-11-17 19:15:38 -0700
  • dfab4ee13d fixed bugs with advanced.html advanced search page. made stats graph only show last 5 minutes of stats. tends to make the graph look more continuous. do not use ajax to fetch the search results unless this is running in matt wells' datacenter. it is only an anti bot scraping measure and unnecessarily complicates things for others. Matt Wells 2013-11-17 14:58:47 -0700
  • 64ef37db2f Merge branch 'master' of git@github.com:gigablast/open-source-search-engine Matt Wells 2013-11-17 11:48:29 -0700
  • 5cf25cfa3f fix graphing bug when graphing performance graph. #background-color:xxxxxx was not always 6 hex digits and would not render right because of that. Matt Wells 2013-11-17 11:48:17 -0700
  • 12a6783a24 Update README.md Gigablast 2013-11-16 20:14:06 -0800
  • dc226dde0e fix LinkInfo mem leaks mwells 2013-11-16 17:50:32 -0800
  • 75e35b0c8d fix pthread_join bug some more. Matt Wells 2013-11-16 18:34:06 -0700
  • e5da0ac967 do not try to join on thread when pthread_create() fails to create thread. was causing core. Matt Wells 2013-11-16 18:28:49 -0700
  • e756aabf4b allow redirects to goog and bing again Matt Wells 2013-11-16 12:11:25 -0700
  • e27646c088 cleanup fixes. Matt Wells 2013-11-15 15:01:56 -0700
  • f43b7dca13 Merge branch 'master' into testing Matt Wells 2013-11-15 14:47:35 -0700
  • 5e30728a3a new graphic icons. minor clean ups. Matt Wells 2013-11-15 14:47:05 -0700
  • e04deb82d1 log when url matches page process pattern and which pattern it matches. Matt Wells 2013-11-15 13:11:05 -0800
  • c9af8adf6e when someone deletes/resets a coll we clear the lock table and the msg12 handler can not confirm a lock request since the table was cleared out! so do not core!! Matt Wells 2013-11-15 12:37:27 -0800
  • 6495dfd86e try to fix json parser overflow error. needs testing. tried to fix round num from incrementing for little job because i think server overload. should be fixed right some time. just made wait time 30 secs instead of 10 in Spider.cpp. Matt Wells 2013-11-15 11:30:16 -0800
  • 6b9d0656ff Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-11-15 09:36:09 -0800
  • 3258360679 do not do array break up substitution logic on diffbot replies if not of type product or image. it was breaking up the images array WITHIN an article type. Matt Wells 2013-11-15 09:35:32 -0800
  • bf75ac6a0d fix page process pattern parsing Matt Wells 2013-11-15 09:34:47 -0800
  • 3563f0643f fix little bug of using product not image Matt Wells 2013-11-15 09:13:27 -0800
  • fe1a7d1a75 rdbbase not fully resetting? it was trying to dump to coll directories that had been moved to trash folder. and printing out "deleted from under us". at least it was corrupting data in RdbMem this time because i added m_dumpErrno logic. Matt Wells 2013-11-15 09:01:58 -0800
  • 9ed40a1112 hacky hacks Matt Wells 2013-11-14 16:59:50 -0800
  • bb964ac214 fix core Matt Wells 2013-11-14 16:28:23 -0800
  • b0e40ae68b fix bad json bug Matt Wells 2013-11-14 15:05:15 -0800
  • 1518778405 fix for bad json splicing Matt Wells 2013-11-14 14:42:31 -0800
  • 7fc8b6a005 fix oopsy Matt Wells 2013-11-14 14:09:05 -0800
  • 7c84b6ee0b show restart crawl button Matt Wells 2013-11-14 14:07:45 -0800
  • 62432b3530 support for &restart=1 Matt Wells 2013-11-14 14:02:56 -0800
  • 3033684a8d fix for json parsing. added restart=1 support Matt Wells 2013-11-14 13:16:08 -0800
  • 9059aa8a01 fix link Matt Wells 2013-11-14 12:53:49 -0800
  • be213ca28f now fix embedded products and images in the diffbot json reply properly! Matt Wells 2013-11-14 12:51:34 -0800
  • 28cd1e6490 you can submit action then expression now. Matt Wells 2013-11-14 09:54:36 -0800
  • 8534914902 fix core when xmldoc::getmsg20reply is called Matt Wells 2013-11-14 09:32:18 -0800
  • 5c0194c439 fix json validation bug Matt Wells 2013-11-13 19:29:33 -0800
  • eb719849a6 do not core on this dump error Matt Wells 2013-11-13 19:04:22 -0800
  • da013d1b18 fix invalid json bug of not ending json items in images/products array Matt Wells 2013-11-13 18:44:15 -0800
  • 45cc9bb112 fix a few nasty bugs Matt Wells 2013-11-13 18:31:26 -0800
  • a5c3b3b8f8 fix so spider does not say it is done crawling right after you seed it! Matt Wells 2013-11-13 16:03:15 -0800
  • 7020f66daa bulk api nominal updates Matt Wells 2013-11-13 14:30:51 -0800
  • 9e77f1b2f6 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-11-13 13:27:45 -0800
  • a31b13ad61 fix a few bugs. Matt Wells 2013-11-13 13:27:22 -0800
  • 42b5638680 added log msg mwells 2013-11-13 00:57:59 -0700
  • 6cc4e6d980 added some more links to my gui Matt Wells 2013-11-12 17:05:13 -0800
  • 7f038235e1 hack in a type:product or type:image since product and image json elements are taken from an array and lack those. Matt Wells 2013-11-12 16:57:14 -0800
  • df28c4e0c2 search results in csv format. remove serps per page limit if custom crawl. Matt Wells 2013-11-12 16:33:45 -0800
  • 38c8bec024 use gbspiderdate not spiderdate. so gotta use gbsortby:gbspiderdate etc. Matt Wells 2013-11-12 13:55:47 -0800
  • fbcd6b8afd display json objects that are not in arrays in csv. show csv header. how to deal with heterogenous object lists? index spiderdate: for gbsortby:spiderdate. added gbrevsortby: support. Matt Wells 2013-11-12 13:51:52 -0800
  • 364216ff16 fixed bugs in sort by prices, etc. Matt Wells 2013-11-11 18:58:45 -0800
  • 4548098809 a couple more nominal updates Matt Wells 2013-11-11 16:10:47 -0800
  • c52dae8ee6 Update Abbreviations.h malik almhyd 2013-11-12 03:02:26 +0300
  • ad61e9ea5a /v2/bulk api updates. Matt Wells 2013-11-11 15:52:04 -0800
  • 7248641bc4 fix mem leaks. turn off electric fence. Matt Wells 2013-11-11 09:58:14 -0800
  • 7efb743e65 nothing Matt Wells 2013-11-10 22:25:19 -0800
  • 5aa1609350 Merge branch 'master' into diffbot Matt Wells 2013-11-10 22:11:39 -0800
  • af678b7c1b fix a few bugs. Matt Wells 2013-11-10 22:11:13 -0800
  • 105a201cde fix mem leak. check if tree writes are disabled and block until not when deleting/resetting a collection. just like we do if tree is being saved. Matt Wells 2013-11-10 16:28:00 -0800
  • 810a6918fd Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot Matt Wells 2013-11-10 09:41:19 -0800
  • 3afac4812d fix bug of trying to del/reset coll while disable writing was engaged. we already had it check to see if tree was saving, but not if writes were disabled. so it gets ETRYAGAIN and retries later. Matt Wells 2013-11-10 09:40:32 -0800
  • b1e98aa4b8 fix core. Matt Wells 2013-11-08 21:33:37 -0700
  • e395628d5a use &format=0 1 or 2 for html/xml/json now. use &icc=1 to get dump of json objects in serps. Matt Wells 2013-11-08 18:00:30 -0800
  • aa9a77674f fixed oopsy when parsing float words Matt Wells 2013-11-08 16:25:23 -0800
  • 09f28b2f26 now we index all numbers that have field names (so can't just be a number in the body) but it can be in a meta tag or json item. then use like gbsortby:products.offerPrice to sort the search results (json objects) by that. Matt Wells 2013-11-08 16:16:13 -0800
  • 9895ad093f fix that pesky spider start time bug. Matt Wells 2013-11-07 16:43:02 -0800
  • a76f4c6974 just POST a full request for webhook now so we can do application/json content type Matt Wells 2013-11-07 14:20:15 -0800
  • ab9a3b1798 put download links to api output for a crawl Matt Wells 2013-11-07 14:07:38 -0800
  • 3e4db4f1bc show all crawl details in url webhook notification in the post body. Matt Wells 2013-11-07 13:59:43 -0800
  • 2ae04cff71 return crawl delete reply in json. take out EDOCEVILREDIRECT errors. Matt Wells 2013-11-07 09:55:47 -0800
  • 3b929917d1 do not site cluster or do dup removal in crawlbot search results Matt Wells 2013-11-07 09:40:31 -0800
  • 396a88799a fix bad bug of basically emptying out all our data on auto-save! Matt Wells 2013-11-06 19:49:20 -0800
  • 73de13b2da fix core. Matt Wells 2013-11-06 17:15:29 -0800
  • 0655160c26 fixed quite a few nasty bugs. collectionrec neg/pos key counting overruns. Matt Wells 2013-11-06 15:44:50 -0800