db74af766b fix core in addExistingColl()
mwells
2013-12-10 15:46:38 -0800
82494baa89 move CollectionRec stuff into Collectiondb files for simplicity.
mwells
2013-12-10 15:28:04 -0800
14b0682d6b can't use safebuf in a thread. oops!
Matt Wells
2013-12-10 14:20:44 -0700
22271c0bb2 do not accept msg4 add requests until in sync with host 0
mwells
2013-12-10 13:20:23 -0800
f2d5661965 parmdb overhaul. support collection add/del sync when host comes back online. use udp not tcp. host #0 can now handle a new incoming request while a parm change is currently outstanding. all missed "command" parms will be received when a dead host comes back online, too, like a tight merge for instance. does not use msg4, uses msg3e and msg3f for syncing and sending parms.
mwells
2013-12-10 13:09:55 -0800
1175478705 got this new parm shit compiling
mwells
2013-12-10 12:54:19 -0800
9e1976a8e2 new parm stuff almost compiling.
mwells
2013-12-10 11:13:43 -0800
6f6c4aed84 minor admin.html edit.
Matt Wells
2013-12-10 10:39:38 -0700
1a7d5e389b very minor admin.html edit
Matt Wells
2013-12-10 00:56:56 -0700
ec2254d8ed added multi language support note to admin.html
Matt Wells
2013-12-09 23:18:33 -0700
f7e7acb398 minor log msg updates. updated admin.html to give some performance and storage capacity info.
Matt Wells
2013-12-09 23:16:24 -0700
95bd6238d9 do not core when running filters when our gb home dir is really long. thanks bill! call XmlDoc::getSpiderPriority() with a SpiderReply so we can act on m_langId, like chinese, for instance, to filter those langs out from indexing. it was doing this before but got commented out for some reason.
mwells
2013-12-09 22:55:02 -0700
cc63fd048f Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
mwells
2013-12-09 13:46:08 -0800
2a5d4beec4 fix core from last push.
Matt Wells
2013-12-09 14:21:46 -0700
fa497de217 remove annoying log msg
Matt Wells
2013-12-09 14:09:48 -0700
44ae7c4de6 mem labelling fixes. fixed bad alloc when generating gigabits.
Matt Wells
2013-12-09 14:05:02 -0700
0dcd1211d3 new opensource icon.
Matt Wells
2013-12-08 19:47:39 -0700
92ec3f1148 added open source icon to homepage
Matt Wells
2013-12-08 19:45:49 -0700
92e3d841a6 minor update
Matt Wells
2013-12-08 19:28:45 -0700
12404b4f85 doc updates
Matt Wells
2013-12-08 19:26:48 -0700
dd3b49faa9 collection name hell
Matt Wells
2013-12-08 16:44:37 -0700
3353a90a85 fix resuming a killed merge condition.
Matt Wells
2013-12-08 15:50:45 -0700
ed79b67d2e core dump fixes
Matt Wells
2013-12-08 15:36:23 -0700
144e2c898e save resources by not doing reads on an empty doledb priority. stop saving allSpidersOn and Off parms.
Matt Wells
2013-12-08 14:07:31 -0700
a2e52a5dc3 little fix
Matt Wells
2013-12-08 10:15:54 -0700
020d7741b9 new coll.conf for main with ismedia filter. updated url filters docs some more for "isnew" and explained the errorcount stuff more.
Matt Wells
2013-12-08 10:10:51 -0700
65e75167e3 limit posdb merging to 8 files max. added some more url filters documentation.
Matt Wells
2013-12-08 09:41:05 -0700
78a4cfe6da forgot to push the .h files
Matt Wells
2013-12-07 22:12:48 -0700
e1712fc94f fix uninitialized diffbot titlerec header parms. ignore them when not a custom crawl.
Matt Wells
2013-12-07 22:11:26 -0700
06edfddf31 a bunch of bug fixes, mostly spider related. also some for pagereindex.
Matt Wells
2013-12-07 21:56:37 -0700
5e4b5a112c Merge branch 'master' into diffbot
Matt Wells
2013-12-07 11:34:26 -0700
105be1fbdc more core fixes
Matt Wells
2013-12-07 10:38:47 -0700
8d92a079c2 minor spider error reply time fix
Matt Wells
2013-12-07 10:21:51 -0700
e731e5a4d8 Merge branch 'diffbot' of git@github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-12-07 10:21:21 -0700
0e846a9389 minor spider reply error fix
Matt Wells
2013-12-07 10:21:02 -0700
626a97770c another core fix
Matt Wells
2013-12-07 10:14:37 -0700
fda7b48500 fix core
Matt Wells
2013-12-07 10:11:13 -0700
1bc80ab552 fixed pagereindex. we now add spiderreplies for internal errors like ENOMEM or ENOTFOUND to try to avoid the "CRITICAL CRITICAL" msgs. these are considered temporary errors.
Matt Wells
2013-12-07 10:01:17 -0700
d9b31d3481 quick bug fix
Matt Wells
2013-12-06 22:57:49 -0700
269c10f648 try to figure out why pagereindex never displayed html page when done.
Matt Wells
2013-12-06 22:56:06 -0700
adf9d807ea Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
mwells
2013-12-06 12:31:36 -0800
08faf78be9 checkpoint for new parm logic to allow syncing with newly added or deleted collections even if a host was dead when collection was added/deleted. also added parm change request queueing.
mwells
2013-12-06 12:29:14 -0800
e7bd904765 fix docids only printing.
Matt Wells
2013-12-06 09:53:32 -0700
c50ef1954f show admin controls on serps if ip is local. fixed up the "reindex" page for deleting/reindexing search results for a given query.
Matt Wells
2013-12-06 09:48:30 -0700
4b3e111bed fix spider dumping to remember uh48's between list readings. was showing dups for www.nordicusa.com/webtv at the end.
Matt Wells
2013-12-05 10:09:06 -0800
99cc10fccd allow seed urls to match url crawl pattern regardless.
Matt Wells
2013-12-03 17:13:38 -0800
432099c4e6 added rebuild=true fix for regex crawl change
Matt Wells
2013-12-03 16:23:58 -0800
2e46bcc97f Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-12-03 16:23:20 -0800
03219a3057 add regex support back in
Matt Wells
2013-12-03 16:23:05 -0800
6ab9041f45 fix bug where just getting the crawl parms was rebuilding the waiting tree.
Matt Wells
2013-12-03 16:17:36 -0800
9f1d79b124 check for null collrec
Matt Wells
2013-12-02 10:13:19 -0800
cda5968b75 update common word list
Matt Wells
2013-12-01 15:19:33 -0700
39f8dc646b default gigabits on for my copy.
Matt Wells
2013-12-01 15:07:06 -0700
7f4dca7a07 Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
Matt Wells
2013-12-01 14:47:16 -0700
7874c8d832 added ifdef NEEDSLICENSE
Matt Wells
2013-12-01 14:47:08 -0700
d43b55103c show query in msg20 log msg
Matt Wells
2013-12-01 12:11:25 -0700
1077191e4a fix log msg bug.
Matt Wells
2013-12-01 12:08:05 -0700
08030865e4 fix compiler warning
Matt Wells
2013-12-01 11:57:26 -0700
d811a13627 fix small oopsy
Matt Wells
2013-12-01 11:56:33 -0700
3155869fbf added new log msg for recording cpu time for summary generation.
Matt Wells
2013-12-01 11:53:41 -0700
5ee2be8fcf fixed data corruption bug. m_finalCrawlDelay was being stored in xmldoc titlerec header.
Matt Wells
2013-11-27 14:18:15 -0800
1129e9b635 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-27 14:09:54 -0800
57eb231a4e do not add timestamps to lastdownload cache if skiphammercheck is true. those are like robots.txt or redirs or root files.
Matt Wells
2013-11-26 14:21:17 -0800
0f3374e3f3 measure crawl delay by default from start of each download now. it is a parm in msg13request.
Matt Wells
2013-11-26 14:07:28 -0800
4769ca0881 if pthread_create() returns EAGAIN then do not always retry, it makes an infinite loop.
Matt Wells
2013-11-26 14:52:07 -0700
8bb086ac60 crawldelay works now but it measures from the end of the download, not the beginning.
Matt Wells
2013-11-26 12:58:14 -0800
1c7c9a4d80 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-26 09:19:26 -0800
040bdb8039 fix url filters formulation. fixed extra , in json. fixed upp and ucp patterns if all substrings are negative.
Matt Wells
2013-11-26 09:17:38 -0800
ca544ddb90 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-25 15:06:11 -0800
1bbbcff755 fix getTokenizedDiffbotReply() to look for type: with a {} depth of 1 so it does not pick up on the type:image in the images array if there is one in the article.
Matt Wells
2013-11-25 13:58:31 -0800
61ce4be279 fix major bug when you have twins/mirrors. queries not returning all the results.
Matt Wells
2013-11-25 09:53:53 -0700
9a456de178 minor fix
Matt Wells
2013-11-24 20:48:47 -0700
5da41cd113 fix a couple different cores.
Matt Wells
2013-11-24 19:46:44 -0700
41ce557627 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-22 18:26:53 -0800
e8065a0f0a enforce crawl delay perfectly.
Matt Wells
2013-11-22 18:26:34 -0800
1826860094 forgot to add diffbot api url parm
Matt Wells
2013-11-22 17:55:37 -0800
f235a20752 add ! support to all patterns
Matt Wells
2013-11-22 17:52:14 -0800
c3517ee019 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-22 17:37:42 -0800
bc251e17f5 hosts.conf fix
Matt Wells
2013-11-22 14:18:03 -0800
791036aabb Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-22 14:17:34 -0800
3cc300bf03 spider log debug msg fix. boost max cpu threads to 10, seems to have many cores usually.
Matt Wells
2013-11-22 14:17:10 -0800
e0a15194e1 fix json double decoding issue. no more partial decodes, json parser stores fully decoded string into separate buf.
Matt Wells
2013-11-22 14:16:14 -0800
6b36ddfd31 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-22 11:14:35 -0800
9d9a976b4f fix bug of perpetual round incrementing ad nauseam.
Matt Wells
2013-11-22 11:14:03 -0800
c8da2a5af7 fix core
Matt Wells
2013-11-22 09:47:12 -0700
8a58969ab8 try to fix core. log redirects.
Matt Wells
2013-11-22 00:41:33 -0800
79df39655f Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-21 12:38:03 -0800
2a5d92a639 log debug update.
Matt Wells
2013-11-21 12:37:53 -0800
f4de986c7e test to make sure diffbot reply contains "url":" field. try to find out why some diffbot replies are truncated.
Matt Wells
2013-11-21 12:37:08 -0800
14e2164acd oopsy
Matt Wells
2013-11-20 23:40:30 -0700
acac80d4a9 fix core in summary generation highlighting.
Matt Wells
2013-11-20 23:38:28 -0700
4a83415832 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2013-11-20 16:44:41 -0800
dcae4682e8 new api. tossed action/expression and added urlCrawlPattern/urlProcessPattern/apiUrl
Matt Wells
2013-11-20 16:41:28 -0800
6f4508c8f1 fix issue of bulk job spidering links because of a simplified redirect.
Matt Wells
2013-11-20 16:09:50 -0800
43e40208b8 Merge branch 'master' into diffbot
Matt Wells
2013-11-20 15:51:58 -0800
d2751211fe do not spider links in XmlDoc::spiderLinks() if it's a custom bulk job. put in logIt() too.
Matt Wells
2013-11-20 15:46:17 -0800