Commit Graph

  • ac5d124ee0 experimental implementation of a citation ranking as a post-ranking method. (ranking coefficient fixed, needs to be made configurable) Michael Christen 2012-04-13 06:47:33 +02:00
  • 8f89c8ef07 added information about inbound, outbound and citation links into yacydoc api servlet Michael Christen 2012-03-31 07:38:49 +02:00
  • 71649a1296 added an api to retrieve the new citation.index with the webstructure.xml api. This api will respond with details about a single URL if requested with 'webstructure.xml?about=[url|urlhash|host]'. Michael Christen 2012-03-29 17:22:31 +02:00
  • 8fc86fe397 added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and intranets. Michael Christen 2012-03-29 17:20:14 +02:00
  • 22f05c83ff fixed default must-match filter for full domain crawls - the old filter was too restrictive and did not allow intranet crawls Michael Christen 2012-03-28 21:50:00 +02:00
  • 3e61287326 some better feedback on properties change Lotus 2012-03-25 22:21:42 +02:00
  • 96ac95cff9 added hint how to change integration options Lotus 2012-03-23 17:02:50 +01:00
  • 4f61b8fd82 Fixes for compare-search Thomas 2012-03-21 21:43:47 +01:00
  • e0680de7b3 Remove Scroogle from compare-search, Scroogle is dead Thomas 2012-03-20 23:00:06 +01:00
  • 78f0d8f046 no focus on preview frames for search integration fixes bug http://bugs.yacy.net/view.php?id=161 Lotus 2012-03-17 21:10:29 +01:00
  • 0b3f39136e allow custom ppm lower than minimum button on /Crawler_p.html fixes http://bugs.yacy.net/view.php?id=166 Lotus 2012-03-17 20:43:19 +01:00
  • e14eb9de82 checkalive.sh: try to fetch only once (default: 20) Lotus 2012-03-12 09:30:44 +01:00
  • 7792ac6406 fix links & bug #163 Lotus 2012-03-10 10:59:56 +01:00
  • 532c7cf827 added physics experiment to the graph plotter. not active by default Michael Peter Christen 2012-02-28 13:18:46 +01:00
  • aba9b1bfa0 better names for elements of a linked graph Michael Peter Christen 2012-02-27 21:27:17 +01:00
  • 0cc0290978 bugfix for a must-not-match pattern check. This bug did not make the check semantically wrong, but a trick that prevented an IP lookup in case that the filter was not used did not work. With this bugfix, crawling gets a huge speed boost for noload urls! Michael Peter Christen 2012-02-27 00:52:44 +01:00
  • 2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method. See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html Michael Peter Christen 2012-02-27 00:42:32 +01:00
  • 8aba045ba1 if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given. Michael Peter Christen 2012-02-26 20:53:32 +01:00
  • fa7b3481b3 better navigation in file search: less results by first try, but much faster. after the first search is done, buttons appear to get more results for the same search Michael Peter Christen 2012-02-26 17:32:45 +01:00
  • 5fd2c30318 adjust Netbeans project class path settings to updated httpclient and commons jars reger 2012-02-26 00:06:57 +01:00
  • aae75def69 fix: prevent logging of Solr doc content reger 2012-02-26 00:04:25 +01:00
  • 8c06925984 animation of the web structure picture Release_1.02 Michael Peter Christen 2012-02-25 15:42:29 +01:00
  • 898fa7c3f3 use tld heuristic to check if a domain is local or global Michael Peter Christen 2012-02-25 15:41:20 +01:00
  • 213c8d97f2 use fewer processes in process pool Michael Peter Christen 2012-02-25 14:07:20 +01:00
  • c639248c23 protection against strange answers from remote peers during search Michael Peter Christen 2012-02-25 14:07:02 +01:00
  • 9c51db4243 Release_1.02 Michael Peter Christen 2012-02-25 12:59:19 +01:00
  • 36e4d82b27 changed ranking Michael Peter Christen 2012-02-25 12:58:12 +01:00
  • 99c74699de removed scroogle (scroogle is dead) Michael Peter Christen 2012-02-25 12:57:59 +01:00
  • f7ed050771 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-02-25 12:44:02 +01:00
  • 096c17e7cd added test code Michael Peter Christen 2012-02-25 12:42:13 +01:00
  • 84f506da68 update installed jre version Lotus 2012-02-24 09:11:48 +01:00
  • 6e51a00a2f Revert "fix for page navigation: show only as many pages as are available for given navigation constraints, not as given by total results size" Michael Peter Christen 2012-02-24 02:46:56 +01:00
  • 73f5a9e8b3 fix for page navigation: show only as many pages as are available for given navigation constraints, not as given by total results size Michael Peter Christen 2012-02-24 02:31:03 +01:00
  • 9c51dc0f13 fixed a bug with navigation: if a navigation was applied to file type or protocol, then it was not possible to remove that again. This is the fix for that. Michael Peter Christen 2012-02-24 02:28:40 +01:00
  • 665626a51b catch OOM errors during scanning Michael Peter Christen 2012-02-24 02:15:27 +01:00
  • 8bfc987374 enhanced hint how to enter file:// urls Michael Peter Christen 2012-02-24 02:14:54 +01:00
  • f838997126 updated commons io from 2.0.1 to 2.1 Michael Peter Christen 2012-02-24 01:35:01 +01:00
  • 1cd711d005 added classes for citation references (for new citation ranking) Michael Peter Christen 2012-02-24 01:07:15 +01:00
  • eeb57ae824 updated http client libraries Michael Peter Christen 2012-02-24 01:06:30 +01:00
  • 33a405dab8 ipv6 bugfix Michael Peter Christen 2012-02-24 00:50:46 +01:00
  • c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148 Michael Peter Christen 2012-02-24 00:38:57 +01:00
  • edaa8ac94c Merge commit 'e15e633a0128b8d31011283a65b4ef26a6dddcd8' Michael Peter Christen 2012-02-23 10:07:13 +01:00
  • e15e633a01 Bugfix for IE9 (doesn't accept html form within form) changes of API schedule row data changed form input form to unique field names using row pk. Fix for issue 96 http://bugs.yacy.net/view.php?id=96 reger 2012-02-23 02:40:07 +01:00
  • 716db3b79a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-02-23 01:26:09 +01:00
  • e0f1e7d904 added new citation reference data structure that shall be used for a citation ranking Michael Peter Christen 2012-02-23 01:22:29 +01:00
  • e18a4f6b74 more tolerant merge iterator Michael Peter Christen 2012-02-23 01:21:24 +01:00
  • 0d148c3353 more logging in resource observer Michael Peter Christen 2012-02-23 01:20:42 +01:00
  • 2fa037ae1d enhanced crawler Michael Peter Christen 2012-02-23 01:20:24 +01:00
  • 43ffae6590 delete yacy.running after kill as requested in http://forum.yacy-websuche.de/viewtopic.php?t=3835 Lotus 2012-02-22 18:41:32 +01:00
  • e101c2e0e2 added changes from copperdust (submitted by email): 1. Improved and fixed language detection: 1.1 Identificator.java - recognition fix (improved) 1.2 DCEntry.java - fix (changed detection order due to detection from tld in many cases is incorrect) 1.3 MultiProtocolURI.java - fixed and enhanced language from tld detection (all currently used top-level domains; ccTLD added but not tested). 2. Ukrainian language update. 3. Main Slavic languages langstats (tested and works fine). Michael Peter Christen 2012-02-22 12:21:27 +01:00
  • 58e08b1211 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-02-21 22:31:49 +01:00
  • a9b4d49b75 removed debug output Michael Peter Christen 2012-02-21 22:31:14 +01:00
  • 2120db289a *) Small change which should solve problem with cgitb module in Python CGI scripts. low012 2012-02-14 20:54:19 +01:00
  • ee89cf5ae5 fix must match filter for full domain crawl Lotus 2012-02-07 16:13:13 +01:00
  • 4f92389550 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-02-03 23:34:24 +01:00
  • 8d63a5887c bugfixes Michael Peter Christen 2012-02-02 23:38:23 +01:00
  • 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This affects also the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order. Michael Peter Christen 2012-02-02 21:33:42 +01:00
  • 7e4e3fe5b6 free some memory after parsing html Michael Peter Christen 2012-02-02 09:55:27 +01:00
  • 4540174fe0 memory hacks Michael Peter Christen 2012-02-02 07:37:00 +01:00
  • b4409cc803 small redesign of blob column index and usage Michael Peter Christen 2012-02-02 06:43:57 +01:00
  • d5c1f2746e performance hack Michael Peter Christen 2012-02-02 06:43:15 +01:00
  • 803963aebd performance hack: better space grow in CharBuffer (speeds up html parser) Michael Peter Christen 2012-02-01 23:27:59 +01:00
  • 8b0920b0b5 tried to fix the ipv6 problem as reported in bug Michael Peter Christen 2012-02-01 22:26:19 +01:00
  • e2f8f263e8 changed storage of search words: keep order Michael Peter Christen 2012-02-01 18:13:31 +01:00
  • ed39ef2890 changed generation of protocol information Michael Peter Christen 2012-02-01 18:12:59 +01:00
  • 0b67a0a5d8 added a column index for tables in blob files. This is heavily used during receiving of DHT submissions and when answering remote search requests. Both events together may have caused IO-deadlocking and this commit shall fix that. Michael Peter Christen 2012-02-01 15:11:21 +01:00
  • ffb72249ea added missing apicat.sh Michael Peter Christen 2012-02-01 00:49:40 +01:00
  • c166eb68b6 fixes in solr schema file Michael Peter Christen 2012-02-01 00:22:43 +01:00
  • 2e5cd6a1b2 fixed parser extension deny list generation and usage Michael Peter Christen 2012-02-01 00:15:59 +01:00
  • 8bee1472c9 there is no noindex, only nofollow in links Michael Peter Christen 2012-01-31 23:46:35 +01:00
  • 5e18f54a8c added shell script to get a servlet. this is the same as apicall.sh but it prints the result to stdout Michael Peter Christen 2012-01-31 23:21:49 +01:00
  • 3cd6dcd352 do not add new solr fields as activated fields Michael Peter Christen 2012-01-31 22:21:48 +01:00
  • e3bb73c3d6 serialized some database access methods Michael Peter Christen 2012-01-31 21:13:49 +01:00
  • 9727015213 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-01-31 18:18:13 +01:00
  • 7e728867e5 added a synchronization around iterations to prevent IO-deadlocking during concurrent remote search requests Michael Peter Christen 2012-01-31 18:17:25 +01:00
  • f077b11d38 Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git david 2012-01-30 20:02:11 +01:00
  • 29675d9766 more label on search options (usability) Lotus 2012-01-30 20:02:02 +01:00
  • 355ecf330f reduced target file size to 64mb Michael Peter Christen 2012-01-29 20:35:48 +01:00
  • fa1f35b0c8 Merge rc1/master reger 2012-01-29 20:06:10 +01:00
  • b4bc1e2875 remote search does not do snippet generation Michael Peter Christen 2012-01-29 19:25:09 +01:00
  • 46c986f8f7 Merge rc1/master reger 2012-01-29 00:48:54 +01:00
  • 335a776351 xss hardening on Status.html Lotus 2012-01-28 13:25:12 +01:00
  • 55518c600f Merge rc1/master reger 2012-01-26 22:43:34 +01:00
  • 943165c9a4 upd Netbeans IDE lib setting reger 2012-01-26 22:25:42 +01:00
  • 10ae6d94a1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-01-26 18:11:06 +01:00
  • 2ea585d616 fix for host navigator Michael Peter Christen 2012-01-26 18:10:34 +01:00
  • 2f6dde92e2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-01-26 16:45:33 +01:00
  • c560a582ac fix for single-word vocabulary lines Michael Peter Christen 2012-01-26 16:44:30 +01:00
  • 4c5edab1ec added option to have exception search result windows Michael Peter Christen 2012-01-26 15:32:30 +01:00
  • 329e3eebcf added example vocabularies and explanation how to use them Michael Peter Christen 2012-01-26 11:20:14 +01:00
  • 046d7de95b Merge remote branch 'reger/master' Michael Peter Christen 2012-01-26 10:47:40 +01:00
  • a95f645a61 Bugfix class repository.Loaddispatcher fixed download file limit of 10000 reger 2012-01-26 04:10:44 +01:00
  • 32adad7dd5 show less navigation by default Michael Peter Christen 2012-01-26 01:09:34 +01:00
  • ef78f22ee1 performance hack Michael Peter Christen 2012-01-25 12:48:48 +01:00
  • 41536eb4a2 performance hack Michael Peter Christen 2012-01-25 12:28:56 +01:00
  • 88b86afc89 no DoS protection for intranet mode Michael Peter Christen 2012-01-25 12:13:03 +01:00
  • 0f443ac755 automatic switching off of navigation that is not useful Michael Peter Christen 2012-01-25 12:07:24 +01:00
  • 852ce43d99 better rules for default open/close of navigation objects Michael Peter Christen 2012-01-25 11:53:25 +01:00
  • f91487fc50 added delete-button for host navigation Michael Peter Christen 2012-01-25 11:19:18 +01:00
  • e8d24fd802 author navigator can be switched off Michael Peter Christen 2012-01-25 11:11:42 +01:00
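
Note on commit 2fc8ecee36: the Java documentation for ConcurrentLinkedQueue states that size() is not a constant-time operation, because it has to traverse the whole queue. The log does not say how the commit worked around this. The following is only a minimal sketch of one common approach, assuming a separately maintained counter; the class name CountedQueue and its methods are hypothetical and are not taken from the YaCy source.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical wrapper (an illustration, not YaCy code): keeps an element
 * count next to a ConcurrentLinkedQueue so that size() does not have to
 * traverse the queue.
 */
class CountedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<E>();
    private final AtomicInteger count = new AtomicInteger();

    /** Appends an element and increments the counter. */
    public void add(E element) {
        queue.add(element);
        count.incrementAndGet();
    }

    /** Removes the head, or returns null if the queue is empty. */
    public E poll() {
        E element = queue.poll();
        if (element != null) {
            count.decrementAndGet();
        }
        return element;
    }

    /**
     * Constant-time size. Under concurrent access the value is only
     * approximate, because the queue update and the counter update are
     * two separate steps.
     */
    public int size() {
        return count.get();
    }

    /** isEmpty() on the underlying queue does not need a traversal. */
    public boolean isEmpty() {
        return queue.isEmpty();
    }
}
```

The trade-off in this sketch is that the counter can briefly disagree with the queue contents while another thread is between the two steps, which is usually acceptable for monitoring or throttling decisions.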