Commit Graph

  • 2a6499364d *) minor changes low012 2010-12-27 15:53:41 +00:00
  • c0274bd123 *) minor changes low012 2010-12-27 15:37:11 +00:00
  • e38217fe88 small changes to scanner orbiter 2010-12-26 23:21:34 +00:00
  • fe46536f6e enhanced network scanner (less name resolving during scanning and no name resolving during search) orbiter 2010-12-26 16:25:17 +00:00
  • a083cf531e added skin for 27c3 orbiter 2010-12-22 00:07:46 +00:00
  • e753027c43 fix for http://forum.yacy-websuche.de/viewtopic.php?p=21439#p21439 orbiter 2010-12-22 00:04:30 +00:00
  • bf4ef1513e - fix for map view - remove some UNRESOLVED PATTERN - maybe a fix for non-flushing cache orbiter 2010-12-21 23:48:05 +00:00
  • 6b70393d1d - new java version 1.6 - replaced old gif animator by java 1.6 gif animator orbiter 2010-12-20 22:51:50 +00:00
  • e88c428008 fix to ftp loader orbiter 2010-12-18 10:22:54 +00:00
  • 59b70a5a92 another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files orbiter 2010-12-18 00:44:19 +00:00
  • 9b25a33fd9 - fixed numerous bugs - better document names - fixed problem with ftp crawling - added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just does not show them. After the next network scan, when the server is available again, the results are shown again. orbiter 2010-12-17 17:30:09 +00:00
  • 7bdb13bf7f more fixes to smb crawling: better file names orbiter 2010-12-17 00:52:24 +00:00
  • 94c48500cc several fixes orbiter 2010-12-17 00:11:42 +00:00
  • 0ac7311a62 fix for token parser orbiter 2010-12-16 23:47:36 +00:00
  • 58b59f9bc8 - a collection of bug fixes and some redesign of the Scanner class - fixed smb crawling - added smbget to download script generation orbiter 2010-12-16 23:37:21 +00:00
  • 4d5bb4c4ca *) Urks... low012 2010-12-16 18:51:20 +00:00
  • c54170421a fix for npe orbiter 2010-12-16 11:19:22 +00:00
  • c288fcf634 redesigned CrawlStartScanner user interface and added more features: - multiple hosts for environment scans can be given (comma-separated) - each service (ftp, smb, http, https) for the scan can be selected - the scan result can be accumulated or refreshed each time a network scan is made - a scheduler was added to repeat a scan and add all found urls to the indexer automatically orbiter 2010-12-16 02:15:20 +00:00
  • 6f4f957e50 *) cleaning up the code a little bit low012 2010-12-16 00:18:05 +00:00
  • 2521677a45 * deny adminForLocalhost and intranet network setup also on bootup and not only on network switch * require authentication for yacybot whatever adminForLocalhost is set to (after this patch, is the rule from above really necessary? the crawler also checks the robots.txt) f1ori 2010-12-15 21:39:02 +00:00
  • 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de) f1ori 2010-12-15 19:20:00 +00:00
  • 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls - integrated new parser into loader processes: enrich document parser - fixed a concurrent modification exception in kelondro iterator - hand-over of document size from crawler to indexer orbiter 2010-12-15 00:03:19 +00:00
  • 358feeeb39 enhanced speed and usability of network scanner servlet orbiter 2010-12-14 12:12:13 +00:00
  • 99a7fe87f9 - removed old intranet scanner (the generic scanner now completely subsumes the old one) - added information about granted access - enhanced servlet design - added submit-feedback (because it is a long-running task) orbiter 2010-12-14 01:14:15 +00:00
  • acab6801d9 added new network scanner - you can scan any ip or host in the internet for services - this replaces the intranet scanner orbiter 2010-12-13 18:19:37 +00:00
  • 586cbee2bb *) Fixed links to CrawlStart-page low012 2010-12-11 15:23:35 +00:00
  • 14e4fae8e9 fixes to ftp client orbiter 2010-12-11 09:21:37 +00:00
  • a563b05b60 enhanced crawler: - added a new queue 'noload' which can be filled with urls where it is already known that the content cannot be loaded. This may be because there is no parser available or the file is too big - the noload queue is emptied with the parser process which indexes the file names only - the 'start from file' functionality now also reads from ftp crawler orbiter 2010-12-11 00:31:57 +00:00
  • c36da90261 added a very fast ftp file list generator to site crawler: - when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once - the harvester runs concurrently and feeds into the normal crawl queue orbiter 2010-12-09 17:17:25 +00:00
  • 4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html; instead, a setting at ConfigPortal.html defines whether the topmenu is shown on these pages or whether there is no navigation at all. orbiter 2010-12-08 10:50:23 +00:00
  • fc2e41e691 added a forwarder for the default page. The forwarder forwards a browser to a different page if the root file index.html is accessed. This can be done by setting the name of the forwarder page in the field "Default index.html Page (by forwarder)" in /ConfigPortal.html. The purpose is to forward to /yacyinteractive.html for the 27C3 FTP search platform orbiter 2010-12-07 15:46:04 +00:00
  • db99db4be9 some redesign of the search-fail-response mechanism: when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesigned and enhanced. The process now also writes into the work tables into the table searchfl to prepare a re-indexing mechanism. orbiter 2010-12-06 14:34:58 +00:00
  • 4915d1781a * use local backup-file, if remote network-definition is not available * resolve single point of failure in networks, managed by central network-definitions f1ori 2010-12-06 11:09:16 +00:00
  • 18d33b5c6d fixed several search result navigation bugs fixed bad behaviours during search result collection orbiter 2010-12-05 23:54:00 +00:00
  • 49b5a206cd - better calculation of search result size - predefined search recommendations orbiter 2010-12-02 12:19:59 +00:00
  • 4e2c14efbb fixed bugs in parser and ftp client orbiter 2010-12-02 11:05:04 +00:00
  • d78e322e84 added a directory-structure reader to ftp client orbiter 2010-12-02 08:08:01 +00:00
  • f0651e5f2f added image search to yacyinteractive.html: the search result view switches from list format to image preview format when a search is restricted to png, gif or jpg documents orbiter 2010-12-01 18:48:21 +00:00
  • fffb91447a fixed crawl queue delete function orbiter 2010-12-01 14:55:40 +00:00
  • 4e771e2063 enhanced interactive search: - better table design - less enumeration of same table structure (prepared now for streaming) - added a 'remove filetype' link orbiter 2010-12-01 14:43:07 +00:00
  • b769cce433 - added a catch-all parser for all documents that cannot be parsed: they contribute only their document url to the search index - enhanced the pdf and torrent parser: better document titles - enhanced the ftp client: more time-out time - fixed bugs in json for search results - enhanced yacyinteractive.html: added a file type navigator and a download-script generator for search result files orbiter 2010-11-30 16:13:55 +00:00
  • 6692e862ae do not reset language on config change lotus 2010-11-29 20:09:53 +00:00
  • 21e84539e8 one more fix to Domains orbiter 2010-11-29 19:49:57 +00:00
  • e192d61972 fix for latest commit orbiter 2010-11-29 19:32:47 +00:00
  • 22453b13ad implemented local host address discovery as posted in http://forum.yacy-websuche.de/viewtopic.php?p=21310#p21310 orbiter 2010-11-29 19:18:44 +00:00
  • cc6499bf8d - added http://blekko.com as search heuristic (like scroogle). This was easy since they deliver their search results also as rss feed - renamed YaCy's search result modifier keywords for RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like built-in slashtags: /date - for search results ordered by date (most recent up) /near - for search results where search words appear near to each other (closest up) /language/<lang> - for a sorting by language where the wanted language gets up. Example: /language/de orbiter 2010-11-29 18:08:20 +00:00
  • a9f754c45f removed unused CR accumulation and distribution process: this was never used or extended in the last years. The resulting YBR ranking criterion is still a good idea and will be used in the future. Possible generation methods for YBR ranking are: - "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS) - "block-rank" calculated from the local link structure - a distributed "block-rank" using the xml API to the link structure from other peers orbiter 2010-11-29 11:07:42 +00:00
  • 3d945bb442 fix for ftp client: suppress bad directory listing time-out orbiter 2010-11-29 08:41:29 +00:00
  • d4a1a1850b removed warnings orbiter 2010-11-29 07:52:10 +00:00
  • 3b5830b7d4 *) Fixed typo. low012 2010-11-28 03:05:22 +00:00
  • 9b3fae9496 *) cleaning up the code a little bit *) program to interface, not implementation low012 2010-11-28 02:57:31 +00:00
  • 7bb4b001ed - view image files from cache - fixed generic header settings; affects CORS functionality orbiter 2010-11-27 09:16:16 +00:00
  • e7552bd719 *) cleaning up the code a little bit low012 2010-11-27 00:54:59 +00:00
  • a9741cc876 *) HTML fixes low012 2010-11-26 22:38:08 +00:00
  • 01ddb6d2ef *) HTML fixes low012 2010-11-26 22:19:31 +00:00
  • 321eb012fe removed two warnings and reverted one change orbiter 2010-11-26 11:15:42 +00:00
  • 737aaf6952 various small changes to ymarks apfelmaennchen 2010-11-25 21:16:47 +00:00
  • 8a50670546 some code clean up for the last post apfelmaennchen 2010-11-24 23:40:55 +00:00
  • 442497868d another step towards an auto tagging function for YMarks apfelmaennchen 2010-11-24 23:26:29 +00:00
  • dad5818b40 *) cleaning up the code a little bit low012 2010-11-24 01:31:41 +00:00
  • 9057e4d58c *) hopefully fixed bug described in http://www.yacy-forum.org/viewtopic.php?f=12&t=385 low012 2010-11-23 23:10:45 +00:00
  • 741a87a3e9 * make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers) f1ori 2010-11-22 19:12:51 +00:00
  • fd74bc388c * fix small bug in sessionid-removal * add testcase for sessionid-removal f1ori 2010-11-21 23:55:40 +00:00
  • dca9e16f51 * don't index pages, which redirect, twice * therefore auto-redirection of HTTPClient for crawling is disabled and the old code is reactivated f1ori 2010-11-21 22:46:12 +00:00
  • eb79b952ef *) cleaner code low012 2010-11-21 03:39:53 +00:00
  • 38fdf43587 *) renamed classes according to standard Java coding conventions *) String.isEmpty() was introduced in Java 1.6, but we still use Java 1.5 low012 2010-11-21 01:29:32 +00:00
  • 8281d12305 *) Ooops! low012 2010-11-21 00:41:45 +00:00
  • 025e3f4790 *) renamed classes according to standard Java coding conventions *) removed unused code low012 2010-11-21 00:39:21 +00:00
  • 3b9aa0504e *) removed unused code low012 2010-11-21 00:28:32 +00:00
  • db3db0fdb9 *) trying to make this class less confusing (probably failing) low012 2010-11-21 00:13:08 +00:00
  • 54e63b556e intermediate step for a YMark auto-tagging function based on word frequencies. apfelmaennchen 2010-11-17 15:17:29 +00:00
  • 403ee9c014 added a drill-down for metadata and word count to /api/ymarks/test_treeview.html apfelmaennchen 2010-11-16 00:48:38 +00:00
  • a025b1da89 * fix bug when browsing local filesystem (e. g. repository) with yacy f1ori 2010-11-15 14:47:16 +00:00
  • 28a290336d de.lng: Started translation of YMark feature here http://localhost:8080/Table_YMark_p.html Table_YMark_p.html: removed some labels because the columns are not filled yet and added ids for other label references and added full enabled="enabled" tags because SHORTTAG is not specified in XHTML 1.0 Strict mikeworks 2010-11-15 01:48:54 +00:00
  • 25426c6548 change language immediately lotus 2010-11-13 14:27:30 +00:00
  • 11ae5b108e enabled rebuildIndex for /Table_YMark_p.html (rebuilds the tags and folders index) apfelmaennchen 2010-11-13 13:02:56 +00:00
  • f147a022f8 enabled YMark Import for /Table_YMark_p.html apfelmaennchen 2010-11-13 10:32:37 +00:00
  • 2726606fc8 slightly enhanced interface for /Table_YMark_p.html apfelmaennchen 2010-11-12 21:13:27 +00:00
  • b87bf88ac8 using less memory on merging and rewriting blobs sixcooler 2010-11-12 16:02:20 +00:00
  • 94a9be18a4 added a ymark table administration: /Table_YMark_p.html apfelmaennchen 2010-11-10 22:53:27 +00:00
  • 25339f93c7 more updates to ymarks - working xbel import/export - exported xbel includes yacy specific metadata but still validates against PUBLIC DTD apfelmaennchen 2010-11-09 17:01:31 +00:00
  • d62e449a11 * fix FilterEngine, forgot comparison operator f1ori 2010-11-08 09:37:44 +00:00
  • cdd65aca71 update to ymarks - get_xbel.xml is almost working - started ymark api documentation info.html apfelmaennchen 2010-11-07 20:03:01 +00:00
  • 808edffaf6 ymarks - some refactoring - working xbel and html import (/api/ymarks/test_import.html) - working treeview (/api/ymarks/test_treeview.html) apfelmaennchen 2010-11-06 20:26:13 +00:00
  • 2c539b514a * add domaincheck (local/global/domainlist) to urlcleaner f1ori 2010-11-06 16:50:33 +00:00
  • 442bebca2b * %0 does not belong to the IPv6-Address -> entry does not work on some systems f1ori 2010-11-06 15:09:28 +00:00
  • 9fc940aa35 release 0.99 0.99 orbiter 2010-11-05 13:20:51 +00:00
  • 117fc86b3d fix for http://forum.yacy-websuche.de/viewtopic.php?p=21199#p21199 orbiter 2010-11-05 13:19:37 +00:00
  • 441fbc26e2 security patch for WeakPriorityBlockingQueue (produced a deadlock) orbiter 2010-11-05 09:38:31 +00:00
  • 5dcb838293 - removed thread overhead when calling dns services - fixed localsearch (changed it by accident) orbiter 2010-11-05 00:29:32 +00:00
  • 4c50d3428e smaller file size for array stacks to support smaller deletion sizes orbiter 2010-11-04 13:29:19 +00:00
  • 09badc697b - low-memory patch for crawler orbiter 2010-11-04 13:26:27 +00:00
  • 6ac4f8142e * allow proxy requests from localhost via ipv6 (%0 does not belong to the address) f1ori 2010-11-04 10:52:54 +00:00
  • 274d5b3a87 de.lng: Added missing translation string in SVN 7301 ConfigHTCache_p.html: Added missing id for label pointing to actualCacheSize in span tag for XHTML 1.0 Strictness mikeworks 2010-11-04 07:30:00 +00:00
  • 9239ac1e56 de.lng: Added translation for new page http://localhost:8080/ConfigHTCache_p.html and old one http://localhost:8080/IndexControlRWIs_p.html ConfigHTCache_p.html: Removed additional </form> and changed title text mikeworks 2010-11-04 00:53:25 +00:00
  • becc463d8a enhanced did-you-mean orbiter 2010-11-04 00:25:19 +00:00
  • 43586a2ace an update to ymarks (please test if you wish): - import HTML (e.g. FF export) via /api/ymarks/import.html - view your import via /api/ymarks/test.html - get a xml list via /api/ymarks/get_ymark_list.xml?tags=&folders= - delete bookmark tables via standard interface /Tables_p.html it is still very experimental!! apfelmaennchen 2010-11-03 22:52:03 +00:00
  • 93c535d111 fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113 fixed a concurrent modification exception during search and a time-out problem orbiter 2010-11-03 20:58:50 +00:00
  • 04932dc268 added rdf data structure for rss feeds orbiter 2010-11-03 20:06:23 +00:00
  • 84f2953cd8 fix for rss loader / rss type recognition orbiter 2010-11-03 19:58:01 +00:00
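
Note on commit cc6499bf8d: the slashtag modifiers it lists (/date, /near, /language/<lang>) could be separated from the search words roughly as sketched below. This is only an illustration of the syntax given in the commit message, not YaCy's actual code. The function name parse_query, the returned dictionary layout, and the assumption that a language code is two lowercase letters are all hypothetical.

```python
import re

# Hypothetical pattern: a slashtag must stand alone as a whitespace-separated
# token. The two-letter language code is an assumption based on "/language/de".
SLASHTAG = re.compile(r"(?:^|\s)/(date|near|language/([a-z]{2}))(?=\s|$)")


def parse_query(query: str) -> dict:
    """Split a query into plain search words and slashtag modifiers."""
    modifiers = {"date": False, "near": False, "language": None}

    def record(match: re.Match) -> str:
        tag = match.group(1)
        if tag.startswith("language/"):
            modifiers["language"] = match.group(2)
        else:
            modifiers[tag] = True
        return " "  # remove the slashtag from the query text

    words = SLASHTAG.sub(record, query).split()
    return {"words": words, **modifiers}
```

For example, parse_query("yacy p2p /date /language/de") yields the words "yacy" and "p2p" with the date modifier set and the language set to "de". A slash inside a word, as in "ftp/date", is left untouched because the pattern requires whitespace or the start of the query before the slash.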