Commit Graph

  • a31b9097a4 preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls orbiter 2007-10-29 01:43:20 +00:00
  • a718858e8b seed.CCOUNT is interpreted as a double value not int fuchsi 2007-10-24 23:25:48 +00:00
  • d85821a88c fix for SVN 4178 orbiter 2007-10-24 22:00:34 +00:00
  • 0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) has been removed, now done with simple put(...) (see above). - putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;') which was done with put(...) before and so was called too often, because it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values. fuchsi 2007-10-24 21:38:19 +00:00
  • f8318436a1 fix for last commit orbiter 2007-10-22 16:32:39 +00:00
  • 7d57b80598 distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158 orbiter 2007-10-22 15:26:47 +00:00
  • 9a7b093eed tried to avoid endless loop, see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=467&hilit= orbiter 2007-10-22 14:35:45 +00:00
  • 9d539ec621 added option to display the network name as page greeting instead of the page greeting string orbiter 2007-10-22 08:01:44 +00:00
  • b856e377a9 some additions and a small bugfix to SVN 4158 orbiter 2007-10-21 23:26:22 +00:00
  • 501a7aae90 Small correction hermens 2007-10-20 12:02:31 +00:00
  • caff520988 Removed unnecessary and unused code. hermens 2007-10-20 11:56:15 +00:00
  • d732840f8a Avoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing. hermens 2007-10-19 23:36:40 +00:00
  • 35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources. fuchsi 2007-10-19 21:47:07 +00:00
  • a9aef8e5e0 remove duplicate entries fuchsi 2007-10-19 16:24:37 +00:00
  • 38bbd4a4b3 no code changes. just touched yacyClient.java to trigger a rebuild of the file in an uncleaned tree. NOTE: run "ant clean" before building SVN 4166/4167 in a tree that includes class files from a previous build to make sure, that every class file is rebuilt! fuchsi 2007-10-19 15:31:38 +00:00
  • f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers. - some minor code cleanups (mostly unnecessary casts, null checks) fuchsi 2007-10-19 04:13:46 +00:00
  • ca83f5a8d9 Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3). Add the packages to the eclipse .classpath. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=453 fuchsi 2007-10-18 19:53:52 +00:00
  • 3352474dd8 Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers. fuchsi 2007-10-16 13:29:11 +00:00
  • 06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code). NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update. fuchsi 2007-10-16 02:12:31 +00:00
  • e77aec8c9d fix handling of encrypted PDF-Documents (with default user password "") - update PDFBox package to current version 0.7.3 - use new security model in PDFBox to "guess" whether we can decrypt a document or not NOTE: When upgrading to this version make sure the old PDFBox-0.7.2.jar is removed from libx/ fuchsi 2007-10-15 13:18:38 +00:00
  • b54fcd732b *) fixed exceptions that occurred when non-integer values were entered where integers were expected low012 2007-10-12 19:09:20 +00:00
  • 52c68875bd *) removed (hopefully only) surplus double encodings (http://forum.yacy-websuche.de/viewtopic.php?t=368) low012 2007-10-12 15:27:23 +00:00
  • b5f7df8d0a Speed up remove operations in rowCollections. - Array element shifting during remove is only done when it is necessary to keep the order of a row collection. - This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation) fuchsi 2007-10-11 17:17:08 +00:00
  • 5c91359297 accidentally committed personal testing values as defaults in last commit. fuchsi 2007-10-10 15:20:22 +00:00
  • e255888095 Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now. fuchsi 2007-10-10 15:19:07 +00:00
  • ce0bb1dc8a Increase defaults for the DHT Receive Limits to prevent "busy" states. see fuchsi 2007-10-10 10:07:16 +00:00
  • fdb0b861f8 *) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer *) added network QPH *) banner is cached for 1 second to avoid DOS *) still no logo low012 2007-10-09 21:47:37 +00:00
  • 3b8540198b finally fix the init script. tested this time. fuchsi 2007-10-09 14:53:16 +00:00
  • 508de558f7 sbStackCrawlThread is null during first cleanProfiles() run at startup. fuchsi 2007-10-08 15:56:40 +00:00
  • 70614385ef Attempt to fix the "lost profile handle" bug. It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner. fuchsi 2007-10-08 15:11:26 +00:00
  • 905e7e60f5 change dir to the yacy base directory before starting it making sure relative path specifications work properly. see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=405 fuchsi 2007-10-08 13:40:49 +00:00
  • 507ecd8afa *) added banner that can be displayed like this: http://localhost:8080/Banner.png possible arguments: textcolor, bgcolor, bordercolor example: http://localhost:8000/Banner.png?textcolor=ffffff&bgcolor=121212&bordercolor=ffffff take care: YaCy uses CMY color model! *) there are still some known bugs, but I can't continue coding right now low012 2007-10-07 21:59:36 +00:00
  • 70884da0eb Add external package JARs to classpath. (copied from startYACY.sh) fuchsi 2007-10-07 19:28:21 +00:00
  • ebfd1e0b42 remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required. fuchsi 2007-10-05 18:52:15 +00:00
  • 4ce25b3661 - documentation update - start of new development cycle. in case you don't know: commits until 0.553 will not automatically be used by the auto-update function orbiter 2007-10-04 20:43:52 +00:00
  • 1a0f89d7e8 release 0.55 orbiter 2007-10-04 20:12:08 +00:00
  • ed20531e68 don't encode in channel element as well fuchsi 2007-10-04 12:12:27 +00:00
  • 9b0948cb4c gnarf. mixed up the positions. finally fixed... fuchsi 2007-10-04 10:58:01 +00:00
  • c0f5fc51ef bugfix for last commit fuchsi 2007-10-04 10:47:48 +00:00
  • 33fb2f756d added emergency fail case in remote crawls: in extreme situations this will cause that no remote crawls are sent out any more. This is bad, but it protects against the case where failing remote crawls fill up the local queue too much, which is even worse orbiter 2007-10-04 10:40:30 +00:00
  • c5a8585ac6 fix more encoding problems in yacysearch.rss. - URL encoding for search terms where required - removed "ugly" CDATA escaping - UTF-8 encoding for the XML - no HTML style escaping for XML/RSS element values Note: some unicode characters might still be encoded in a wrong way. fuchsi 2007-10-04 09:21:03 +00:00
  • 6b00fe0c4e fix ArrayIndexOutOfBoundsException fuchsi 2007-10-04 08:50:33 +00:00
  • e2f3268c13 *) removed double encoding (http://forum.yacy-websuche.de/viewtopic.php?t=368) low012 2007-10-03 20:13:32 +00:00
  • 3e60ae93b9 modified remote search snippet fetch behavior: do not fetch snippets for more than 300 milliseconds, even if the snippets can be found locally without online fetch orbiter 2007-10-03 16:42:11 +00:00
  • 97f1ca52bd fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=390 orbiter 2007-10-03 15:45:12 +00:00
  • 143fa40d77 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=394&p=2382#p2382 orbiter 2007-10-03 15:34:16 +00:00
  • 711641f167 extended client connection clean-up: there are now two time-outs, one for the complete connection time, and one for the idle time. Connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed. If the complete number of connections exceeds 64, the connections beyond 64 that have the most idle time are also closed orbiter 2007-10-03 15:06:12 +00:00
  • b19bb6e5b1 - reverted svn 4132; this did not solve the problem and removed the emergency method which caused production failure for sure within some hours - removed and added some debugging lines orbiter 2007-10-03 14:34:05 +00:00
  • 1eba408d2f Make sure that sockets which couldn't be opened aren't handled as active connections, in which case they wouldn't be closed. fuchsi 2007-10-03 12:18:26 +00:00
  • 03c5b4ad68 more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org - RFC-822 date time had to include the time instead of date only - <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1 - <link> elements are mandatory for <channel> and <item> fuchsi 2007-10-03 04:00:52 +00:00
  • e3c6236eef fixed the last opensearch/rss issue. The GUID-Tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink) in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but it _must_ be specified as such with an additional isPermaLink="false" attribute in this case. see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt fuchsi 2007-10-03 00:46:30 +00:00
  • d69d386f7d added additional forced client connection closing if a specific number of simultaneous connections is reached; the limit is currently set to 64 connections orbiter 2007-10-03 00:21:53 +00:00
  • dea7bee049 - increased minimum time before an active connection is interrupted from 1 minute to 10 minutes - added sorting by connection time in client connection table of connectionTimeComparatorInstance orbiter 2007-10-02 23:56:04 +00:00
  • f8e69ce4dc removed progress bar in Network list orbiter 2007-10-02 22:50:47 +00:00
  • c1440d2241 fixed problem with redirection: redirected URLs had not been tested with the double-check see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=348 orbiter 2007-10-02 22:40:53 +00:00
  • b183bf6f42 - fixed opensearch bugs - added 'full domain' button to expert crawl start - removed non-working 'only one domain' button, the regex allowed crawling of other domains orbiter 2007-10-02 21:43:05 +00:00
  • 7404f2c35c Fix some of the issues with the RSS search interface, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=392 fuchsi 2007-10-02 21:28:29 +00:00
  • 98abe0804d another enhancement to crawl starts with link files orbiter 2007-10-02 20:30:42 +00:00
  • ed2ca8fc4c Add search type to top word suggestion searches. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=391 fuchsi 2007-10-02 19:49:50 +00:00
  • aef1ab9526 #updated German translation daburna 2007-10-02 18:42:08 +00:00
  • 1b42152a76 fixed and enhanced some details in crawl start with file orbiter 2007-10-02 00:49:38 +00:00
  • 16e101f135 - fix for bad xml tag in Network.xml - switched on automatic deletion of passive peers in pro versions orbiter 2007-10-01 22:45:44 +00:00
  • 4465db7399 removed debug information from network graphic orbiter 2007-10-01 12:32:10 +00:00
  • 01e0669264 re-designed some parts of DHT position calculation (effect is the same as before) and replaced old first hash computation by new method that tries to find a gap in the current dht. To do this, it is necessary that the network bootstrapping is done before the own hash is computed; this made further redesigns in peer initialization order necessary orbiter 2007-10-01 12:30:23 +00:00
  • d547c3b4bd Avoid NullPointerException in yacySeedDB.lookupByIP hermens 2007-09-29 11:18:09 +00:00
  • 5b1a937ed8 fix for crawl stack database format change, introduced in SVN 4113 orbiter 2007-09-28 08:17:08 +00:00
  • af25c98306 enhanced local search performance in case of a remote search: there is no waiting until the local search terminates to show the result page. the local search appears like all other results from remote peers using a separate thread. This has an especially strong effect if the local index for a specific word is large. orbiter 2007-09-28 01:36:22 +00:00
  • 842308ea97 - redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, shows now also terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed here and there problems with indexing queues - enhanced indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown orbiter 2007-09-28 01:21:31 +00:00
  • 341f7cb327 steps to enhance remote search performance: - added a file size limitation, that disallows parsing of large documents during (offline-) remote search - added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation orbiter 2007-09-26 10:11:50 +00:00
  • 2f1ff048ba some fixes to socket connection time-out orbiter 2007-09-25 23:45:05 +00:00
  • 3c74014004 automatic deletion of dead client connections orbiter 2007-09-25 22:46:11 +00:00
  • 49f1c58d64 restoring alternative update location orbiter 2007-09-25 21:43:16 +00:00
  • 11b4f80bde - fixed non-closing client connections - added client connection tracker in connections servlet orbiter 2007-09-25 21:36:08 +00:00
  • d352853f2d fix for non-closing client sessions orbiter 2007-09-24 08:42:07 +00:00
  • 1488769e1f cleanup of unmaintained and outdated performance methods: removed object pools in httpc. Object pooling is not recommended, if the creation of the object is not time-intensive. Object pools are only useful, if there is much computation necessary to create some basic data that is stored in the object pool and can be re-used. This does not apply to object pools in YaCy. Object pooling of client sessions would make sense if they would allow re-use of living connections to other yacy clients. But every connection is closed after usage of an object in the client pool, therefore the YaCy server client objects are not such that hold hardware/network-allocated entities. See: http://www.javaperformancetuning.com/news/qotm033.shtml http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling http://docs.sun.com/source/816-7159-10/pt_chap5.html http://www.microjava.com/articles/techtalk/recylcle2 orbiter 2007-09-23 20:49:52 +00:00
  • 3cb9cdc9be try to fix connection problem, possible cause for wrong junior status and non-passive passive peers: the YaCy client treats disconnections during data transmissions as error and discards all data transmitted so far. This did not happen until I removed a delay time at the end of the daemon session which prevented this case. To fix this problem, disconnections during transmissions are not treated as error now, which means that end-of-transmissions with sudden disconnections are not a cause for peer disconnections any more. To be nice to non-updated peers, the sleep time at the end of server sessions is also re-enabled. orbiter 2007-09-23 17:31:29 +00:00
  • 00dab81077 simpler solution to last commit + works with and without navigation column on the left fuchsi 2007-09-20 01:52:10 +00:00
  • eb16a99e94 avoid floating of long page titles around the favicon in search results fuchsi 2007-09-19 22:08:56 +00:00
  • 9524b9c16a second try of rev 4100 :). Tested in Iceweasel/Firefox 2.0.6, Konqueror 3.5.7, Opera 9.23 (all linux) and IE6-SP1 (wine) fuchsi 2007-09-17 19:39:15 +00:00
  • 6b8faaadb6 undo last commit for further evaluation, a progressbar element is used on other pages as well... fuchsi 2007-09-17 03:36:35 +00:00
  • 1880bba420 A few changes to the progress bar and search result statistics layout influenced by the discussion in <http://forum.yacy-websuche.de/viewtopic.php?f=5&t=268> with the idea of saving vertical space. Please check in every available browser and comment whether it's better than before. ;) fuchsi 2007-09-16 14:30:53 +00:00
  • 404ebf1474 # update of de.lng - NO unused strings anymore!!! daburna 2007-09-16 10:17:22 +00:00
  • 041922652a # update of de.lng - removed or updated unused strings - updated some files daburna 2007-09-15 13:10:56 +00:00
  • ba59de773f again and again junior - test borg-0300 2007-09-13 17:05:53 +00:00
  • 9fa75ef4d1 Limit the percentage of the progress indicator to reasonable values hermens 2007-09-13 16:37:23 +00:00
  • 4275727d69 fix for peer ping problem (implemented a 3-time re-ping); cause for 'Connection reset' still unknown orbiter 2007-09-12 00:42:53 +00:00
  • e78098be9b According to HTML-Specs "name" and "id" attributes share the same namespace. So we can't have one element with name="offset" and another one with id="offset". Additionally IE6's getElementById() returns elements with matching names as well and Opera is mimicking this behaviour. fuchsi 2007-09-11 16:21:14 +00:00
  • 07d1e98909 fixed round-robin method of peer-ping order (the successfully pinged peer was not updated to current last-seed date) orbiter 2007-09-11 16:07:35 +00:00
  • a1dcd065ad some tweaks to the search results layout fuchsi 2007-09-11 15:56:14 +00:00
  • 76e4c2d69e fix for peer-ping in case that remote peer does not respond with valid values orbiter 2007-09-11 15:27:01 +00:00
  • e192f99134 fix small bug introduced in r4089 that appeared when we tried to remove "gzip" encoding from Accept-Encodings header closes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=336 fuchsi 2007-09-10 21:46:40 +00:00
  • ae4b9308ef Fix problems with some web servers which couldn't handle the way yacy was sending requests. Thx to celle for the patch. http://forum.yacy-websuche.de/viewtopic.php?f=5&t=320 fuchsi 2007-09-10 09:15:28 +00:00
  • 6601e37512 clear caches after changing blacklists, closes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=241&p=1964#p1964 fuchsi 2007-09-10 08:15:25 +00:00
  • 5b0c1449e1 various fixes and cleanups for blacklist handling: 1. avoid adding duplicate file name entries in config properties for lists, 2. correctly merge all path masks from all list files for the same host masks, 3. rewrite helper methods using standard java methods for Collection transformations, 4. merged various methods with identical functionality for different Collection implementations into one, 5. minor refactoring to improve code readability. fuchsi 2007-09-10 06:20:27 +00:00
  • e27aeb7fdc patch for bad crawl filter at crawl start orbiter 2007-09-09 19:21:41 +00:00
  • 841cf71022 fix for NPE in DHT transfer selection, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=327 orbiter 2007-09-09 19:08:13 +00:00
  • 3047ae2cd9 fixed some more old links to new homepage location orbiter 2007-09-09 18:43:39 +00:00
  • dbd1eeead5 fix for missing object miss-cache flush value: the value is always zero because there is no miss-cache flush see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=288 orbiter 2007-09-09 18:35:05 +00:00
  • f2a3434407 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=238&p=1341#p1341 orbiter 2007-09-09 17:31:29 +00:00
  • 229ca2ba48 fixed/re-implemented rss-version of search result page orbiter 2007-09-09 12:30:18 +00:00
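
Note on commit 711641f167: the client connection clean-up policy described there (close connections idle for more than 2 minutes, close connections alive for more than one hour, and when more than 64 remain, close the most idle ones beyond 64) can be illustrated with a minimal sketch. This is not the YaCy implementation; the class ConnectionCleanupSketch, the Conn record type, and the method selectForClosing are hypothetical names chosen for illustration, and only the three thresholds are taken from the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the clean-up policy; not taken from the YaCy source.
class ConnectionCleanupSketch {
    // Thresholds as stated in the commit message.
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;   // 2 minutes idle
    static final long MAX_AGE_MS = 60 * 60 * 1000L;   // 1 hour alive
    static final int MAX_CONNECTIONS = 64;            // connection cap

    // Assumed minimal view of a client connection (timestamps in milliseconds).
    static final class Conn {
        final String id;
        final long openedAt;
        final long lastActiveAt;

        Conn(String id, long openedAt, long lastActiveAt) {
            this.id = id;
            this.openedAt = openedAt;
            this.lastActiveAt = lastActiveAt;
        }
    }

    // Returns the connections that the policy would close at time 'now'.
    static List<Conn> selectForClosing(List<Conn> conns, long now) {
        List<Conn> toClose = new ArrayList<>();
        List<Conn> survivors = new ArrayList<>();
        for (Conn c : conns) {
            boolean idleTooLong = now - c.lastActiveAt > MAX_IDLE_MS;
            boolean aliveTooLong = now - c.openedAt > MAX_AGE_MS;
            if (idleTooLong || aliveTooLong) {
                toClose.add(c);
            } else {
                survivors.add(c);
            }
        }
        // Most idle first: the smallest lastActiveAt has been idle the longest.
        survivors.sort(Comparator.comparingLong((Conn c) -> c.lastActiveAt));
        int excess = survivors.size() - MAX_CONNECTIONS;
        for (int i = 0; i < excess; i++) {
            toClose.add(survivors.get(i));
        }
        return toClose;
    }
}
```

Applying both time-outs before the cap means the cap only has to evict connections that are otherwise healthy, which is one plausible reading of the ordering in the commit message.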