Commit Graph

  • 875096552f fix for NPE in case that remote search results are empty orbiter 2007-11-12 15:54:50 +00:00
  • 64b3b79e44 - fix for termination problem with uniq() - addition to seed dna interpretation orbiter 2007-11-12 14:39:30 +00:00
  • 0abf33ed03 - tried to remove deadlock - enhanced search time in kelondroRowSets - enhanced uniq() - reverse enumeration takes less time in case of mass removal of doubles orbiter 2007-11-12 01:14:51 +00:00
  • a4010f7dc8 *) fixed bug where dots were added after numbers < 1000: "123" was transformed to "123." which is undesirable low012 2007-11-11 21:42:50 +00:00
  • da73cde86e #German language file - some cosmetics daburna 2007-11-11 20:32:51 +00:00
  • 2421127612 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=513&hilit= orbiter 2007-11-11 01:32:54 +00:00
  • 2e91b724ad fix for yacysearch/rss-feed bug orbiter 2007-11-11 00:42:31 +00:00
  • d0d2771883 disabled multiprocessing of rowCollection.sort for testing purposes orbiter 2007-11-11 00:28:22 +00:00
  • edc4da5317 fix for division by zero in test routine orbiter 2007-11-10 08:57:00 +00:00
  • df38aaf7bd update to RowCollection sort speed-enhancements: - better handling of small collections (less overhead) - usage of pre-sorted limits - different re-sort limit - more testing procedures orbiter 2007-11-09 15:34:11 +00:00
  • 0eb60cfe6f better handling of seed properties orbiter 2007-11-09 09:40:42 +00:00
  • ecba35de72 enhanced computing speed of the kelondro core function: sorting. The enhancement was made by using better organized data structures and multi-threading during the sort. A sort can be divided into two separate processes once the first partition step of the quicksort algorithm is done. Creating and starting a separate thread takes only 10 milliseconds, so using a separate thread only makes sense if the data amount is large. Statistics about the speed-up: without enhancement: 250 milliseconds for 100000 entries; with data structure enhancement: 170 milliseconds for 100000 entries; with additional second thread (if a second processor is present): 130 milliseconds. orbiter 2007-11-09 00:51:38 +00:00
  • 6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster than before. orbiter 2007-11-07 22:38:09 +00:00
  • 425e4ead66 Allow absolute paths in configuration settings. - Before, absolute paths were expanded incorrectly, e.g. fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be put in a location outside of yacy's root dir without having to use symlinks (probably good for third-party distribution packaging). - abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the application's root path. fuchsi 2007-11-04 10:36:25 +00:00
  • e8d32d9f62 other loglevel borg-0300 2007-11-02 16:06:54 +00:00
  • a5d28785b1 less OOM (works for me) borg-0300 2007-11-02 14:55:46 +00:00
  • 794d296129 project link update orbiter 2007-11-01 20:40:15 +00:00
  • ccbfb15b6b enhancement to crawl stacker enqueue order orbiter 2007-11-01 00:57:32 +00:00
  • 93905e5c7b fix for show-more bug orbiter 2007-11-01 00:55:39 +00:00
  • 5c5344ae97 Beautify log hermens 2007-10-31 16:29:07 +00:00
  • 35cf196204 transferRanking(): Do not flush more ranking files than requested by caller. hermens 2007-10-31 15:55:52 +00:00
  • d0aa8cf25d Only update handshaked peer's last seed date if it has not been updated recently. Until now the newer data was overwritten by old data from before the handshake. hermens 2007-10-31 15:47:48 +00:00
  • 8f9d65da67 Small corrections to dhtFlushControl() - Test wCacheMaxChunk against maxURLinCache(), not getMaxWordCount(). This triggered a flush every time dhtFlushControl() was called. - If triggered, flush at least 1 entry. hermens 2007-10-31 14:21:58 +00:00
  • 55c87b3b12 changed behavior of crawl stacker - final flush only when tabletype = RAM - prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100 - maximum number of entries in stacker is configurable in yacy.init (stacker.slots) orbiter 2007-10-31 11:32:40 +00:00
  • 18144043e6 Correct UTC Offset at beginning/end of daylight savings time hermens 2007-10-30 19:20:02 +00:00
  • 4fefa53135 removed parser object pool, see also svn 4106 orbiter 2007-10-29 12:14:18 +00:00
  • 35b1bd66cd The (old) yacy web page does not need to be part of the yacy distribution. The old yacy home page will be replaced by a new one. orbiter 2007-10-29 11:27:50 +00:00
  • 87b297b4d2 update of link to english forum orbiter 2007-10-29 10:50:27 +00:00
  • a31b9097a4 preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to the crawl queue (away from the stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check robots.txt to prevent it from being misused as a crawl agent for unwanted file retrieval - implement new index files that control the double-check of remotely crawled urls orbiter 2007-10-29 01:43:20 +00:00
  • a718858e8b seed.CCOUNT is interpreted as a double value not int fuchsi 2007-10-24 23:25:48 +00:00
  • d85821a88c fix for SVN 4178 orbiter 2007-10-24 22:00:34 +00:00
  • 0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) methods have been removed; this is now done with a simple put(...) (see above). - putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'), which was done with put(...) before and so was called too often, because it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: use put(...) for almost everything, and use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values. fuchsi 2007-10-24 21:38:19 +00:00
  • f8318436a1 fix for last commit orbiter 2007-10-22 16:32:39 +00:00
  • 7d57b80598 distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158 orbiter 2007-10-22 15:26:47 +00:00
  • 9a7b093eed tried to avoid endless loop, see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=467&hilit= orbiter 2007-10-22 14:35:45 +00:00
  • 9d539ec621 added option to display the network name as page greeting instead of the page greeting string orbiter 2007-10-22 08:01:44 +00:00
  • b856e377a9 some additions and a small bugfix to SVN 4158 orbiter 2007-10-21 23:26:22 +00:00
  • 501a7aae90 Small correction hermens 2007-10-20 12:02:31 +00:00
  • caff520988 Removed unnecessary and unused code. hermens 2007-10-20 11:56:15 +00:00
  • d732840f8a Avoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing. hermens 2007-10-19 23:36:40 +00:00
  • 35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources. fuchsi 2007-10-19 21:47:07 +00:00
  • a9aef8e5e0 remove duplicate entries fuchsi 2007-10-19 16:24:37 +00:00
  • 38bbd4a4b3 no code changes. just touched yacyClient.java to trigger a rebuild of the file in an uncleaned tree. NOTE: run "ant clean" before building SVN 4166/4167 in a tree that includes class files from a previous build to make sure that every class file is rebuilt! fuchsi 2007-10-19 15:31:38 +00:00
  • f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers. - some minor code cleanups (mostly unnecessary casts, null checks) fuchsi 2007-10-19 04:13:46 +00:00
  • ca83f5a8d9 Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3). Add the packages to the eclipse .classpath. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=453 fuchsi 2007-10-18 19:53:52 +00:00
  • 3352474dd8 Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers. fuchsi 2007-10-16 13:29:11 +00:00
  • 06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code). NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update. fuchsi 2007-10-16 02:12:31 +00:00
  • e77aec8c9d fix handling of encrypted PDF-Documents (with default user password "") - update PDFBox package to current version 0.7.3 - use new security model in PDFBox to "guess" whether we can decrypt a document or not NOTE: When upgrading to this version make sure the old PDFBox-0.7.2.jar is removed from libx/ fuchsi 2007-10-15 13:18:38 +00:00
  • b54fcd732b *) fixed exceptions that occurred when non-integer values were entered where integers were expected low012 2007-10-12 19:09:20 +00:00
  • 52c68875bd *) removed (hopefully only) surplus double encodings (http://forum.yacy-websuche.de/viewtopic.php?t=368) low012 2007-10-12 15:27:23 +00:00
  • b5f7df8d0a Speed up remove operations in rowCollections. - Array element shifting during remove is only done when it is necessary to keep the order of a row collection. - This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation) fuchsi 2007-10-11 17:17:08 +00:00
  • 5c91359297 accidentally committed personal testing values as defaults in last commit. fuchsi 2007-10-10 15:20:22 +00:00
  • e255888095 Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now. fuchsi 2007-10-10 15:19:07 +00:00
  • ce0bb1dc8a Increase defaults for the DHT Receive Limits to prevent "busy" states. see fuchsi 2007-10-10 10:07:16 +00:00
  • fdb0b861f8 *) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer *) added network QPH *) banner is cached for 1 second to avoid DOS *) still no logo low012 2007-10-09 21:47:37 +00:00
  • 3b8540198b finally fix the init script. tested this time. fuchsi 2007-10-09 14:53:16 +00:00
  • 508de558f7 sbStackCrawlThread is null during first cleanProfiles() run at startup. fuchsi 2007-10-08 15:56:40 +00:00
  • 70614385ef Attempt to fix the "lost profile handle" bug. It seems improbable, but it might happen that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner. fuchsi 2007-10-08 15:11:26 +00:00
  • 905e7e60f5 change dir to the yacy base directory before starting it, making sure relative path specifications work properly. see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=405 fuchsi 2007-10-08 13:40:49 +00:00
  • 507ecd8afa *) added banner that can be displayed like this: http://localhost:8080/Banner.png possible arguments: textcolor, bgcolor, bordercolor example: http://localhost:8000/Banner.png?textcolor=ffffff&bgcolor=121212&bordercolor=ffffff take care: YaCy uses CMY color model! *) there are still some known bugs, but I can't continue coding right now low012 2007-10-07 21:59:36 +00:00
  • 70884da0eb Add external package JARs to classpath. (copied from startYACY.sh) fuchsi 2007-10-07 19:28:21 +00:00
  • ebfd1e0b42 remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required. fuchsi 2007-10-05 18:52:15 +00:00
  • 4ce25b3661 - documentation update - start of new development cycle. In case you don't know: commits until 0.553 will not automatically be used by the auto-update function orbiter 2007-10-04 20:43:52 +00:00
  • 1a0f89d7e8 release 0.55 orbiter 2007-10-04 20:12:08 +00:00
  • ed20531e68 don't encode in channel element as well fuchsi 2007-10-04 12:12:27 +00:00
  • 9b0948cb4c gnarf. mixed up the positions. finally fixed... fuchsi 2007-10-04 10:58:01 +00:00
  • c0f5fc51ef bugfix for last commit fuchsi 2007-10-04 10:47:48 +00:00
  • 33fb2f756d added emergency fail case in remote crawls: in extreme situations this will cause no more remote crawls to be sent out. This is bad, but it protects against the case where failing remote crawls fill up the local queue too much, which is even worse orbiter 2007-10-04 10:40:30 +00:00
  • c5a8585ac6 fix more encoding problems in yacysearch.rss. - URL encoding for search terms where required - removed "ugly" CDATA escaping - UTF-8 encoding for the XML - no HTML style escaping for XML/RSS element values Note: some unicode characters might still be encoded in a wrong way. fuchsi 2007-10-04 09:21:03 +00:00
  • 6b00fe0c4e fix ArrayIndexOutOfBoundsException fuchsi 2007-10-04 08:50:33 +00:00
  • e2f3268c13 *) removed double encoding (http://forum.yacy-websuche.de/viewtopic.php?t=368) low012 2007-10-03 20:13:32 +00:00
  • 3e60ae93b9 modified remote search snippet fetch behavior: do not fetch snippets for more than 300 milliseconds, even if the snippets can be found locally without online fetch orbiter 2007-10-03 16:42:11 +00:00
  • 97f1ca52bd fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=390 orbiter 2007-10-03 15:45:12 +00:00
  • 143fa40d77 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=394&p=2382#p2382 orbiter 2007-10-03 15:34:16 +00:00
  • 711641f167 extended client connection clean-up: there are now two time-outs, one for the complete connection time and one for idle time. Connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed. If the total number of connections exceeds 64, the connections beyond 64 with the most idle time are also closed orbiter 2007-10-03 15:06:12 +00:00
  • b19bb6e5b1 - reverted svn 4132; this did not solve the problem and removed the emergency method, which for sure caused production failure within some hours - removed and added some debugging lines orbiter 2007-10-03 14:34:05 +00:00
  • 1eba408d2f Make sure that sockets which couldn't be opened aren't handled as active connections, in which case they wouldn't be closed. fuchsi 2007-10-03 12:18:26 +00:00
  • 03c5b4ad68 more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org - RFC-822 date time had to include the time instead of date only - <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1 - <link> elements are mandatory for <channel> and <item> fuchsi 2007-10-03 04:00:52 +00:00
  • e3c6236eef fixed the last opensearch/rss issue. The GUID tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink), in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but in this case it _must_ be marked as such with an additional isPermaLink="false" attribute. see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt fuchsi 2007-10-03 00:46:30 +00:00
  • d69d386f7d added additional forced client connection closing if a specific number of simultaneous connections is reached; the limit is currently set to 64 connections orbiter 2007-10-03 00:21:53 +00:00
  • dea7bee049 - increased minimum time before an active connection is interrupted from 1 minute to 10 minutes - added sorting by connection time in client connection table of connectionTimeComparatorInstance orbiter 2007-10-02 23:56:04 +00:00
  • f8e69ce4dc removed progress bar in Network list orbiter 2007-10-02 22:50:47 +00:00
  • c1440d2241 fixed problem with redirection: redirected URLs had not been tested with the double-check see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=348 orbiter 2007-10-02 22:40:53 +00:00
  • b183bf6f42 - fixed opensearch bugs - added 'full domain' button to expert crawl start - removed non-working 'only one domain' button, the regex allowed crawling of other domains orbiter 2007-10-02 21:43:05 +00:00
  • 7404f2c35c Fix some of the issues with the RSS search interface, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=392 fuchsi 2007-10-02 21:28:29 +00:00
  • 98abe0804d another enhancement to crawl starts with link files orbiter 2007-10-02 20:30:42 +00:00
  • ed2ca8fc4c Add search type to top word suggestion searches. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=391 fuchsi 2007-10-02 19:49:50 +00:00
  • aef1ab9526 #updated German translation daburna 2007-10-02 18:42:08 +00:00
  • 1b42152a76 fixed and enhanced some details in crawl start with file orbiter 2007-10-02 00:49:38 +00:00
  • 16e101f135 - fix for bad xml tag in Network.xml - switched on automatic deletion of passive peers in pro versions orbiter 2007-10-01 22:45:44 +00:00
  • 4465db7399 removed debug information from network graphic orbiter 2007-10-01 12:32:10 +00:00
  • 01e0669264 re-designed some parts of DHT position calculation (effect is the same as before) and replaced the old first hash computation by a new method that tries to find a gap in the current DHT. To do this, the network bootstrapping must be done before the own hash is computed; this made further redesigns in peer initialization order necessary orbiter 2007-10-01 12:30:23 +00:00
  • d547c3b4bd Avoid NullPointerException in yacySeedDB.lookupByIP hermens 2007-09-29 11:18:09 +00:00
  • 5b1a937ed8 fix for crawl stack database format change, introduced in SVN 4113 orbiter 2007-09-28 08:17:08 +00:00
  • af25c98306 enhanced local search performance in case of a remote search: the result page no longer waits until the local search terminates. The local search results appear like all other results from remote peers, using a separate thread. This has an especially strong effect if the local index for a specific word is large. orbiter 2007-09-28 01:36:22 +00:00
  • 842308ea97 - redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, shows now also terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed problems with indexing queues here and there - enhanced indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown orbiter 2007-09-28 01:21:31 +00:00
  • 341f7cb327 steps to enhance remote search performance: - added a file size limitation, that disallows parsing of large documents during (offline-) remote search - added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation orbiter 2007-09-26 10:11:50 +00:00
  • 2f1ff048ba some fixes to socket connection time-out orbiter 2007-09-25 23:45:05 +00:00
  • 3c74014004 automatic deletion of dead client connections orbiter 2007-09-25 22:46:11 +00:00
  • 49f1c58d64 restoring alternative update location orbiter 2007-09-25 21:43:16 +00:00
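
Commit ecba35de72 describes splitting a quicksort after the first partition step and handing one half to a second thread only when the data set is large enough to pay for the thread start-up. The Java sketch below illustrates that idea on a plain int[]; it is not the kelondro RowCollection code, and the class name and the PARALLEL_THRESHOLD value are assumptions.

```java
// Sketch only: sorts an int[] instead of kelondro rows; the threshold is an assumed value.
class ParallelQuickSort {
    // Hypothetical cut-off: below this size, starting a second thread costs more than it saves.
    static final int PARALLEL_THRESHOLD = 10_000;

    static void sort(int[] a) {
        if (a.length < 2) return;
        // First partition step: afterwards a[0..p-1] < a[p] <= a[p+1..] can be sorted independently.
        int p = partition(a, 0, a.length - 1);
        boolean parallel = a.length >= PARALLEL_THRESHOLD
                && Runtime.getRuntime().availableProcessors() > 1;
        if (!parallel) {
            quicksort(a, 0, p - 1);
            quicksort(a, p + 1, a.length - 1);
            return;
        }
        // Sort the left partition in a second thread while this thread sorts the right one.
        Thread left = new Thread(() -> quicksort(a, 0, p - 1));
        left.start();
        quicksort(a, p + 1, a.length - 1);
        try {
            left.join(); // join() also makes the other thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        }
    }

    // Sequential quicksort: recurse into the smaller part, loop on the larger one to bound stack depth.
    static void quicksort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi);
                hi = p - 1;
            }
        }
    }

    // Lomuto partition using the middle element as pivot; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The threshold check mirrors the commit's reasoning: thread creation has a fixed cost, so small arrays are sorted in the calling thread only.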
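
Commit b5f7df8d0a makes array shifting during remove conditional on whether the collection has to keep its order. One common way to do an unordered remove is to move the last element into the gap; the sketch below assumes that technique and a hypothetical int-based RowList, which may differ from what rowCollection actually does.

```java
import java.util.Arrays;

// Hypothetical int-based stand-in for a row collection; not the kelondroRowCollection API.
class RowList {
    private int[] rows = new int[16];
    private int size = 0;

    void add(int value) {
        if (size == rows.length) rows = Arrays.copyOf(rows, size * 2); // grow by doubling
        rows[size++] = value;
    }

    int size() { return size; }

    int get(int i) {
        checkIndex(i);
        return rows[i];
    }

    /**
     * Removes the element at index i. With keepOrder the tail is shifted left (O(n));
     * without it the last element is moved into the gap (O(1)), which changes the order.
     */
    int remove(int i, boolean keepOrder) {
        checkIndex(i);
        int removed = rows[i];
        if (keepOrder) {
            System.arraycopy(rows, i + 1, rows, i, size - i - 1);
        } else {
            rows[i] = rows[size - 1];
        }
        size--;
        return removed;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i + ", size " + size);
    }
}
```

For a large collection from which many entries are removed, the unordered variant avoids moving the whole tail on every call, which is the kind of cost the commit message attributes to "common word shrinking".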
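
Commit 425e4ead66 says abstractServerSwitch.getConfigPath(setting, default) returns a File that is either absolute or relative to the application root. A minimal sketch of that resolution rule follows; the ConfigPaths class and its Map-based configuration are assumptions standing in for the real server switch.

```java
import java.io.File;
import java.util.Map;

// Sketch of the behaviour described for getConfigPath(setting, default); class and fields are assumptions.
class ConfigPaths {
    private final File rootPath;              // application root directory
    private final Map<String, String> config; // setting name -> configured path string

    ConfigPaths(File rootPath, Map<String, String> config) {
        this.rootPath = rootPath;
        this.config = config;
    }

    File getConfigPath(String setting, String dflt) {
        String value = config.getOrDefault(setting, dflt);
        File f = new File(value);
        // Absolute settings (e.g. /a/b/c) are used as they are; relative ones are resolved against the root.
        return f.isAbsolute() ? f : new File(rootPath, value);
    }
}
```

Note that File.isAbsolute() is platform-dependent: on Windows a path needs a drive letter or UNC prefix to count as absolute, so /a/b/c would be treated as relative there.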
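
Commit 711641f167 lists three client connection clean-up rules: close connections idle for more than 2 minutes, close connections alive for more than one hour, and, if more than 64 connections remain, close the excess ones with the most idle time. The sketch below expresses those rules as a pure selection function; the Conn record, its timestamp fields and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the clean-up policy described in 711641f167; names and data structures are hypothetical.
class ConnectionReaper {
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;      // idle for more than 2 minutes
    static final long MAX_LIFETIME_MS = 60 * 60 * 1000L; // alive for more than one hour
    static final int MAX_CONNECTIONS = 64;

    static final class Conn {
        final String id;
        final long createdAt;  // ms timestamp when the connection was opened
        final long lastActive; // ms timestamp of the last I/O on the connection

        Conn(String id, long createdAt, long lastActive) {
            this.id = id;
            this.createdAt = createdAt;
            this.lastActive = lastActive;
        }
    }

    /** Returns the connections that should be closed at time {@code now}. */
    static List<Conn> selectForClosing(List<Conn> conns, long now) {
        List<Conn> close = new ArrayList<>();
        List<Conn> keep = new ArrayList<>();
        for (Conn c : conns) {
            boolean idleTooLong = now - c.lastActive > MAX_IDLE_MS;
            boolean aliveTooLong = now - c.createdAt > MAX_LIFETIME_MS;
            if (idleTooLong || aliveTooLong) close.add(c); else keep.add(c);
        }
        if (keep.size() > MAX_CONNECTIONS) {
            // Oldest lastActive first = most idle first; close everything beyond the limit.
            keep.sort(Comparator.comparingLong((Conn c) -> c.lastActive));
            close.addAll(keep.subList(0, keep.size() - MAX_CONNECTIONS));
        }
        return close;
    }
}
```

Keeping the selection separate from the actual socket closing makes the policy easy to test; a caller would close each returned connection and remove it from its registry.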