875096552ffix for NPE in case that remote search results are empty
orbiter
2007-11-12 15:54:50 +00:00
64b3b79e44- fix for termination problem with uniq() - addition to seed dna interpretation
orbiter
2007-11-12 14:39:30 +00:00
0abf33ed03- tried to remove deadlock - enhanced searchtime in kelondroRowSets - enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles
orbiter
2007-11-12 01:14:51 +00:00
a4010f7dc8*) fixed bug where dots were added after numbers < 1000: "123" was transformed to "123." which is undesirable
low012
2007-11-11 21:42:50 +00:00
da73cde86e#German language file - some cosmetics
daburna
2007-11-11 20:32:51 +00:00
2e91b724adfix for yacysearch/rss-feed bug
orbiter
2007-11-11 00:42:31 +00:00
d0d2771883disabled multiprocessing of rowCollection.sort for testing purposes
orbiter
2007-11-11 00:28:22 +00:00
edc4da5317fix for division by zero in test routine
orbiter
2007-11-10 08:57:00 +00:00
df38aaf7bdupdate to RowCollection sort speed-enhancements: - better handling of small collections (less overhead) - usage of pre-sorted limits - different re-sort limit - more testing procedures
orbiter
2007-11-09 15:34:11 +00:00
0eb60cfe6fbetter handling of seed properties
orbiter
2007-11-09 09:40:42 +00:00
ecba35de72enhanced computing speed of the kelondro core function: sorting. The enhancement was made by using better organized data structures and multi-threading during the sort. A sort can be divided into two separate processes once the first partition of the quicksort algorithm is done. Creating and starting a separate thread takes 10 milliseconds, so using a separate thread only makes sense if the amount of data is large. Statistics about the speed-up: - without enhancement: 250 milliseconds for 100000 entries - with data structure enhancement: 170 milliseconds for 100000 entries - with additional second thread (if a second processor is present): 130 milliseconds.
orbiter
2007-11-09 00:51:38 +00:00
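The two-thread sort described in ecba35de72 can be sketched roughly as follows. This is an illustration only, not the kelondro code: the class name `TwoThreadSort`, the `PARALLEL_THRESHOLD` value and the use of an `int[]` instead of row collections are all assumptions, and the lambda syntax is modern Java rather than what was available in 2007.

```java
// Hypothetical sketch: quicksort whose first partition splits the work
// between the calling thread and one additional thread.
final class TwoThreadSort {
    // Assumed cut-off: below this size, starting a thread costs more than it saves.
    static final int PARALLEL_THRESHOLD = 10_000;

    static void sort(int[] a) {
        if (a.length < 2) return;
        if (a.length < PARALLEL_THRESHOLD) {
            quicksort(a, 0, a.length - 1);
            return;
        }
        // The first partition is done on the calling thread.
        final int p = partition(a, 0, a.length - 1);
        // The left part goes to a second thread, the right part stays here.
        Thread left = new Thread(() -> quicksort(a, 0, p));
        left.start();
        quicksort(a, p + 1, a.length - 1);
        try {
            left.join(); // join() also makes the other thread's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sort interrupted", e);
        }
    }

    static void quicksort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            // Recurse into the smaller part and loop over the larger one
            // to keep the stack depth logarithmic.
            if (p - lo < hi - p) {
                quicksort(a, lo, p);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi);
                hi = p;
            }
        }
    }

    // Hoare partition: afterwards every element of a[lo..p] is <= every element of a[p+1..hi].
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
}
```

The threshold reflects the trade-off stated in the commit message: thread start-up has a fixed cost, so small inputs are sorted on the calling thread alone.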
6eaa5a0e64enhanced local search speed. The ranking process is now 6 times faster than before.
orbiter
2007-11-07 22:38:09 +00:00
425e4ead66Allow absolute paths in configuration settings. - Previously, absolute paths were expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be placed outside of yacy's root dir without having to use symlinks (probably good for third party distribution packaging). - abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the application's root path.
fuchsi
2007-11-04 10:36:25 +00:00
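The path handling described in 425e4ead66 could look roughly like the sketch below. The class `ConfigPaths` and the method `resolve` are hypothetical names chosen for illustration; only the behaviour (absolute paths are kept, relative paths are resolved against the application root) is taken from the commit message.

```java
import java.io.File;

// Hypothetical helper, not the abstractServerSwitch implementation.
final class ConfigPaths {
    /**
     * Returns the configured path as a File. Absolute paths are returned
     * unchanged; relative paths are resolved against appRoot. If no value
     * is configured, the default is used instead.
     */
    static File resolve(File appRoot, String configured, String dflt) {
        String value = (configured == null || configured.isEmpty()) ? dflt : configured;
        File f = new File(value);
        return f.isAbsolute() ? f : new File(appRoot, value);
    }
}
```

With this, a setting such as fooPath=/a/b/c stays /a/b/c on a Unix-like system, while a relative value like DATA/HTCACHE ends up below the root directory.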
35cf196204transferRanking(): Do not flush more ranking files than requested by caller.
hermens
2007-10-31 15:55:52 +00:00
d0aa8cf25dOnly update handshaked peer's last seed date if it has not been updated recently. Until now the newer data was overwritten by old data from before the handshake.
hermens
2007-10-31 15:47:48 +00:00
8f9d65da67Small corrections to dhtFlushControl() - Test wCacheMaxChunk against maxURLinCache(), not getMaxWordCount(). This triggered a flush every time dhtFlushControl() was called. - If triggered, flush at least 1 entry.
hermens
2007-10-31 14:21:58 +00:00
55c87b3b12changed behavior of crawl stacker - final flush only when tabletype = RAM - prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100 - maximum number of entries in stacker is configurable in yacy.init (stacker.slots)
orbiter
2007-10-31 11:32:40 +00:00
18144043e6Correct UTC Offset at beginning/end of daylight savings time
hermens
2007-10-30 19:20:02 +00:00
4fefa53135removed parser object pool, see also svn 4106
orbiter
2007-10-29 12:14:18 +00:00
35b1bd66cdThe (old) yacy web page does not need to be part of the yacy distribution. The old yacy home page will be replaced by a new one.
orbiter
2007-10-29 11:27:50 +00:00
87b297b4d2update of link to english forum
orbiter
2007-10-29 10:50:27 +00:00
a31b9097a4preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls
orbiter
2007-10-29 01:43:20 +00:00
a718858e8bseed.CCOUNT is interpreted as a double value not int
fuchsi
2007-10-24 23:25:48 +00:00
d85821a88cfix for SVN 4178
orbiter
2007-10-24 22:00:34 +00:00
0e1738899f* Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) methods have been removed, now done with simple put(...) (see above). - putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'). This was done with put(...) before and so was called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.
fuchsi
2007-10-24 21:38:19 +00:00
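The difference between the HTML and the "forXML" escaping mentioned in 0e1738899f can be illustrated with a small sketch. The class `Escaper` and its method are hypothetical. The four characters replaced in XML mode come from the commit message; the treatment of non-ASCII characters in HTML mode (numeric character references) is an assumption, since the commit does not say which additional characters putHTML(...) handles.

```java
// Hypothetical sketch, not the serverObjects implementation.
final class Escaper {
    /**
     * Escapes a value for embedding in markup.
     * forXml = true : only < > & " are replaced.
     * forXml = false: additionally, non-ASCII characters become numeric
     *                 character references (assumed behaviour).
     */
    static String escape(String s, boolean forXml) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (!forXml && c > 127) {
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Values that are already safe, such as plain numbers, gain nothing from this pass, which is why the commit restricts escaping to the cases that need it.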
f8318436a1fix for last commit
orbiter
2007-10-22 16:32:39 +00:00
7d57b80598distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158
orbiter
2007-10-22 15:26:47 +00:00
caff520988Removed unnecessary and unused code.
hermens
2007-10-20 11:56:15 +00:00
d732840f8aAvoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing.
hermens
2007-10-19 23:36:40 +00:00
35303f9504add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources.
fuchsi
2007-10-19 21:47:07 +00:00
38bbd4a4b3no code changes. just touched yacyClient.java to trigger a rebuild of the file in an uncleaned tree. NOTE: run "ant clean" before building SVN 4166/4167 in a tree that includes class files from a previous build to make sure, that every class file is rebuilt!
fuchsi
2007-10-19 15:31:38 +00:00
f717beecb1- Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers. - some minor code cleanups (mostly unnecessary casts, null checks)
fuchsi
2007-10-19 04:13:46 +00:00
ca83f5a8d9Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3). Add the packages to the eclipse .classpath. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=453
fuchsi
2007-10-18 19:53:52 +00:00
3352474dd8Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers.
fuchsi
2007-10-16 13:29:11 +00:00
06e6a1ff62Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code). NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.
fuchsi
2007-10-16 02:12:31 +00:00
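Locale-dependent number formatting of the kind 06e6a1ff62 introduces can be sketched with the standard `java.text.NumberFormat` class. The wrapper class `LocaleNumbers` is a hypothetical stand-in and does not reproduce the yFormatter API.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical stand-in for a locale-aware number formatter.
final class LocaleNumbers {
    /** Formats an integer value with the grouping separator of the given locale. */
    static String format(long value, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(value);
    }
}
```

For example, 1234567 is rendered as 1.234.567 for `Locale.GERMANY` and as 1,234,567 for `Locale.US`, while a value below 1000 such as 123 is left without any separator.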
e77aec8c9dfix handling of encrypted PDF-Documents (with default user password "") - update PDFBox package to current version 0.7.3 - use new security model in PDFBox to "guess" whether we can decrypt a document or not NOTE: When upgrading to this version make sure the old PDFBox-0.7.2.jar is removed from libx/
fuchsi
2007-10-15 13:18:38 +00:00
b54fcd732b*) fixed exceptions that occurred when non-integer values were entered where integers were expected
low012
2007-10-12 19:09:20 +00:00
b5f7df8d0aSpeed up remove operations in rowCollections. - Array element shifting during remove is only done when it is necessary to keep the order of a row collection. - This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)
fuchsi
2007-10-11 17:17:08 +00:00
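The idea behind b5f7df8d0a, shifting array elements only when the order has to be preserved, can be sketched as follows. The class `IntRows` is a hypothetical, simplified stand-in for a row collection and stores plain integers instead of rows.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a row collection.
final class IntRows {
    private final int[] rows;
    private int size;

    IntRows(int... initial) {
        this.rows = initial.clone();
        this.size = initial.length;
    }

    /**
     * Removes the element at index.
     * keepOrder = true : all following elements are shifted left, O(n).
     * keepOrder = false: the last element is moved into the gap, O(1),
     *                    which changes the order of the remaining elements.
     */
    void remove(int index, boolean keepOrder) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (keepOrder) {
            System.arraycopy(rows, index + 1, rows, index, size - index - 1);
        } else {
            rows[index] = rows[size - 1];
        }
        size--;
    }

    int[] toArray() {
        return Arrays.copyOf(rows, size);
    }
}
```

During a mass removal, the unordered variant avoids copying the tail of the array for every removed element, which is where a large speed-up of the kind reported in the commit would come from.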
5c91359297accidentally committed personal testing values as defaults in last commit.
fuchsi
2007-10-10 15:20:22 +00:00
e255888095Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now.
fuchsi
2007-10-10 15:19:07 +00:00
ce0bb1dc8aIncrease defaults for the DHT Receive Limits to prevent "busy" states. see
fuchsi
2007-10-10 10:07:16 +00:00
fdb0b861f8*) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer *) added network QPH *) banner is cached for 1 second to avoid DOS *) still no logo
low012
2007-10-09 21:47:37 +00:00
3b8540198bfinally fix the init script. tested this time.
fuchsi
2007-10-09 14:53:16 +00:00
508de558f7sbStackCrawlThread is null during first cleanProfiles() run at startup.
fuchsi
2007-10-08 15:56:40 +00:00
70614385efAttempt to fix the "lost profile handle" bug. It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner.
fuchsi
2007-10-08 15:11:26 +00:00
70884da0ebAdd external package JARs to classpath. (copied from startYACY.sh)
fuchsi
2007-10-07 19:28:21 +00:00
ebfd1e0b42remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required.
fuchsi
2007-10-05 18:52:15 +00:00
4ce25b3661- documentation update - start of new development cycle. in case you don't know: commits until 0.553 will not automatically be used by the auto-update function
orbiter
2007-10-04 20:43:52 +00:00
ed20531e68don't encode in channel element as well
fuchsi
2007-10-04 12:12:27 +00:00
9b0948cb4cgnarf. mixed up the positions. finally fixed...
fuchsi
2007-10-04 10:58:01 +00:00
c0f5fc51efbugfix for last commit
fuchsi
2007-10-04 10:47:48 +00:00
33fb2f756dadded emergency fail case in remote crawls. In extreme situations this will cause that no remote crawls are sent out any more. This is bad, but it protects against the case where failing remote crawls fill up the local queue too much, which is even worse.
orbiter
2007-10-04 10:40:30 +00:00
c5a8585ac6fix more encoding problems in yacysearch.rss. - URL encoding for search terms where required - removed "ugly" CDATA escaping - UTF-8 encoding for the XML - no HTML style escaping for XML/RSS element values Note: some unicode characters might still be encoded in a wrong way.
fuchsi
2007-10-04 09:21:03 +00:00
3e60ae93b9modified remote search snippet fetch behavior: do not fetch snippets for more than 300 milliseconds, even if the snippets can be found locally without online fetch
orbiter
2007-10-03 16:42:11 +00:00
711641f167extended client connection clean-up: there are now two time-outs, one for the complete connection time, and one for the idle time. Connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed. If the total number of connections exceeds 64, the connections beyond 64 that have the most idle time are also closed.
orbiter
2007-10-03 15:06:12 +00:00
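The clean-up policy from 711641f167 can be sketched as a pure selection function. The class `ConnectionCleaner`, the nested `Conn` type and the method `selectForClosing` are hypothetical; only the three limits (2 minutes idle, one hour total, 64 connections) come from the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the described policy, not the YaCy implementation.
final class ConnectionCleaner {
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;  // idle time-out: 2 minutes
    static final long MAX_AGE_MS = 60 * 60 * 1000L;  // total time-out: 1 hour
    static final int MAX_CONNECTIONS = 64;           // limit on open connections

    static final class Conn {
        final String id;
        final long openedAt;      // timestamp when the connection was opened
        final long lastActiveAt;  // timestamp of the last activity

        Conn(String id, long openedAt, long lastActiveAt) {
            this.id = id;
            this.openedAt = openedAt;
            this.lastActiveAt = lastActiveAt;
        }
    }

    /** Returns the connections that should be closed at time now. */
    static List<Conn> selectForClosing(List<Conn> conns, long now) {
        List<Conn> toClose = new ArrayList<>();
        List<Conn> survivors = new ArrayList<>();
        for (Conn c : conns) {
            boolean idleTooLong = now - c.lastActiveAt > MAX_IDLE_MS;
            boolean tooOld = now - c.openedAt > MAX_AGE_MS;
            if (idleTooLong || tooOld) {
                toClose.add(c);
            } else {
                survivors.add(c);
            }
        }
        // If too many connections remain, close the ones that have been idle longest.
        survivors.sort(Comparator.comparingLong((Conn c) -> c.lastActiveAt));
        int excess = survivors.size() - MAX_CONNECTIONS;
        for (int i = 0; i < excess; i++) {
            toClose.add(survivors.get(i));
        }
        return toClose;
    }
}
```

Keeping the selection separate from the actual closing makes the policy easy to test without opening sockets.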
b19bb6e5b1- reverted svn 4132; this did not solve the problem and removed the emergency method, which was sure to cause production failure within some hours - removed and added some debugging lines
orbiter
2007-10-03 14:34:05 +00:00
1eba408d2fMake sure that sockets which couldn't be opened aren't handled as active connections, in which case they wouldn't be closed.
fuchsi
2007-10-03 12:18:26 +00:00
03c5b4ad68more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org - RFC-822 date time had to include the time instead of date only - <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1 - <link> elements are mandatory for <channel> and <item>
fuchsi
2007-10-03 04:00:52 +00:00
e3c6236eeffixed the last opensearch/rss issue. The GUID-Tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink) in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but it _must_ be specified as such with an additional isPermaLink="false" attribute in this case. see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt
fuchsi
2007-10-03 00:46:30 +00:00
d69d386f7dadded additional forced client connection closing if a specific number of simultaneous connections is reached. The limit is currently set to 64 connections
orbiter
2007-10-03 00:21:53 +00:00
dea7bee049- increased minimum time before an active connection is interrupted from 1 minute to 10 minutes - added sorting by connection time in client connection table of connectionTimeComparatorInstance
orbiter
2007-10-02 23:56:04 +00:00
f8e69ce4dcremoved progress bar in Network list
orbiter
2007-10-02 22:50:47 +00:00
aef1ab9526#updated German translation
daburna
2007-10-02 18:42:08 +00:00
1b42152a76fixed and enhanced some details in crawl start with file
orbiter
2007-10-02 00:49:38 +00:00
16e101f135- fix for bad xml tag in Network.xml - switched on automatic deletion of passive peers in pro versions
orbiter
2007-10-01 22:45:44 +00:00
4465db7399removed debug information from network graphic
orbiter
2007-10-01 12:32:10 +00:00
01e0669264re-designed some parts of DHT position calculation (effect is the same as before) and replaced old first hash computation by a new method that tries to find a gap in the current dht. To do this, it is necessary that the network bootstrapping is done before the own hash is computed. This made further redesigns in peer initialization order necessary
orbiter
2007-10-01 12:30:23 +00:00
d547c3b4bdAvoid NullPointerException in yacySeedDB.lookupByIP
hermens
2007-09-29 11:18:09 +00:00
5b1a937ed8fix for crawl stack database format change, introduced in SVN 4113
orbiter
2007-09-28 08:17:08 +00:00
af25c98306enhanced local search performance in case of a remote search: there is no waiting until the local search terminates to show the result page. The local search results appear like all other results from remote peers, using a separate thread. This has an especially strong effect if the local index for a specific word is large.
orbiter
2007-09-28 01:36:22 +00:00
842308ea97- redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, now also shows terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed problems with indexing queues here and there - enhanced indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown
orbiter
2007-09-28 01:21:31 +00:00
341f7cb327steps to enhance remote search performance: - added a file size limitation, that disallows parsing of large documents during (offline-) remote search - added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation
orbiter
2007-09-26 10:11:50 +00:00
2f1ff048basome fixes to socket connection time-out
orbiter
2007-09-25 23:45:05 +00:00
3c74014004automatic deletion of dead client connections
orbiter
2007-09-25 22:46:11 +00:00
49f1c58d64restoring alternative update location
orbiter
2007-09-25 21:43:16 +00:00