a31b9097a4preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to the crawl queue (away from the stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check robots.txt itself to prevent being misused as a crawl agent for unwanted file retrieval - implement new index files that control the double-check of remotely crawled urls
orbiter
2007-10-29 01:43:20 +00:00
a718858e8bseed.CCOUNT is interpreted as a double value not int
fuchsi
2007-10-24 23:25:48 +00:00
d85821a88cfix for SVN 4178
orbiter
2007-10-24 22:00:34 +00:00
0e1738899f* Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) methods have been removed; this is now done with a simple put(...) (see above). - putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into the corresponding HTML entities ('<' => '&lt;'), which was done by put(...) before and so was called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.
fuchsi
2007-10-24 21:38:19 +00:00
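The split between put(...), putNum(...) and putHTML(...) described in the commit above could look roughly like the following. This is a minimal sketch, not YaCy's serverObjects: the class name TemplateValues, the constructor and the get(...) accessor are hypothetical, and only the four characters named for the "forXML" mode are escaped.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a template value map; not YaCy's serverObjects.
final class TemplateValues {
    private final Map<String, String> values = new HashMap<>();
    private final Locale locale;

    TemplateValues(Locale locale) { this.locale = locale; }

    // Stores the value exactly as given.
    void put(String key, String value) { values.put(key, value); }

    // Numbers are converted to a String, but not formatted.
    void put(String key, long value) { values.put(key, Long.toString(value)); }

    // Numbers are formatted with the grouping rules of the configured locale.
    void putNum(String key, long value) {
        values.put(key, NumberFormat.getIntegerInstance(locale).format(value));
    }

    // Special characters are replaced by entities before storing.
    void putHTML(String key, String value) { values.put(key, escape(value)); }

    String get(String key) { return values.get(key); }

    // Escapes only < > & and the double quote.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With a US locale, put("n", 1234567) stores 1234567 and putNum("n", 1234567) stores 1,234,567; only putHTML(...) touches markup characters.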
f8318436a1fix for last commit
orbiter
2007-10-22 16:32:39 +00:00
7d57b80598distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158
orbiter
2007-10-22 15:26:47 +00:00
caff520988Removed unnecessary and unused code.
hermens
2007-10-20 11:56:15 +00:00
d732840f8aAvoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing.
hermens
2007-10-19 23:36:40 +00:00
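The commit above does not say how the exception was avoided. One common approach, shown here only as an assumption, is to let the status page iterate over a snapshot copy taken under a lock while the indexer keeps modifying the live list. The class QueueMonitor and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for queue entries that a status page wants to list.
final class QueueMonitor {
    private final List<String> entries = new ArrayList<>();

    // Called by the indexing side.
    synchronized void add(String entry) { entries.add(entry); }

    synchronized int size() { return entries.size(); }

    // Returns a copy made under the lock; the caller iterates the copy,
    // so later changes to the live list cannot invalidate its iterator.
    synchronized List<String> snapshot() { return new ArrayList<>(entries); }
}
```

Iterating `snapshot()` is safe even if `add(...)` is called during the loop, at the cost of one copy per page request.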
35303f9504add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources.
fuchsi
2007-10-19 21:47:07 +00:00
38bbd4a4b3no code changes. just touched yacyClient.java to trigger a rebuild of the file in an uncleaned tree. NOTE: run "ant clean" before building SVN 4166/4167 in a tree that includes class files from a previous build to make sure, that every class file is rebuilt!
fuchsi
2007-10-19 15:31:38 +00:00
f717beecb1- Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers. - some minor code cleanups (mostly unnecessary casts, null checks)
fuchsi
2007-10-19 04:13:46 +00:00
ca83f5a8d9Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3). Add the packages to the eclipse .classpath. Closes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=453
fuchsi
2007-10-18 19:53:52 +00:00
3352474dd8Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers.
fuchsi
2007-10-16 13:29:11 +00:00
06e6a1ff62Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code). NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.
fuchsi
2007-10-16 02:12:31 +00:00
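Locale-dependent number output as described in the commit above can be sketched with the standard java.text.NumberFormat. This is not the yFormatter class itself; the class name LocaleNumbers and the idea of passing the language setting as a language code string are assumptions.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; formats numbers for a language code such as "de" or "en".
final class LocaleNumbers {
    // Integer types: grouping separators only.
    static String format(long value, String languageCode) {
        Locale locale = Locale.forLanguageTag(languageCode);
        return NumberFormat.getIntegerInstance(locale).format(value);
    }

    // Decimal types: grouping and decimal separators follow the locale.
    static String format(double value, String languageCode) {
        Locale locale = Locale.forLanguageTag(languageCode);
        return NumberFormat.getNumberInstance(locale).format(value);
    }
}
```

For example, `LocaleNumbers.format(1234567L, "de")` uses '.' as grouping separator, while the "en" output uses ','.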
e77aec8c9dfix handling of encrypted PDF-Documents (with default user password "") - update PDFBox package to current version 0.7.3 - use new security model in PDFBox to "guess" whether we can decrypt a document or not NOTE: When upgrading to this version make sure the old PDFBox-0.7.2.jar is removed from libx/
fuchsi
2007-10-15 13:18:38 +00:00
b54fcd732b*) fixed exceptions that occurred when non-integer values were entered where integers were expected
low012
2007-10-12 19:09:20 +00:00
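A typical guard against this kind of exception is to parse user input defensively and fall back to a default. The sketch below is an illustration only; the helper name SafeParse.parseIntOr is hypothetical and the commit does not state which approach was used.

```java
// Hypothetical helper for form input that is expected to be an integer.
final class SafeParse {
    // Returns the parsed integer, or the fallback if the input is
    // missing or not a valid integer (for example "4.5" or "abc").
    static int parseIntOr(String input, int fallback) {
        if (input == null) return fallback;
        try {
            return Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```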
b5f7df8d0aSpeed up remove operations in rowCollections. - Array element shifting during remove is only done when it is necessary to keep the order of a row collection. - This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)
fuchsi
2007-10-11 17:17:08 +00:00
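The difference between the two remove strategies in the commit above can be sketched on a plain int array. This is a simplified illustration, not the rowCollection code: the class IntRows and its method names are hypothetical. Keeping the order shifts every following element; if order does not matter, the last element is moved into the gap instead.

```java
import java.util.Arrays;

// Hypothetical array-backed collection showing both remove strategies.
final class IntRows {
    private final int[] rows;
    private int size;

    IntRows(int... initial) {
        rows = initial.clone();
        size = initial.length;
    }

    // Order-preserving remove: shifts all following elements left, O(n).
    void removeKeepOrder(int index) {
        checkIndex(index);
        System.arraycopy(rows, index + 1, rows, index, size - index - 1);
        size--;
    }

    // Fast remove: overwrites the slot with the last element, O(1).
    // The order of the remaining elements is not preserved.
    void removeFast(int index) {
        checkIndex(index);
        rows[index] = rows[size - 1];
        size--;
    }

    int[] toArray() { return Arrays.copyOf(rows, size); }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

Starting from 1, 2, 3, 4, 5 and removing index 1, the order-preserving variant leaves 1, 3, 4, 5 and the fast variant leaves 1, 5, 3, 4.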
5c91359297accidentally committed personal testing values as defaults in last commit.
fuchsi
2007-10-10 15:20:22 +00:00
e255888095Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now.
fuchsi
2007-10-10 15:19:07 +00:00
ce0bb1dc8aIncrease defaults for the DHT Receive Limits to prevent "busy" states. see
fuchsi
2007-10-10 10:07:16 +00:00
fdb0b861f8*) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer *) added network QPH *) banner is cached for 1 second to avoid DOS *) still no logo
low012
2007-10-09 21:47:37 +00:00
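Caching an expensive result for one second, as done for the banner in the commit above, could be sketched as follows. The class TimedCache is hypothetical, and the injectable clock exists only to make the behaviour easy to check; the actual banner code may look different.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical time-limited cache: the loader runs at most once per TTL window.
final class TimedCache<T> {
    private final Supplier<T> loader;
    private final long ttlMillis;
    private final LongSupplier clock; // returns the current time in milliseconds
    private T value;
    private long loadedAt;
    private boolean loaded;

    TimedCache(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, recomputing it only when it is older than the TTL.
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

In production the clock would be `System::currentTimeMillis` and the TTL 1000, so repeated requests within a second never trigger more than one rendering.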
3b8540198bfinally fix the init script. tested this time.
fuchsi
2007-10-09 14:53:16 +00:00
508de558f7sbStackCrawlThread is null during first cleanProfiles() run at startup.
fuchsi
2007-10-08 15:56:40 +00:00
70614385efAttempt to fix the "lost profile handle" bug. It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner.
fuchsi
2007-10-08 15:11:26 +00:00
70884da0ebAdd external package JARs to classpath. (copied from startYACY.sh)
fuchsi
2007-10-07 19:28:21 +00:00
ebfd1e0b42remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required.
fuchsi
2007-10-05 18:52:15 +00:00
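Where a search term is placed into a URL, the standard java.net.URLEncoder already produces '+' for spaces. The sketch below is an illustration; the class SearchUrls and the parameter name "search" are assumptions, not taken from the commit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building search URLs inside an RSS feed.
final class SearchUrls {
    // Form-encodes a query term as UTF-8; a space becomes '+'.
    static String encodeQuery(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // The parameter name "search" is an assumption for this example.
    static String searchUrl(String base, String query) {
        return base + "?search=" + encodeQuery(query);
    }
}
```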
4ce25b3661- documentation update - start of new development cycle. in case you don't know: commits until 0.553 will not automatically be used by the auto-update function
orbiter
2007-10-04 20:43:52 +00:00
ed20531e68don't encode in channel element as well
fuchsi
2007-10-04 12:12:27 +00:00
9b0948cb4cgnarf. mixed up the positions. finally fixed...
fuchsi
2007-10-04 10:58:01 +00:00
c0f5fc51efbugfix for last commit
fuchsi
2007-10-04 10:47:48 +00:00
33fb2f756dadded emergency fail case in remote crawls: in extreme situations this will cause that no remote crawls are sent out any more. this is bad, but it protects against the case where failing remote crawls fill up the local queue too much, which is even worse
orbiter
2007-10-04 10:40:30 +00:00
c5a8585ac6fix more encoding problems in yacysearch.rss. - URL encoding for search terms where required - removed "ugly" CDATA escaping - UTF-8 encoding for the XML - no HTML style escaping for XML/RSS element values Note: some unicode characters might still be encoded in a wrong way.
fuchsi
2007-10-04 09:21:03 +00:00
3e60ae93b9modified remote search snippet fetch behavior: do not fetch snippets for more than 300 milliseconds, even if the snippets can be found locally without online fetch
orbiter
2007-10-03 16:42:11 +00:00
711641f167extended client connection clean-up: there are now two time-outs, one for the complete connection time, and one for the idle time. connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed. if the total number of connections exceeds 64, the connections beyond 64 that have the most idle time are also closed
orbiter
2007-10-03 15:06:12 +00:00
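The clean-up policy from the commit above (idle limit, total lifetime limit, and a cap of 64 connections) can be sketched as a pure selection function. The classes Conn and ConnectionReaper are hypothetical, and the sketch only decides which connections to close; it does not touch sockets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical connection record; times are in milliseconds.
final class Conn {
    final String id;
    final long openedAt;
    final long lastActiveAt;

    Conn(String id, long openedAt, long lastActiveAt) {
        this.id = id;
        this.openedAt = openedAt;
        this.lastActiveAt = lastActiveAt;
    }
}

final class ConnectionReaper {
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;   // 2 minutes idle
    static final long MAX_AGE_MS = 60 * 60 * 1000L;   // 1 hour total
    static final int MAX_CONNECTIONS = 64;

    // Returns the connections that should be closed at time 'now'.
    static List<Conn> selectForClosing(List<Conn> conns, long now) {
        List<Conn> close = new ArrayList<>();
        List<Conn> keep = new ArrayList<>();
        for (Conn c : conns) {
            boolean idleTooLong = now - c.lastActiveAt > MAX_IDLE_MS;
            boolean tooOld = now - c.openedAt > MAX_AGE_MS;
            if (idleTooLong || tooOld) close.add(c); else keep.add(c);
        }
        // If still above the cap, also close the surplus with the most idle time.
        if (keep.size() > MAX_CONNECTIONS) {
            keep.sort(Comparator.comparingLong((Conn c) -> c.lastActiveAt));
            close.addAll(keep.subList(0, keep.size() - MAX_CONNECTIONS));
        }
        return close;
    }
}
```

Sorting by last activity in ascending order puts the longest-idle connections first, so the surplus is taken from the front of the list.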
b19bb6e5b1- reverted svn 4132; this did not solve the problem and removed the emergency method, which caused production failure for sure within some hours - removed and added some debugging lines
orbiter
2007-10-03 14:34:05 +00:00
1eba408d2fMake sure that sockets which couldn't be opened aren't handled as active connections, in which case they wouldn't be closed.
fuchsi
2007-10-03 12:18:26 +00:00
03c5b4ad68more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org - RFC-822 date time had to include the time instead of date only - <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1 - <link> elements are mandatory for <channel> and <item>
fuchsi
2007-10-03 04:00:52 +00:00
e3c6236eeffixed the last opensearch/rss issue. The GUID-Tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink) in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but it _must_ be specified as such with an additional isPermaLink="false" attribute in this case. see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt
fuchsi
2007-10-03 00:46:30 +00:00
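The guid rule from the commit above can be sketched as a small element builder: a value that is an absolute URL is written as a plain guid, anything else gets the attribute that marks it as a non-permalink (spelled isPermaLink in the RSS 2.0 specification). The class RssGuid and its URL check are assumptions for illustration, not the code used in yacysearch.rss.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical builder for the RSS <guid> element.
final class RssGuid {
    // Treats a value as a URL only if it parses as an absolute URI with a host.
    static boolean isAbsoluteUrl(String s) {
        try {
            URI u = new URI(s);
            return u.isAbsolute() && u.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Minimal XML text escaping for element content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String guidElement(String id) {
        if (isAbsoluteUrl(id)) {
            return "<guid>" + escapeXml(id) + "</guid>";
        }
        // Non-URL IDs must be flagged explicitly.
        return "<guid isPermaLink=\"false\">" + escapeXml(id) + "</guid>";
    }
}
```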
d69d386f7dadded additional forced client connection closing if a specific number of simultaneous connections is reached. the limit is currently set to 64 connections
orbiter
2007-10-03 00:21:53 +00:00
dea7bee049- increased minimum time before an active connection is interrupted from 1 minute to 10 minutes - added sorting by connection time in client connection table of connectionTimeComparatorInstance
orbiter
2007-10-02 23:56:04 +00:00
f8e69ce4dcremoved progress bar in Network list
orbiter
2007-10-02 22:50:47 +00:00
aef1ab9526#updated German translation
daburna
2007-10-02 18:42:08 +00:00
1b42152a76fixed and enhanced some details in crawl start with file
orbiter
2007-10-02 00:49:38 +00:00
16e101f135- fix for bad xml tag in Network.xml - switched on automatic deletion of passive peers in pro versions
orbiter
2007-10-01 22:45:44 +00:00
4465db7399removed debug information from network graphic
orbiter
2007-10-01 12:32:10 +00:00
01e0669264re-designed some parts of DHT position calculation (effect is the same as before) and replaced old first hash computation by new method that tries to find a gap in the current dht. to do this, it is necessary that the network bootstrapping is done before the own hash is computed. this made further redesigns in peer initialization order necessary
orbiter
2007-10-01 12:30:23 +00:00
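Finding a gap in a ring of known positions, as mentioned in the commit above, can be sketched like this. It is a strong simplification: positions are plain numbers in the range 0 to ringSize - 1 rather than YaCy hashes, the choice of the midpoint of the largest gap is an assumption, and the class DhtGap is hypothetical.

```java
import java.util.Arrays;

// Hypothetical gap finder on a ring of positions in [0, ringSize).
final class DhtGap {
    // Returns the midpoint of the largest gap between neighbouring positions,
    // including the gap that wraps around from the last to the first position.
    static long midpointOfLargestGap(long[] positions, long ringSize) {
        if (positions.length == 0) return 0L; // empty ring: any position is fine
        long[] sorted = positions.clone();
        Arrays.sort(sorted);
        long bestStart = sorted[0];
        long bestGap = -1L;
        for (int i = 0; i < sorted.length; i++) {
            long cur = sorted[i];
            long next = sorted[(i + 1) % sorted.length];
            long gap = (next - cur + ringSize) % ringSize;
            if (sorted.length == 1) gap = ringSize; // a single peer owns the whole ring
            if (gap > bestGap) {
                bestGap = gap;
                bestStart = cur;
            }
        }
        return (bestStart + bestGap / 2) % ringSize;
    }
}
```

The known positions must be available before the own position is chosen, which matches the remark that bootstrapping has to happen first.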
d547c3b4bdAvoid NullPointerException in yacySeedDB.lookupByIP
hermens
2007-09-29 11:18:09 +00:00
5b1a937ed8fix for crawl stack database format change, introduced in SVN 4113
orbiter
2007-09-28 08:17:08 +00:00
af25c98306enhanced local search performance in case of a remote search: there is no waiting until the local search terminates to show the result page. the local search results appear like all other results from remote peers, using a separate thread. This has an especially strong effect if the local index for a specific word is large.
orbiter
2007-09-28 01:36:22 +00:00
842308ea97- redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, shows now also terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed problems with indexing queues here and there - enhanced indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown
orbiter
2007-09-28 01:21:31 +00:00
341f7cb327steps to enhance remote search performance: - added a file size limitation, that disallows parsing of large documents during (offline-) remote search - added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation
orbiter
2007-09-26 10:11:50 +00:00
2f1ff048basome fixes to socket connection time-out
orbiter
2007-09-25 23:45:05 +00:00
3c74014004automatic deletion of dead client connections
orbiter
2007-09-25 22:46:11 +00:00
49f1c58d64restoring alternative update location
orbiter
2007-09-25 21:43:16 +00:00
d352853f2dfix for non-closing client sessions
orbiter
2007-09-24 08:42:07 +00:00
1488769e1fcleanup of unmaintained and outdated performance methods: removed object pools in httpc. Object pooling is not recommended, if the creation of the object is not time-intensive. Object pools are only useful, if there is much computation necessary to create some basic data that is stored in the object pool and can be re-used. This does not apply to object pools in YaCy. Object pooling of client sessions would make sense if they would allow re-use of living connections to other yacy clients. But every connection is closed after usage of an object in the client pool, therefore the YaCy server client objects are not such that hold hardware/network-allocated entities. See: http://www.javaperformancetuning.com/news/qotm033.shtml http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling http://docs.sun.com/source/816-7159-10/pt_chap5.html http://www.microjava.com/articles/techtalk/recylcle2
orbiter
2007-09-23 20:49:52 +00:00
3cb9cdc9betry to fix connection problem, possible cause for wrong junior status and non-passive passive peers: the YaCy client treats disconnections during data transmissions as error and discards all data transmitted so far. this did not happen until I removed a delay time at the end of the daemon session which prevented this case. To fix this problem, disconnections during transmissions are not treated as error now, which means that end-of-transmissions with sudden disconnections are not a cause for peer disconnections any more. To be nice to non-updated peers, the sleep time at the end of server sessions is also re-enabled.
orbiter
2007-09-23 17:31:29 +00:00
00dab81077simpler solution to last commit + works with and without navigation column on the left
fuchsi
2007-09-20 01:52:10 +00:00
eb16a99e94avoid floating of long page titles around the favicon in search results
fuchsi
2007-09-19 22:08:56 +00:00
9524b9c16asecond try of rev 4100 :). Tested in Iceweasel/Firefox 2.0.6, Konqueror 3.5.7, Opera 9.23 (all linux) and IE6-SP1 (wine)
fuchsi
2007-09-17 19:39:15 +00:00
6b8faaadb6undo last commit for further evaluation, a progressbar element is used on other pages as well...
fuchsi
2007-09-17 03:36:35 +00:00
1880bba420A few changes to the progress bar and search result statistics layout influenced by the discussion in <http://forum.yacy-websuche.de/viewtopic.php?f=5&t=268> with the idea of saving vertical space. Please check in every available browser and comment whether it's better than before. ;)
fuchsi
2007-09-16 14:30:53 +00:00
404ebf1474# update of de.lng - NO unused strings anymore!!!
daburna
2007-09-16 10:17:22 +00:00
041922652a# update of de.lng - removed or updated unused strings - updated some files
daburna
2007-09-15 13:10:56 +00:00
ba59de773fagain and again junior - test
borg-0300
2007-09-13 17:05:53 +00:00
9fa75ef4d1Limit the percentage of the progress indicator to reasonable values
hermens
2007-09-13 16:37:23 +00:00
4275727d69fix for peer ping problem (implemented a 3-time re-ping); cause for 'Connection reset' still unknown
orbiter
2007-09-12 00:42:53 +00:00
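A bounded re-ping as mentioned in the commit above can be sketched as a simple retry loop. Whether "3-time re-ping" means three attempts in total or three retries after the first failure is not stated, so the attempt count is a parameter here; the class PingRetry is hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry helper; 'ping' returns true when the peer answered.
final class PingRetry {
    // Tries the ping up to maxAttempts times and stops at the first success.
    static boolean pingWithRetry(BooleanSupplier ping, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ping.getAsBoolean()) return true;
        }
        return false;
    }
}
```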
e78098be9bAccording to the HTML specs, "name" and "id" attributes share the same namespace. So we can't have one element with name="offset" and another one with id="offset". Additionally IE6's getElementById() returns elements with matching names as well and Opera is mimicking this behaviour.
fuchsi
2007-09-11 16:21:14 +00:00
07d1e98909fixed round-robin method of peer-ping order (the successfully pinged peer was not updated to current last-seen date)
orbiter
2007-09-11 16:07:35 +00:00
a1dcd065adsome tweaks to the search results layout
fuchsi
2007-09-11 15:56:14 +00:00
76e4c2d69efix for peer-ping in case that remote peer does not respond with valid values
orbiter
2007-09-11 15:27:01 +00:00
5b0c1449e1various fixes and cleanups for blacklist handling: 1. avoid adding duplicate file name entries in config properties for lists, 2. correctly merge all path masks from all list files for the same host masks, 3. rewrite helper methods to use standard java methods for Collection transformations, 4. merged various methods with identical functionality for different Collection implementations into one, 5. minor refactoring to improve code readability.
fuchsi
2007-09-10 06:20:27 +00:00
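Merging path masks from several list files under the same host mask (point 2 of the commit above) could be sketched as follows. The entry format "host/path", the handling of comment lines, and the class BlacklistMerge are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical merger; each list file is given as a list of "host/path" lines.
final class BlacklistMerge {
    // Collects all path masks per host mask across all files, without duplicates.
    static Map<String, Set<String>> merge(List<List<String>> files) {
        Map<String, Set<String>> byHost = new LinkedHashMap<>();
        for (List<String> file : files) {
            for (String line : file) {
                String entry = line.trim();
                int slash = entry.indexOf('/');
                // Skip blank lines, comments and entries without a path part.
                if (entry.isEmpty() || entry.startsWith("#") || slash < 0) continue;
                String host = entry.substring(0, slash);
                String path = entry.substring(slash + 1);
                byHost.computeIfAbsent(host, h -> new LinkedHashSet<>()).add(path);
            }
        }
        return byHost;
    }
}
```

Using a set per host means that an entry listed in two files is kept only once, while different path masks for the same host are all retained.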
e27aeb7fdcpatch for bad crawl filter at crawl start
orbiter
2007-09-09 19:21:41 +00:00