Commit Graph

  • 13cb0916ee changes to statistics and content of thread dump servlet (points now more directly to performance leaks without mentioning class calls inside of sun/java calls that cannot be changed anyway) orbiter 2008-12-11 20:13:14 +00:00
  • db6b3bf5a3 speed enhancement for integrated http server: - tuning hacks in template engine - bypassing the template engine if no servlet present orbiter 2008-12-11 20:10:37 +00:00
  • 7cd08bd5fb fix for NPE in BLOBCompressor orbiter 2008-12-11 13:33:24 +00:00
  • 5b94498643 fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method orbiter 2008-12-10 14:35:01 +00:00
  • 1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile - added a write buffer to BLOBHeap - modified the BLOBBuffer (is now only to buffer non-compressed content) - added content compression to the HTCache The new read cache will decrease the start/initialization time of BLOB files, like the HTCache, RobotsTxt and other BLOBHeap structures. orbiter 2008-12-10 11:15:19 +00:00
  • e1acdb952c fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376 orbiter 2008-12-08 00:17:45 +00:00
  • 2c682d649b - no stop shortcut (-> stop via tray) - store registry keys on current profile lotus 2008-12-07 19:37:49 +00:00
  • e918d64c23 show hand-cursor on labels lotus 2008-12-06 17:32:53 +00:00
  • 4a2dac659e more speed hacks: - modified and activated write buffer - increased cache flush factor - fixed a problem with deadlocking of indexing process orbiter 2008-12-05 13:55:48 +00:00
  • 07d7653de1 update to JRE 6u11 lotus 2008-12-05 11:23:01 +00:00
  • 1fb518a5b4 display <String> etc. lotus 2008-12-04 20:21:53 +00:00
  • 47292e696a more performance hacks orbiter 2008-12-04 12:54:16 +00:00
  • 759cef23dd fix for bug in kelondroAbstractRA.readFully orbiter 2008-12-03 23:32:07 +00:00
  • bd1dc9cd5d thread dump with statistics, a little bit of profiling orbiter 2008-12-03 23:26:25 +00:00
  • d39d420b39 performance hacks orbiter 2008-12-03 15:38:29 +00:00
  • 5280ad638d added basic performance page; other performance settings can be found on advanced settings lotus 2008-12-03 14:10:01 +00:00
  • 1a51d9fcfd display proper values lotus 2008-12-02 17:57:30 +00:00
  • 0b4808ba3d added new interactive search feature: - while the user types search queries, the local database is searched - results are presented interactively orbiter 2008-12-02 15:24:25 +00:00
  • 74a3d86114 fixed an error response that might present classified information orbiter 2008-12-01 23:14:42 +00:00
  • c6525ab75f fix for NPE in seed handling orbiter 2008-12-01 23:08:27 +00:00
  • fea82b54ef more contrast on search snippets lotus 2008-11-26 19:57:13 +00:00
  • 1951d30a62 addendum to last commit handle words with length < 3 correctly lotus 2008-11-26 19:43:40 +00:00
  • 325ba7bfb8 only query words with length > 2 this is not complete, yet lotus 2008-11-26 16:41:38 +00:00
  • 489edb4473 improved pattern selection lotus 2008-11-26 10:06:38 +00:00
  • e423fa9846 *) added method to only get file names in directory listing which match a filter *) only files which end with .black will be listed as blacklists *) added a little bit of Javadoc low012 2008-11-25 20:26:06 +00:00
  • 577b53aee6 added more search engines lotus 2008-11-24 13:05:20 +00:00
  • 7f4d411c0d npe-fix lotus 2008-11-24 13:04:57 +00:00
  • 513179f404 changed interface to colletctionIndex and adapted all implementing classes: do not return a result of a double-check when adding entries with addUnique orbiter 2008-11-23 23:55:08 +00:00
  • 9d64693cfb reverting again the changes to new concurrent chunkIterator orbiter 2008-11-23 22:22:44 +00:00
  • 45ad1c3dd5 - re-activated concurrent iterator for EcoFiles - added javadoc for new concurrent initialization in kelondroBytesLongMap - switched default value for commons storage to false - version step orbiter 2008-11-23 18:25:40 +00:00
  • 2e2120046f speed enhancement for BLOBHeap opening process using concurrency of FileIO and content processing orbiter 2008-11-23 17:38:01 +00:00
  • 1545e5440a * index deletion: checkbox-confirmation * watch crawler: less load on exhausted peers; wait for data before reloading again lotus 2008-11-23 12:02:58 +00:00
  • fa26a8f25a fix for deadlock-like behavior in balancer orbiter 2008-11-22 11:25:01 +00:00
  • 1918a0173e added more exception handling during crawling orbiter 2008-11-22 00:40:18 +00:00
  • 10f5ec1040 reverted last commit (more testing needed) orbiter 2008-11-22 00:12:50 +00:00
  • 5af8923f37 * distribute forgotten jar-file in parser f1ori 2008-11-22 00:05:04 +00:00
  • b0f2003792 fast database initialization and fast start-up of yacy: - applied knowledge about concurrent file stream reading and index processing from the wikimedia reader to the EcoTable initialization process: the file reader now runs concurrently with the index generation - also changed some initialization processes to avoid some pauses during initialization orbiter 2008-11-21 23:21:33 +00:00
  • ba5b274b8c #translation update: -blacklist -crawlstart ... daburna 2008-11-21 16:45:45 +00:00
  • 0ca4bc7b79 - added reader and visualization for mediawiki-export files: files exported from mediawiki using the xml schema according to http://www.mediawiki.org/xml/export-0.3/ can be processed to be viewed in a YaCy servlet. To access such a file, place it into DATA/HTCACHE/mediawiki/ e.g. the export from German Wikipedia would be: DATA/HTCACHE/mediawiki/wikipedia.de.xml This file can then be accessed using the URL http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy If this is done the first time, an index file is created (for this case: more than 4 million lines must be written, this takes about 15 minutes). Then try the same URL again. orbiter 2008-11-20 18:31:52 +00:00
  • 2e63f03ca5 forgot to copy&paste :/ danielr 2008-11-20 11:41:11 +00:00
  • cd8082b4e3 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166 danielr 2008-11-20 11:18:19 +00:00
  • 4f996a7651 fix for logparser pattern lotus 2008-11-17 16:23:17 +00:00
  • d18c18971e * dirlisting in UTF-8 encoding * fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550&hilit=#p11108 f1ori 2008-11-15 20:49:03 +00:00
  • bb570716e6 added more testfiles lotus 2008-11-15 09:00:24 +00:00
  • 867d0f2f56 removed some unnecessary pause delays orbiter 2008-11-14 23:36:33 +00:00
  • d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset * fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092 and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550 f1ori 2008-11-14 20:49:16 +00:00
  • 8c96bc2ac1 do not use proxy caching rules for crawling orbiter 2008-11-14 16:31:04 +00:00
  • fd83e59f8e new remote search average lotus 2008-11-14 11:50:46 +00:00
  • dba7ef5144 extended crawling constraints: - removed never-used secondary crawl depth - added a must-not-match filter that can be used to exclude urls from a crawl - added stub for crawl tags which will be used to identify search results that had been produced from specific crawls please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'. Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty string (match-never) orbiter 2008-11-14 09:58:56 +00:00
  • 96174b2b56 more debugging / better result status logging for parser/caching errors orbiter 2008-11-13 23:41:43 +00:00
  • 84185baa81 added more test files for windows from lulabad orbiter 2008-11-13 23:17:30 +00:00
  • 73c44573e8 revert; used to hide memory tables and thread timings lotus 2008-11-13 21:02:18 +00:00
  • 8e1636e6d0 removed unused config-option danielr 2008-11-13 13:08:08 +00:00
  • 90e78b2cf6 * improve encoding detection of http service f1ori 2008-11-12 21:06:32 +00:00
  • 3246358485 mistake -> rename orbiter 2008-11-12 20:10:52 +00:00
  • 55ec57d27f added linux umlaute test files from low012 orbiter 2008-11-12 20:02:19 +00:00
  • 0ae84f4f8e set some default values for a crawl start that should cause less confusion and mistakes orbiter 2008-11-12 19:48:22 +00:00
  • e9262b3890 re-named old test files added more mac test files orbiter 2008-11-12 19:41:48 +00:00
  • ff2a54da68 added more umlaute test files: mac orbiter 2008-11-12 19:33:48 +00:00
  • 4745e89451 auto-choose crawl type lotus 2008-11-12 14:44:23 +00:00
  • 421d056550 *) changed layout of blacklist administration (less cluttered) *) it is possible to move/edit/delete more than one entry at a time now *) it is easier to choose a target for blacklist import now *) fixed several bugs *) to be continued... low012 2008-11-12 00:47:54 +00:00
  • ef66438662 - more space in error db to store larger error messages - added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files orbiter 2008-11-11 21:42:12 +00:00
  • 674ad2d55b different handling of error cases that occur during loading files with http or ftp: methods throw exception instead of returning an error string orbiter 2008-11-11 21:33:40 +00:00
  • 538359a0ff simple fix to get DHT working again (maybe something more has to be done ;) fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1578 danielr 2008-11-11 18:55:16 +00:00
  • 8ba0c9d1e9 cosmetics lotus 2008-11-10 15:35:05 +00:00
  • a94b3be80b - modern ui for Windows installer - installer has native system language now lotus 2008-11-10 15:33:54 +00:00
  • ae80f3e6a5 * extend opensearchdescription to support compare_yacy.html f1ori 2008-11-09 00:23:19 +00:00
  • 7e1fe05e3c * added utf8-encoding to many getBytes-calls * utf8 should work now f1ori 2008-11-08 20:24:31 +00:00
  • fad044fb54 update to snippet marker: - do not display indexed html (solves xss issues) the single words are analyzed for already marked parts. this is needed to avoid false encoding of the marker (<b>) tags. - improved speed for existing routine: heavily used regex patterns are precompiled now lotus 2008-11-08 10:08:53 +00:00
  • 16723d0fa6 ask another peer if crawljob loading fails lotus 2008-11-06 14:14:34 +00:00
  • 1b18d4bcf3 enhancement to crawling and remote crawling: - for redirector and remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue - when a request to a remote crawl provider fails, remove the peer from the network to prevent the url fetcher from getting stuck again orbiter 2008-11-06 12:30:55 +00:00
  • 3f746be5d4 - consolidation and refactoring of many DHT target - computing methods - implemented vertical DHT acceptance ("my own DHT") to accept new targets - added new target computation for global search: addresses vertical targets also - enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries) - better performance value computations for PPM selection in network configuration orbiter 2008-11-06 10:07:53 +00:00
  • d014b2728a Design-check, Extension and Refactoring of DHT target position computation: - two different (but mathematically equivalent) computations of the DHT distance were consolidated - moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets - added fast Long - to - hash computation - high-precision target computation of gaps for new peers - added new target computation for horizontal and vertical DHT targets (not yet in use) - old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions orbiter 2008-11-03 00:27:23 +00:00
  • dd27ce7216 added control logic to ECO tables that deletes ram copies of the tables if they get too large table copies in ram are now abandoned if less than 20 MB ram is left orbiter 2008-11-02 23:53:09 +00:00
  • 38e6ba5d00 forgot to re-rename commonsPath orbiter 2008-11-02 23:39:02 +00:00
  • 22989d0d8a added property index.storeCommons to switch commons storage on or off with index.storeCommons=false all currently stored commons are deleted! Default is now 'true', but in future full releases it will be switched to 'false' orbiter 2008-11-02 23:30:09 +00:00
  • 4b4ce75396 * http-server: submit charset from html metatags f1ori 2008-11-01 23:17:51 +00:00
  • 69e695bd4b * detect charset for directory index f1ori 2008-11-01 22:14:31 +00:00
  • 340ecd919d * include non ascii characters in visible characters f1ori 2008-11-01 21:13:57 +00:00
  • 5cf0cbb47e javadoc lotus 2008-11-01 08:56:58 +00:00
  • 8d07607d1d update to resource observer: - returns high/medium/low disk space - pauses crawling on medium disk space - disables index receive on low disk space lotus 2008-10-31 11:33:17 +00:00
  • 83967f8c77 *) servlet does not forget chosen blacklist anymore when editing, moving or deleting an entry *) move or edit will only be performed if new value actually differs from old one low012 2008-10-30 00:03:14 +00:00
  • 04e41a392f *) fixed bug where RegExes were not deleted and even added to the list a second time when the user tried to edit them low012 2008-10-29 22:49:44 +00:00
  • d0543a7c39 * fix the debug ant-target * fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556) f1ori 2008-10-27 22:16:56 +00:00
  • 7bac4796d2 *) added servlet which returns all shared blacklists of a peer without information about which part of YaCy (crawler, proxy, ...) blacklist is activated for (to be used for better online import) low012 2008-10-27 17:33:43 +00:00
  • baae3d91b1 *) fixed warning when compiling listManager *) fixed display of values of information for which part of YaCy (crawler, proxy, ...) blacklist is activated for *) replaced regular put() with putXML() in several cases low012 2008-10-27 16:56:19 +00:00
  • 444575e33d *) prevent XSS when importing blacklist low012 2008-10-27 11:06:38 +00:00
  • a4fb76e93c undo r5300 (not fixed as seen after longer run) danielr 2008-10-25 23:20:09 +00:00
  • a99a629ed4 *) quick fix to prevent comments for blog entries which don't exist (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1554) low012 2008-10-25 12:04:10 +00:00
  • 00e27e5050 *) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist *) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this) *) further minor fixes *) to be continued... low012 2008-10-25 00:11:03 +00:00
  • 0f9c0bd0d5 fix for ConcurrentModificationException at de.anomic.index.indexContainerHeap$heapCacheIterator.next(indexContainerHeap.java:324) danielr 2008-10-24 14:00:41 +00:00
  • 103ad2a437 some javadoc danielr 2008-10-24 13:58:26 +00:00
  • b098522977 some very small advances to index utf-8 (not working yet), inserted also debugging code orbiter 2008-10-22 22:04:13 +00:00
  • 2f49666908 integrated the character decoding into the parser, removed old code orbiter 2008-10-22 20:56:13 +00:00
  • 49293c1358 fix for deadlock in new encoder :-( orbiter 2008-10-22 19:36:34 +00:00
  • 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html. The old process used an inefficient way to detect html encoding strings in texts. All calling methods were adapted to call the new class in an enhanced way with fewer parameters. orbiter 2008-10-22 18:59:04 +00:00
  • 958ec20cd0 removed specialized umlaute-handling in html parser. This has to be replaced by something that is able to transfer all possible html encodings into utf-8. Please see SVN 5293 for test cases. orbiter 2008-10-22 11:11:55 +00:00
  • 204220ecd5 added test files for UTF-8 / Umlaute - Testing: These 3 files contain the same text in different HTML encodings. We use these documents to test if the parser and indexer create the same set of word hashes for all three texts. orbiter 2008-10-22 11:07:14 +00:00
  • 2e53cbc66a should compile now f1ori 2008-10-22 08:50:30 +00:00
  • f3bf2e379e should compile again f1ori 2008-10-22 07:35:49 +00:00
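
Note on dba7ef5144: the 'mustmatch' / 'mustnotmatch' crawl filters described in that commit message can be illustrated with a minimal sketch. The class name CrawlFilterSketch and the method accepts are hypothetical and are not taken from the YaCy sources; the sketch also assumes that both filters are regular expressions applied to the whole URL, which the commit message does not state explicitly. Under that assumption an empty 'mustnotmatch' pattern matches no non-empty URL, which fits the "match-never" default mentioned in the message.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not YaCy's actual implementation: it only
// illustrates the 'mustmatch' / 'mustnotmatch' idea from the commit message.
class CrawlFilterSketch {

    // Assumption: both filters are regular expressions that must match
    // the whole URL. A URL is accepted when it matches mustMatch and
    // does not match mustNotMatch.
    static boolean accepts(String url, String mustMatch, String mustNotMatch) {
        boolean included = Pattern.matches(mustMatch, url);
        boolean excluded = Pattern.matches(mustNotMatch, url);
        return included && !excluded;
    }

    public static void main(String[] args) {
        // Empty mustnotmatch: the empty pattern never matches a non-empty URL.
        System.out.println(accepts("http://example.org/index.html", ".*", ""));
        // A mustnotmatch filter excludes everything below /private/.
        System.out.println(accepts("http://example.org/private/a.html", ".*", ".*/private/.*"));
    }
}
```

With these inputs the first call is accepted and the second is rejected. Whether the real crawler uses full-match or find-style matching is not stated in the log, so the choice of Pattern.matches here is an illustration only.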