Commit Graph

  • d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset * fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092 and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550 f1ori 2008-11-14 20:49:16 +00:00
  • 8c96bc2ac1 do not use proxy caching rules for crawling orbiter 2008-11-14 16:31:04 +00:00
  • fd83e59f8e new remote search average lotus 2008-11-14 11:50:46 +00:00
  • dba7ef5144 extended crawling constraints: - removed never-used secondary crawl depth - added a must-not-match filter that can be used to exclude urls from a crawl - added stub for crawl tags which will be used to identify search results that had been produced from specific crawls. Please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'. Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty string (match-never) orbiter 2008-11-14 09:58:56 +00:00
  • 96174b2b56 more debugging / better result status logging for parser/caching errors orbiter 2008-11-13 23:41:43 +00:00
  • 84185baa81 added more test files for windows from lulabad orbiter 2008-11-13 23:17:30 +00:00
  • 73c44573e8 revert; used to hide memory tables and thread timings lotus 2008-11-13 21:02:18 +00:00
  • 8e1636e6d0 removed unused config-option danielr 2008-11-13 13:08:08 +00:00
  • 90e78b2cf6 * improve encoding detection of http service f1ori 2008-11-12 21:06:32 +00:00
  • 3246358485 mistake -> rename orbiter 2008-11-12 20:10:52 +00:00
  • 55ec57d27f added linux umlaute test files from low012 orbiter 2008-11-12 20:02:19 +00:00
  • 0ae84f4f8e set some default values for a crawl start that should cause less confusion and mistakes orbiter 2008-11-12 19:48:22 +00:00
  • e9262b3890 re-named old test files added more mac test files orbiter 2008-11-12 19:41:48 +00:00
  • ff2a54da68 added more umlaute test files: mac orbiter 2008-11-12 19:33:48 +00:00
  • 4745e89451 auto-choose crawl type lotus 2008-11-12 14:44:23 +00:00
  • 421d056550 *) changed layout of blacklist administration (less cluttered) *) it is possible to move/edit/delete more than one entry at a time now *) it is easier to choose a target for blacklist import now *) fixed several bugs *) to be continued... low012 2008-11-12 00:47:54 +00:00
  • ef66438662 - more space in error db to store larger error messages - added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files orbiter 2008-11-11 21:42:12 +00:00
  • 674ad2d55b different handling of error cases that occur during loading files with http or ftp: methods throw exception instead of returning an error string orbiter 2008-11-11 21:33:40 +00:00
  • 538359a0ff simple fix to get DHT working again (maybe something more has to be done ;) fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1578 danielr 2008-11-11 18:55:16 +00:00
  • 8ba0c9d1e9 cosmetics lotus 2008-11-10 15:35:05 +00:00
  • a94b3be80b - modern ui for Windows installer - installer has native system language now lotus 2008-11-10 15:33:54 +00:00
  • ae80f3e6a5 * extend opensearchdescription to support compare_yacy.html f1ori 2008-11-09 00:23:19 +00:00
  • 7e1fe05e3c * added utf8-encoding to many getBytes-calls * utf8 should work now f1ori 2008-11-08 20:24:31 +00:00
  • fad044fb54 update to snippet marker: - do not display indexed html (solves xss issues) the single words are analyzed for already marked parts. this is needed to avoid false encoding of the marker (<b>) tags. - improved speed for existing routine: heavily used regex patterns are precompiled now lotus 2008-11-08 10:08:53 +00:00
  • 16723d0fa6 ask another peer if crawljob loading fails lotus 2008-11-06 14:14:34 +00:00
  • 1b18d4bcf3 enhancement to crawling and remote crawling: - for redirector and remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue - when a request to a remote crawl provider fails, remove the peer from the network to prevent that the url fetcher gets stuck another time again orbiter 2008-11-06 12:30:55 +00:00
  • 3f746be5d4 - consolidation and refactoring of many DHT target - computing methods - implemented vertical DHT acceptance ("my own DHT") to accept new targets - added new target computation for global search: addresses vertical targets also - enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries) - better performance value computations for PPM selection in network configuration orbiter 2008-11-06 10:07:53 +00:00
  • d014b2728a Design-check, Extension and Refactoring of DHT target position computation: - two different computations (but mathematical equivalent) of the DHT distance had been consolidated - moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets - added fast Long - to - hash computation - high-precision target computation of gaps for new peers - added new target computation for horizontal and vertical DHT targets (not yet in use) - old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions orbiter 2008-11-03 00:27:23 +00:00
  • dd27ce7216 added control logic to ECO tables that deletes ram copies of the tables if they get too large; table copies in ram are now abandoned if less than 20 MB ram is left orbiter 2008-11-02 23:53:09 +00:00
  • 38e6ba5d00 forgot to re-rename commonsPath orbiter 2008-11-02 23:39:02 +00:00
  • 22989d0d8a added property index.storeCommons to switch commons storage on or off. With index.storeCommons=false all currently stored commons are deleted! Default is now 'true', but in future full releases it will be switched to 'false' orbiter 2008-11-02 23:30:09 +00:00
  • 4b4ce75396 * http-server: submit charset from html metatags f1ori 2008-11-01 23:17:51 +00:00
  • 69e695bd4b * detect charset for directory index f1ori 2008-11-01 22:14:31 +00:00
  • 340ecd919d * include non ascii characters in visible characters f1ori 2008-11-01 21:13:57 +00:00
  • 5cf0cbb47e javadoc lotus 2008-11-01 08:56:58 +00:00
  • 8d07607d1d update to resource observer: - returns high/medium/low disk space - pauses crawling on medium disk space - disables index receive on low disk space lotus 2008-10-31 11:33:17 +00:00
  • 83967f8c77 *) servlet does not forget chosen blacklist anymore when editing, moving or deleting an entry *) move or edit will only be performed if new value actually differs from old one low012 2008-10-30 00:03:14 +00:00
  • 04e41a392f *) fixed bug where RegExes were not deleted and even added to the list a second time when the user tried to edit them low012 2008-10-29 22:49:44 +00:00
  • d0543a7c39 * fix the debug ant-target * fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556) f1ori 2008-10-27 22:16:56 +00:00
  • 7bac4796d2 *) added servlet which returns all shared blacklists of a peer without information about which part of YaCy (crawler, proxy, ...) blacklist is activated for (to be used for better online import) low012 2008-10-27 17:33:43 +00:00
  • baae3d91b1 *) fixed warning when compiling listManager *) fixed display of values of information for which part of YaCy (crawler, proxy, ...) blacklist is activated for *) replaced regular put() with putXML() in several cases low012 2008-10-27 16:56:19 +00:00
  • 444575e33d *) prevent XSS when importing blacklist low012 2008-10-27 11:06:38 +00:00
  • a4fb76e93c undo r5300 (not fixed as seen after longer run) danielr 2008-10-25 23:20:09 +00:00
  • a99a629ed4 *) quick fix to prevent comments for blog entries which don't exist (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1554) low012 2008-10-25 12:04:10 +00:00
  • 00e27e5050 *) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist *) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this) *) further minor fixes *) to be continued... low012 2008-10-25 00:11:03 +00:00
  • 0f9c0bd0d5 fix for ConcurrentModificationException at de.anomic.index.indexContainerHeap$heapCacheIterator.next(indexContainerHeap.java:324) danielr 2008-10-24 14:00:41 +00:00
  • 103ad2a437 some javadoc danielr 2008-10-24 13:58:26 +00:00
  • b098522977 some very small advances to index utf-8 (not working yet), inserted also debugging code orbiter 2008-10-22 22:04:13 +00:00
  • 2f49666908 integrated the character decoding into the parser, removed old code orbiter 2008-10-22 20:56:13 +00:00
  • 49293c1358 fix for deadlock in new encoder :-( orbiter 2008-10-22 19:36:34 +00:00
  • 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html. The old process used a not really efficient way to detect html encoding strings in texts. All calling methods had been adapted to call the new class in an enhanced way with fewer parameters. orbiter 2008-10-22 18:59:04 +00:00
  • 958ec20cd0 removed specialized umlaute-handling in html parser. This has to be replaced by something that is able to transfer all possible html encodings into utf-8. Please see SVN 5293 for test cases. orbiter 2008-10-22 11:11:55 +00:00
  • 204220ecd5 added test files for UTF-8 / Umlaute - Testing: These 3 files contain the same text in different HTML encodings. We use these documents to test if the parser and indexer create the same set of word hashes for all three texts. orbiter 2008-10-22 11:07:14 +00:00
  • 2e53cbc66a should compile now f1ori 2008-10-22 08:50:30 +00:00
  • f3bf2e379e should compile again f1ori 2008-10-22 07:35:49 +00:00
  • dd8441f102 fix bug: data from plasmaParser is already converted to UTF-8. After removing the restrictions in the code, YaCy should be able to index Unicode characters! f1ori 2008-10-21 20:19:10 +00:00
  • 47f0c3b002 replaced the cacheAdmin with the ViewFile servlet, because the cacheAdmin was an interface to the old HTCACHE data structure which does not exist any more. Changed links to point to the ViewFile servlets. orbiter 2008-10-21 11:27:50 +00:00
  • 6941bf42b1 performance hacks orbiter 2008-10-20 14:07:09 +00:00
  • 9b0c4b1063 redesign of parts of the new BLOB buffer orbiter 2008-10-19 22:30:44 +00:00
  • 1778fb420d - added some performance tweaks to the new BLOB buffer - removed the now superfluous HT storage thread - reduced number of file decompression by shifting the compression moment to the future orbiter 2008-10-19 18:10:42 +00:00
  • 77e41da7d2 *) further propagation of display value (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1536) *) removed another deprecated parameter "time" which led to ugly -UNRESOLVED_PATTERN- in URL low012 2008-10-18 19:39:46 +00:00
  • 9663e61449 added another class to handle BLOB writings to the new HTCACHE data storage: - entries are buffered and written as stream with many entries at once (saves many IO accesses) - entries are compressed with gzip: increases capacity of cache - concurrency for stream-writing and compression: all writings to the cache are non-blocking orbiter 2008-10-18 08:57:48 +00:00
  • ff46ce8520 *) fixed display=2 (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1536) low012 2008-10-17 19:57:39 +00:00
  • 382226da94 fix for bug introduced in SVN 5281: parameters were switched orbiter 2008-10-17 12:14:57 +00:00
  • f2fd043797 refactoring (moved duplicate code into methods) danielr 2008-10-17 10:39:32 +00:00
  • c612046e5e r5278 java 1.5 compatible danielr 2008-10-17 09:59:59 +00:00
  • af71ec93bf oops, forgot to import something f1ori 2008-10-16 22:44:25 +00:00
  • 9e65e9141c * always use UTF-8 for encoding hashes f1ori 2008-10-16 22:35:27 +00:00
  • 03d2b323c2 * remove bad mimetype tag so I see my changes f1ori 2008-10-16 22:31:38 +00:00
  • 826ca79735 refactoring and new architecture to store the files of the web cache: - files are not stored any more as individual files - a new database structure using BLOBHeap files stores many cache entries in common files - all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods orbiter 2008-10-16 21:24:09 +00:00
  • 2b18a9b2c4 *) removed deprecated parameter "time" which led to ugly -UNRESOLVED_PATTERN- in URL low012 2008-10-16 19:31:29 +00:00
  • f095137238 - respecting httpdMaxBusySessions (refusing new connections if limit is hit) - comments in serverBusyThread converted to JavaDoc - better debug output for npe-case in diskUsage danielr 2008-10-16 10:53:32 +00:00
  • 94e43ece41 on debian, start yacy (almost) last and stop it first danielr 2008-10-14 21:53:25 +00:00
  • 9d69964d3d start daemon in different runlevels on debian and fedora/openSUSE-Systems f1ori 2008-10-14 15:28:24 +00:00
  • 8ba33f104e fix for npe orbiter 2008-10-13 21:59:53 +00:00
  • 998861acfd - some refactoring in BLOBHeap to enable more gap processing functions - better gap merging in BLOBHeap - shrinking of heap file if gap is at end of file when file is closed orbiter 2008-10-13 21:15:54 +00:00
  • 9d50bfd0b3 fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562 lotus 2008-10-13 09:09:53 +00:00
  • 766cad6e93 enhancement in memory management of BLOB Heap files / merging of deleted entries orbiter 2008-10-12 22:15:01 +00:00
  • 7860d5d632 fix for bug in seed list management (cause was bad class overloading, only visual effects!) orbiter 2008-10-12 19:51:53 +00:00
  • 603282bcf4 fix for out of bounds exception lotus 2008-10-11 07:47:34 +00:00
  • ffed5fc415 fixed problem with lost peers in database migrated seedDB from BLOBTree to BLOBHeap orbiter 2008-10-10 14:40:02 +00:00
  • 6fb865fbdc - fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing - some refactoring of classes that use kelondroMap (Map instead of HashMap) orbiter 2008-10-10 08:39:11 +00:00
  • 2d65887723 - fix for bug in new profile handling - added a new feature in ymageChart (cannot be seen yet, just wait... will be used in profiling chart) orbiter 2008-10-09 22:31:43 +00:00
  • 4df63626f5 sorry lotus 2008-10-09 13:51:43 +00:00
  • 736dd86193 - option enableSimpleConfig can disable hidden tables - corrected some Xmx values - friendlier welcome message format lotus 2008-10-09 12:48:43 +00:00
  • ff68f394dd fix for problem with balancer and lost crawl profiles: if crawl profile is lost, no robots.txt is loaded any more orbiter 2008-10-08 18:26:36 +00:00
  • d197a62faf * wrong default parameter in initscript * should fix http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1500 f1ori 2008-10-08 10:54:25 +00:00
  • f0c7166a48 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1505 danielr 2008-10-08 09:48:04 +00:00
  • 88fdd4c4f2 * remove unix wrappers from installonlinux-target, because they are useless on systemwide installations f1ori 2008-10-05 22:13:09 +00:00
  • 9286afad4e new development cycle - for real this time :-P daburna 2008-10-05 22:08:14 +00:00
  • ed51ccf725 * correct error message f1ori 2008-10-05 21:34:13 +00:00
  • fe6142a6ab *) ...and even more languages. low012 2008-10-05 20:20:21 +00:00
  • 3717d2057a YaCy-UI: fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1483 apfelmaennchen 2008-10-05 18:50:43 +00:00
  • 9f0cc3afdd *) more languages low012 2008-10-05 12:28:40 +00:00
  • e96a3d0472 *) added statistics for a few languages: Czech, Esperanto, Irish, Turkish low012 2008-10-05 10:07:22 +00:00
  • fb8d9850ea fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1462 lotus 2008-10-05 10:03:02 +00:00
  • 0d1a2f6183 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1461 lotus 2008-10-04 12:36:11 +00:00
  • d0bdcdd57c small changes to attributes of DoS attack protection parameters orbiter 2008-10-03 19:44:42 +00:00
  • 9ac16f565b - fixed several bugs in database management functions - fixed a display bug for the performance graph - fixed deadlock when initialization of awt happens simultaneously - removed some debugging output orbiter 2008-10-03 18:57:02 +00:00
  • 1976b89bef new development cycle daburna 2008-10-03 16:10:40 +00:00
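
Note on dba7ef5144 (crawl filters): the commit message describes a 'mustmatch' filter and a 'mustnotmatch' filter whose default is the empty string, meaning "match-never". The following is only a minimal sketch of how such a pair of filters could behave, not YaCy's actual implementation. The function name `url_accepted`, the default `".*"` for `mustmatch`, and the use of full-string regex matching are all assumptions made for illustration.

```python
import re


def url_accepted(url: str, mustmatch: str = ".*", mustnotmatch: str = "") -> bool:
    """Hypothetical helper: True if the URL passes both crawl filters.

    Both patterns are applied as full-string matches (an assumption).
    """
    # The URL has to match the must-match pattern completely.
    if re.fullmatch(mustmatch, url) is None:
        return False
    # An empty pattern full-matches only the empty string, so with the
    # default "" no real URL is ever excluded ("match-never").
    return re.fullmatch(mustnotmatch, url) is None
```

With the defaults every non-empty URL is accepted; passing something like `mustnotmatch=r".*/private/.*"` would exclude URLs containing a `/private/` path segment from the crawl.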