Commit Graph

  • a9e2acd6c0 *) new dark skin by KoH theli 2006-10-13 11:20:39 +00:00
  • c628df43a4 *) removed unused image-file auron_x 2006-10-13 09:23:55 +00:00
  • c6d46f7ebd null pointer bugfix orbiter 2006-10-13 08:03:11 +00:00
  • decb09df6d *) Trying to be more tolerant against wrong charset names theli 2006-10-13 05:30:20 +00:00
  • e9afe39cbb *) Trying to be more tolerant against wrong charset names See: http://www.yacy-forum.de/viewtopic.php?p=26662 theli 2006-10-13 05:08:56 +00:00
  • 7526c831a8 *) Suppressing stack trace theli 2006-10-13 04:34:49 +00:00
  • 50f2578c55 - some bugfixing and code cleanup - now assortments can be completely left out if they do not exist before startup and collection index is selected. orbiter 2006-10-13 01:19:26 +00:00
  • bdf4c7c51e added missing files for last commit orbiter 2006-10-12 23:17:16 +00:00
  • a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL; the new LURL will be split into different tables for each month. This solves several problems: - the biggest table in YaCy is split into different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl and deleted - The complete database can be limited to a specific size (as wanted many times) orbiter 2006-10-12 23:14:41 +00:00
  • 130cc76927 loop detection and termination in deletedHandles method see also: http://www.yacy-forum.de/viewtopic.php?p=26655#26655 orbiter 2006-10-12 19:50:09 +00:00
  • 1c4076da8a First version of the MS Powerpoint parser based on Apache POI octoate 2006-10-12 17:28:53 +00:00
  • 1cd9e9aad4 First version of the MS Powerpoint parser based on Apache POI octoate 2006-10-12 17:28:33 +00:00
  • e126598a0f *) small enhancement to webinterface, progressbars are now not stretched images, but <div>'s with colored background -> all skin files were set to use green progressbars (should be changed to colors fitting the skins' appearance) auron_x 2006-10-12 17:10:28 +00:00
  • 5b75d64d7d *) bugfix for last commit theli 2006-10-12 09:39:25 +00:00
  • 71ed104bc7 *) adding additional rpm mimetype (used by packman) theli 2006-10-12 09:32:24 +00:00
  • 76d959122b new constants, finals, Stringbuffer, cleanup borg-0300 2006-10-11 22:23:48 +00:00
  • 581dd2ec72 *)Proper arrow-function on Network.html, but ordering is still broken. Perhaps someone could fix that? *)Removed double creation of DATA directory. New warning message in case of insufficient rights. *) Removed roland-ramthun.de-seedlist temporarily, because of server changes rramthun 2006-10-11 18:27:38 +00:00
  • 6396f5971e bugfixes and migration attempt toward new kelondroFlex db - more synchronization - bugfix for remove in collections - bugfix in kelondroFlex (wrong exception condition!) - options to use RAM, FLEX and TREE tables for Crawl URL stacker - default for Crawl URL stacker is now FLEX (!) orbiter 2006-10-11 00:46:45 +00:00
  • 48f81acc0e reverse SVN 2744, it is not needed (this resulted from a small misunderstanding of the newest cache layout) hermens 2006-10-10 22:02:23 +00:00
  • 1da9aece12 Repair DNS prefetch during cacheScan hermens 2006-10-10 21:34:27 +00:00
  • 97e24b63c7 release 0.48 orbiter 2006-10-10 20:24:23 +00:00
  • 918b59dc5e - bugfix for snippet profile (no delete button) - bugfix for search process (avoided null pointer exception in case other peer does not respond) orbiter 2006-10-10 20:16:20 +00:00
  • 2bb529cedb added peer tags for peers in robinson mode orbiter 2006-10-10 20:09:26 +00:00
  • f7447894f1 *) fixed link to WatchCrawler_p.html low012 2006-10-10 12:39:29 +00:00
  • afbb547f3d extended options for abstracts generation in remote search interface orbiter 2006-10-10 12:22:16 +00:00
  • 3730ec3440 moving to a _p page. allo 2006-10-10 10:31:21 +00:00
  • 22649408ad *) Better errorhandling for charset encoding problem during content parsing See: http://www.yacy-forum.de/viewtopic.php?t=2952 theli 2006-10-10 10:14:03 +00:00
  • 89ee215ff0 *) better detection of svn revision number in old xml format theli 2006-10-10 09:32:05 +00:00
  • a9c7e3f061 *) Bugfix for NoSuchElementException theli 2006-10-10 08:39:27 +00:00
  • f25f61d9d3 documentation of compile problem. See http://www.yacy-forum.de/viewtopic.php?p=26407#26407 orbiter 2006-10-09 23:11:03 +00:00
  • c8f3a7d363 added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles orbiter 2006-10-09 23:07:10 +00:00
  • 2cfd4633ac *) even better handling of searchwords in snippets, words can consist of letters and numbers now low012 2006-10-09 21:08:13 +00:00
  • b062847797 fix for http://www.yacy-forum.de/viewtopic.php?p=26439#26439 orbiter 2006-10-09 20:07:42 +00:00
  • e17fea7015 files in htcache are now stored in different hash/tree subdirectories according to storage method orbiter 2006-10-09 18:18:49 +00:00
  • 661f005214 fix for seed upload build script orbiter 2006-10-09 15:22:20 +00:00
  • 2d3b7251a4 *) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details) low012 2006-10-09 13:37:38 +00:00
  • ddf8f220f6 fix for build fail orbiter 2006-10-09 11:34:30 +00:00
  • 2e4aa6a170 refactoring of Advanced Config: - removed settings that are in Basic Settings - joined pages that belong together - moved include pages from yacy/ to / orbiter 2006-10-09 10:24:54 +00:00
  • 25ae3d3161 generalized definition of hexhash orbiter 2006-10-09 10:07:07 +00:00
  • 86047f439d removed very bad bug that prevented production of any remote search result :-((( Please update! orbiter 2006-10-09 04:04:00 +00:00
  • f0d747c723 removed deprecated method orbiter 2006-10-09 02:47:37 +00:00
  • 5ff77612ac bugfix for old WORDS storage method orbiter 2006-10-09 02:20:27 +00:00
  • 0f10bdde22 more generic cache methods orbiter 2006-10-09 02:13:13 +00:00
  • 72482b1426 fixed scraper orbiter 2006-10-09 01:24:01 +00:00
  • 6557112d8f small fix for plasmaURLPool.getURL() needed for new alternative htcache layout hermens 2006-10-08 17:32:01 +00:00
  • 440c6ee657 Implement alternative htcache layout mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205 hermens 2006-10-08 17:25:19 +00:00
  • 226f2c5b2c first version of the Servlet Debugger allo 2006-10-08 14:25:54 +00:00
  • adf1f74ab2 bugfix for java 1.5 compile problem with serverCharBuffer.append(char) orbiter 2006-10-08 10:35:35 +00:00
  • fd61209797 lines inside tags without punctuation are extended by a single dot. This enables the condenser to distinguish the lines in a better way. The result is a better preparation of snippets. orbiter 2006-10-08 01:24:00 +00:00
  • e25172853a fixed license notice allo 2006-10-07 22:25:05 +00:00
  • 1d0c0edda3 first version of posts/get from the del.icio.us api allo 2006-10-07 22:16:09 +00:00
  • 1969522dc1 removed lowercase of snippets (and other things): - added new sentence parser to condenser - sentence parsing can now handle charsets orbiter 2006-10-07 00:06:09 +00:00
  • 43614f1b36 bugfix in collection index. The index for collections was not created correctly. The bugfix includes a migration function which starts automatically after startup of yacy. This applies to you only if you are using the new collection index. orbiter 2006-10-05 23:47:08 +00:00
  • 07155ef3b0 *) added a few constraints to prevent exceptions when clicking on stop or pause on IndexCleaner_p.html when no thread is started low012 2006-10-05 21:32:07 +00:00
  • 1dfab1abe3 more control for seed receive orbiter 2006-10-04 08:55:01 +00:00
  • 1c0e65f55f *) Bugfix for problems with charset detection See: http://www.yacy-forum.de/viewtopic.php?p=26196 theli 2006-10-04 04:54:21 +00:00
  • db294687ea enhanced logging - more logging output - fix in log line preparation - added filter to log page - some small bugfixes orbiter 2006-10-03 22:55:59 +00:00
  • 08aa9d4c07 duplicate removes borg-0300 2006-10-03 19:55:28 +00:00
  • a9a0f51303 *) suppressing InterruptedException errormessage See: http://www.yacy-forum.de/viewtopic.php?t=2915 theli 2006-10-03 15:40:18 +00:00
  • ce7ee74316 *) better errorhandling in filehandler (try catch block now starts before argument parsing) theli 2006-10-03 14:21:46 +00:00
  • 1d4fb680ce *) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB TODO: make this limit configurable theli 2006-10-03 12:16:25 +00:00
  • 1586d57187 *) odtParser: better handling of large files theli 2006-10-03 12:00:26 +00:00
  • f17ce28b6d *) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file *) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested *) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) *) plasmaSnippetCache - snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO *) plasmaCondenser: adding new function getWords that can directly operate on input streams *) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from stream the content length must be passed to the parser function now. This length value is needed by the parsers to decide if the parsed resource content is too large to hold in memory and must be stored to file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers theli 2006-10-03 11:05:48 +00:00
  • 630a955674 read snippets from cache in case they are not provided in RAM orbiter 2006-10-02 17:18:24 +00:00
  • b114def2f8 duplicate classpath entry allo 2006-10-02 16:09:54 +00:00
  • 2ab09e71a7 removing absolute Classpaths allo 2006-10-02 16:07:52 +00:00
  • a723c2809d -t(aillog) option, to start monitoring the log after startup. So you see the log, but can stop viewing it with ctrl+c, without stopping yacy. allo 2006-10-02 16:00:04 +00:00
  • fda7031991 further cleanup allo 2006-10-02 15:52:20 +00:00
  • f0ed7f43c4 more sh (i.e. /bin/dash instead of /bin/bash as sh) compatibility allo 2006-10-02 15:29:52 +00:00
  • bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication orbiter 2006-10-02 13:35:38 +00:00
  • c40fca08a2 fixed bad handling of string separation; you can now use a new encoding attribute to create strings from byte arrays orbiter 2006-10-02 10:21:14 +00:00
  • 5a40ea7866 refactoring of wget string list generation orbiter 2006-10-02 09:59:20 +00:00
  • dbc2e039bb added time-out option parameter to call hierarchy orbiter 2006-10-02 09:40:18 +00:00
  • b59d4576af increased version number to emphasise that the snippet fix _dramatically_ increased search speed orbiter 2006-10-02 01:50:57 +00:00
  • d4c239e4be - fixed problem in collection index with deletion of single url references - added automatic deletion of not-found snippets after search orbiter 2006-10-02 01:40:52 +00:00
  • 00746ca232 identified and fixed search performance problem caused by snippet loading. In some cases the header-db was accessed twice or even more often. Snippet resource loading fixed. Furthermore the snippet loading during remote search within the remote peer has been disabled, but can be switched on remotely by new flag 'includesnippet=true' orbiter 2006-10-02 01:15:02 +00:00
  • 4d9e1b43dd surftipps appearance update orbiter 2006-10-02 00:13:59 +00:00
  • b033a80750 better control of failure in node seek of kelondroTree orbiter 2006-10-02 00:13:19 +00:00
  • ca8ef0ca9f *)Documented the lng-file format *)Updated language files to the new standard, especially German *)Wrote language highlighting definition for Notepad++ *)Corrected News.html rramthun 2006-10-01 12:23:35 +00:00
  • 310f1c41cd added option to see ranking scores in surftipps and some cleanups orbiter 2006-09-30 23:28:03 +00:00
  • 7c0e6de366 bugfix for surftipps votes (wrong page) orbiter 2006-09-30 23:06:38 +00:00
  • 3ad0709b53 added a delete button to crawl profile list. orbiter 2006-09-30 22:35:59 +00:00
  • 971bfc6f15 added ChangeLog based on Roland's Newsletter. allo 2006-09-30 11:07:32 +00:00
  • a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls theli 2006-09-30 10:09:01 +00:00
  • cd5f349666 *) Better handling of large files during parsing Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory *) plasmaParserDocument.java: getText now returns an inputStream instead of a byte array *) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array Attention: the caller of this function has to ensure that enough memory is available to do this to avoid OutOfMemory Exceptions *) httpd.java: better error handling if the soap handler is not installed *) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents *) rtfParser.java: Bugfix for UTF-8 support *) tarParser.java: better handling of large documents *) zipParser.java: better handling of large documents *) plasmaCrawlEURL.java: new errorcode for encrypted documents *) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file theli 2006-09-30 09:31:53 +00:00
  • 8b2ceddb91 *) Displaying severe and warning logging messages in different colors on ViewLog_p.html theli 2006-09-30 08:12:22 +00:00
  • f8ac694e51 *) fixed a bug where searchwords in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998) low012 2006-09-30 00:27:42 +00:00
  • df1629b05a - code cleanup - version 0.471 - moved surftipps to own web page orbiter 2006-09-29 22:27:20 +00:00
  • b78d171b1e Windows installer allo 2006-09-29 21:13:56 +00:00
  • c665f6cddb *) handling of quotes in charset string theli 2006-09-28 06:29:15 +00:00
  • b73efd5565 *) missing changes needed because of last commit theli 2006-09-28 05:48:28 +00:00
  • 65c1f13d11 *) migration to newer odt parser lib theli 2006-09-28 04:47:39 +00:00
  • 140ddba93f *) adding soap functions to pause and resume the crawler theli 2006-09-27 05:22:43 +00:00
  • ed8227d222 *) Bugfix for NullPointerException in IndexCreateIndexingQueue_p.java See: http://www.yacy-forum.de/viewtopic.php?p=25874 theli 2006-09-27 04:35:02 +00:00
  • c0f7a4124c *) Bugfix for soap templates theli 2006-09-27 04:24:32 +00:00
  • 2463e5624a 'quick' release 0.47 - documentation update - necessary bugfixes (missing css for new peers) - reduced effect of search result redundancy filter - removed some debug output, but not all orbiter 2006-09-26 23:41:54 +00:00
  • 3433dfb5e2 *) Bugfix for soap search template: correction for resultCount tags, cdata for snippet tag theli 2006-09-26 16:18:04 +00:00
  • d42dcead1d *) Bugfix renaming snippet tag in soap search template theli 2006-09-26 16:11:38 +00:00
  • 49fbb688df *) SOAP: old urlInfo renamed to urlInfoByHash, new urlInfo Function added. theli 2006-09-26 15:14:33 +00:00
  • 8f143d516b *) make snippet fetcher accessible via soap api theli 2006-09-26 15:07:16 +00:00