Commit Graph

  • b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) *) better logging of parser failures *) simplified usage of plasmaparser through switchboard *) restructuring of crawler - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher) *) snippet-fetcher: more verbose error messages *) serverByteBuffer.java: adding new function append(String,encoding) *) serverFileUtils.java: adding functions to copy only a given number of bytes between streams theli 2006-09-20 12:25:07 +00:00
  • aa38721cf6 new features for surftipps - new generation with less memory - removal of doubles - positive votes can generate entries without original news (so they can live on) - link deletion on search results are now also negative votes for surftipps (but they may rarely hit any news) orbiter 2006-09-20 12:01:51 +00:00
  • 64b2ef5aae *) Trying to bugfix shutdown problem theli 2006-09-20 10:13:23 +00:00
  • e03427871e enhanced surftipps: - added switch to show or hide surftipps - more news contribute to surftipps - added voting system for surftipps orbiter 2006-09-20 07:17:41 +00:00
  • e745b63c77 *) Bugfix for different behavior of indexDistributeWhileCrawling to other checkboxes on IndexControl_p.html See: http://www.yacy-forum.de/viewtopic.php?t=2849 theli 2006-09-20 04:44:56 +00:00
  • 1dc12d6659 *) Bugfix for shutdown problem caused by cacheScan thread See: http://www.yacy-forum.de/viewtopic.php?p=25729 theli 2006-09-20 04:36:25 +00:00
  • 42173462f5 rename cutUrlText to shortenURLString; other little things; borg-0300 2006-09-19 20:47:45 +00:00
  • af1d89e381 check url == null added; borg-0300 2006-09-19 20:12:26 +00:00
  • cc667b0aa5 *) htmlFilterContentScraper.java: adding support for link tag theli 2006-09-19 16:13:13 +00:00
  • 16ba5d1b46 topwords: only [a-z] words, quality is better; blanks removed; properties added; borg-0300 2006-09-19 10:44:45 +00:00
  • 66a58502df *) configure logging filehandler to use UTF-8 for logging messages theli 2006-09-19 05:39:50 +00:00
  • 26dfbb7499 *) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correctly on the gui theli 2006-09-19 05:19:41 +00:00
  • cf6acff2c2 *) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the default InputStream Buffer size. theli 2006-09-19 04:58:34 +00:00
  • f18304ddd3 unused/not needed imports removed; properties added; borg-0300 2006-09-18 22:21:18 +00:00
  • ec031eb993 first version of surftipps see http://localhost:8080/index.html orbiter 2006-09-18 20:14:21 +00:00
  • b174fbd0ca "import ...*" removed; properties added; borg-0300 2006-09-18 18:31:27 +00:00
  • 807756150e patch for strange bug reported by email orbiter 2006-09-18 16:50:31 +00:00
  • 5c6251bced *) some improvements for extended html document charset support - new class htmlFilterInputStream.java which allows to pre-analyze the html header to extract the charset meta data. This is only enabled for the crawler at the moment. Integration into proxy needs more testing. - adding eventlistener interfaces to the htmlscraper to allow other classes to get informed about detected tags (used by the htmlFilterInputStream.java) theli 2006-09-18 15:36:04 +00:00
  • 33f0f703c0 *) reinserting type cast again theli 2006-09-18 13:21:12 +00:00
  • 8c11a543dc fixed line ending coding orbiter 2006-09-18 13:17:31 +00:00
  • b690597275 *) adding casts to avoid compatibility problems between java 1.4 and java 1.5 writer class usage theli 2006-09-18 12:17:51 +00:00
  • 5afb0cbce8 *) setting default charset (for unknown documents) to iso-8859-1 *) theli 2006-09-18 11:39:06 +00:00
  • f453c14b5d removed unreachable catch blocks and unused imports orbiter 2006-09-18 11:23:58 +00:00
  • ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class theli 2006-09-18 11:04:16 +00:00
  • 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets - serverFileUtils.java: -- adding methods to copy from stream to writer and readers to writers -- moving httpc writeX methods into serverFileUtils class - serverCharBuffer.java: removing inheritance from Writer class - replacing htmlFilterOutputStream by htmlFilterWriter class which handles content as char stream - htmlFilterContentTransformer.java: deactivating getText mode (still needs to be migrated to use char streams instead of byte streams) - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream - changes in Scraper and Transformer classes to operate on chars instead of bytes - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file theli 2006-09-18 10:12:11 +00:00
  • fc594e8eda *) adding httpContentLengthInputStream.java class to allow reading of http response bodies until EOF even if a persistent connection is used *) httpdByteCountInputStream.java: adding skip method *) httpHeader.java: adding getCharacterEncoding function theli 2006-09-18 10:00:28 +00:00
  • cd636eb00e *) Fix for the fix... low012 2006-09-18 01:24:26 +00:00
  • f9a5b55a9e *) Fixed bug described in http://www.yacy-forum.de/viewtopic.php?p=25448#25448 low012 2006-09-18 01:19:54 +00:00
  • 3aac5b26da - added automatic tag generation when a web page from the search results is added - added new image 'B' in front of search results for bookmark generation - added news generation when a public bookmark is added - the '+' in front of search results has new meaning: positive rating for that result - added news generation when a '+' is hit orbiter 2006-09-18 00:37:02 +00:00
  • 8a30c5343d *) Fixed bug where exclamation marks could get lost between [=...=] and <pre>...</pre> low012 2006-09-17 23:42:36 +00:00
  • d8f4b17e31 *) Hopefully fixed bug described in http://www.yacy-forum.de/viewtopic.php?t=2825. low012 2006-09-17 22:57:10 +00:00
  • 2d9496577f Removed double labels for forms in Blacklist_p.html michitux 2006-09-17 08:07:21 +00:00
  • aa46269eff Less margin/padding for dls (e.g. in Messages) michitux 2006-09-17 07:42:07 +00:00
  • 567c40f5f0 Bookmark/delete-links now visible when mouse is over the searchresult, in standard-compliant browsers with css, in Microsoft Internet Explorer via JavaScript michitux 2006-09-16 16:56:22 +00:00
  • 0e84a969d6 *) Bugfix for serverCharBuffer read from file operation theli 2006-09-16 13:11:32 +00:00
  • 90ef19d778 *) first version of a serverCharBuffer theli 2006-09-16 12:56:03 +00:00
  • d374ef2bbe bugfix for tryRemoveURLs orbiter 2006-09-16 00:34:34 +00:00
  • f644a1c3a7 better evaluation of index abstracts orbiter 2006-09-16 00:07:09 +00:00
  • 1b48473bc5 bugfix to utf8 recognition orbiter 2006-09-15 23:55:06 +00:00
  • 90f7241b59 serverByteBuffer.trim() can now recognize utf-8 characters orbiter 2006-09-15 23:52:26 +00:00
  • 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611 allo 2006-09-15 17:48:41 +00:00
  • 20e1754379 Various fixes for the languages rramthun 2006-09-15 16:02:37 +00:00
  • e34d9b3fec *) charset aware headlines (after the serverByteBuffer.trim problem is solved) theli 2006-09-15 15:07:35 +00:00
  • 8115ac47b5 *) charset aware metadata parsing theli 2006-09-15 15:01:25 +00:00
  • 3ac30bdf22 *) some todo markers added for additional charset support theli 2006-09-15 14:49:43 +00:00
  • d54144a4e3 fixed bad snippet behavior (hopefully) orbiter 2006-09-15 14:17:18 +00:00
  • 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title theli 2006-09-15 14:05:28 +00:00
  • 5015e780c2 - simplified watchCrawler code - changed display of watchCrawler slightly orbiter 2006-09-15 13:54:10 +00:00
  • 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser) *) htmlFilterContentScraper.java: using proper charset for document title *) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding theli 2006-09-15 13:18:12 +00:00
  • c5d3020941 *) better errorhandling for last commit theli 2006-09-15 12:56:01 +00:00
  • d0a5a53789 *) changes needed for multi-language support - parsers may need to know the charset of the byte stream theli 2006-09-15 12:52:46 +00:00
  • 31d6cdea53 WatchCrawler.html now valid xhtml, added the class TableCellActive to default skin, please update your skins (sorry, I removed it before because I hadn't seen it in any html-file) michitux 2006-09-15 11:50:25 +00:00
  • d82875c72b removed removal of 'funny symbols' that may have caused utf-8 problems orbiter 2006-09-15 09:08:15 +00:00
  • 26ab1fa885 fixed null pointer exception See http://www.yacy-forum.de/viewtopic.php?p=25598#25598 orbiter 2006-09-15 08:50:16 +00:00
  • 9bed90f8dc bugfix in js allo 2006-09-15 06:33:22 +00:00
  • b0e8ff6eda *) some TODO markers for UTF-8 problem theli 2006-09-15 05:31:30 +00:00
  • b5904705ab *) Bugfix for "determineRevisionNr: build.xml:98: SVN entries file does not exist" bug See: http://www.yacy-forum.de/viewtopic.php?t=2824 theli 2006-09-15 04:38:36 +00:00
  • c42b011648 added watch crawler to menu orbiter 2006-09-15 01:09:34 +00:00
  • 41e27b85b7 fix for crawler condition orbiter 2006-09-15 00:38:45 +00:00
  • 92157febcd Bugfix for Blacklist_p.html: Adding of new patterns possible again michitux 2006-09-14 15:20:32 +00:00
  • 0ee7e45413 bugfix for merge method (caused by bad refactoring) see http://www.yacy-forum.de/viewtopic.php?p=25529#25529 orbiter 2006-09-14 10:30:25 +00:00
  • 40965e183e bugfix for minimizeurldb and urldbcleanup see http://www.yacy-forum.de/viewtopic.php?p=25539#25539 orbiter 2006-09-14 10:12:41 +00:00
  • 5c2f30eaca adjustments to dhtInCache write orbiter 2006-09-14 09:28:17 +00:00
  • 9ecf7f0da2 *) some TODO markers for UTF-8 problem theli 2006-09-14 05:37:46 +00:00
  • e2f8339827 *) some bugfixes for UTF-8 related problems theli 2006-09-14 05:16:36 +00:00
  • f4af607b79 *) just some typos low012 2006-09-14 01:11:49 +00:00
  • e03740c306 small fix for last commit orbiter 2006-09-14 00:57:41 +00:00
  • c89d8142bb replaced old 'kCache' by a full-controlled cache. There are now two full-controlled caches for incoming indexes: - dhtIn - dhtOut during indexing, all indexes that shall not be transported to remote peers because they belong to the own peer are stored to dhtIn. It is furthermore ensured that received indexes are not again transmitted to other peers directly. They may, however, be transmitted later if the network grows. orbiter 2006-09-14 00:51:02 +00:00
  • 6e2907135a bugfixes for remote search server part orbiter 2006-09-13 22:19:34 +00:00
  • 2c6f2a1f74 First language fixes for new XHTML-layout rramthun 2006-09-13 20:03:15 +00:00
  • cf9884e22b first attempt to implement a secondary search this is a set of search processes that shall enrich search results with specialized requests to realize a combination of search results from different peers. orbiter 2006-09-13 17:13:28 +00:00
  • 2a06ce5538 *) next bugfix for UTF-8 - Sending UTF-8 messages to other peers did not work - httpd.java: minor corrections for UTF-8 theli 2006-09-13 15:47:56 +00:00
  • bdc51591ae *) UTF-8 Bug solved (hopefully) See: http://www.yacy-forum.de/viewtopic.php?p=25522 theli 2006-09-13 14:48:58 +00:00
  • 13d0cff257 right dhtml. allo 2006-09-13 14:02:34 +00:00
  • ef751b9d33 *) removing all string operations from the template engine - engine should fully operate on bytes now theli 2006-09-13 13:56:10 +00:00
  • 7ef80c1026 more debugging orbiter 2006-09-13 13:52:46 +00:00
  • dfc0ac1958 syntax error fixes orbiter 2006-09-13 12:02:02 +00:00
  • 6e03f61daa fix for highlighting searched words in snippets allo 2006-09-13 11:26:26 +00:00
  • b251076e64 avoid ConcurrentModificationException orbiter 2006-09-13 10:36:18 +00:00
  • 99699a4d70 *) bugfix for new svn revision number ant task theli 2006-09-13 10:19:22 +00:00
  • da9f67a56d *) bugfix needed because of new svn version 1.4 (new .svn/entries file format) - lib/svnRevNr.jar: customized ant task to read the revision number out of the .svn/entries file - build.xml: calling new ant task theli 2006-09-13 09:59:32 +00:00
  • 3bbe6a77da Smaller font-size for tables in PerformanceQueues_p.html and PerformanceMemory_p.html michitux 2006-09-12 21:41:16 +00:00
  • d6204fd956 Forms in Blacklist_p.html split as suggested by KoH to avoid wrong submits when pressing [enter] michitux 2006-09-12 21:34:02 +00:00
  • b573f5b4c2 New layout in Blacklist_p.html, more padding and margin for fieldsets michitux 2006-09-12 20:31:42 +00:00
  • 48d8da44d1 Design changes: less margin in fieldsets, search form in yacysearch.html not centered, smaller rows in IndexMonitor.html michitux 2006-09-12 16:15:02 +00:00
  • 140c3e1db9 Some bugfixes: updated ids for labels in DetailedSearch.html, fixed a template-bug in Network.html and added a workaround for a bug in the template engine in IndexTransfer_p.html michitux 2006-09-12 14:16:36 +00:00
  • 75b198bc02 - updated references to indexContainer - more bugfixes and debugging for indexAbstract processing orbiter 2006-09-12 11:13:27 +00:00
  • 0bed3b9ac3 removed superfluous interface orbiter 2006-09-12 11:09:51 +00:00
  • b7e7808ea6 wordmigration now works also for new index database if the new database is switched on, no 'too big' messages appear, all the WORDS files can be completely migrated orbiter 2006-09-12 08:23:47 +00:00
  • a0ddf2ec11 *) AbstractCrawlWorker.java: delete already downloaded data on crawling error *) plasmaSwitchboard.java: log unexpected errors while parsing/indexing theli 2006-09-12 04:50:12 +00:00
  • 4f9e42d5ed more changes towards better join-search - fixed problems with index-abstract generation - added analysis output for index abstract receive orbiter 2006-09-12 00:42:42 +00:00
  • 8219ce6c67 bugfix in DetailedSearch form names orbiter 2006-09-11 23:07:28 +00:00
  • 462c64a935 removed superfluous file orbiter 2006-09-11 22:13:41 +00:00
  • 1137605edf - small change to DetailedSearch layout - version 0.463 for new xhtml interface orbiter 2006-09-11 22:11:05 +00:00
  • 31393312d0 New XHTML-template for a large part of the frontend, for details see http://yacy-websuche.de/wiki/index.php/Dev:XHTML If you don't use the default skin, the style will be broken or at least not complete. YaCy now has two css-files: base.css in htroot/env and the skin. In base.css the layout and black/white text-formatting rules are defined. Colors are only defined in the skin. The skin is now very easy to read and to change. If you want to make more changes than the colors you see in the default-skin, feel free to use the full power of css, but you are warned: The code is still not ready and may change, but we try to avoid changes which affect anything in the default-style. Translation will be broken too because the language-files contain HTML-Code which has changed. michitux 2006-09-11 18:18:12 +00:00
  • 005400a137 *) reverted last commit auron_x 2006-09-11 14:41:06 +00:00
  • a7281a9b4d fix for last commit orbiter 2006-09-11 11:12:42 +00:00
  • 82a6054275 - fixed bug with new indexAbstract generation - added partly evaluation of indexAbstracts during remote searches orbiter 2006-09-11 10:39:25 +00:00
  • fded1f4a5d *) better handling of maximum file size limit in crawler theli 2006-09-11 08:26:39 +00:00
  • 416b4e5c6b ups orbiter 2006-09-11 08:17:55 +00:00
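Commit b6c7b91582 adds serverFileUtils functions "to copy only a given number of bytes between streams". The log does not show their signatures, so the following is only a minimal sketch of the idea; the class `BoundedCopy` and the method `copy(InputStream, OutputStream, long)` are hypothetical names, not YaCy's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BoundedCopy {

    /**
     * Copies at most maxBytes bytes from in to out and returns how many were copied.
     * Returns fewer than maxBytes if the input reaches EOF first.
     * Hypothetical helper for illustration; not YaCy's actual serverFileUtils signature.
     */
    public static long copy(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        while (total < maxBytes) {
            // never request more than the remaining budget, so no extra bytes are consumed
            int want = (int) Math.min(buffer.length, maxBytes - total);
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break; // EOF before the limit was reached
            }
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello, YaCy!".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 5);
        // prints "5 Hello"
        System.out.println(copied + " " + new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Capping each `read` at the remaining budget, rather than always filling the whole buffer, leaves the bytes after the limit unread in the source stream. That property is what matters on a persistent HTTP connection (compare commit fc594e8eda), where the bytes following one response body belong to the next response.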
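Commit 5c6251bced introduces htmlFilterInputStream.java to "pre-analyze the html header to extract the charset meta data", and commit 5afb0cbce8 makes iso-8859-1 the default for documents without a known charset. The sketch below shows one way such a pre-analysis can work: it peeks at the head of the stream with `mark`/`reset`, searches for a `<meta ... charset=...>` declaration, and rewinds so the parser still sees the whole document. The class name, the 4096-byte peek limit and the regular expression are assumptions; according to its commit message, the real class learns about tags through event listeners on the HTML scraper rather than through a regex.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSniffer {

    // Fallback for documents without a declared charset (cf. commit 5afb0cbce8).
    static final String DEFAULT_CHARSET = "ISO-8859-1";
    // Number of header bytes to inspect; an assumed value, not taken from YaCy.
    static final int PEEK_LIMIT = 4096;
    // Matches <meta charset="x"> as well as <meta http-equiv=... content="text/html; charset=x">.
    private static final Pattern CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Peeks at the head of the stream, returns the declared charset and rewinds the stream. */
    public static String detect(BufferedInputStream in) throws IOException {
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int len = 0;
        int n;
        while (len < PEEK_LIMIT && (n = in.read(head, len, PEEK_LIMIT - len)) != -1) {
            len += n;
        }
        in.reset(); // the caller can now parse the document from its first byte
        // ISO-8859-1 maps every byte to exactly one char, so decoding the head cannot fail
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : DEFAULT_CHARSET;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"></head></html>";
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detect(in));       // prints "UTF-8"
        System.out.println((char) in.read()); // prints "<" because the stream was rewound
    }
}
```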