Commit Graph

  • 58b7417a59 - added a new 'easy' crawl start menu which can be used for the special case of loading a complete domain - the previous crawl start servlet was renamed to CrawlStartExpert_p - easy crawl start is now default orbiter 2010-09-16 12:02:43 +00:00
  • 461a2a6ec7 enhanced remote crawling: - 300 ppm is default now (but this is switched off by default; if you switch it on you may want more traffic?) - better timing for busy queue - better amount of remote url retrieval - better time-out values - better tracking of availability of remote crawl urls - more logging for result of receipt sending orbiter 2010-09-16 09:34:17 +00:00
  • 670ba4d52b - removed the remote crawl option from the network configuration submenu and - added a remote crawl menu item to the index create menu. This menu also shows a list of peers that provide remote crawl urls - set remote crawl option by default to off. This option may be important but it also confuses first-time users orbiter 2010-09-16 00:39:05 +00:00
  • 89c2d8b81e better initial hash computation orbiter 2010-09-15 22:11:52 +00:00
  • 34e2f7f487 enhanced snippet fetch strategy: concurrent snippet fetch even for offline-snippet searches. This improves speed since it is now possible to fetch snippets offline and parsing of source files from the htcache can be enhanced using concurrency. This improves local and remote search. orbiter 2010-09-15 21:09:14 +00:00
  • 0cf006865e refactoring and enhanced concurrency orbiter 2010-09-15 11:38:03 +00:00
  • 83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before) - added use of LookAheadIterator which should prevent mistakes when coding iterators with embedded iterators - added a fail-safe reaction in case of database corruption using iterators over database elements (no interruption then) orbiter 2010-09-15 10:43:14 +00:00
  • f9a27a05e5 migrated to log4j 1.2.16 orbiter 2010-09-15 09:18:35 +00:00
  • 5c67e6ca49 migrated to latest apache commons fileupload 1.2.2 orbiter 2010-09-15 08:54:41 +00:00
  • 5702419194 fixed a bug in HTTPClient: keep-alive must be set to false, otherwise servers hold connections 2 seconds open until response. orbiter 2010-09-14 22:25:35 +00:00
  • 5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart - changed data structure for sorting in search which performs better in that specific case (too many updates) orbiter 2010-09-14 21:03:50 +00:00
  • ac1c08924e more performance hacks orbiter 2010-09-14 15:27:27 +00:00
  • 14c843d364 more performance hacks orbiter 2010-09-14 15:00:34 +00:00
  • 39f409a7bb performance hacks orbiter 2010-09-14 14:32:24 +00:00
  • 7ebef56add - redesign of a part of the remote search client to make it possible to have a test environment for remote search performance tests - added a remote search test main methods in yacyClient orbiter 2010-09-14 13:35:47 +00:00
  • 2e75879504 fix for latest commit orbiter 2010-09-14 13:01:18 +00:00
  • 6e4653cf50 remove DoS protection in remote search for intranet hosts orbiter 2010-09-14 12:38:05 +00:00
  • 3c0e07ba72 removed all delays in shutdown process orbiter 2010-09-14 09:13:28 +00:00
  • 906c572621 - enhanced index create menu structure - clear search log caches each time a search is done orbiter 2010-09-14 09:06:27 +00:00
  • fc924f024e import of oai sources from a list using a command line interface: if you have a list of oai servers you can import them all using the linux command: bin/importOAIList.sh <name-of-oai-list-file> orbiter 2010-09-13 10:13:34 +00:00
  • 64860dc1bb enhanced search event logging (to be used for further improvements) orbiter 2010-09-13 09:33:04 +00:00
  • 7dbc357593 patch to identify corrupted database files orbiter 2010-09-13 07:20:53 +00:00
  • 17eebd4ef8 counting crawler traffic again: fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2808 sixcooler 2010-09-11 15:58:15 +00:00
  • 547d5226ae fix banner reload parameters (were no html errors) adapted default colours lotus 2010-09-11 11:05:07 +00:00
  • d2a3d08c44 avoid div. by zero lotus 2010-09-11 10:58:33 +00:00
  • 2c7edea35e - better shutdown behavior for the GUI (waits until data is written if GUI is killed) - release 0.97 orbiter 2010-09-10 12:47:24 +00:00
  • 34a25856a5 - added navigation to next/prev search page using arrow keys (left/right) - better information text for YaCy GUI application orbiter 2010-09-10 10:42:01 +00:00
  • 32f73d1aaa added copy for Info.plist for Mac application release updates (this file contains class paths and start parameters) orbiter 2010-09-10 09:48:09 +00:00
  • 5ce679a053 focus search field on load, no click necessary anymore lotus 2010-09-10 08:45:26 +00:00
  • 013926f01c added 'francais' as language option for default configuration orbiter 2010-09-10 08:35:47 +00:00
  • 4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more) - fixed bug in WeakPriorityBlockingQueue - show favicon faster using pre-loading (same technique as used for fast image search) orbiter 2010-09-09 23:25:19 +00:00
  • 570ca577c6 performance hacks orbiter 2010-09-09 22:42:54 +00:00
  • 348dece62f redesign of the SortStack and SortStore classes: created a WeakPriorityBlockingQueue as special implementation of a PriorityBlockingQueue with a weak object binding. - better abstraction of ordering technique - fixed some bugs related to result numbering (distinguish different counters in Queue) - fixed an ordering bug in post-ranking (ordering was decreased instead of increased) - reversed ordering numbering using a reversed ordering. The higher the ranking number the better (now). orbiter 2010-09-09 15:30:25 +00:00
  • 03eb021568 Fix for byte[] Objects as keys hermens 2010-09-09 14:41:20 +00:00
  • 114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements - removed old parser - removed old importer framework (was only used by removed old parser) - added a new sitemap parser in parser framework - linked new parser with parser access in old sitemap processing routines orbiter 2010-09-08 14:13:15 +00:00
  • b73ea6581d fix json in case of query includes " lotus 2010-09-08 11:54:25 +00:00
  • c0b08ac59b slightly changed way of pdf parser integration orbiter 2010-09-08 07:32:47 +00:00
  • 6d83c7cb62 removed unnecessary Override statements (produces errors in strict validation) orbiter 2010-09-08 07:15:41 +00:00
  • 6a09f1f7e5 fix dedicated upnp testing lotus 2010-09-07 18:17:23 +00:00
  • 848ef6937e licenses for pdfbox orbiter 2010-09-07 17:17:08 +00:00
  • 5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1 - added some clear statements that shall clear static cache size within the pdfbox library - the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on. orbiter 2010-09-07 17:13:47 +00:00
  • c757a4aa9f - corrected lifetime computation for search events - made search event cache cleanup concurrent because cleanup may cause index modifications orbiter 2010-09-06 16:05:19 +00:00
  • ec8f1c0446 adapted _debug.bat to regular starter script lotus 2010-09-06 13:36:57 +00:00
  • 5dff8f62c4 fix tray information display for non-windows lotus 2010-09-06 13:30:40 +00:00
  • 24502fe3de performance hacks orbiter 2010-09-06 12:59:33 +00:00
  • a6c2e9ef0c add file.encoding=UTF8 to debian init script f1ori 2010-09-06 12:53:03 +00:00
  • ffaa9a1c51 avoiding double-loading of the same resource from the web in case that a second attempt to load the resource is started while the first attempt is still loading the content from the web. This will delay the second attempt to the time when the first attempt has finished with the possible result that the second attempt reads only from the web cache, not from the web. This will also enhance the process of image result display from SVN 7105 orbiter 2010-09-06 10:23:30 +00:00
  • fb828f3767 - performance enhancements in search response time using faster query ID computation and an ID cache - code cleanup orbiter 2010-09-06 10:00:07 +00:00
  • 0ab6a462ee - added a missing entry in YaCy interface robots.txt for bookmarks - changed default robots.txt deny list to include some more interface pages because the loading of such pages is a peer load issue for YaCy when crawlers come by and information on these pages is not useful for public search. orbiter 2010-09-06 09:58:54 +00:00
  • d865ef77a8 removed re-read of index in case of a bad index. This may not solve the problem but it applies a 100% CPU problem on the peer. I'm afraid bad index files must be abandoned, and cannot be fixed this way. orbiter 2010-09-06 09:55:04 +00:00
  • b2c9db48ea Performance enhancement - introduced byte[] - based ARC method for MapHeap which avoids a String generation each time the cache is accessed - bugfixing in required class ComparableARC orbiter 2010-09-06 09:53:33 +00:00
  • ae07e11bc5 enhanced image search result display: concurrent loading of images before they are displayed orbiter 2010-09-05 23:02:46 +00:00
  • 72a096fccb using -XX:-UseGCOverheadLimit when starting java. see also: http://forum.yacy-websuche.de/viewtopic.php?p=20709#p20709 orbiter 2010-09-05 14:19:02 +00:00
  • 22047ffad5 enhanced computation speed of many replaceAll string operations orbiter 2010-09-05 13:19:42 +00:00
  • e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation orbiter 2010-09-05 11:13:12 +00:00
  • 9c0c94683c because of a bug in search result caching count, search results had not been generated as fast as possible. With this fix search results are (even) faster. Also enhanced: image search. This is now sped up using an image search result look-ahead orbiter 2010-09-04 22:57:12 +00:00
  • fa2eb9676e removed unused class orbiter 2010-09-04 21:45:33 +00:00
  • 5f391fcfa9 *) cleaned up in wikiCode parser (more to be done) *) HTML fixes low012 2010-09-04 14:01:34 +00:00
  • b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by: - storing the startup properties when yacy is started - using the properties in the restart-script again. this transports also the DATA directory location as parameter of the -gui option that is used when the Mac version of YaCy is started orbiter 2010-09-03 23:08:43 +00:00
  • d4e4967e19 cleaned up code in yacyRelease (there will be work to do there) orbiter 2010-09-03 22:35:48 +00:00
  • 7be988768d simple selection of views in ViewFile.html (omit usage of button) orbiter 2010-09-03 22:35:07 +00:00
  • 2cb8cf5b41 added overall utf-8 default support for mac releases orbiter 2010-09-03 22:34:08 +00:00
  • 698730c4eb better utf-8 support everywhere using a default utf-8 encoding for all string conversions. todo: add this also for debian start script (how do I do that?) orbiter 2010-09-03 22:18:10 +00:00
  • ca0a03e9ea ... migrating to HttpComponents-Client-4.x ... ssl-stuff: accept almost everything sixcooler 2010-09-03 16:02:52 +00:00
  • d8f52c5b9c added a changelog url to download orbiter 2010-09-03 12:55:36 +00:00
  • 1da5241c2d do not block server session if maximum number of sessions is reached, just try to clean up once orbiter 2010-09-03 12:05:37 +00:00
  • 3988a95fb5 added ability in rss reader to parse atom feeds orbiter 2010-09-03 08:53:24 +00:00
  • 5de70c3d7c changed way of storage for search requests: - the search request cache can now get as large as 1000 entries - if more entries arrive, unused are deleted - the elements may stay in the cache up to 10 minutes and longer if they are used - the elements are deleted earlier than 10 minutes if the memory gets low This commit was mainly done for metager-feeding peers that have a query load of 50000 queries each day. Also added: - a monitor for cache hit/cache miss in PerformanceMemory_p.html (see at bottom of page) orbiter 2010-09-02 21:52:45 +00:00
  • 9d080f387e change in handling of the all-visible home path for storage in YaCy: the home path can now be distinguished between - data home; the path where the DATA directory is created - application home; everything else This will make it possible to store application data on Mac releases within the ~/Library/YaCy directory; a place where Mac applications write their data. Similar techniques will be possible for debian and windows. To use the new data path, YaCy can be started with -start <data path> or -gui <data path> orbiter 2010-09-02 19:24:22 +00:00
  • fa5683adfe create a mac dmg file (a disc image) for mac releases in ant orbiter 2010-09-02 19:11:49 +00:00
  • 875741bcff fix for http://forum.yacy-websuche.de/viewtopic.php?p=20657#p20657 orbiter 2010-09-02 10:05:04 +00:00
  • 091281c9f2 Mac app ant task building a ready-to-distribute zip file extending r7080 lotus 2010-09-02 08:01:01 +00:00
  • 65eaf30f77 redesign of crawl profiles data structure. target will be: - permanent storage of auto-dom statistics in profile - storage of profiles in WorkTable data structure not finished yet. No functional change yet. orbiter 2010-08-31 15:47:47 +00:00
  • 3f1d5a061f by default store crawled pages to HTCache to support verify=false snippet generation orbiter 2010-08-31 09:28:01 +00:00
  • 2009999162 show landing page after installation finished lotus 2010-08-30 20:04:19 +00:00
  • 938676265f fix shutdown command, close HttpClient connection pool f1ori 2010-08-30 17:48:20 +00:00
  • 55da979291 disable revision detection for git f1ori 2010-08-30 17:11:19 +00:00
  • 6d2e0f5fb4 always kill shutdown java instance, even if yacy succeeded, in future, the TERM-signal should be used, but currently not all threads are joined during shutdown f1ori 2010-08-29 23:26:03 +00:00
  • be0abd92cd always use kill command in initscript, after timeout elapsed and yacy didn't finish f1ori 2010-08-29 18:15:22 +00:00
  • 2a4ddc48bb adjustment for new java download method see http://forum.yacy-websuche.de/viewtopic.php?p=20616#p20616 lotus 2010-08-27 18:55:44 +00:00
  • e9160ea1e5 Mac ant task according to r7023 lotus 2010-08-27 18:40:32 +00:00
  • 93d2c22e60 adapted memory for first run to current standard values lotus 2010-08-27 18:38:02 +00:00
  • 104318d58a - added nice colors to feed indexing state messages - added a 'remove all' button for new and scheduled rss feed list - made adding of new rss feeds concurrent so interface is more responsive orbiter 2010-08-27 11:56:51 +00:00
  • 23ba107834 UPnP port forwarding default on now. This also displays a message on the entry settings page if not successful, so the user gets an extra hint to open his ports. lotus 2010-08-27 08:45:00 +00:00
  • d5ccbb99f9 the Windows installer now always requires admin level for installation (Vista/7) unfortunately some users seem to forget to manually install the downloaded Java runtime and therefore could not start YaCy - added concept to always distribute the latest Java version via external php script lotus 2010-08-26 16:53:20 +00:00
  • 4f22e2df41 bugfixes for - next-execution-time in scheduler - deletion of scheduled rss feed loading (now deletes also the scheduling entry) orbiter 2010-08-26 16:42:00 +00:00
  • 42414a6ae3 added two more tables in rss reader interface: - fresh recorded rss feeds (not yet loaded or in scheduler) - rss feeds in scheduler The first list has a button that can be used to place rss feeds into the scheduler The second list has a button to delete rss feeds from the scheduler orbiter 2010-08-26 16:01:45 +00:00
  • 0010cd9db1 Support for indexing of RSS feeds! - added a scanning in html parser for rss feeds - storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss - rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed - a rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed - indexing of rss feeds can be placed in scheduler orbiter 2010-08-25 18:24:54 +00:00
  • 0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>> - refactoring of method names to comply with Map method names orbiter 2010-08-24 12:36:56 +00:00
  • cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as Map<byte[], Map<String, byte[]>> Because such Maps with byte[] keys cannot be stored in hash maps (bad hashing on byte[]) another ARC with comparable Maps has been added orbiter 2010-08-23 23:38:03 +00:00
  • c60d0282fd more abstraction for tables stored in heaps: the BEncodedHeap now implements Map<byte[], Map<String, byte[]>> This will make it possible that also different database storage types may be added that implement also the same Map<byte[], Map<String, byte[]>> interface. orbiter 2010-08-23 21:27:58 +00:00
  • d1be64d491 removed wrong assert orbiter 2010-08-23 21:02:28 +00:00
  • 3197ca42ed preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora orbiter 2010-08-23 12:32:02 +00:00
  • 844f158686 - removed dependencies in header framework: moved http date methods from DateFormatter to HeaderFramework changed logging to log4j - added ftp load access to MultiProtocolURI - ensured termination of RSS feed iteration orbiter 2010-08-23 11:41:12 +00:00
  • 80ba543d4c svn fix for uppercase problem orbiter 2010-08-23 01:16:17 +00:00
  • 5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs orbiter 2010-08-23 01:08:56 +00:00
  • caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora) orbiter 2010-08-22 22:51:31 +00:00
  • 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come) orbiter 2010-08-22 22:32:39 +00:00
  • d0fb6bc2bc cleaned up superfluous classes after sixcooler's migration to HttpComponents-Client-4.x orbiter 2010-08-22 22:04:31 +00:00
  • dcd9065c84 next try to fix loading of network picture orbiter 2010-08-22 22:02:54 +00:00