Commit Graph

  • 0702dd2507 added a profiling script to analyse search process timing orbiter 2008-04-24 13:28:18 +00:00
  • d0e2830e01 enhanced the thread dump to make it usable for scripted remote-debugging orbiter 2008-04-24 13:25:38 +00:00
  • e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search; the difference is that a local search will not cause a re-indexing of loaded pages orbiter 2008-04-24 08:42:08 +00:00
  • 2c0c8f0f0c SRU compliance according to http://www.loc.gov/standards/sru/specs/search-retrieve.html The example given on this page can be used to retrieve opensearch-compatible rss pages with YaCy orbiter 2008-04-23 16:16:41 +00:00
  • 9b03310f8a I'm awake now :/ danielr 2008-04-23 07:50:21 +00:00
  • 7bd8601f04 delete old releases compatible with java 1.5 ;) danielr 2008-04-23 07:22:20 +00:00
  • e90282da1c added experimental javascript terminal for rss feeds (not used anywhere yet, expect the worst) .. possibly to be used as content for iframes within monitoring pages; not ready yet! orbiter 2008-04-22 23:09:24 +00:00
  • da386a1924 fixed deleteOldDownloads if there are no downloads danielr 2008-04-22 21:36:52 +00:00
  • 21418a22a3 removed DEBUG output danielr 2008-04-22 17:14:34 +00:00
  • 79a3edeeef deleting downloaded releases after x days (default 30) danielr 2008-04-22 16:53:53 +00:00
  • 763f9d4f5d serverCore: setting timeout for new connection before SSLDetect danielr 2008-04-22 09:03:16 +00:00
  • 1995faef8d - refactoring of Collage back-end: move to plasma package - renamed also the plasmaCrawlResults to have a consistent naming for url and image queues - added a double-check for the images - added additional queues for the images: all worse-quality images go there, so the queue can be used also if no sizes are given; no image is lost - added a cleanup for the stacks so they cannot flood the memory orbiter 2008-04-21 22:42:49 +00:00
  • d7e89c2aca fixed near-deadlock situation when deleting crawl profiles orbiter 2008-04-20 22:10:26 +00:00
  • 5e3ce46339 - better logging when rejecting a url because it is not in declared domain - more XSS attack protection orbiter 2008-04-20 21:36:25 +00:00
  • 6d1be66822 - longer refresh rate for reload of WatchCrawler page forwarding to indexing start (does not work in IE) - better names for search pages - Release 0.58 orbiter 2008-04-20 08:10:52 +00:00
  • 2149728227 - major rework on YaCy-UI - search results are retrieved from rss/xml, no other servlet needed - added double accordion sidebar menus apfelmaennchen 2008-04-19 11:31:41 +00:00
  • c270d02176 Reverting SVN 4716 orbiter 2008-04-19 09:58:36 +00:00
  • 48ffd61e6a changed "patched wrong" to warning, so it goes to the logfile danielr 2008-04-19 07:54:44 +00:00
  • 2f629d20a7 - tried to fix the '4217666-problem' - removed more unused code orbiter 2008-04-19 04:24:29 +00:00
  • 512f48e7d6 - removed unused methods - fixed xss attack on peer list in CrawlStartSimple orbiter 2008-04-19 03:33:07 +00:00
  • 14384e7a45 deactivated unnecessary and very CPU-intensive deletion check for blacklisted URLs in index receive orbiter 2008-04-19 03:02:44 +00:00
  • 701f769c66 * removed comma, which caused invalid xml f1ori 2008-04-18 15:07:36 +00:00
  • 3c76342619 - added servlet to configure the search page greeting line - added information output about the current network definition in the network servlet - better description and usage of profile entries in User Profile servlet regarding FOAF format - reformatting of menus at status page orbiter 2008-04-18 13:58:56 +00:00
  • b9602e891a * added CrawlProfileEditor_p.xml for monitoring in yacybar f1ori 2008-04-18 09:13:02 +00:00
  • d03940f2ec - included patch from http://forum.yacy-websuche.de/viewtopic.php?p=7193#p7193 - fixed problem with crawl profile editor after deletion of a crawl profile orbiter 2008-04-17 22:21:03 +00:00
  • d1ee231866 HTTPC close more unused connections danielr 2008-04-15 16:37:51 +00:00
  • 181796cffb - HTTPC: remove ConnectionInfo on exceptions, removed unnecessary code - FTPC: always close (GET) connections on errors danielr 2008-04-15 15:27:32 +00:00
  • 04c1226c80 added/fixed missing integrity-test else-case during deploy in case that we update with a tar file orbiter 2008-04-15 15:20:35 +00:00
  • 6155f0e634 last small changes until main release orbiter 2008-04-14 07:26:33 +00:00
  • 45ae3da7e7 another patch to prevent NPE in EcoTable orbiter 2008-04-14 05:33:32 +00:00
  • cb93ded5c6 applied configuration path patch orbiter 2008-04-14 04:10:51 +00:00
  • 96e39b297a reduced StackTraces (by connect timed out) danielr 2008-04-14 03:50:49 +00:00
  • 93376acdca fixed a bad chunkcache limit check which could have caused ArrayIndexOutOfBoundsExceptions orbiter 2008-04-14 03:49:02 +00:00
  • 1cab240198 patch for possible NPE in EcoTable iterator orbiter 2008-04-14 03:20:37 +00:00
  • 9a32a4c328 fixed concurrentModificationException during hello-process orbiter 2008-04-14 03:04:28 +00:00
  • 64c33e717f caught ConcurrentModificationException in ConnectionInfo.cleanUp so cleanUp is not interrupted danielr 2008-04-14 03:02:44 +00:00
  • 70826bb501 -small update for de.lng daburna 2008-04-14 01:14:44 +00:00
  • d8677ba611 fixed ConcurrentModificationException in HttpConnectionInfos danielr 2008-04-13 11:25:41 +00:00
  • c7021c14bb patch for ArrayIndexOutOfBoundsException in BMP parser (may occur in case of malformed BMPs) orbiter 2008-04-13 03:28:26 +00:00
  • 8dd35f74c8 fixed redirect problem (does not work for POST) see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1068&hilit= orbiter 2008-04-12 16:35:09 +00:00
  • 8313d58ae7 - integrated the collage into the Web Visualization menu - added a counter for the public and private queue on the page (testing..) - fixed wrong public/private categorization orbiter 2008-04-12 15:45:57 +00:00
  • c5d1d7faca undo wrongly committed files danielr 2008-04-12 15:22:57 +00:00
  • 2617f4dcdb Connections_p.html: better formatting and remove very old entries danielr 2008-04-12 15:19:18 +00:00
  • 82bf9ac1c8 - added Collage servlet from datengrab and modified it: * all images are queued * private/public is respected * inserted into switchboard * added collageQueue class that stores all the queued images orbiter 2008-04-12 13:24:21 +00:00
  • 959f448e5f - disabled redirects in proxy (so client sees real path) - added connection stats (only connections currently in use) - remove "old" connections (closed or idle for some time) - synchronized shared parts of proxyHandler danielr 2008-04-12 11:39:48 +00:00
  • 8fe39ebd74 -fixed file transmission with POST. The only usage was in ranking transmission, therefore: -fixed ranking transmission orbiter 2008-04-12 08:12:51 +00:00
  • 82a9861779 fix for last commit orbiter 2008-04-11 12:55:43 +00:00
  • 5d1fbb25e7 fix for bad deploy: - the name of downloaded release files is adapted if the httpc delivers uncompressed tar.gz files (the .gz is removed from the file name) - the deploy method is able to handle tar files (not only tar.gz files) orbiter 2008-04-11 12:37:17 +00:00
  • 202a3adb3e refactoring of HttpClient Writer processes orbiter 2008-04-10 22:47:05 +00:00
  • 8aa9fd8f24 HTTPC with only 1 retry danielr 2008-04-10 16:47:57 +00:00
  • 444dce7e81 more performance hacks orbiter 2008-04-10 15:28:58 +00:00
  • 2c2dcd12a2 - enhanced performance of Eco-Tables: less time-consuming size() - operations - will increase speed of indexing and collection.index creation orbiter 2008-04-10 13:24:55 +00:00
  • e356625b22 - refactoring of stream copy handling to support time-consuming operations - made usage of BufferedStreams explicit to distinguish the different copy methods in serverFileUtils (byte-by-byte and using an own buffer) - introduced another timeout setting (java internal property) - more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake) orbiter 2008-04-10 09:53:07 +00:00
  • f01c50cf8d Proxy logging error (first step to resolution!?) danielr 2008-04-10 06:56:06 +00:00
  • c3342e1178 - removed class with only one static method - removed connection method with too long time-out orbiter 2008-04-09 23:35:20 +00:00
  • f97971b63b fixed NPE problems doing a shutdown from command-line orbiter 2008-04-09 22:59:17 +00:00
  • 7a35126e91 restored the http timeouts from the old httpc danielr 2008-04-09 11:02:14 +00:00
  • 2c1c3bb6eb - some refactoring (sorry Daniel, I went on a rampage in your code) - fixed broken downloads (flush was missing) - different problem handling when download is corrupted - different default values in yacy.init orbiter 2008-04-08 21:36:33 +00:00
  • d96e2badc7 - fixed POST in proxy - prepared http connection tracking - refactoring (mainly moving StreamTools to serverFileUtils) danielr 2008-04-08 21:17:40 +00:00
  • 14404d31a8 - enhanced performance graph (more info) - added conditions for rarely used logging lines to prevent unnecessary CPU usage for non-printed info orbiter 2008-04-08 14:44:39 +00:00
  • 696b8ee3f5 fix for http://forum.yacy-websuche.de/viewtopic.php?p=6806#p6806 - removed all InputStream.available() because this does not work for files > 2GB - iterators terminate when an IOException occurs - added handling of non-executing index.add methods to enhance assert usage - added index for file indexes > 2GB, to be used in new indexHeap orbiter 2008-04-08 11:55:59 +00:00
  • 94d3d3a86f fixed Proxy (for GET, POST still does not work!) danielr 2008-04-08 09:34:20 +00:00
  • 081ed1d3ec HTTPLoader: reduced stackTraces danielr 2008-04-07 16:56:15 +00:00
  • 8b2efb6f8c fixed garbage in HTCACHE danielr 2008-04-07 16:46:45 +00:00
  • 225f9fd429 various fixes - shutdown behavior (killing of client sessions) - EcoFS reading better - another synchronization in balancer.size() orbiter 2008-04-07 13:12:58 +00:00
  • 6e36c156e8 added more logging to EcoFS orbiter 2008-04-07 09:52:25 +00:00
  • fb541f9162 HTTPC: default timeout half-hour danielr 2008-04-07 09:48:49 +00:00
  • a94f6cdca4 HTTPC: allowed self-signed certs danielr 2008-04-07 09:21:43 +00:00
  • ab330cfdca Network.html: removed ; from location danielr 2008-04-07 08:13:38 +00:00
  • 319144f4b2 fix for out-of-bounds exception in EcoFS chunk iterator orbiter 2008-04-06 22:28:17 +00:00
  • 41e9c5723c try to fix shown location (instead of 'Europe/de) JakartaHttpClient/3.') danielr 2008-04-06 22:17:09 +00:00
  • ac8592a102 eclipse build path update orbiter 2008-04-06 20:35:05 +00:00
  • a9cf6cf2f4 generalization of index container-heap class. orbiter 2008-04-06 20:31:16 +00:00
  • f099061944 protection against bad dht-flush word selection orbiter 2008-04-06 20:25:05 +00:00
  • 5e4fddc1e6 more logging for new EcoFS.ChunkIterator to find bug for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1024&hilit=&p=6806#p6806 orbiter 2008-04-06 18:47:49 +00:00
  • 4894df71ab *) moved commons-httpclient from libx to lib (seems to work on my system, I hope on everybody else's too) low012 2008-04-06 18:24:38 +00:00
  • 117ae78001 speed enhancement for reading of eco-table indexes orbiter 2008-04-06 11:50:15 +00:00
  • e96ecd269f *) hopefully fixed build script (included commons-httpclient) low012 2008-04-06 09:49:45 +00:00
  • 7c149a4ee8 - undo less 'binary data found' - removed duplicate stackTrace danielr 2008-04-05 17:46:11 +00:00
  • 96cce8bed9 reduced 'Binary data found' errors danielr 2008-04-05 14:20:01 +00:00
  • 2aef1414f5 removed test (in yacy.init) danielr 2008-04-05 13:49:25 +00:00
  • 5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;) danielr 2008-04-05 13:17:16 +00:00
  • daa04f5db9 added additional check in file handler to prevent that url attacks are hidden in url path encodings orbiter 2008-04-04 12:15:27 +00:00
  • 783a4c9edb strong speed enhancements for the index cache dump and restore: storage and loading is 30 times faster! a cache of 100000 RWIs needed 180 seconds to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and 3 seconds to restore. The cache size has decreased now by 30% (95 MB instead of 150 MB). orbiter 2008-04-02 13:18:23 +00:00
  • 442204a1c8 fix for concurrentModificationException orbiter 2008-04-01 21:21:37 +00:00
  • d2f4926951 - more logging for balancer to get a hint where the problem is - fix for new concurrency method in kelondroSplitTable orbiter 2008-03-31 18:45:27 +00:00
  • 20dadba426 - added a deadlock prevention function in cache flushing - removed unused methods in collection index orbiter 2008-03-31 17:51:51 +00:00
  • 764a40e37d speed enhancements for crawler and url retrieval (affects also search speed) - concurrency for LURL-fetching: this can be done using a concurrent lookup into the separated url databases. Concurrency is possible because there is no IO during lookup. The more LURL-Tables are present, the better the speedup. More CPUs will increase speed - because a large number of LURL-lookups are made during crawling (for double-check), the LURL-lookup speed enhancement also enhances crawling speed - search speed also profits from LURL-lookup enhancement - changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing - removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests orbiter 2008-03-31 15:41:19 +00:00
  • 3ce3a4a3a1 added stub for new index container heap data structure (purpose: index folding) orbiter 2008-03-30 22:58:42 +00:00
  • 2c34038912 addition/correction to last commit: usage of concurrent-classes orbiter 2008-03-30 21:17:12 +00:00
  • b2150057d2 removed unnecessary cleanup method orbiter 2008-03-30 20:32:08 +00:00
  • 76eac114ed * define global javascript-variable with var to get rid of warnings f1ori 2008-03-30 19:51:19 +00:00
  • b63cf2fc1c *) added button to Crawl Profile Editor to delete all terminated crawl jobs (only visible if there are terminated crawl jobs) low012 2008-03-30 15:15:56 +00:00
  • 2aed6bb3f7 * return valid xml in xml-bookmarks f1ori 2008-03-30 14:58:29 +00:00
  • c4c0d54b22 * added regex extended blacklistengine * removed my own engines lulabad 2008-03-30 08:50:09 +00:00
  • 368593e449 enhanced the concurrency handling of indexing process (better queue size control, better data concept, better shutdown behavior) orbiter 2008-03-30 00:03:44 +00:00
  • 4c3f1b67ad *) refactoring of Blacklist_p.java (moving entries might be slightly slower, but the code is more tidy now) *) added edit functionality for blacklist entries low012 2008-03-29 20:39:46 +00:00
  • 466d49e90c * added login-parameter to be able to force authentication f1ori 2008-03-29 11:10:04 +00:00
  • be58135b3e possible fix for deadlock in search execution orbiter 2008-03-29 07:50:37 +00:00
  • c67350f138 * use putXML with forXML-parameter to ensure urls are valid xml (problem was & in url) f1ori 2008-03-28 22:50:33 +00:00