Commit Graph

  • 25f6145934 *) preventing null pointer exception in case an empty search word or only one character is entered or all search words are removed by filters low012 2009-09-05 20:31:39 +00:00
  • 248f3fd9b5 *) cleaned up code for better readability *) added a few copyright notices *) removed redundancy in constructors of ListToken low012 2009-09-05 20:04:23 +00:00
  • eaddf2d464 - corrected layout of map preview - added caption to maps containing latitude and longitude information - prevented maps from appearing on the second search page - added location names to did-you-mean - some refactoring of did-you-mean - added equal and compareTo test to Coordinates class to make that work in set - fixed utf-8 support for library files - fixed a bug in images search icon view caption orbiter 2009-09-04 23:33:47 +00:00
  • 4b83875abd Small fixes for the heapCacheIterator in ReferenceContainerCache: - Start the iteration at startWordHash - When used with rotation, let the iteration stop when the cache is empty hermens 2009-09-04 23:27:14 +00:00
  • fd668f531b fixed map layout orbiter 2009-09-04 19:20:27 +00:00
  • 2740d9dd79 added integration of osm maps for search orbiter 2009-09-04 14:32:36 +00:00
  • af3a696fc4 added a fast-fail concept in search processes. The search now has better control over whether any of the remote searches may still bring a result. If all processes are finished, then all search tasks fail fast. orbiter 2009-09-03 23:09:53 +00:00
  • 847c3027ff added - a time-out of 10 seconds - and a clear-on-error in yacyui-portalsearch, to make the loading bar disappear when nothing is found. orbiter 2009-09-03 22:13:04 +00:00
  • ce972ff4ef update to default ranking profile, which now has some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3. orbiter 2009-09-03 20:54:47 +00:00
  • 44579fa06d - fixed a problem loading images through yacy's document loader, this denied non-parseable documents which excluded all images - fixed url of osm tile server orbiter 2009-09-03 11:46:08 +00:00
  • 67eddaec4b changed way to integrate dictionary files: they must be downloaded manually by the user and placed in DATA/DICTIONARIES/source. For each externally imported dictionary file there will be a translator that converts the input file once into a YaCy-internal data format. Files that will be provided together with yacy releases may still be placed in <root>/dictionaries orbiter 2009-09-02 18:42:13 +00:00
  • d656a94f55 fix for bad paths in dictionary processing orbiter 2009-09-02 18:24:41 +00:00
  • 3b9aaf9e9f - inserted new library tests inside DidYouMean - some redesign of DidYouMean that was necessary to follow a special rule for how a library should be used: - the library provides words that start or end with a test word, which may possibly also be an empty set of words - all words that DidYouMean produced with the four production rules are used to generate a set of library-completed words - if this process results in any words from the library, only library-generated words are taken - if there is no library-generated word at all, take the artificially generated word - all words that result from these rules are tested against the index - the result is ordered using a lightweight comparator that prefers short words - a not-so-much-io test against the index is being prepared next - inserted the library initialization into the switchboard orbiter 2009-09-02 13:41:56 +00:00
  • 8c35ffe34c fixes to the dymlib orbiter 2009-09-01 15:05:28 +00:00
  • bfa273bcc1 added a library provider which holds libraries in static objects, which can be used by any other classes to support their functions. Libraries are designed in such a way that users can create and insert their own library files, but libraries can also be imported from other sources. As an example the "Korpusbasierte Wortgrundformliste DeReWo des Institut für Deutsche Sprache" from http://www.ids-mannheim.de has been integrated. This dictionary is licensed to be used for all non-profit purposes. If YaCy is used for commercial purposes, this library must be removed. The new library provider reads the original source and translates it into a simple word list to be used for the did-you-mean library provider. More libraries may be provided in the future using a download-servlet which puts files from the internet into the <application-root>/dictionaries/ path. orbiter 2009-09-01 13:54:43 +00:00
  • 1762a7bcd6 - moved DidYouMean to the data package - added a DidYouMeanLibrary class that shall support the did you mean function with additional word lists orbiter 2009-09-01 13:04:35 +00:00
  • bf8ed00e9e removed debugging code orbiter 2009-08-30 11:03:34 +00:00
  • ead48c4b25 fix for preparation of search result pages with offset > 10: - fewer pages are fetched in advance - just-in-time fetch of next required pages - fix for missing hand-over of offset to fetch threads orbiter 2009-08-30 10:28:23 +00:00
  • 39a311d608 better care not to lose the merge/dump thread orbiter 2009-08-29 23:35:03 +00:00
  • 10d3e856b5 better concurrency, less blocking & performance hacks orbiter 2009-08-29 23:34:14 +00:00
  • 1a9cfd8718 some performance hacks (CPU only, not IO) this will cause better computation speed for single- and multi-core; there are enhancements that will speed up old and slow machines as well as multi-core CPUs. Indexing of surrogates has been sped up from 4000 PPM to over 20000 PPM on a simple dual core office computer. Since the enhancements are mostly in core routines, the hack should also speed up search performance. orbiter 2009-08-28 13:28:11 +00:00
  • 92407009b2 cleanup orbiter 2009-08-27 23:20:59 +00:00
  • 0ba1beaf56 separated rwi constraint evaluation from rwi ranking and added concurrency orbiter 2009-08-27 22:54:32 +00:00
  • ce7924d712 better concurrency for rwi entry parsing during search processing orbiter 2009-08-27 22:06:52 +00:00
  • b0637600d5 enhanced url constraint computation: better position of constraint check during retrieval process orbiter 2009-08-27 20:20:07 +00:00
  • 3ebb228ea1 added smaller icon for widget orbiter 2009-08-27 19:44:39 +00:00
  • 61748285c3 more refactoring of search orbiter 2009-08-27 15:19:48 +00:00
  • 323a8e733d removed unused classes orbiter 2009-08-27 14:42:05 +00:00
  • 72e5407115 refactoring of snippet cache orbiter 2009-08-27 14:34:41 +00:00
  • 0e471ba33b - fixed a bug in fast digest computation - added an open-on-demand hack to heap files: when a heap file is opened the first time, it is first scanned to get a key index and then it is closed again. This will free up file pointers in cases where a really large number of blob files are opened upon initialization of ArrayStack objects. This should solve also a problem reported in http://forum.yacy-websuche.de/viewtopic.php?p=17191#p17191 orbiter 2009-08-27 11:03:21 +00:00
  • 93b2622503 *) repaired and added IM online status indicators *) added some missing SVN properties *) removed unnecessary comment, added missing copyright notice low012 2009-08-26 18:34:00 +00:00
  • e7736d9c8d more refactoring: made all variables in SearchEvent private to prepare splitting of the class into two parts: local and remote search orbiter 2009-08-26 15:59:55 +00:00
  • 4b92d0b9b7 patch for possible problems with normalization of '/' in urls. This applies in rare cases when '/' appears in post-properties orbiter 2009-08-26 15:10:03 +00:00
  • d8ca6e6bf1 more refactoring for search orbiter 2009-08-25 21:27:01 +00:00
  • becb30fa12 changes by Thomas Süß daburna 2009-08-25 09:19:50 +00:00
  • fe4a4e3f6b added missing class orbiter 2009-08-24 21:03:40 +00:00
  • 72ac5bd80f refactoring of search process. this is the beginning of some architecture changes that will hopefully bring some more stability, speed and transparency to the search process. orbiter 2009-08-24 15:24:02 +00:00
  • c4d0e22a77 Further speed-up of concurrent DHT-receive hermens 2009-08-23 22:37:15 +00:00
  • 2fbc0696bf Fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2334 hermens 2009-08-21 13:10:59 +00:00
  • 07f505154d updated German translation daburna 2009-08-19 13:58:40 +00:00
  • d515bc11e2 added ooxmlparser f1ori 2009-08-08 15:34:41 +00:00
  • 8c1b02af04 * fix warning in testcase f1ori 2009-08-08 15:18:02 +00:00
  • d9744b1b5d replaced old caching strategy control class with lightweight simplearc orbiter 2009-08-07 23:01:33 +00:00
  • 8e56c2ace6 fix for fixes from this afternoon orbiter 2009-08-07 22:53:49 +00:00
  • cf739edc2e fix for possible deadlock, see http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017 orbiter 2009-08-07 12:11:22 +00:00
  • 6354b5e447 removed possible deadlock, see http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017 orbiter 2009-08-07 12:04:14 +00:00
  • 5cc17ccf8a a better caching with less overhead and more appropriate synchronisation use in more than 10 different data objects orbiter 2009-08-07 11:55:32 +00:00
  • 2e01bd955d wrong display of hints / hints wrong / incomplete orbiter 2009-08-07 11:32:41 +00:00
  • 39ae96450b draw more peers in network picture orbiter 2009-08-07 08:36:15 +00:00
  • 92edd24e70 fixed problem with switching of networks orbiter 2009-07-30 15:49:23 +00:00
  • 0575f12838 fix for deadlock orbiter 2009-07-30 09:08:44 +00:00
  • fbfdaf063d - patch to omit IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys - check routine at the beginning of the import of table keys that checks that all imported keys are well-formed. All records that have an ill-formed key are deleted. This is a hack and is not tested since I don't have bad data here to test with. If the effect is seen in the wild, please report in the forum. orbiter 2009-07-29 19:43:11 +00:00
  • 65b1d51e70 added xml version of windows office test files orbiter 2009-07-27 12:45:15 +00:00
  • c0e17de2fb - fixes for some problems with the new crawling/caching strategies - speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer - fixed some deadlock- and 100% CPU problems in the balancer orbiter 2009-07-25 21:38:57 +00:00
  • 634a01a9a4 replaced wget-requests with caching requests orbiter 2009-07-24 14:52:27 +00:00
  • c6c97f23ad - added cache usage properties to crawl start - added special rule to balancer to omit forced delays if cache is used exclusively - extended the htCache size by default to 32GB orbiter 2009-07-24 11:54:04 +00:00
  • c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup orbiter 2009-07-23 22:09:02 +00:00
  • 161d2fd2ef redesign of access to the HTCache (now http.client.Cache): - better control of the cache by using combined request-header and content access methods - refactoring of many classes to comply with this new access method - make sure that the cache is always written if something was loaded - some redesign of the process by which http response results are fed into the new indexing queue - introduction of a cache read policy: * never use the cache * use the cache if an entry exists * use the cache if the proxy freshness rule confirms * use only the cache and never go online - added configuration options for the crawl profiles to use the new cache policies. There is not yet an input during crawl start to set the policy but this will be added in another step. - set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; otherwise the policy is 'proxy freshness rule' - enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation, because sizes were computed by decompressing content after retrieving it from the disc. orbiter 2009-07-23 21:31:51 +00:00
  • da43164dd6 fix for UNRESOLVED_PATTERN see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2300 lulabad 2009-07-23 06:02:36 +00:00
  • d7c9c765bb changes by Thomas Süß daburna 2009-07-22 17:24:04 +00:00
  • ba2e6de538 fix empty version string again f1ori 2009-07-21 19:56:40 +00:00
  • 53081ee6da changes by Thomas Süß daburna 2009-07-21 12:22:05 +00:00
  • 51534df0cb fix for possible synchronization problem see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2292&hilit=&p=16787#p16787 orbiter 2009-07-20 08:21:17 +00:00
  • 4da9042e8a code simplification orbiter 2009-07-19 21:59:29 +00:00
  • 1d8d51075c refactoring: - removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here: http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages. - cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http. - because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source. orbiter 2009-07-19 20:37:44 +00:00
  • 67da20647f * add new odf parser based on sax-xml-parser * remove odf_utils-jar * test metadata in ParserTest f1ori 2009-07-18 15:04:34 +00:00
  • de4f0a006f removed superfluous windows target lotus 2009-07-18 14:46:42 +00:00
  • 06557485f5 * added parser unittest! f1ori 2009-07-17 22:03:34 +00:00
  • 69dfd03985 reactivate unittests * fix old tests * add buildtarget "ant test" f1ori 2009-07-17 20:58:21 +00:00
  • 6d0e6d591b * oops, fix compiler error :( f1ori 2009-07-17 20:02:56 +00:00
  • 3e5beb1654 * fix for empty version in seedlist f1ori 2009-07-17 19:16:26 +00:00
  • 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency. - The indexing queue was a historic data structure that was introduced at the very beginning of the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue also becomes superfluous. It has been removed as well. - Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). As a result, all servlets had to be modified. - Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that the indexing queue mostly appeared empty, so it was no longer of any use. Because the queue has been removed, the display in the servlets also had to be removed. - The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed their own task management. That has been integrated here. - Because the indexing queue had a special queue entry object and properties attached to this object, the properties had to be moved to the queue entry object which is part of the new indexing queue within the blocking queue, the Response Object. That object now also has the new properties of the removed indexing queue entry object. orbiter 2009-07-17 13:59:21 +00:00
  • 597393db3b changed default visibility of classes/objects in upnp lib (eclipse tells me that this would improve performance, however, this removes compiler warnings) orbiter 2009-07-16 12:19:40 +00:00
  • eea4c17ef2 removed rpm parser - no-one used that thing - loading huge rpm files may be a cause of crashes orbiter 2009-07-16 11:06:49 +00:00
  • b332dfad67 - inserted request object into response object, which now carries it instead of generating new objects - fixed a problem with the crawler introduced in SVN 6216 orbiter 2009-07-15 23:08:35 +00:00
  • ca72ed7526 -removed superfluous crawl cache -refactoring of crawler classes orbiter 2009-07-15 21:07:46 +00:00
  • 8103ccec4c removed compiler warnings in imported classes orbiter 2009-07-15 20:44:23 +00:00
  • 52e371b8f7 suppress warnings for upnplib code lotus 2009-07-15 16:22:56 +00:00
  • 477807e0e6 * updated jxpath to latest v1.3 * added upnplib as source without packages: jmx remote samples lotus 2009-07-15 16:13:24 +00:00
  • 049fb23a8d removed unused/unsupported ant targets orbiter 2009-07-15 14:16:25 +00:00
  • 13c63f4082 a set of small fixes to crawling behaviour orbiter 2009-07-15 14:15:51 +00:00
  • a564df3984 update to mime types in parsers and httpd.mime orbiter 2009-07-15 14:10:29 +00:00
  • 43c8defd79 enhanced parser with more extension + mime attributes orbiter 2009-07-14 13:32:53 +00:00
  • aee35bff6f replaced StringBuffer with StringBuilder in tar lib orbiter 2009-07-14 13:31:57 +00:00
  • 49bbb9bd45 replaced tar library with integrated apache ant tar lib orbiter 2009-07-14 11:31:40 +00:00
  • f987fc6b4a added tar classes from apache ant tools orbiter 2009-07-14 11:25:40 +00:00
  • f2d4b6d7fa added tar classes from apache ant tools orbiter 2009-07-14 11:25:05 +00:00
  • b2263bc720 enhanced document type recognition orbiter 2009-07-14 11:01:05 +00:00
  • aa38eb5a20 * maxfilesize -1 for infinite filesize lotus 2009-07-14 08:39:39 +00:00
  • 7d493cf8cc moved parser configuration into a separate servlet orbiter 2009-07-14 06:57:13 +00:00
  • 9cfe89c8fc * process content-length as soon as it is received * corrected indentation lotus 2009-07-13 19:55:13 +00:00
  • 5240d22773 removed unused library jsmooth orbiter 2009-07-13 18:16:03 +00:00
  • 3d26161dd1 removed unused libraries orbiter 2009-07-13 14:47:09 +00:00
  • 50cf80056f removed jmimemagic library orbiter 2009-07-13 10:58:37 +00:00
  • e3c7f61145 removed unused libraries orbiter 2009-07-13 10:21:22 +00:00
  • 3f113f38a8 removed unused imports removed unused libs from eclipse class path orbiter 2009-07-13 10:19:10 +00:00
  • 9f083bb6b2 check filetype before loading (no more mp4 loading) lotus 2009-07-12 16:50:11 +00:00
  • b118bdd994 *) Deleted obsolete license file. low012 2009-07-12 16:38:13 +00:00
  • 076ae02c44 * added pl and py to extensions accepted by htmlParser f1ori 2009-07-12 16:35:35 +00:00
  • d5e51cfd09 * workaround for non-working build property replacements f1ori 2009-07-12 09:38:03 +00:00