Commit Graph

  • 8e751d754a - add javadoc to busythread with hint about the init parameter usage - remove obsolete 10_httpd config parameter reger 2015-01-09 01:31:57 +01:00
  • 3e6c3e2237 documents pushed over the api/push_p.html interface will have their unique flag set by default Michael Peter Christen 2015-01-06 15:22:59 +01:00
  • 0871e43fcc better scale Michael Peter Christen 2015-01-06 14:22:43 +01:00
  • 35c24608cc fix for division by zero (rare cases) Michael Peter Christen 2015-01-06 14:21:20 +01:00
  • 4144c7cc52 do not write frame links to webgraph Michael Peter Christen 2015-01-06 14:14:25 +01:00
  • 4eb89d7f15 revert clickservlet (default was indeed a mistake) reger 2015-01-05 09:10:20 +01:00
  • 61ae9d2d11 do not use the clickservlet by default. From my personal view, this technique should not be used at all! This project is about privacy, the existence of a click servlet is one example why people should NOT use a search portal if such exists. Michael Peter Christen 2015-01-05 08:21:51 +01:00
  • c9e2128260 please commit new files under your own name, this file was not created by me. Michael Peter Christen 2015-01-05 08:18:19 +01:00
  • ebe5faeb01 added url to bookmark icon link; url is needed anyway, saves index lookup and works w/o committed url. Removed unused order parameter reger 2015-01-05 06:55:53 +01:00
  • 5594c43d2e bump to Solr-/Lucene-4.10.3 sixcooler 2015-01-04 18:47:47 +01:00
  • d44d8996d0 Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index. reger 2015-01-04 11:10:45 +01:00
  • d729386787 fix NPE in viewimage reger 2015-01-04 09:12:30 +01:00
  • 4ff018c9e4 fix ConfigPortal jumps to iframe focus add focus parameter to yacysearch.html too reger 2015-01-04 06:57:13 +01:00
  • c156548efe add info text to metadata page (htmlresponsewriter) on no documents found reger 2015-01-04 02:59:21 +01:00
  • 3ac1d14a21 improve TexParser.mimeOf( fileextension ) by returning 1st defined in supported list. This prevents unusual mapping of supported fileextension -> mimetype (like htm=application/x-tex) reger 2015-01-02 04:20:02 +01:00
  • d2792a43fd do not write iframe and embed links into webgraph, but use them anyway for crawling Michael Peter Christen 2015-01-02 02:44:03 +01:00
  • 5b810f6d70 Merge branch 'master' of gitorious.org:yacy/whitrs-rc1 Michael Peter Christen 2015-01-02 00:57:37 +01:00
  • 3cdbd5f5c6 Fix for progress table background not resizing when the post-processing started/ended. Ryszard Goń 2015-01-02 00:11:32 +01:00
  • 0dfeee154a adjustments for Bookmark icon to act on BookmarkDB, it acts on YMarks but YMark interface seems not maintained, for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB. The YMark related code is (for now) left untouched so both tables are updated. reger 2015-01-01 02:41:20 +01:00
  • 513e9259f5 Merge branch 'master' of git@gitorious.org:yacy/rc1.git Michael Peter Christen 2014-12-30 02:36:17 +01:00
  • e177d69387 remove obsolete config footer option (ConfigPortal user.login) no footer or footer-option in use reger 2014-12-29 03:50:00 +01:00
  • 5d4167f977 reactivated clear stacks code for termination of all crawls because this did not work without that part of the code Michael Peter Christen 2014-12-28 15:52:43 +01:00
  • 3cd7deb3b8 do not flush non-errors to stdout because this is a concurrency issue. the flush-call appeared very often in thread dumps with high load, so this hopefully improves performance Michael Peter Christen 2014-12-28 15:48:37 +01:00
  • 4e3e2acc69 Merge branch 'master' of gitorious.org:yacy/rc1-fixed_percent-encoding Michael Peter Christen 2014-12-28 15:01:40 +01:00
  • ecb6a59e9e do not translate gif images into png images for thumbnails. Instead, stream the original to the search result thumb viewer. This has two reasons: - animated gifs cause 100% cpu and deadlocks in the jvm gif parser; a known bug which is obviously not yet fixed - animated gifs now appear in the search result also as animation Michael Peter Christen 2014-12-28 14:53:55 +01:00
  • d9603039ff automatically set the Q flag for smb/ftp start urls (split pdf support) Michael Peter Christen 2014-12-28 14:36:43 +01:00
  • 8600ea01dd automatically switch on query option in case intranet protocols (smb/ftp) are used. This supports the new split-pdf option. Michael Peter Christen 2014-12-28 14:27:42 +01:00
  • 3e9871291f Applied URL-decoding prior to HTML-encoding. This removes percent-encoding from text shown in HTML arucard21 2014-12-27 09:52:34 +01:00
  • 3144313974 Postprocessing progress bar fix (Make it work as [probably] actually intended) Ryszard Goń 2014-12-27 03:02:18 +01:00
  • 6a04563578 Init Jetty using setDefaultDescriptor (web.xml) to defaults/web.xml so web.xml in defaults dir is applied first and optional DATA/SETTINGS/web.xml loaded on top. By using this Jetty feature (default web.xml) we assure that changes to the default are applied to existing installations and individual addition/changes are still respected. reger 2014-12-27 00:10:14 +01:00
  • 51ec9c1f44 fix "null" title in response writer for documents with multivalued title reger 2014-12-26 18:23:26 +01:00
  • 73ba5d8ef7 adjust fieldtype and description of field httpstatus_redirect_s in CollectionSchema - the field is not used (delete candidate) reger 2014-12-26 18:21:35 +01:00
  • 1f9389396a fix NPE related 500 (Bad Request) response of UrlProxy on blacklisted urls, by adding parameter HTTPDeamon and removing unused hostAddress lookup code in sendRespondError reger 2014-12-25 02:21:45 +01:00
  • 7e4e9f7e32 improve yacysearchitem, prevent allocation of String (modifyURL) if feature not used reger 2014-12-25 02:16:19 +01:00
  • 61f75d6019 add xmpcore as direct dependency to pom (otherwise it's looked up at pdfbox archive path and not found there) reger 2014-12-25 02:13:44 +01:00
  • 8ef56eda90 Merge branch 'master' of git@gitorious.org:yacy/rc1.git Michael Peter Christen 2014-12-24 12:24:15 +01:00
  • 9fce8bf2a5 crawling of multi-page pdfs with artificial post part on smb or ftp shares is not possible with the disabled setting; this is now temporarily disabled until a better solution is at hand. Michael Peter Christen 2014-12-24 12:23:59 +01:00
  • 682dd94925 fix div by 0 in hello Caused by: java.lang.ArithmeticException: / by zero at hello.respond(hello.java:159) reger 2014-12-24 00:04:35 +01:00
  • 17808898c6 update to SLF4J 1.7.9 reger 2014-12-23 19:11:21 +01:00
  • f856edecb6 fix proxy redirect (http status 302) response fixes http://mantis.tokeek.de/view.php?id=517 reger 2014-12-23 02:01:03 +01:00
  • cc090bcb01 enhanced initialization of autotagging Michael Peter Christen 2014-12-23 00:37:51 +01:00
  • 003ec43bee Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-23 00:33:20 +01:00
  • bef689d0a2 NPE fix Michael Peter Christen 2014-12-23 00:30:34 +01:00
  • 1de33c6a53 add hint to Heuristics Config on "Greedy Learning Mode" in portal config, to point to an option to make this setting permanent. reger 2014-12-22 20:36:29 +01:00
  • 5332c9df21 update to commons-fileupload-1.3.1.jar (includes a security fix) reger 2014-12-22 20:34:13 +01:00
  • a0576ec737 fix for pdf sub-page result preparation Michael Peter Christen 2014-12-22 14:32:09 +01:00
  • 6ad43c4a8b removed debug code Michael Peter Christen 2014-12-22 14:24:09 +01:00
  • 407cfff010 fix to wkhtmltopdf usage Michael Peter Christen 2014-12-22 02:01:55 +01:00
  • 5d321d3dc5 fixes to wkhtmltopdf call Michael Peter Christen 2014-12-21 20:11:39 +01:00
  • eb78388a98 changed prefer strategy for http unique in such a way that http is preferred over https. While this is a bad idea from the standpoint of security it is more commonly applicable for environments where http and https mix and for some domains https is not available. Then the double-check is possible even if no postprocessing is performed. Michael Peter Christen 2014-12-21 19:17:06 +01:00
  • 84e2cccab4 fix to prevent assertion error in ranking servlet if no vocabularies are present that could be evaluated Michael Peter Christen 2014-12-21 19:08:28 +01:00
  • 9e588944fa prevent NPE during initialization of very large vocabularies Michael Peter Christen 2014-12-21 19:02:36 +01:00
  • aaf7d4775a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-21 18:10:25 +01:00
  • 8c3e5b7b6d added experimental pdf splitting which enables YaCy to split pdfs during parsing into individual pages and add them all using different URLs. These constructed urls are generated from the source url with an appended page=<pagenumber> attribute to the url get/post properties. This will distinguish the different page entries. The search result list will then replace the post parameter with a url anchor # mark so that the original url is presented in the search result. These URLs can be opened directly on the correct page using pdf.js which is now built-in into firefox. That means: if you find a search hit on page 5 and click on the search result, firefox will open the pdf viewer and show page 5. Michael Peter Christen 2014-12-21 18:10:15 +01:00
  • 85773ebd4f removed debug lines Michael Peter Christen 2014-12-21 17:53:06 +01:00
  • d14114697c the miss cache does not seem to work, it sometimes contains urlhashes from documents which actually are inside the index. This can be reproduced using the crawl result table at http://localhost:8090/CrawlResults.html?process=5 The cache is temporarily disabled to remove the bad behaviour, however a later reactivation of that feature may be possible. Michael Peter Christen 2014-12-21 17:31:51 +01:00
  • deb75a1dbe fix refactored size() -> filesize() in YMarkMetadata reger 2014-12-21 14:02:06 +01:00
  • 198102304b refactor size() -> filesize() of URIMetadataNode (harmonize with ResultEntry and to not get confused with Collection.size()) reger 2014-12-21 06:05:35 +01:00
  • c6f634a4f2 remove redundant caching of urlhash in URIMetadataNode (is already cached in underlying DigestURL .url) reger 2014-12-21 03:45:54 +01:00
  • 445fafeb7c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-20 15:38:15 +01:00
  • 0d69089c61 fix for division by zero Michael Peter Christen 2014-12-20 15:11:06 +01:00
  • ac61a39828 use peeraddress for link in remote crawl list to make link work without enabled proxy reger 2014-12-20 01:59:00 +01:00
  • fe5d4e6c7b update to Jetty 9.2.6 reger 2014-12-19 21:54:17 +01:00
  • 5516819354 preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then cached in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again. Michael Peter Christen 2014-12-19 17:41:38 +01:00
  • d3e71ed070 fixes for searches when initialization of large autotagging libraries has not finished Michael Peter Christen 2014-12-19 17:38:58 +01:00
  • 28683530cd fixes to usage of no-cache: use and recognize also the no-store directive Michael Peter Christen 2014-12-19 17:37:58 +01:00
  • c9c700b510 reduction of http requests to YaCy using the correct cache-control, expires and last-modified headers in http response. Michael Peter Christen 2014-12-19 11:51:14 +01:00
  • eca578a5fa update to PDFBox 1.8.8 reger 2014-12-19 02:54:38 +01:00
  • 13cca2b114 fix missing AppPath upd Maven plugin versionid reger 2014-12-19 01:58:37 +01:00
  • d7e2f08a89 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-18 14:56:18 +01:00
  • 0f7d4c42e9 include xmpcore.jar in classpath used by metadata-extractor reger 2014-12-16 21:12:37 +01:00
  • bd39e009ac Update Russian translation malykhin.dmitry 2014-12-16 23:10:53 +03:00
  • 65125439fe added query modifier 'on'. This makes it possible to search for date occurrences within the (web) page documents (not the document last-modified!). This works only if the solr field dates_in_content_sxt is enabled. A search request may then have the form "term on:<date>", like gift on:24.12.2014 gift on:2014/12/24 * on:2014/12/31 For the date format you may use any kind of human-readable date representation(!yes!) - the on:<date> parser tries to identify language and also knows event names, like: bunny on:eastern .. as long as the date term has no spaces inside (use a dot). Further enhancement will be made to accept also strings encapsulated with quotes. Michael Peter Christen 2014-12-16 13:53:12 +01:00
  • 1cfddea578 added (very experimental) Solr response writer for snapshot image results Michael Peter Christen 2014-12-16 13:18:49 +01:00
  • 7287dd764e added url, date, time and page number on pdf snapshot footer Michael Peter Christen 2014-12-16 12:39:10 +01:00
  • 8b5d074715 fix for image parser (there is a class missing!) Michael Peter Christen 2014-12-16 12:10:15 +01:00
  • 932faafffe reactivated on-demand snapshot loading Michael Peter Christen 2014-12-16 12:09:57 +01:00
  • 2362ad7c34 fix for a count issue in snapshot api Michael Peter Christen 2014-12-16 11:33:30 +01:00
  • 3354cd63be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-15 23:32:57 +01:00
  • 9971e197e0 Added a transaction interface to the snapshots: all documents in the snapshots can now be processed with transactions using commit and rollback commands. Furthermore, a large number of monitoring methods have been added to check the success of transactions. Michael Peter Christen 2014-12-15 23:32:46 +01:00
  • 63846ddb89 add final SolrQueryRequest.close to SolrServlet reger 2014-12-15 22:54:49 +01:00
  • 9edc7308aa update to metadata-extractor-2.7.0.jar add 2 simple JUnit test cases for jpeg and tif parsing reger 2014-12-15 20:45:05 +01:00
  • 578ae29f1e added a note that the servlet is linked using web.xml Michael Peter Christen 2014-12-15 05:56:12 +01:00
  • 6c3f36def1 - fix path to default heuristic.cfg - deprecate unused ProxyServlet reger 2014-12-14 21:27:45 +01:00
  • 00113dcfbd add chardet.jar to Maven dependencies reger 2014-12-14 19:17:13 +01:00
  • 446f374ba9 fix yacy.init comment http://mantis.tokeek.de/view.php?id=513 reger 2014-12-14 19:12:18 +01:00
  • bbf0ac40c3 add the actual DateDetection class... (missed in latest commit) Michael Peter Christen 2014-12-14 13:43:30 +01:00
  • 66b5a56976 Added and integrated new date detection class which can identify date notions within the fulltext of a document. This class attempts to identify also dates given abbreviated or with missing year or described with names for special days, like 'Halloween'. In case that a date has no year given, the current year and following years are considered. Michael Peter Christen 2014-12-14 13:40:45 +01:00
  • c3c2b6999b fixes on wkhtmltopdf Michael Peter Christen 2014-12-14 04:03:20 +01:00
  • 114f0afc1e enable sku as anchor in html response writer Michael Peter Christen 2014-12-14 04:02:13 +01:00
  • aa80cb1159 enhanced tagging preparation speed which reduces initialization time for very large vocabularies Michael Peter Christen 2014-12-13 09:54:41 +01:00
  • 6a1865f507 refactoring date -> lastModified Michael Peter Christen 2014-12-11 23:37:41 +01:00
  • ab6cc3c88c added concurrent generation of snapshot pdfs Michael Peter Christen 2014-12-10 14:10:05 +01:00
  • ff035a20e7 fix for vocabulary import (double term detection) Michael Peter Christen 2014-12-10 14:09:34 +01:00
  • e6650050fe fix for Is Facet checkbox Michael Peter Christen 2014-12-10 13:14:39 +01:00
  • bd3ed5cae5 added charset detection to vocabulary reader Michael Peter Christen 2014-12-10 13:11:51 +01:00
  • 413eeefed4 added character set detection library from http://www-archive.mozilla.org/projects/intl/chardet.html Michael Peter Christen 2014-12-10 13:08:29 +01:00
  • 7bfc5b80cb added new options to vocabulary editor: - new switch 'isFacet' which enables or disables the usage of the vocabulary for search facets. This shall be used for large vocabularies since searches in solr are extremely slow if facets for a large set of alternative terms are generated - new option to disable auto-enrichment from synonyms - new option to add synonyms from another column when importing from csv - automatically recognize double-occurrences in synonyms and bundling terms for such synonyms Michael Peter Christen 2014-12-10 12:20:27 +01:00
  • 87b53b3572 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-12-09 16:20:44 +01:00
  • 8df8ffbb6d enhanced the snapshot functionality: Michael Peter Christen 2014-12-09 16:20:34 +01:00