Commit Graph

  • a5d19e2982 update configheuristics_p.html text to state current opensearch heuristic function reger 2015-03-09 00:09:36 +01:00
  • 4b63dad88d fix version conflict in pom for commons-io reger 2015-03-08 22:10:51 +01:00
  • 7e09bff4a1 exclude default search fields from text copy to text_t for metadata index documents (reduce text redundancy) reger 2015-03-08 21:49:23 +01:00
  • 86073a5ba3 For remote crawlReceipt add document abstract/description: enhance the metadata returned to the originator by description_txt to improve fulltext search result hits. reger 2015-03-08 02:34:48 +01:00
  • 8af70950d9 harmonize snippet computation to consider description_txt always (solr hl & internal). For now just added desc to text list for computation, could be further equalized with hl computation. reger 2015-03-05 02:22:05 +01:00
  • 74ed399180 remove unused statement reger 2015-03-05 02:09:27 +01:00
  • fd4e2c809a Show dates in the content of a document in the search result: - if an eventDate is given in the search result, replace the document date with the event date and prefix it with the string "on ". - the document date is omitted if a date from the content is shown Michael Peter Christen 2015-03-02 18:00:20 +01:00
  • 893889bc7b added special terms for on: - Date modifier: tomorrow, today; i.e.: search for: "Berlin on:tomorrow" to find events happening tomorrow in Berlin Michael Peter Christen 2015-03-02 13:10:05 +01:00
  • 710a0efa1b generalized time period computations Michael Peter Christen 2015-03-02 12:55:31 +01:00
  • dcfc384eee bugfix for fixed host/port Michael Peter Christen 2015-03-02 04:43:42 +01:00
  • d9d3111d10 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-03-02 04:31:05 +01:00
  • 535f1ebe3b added a new way of content browsing in search results: - date navigation Michael Peter Christen 2015-03-02 04:30:10 +01:00
  • 16bc267a32 add test case for snippet html encoding check reger 2015-03-01 23:50:17 +01:00
  • a4629ad83b upd pom reger 2015-02-28 19:48:29 +01:00
  • d7259419f3 postpone raw snippet html encoding upon use instead of during init of snippet addressing http://mantis.tokeek.de/view.php?id=551 reger 2015-02-28 19:02:18 +01:00
  • c3aadcf899 Fix for Jetty "JetLeak" bug: update to jetty 9.2.9 The bug was inside the jetty library, for details see: http://blog.gdssecurity.com/labs/2015/2/25/jetleak-vulnerability-remote-leakage-of-shared-buffers-in-je.html We recommend to update your YaCy peer with this bugfix. Michael Peter Christen 2015-02-28 15:46:46 +01:00
  • de56d934b2 apply query parameter getQueryFields() to GSA servlet reger 2015-02-27 00:53:20 +01:00
  • d23f7165ab Next try to fix start script for OpenBSD. Marc Nause 2015-02-25 21:11:59 +01:00
  • 2d2299f484 fix mimetype of rss items in rss parser - remove self reference as anchor for items reger 2015-02-25 01:58:42 +01:00
  • b432049d59 enhanced date parsing time Michael Peter Christen 2015-02-25 01:05:46 +01:00
  • 9b0de2de64 introduce getQueryFields to return default query fields (query parameter QF) calculated from boostfields config, making sure title, description, keywords and content are always searched. - apply change to solrServlet makes sure every remote query uses at least all locally defined boost fields for search - apply to local solr search - simplify select query by using QF defaults reger 2015-02-23 23:12:07 +01:00
  • 53e4ae65d0 Changes to improve compatibility with OpenBSD. (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5503) Marc Nause 2015-02-23 22:54:49 +01:00
  • ba276d3e64 add description_txt to default query fields, Dublin Core Metadata field extracted by most parsers. reger 2015-02-22 05:42:04 +01:00
  • a0f04db9ea add extracted description/subject to pptParser reger 2015-02-22 05:31:56 +01:00
  • 8ec1db76ee url unescape add check for inconsistent utf8 multibyte parsing If the url contains special chars (like umlauts äöü) it's interpreted as multibyte char and actually not converted at all (removed). Added a check if the multibyte conversion is not complete, just add the char as is. reger 2015-02-20 02:21:04 +01:00
  • 4b97ddb9ec stop sending crawl receipts if receiver got offline reger 2015-02-17 03:16:10 +01:00
  • ad1596f9ac upd lucene api doc link reger 2015-02-16 01:20:12 +01:00
  • 7e35518787 add extracted description/subject to docParser reger 2015-02-16 00:50:16 +01:00
  • f0a5188e11 replace deprecated HTTPClient setStaleConnectionCheckEnabled with setValidateAfterInactivity() reger 2015-02-15 23:09:01 +01:00
  • 7b569d2dbe replace deprecated HTTPClient ALLOW_ALL_HOSTNAME_VERIFIER with NoopHostnameVerifier() reger 2015-02-15 21:34:01 +01:00
  • fba34e12ef fix formatting issue if snippet contains html code replacement for reverted commit 61f42a7928 reger 2015-02-15 20:39:20 +01:00
  • e48720a58c fix NPE in snippet computation reger 2015-02-15 05:30:14 +01:00
  • 49281617d2 upd to commons-codec-1.10.jar, commons-compress-1.9.jar reger 2015-02-14 23:04:05 +01:00
  • 1196ff01c8 revert: formatting fix eats also up highlighting need other solution for snippets with unwanted html code reger 2015-02-14 02:43:05 +01:00
  • f989f955dc fixed httpclient lib paths in ant build Michael Peter Christen 2015-02-14 01:38:20 +01:00
  • 6dbc976d8b upd to httpclient-4.4 reger 2015-02-13 00:50:32 +01:00
  • 61f42a7928 fix formatting issue in search result display if description contains html code noticed e.g. for id=NmNdJ9uApLaQ http://hswong3i.net/blog/hswong3i/virtualmin-drupal-7-x-ubuntu-12-04-howto reger 2015-02-13 00:20:33 +01:00
  • eda0aeaf26 allow/recognize host in file: protocol crawl target This is useful in intranet indexing while crawling an intranet file server accessed via hostname while e.g. under Windows mapped to different drive letters on individual clients. Here you can crawl e.g. file://fileserver/documents having a valid uri in that intranet environment (while e.g. P:/documents might be client dependent). reger 2015-02-11 23:26:39 +01:00
  • 77851fa53c fix parser test cases (Vocabulary parameter) reger 2015-02-11 01:43:02 +01:00
  • df83fcc4fc disable optimistic GC assumption in StandardMemoryStrategy After several tests found that eom is not prevented. Major reason in testing was assumption future GC will free avg of last 5 GC. Disabling this check improved eom exceptions. reger 2015-02-11 01:42:01 +01:00
  • 8ff76f8682 the cleanup process experienced a 100% CPU load situation and the loop did not terminate: Michael Peter Christen 2015-02-10 08:43:45 +01:00
  • 1f5b5c0111 npe fix for latest scraper feature Michael Peter Christen 2015-02-10 08:33:30 +01:00
  • ee97302a23 hack to make date detection faster (while it becomes a bit incomplete regarding language alternatives) Michael Peter Christen 2015-02-09 18:46:06 +01:00
  • 6578ff3ddb enhanced suggest function Michael Peter Christen 2015-02-09 18:45:07 +01:00
  • fe6f5a395d fix Umlaut handling in blekko heuristic search term http://mantis.tokeek.de/view.php?id=169 observation: blekko seems to block xxxbot agents (=0 results) reger 2015-02-08 23:40:33 +01:00
  • ab98f69592 fix: searchoption hint for heuristic reger 2015-02-08 00:15:30 +01:00
  • 23924348e2 url with semicolon or comma handling in proxy request apply patch supplied with bugreport http://mantis.tokeek.de/view.php?id=540 reger 2015-02-07 22:01:54 +01:00
  • b05a2fca1f small correction for last commit sixcooler 2015-02-07 13:47:15 +01:00
  • 8fa542a8e1 upd to Jetty 9.2.7 reger 2015-02-07 00:44:09 +01:00
  • 9025fe3518 upd error message for proxy fix http://mantis.tokeek.de/view.php?id=539 reger 2015-02-07 00:37:43 +01:00
  • 974d58b01f IPv6 Fix for push interface Michael Peter Christen 2015-02-04 15:03:34 +01:00
  • fe50e5aef6 fix for failed selection of terms in faceted search with vocabularies Michael Peter Christen 2015-02-04 11:55:27 +01:00
  • 1309619a71 remove remote indexing option in crawl start if not in p2p mode Michael Peter Christen 2015-02-04 11:37:07 +01:00
  • 6324db1213 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-02-04 11:27:31 +01:00
  • 5cb05c3013 adjust table column width to not line wrap crawler traffic line reger 2015-02-04 03:51:34 +01:00
  • 606d00c8f2 cloning a crawl now accepts the class name of vocabulary scrapers Michael Peter Christen 2015-02-04 01:50:35 +01:00
  • 97ba5ddbb7 configuration option for maxload limit for remote search Michael Peter Christen 2015-02-04 01:12:25 +01:00
  • c454ef69c6 add shortMemory check to heuristic search and skip operation on shortMemory (no request to remote opensearch systems) reger 2015-02-03 03:08:34 +01:00
  • 11b21308c0 fix: malformed filename in image search fix for http://mantis.tokeek.de/view.php?id=533 reger 2015-02-01 05:35:09 +01:00
  • 9e1ec5fec4 refactor: just some more usages of constant for term ":[* TO *]" reger 2015-02-01 04:26:33 +01:00
  • 8c491f51a5 remove hardcoded initialization of language nav if not used reger 2015-02-01 00:29:28 +01:00
  • a311c97c9b Added & in start script for *NIX which was lost a few commits ago. Marc Nause 2015-01-30 21:17:23 +01:00
  • b5ac29c9a5 added an html field scraper which reads text from html entities of a given css class and extends a given vocabulary with a term consisting of the text content of the html class tag. Additionally, the term is included into the semantic facet of the document. This allows the creation of faceted search to documents without the pre-creation of vocabularies; instead, the vocabulary is created on-the-fly, possibly for use in other crawls. If any of the term scraping for a specific vocabulary is successful on a document, this vocabulary is excluded for auto-annotation on the page. Michael Peter Christen 2015-01-30 13:20:56 +01:00
  • 1cb290170e refactoring of autotagging code (combined same code pieces) Michael Peter Christen 2015-01-29 11:39:47 +01:00
  • c3b55455fc enhanced initialization speed of vocabularies by using better normalization and by removal of unused data structures Michael Peter Christen 2015-01-29 02:45:32 +01:00
  • 68c605d637 replace with CommonPattern.SPACE for split Michael Peter Christen 2015-01-29 02:28:03 +01:00
  • de3e373913 using precompiled CommonPattern.TAB for split Michael Peter Christen 2015-01-29 02:22:28 +01:00
  • 1f5047b15f using precompiled pattern CommonPattern.SEMICOLON for splits Michael Peter Christen 2015-01-29 02:19:41 +01:00
  • a8a2b7a803 persistency for vocabulary facet switch Michael Peter Christen 2015-01-29 02:16:42 +01:00
  • efbc9a3561 introducing a new getConfig method which parses comma-separated lists from setting fields; refactoring for all places where such lists are parsed Michael Peter Christen 2015-01-29 01:53:36 +01:00
  • 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where split(",") was used Michael Peter Christen 2015-01-29 01:46:22 +01:00
  • ac19690d30 refactoring with CommonPattern.COMMA Michael Peter Christen 2015-01-29 01:35:28 +01:00
  • cf9b22ca5c do not reindex based on vocabulary fields (there are meanwhile many of them) and some default settings Michael Peter Christen 2015-01-29 01:22:28 +01:00
  • 5a060c9f26 refactoring of reindexSolr (just replaced constant string) Michael Peter Christen 2015-01-29 00:33:07 +01:00
  • b5a55c8b3d fix for wkhtmltopdf (custom header does not work) Michael Peter Christen 2015-01-28 17:45:25 +01:00
  • 3d717b749a fix for urlmaskfilter Michael Peter Christen 2015-01-28 13:40:41 +01:00
  • 2636582435 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-01-28 10:32:17 +01:00
  • 0260d3d800 Allow to hide linkstructure graphic in crawl monitor using/setting the config param DECORATION_GRAFICS_LINKSTRUCTURE reger 2015-01-28 03:59:01 +01:00
  • bee5ee7cce removed some warnings Michael Peter Christen 2015-01-27 17:00:20 +01:00
  • 783cf6fbc7 the LinkedBlockingQueue is much faster than the ArrayBlockingQueue (strange but this is the result of a test: ArrayBlockingQueue: 39461 lines / second; LinkedBlockingQueue: 60774 lines / second) Michael Peter Christen 2015-01-27 16:53:09 +01:00
  • 6390454652 fix for vocabulary on/off setting Michael Peter Christen 2015-01-27 16:24:27 +01:00
  • a3c5995bde Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-01-26 14:13:17 +01:00
  • 5ca0762179 fix: eom on parsing ico file by genericImageParser reger 2015-01-24 23:17:07 +01:00
  • 4cd2d68e03 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-01-24 07:10:47 +01:00
  • dc5700148f update to latest code changes from json.org Michael Peter Christen 2015-01-24 07:10:14 +01:00
  • 42b0672be3 Let auto-disabled crawls recover if low resource condition vanished. Analog to autodisabled DHT switch autodisabled crawls back on upon mem ok by remembering the autodisable by conf parameter. reger 2015-01-24 01:53:58 +01:00
  • b32e0b5457 fix for shell script Michael Peter Christen 2015-01-23 18:34:38 +01:00
  • 29f6e9db7a write java version to status page Michael Peter Christen 2015-01-23 17:57:54 +01:00
  • 604ccd8072 new development cycle Michael Peter Christen 2015-01-23 11:31:05 +01:00
  • 287c528f46 replaced old JavaApplicationStub for Mac Application framework with new script. Adopted the YaCyApp environment and fixed a problem in the startYACY.sh application wrapper which caused wrong usage of logging option -l which caused that files had been written to the YaCy application folder. As a result of this fix, it is not necessary any more to change path settings in Info.plist if libraries are changed. Michael Peter Christen 2015-01-23 11:30:13 +01:00
  • 2bc2564668 Release 1.82 Release_1.82 Michael Peter Christen 2015-01-21 12:45:55 +01:00
  • 4c9d2a7c64 reverted 'do not show all options' strategy. This is actually confusing new users. Will be activated maybe again if there is an optional tutorial mode which can be switched on for this special purpose of running a tutorial. Michael Peter Christen 2015-01-20 18:18:12 +01:00
  • 7db2888336 fixed font size and print page generation in pdf snapshots Michael Peter Christen 2015-01-20 17:14:14 +01:00
  • 24f68a4eb7 refactor opensearch heuristic introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors, which provide the query() functionality, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector. The manager now enforces a min 15s delay between calls to external systems. Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation. reger 2015-01-19 03:30:35 +01:00
  • 3b51636ecb fix for mediawiki import Michael Peter Christen 2015-01-12 00:35:47 +01:00
  • b07afbc115 a test with http://validator.w3.org/feed/#validate_by_input shows that the time format was wrong; we must use RFC-822 Michael Peter Christen 2015-01-09 16:45:43 +01:00
  • 8cafdb989a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2015-01-09 11:00:02 +01:00
  • 66839f73fa remove debug limit from commit before reger 2015-01-09 02:52:18 +01:00
  • 4214f250d0 Add option for extended search (Autosearch) to Bookmark.html asking all connected peers for the searchterm added as description to the bookmark created by the bookmark icon. Intended for searches/research projects with not sufficient results from local and DHT selected remote target peers. reger 2015-01-09 02:06:30 +01:00
  • bb37cb32e4 Add title import for bookmark icon if avail in index reger 2015-01-09 01:33:45 +01:00