Commit Graph

  • 8615556dd5 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git luc 2015-09-24 09:28:57 +02:00
  • 2f51baff4f check for loading error (includes unsupported formats) to prevent blank thumbnail display in image search because of unhandled sources which don't load on click. Now the cross icon indicates the problem (including unsupported formats) reger 2015-09-24 01:58:19 +02:00
  • 5578886f6f Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git luc 2015-09-23 21:04:20 +02:00
  • c38d6c1f37 Correction for mantis 535: inurl: parameter doesn't work on URLs with upper-case letters luc 2015-09-23 21:01:51 +02:00
  • 52e3eb4ce8 harmonize/correct assignment to Ymarkmeta.mime replace use of deprecated reger 2015-09-23 00:13:10 +02:00
  • 87f358058e Fix for index entries which have ids not computed as hash from the url. This makes it possible to operate with outside-computed url hashes in enterprise environments not using the built-in crawler from YaCy. Michael Peter Christen 2015-09-22 11:56:17 +02:00
  • 2951c9fc40 remove unused check for known fileextension in searchtrailer (check is done on add to filetype-nav) reger 2015-09-22 03:52:15 +02:00
  • 3f2b8ab5e5 optionally include mime in p2p url exchange string if doctype decodes to ambiguous mime and default conversion is not equal to original reger 2015-09-22 00:12:31 +02:00
  • de01b25805 Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-09-21 19:36:56 +02:00
  • a3195d78ae add Portuguese month names to date recognition reger 2015-09-20 23:28:42 +02:00
  • d2cc11ea8f fix html parser taking <style> content as text. Noticed some result descriptions contain css content from style tag. Added <style> to tag list to scrape its content not as text + test case included reger 2015-09-19 05:30:55 +02:00
  • 9ace7876ef Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-09-18 18:41:34 +02:00
  • 5f706797cb patch for a bug inside of solr since solr 5.0 when using a boost function with a numeric date field: "unexpected docvalues type NUMERIC for field 'last_modified' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues." This is a well-known bug inside solr which currently prevents the 'sort by date' in the YaCy search interface from being used. Without this patch no results at all are displayed (since the exception prevents that). Now there is at least a result but it is not ordered properly. Michael Peter Christen 2015-09-18 02:25:44 +02:00
  • c9da652249 Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-09-15 08:11:34 +02:00
  • 733d725dec limit css scrolling to result/content window x from pull request #10 reger 2015-09-15 02:11:30 +02:00
  • 4c38083a11 Merge pull request #10 from Raegdan/raegdan-css-layout-fix Burkhard 2015-09-15 02:09:17 +02:00
  • 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field (regardless of whether these fields are part of the update). Switch partial update option off in postprocessing if schema contains *_dts (multivalued date field). see http://mantis.tokeek.de/view.php?id=601 reger 2015-09-13 20:23:15 +02:00
  • b4cbdea1e7 adapt SolrServerConnector.add to handle error on partial update input document. In case of error we deleted the original document and added the new doc to the index. This is not valid for partial update documents (which contain only a subset of the fields). Remove the "delete" error handling step. reger 2015-09-13 20:19:50 +02:00
  • e594130aec add test case for partial update - to discover effect on YaCy for update of documents with multivalued date fields (like dates_in_content_dts) current result: loss of fields/information in index document, see EmbeddedSolrConnectorTest.testUdate_withMultivaluedDateField() reger 2015-09-13 06:02:07 +02:00
  • 98ab655917 on reindex delete index document with invalid url if discovered reger 2015-09-12 23:06:13 +02:00
  • 1e8369e18b use a parsed date in Document.toString reger 2015-09-12 22:00:40 +02:00
  • d5da9e5a38 fix test method (add throw for URIMetadataNode) reger 2015-09-12 20:07:43 +02:00
  • 1dcec73c19 Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-09-11 18:54:40 +02:00
  • a7179138ce Returned again to main repository location : does anyone want to consider mantis 597 ? (http://mantis.tokeek.de/view.php?id=597) luccioman 2015-09-11 17:23:59 +02:00
  • 199b2ce52d Translator refactoring : to simplify locale files writing, process keys as simple strings and no longer as regular expressions. Updated all locale files to adapt to refactored Translator : removed useless escaped characters and did minor corrections. Performed minor syntax corrections on some html source files. Added a util to translate all html source files with all locales without launching full YaCy application. Corrected main arguments parsing on other translation utils. luccioman 2015-09-11 17:20:11 +02:00
  • 711183bd72 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server luccioman 2015-09-11 11:16:19 +02:00
  • 4dd9c0d5d9 Merge from main repository luccioman 2015-09-07 02:36:22 +02:00
  • 3428b6f13b improve filtering by filetype navigator. The used url-filter for filetype doesn't require ".ext" resulting in too many matches, add a sort-out filter for RWI results. reger 2015-09-07 02:36:22 +02:00
  • e37a4f0b3d prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation reger 2015-09-06 22:19:05 +02:00
  • 41c4eade51 extract modification date from vCard (vcfParser) reger 2015-09-06 04:28:27 +02:00
  • 8768896975 extract lastmodified from openoffice doc set lastmod date in office document parsers reger 2015-09-06 00:04:54 +02:00
  • c40c302748 when many crawl queues are generated, this NPE can occur; probably caused as concurrency issue: W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException java.lang.NullPointerException at java.util.TreeMap.rotateRight(TreeMap.java:2239) at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271) at java.util.TreeMap.put(TreeMap.java:582) at net.yacy.kelondro.table.Table.<init>(Table.java:235) at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229) at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204) at net.yacy.crawler.HostQueue.push(HostQueue.java:397) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355) at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Michael Peter Christen 2015-09-05 14:12:17 +02:00
  • 94cfa63c46 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-09-05 14:07:53 +02:00
  • 0a37d8af89 in case that a site crawl is started for urls with file:// path, the host filter does not work because there is no host given in such urls. In that case, patch the filter to be a sub-path filter. Michael Peter Christen 2015-09-05 14:07:23 +02:00
  • 6b6ca63987 Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-09-05 11:30:51 +02:00
  • 367fe388b9 fix exception throw after sendError in DefaultServlet - reduce debug exception logs in crawler reger 2015-09-05 01:57:30 +02:00
  • 348b8db9d2 Merge pull request #12 from luccioman/master Michael Peter Christen 2015-09-04 17:05:06 +02:00
  • 9df249296a Return to main repository version luccioman 2015-09-04 13:52:03 +02:00
  • 9752bd5f88 Added utils to help translation without launching full YaCy application : - translate all source files with a locale - list all non translated files with a locale luccioman 2015-09-04 13:44:44 +02:00
  • 2f0f0180e2 Added a function to list files recursively. luccioman 2015-09-04 13:42:57 +02:00
  • 7e4c1d2282 Translator refactoring : - deleted useless new StringBuilder allocation - use of a new reusable FileNameFilter - added javadoc luccioman 2015-09-04 13:42:10 +02:00
  • c1d937a90c Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server luccioman 2015-09-04 09:57:49 +02:00
  • 7c1da173e0 fix missing license in image search see http://mantis.tokeek.de/view.php?id=522 reger 2015-09-03 23:36:57 +02:00
  • f17863588f Updated French translations for yacysearchitem.html, yacysearchtrailer.html and Steering.html files. Corrected various labels. luccioman 2015-09-03 09:02:03 +02:00
  • 918ef72bbe Corrected br markup luccioman 2015-09-03 08:59:17 +02:00
  • f88bb2277e Corrected bookmark link title luccioman 2015-09-03 08:58:14 +02:00
  • 802ea66d19 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server luccioman 2015-09-03 08:04:38 +02:00
  • 5297e80cda fix missing onclick in ConfigPortal to enable checkbox reger 2015-09-03 00:59:14 +02:00
  • 2f847071e1 ignore /DATA (Eclipse Mars) sixcooler 2015-09-02 19:10:39 +02:00
  • cc8d6ad75f Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server luccioman 2015-09-02 08:51:42 +02:00
  • 802ccaead6 fix init of error cache, use latest faildates => load_date_dt reger 2015-09-02 02:36:31 +02:00
  • dba7f15073 apply same size constrain on result image from doc as for linked images see 19f1308bf0 reger 2015-09-01 23:22:48 +02:00
  • 5e45f1a460 enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field reger 2015-09-01 21:47:25 +02:00
  • 70e483ecc6 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server luccioman 2015-09-01 08:53:54 +02:00
  • 4cf875336c complete TODO: getFileExtension handle dot in query part + testcase reger 2015-08-31 23:28:03 +02:00
  • 87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has moved and was not cleared anymore. This results in a huge fieldcache. (http://lucene.apache.org/#highlights-of-the-lucene-release-include https://issues.apache.org/jira/browse/LUCENE-5666) Here I try to use DocValues where it is possible. For this I used the Api-Scheme as new basis for the Solr-Schema. This needs at least a complete optimization of the Solr-Index to get a smaller FieldCache. Everything that is indexed with these settings will not use the Fieldcache at all. sixcooler 2015-08-31 20:24:41 +02:00
  • c729d089b6 French Translation update by Luc: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5671 sixcooler 2015-08-31 19:57:57 +02:00
  • e0dda0c01c Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server.git luccioman 2015-08-31 10:26:40 +02:00
  • eaf0e8ff2c start recording/indexing pixel size for image document as for linked images reger 2015-08-31 01:58:36 +02:00
  • c33229fc0c check mime prior to ext for metadata modification for images reger 2015-08-30 23:02:19 +02:00
  • 19f1308bf0 enforce the result images limit to > 16x16px for linked images http://mantis.tokeek.de/view.php?id=594 reger 2015-08-30 02:19:52 +02:00
  • a4509ea2ca Updated French translation for index.html, yacysearch.html and simpleheader.template. Corrected special characters to use HTML entities instead. luccioman 2015-08-27 09:25:11 +02:00
  • 250f6457f0 remove expired domain titan.deep-one.in from bootstrap.seedlist reger 2015-08-26 23:58:08 +02:00
  • 67799ce867 Updated translation of index.html, yacysearch.html and simpleheader.template, corrected some special characters not written as HTML entities. luccioman 2015-08-26 13:57:00 +02:00
  • 0e4ba0360b fix NPE on .yacyh result url of disconnected peer (cleanup yacyshare remaining) reger 2015-08-25 23:26:17 +02:00
  • 7ed812a2bf log missing seed.port in favour of exception to prevent repeating throws reger 2015-08-25 02:19:00 +02:00
  • 206883f80d fix: Preserve protocol in url proxy to connect to http/https. Display warning if https target is viewed over http reger 2015-08-25 01:16:41 +02:00
  • f7b0b3b7b3 avoid runtime exception by earlier testing for seed.ip=null reger 2015-08-23 23:01:20 +02:00
  • 0f80bc8309 upd to jsoup-1.8.3 reger 2015-08-19 22:46:48 +02:00
  • 906b5fd742 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-08-11 00:42:46 +02:00
  • 8f90767889 fix for filesystem crawl Michael Peter Christen 2015-08-11 00:42:26 +02:00
  • a3dd4be749 added / corrected charset to be 1.7 compatible. @Orbiter: please check if this is ok for you sixcooler 2015-08-10 20:53:20 +02:00
  • 8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-08-10 14:27:53 +02:00
  • df3314ac1a added a new facet type based on a probabilistic classifier using Bayesian filters. This can be used to classify documents during indexing-time using a pre-defined Bayesian filter. Michael Peter Christen 2015-08-10 14:27:44 +02:00
  • 1409cabe8b exclude more default search fields from text copy to text_t for metadata index documents reger 2015-08-09 21:01:30 +02:00
  • e2e73258ca remove obsolete interface SearchAccumulator and unused SRURSSConnector Thread inheritance reger 2015-08-08 18:35:49 +02:00
  • dbbad23e12 removed warnings Michael Peter Christen 2015-08-03 05:37:34 +02:00
  • 500cfa9457 enhanced logging Michael Peter Christen 2015-08-03 05:17:22 +02:00
  • c14bc8d9b7 revert of fq transformation (recent fix) Michael Peter Christen 2015-08-03 05:15:34 +02:00
  • 203df5a750 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-08-03 05:02:26 +02:00
  • fa08ca207e ! finish running crawls before applying ! Allow crawl urls up to 2048 characters fix for http://mantis.tokeek.de/view.php?id=575 reger 2015-08-03 00:49:24 +02:00
  • ee77f24e52 use some more declared HeaderFramework constants reger 2015-08-02 22:56:14 +02:00
  • 9e4043731d add missing ; in base.css reger 2015-08-02 21:36:44 +02:00
  • 11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-08-02 14:53:36 +02:00
  • b94bd7f20a a collection of search query enhancements: - fixed superfluous space in query field list - fixed filter query logic - removed look-ahead query which caused each new search page to submit two solr queries - fixed random solr result orders in case that the solr score was equal: this was then re-ordered by YaCy using the document hash which came from the solr object and that appeared to be random. Now the hash of the url is used and the score is additionally modified by the url length to prevent this particular case from appearing at all. Michael Peter Christen 2015-08-02 14:52:41 +02:00
  • 5ba9924289 pom: have Maven dependency management decide on transitive Lucene dependencies reger 2015-08-02 03:39:58 +02:00
  • dbe2594c38 replace deprecated myPublicLocalIP() in AbstractRemoteHandler reger 2015-08-02 00:53:49 +02:00
  • 6d3534e725 remove unused Transmission hit counter reger 2015-08-02 00:20:14 +02:00
  • cb67eb7baf use more absolute path for config file opening as suggested in pull request 5 (https://github.com/yacy/yacy_search_server/pull/5) reger 2015-08-01 23:54:26 +02:00
  • 92e5b217b6 upd to pdfbox-1.8.10 reger 2015-08-01 00:25:40 +02:00
  • 1ccbf739b1 added bayes filter from Philipp Nolte, originally taken from https://github.com/ptnplanet/Java-Naive-Bayes-Classifier and modified inside the loklak.org project. After optimization in loklak it was inserted into the net.yacy.cora.bayes package. It shall be used to create custom search navigation filters. Michael Peter Christen 2015-07-30 14:10:31 +02:00
  • 1bced1ae60 using latest enhanced (un/)gzip methods from loklak for yacy Michael Peter Christen 2015-07-30 13:39:10 +02:00
  • 3e6657288d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-07-30 03:39:11 +02:00
  • de8cfbe1d7 added export option to export the fulltext of the search index text only Michael Peter Christen 2015-07-30 03:21:40 +02:00
  • 165561706d upd to Solr-5.2.1 reger 2015-07-30 00:16:09 +02:00
  • 2fb6ebe88a move java environment parameter setting disabling SNI (Server Name Indication) support for https connections from code to startup script, allowing admins to easily and transparently alter the YaCy default FALSE setting. Background: some users report problems with connecting/crawling some sites via https which require SNI support (by default switched off in YaCy). On the other hand, systems not demanding SNI support are sometimes not properly configured and due to a bug/feature in java 1.7 the connection is aborted. The latter is more often the case, so the default is still fine. With the java start parameter, expert users can now alter the start parameter to -Djsse.enableSNIExtension=true (java default) if they crawl more hosts requiring SNI support. The alternative to let YaCy try both during https handshake (deep inside the httpclient) is not pursued at this time. reger 2015-07-29 23:30:05 +02:00
  • fbeae20b3a try a healing of the cache if the index file is corrupted Michael Peter Christen 2015-07-27 15:16:08 +02:00
  • 7e158ae085 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-07-27 15:03:34 +02:00
  • 03ea723889 added log lines for query performance profiling Michael Peter Christen 2015-07-27 15:03:13 +02:00
  • 7f49dbfbd1 upd to SLF4J-1.7.12 reger 2015-07-27 00:57:19 +02:00