Commit Graph

  • 366ceae35a Fixed missing transitive dependency to commons-collections4-4.1 luccioman 2017-08-11 20:50:36 +02:00
  • bf72cbffa3 Updated debian package configuration to match new Java 1.8 target luccioman 2017-08-11 20:34:59 +02:00
  • 119b65389d upd to icu4j-59_1.jar reger 2017-08-10 23:57:37 +02:00
  • 4979439e87 Skip public post of jre version. Added to determine switch to java8 596b5dfa59 reger 2017-08-06 23:41:53 +02:00
  • e918ec199e Replace deprecated ConcurrentHashSet with recommended Java8 ConcurrentHashMap.newKeySet() in postprocessDocuments() reger 2017-08-06 23:26:27 +02:00
  • fb71994342 Harmonizing use of xml reader / sax parser in XMLBlacklistImporter eliminating the need for lib/xercesImpl.jar reger 2017-08-05 23:47:27 +02:00
  • 275d65fffe Patch last_modified date with internal FirstSeenTime() if no date provided to make sure updated documents are indexed with their last-modified date as provided in current crawl. (always patching moddate with firstseen might bear the risk of missing actual updates). reger 2017-08-05 22:30:06 +02:00
  • d1b23afed6 Remove obsolete Protocol parameter ttl (time to live) not interpreted in target yacy/query.html also Protocol.querySeed() not used and parameter not interpreted in target servlet yacy/query.html reger 2017-08-01 00:59:53 +02:00
  • dedc6552d3 upd to poi-3.16.jar reger 2017-07-31 23:38:10 +02:00
  • 15d78b1064 Replace deprecated getIP with getIPs in Protocol transferURL() and getProfile(). Remember used ip for error handling and departInterface reger 2017-07-31 01:55:01 +02:00
  • ed36b47bec Replace one more deprecated peerDeparture in Protocol.transferIndex() by moving/using interfaceDeparture() in transferRWI() reger 2017-07-30 23:02:15 +02:00
  • 37f44941fb upd to pdfbox-2.0.7.jar reger 2017-07-30 20:09:06 +02:00
  • 41616de0b8 Add SolrConfig ClassicIndexSchemaFactory to prevent Solr startup warning. This overrides Solr default to use managed schema. As we don't use programmatic schema changes this directs Solr to use schema.xml, eliminating the warning. reger 2017-07-23 03:55:56 +02:00
  • 0ee8c030c4 Log an error when Solr folder migration fails for some reason. luccioman 2017-07-17 15:35:10 +02:00
  • 44d455dfed upd to jwat-warc-1.1.0.jar reger 2017-07-16 23:37:28 +02:00
  • 588c6e96fb upd version for typeahead.jquery.js in jslicense.html reger 2017-07-16 23:35:56 +02:00
  • 5a646540cc Support parsing gzip files from servers with redundant headers. luccioman 2017-07-16 14:46:46 +02:00
  • 11a7f923d4 Distinguish response parsing failures from unexpected exceptions. luccioman 2017-07-16 14:39:53 +02:00
  • 8100c033a2 URL Viewer : apply crawler size limits when adding to local index. luccioman 2017-07-16 14:37:06 +02:00
  • eda7b0aeb6 Merge branch 'master' of https://github.com/yacy/yacy_search_server luccioman 2017-07-15 08:49:25 +02:00
  • 3005be7349 Clean up unmaintained and unused AugmentParser trail. reger 2017-07-15 00:19:23 +02:00
  • e5cff062b5 Clean up redundant but obsolete jquery.rdfquery-core-1.0.js script lib reger 2017-07-14 23:41:39 +02:00
  • cb4f1358e1 Added gzip parser support for max content bytes limit luccioman 2017-07-13 08:18:40 +02:00
  • 5216c681a9 Added HTML parser support for maximum content bytes parsing limit luccioman 2017-07-13 08:12:10 +02:00
  • 4aafebc014 Merge pull request #122 from Scarfmonster/patch-1 luccioman 2017-07-12 16:03:23 +02:00
  • 651fad6da5 Added RSS parser support for maximum content bytes parsing limit luccioman 2017-07-12 00:18:12 +02:00
  • 452a17a8d5 Finer control on bounded input streams with custom stream implementation luccioman 2017-07-12 00:13:24 +02:00
  • f8f1959ebb Added parsing within bounds implementation to the generic parser. luccioman 2017-07-11 09:07:48 +02:00
  • e0f400a0bd Support trying multiple parsers even when streaming on large resources. luccioman 2017-07-11 09:06:37 +02:00
  • 1e84956721 Support loading local files with a per request specified maximum size. luccioman 2017-07-11 09:04:23 +02:00
  • f369679d1c Fixed read/copy on input streams reading sometimes less than expected. luccioman 2017-07-11 09:00:27 +02:00
  • 23bda133d2 Fix css conflict of YMarks.html to make it viewable. yacy-ymarks.css sidebar conflicts with Bootstrap's sidebar (different overlay settings). Simply renamed it to ymark-sidebar. reger 2017-07-09 23:08:54 +02:00
  • af32d291c2 upd to commons-fileupload-1.3.3.jar reger 2017-07-08 23:46:10 +02:00
  • a21789d4e7 Fix unresolved pattern in api/share.html by init some display var's reger 2017-07-08 22:46:15 +02:00
  • bf55f1d6e5 Started support of partial parsing on large streamed resources. luccioman 2017-07-08 09:04:03 +02:00
  • 2a87b08cea Removed temporary html parser test code luccioman 2017-07-03 14:53:36 +02:00
  • 1b3c169a9c URL Viewer : decode raw text using the eventual response charset. luccioman 2017-07-03 13:51:14 +02:00
  • 90a7c1affa HTML parser : removed unnecessary remaining recursive processing luccioman 2017-07-03 10:00:53 +02:00
  • e6e20dab52 upd to Jetty 9.4.6.v20170531 Modify loginservice to the changes in Jetty, partially based on pull request #101 https://github.com/yacy/yacy_search_server/pull/101 by @automenta reger 2017-07-01 23:58:28 +02:00
  • e4c730b99f Updated PerformanceQueues_p.xml API with last related servlet changes luccioman 2017-06-30 11:41:48 +02:00
  • dcc56318bb Made remote search max system load limits configurable from UI. luccioman 2017-06-30 11:30:54 +02:00
  • ddd13b776d Add keyword constraint to rwi query result filter To discard rwi results not matching query keyword: parameter reger 2017-06-30 02:11:18 +02:00
  • e82eaee4b6 Apply consistent behavior on HTTP resource size exceeding limit. luccioman 2017-06-30 01:13:47 +02:00
  • 0b75e92ac2 Do not wrap unnecessarily loader IOExceptions in IOExceptions luccioman 2017-06-30 01:06:17 +02:00
  • 433bdb7c0d Respect maxFileSize limit also when streaming HTTP and when relevant. luccioman 2017-06-30 00:30:54 +02:00
  • 4b72b29ea2 Added an informative title on the crawl start robots.txt status icon luccioman 2017-06-29 11:36:47 +02:00
  • d08f31c3a8 Crawl start Ajax request : properly handle eventual XML parsing errors luccioman 2017-06-29 11:25:27 +02:00
  • 9b1bb2545e Refactored plain-text URLs detection implementation. luccioman 2017-06-27 19:30:40 +02:00
  • 8da3174867 Ensure lower case conversion consistency with any default locale. luccioman 2017-06-27 06:42:33 +02:00
  • 286f3018bd Made mime type and extension normalization locale independent. luccioman 2017-06-26 17:33:56 +02:00
  • 319231a458 Added a generic XML parser, able to parse elements text and URLs. luccioman 2017-06-26 16:30:21 +02:00
  • aeeb8a7dd5 upd to jwat-warc-1.0.6.jar reger 2017-06-25 20:05:37 +02:00
  • f0ba828627 remove unused Solr optional extra handler lib solr-dataimporthandler-6.6.0.jar reger 2017-06-24 23:15:25 +02:00
  • 1773b61b3e upd to jsoup-1.10.3.jar reger 2017-06-24 22:54:43 +02:00
  • 3cedbbd4ed Wrong password was removed after the SSL certificate import Ryszard Goń 2017-06-23 02:23:49 +02:00
  • 64cec2790d Improved character encoding detection from Content-Type header luccioman 2017-06-22 10:50:34 +02:00
  • 1acb7005d0 Added a basic JUnit test with test gz files for the gzip parser luccioman 2017-06-21 09:14:50 +02:00
  • 1e2fb76720 Properly close test files in htmlParser unit test luccioman 2017-06-21 09:11:17 +02:00
  • c41b31dcb3 Cleaned up memory usage page HTML luccioman 2017-06-20 09:21:55 +02:00
  • 0487336ec3 Prevent integer overflow in table statistics and use strong typing luccioman 2017-06-19 17:02:11 +02:00
  • 0f80c978d6 Limit the number of initially previewed links in crawl start pages. luccioman 2017-06-17 09:33:14 +02:00
  • d2a4a27f52 Improved stream-oriented parsing entering conditions. luccioman 2017-06-17 09:26:37 +02:00
  • 32288a8999 Merge branch 'master' of https://github.com/yacy/yacy_search_server luccioman 2017-06-17 08:16:55 +02:00
  • e9b4b29f90 Limit scope of some local JavaScript variables. luccioman 2017-06-16 08:50:57 +02:00
  • 369b8e0e0b added json(p) endpoint for crawl start Michael Peter Christen 2017-06-16 08:44:40 +02:00
  • 83ba45ebae make nsis build script require java 8 reger 2017-06-16 06:31:45 +02:00
  • cf70081cfc update nsi installer java autodl bundleid to use jre-8u131 reger 2017-06-16 02:17:49 +02:00
  • 9220ccbec7 remove reference to velocityresponsewriter in solrconfig.xml it is no longer part of solr-core api http://lucene.apache.org/solr/6_6_0/index.html reger 2017-06-16 00:12:09 +02:00
  • 4be4bfbba6 remove sample path setting in solrconfig.xml not valid in Yacy resulting in startup stop exception after fresh switch to 1.921 reger 2017-06-15 21:02:18 +02:00
  • 510859bcce update maven pom setting to YaCy version 1.921 java 1.8 and solr 6.6 reger 2017-06-15 20:24:53 +02:00
  • f6e8d71718 Prevent high CPU load at startup, caused by the Solr suggester build. luccioman 2017-06-15 14:13:46 +02:00
  • 9dd790087d Added HT Cache basic statistics (hit rate) luccioman 2017-06-15 09:50:02 +02:00
  • 5fdd5d16b1 Use volatile to ensure concurrent threads use up to date property value luccioman 2017-06-15 09:48:22 +02:00
  • 28b451a0b3 Made Cache compression level and lock timeout user configurable luccioman 2017-06-14 19:02:08 +02:00
  • a7394b479b Limit the synchronization blocking time on some Cache operations. luccioman 2017-06-14 09:13:50 +02:00
  • 73ab4a7b3a Prevent log pollution from unwanted Solr warnings. luccioman 2017-06-14 08:56:11 +02:00
  • c94a8c76bd re-added solr synchronization hack Michael Peter Christen 2017-06-09 12:50:36 +02:00
  • 6fe735945d migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8 Also: now Version 1.921 Michael Peter Christen 2017-06-09 12:25:23 +02:00
  • ce89492319 Ensure system resource release by closing document stream. luccioman 2017-06-08 07:36:11 +02:00
  • 8399275142 Properly close file output streams even on exceptions scenarios. luccioman 2017-06-08 07:19:16 +02:00
  • 4e4dc6c4e5 Removed unnecessary finalize implementation. luccioman 2017-06-06 10:30:02 +02:00
  • 632354e2ff Tokenize result entry keywords and add some styling for display reger 2017-06-04 01:50:40 +02:00
  • c42d17f607 upd to commons-compress-1.14.jar reger 2017-06-03 21:58:04 +02:00
  • a04feac064 Ensure file input streams proper closing in both success and failures luccioman 2017-06-03 04:00:46 +02:00
  • d98c04853d Ensure proper closing of file input streams. luccioman 2017-06-02 12:14:29 +02:00
  • c53c58fa85 Ensure closing ChunkIterator stream in every possible use case. luccioman 2017-06-02 09:47:45 +02:00
  • 29e52bda39 Merge branch 'master' of https://github.com/yacy/yacy_search_server luccioman 2017-06-02 01:47:53 +02:00
  • a9cb083fa1 Improved consistency between loader openInputStream and load functions luccioman 2017-06-02 01:46:06 +02:00
  • a814f3d885 Introduce keyword query parameter This enables keyword navigator to filter on keywords. Added search page output and layout config for keywords, allowing e.g. in Intranet use to display the keywords. No styling or links applied to the keyword text (but is desirable possibly in combination with bootstrap-tagsinput for future/intranet). reger 2017-06-02 01:00:21 +02:00
  • cbccf97361 Added JavaDoc to the getpageinfo_p API servlet. luccioman 2017-05-30 17:38:16 +02:00
  • c226ded799 Fix unescape of URLs having some '%' chars but not percent-encoded luccioman 2017-05-30 12:32:14 +02:00
  • bd88fd303e Deprecated duplicated and internally unused getpageinfo servlet. luccioman 2017-05-30 09:29:28 +02:00
  • 306a82dd71 Fixed scraper NullPointerException cases on malformed URLs. luccioman 2017-05-30 08:48:20 +02:00
  • aa55d71cf5 Fixed a NullPointerException case on Digest authentication. luccioman 2017-05-29 19:16:09 +02:00
  • b65a04087b upd to pdfbox-2.0.6.jar reger 2017-05-24 22:13:42 +02:00
  • 02ec0ed13c Quoted param value in Solr query to avoid unwanted traces in logs luccioman 2017-05-24 08:43:03 +02:00
  • 1be4d32f99 Restored search page default behavior for Tab, Page Up and Down keys luccioman 2017-05-23 07:25:40 +02:00
  • 1737af37cf Set request originator to own peer in warc importer in addition to change in 039162fbf0 reger 2017-05-22 01:56:11 +02:00
  • 039162fbf0 Change warc importer to use defaultsurrogate-crawl profile, as reported by LA_FORGE http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5990 and analysed by @luccioman (see comment 510f11d374) it creates conflict using another crawlprofile without setting originator. reger 2017-05-22 01:34:08 +02:00
  • 3b1d640a3c enhanced debugging Michael Peter Christen 2017-05-18 00:28:12 +02:00