Commit Graph

  • e35444dfad Merge pull request #2 from yacy/master Andreas 2016-01-27 17:34:33 +01:00
  • 2048b7e057 support scraping start-/enddate from html tag with property "datetime" This may be used in html5 <time> tag (which we don't explicitly support yet for date in content scraping). reger 2016-01-26 21:27:44 +01:00
  • 900d4584ba complete resource cleanup of lists in contentscraper's close() reger 2016-01-25 23:54:20 +01:00
  • 06e5cd6164 add support parsing swf-metadata to swfparser flash supports metadata tag in swf file with metadata in xmp (xml) format. parse some common data to include it in the head section of the html string of converttohtml. reger 2016-01-25 22:13:04 +01:00
  • e0ac26d63e Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2016-01-25 08:56:18 +01:00
  • 11b1587067 replace remaining use of java.util.Vector by ArrayList (WebCat-swf) reger 2016-01-24 02:30:27 +01:00
  • 9331acdb18 add support for DEFINEFONT3 (swf8) to webcat parser experienced issue with JPEGTABLE tag (with length=0) causing abort of parsing (ioexception) as we don't use/need it for text parsing skip this tag. reger 2016-01-23 22:46:22 +01:00
  • bf5fca5d99 add missing swf tag constants according to latest spec reduce use of synced vector in webcat parser reger 2016-01-23 20:19:01 +01:00
  • aa60ad1dbc Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2016-01-21 08:12:22 +01:00
  • 1f18653de0 pass parsed swf content through htmlscraper Swf may contain subset of html tags which shoul'd appear as text. Especially <font> tag may totally screw up metadata servlet if not filtered out. reger 2016-01-21 02:55:05 +01:00
  • 18ecf57792 add support of compressed swf to swfParser from JavaSWF2 (source compatible to WebCat). Moved swf file signature check to parser Changed use of synced vector to list swf InStream reger 2016-01-20 00:58:29 +01:00
  • 5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during search sixcooler 2016-01-19 20:57:22 +01:00
  • e1dd808e1c fix for 'move test classes to test/java' sixcooler 2016-01-19 20:50:26 +01:00
  • ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2016-01-19 08:06:49 +01:00
  • 6c25710a34 replace bugfixed webcat-swf.jar reger 2016-01-18 23:36:18 +01:00
  • 4213ff84d4 import WebCat swf parser custom source package This package is not available as jar (used jar is a custom compile as we use just a portion of the package) WebCat package is not maintained. To be able to fix bugs, source extract of swf parser imported here. reger 2016-01-18 22:41:49 +01:00
  • bceb779414 refactor libbuild/GitRevMavenTask (mavenize) to be able to add additional modules to build reger 2016-01-18 22:29:17 +01:00
  • 730fb43ab1 add translation DE,FR submenuRanking.template upd translation DE RankingSolr_p reger 2016-01-17 14:52:24 +01:00
  • 84c970eaec move test classes to test/java (subdirectory as in Maven standard subdir layout) because ViewImage*Test.java breaks test run reger 2016-01-16 19:22:27 +01:00
  • 9f91e6124f add DE translation for submenuCrawler.template + upd submenuIndexControl.template reger 2016-01-16 17:55:20 +01:00
  • ed3e16e092 apply remote result count config value to Bookmark Autosearch + prepare to make the widely unused Bookmark feature optional reger 2016-01-15 02:10:10 +01:00
  • 5d635879f8 Merge pull request #40 from Scarfmonster/autocrawl Michael Peter Christen 2016-01-14 22:19:55 +01:00
  • 7d6e0d8470 Add missing settings to autocrawl settings page Ryszard Goń 2016-01-14 03:27:33 +01:00
  • 7a7a1277bd Add autocrawl settings page Ryszard Goń 2016-01-14 02:40:46 +01:00
  • a98c395023 Add the Autocrawl thread Ryszard Goń 2016-01-14 00:50:23 +01:00
  • 4765e374e6 altered calc. of search result items per page to display taking the existing limits into account but make it consistent with search option screen for admin and public user changes: - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit) - otherwise requests are limited to 100 results per page ( = search option, index.html) (this basically is the major change, inc. limit from 20 to 100 for public user) P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being) reger 2016-01-13 01:30:49 +01:00
  • 231be83eb6 Corrected access to Load_MediawikiWiki.html and Load_PHPBB3.html luc 2016-01-12 22:09:30 +01:00
  • 1728cd30c6 Create autocrawl profiles Ryszard Goń 2016-01-12 16:28:34 +01:00
  • 85a9363012 Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2016-01-12 09:32:22 +01:00
  • abd8ecb503 remove contentdom depending override of search result items per page initially introduced e4570bffaf (diff-ae6c130fc11088c830b00ed9256ab56b) (as one part of unexpected difference in actual vs requested results, partial bugfix for http://mantis.tokeek.de/view.php?id=627 ) reger 2016-01-12 01:04:10 +01:00
  • 41767a01c2 Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2016-01-11 23:08:22 +01:00
  • a98d9f6c37 Merge pull request #1 from yacy/master Andreas 2016-01-11 13:18:07 +01:00
  • 8271f783ca upd pom javadoc goal to not fail a build on javadoc errors reger 2016-01-11 01:38:45 +01:00
  • ff27824964 fix swfParser reading file signature before passing to library (current version expects data w/o signature) reger 2016-01-10 01:16:31 +01:00
  • b29db4640c update Maven pom - add release-profile to create the release archive only if profile is activated (speeding up normal compilation) - bind install of the 2 jar's not available in repository to validate phase (was clean) to automatically add these to local repository (with disadvantage it's done every time) reger 2016-01-09 20:32:47 +01:00
  • 04161912a5 fix tray icon switch (using predefined/correct config name) reger 2016-01-09 01:19:06 +01:00
  • 7aa1a29e33 Return more accurate HTTP status 400 with detail message when some error occurs on ViewImage : - missing required parameters - url licence invalid luc 2016-01-08 23:18:13 +01:00
  • bd9dc2f32b Corrected NullPointerException cases occurring in YJsonResponseWriter when no description is available. luc 2016-01-08 20:46:02 +01:00
  • 0076f9f97d Updated documented sample url luc 2016-01-08 20:43:49 +01:00
  • cfdbc2b487 Improved URLLicence reliability for use by concurrent non-authorized users. Removed URLLicence generation when unnecessary (authorized users) luc 2016-01-08 20:42:57 +01:00
  • e3d53f0248 add de translation for IndexExport_p reger 2016-01-08 02:46:29 +01:00
  • 9f5b768d84 fix typo in translation (de,hi) for AccessTracker_p - rem some not translated in ru (-> currently best maintained translation) reger 2016-01-08 01:10:09 +01:00
  • c91e712178 further refactor using standard java / (one) utf-8 charset variable extending initiative of commit 9a25751850 reger 2016-01-07 16:17:37 +01:00
  • e3e8015306 Merge pull request #28 from Stepanov-Sergey/patch-1 Michael Peter Christen 2016-01-06 14:57:26 +01:00
  • 3dbd3caecf Merge pull request #37 from sudheesh001/LogFix Michael Peter Christen 2016-01-06 14:56:55 +01:00
  • 9a25751850 Merge pull request #38 from luccioman/master Michael Peter Christen 2016-01-06 14:55:54 +01:00
  • bfcca6bfee update on translation files - delete removed servlets AugmentedBrowsingFilters_p.html (de) CrawlStartIntranet_p.html (de) IndexCreateWWW***Queue_p.html (de) Ranking_p.html (de) - add IndexCreateQueues_p.html - rename Settings_Http.inc -> Settings_ProxyAccess.inc Language_p.html -> ConfigLanguage_p.html reger 2016-01-06 11:36:57 +01:00
  • c283efdd6d remove obsolete css style for removed file CacheAdmin_p.html and remove from translations reger 2016-01-06 00:51:49 +01:00
  • 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded charset names. luc 2016-01-05 23:37:05 +01:00
  • 218061752e add missing quote chars in sk.lng translation file + minor: del one redundancy reger 2016-01-05 10:48:51 +01:00
  • e8256bb3b1 remove blekko from opensearch config (not available) see https://blekko.com/ http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633 reger 2016-01-04 04:49:10 +01:00
  • 23ac8d186e Log files are commitable and shouldn't be sudheesh001 2016-01-04 07:45:08 +05:30
  • 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields fixed in 5.4.0 http://issues.apache.org/jira/browse/SOLR-8050 reger 2016-01-03 01:11:27 +01:00
  • 5a35f9383a bump to solr/lucene 5.4.0 sixcooler 2016-01-02 21:07:50 +01:00
  • a58d34a4e8 check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant reger 2016-01-02 05:03:57 +01:00
  • 9636a74633 remove local credential on download of config files from remote systems (blacklists, language, skins) to reduce risk to expose md5-pwd. - remove NoSuchElementException in loop reger 2016-01-01 06:08:41 +01:00
  • e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart - add filename to parameter fieldname - add filecontent to special parameter fieldname$file (some servlets use this $file parameter) reger 2015-12-31 03:04:13 +01:00
  • 1636541c48 correct filename input in settings_seed_upload.inc form to get filename (recently introduced by change form "text" to the more convenient "file" input 50f64ddc3b) reger 2015-12-30 21:27:44 +01:00
  • cd26717ba2 fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619 reger 2015-12-29 20:38:45 +01:00
  • a5faf73afa remove obsolete yacy.init entries interaction.* (related to removed triplestore) reger 2015-12-29 15:41:19 +01:00
  • 775e74b055 Merge branch 'master' of https://github.com/yacy/yacy_search_server sixcooler 2015-12-28 23:23:37 +01:00
  • dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master' sixcooler 2015-12-28 23:20:42 +01:00
  • fe308f47d5 added greeting line to interactive search and harmonized display position with 'Administration' line in all administration pages Michael Peter Christen 2015-12-28 23:20:37 +01:00
  • 582d059fb7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-12-28 22:57:39 +01:00
  • a7b41bd206 use curl downloads in download script with silent mode Michael Peter Christen 2015-12-28 22:57:31 +01:00
  • 46ac0867ff fix poison mediawikiimporter output queue also after ExecutionException in worker thread. Writer of importer needs a poison to close the file. On exception (e.g. OOM) add a poison marker in outer most try/catch to assure output queue will terminate in this condition too (and closes+renames the surrogate/in/xxx.prt file) reger 2015-12-28 02:32:00 +01:00
  • a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing handle incomplete metadata like "NS=43/50//N". For other {expr ... } type entries a try catch added reger 2015-12-27 01:59:15 +01:00
  • fade8452c6 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-12-26 23:45:00 +01:00
  • 7274e50d7f Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2015-12-26 22:25:15 +01:00
  • 5a39f9f679 stub to use a new app launcher for mac Michael Peter Christen 2015-12-26 22:25:08 +01:00
  • 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet to increase browser cache hits for not changing content reger 2015-12-26 17:35:46 +01:00
  • 6d54eb3d36 skip loading document on crawl start for YMark bookmarks by adding a constructor giving the already loaded document as parameter. reger 2015-12-26 01:15:07 +01:00
  • 50f64ddc3b apply default css styles (class btn) to submit buttons reger 2015-12-25 01:08:51 +01:00
  • 7f6ee76eb9 upd to ImageIO 3.2.1 reger 2015-12-25 00:53:16 +01:00
  • 09d3dd13d6 limit bookmark tag cloud font-size to 2.0em reger 2015-12-24 19:39:37 +01:00
  • 3076c87247 fix typo in Steering.html reger 2015-12-24 04:37:24 +01:00
  • 80e2c82249 fix NPE on empty blog importfile parameter reger 2015-12-24 02:00:45 +01:00
  • 8a8e53b1a1 apply default css styles (class btn) to blacklist* submit buttons reger 2015-12-23 23:11:37 +01:00
  • 0fa6340936 apply more default css styles (class btn) to submit buttons reger 2015-12-23 00:04:40 +01:00
  • bf098412cf use input type=file for choosing IndexImportMediawiki_p dump file reger 2015-12-22 20:15:08 +01:00
  • e84d94f8ca fix mime table for ms office / open office documents (causing wrong parser detect in intranet mode) reger 2015-12-22 17:48:24 +01:00
  • 7c6d6cd69a change some more submit button to bootstrap btn css class reger 2015-12-22 01:36:23 +01:00
  • 4eb7fb0ee5 just remove debug leftover reger 2015-12-21 19:47:40 +01:00
  • 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters, and feeding hyperlinks to webgraph processing. reger 2015-12-21 04:42:26 +01:00
  • 67f64af4b4 quick fix: go back to display search results favicon via <img> tag and ViewImage, ! until better solution is found !. reger 2015-12-21 01:05:59 +01:00
  • d5fd031449 fix reading of ippattern config array in URLProxy reger 2015-12-20 15:51:54 +01:00
  • b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards) otherwise use header.mime() differentiated in prev. commit. reger 2015-12-20 15:49:24 +01:00
  • 7a8c077838 fix HeaderFramework.mime() to strip charset parameter. Differentiate mime() and getContentType() which gives the raw header field. This improves parser detection if charsets are included in http content-type field. reger 2015-12-20 06:44:16 +01:00
  • b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly calculated doc hash if different. Testing showed that in some cases delivered url doesn't match the local calculated hash. In this case replace doc.id (and host_id_s) with calculation from url. reger 2015-12-20 02:10:49 +01:00
  • 15e46b2bad exclude in/outboundlinksnofollowcount_i from default schema fields (not used in any function) reger 2015-12-19 21:25:08 +01:00
  • dec3e6ad96 fix: adjust urlstub for mailto links (skip protocol) reger 2015-12-19 20:13:33 +01:00
  • cb83e65f89 drop returning document language "en" if unknown (fix todo) which also harmonizes handling of query.modifier for rwi and solr results (to result must match a given language filter) reger 2015-12-19 01:42:35 +01:00
  • 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document reger 2015-12-18 02:35:44 +01:00
  • 71c416f383 show mailto links in ViewFile.html linklist reger 2015-12-18 01:11:55 +01:00
  • 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified reger 2015-12-17 02:53:10 +01:00
  • 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links reger 2015-12-17 00:36:08 +01:00
  • b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server luc 2015-12-16 03:26:06 +01:00
  • ba0a293f5c Corrected another case of org.apache.lucene.store.AlreadyClosedException occurring when SearchEvent.cleanup() was called while committing local solr index. luc 2015-12-16 02:26:40 +01:00
  • 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection by checking mailto scheme early. - fix upper case mailto protocol assignment - add test case for getProtocol reger 2015-12-16 03:01:17 +01:00
  • befb2415f8 Corrected frames preview displaying eventually incorrectly in local administration mode. luc 2015-12-16 02:23:58 +01:00