Commit Graph

  • 5eb3ee4e20 Add search navigator interface to allow for additional navigators (plugins) Prepared the first basic navigators (for authors and collections) for the list of SearchEvent.navigatorPlugins and adjusted servlet to use these. - this allows configuring the display order of these navigators (by ordering the config string) - eventually allows for additional and/or custom navigators using any available index field without the need to change servlets - the Collection navigation has been adjusted to exclude the internal, default robot_* and dht collections from displaying - rwi results are now also checked for navigators by the refactored navigators reger 2016-10-31 02:17:43 +01:00
  • fd3f58fcaa improve query modifier parsing of "collection:" and possible collision with "on:" in case multiple collection modifiers were entered (by mistake) http://mantis.tokeek.de/view.php?id=702 reger 2016-10-31 00:43:01 +01:00
  • 4c7e515769 correct Collection navigator - search servlet modifier parameter (navigator entries are single collection names, spaces are removed by crawlstart) preparation: for abstraction of navigators reger 2016-10-30 05:19:06 +01:00
  • af39a76bf6 Reduce number of default max. search navigator lines (from 10000) to 100 + make it configurable reger 2016-10-29 04:19:46 +02:00
  • 065bcfba75 Merge pull request #88 from sudheesh001/Patch16 Sudheesh Singanamalla 2016-10-28 08:13:21 +03:00
  • d97da1ddb7 Fixes #16 Updates documentation about cloning and build from source sudheesh001 2016-10-12 17:22:15 +03:00
  • 20a1b29ed3 add simple test case for ReferenceContainer helpful for debugging calculated ranking parameter reger 2016-10-26 01:38:40 +02:00
  • 3cc2af8f92 reduce the mix of absolute and relative internal html page links (prefer relative for same pg or neighbors) to ease proxied access e.g. http://mantis.tokeek.de/view.php?id=701 reger 2016-10-25 03:02:31 +02:00
  • 3c7220bc7b Refactor rwi reference word position and word distance calculation used for rwi ranking. Main changes: - introduce a posintext() to access the stored value. This also reduces memory allocation of the position array for WordReferenceRow (index access) - use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null) - adjust assignments and the min() max() and distance() calculation accordingly reger 2016-10-23 19:40:02 +02:00
  • f0639d810c Customized name for Threads still using the default "Thread-n" pattern. luccioman 2016-10-22 17:17:21 +02:00
  • c0379c3cd3 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git luccioman 2016-10-22 09:11:50 +02:00
  • db3b9db9c2 Crawl from local file : faster task end when manually terminating crawl. luccioman 2016-10-22 08:23:48 +02:00
  • 78085fad8d Fixed NullPointerException case. luccioman 2016-10-22 08:23:48 +02:00
  • 4c67ed3f8d catch rwi ranking division-by-zero exception during rwi search result processing. The word distance calculation is affected by the concurrent update (normalization) of the min/max ranking parameter for word positions. On update of min/max the exception is raised in the distance calculation and is now caught. This concurrent update and change of ranking results is needed for speed but should be further checked for optimization reger 2016-10-22 00:53:47 +02:00
  • 47af33a04c Advanced Crawl from local file : better processing of large files. luccioman 2016-10-21 13:03:31 +02:00
  • ee92082a3b Updated javadocs : warning about closing stream responsibility. luccioman 2016-10-21 12:48:36 +02:00
  • 6f49ece22f Fixed redirected URLs processing as crawl start point. luccioman 2016-10-20 12:12:26 +02:00
  • 68217465fe div by null in word distance calculation (again, description in http://mantis.tokeek.de/view.php?id=698). As the root cause was not found, added only a workaround, preferred over a try/catch (for easier follow-up). reger 2016-10-19 22:55:36 +02:00
  • 7263d17436 Removed mentions of deprecated LURL-db. luccioman 2016-10-19 14:56:25 +02:00
  • c3c4a52408 Added more examples in Blacklist JUnit test. luccioman 2016-10-19 13:14:20 +02:00
  • 8b74a6bf57 fix min/max calculation of WordReferenceVars.distance() Issue was the calculation in AbstractReference with positions.clear() call, this made distance result always 0 (distance needs min 2 positions) and created concurrency issues. + unit test of changes reger 2016-10-17 23:58:28 +02:00
  • da362628fb Added fine log level for too long blacklist matching processing. luccioman 2016-10-17 22:32:19 +02:00
  • aaae7c6462 adjust ConcurrentScoreMap internal value map to interface and use parameter Long -> Integer (saves some bytes) reger 2016-10-16 06:31:48 +02:00
  • 31d2a5645e remove obsolete query variable leftover from 8fb370d9f8 (diff-1d4259005ebfddc11083387857a86175) harmonize ranking shift parameter to 0xFF correct addresult weight parameter to long reger 2016-10-15 19:29:19 +02:00
  • 93ea366778 Updated license header file name luccioman 2016-10-15 11:34:50 +02:00
  • 4c0be4d5d4 Fixed maven compilation error luccioman 2016-10-15 11:34:23 +02:00
  • ba77e8f8ec upd to Jetty 9.2.19 reger 2016-10-15 05:23:18 +02:00
  • a588ed7628 Applied image headers customization to the new ViewFavicon servlet. luccioman 2016-10-14 14:05:38 +02:00
  • d16e57b41e Merge pull request #39 from luccioman/master luccioman 2016-10-14 12:00:39 +02:00
  • 7717a3d43d Fixed license headers on files created to improve favicon management. luccioman 2016-10-14 11:55:49 +02:00
  • 6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git luccioman 2016-10-14 11:29:55 +02:00
  • 7136b1ad60 HTML validation : fixed URL encoding of Pictures link. luccioman 2016-10-14 09:42:43 +02:00
  • 407563b9f0 add lock symbol to messages UI Trans menu item reger 2016-10-14 02:36:35 +02:00
  • 685d8e86bf Avoid frequent data type casting (float/long) for rwi score: refactor to using long in URIMetadataNode too (and related call parameters). As remote rwi scores are not used (since v1.83), skip reading float-score, but keep it in toString() for communication with older versions. reger 2016-10-14 01:17:34 +02:00
  • 3ccd89e274 Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments luccioman 2016-10-13 16:18:24 +02:00
  • f1f4459f88 Added some unit tests for Blacklist.isListed() luccioman 2016-10-13 15:39:47 +02:00
  • 4b699c469a Blacklist refactoring : extracted a function for easier unit testing luccioman 2016-10-13 15:33:31 +02:00
  • 54cfcc3f56 CrawlCheck_p.html : also display info about disallowed URLs. luccioman 2016-10-12 11:26:59 +02:00
  • 8b341e9818 Robots : properly handle URLs including non ASCII characters luccioman 2016-10-12 11:25:36 +02:00
  • 75bb77f0cb Refactoring : extracted a method to handle authorized action links. luccioman 2016-10-12 09:31:42 +02:00
  • c996b04741 HTML validation : fixed URL encoding of search results action links. luccioman 2016-10-12 09:16:47 +02:00
  • 2b81703828 Refactored search result action links construction. luccioman 2016-10-12 08:45:32 +02:00
  • e68b00678e prevent negative score on URIMetadataNode - in the special case where no solr score is supplied. + assert before use & test case reger 2016-10-11 19:54:50 +02:00
  • 242707f9b4 Fixed loadFromCache with strategy IFFRESH. luccioman 2016-10-10 01:10:35 +02:00
  • c778219768 remove module for swfparser from maven parent pom, no longer required for the build, see a4465c97d6 reger 2016-10-07 23:49:03 +02:00
  • 094aed8664 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git luccioman 2016-10-07 11:06:34 +02:00
  • c7402a2f89 Removed invalid empty form action. luccioman 2016-10-07 10:57:31 +02:00
  • 37df2e19fd Removed xmlns attribute which no longer makes sense in HTML5 pages. luccioman 2016-10-07 10:46:20 +02:00
  • 94924e288f Added some accessibility improvements to the main interface. luccioman 2016-10-07 10:44:45 +02:00
  • dd86f7c44e Fixed HTML validation errors and grouped radios options in fieldsets luccioman 2016-10-07 10:43:06 +02:00
  • fc0c72c84b Switched to the short HTML Doctype luccioman 2016-10-07 10:42:23 +02:00
  • 7c81160f45 correct blacklist export as text: url to blacklists_p.txt was using servlet for network access and missing network.unit.name, fix for http://mantis.tokeek.de/view.php?id=694 + prevent unresolved_pattern in yacy/list servlet reger 2016-10-07 03:03:41 +02:00
  • b752bcfecb adjust date in text detection to ignore some program version strings like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650 + expand test case reger 2016-10-06 23:37:12 +02:00
  • b017e97421 optimize condenser language detection a little. langdetect probabilities take letter case into account, add words from description and anchors etc. as is. + add it to javadoc reger 2016-10-06 19:03:52 +02:00
  • ae3717d087 adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! ) + remove unused sentenceword map (we use only the count) + upd test case for sentence count reger 2016-10-06 03:41:07 +02:00
  • b5eb7a9217 Removed unnecessary crawlingDomFilterDepth hidden field. luccioman 2016-10-05 13:48:22 +02:00
  • f6d7c6ee1f Fixed Recorded action URLs beginning displayed in /Table_API_p.html luccioman 2016-10-05 12:20:37 +02:00
  • 474f0476c6 adjust Tokenizer sentence count on trailing text after last recognized sentence + upd test case for rwi multi-word-query (leaving results known to fail untested) reger 2016-10-05 05:52:37 +02:00
  • 4963ecb0a0 Add preference (disabled by default) to show the ranking for each result on the HTML UI. JeremyRand 2016-10-04 11:49:16 +00:00
  • 34658ddb9b Merge pull request #76 from luccioman/crawler luccioman 2016-10-04 05:06:18 +02:00
  • 0065c9b9ea Crawl monitoring : refresh running crawls table luccioman 2016-10-04 03:55:49 +02:00
  • e1e632ad84 Switched to the short HTML Doctype luccioman 2016-10-03 21:57:02 +02:00
  • 4d8611e5e7 Tables accessibility : added missing <thead> sections. luccioman 2016-10-03 21:55:38 +02:00
  • 9fb3142317 Restricted variables scope to function handleStatus() in Crawler.js luccioman 2016-10-03 21:52:24 +02:00
  • 3861ac9293 upd maven dependency-check plugin to reflect changes of https://nvd.nist.gov + upd unknown ant script with current lib/jsch version reger 2016-10-04 03:05:26 +02:00
  • 681a61dafb adjust rwi index result word position handling used for rwi ranking - correct WordReferenceVars.toRowEntry posintext parameter to set expected min posintext (the difference is on multi-word queries, while positions are ordered by search word order). - modified posofphrase/posinphrase join operation - to set min posofphrase - and keep posinphrase if not same posofphrase (was set to 0, no differentiation during ranking) + fix compiler msg (missing type declaration) reger 2016-10-04 01:42:18 +02:00
  • 14f7577231 add support for older Word versions (Word6/Word95) to docParser reger 2016-10-03 01:52:51 +02:00
  • 8794e06721 upd to poi-3.15.jar reger 2016-10-03 01:48:35 +02:00
  • e25f2ee88b mention date search parameter in search option help (index.html) reger 2016-10-02 06:36:34 +02:00
  • 1a79c64495 generalize DateDetection with holiday date rules readily available in icu to make sure current dates are recognized (was fixed to 2014 - 2016) + adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text + moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as not working and problematic for indexing + add test case for parseline (used by query parser) reger 2016-10-02 03:19:12 +02:00
  • 6f68f08354 correct DateDetection Silvester date add Thanksgiving reger 2016-10-01 03:16:27 +02:00
  • 32a2e3a22a have RSSFeed.getChannel return empty message on missing channel element, a) required b) prevent NPE in rss servlets + add test reger 2016-09-30 21:46:57 +02:00
  • fedb9f8151 del double entry in master.lng reger 2016-09-30 21:42:42 +02:00
  • 8d57b5b970 Added some javadocs. luccioman 2016-09-30 17:12:55 +02:00
  • 4585a60d7e Made use of the constant corresponding to the hard-coded value. luccioman 2016-09-30 17:12:29 +02:00
  • 60df09fff9 Fixed some HTML validation errors : Illegal character in query luccioman 2016-09-30 10:54:53 +02:00
  • a76a46a2e9 Removed invalid rel="[count]" from links in tagcloud. luccioman 2016-09-30 09:43:51 +02:00
  • 862f28eaa6 display number of documents/rss-items for label "docs" in load_rss_p servlet (as replacement for the rarely used "docs" rss-tag for a url to the rss-specification) reger 2016-09-29 23:59:10 +02:00
  • 5027912f30 Fixed <p> spacers : block elements such as <div> are not allowed inside luccioman 2016-09-29 14:24:15 +02:00
  • abe489a0b5 Removed unnecessary ARIA "form" role on native HTML form elements. luccioman 2016-09-29 13:42:07 +02:00
  • cca4186044 Fixed HTML validation error : "Stray end tag div" luccioman 2016-09-29 11:42:59 +02:00
  • dcdea2d02f Fixed shutdown for crawler.MaxActiveThreads value greater than 200 luccioman 2016-09-29 10:33:11 +02:00
  • ada473ced2 fix ConfigBasic servlet parameter name for Japanese _jp->_ja reger 2016-09-28 16:08:36 +02:00
  • d286ba2c3e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git luccioman 2016-09-28 14:53:08 +02:00
  • b8f6458152 Prevent yacy main thread from hanging on browser opening process. luccioman 2016-09-28 14:52:30 +02:00
  • cf3a4bdf52 upd to pdfbox-2.0.3 reger 2016-09-27 23:12:10 +02:00
  • 70e1eb30a5 prevent StringIndexOutOfBounds in getLocalFile() + tighten patching of DOS path w/o protocol to drive "LETTER": reger 2016-09-27 22:40:36 +02:00
  • 1bb0b135ac Avoid duplication of various MS Windows file URLs flavors luccioman 2016-09-27 07:53:08 +02:00
  • b9a8476f02 Removed unused import luccioman 2016-09-27 07:41:45 +02:00
  • e73c1eea8c remove unused rootpattern, leftover from commit 9a5ab4e2c1 (diff-d2b184283abed53ae260fc9eabdaef40) reger 2016-09-26 02:54:58 +02:00
  • 6f8c3ccea4 improve url hash computation for file paths with mixed java & windows file.separator to compute equal hashes (by normalizing the path for computation) + expand test case to check mixed java / windows file url notation, e.g. file:///c:/test/file.html vs. file:///c:\test/file.html - relates partially to http://mantis.tokeek.de/view.php?id=692 reger 2016-09-25 22:08:12 +02:00
  • bac302bfe4 fix NPE in QuickCrawlLink_p if param doesn't contain crawl url reger 2016-09-24 23:33:21 +02:00
  • e9b9a7f68f add missing text for Supporter.html to master.lng reger 2016-09-24 05:02:37 +02:00
  • efcb6a1e74 fix supported mime XML -> xml for rssParser (mime normalized to lower case for comparison) + add mime text/xml as in use for rss in the wild reger 2016-09-23 23:37:12 +02:00
  • b5ba8f9f68 Added alternative text and title to HostBrowser.html image links luccioman 2016-09-23 13:27:46 +02:00
  • 4aba491156 Fixed HTML validation errors : duplicate ids. luccioman 2016-09-22 16:25:47 +02:00
  • 1c139d70d4 Fixed W3C validation error : percent encode '[' and ']' chars in hrefs. luccioman 2016-09-22 16:20:13 +02:00
  • b3b75b0498 Accessibility : add a customizable alternative text to YaCy log luccioman 2016-09-22 16:08:33 +02:00
  • f2bc1b268d Updated URL fragment validation rules according to current standards luccioman 2016-09-22 11:28:33 +02:00
  • b1b8e69da8 Fixed NullPointerException cases luccioman 2016-09-22 11:25:33 +02:00
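
Several commits above (3c7220bc7b, 4c67ed3f8d, 68217465fe, 8b74a6bf57) deal with the rwi word-distance calculation and the min/max normalization applied to it. The following is a minimal Python sketch and not YaCy's Java implementation. It assumes that "distance" means the average gap between consecutive word positions, and that normalization scales a value into 0..0xFF relative to the current min and max. `average_distance` and `normalize` are hypothetical names. The sketch shows the two guards the commit messages describe: fewer than two positions yield 0, and an empty min/max range is never used as a divisor.

```python
from typing import Sequence


def average_distance(positions: Sequence[int]) -> int:
    """Hypothetical: average gap between consecutive word positions.

    A distance needs at least two positions, so shorter inputs yield 0.
    """
    if len(positions) < 2:
        return 0
    total = sum(abs(b - a) for a, b in zip(positions, positions[1:]))
    return total // (len(positions) - 1)


def normalize(value: int, lo: int, hi: int, top: int = 0xFF) -> int:
    """Hypothetical: scale value into 0..top relative to the range [lo, hi].

    If the range is empty (hi <= lo), for example after a concurrent
    min/max update, return 0 instead of dividing by zero.
    """
    if hi <= lo:
        return 0
    scaled = (value - lo) * top // (hi - lo)
    # Clamp, since a concurrent min/max update may leave value outside [lo, hi].
    return max(0, min(top, scaled))
```

For example, `average_distance([2, 5, 9])` averages the gaps 3 and 4 with integer division. Checking `hi <= lo` up front is one way to avoid the exception path, as opposed to catching an `ArithmeticException` after the fact.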
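
Commit 6f8c3ccea4 computes equal URL hashes for file URLs that mix Java and Windows separators, such as file:///c:/test/file.html and file:///c:\test/file.html. The following minimal Python sketch shows that idea under stated assumptions. `normalize_file_url` and `url_hash` are hypothetical names. The SHA-1 prefix is only a stand-in and is not YaCy's actual URL hash scheme. The sketch normalizes backslashes only for `file:` URLs and leaves all other URLs untouched.

```python
import hashlib


def normalize_file_url(url: str) -> str:
    """Hypothetical: turn Windows '\\' separators into '/' in file URLs only."""
    if url.lower().startswith("file:"):
        return url.replace("\\", "/")
    return url


def url_hash(url: str) -> str:
    """Stand-in hash (SHA-1 hex prefix) of the normalized URL.

    This is NOT YaCy's URL hash; it only illustrates hashing after normalizing.
    """
    normalized = normalize_file_url(url)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
```

Normalizing only for the hash computation, rather than rewriting the stored URL, keeps the original notation available while still letting both spellings map to the same index entry.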
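
Commits ae3717d087 and 474f0476c6 adjust the Tokenizer sentence count so that repeated punctuation (like !!!!) is not counted more than once, and so that trailing text after the last recognized sentence still counts. The following is a minimal Python sketch of that counting rule and not the Tokenizer's actual code. `count_sentences` is a hypothetical name, and the assumption is that `.`, `!` and `?` are the only sentence terminators.

```python
def count_sentences(text: str) -> int:
    """Hypothetical sentence counter.

    A terminator only closes a sentence if non-space text was seen since
    the previous terminator, so "!!!!" counts once. Trailing text without
    a terminator counts as one more sentence.
    """
    terminators = ".!?"
    count = 0
    pending = False  # True once non-space text follows the last terminator
    for ch in text:
        if ch in terminators:
            if pending:
                count += 1
                pending = False
        elif not ch.isspace():
            pending = True
    if pending:
        count += 1  # trailing text after the last recognized sentence
    return count
```

For example, "Hello!!!! World." counts as two sentences, and so does "One. Two", where the second sentence has no closing punctuation.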