Commit Graph

  • 3ea8380959 Adding Vimeo tag to wiki commands to embed Vimeo video with id Copro 2013-01-23 04:00:15 +01:00
  • ee9d7fd93d Added feature to embed Youtube videos to wiki commands for usage in Wiki, Blog or other servlets Copro 2013-01-23 02:43:58 +01:00
  • ec927ea72b Merge remote-tracking branch 'reger/master' Michael Peter Christen 2013-01-22 17:01:49 +01:00
  • 7159ed2a7d Merge remote-tracking branch 'copro/master' Michael Peter Christen 2013-01-22 17:01:18 +01:00
  • 946fad48c7 Some more German translation reducing the amount of Unused String messages Copro 2013-01-22 15:33:49 +01:00
  • 6690dac845 Russian translation fixes not merged due to conflict Aleksej 2013-01-22 16:19:07 +04:00
  • 9ccdd21d76 Merge remote-tracking branch 'aleksejs/fixtrans' Michael Peter Christen 2013-01-22 11:54:38 +01:00
  • de7c3d95b4 Added German translation for HostBrowser.html Copro 2013-01-22 05:14:37 +01:00
  • 5e5ae01909 updated Russian localization for update system Dmitriy Kazimirov 2013-01-21 21:32:12 +07:00
  • f9c65078f0 A little more fixes for Russian localization Dmitriy Kazimirov 2013-01-20 20:01:29 +07:00
  • ca01d225db A little more fixes for Russian localization Dmitriy Kazimirov 2013-01-20 17:26:11 +07:00
  • 9dc0bea1dc Little more correct and readable Russian localization Dmitriy Kazimirov 2013-01-20 04:52:37 +07:00
  • c1b9113a68 Little more correct and readable Russian localization Dmitriy Kazimirov 2013-01-20 04:43:55 +07:00
  • 9cc72df176 More Russian translations. And if some text is not translated it will be in English and not German Dmitriy Kazimirov 2013-01-19 20:21:47 +07:00
  • db024a4e19 added new solr fields (unused yet; implementation will follow) Michael Peter Christen 2013-01-21 18:02:29 +01:00
  • f5fd2aea18 removed archaic migration code Michael Peter Christen 2013-01-21 17:59:42 +01:00
  • 9b5bdae1b4 Reverted setting of MMapDirectoryFactory from solrconfig; see http://forum.yacy-websuche.de/viewtopic.php?p=27509#p27509 Instead, the start script now checks whether the host is a 64-bit host and, if so, sets -Dsolr.directoryFactory=solr.MMapDirectoryFactory as a Java option Michael Peter Christen 2013-01-21 17:55:28 +01:00
  • f8f7f33596 add Maven build script reger 2013-01-20 21:08:59 +01:00
  • eb68a30947 solr performance settings: the target of these performance settings is the reduction of IO in general and during search in particular. - reduced mergeFactor to 4. This will increase the IO during indexing, but will reduce IO during search. It will also greatly reduce the number of open files, which should allow larger indexes overall before the OS limit on open files is reached. - increased ramBufferSizeMB to 256 MB. This will reduce the number of commits. This change may compensate for the reduction of the mergeFactor. - disabled updateLog. This is a real-time search feature which is available in YaCy anyway because a commit is forced if index.html is called. The updateLog feature causes a lot of IO during indexing and search and produces a lot of files in SEGMENTS/solr_40/data/tlog orbiter 2013-01-19 11:21:33 +01:00
  • 60f2a69331 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-01-17 21:53:19 +01:00
  • cba038f97b one more NPE fix Michael Peter Christen 2013-01-17 21:52:56 +01:00
  • f3e705c4fe bump to httpclient / httpcore 4.2.3 (bugfix-release) sixcooler 2013-01-17 20:10:49 +01:00
  • aa067da86b moved the 'all' option to the end of the list because the 'all' option currently also selects lists which cannot be exported correctly as XML Michael Peter Christen 2013-01-17 01:04:50 +01:00
  • af465cdca5 fix for wrong robots.txt loading for https protocol see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4579 Michael Peter Christen 2013-01-16 17:38:06 +01:00
  • edbc86d2b0 integrated search term into opensearch result title. this makes better bookmark names when subscribing multiple search results from the same peer Michael Peter Christen 2013-01-16 16:18:03 +01:00
  • c3d50d91f8 relaxing site operator for www prefix: - when using a site operator search for a domain where the domain has a www prefix, the domain without the www is also included - when using a site operator search for a domain where the domain has no www prefix, the domain with the www is also included - in the host navigator, all domains with and without a www prefix are accumulated. That means that the host navigator never shows a host with a www prefix. This should prevent usage mistakes of the site operator. Michael Peter Christen 2013-01-16 14:54:35 +01:00
  • f53703df62 using MMapDirectoryFactory as solution for ClosedChannelException given in https://issues.apache.org/jira/browse/SOLR-2247 Michael Peter Christen 2013-01-16 14:35:37 +01:00
  • db49e91724 fixed a NPE which may appear for freeworld peers without any rwi index data. The NPE looked like this: Caused by: java.lang.NullPointerException at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:279) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:155) at search.respond(search.java:314) ... 12 more Michael Peter Christen 2013-01-16 11:07:20 +01:00
  • 4faa07c214 added a timeout for topic computation (solr is here much slower than the old metadata-db) Michael Peter Christen 2013-01-15 16:20:43 +01:00
  • d2d5be032d added a 'inlink' search option according to the suggestion in the YaCy forum at http://forum.yacy-websuche.de/viewtopic.php?f=18&t=4572#p27410 Michael Peter Christen 2013-01-14 12:50:21 +01:00
  • 76e1e91b11 with strict compiler settings, IndexFederated_p does not compile without @SuppressWarnings("deprecation") Michael Peter Christen 2013-01-14 12:33:01 +01:00
  • 3897bb4409 added (manual) urldb migration (link on: Index Administration -> Federated Solr Index) - migrates all entries in old urldb reger 2013-01-14 03:06:24 +01:00
  • 3b6e08b49f prevent checking of urldb if empty - disconnect urlIndexFile if empty - add missing lock class in submenuSearchConfiguration reger 2013-01-12 15:20:23 +01:00
  • 1fb452174a read defaults from yacy.init for "Set to Defaults" button reger 2013-01-05 20:47:18 +01:00
  • f143804382 fix configuration for search page navigators - added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page) - currently redundant setting with part of ConfigPortal page - added missing config for filetype and protocol navigator - adjusted init of SearchEvent to check navigation config setting - renamed RankingProcess.getTopicNavigator to getTopics (to distinguish it from the newly added SearchEvent.getTopicNavigator) reger 2013-01-05 19:00:54 +01:00
  • 24db2fcd9d fix for Network info Michael Peter Christen 2013-01-05 11:52:35 +01:00
  • 22c694f906 activated the clickdepth_i attribute for solr again because the calculation of that value is not as extensive as expected and furthermore the value is very useful for ranking Michael Peter Christen 2013-01-05 01:00:18 +01:00
  • becd52a984 added also a re-calculation of reference counts during the post-processing of clickcount calculations. This is a really nice thing to have because the reference count affects ranking. Michael Peter Christen 2013-01-05 00:58:27 +01:00
  • fc47109608 added 'Last Hour' to network statistics Michael Peter Christen 2013-01-05 00:37:52 +01:00
  • 38d3feae65 added separate delete commands for the local+remote solr index, the old metadata and old rwi and for the citation index. The important advancement is the separation of the citation index deletion because that index is responsible for the linkdepth calculation. Now a search index can be deleted without the citation index, so fewer clickdepths should need to be post-processed. Michael Peter Christen 2013-01-04 16:39:34 +01:00
  • 6f0baaa309 added the clickdepth post-processing: some links may have 'shortcuts' to already calculated click depths. These are then calculated once the crawl buffer is empty and therefore no new 'shortcuts' can be discovered. The status of the clickdepth stack (to-be-processed) can be seen using a solr search command like this: http://localhost:8090/solr/select?q=process_sxt:[*%20TO%20*]&start=0&rows=30&fl=sku,clickdepth_i,process_sxt Michael Peter Christen 2013-01-04 16:37:39 +01:00
  • 0f5b6f38c1 enhanced root-url detection Michael Peter Christen 2013-01-03 19:21:21 +01:00
  • 5a0eb1b268 clickpath should not be active by default because it needs extensive computation - partly to be implemented Michael Peter Christen 2013-01-03 01:30:05 +01:00
  • 8ae08a2cac moved HTCache, Heuristics and Parser servlet to a more appropriate menu location Michael Peter Christen 2013-01-03 01:27:16 +01:00
  • 5c0c56cfe1 Preparations to produce a click depth attribute in the search index. This attribute can be used for ranking and for other purposes (requested by a customer). The click depth is computed in two steps: - during indexing the current fill-state of the reverse link index is used to backtrack the current page to the root page. The length of that backtrack is the clickdepth. But this does not discover the shortest click depth. To get this, a second process to check again is needed - added a process tag that can be used to do operations on the existing index after a crawl; i.e. calculating the shortest clickpath. Added a field to control this operation but not a method to operate on this. - added a visualization of the clickpath length in the host browser Michael Peter Christen 2013-01-02 20:55:43 +01:00
  • 6861af87e2 removed warnings Michael Peter Christen 2013-01-02 19:05:48 +01:00
  • 295884fd54 - Merge commit '168b1d130d9d67b5e8855a0b50c4ba7ad4a416f8' - fixed conflict in htroot/yacysearch.java - removed nedres check because it prevents the remote server from being called at all in most cases (local index has already results but we want more) - fixed a regex bug (one '=' too many) Michael Peter Christen 2013-01-02 15:08:07 +01:00
  • 276e63401e small sanitary fixes - exclude unix shell scripts in NSIS windows install archive - replace link to env/grafics/yacy.gif to yacy.png (build.nsi) - remove unused code lines (Blacklist_p, Response, WordReferenceVars) - type & xhtml (RankingSolr_p.html) reger 2013-01-02 01:59:47 +01:00
  • f301336adf fix: no results with configuration citation reference index switched off - urlcitationindex != null check added to ResultEntry.referencesCount - plus other places where conflicting procedure was used (and urlcitationindex not already checked != null) reger 2012-12-30 02:13:48 +01:00
  • fe50702eb0 added a filterscannerfail attribute to QueryParams which causes that a check to the network scanner fail/success status can be used/suppressed for search results. This is a feature that comes with the port scanner. orbiter 2012-12-29 17:47:34 +01:00
  • 168b1d130d Adding heuristic to get search results from configured systems which support opensearch specification - any system supporting opensearch specification can be configured - search query is only forwarded to remote system if not enough results available on local peer - discover function provided, checking the local Solr index for links to opensearchdescription files, to add to the config - sample config file with some general search engines with opensearch support reger 2012-12-29 08:24:48 +01:00
  • 7761b60325 fix: Broken Link on Crawler_p.html - issue 218 http://bugs.yacy.net/view.php?id=218 - reduced Solr logging (/select) reger 2012-12-29 04:53:20 +01:00
  • eb90d38cd7 added missing extension 'mkv' for navigation Michael Peter Christen 2012-12-27 13:56:13 +01:00
  • e9e0d63897 Add config option to show HostBrowser link in search result - ConfigPortal: added checkbox Host Browser - yacy.init: added search.result.show.hostbrowser as default = on (true) - fix HostBrowser: broken link to protected WebStructurePicture for public user reger 2012-12-27 10:01:10 +01:00
  • 07861cae45 Release 1.3 Release_1.3 Michael Peter Christen 2012-12-27 05:11:11 +01:00
  • 9dfc9c95d8 updated slf4j and log4j Michael Peter Christen 2012-12-27 04:37:21 +01:00
  • 95712fdc8b update to pdf parser Michael Peter Christen 2012-12-27 04:16:31 +01:00
  • 4a9182ae16 use the search configuration to default the cacheStrategy to the value as given in the search configuration Michael Peter Christen 2012-12-27 03:19:21 +01:00
  • 98819ec3d9 use solr boost configuration to select search fields. At this time it is possible to enter a negative boost value to switch that value off. This might be different in the future with a better input interface. Michael Peter Christen 2012-12-27 03:17:45 +01:00
  • a6ad1d6fd1 update to search tests (use yacy interface and a bugfix) Michael Peter Christen 2012-12-27 03:15:50 +01:00
  • e1f89efd0d - made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish Michael Peter Christen 2012-12-26 21:25:27 +01:00
  • 8f3bd0c387 fix for smb crawl situation (lost too many urls) Michael Peter Christen 2012-12-26 19:15:11 +01:00
  • d456f69381 SeedUpload url : check to reject localhost url included in saveSeedList (same check as in / copied from Seed.isProper() ), to prevent identity change on next startup (due to rejected seeduploadurl). reger 2012-12-24 23:29:02 +01:00
  • fbf84e9ff3 fix SeedUpload setting property name for include template file reger 2012-12-24 04:13:38 +01:00
  • 4987caf1c9 - apply fix for localhost handling (from yacy2solr) also to metadata2solr reger 2012-12-23 01:30:52 +01:00
  • 0148f1bb8c fix: exception if default work files don't exist reger 2012-12-22 23:03:39 +01:00
  • 9e4033f229 fix for event starter: delete start time when event is removed Michael Peter Christen 2012-12-22 21:16:22 +01:00
  • 99271ffd13 copy work tables from defaults/data/work if exist there and not in DATA/WORK This can be used to create start-up behavior work scripts in the api.bheap table Michael Peter Christen 2012-12-22 20:54:05 +01:00
  • 99edbf6f14 fix for config basic: do not accept empty peer names Michael Peter Christen 2012-12-22 20:52:52 +01:00
  • 24c9bb35f7 extended the Scheduler: introduced scheduled events - an event type (once, regular) can be selected - for this event type, a fixed time can be selected. This may be either directly after startup or at one of the full hours of a day (==25 options) The main point about this feature is the opportunity to start an action directly after startup. That makes it possible to create YaCy distributions which, after being started for the first time, start to index parts of the intranet/internet by themselves. Michael Peter Christen 2012-12-22 16:27:14 +01:00
  • 433143ba40 removed protocol, tld, ext from the urlmask and created specific navigation field for these Michael Peter Christen 2012-12-19 12:45:40 +01:00
  • 84f82541e8 search process enhancements Michael Peter Christen 2012-12-19 10:41:22 +01:00
  • 02020b590b - removed all extension types from extension navigation which are not proper/known - automatically show the protocol navigation if there is more than http and https - automatically show the extension navigation if there is some media content Michael Peter Christen 2012-12-19 02:38:05 +01:00
  • 01200f06cc using the author field as solr-native facet. this makes it necessary to introduce a copy-field for the author field to be copied to a string field. This field is then used to generate facets. Without this field, the facet would consist only of the words of the author names, not of the full author string. Michael Peter Christen 2012-12-19 01:56:33 +01:00
  • 2a4c064c89 using the publisher information for the author field if no author is given. This applies to cases where only the copyright field in the html header is filled but not the author field Michael Peter Christen 2012-12-19 01:54:35 +01:00
  • bab573361f - using a filter query for facet restriction - calculating the whole search result in at most two sub-queries from solr Michael Peter Christen 2012-12-19 01:00:57 +01:00
  • 7ad5457db0 using the solr facets as navigation in yacyinteractive.html instead of counting locally result types Michael Peter Christen 2012-12-19 00:59:40 +01:00
  • eac9650b31 added another solr field clickdepth_i which reflects the number of clicks which are necessary to get from the portal of a host to a specific document. At this time, only the start document is flagged with clickdepth '0', all other with '-1'. To get the actual clickdepth, a process must use crawled information to collect the actual number of clicks. This will be added in another/next step. Michael Peter Christen 2012-12-18 17:20:42 +01:00
  • 1052263af3 - added a new solr field references_i which stores the number of INCOMING links to the corresponding web page. This information is taken from the reverse link index (a 'little sister' of the RWI index). - this field can be of use to enhance the ranking because a web page with more incoming links can be more important than others. But this is not true for typical link pages like menus. Therefore the number of outgoing links is needed. - added a new solr attribute 'bf' to solr queries which is a boost function extension. This field can contain a formula which computes the boost according to given field values. After some experiments the following formula is now default: div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 This takes the number of references and the inbound links. Further experiments are needed to enhance that formula. Michael Peter Christen 2012-12-18 14:42:35 +01:00
  • 7c3de8b4cd - fix for localhost detection - added IPv6 patterns for localhost detection Michael Peter Christen 2012-12-18 12:52:20 +01:00
  • 34f8786508 removed dependency of vocabulary navigation from Jena and its triplestore; the vocabulary search is now done using generic solr fields which are created on-the-fly during runtime. Michael Peter Christen 2012-12-18 02:29:03 +01:00
  • 664499bb10 PerformanceQueues: disable input for hardcoded httpd performance values reger 2012-12-16 21:01:13 +01:00
  • ad71747525 fix: set default language to "en" reger 2012-12-16 20:53:45 +01:00
  • 9319b90d8a - fixes for host navigation - fixes for filetype navigation - removed unused code Michael Peter Christen 2012-12-15 09:14:49 +01:00
  • cb5cbec14d distinguishing modified query string and original query string Michael Peter Christen 2012-12-15 00:05:46 +01:00
  • fb0fa9a102 - fixed 'delete from subpath' during crawl start which deleted nothing; now works; - changed some crawl start html design details Michael Peter Christen 2012-12-11 13:38:28 +01:00
  • 23d4a62345 fixes in the Russian translation, chmod a-x cn.lng Aleksej 2012-12-11 13:44:25 +04:00
  • 899fd8b62d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-12-10 21:18:56 +01:00
  • 712cc37c40 if maxFileSize < 0 then the file size limit is without limit. orbiter 2012-12-10 21:17:45 +01:00
  • 3f26aabfb3 quickfix for translated link containing word "browse" in ru & uk, see http://bugs.yacy.net/view.php?id=213 reger 2012-12-10 21:08:04 +01:00
  • f86d469973 more search command tools orbiter 2012-12-10 21:01:14 +01:00
  • 54e193a2b8 you can now search for '*' to get just ALL entries in the search index as result list. This makes sense if you intend to search just by using the navigation tools to cut the data set into navigation 'slices'. orbiter 2012-12-10 21:00:30 +01:00
  • 7f5526e6ef allow larger no-proxy expressions orbiter 2012-12-10 20:59:43 +01:00
  • 1228a5798d you can now search for '*' to get just ALL entries in the search index as result list. This makes sense if you intend to search just by using the navigation tools to cut the data set into navigation 'slices'. orbiter 2012-12-10 20:55:11 +01:00
  • 1f33c30d7b re-integrating useForHost method (lost sometime?) to get the noProxy pattern working again. Without using this method all remote urls including the localhost had been accessed through the configured proxy orbiter 2012-12-10 20:44:29 +01:00
  • f1a9c2e604 fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html) see bug http://bugs.yacy.net/view.php?id=215 reger 2012-12-10 20:02:35 +01:00
  • a4a780b871 - fix for bad url conversion in bookmarks when using smb urls - fix for localhost hosts in solr schema host handling orbiter 2012-12-10 07:22:42 +01:00
  • e80dfeca23 - making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171) - blacklist test adding explicit response text "not blocked" if no blacklist match reger 2012-12-08 06:34:48 +01:00
  • e2d499be9e remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list). reger 2012-12-08 00:19:20 +01:00
  • a3cd3852ab introduced a better place to update the lastacc time value in latency Michael Peter Christen 2012-12-07 15:49:23 +01:00
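
Note on commit 1052263af3 (default boost function): the formula div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 can be evaluated by hand to see how the two fields interact. The sketch below is an illustration only, not code from the repository. It assumes that the trailing ^0.4 is a weight applied to the function value (the usual reading of the DisMax 'bf' parameter) rather than an exponent; the function name boost and its parameters are hypothetical.

```python
def boost(references: int, inbound_links: int, weight: float = 0.4) -> float:
    """Evaluate weight * (1 + references_i) / (1 + inboundlinkscount_i) ** 1.6.

    Assumption: '^0.4' in the commit message is a weight on the function,
    not an exponent. Names here are illustrative, not taken from YaCy.
    """
    return weight * (1 + references) / (1 + inbound_links) ** 1.6


if __name__ == "__main__":
    # A page with many incoming references and few links of its own
    # gets a larger boost than a typical menu-like link page.
    print(boost(references=20, inbound_links=2))
    print(boost(references=20, inbound_links=80))
```

Under this reading the boost grows linearly with the reference count and shrinks faster than linearly with the link count, which matches the stated intent of damping menu-like link pages.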
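
Note on commits 5c0c56cfe1 and 6f0baaa309 (click depth): the commit messages explain that backtracking during indexing does not necessarily find the shortest click depth, so a second pass is needed after the crawl. A minimal sketch of what "shortest click depth" means is a breadth-first search over the link graph, starting at the root page. This is a generic technique shown for illustration; it is not the YaCy implementation, which works on the reverse link index. The function name click_depths and the input format are assumptions.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], root: str) -> dict[str, int]:
    """Return the minimum number of clicks from root to each reachable page.

    links maps a page to the pages it links to (hypothetical input format).
    Pages that are not reachable from root are absent from the result.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"], "/c": ["/d"]}
    depths = click_depths(site, "/")
    # Unknown depth is reported as -1, as in commit eac9650b31.
    print(depths.get("/d", -1), depths.get("/missing", -1))
```

Here "/d" is reachable through "/c" in three clicks but through "/b" in two, so only a search that considers all paths reports the shorter value.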
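
Note on commit c3d50d91f8 (site operator and www prefix): the described behavior can be sketched as two small helpers, one that expands a site operator host into both variants and one that accumulates host counts without the www prefix. This is an illustration under stated assumptions, not the repository code; the names strip_www, site_variants and accumulate_hosts are hypothetical.

```python
from collections import Counter


def strip_www(host: str) -> str:
    """Return the host in lower case without a leading 'www.' prefix."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def site_variants(host: str) -> set[str]:
    """Hosts a relaxed site operator would match: with and without 'www.'."""
    bare = strip_www(host)
    return {bare, "www." + bare}


def accumulate_hosts(hosts: list[str]) -> Counter:
    """Count hosts for a navigator so that 'www.x' and 'x' share one entry."""
    return Counter(strip_www(h) for h in hosts)


if __name__ == "__main__":
    print(sorted(site_variants("www.example.org")))
    print(accumulate_hosts(["www.example.org", "example.org", "yacy.net"]))
```

Because the navigator only ever lists the bare host, a user who copies a host from it into a site operator still matches documents stored under either variant.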