Commit Graph

  • aa3c26c62e added recrawl/reload to CrawlStartSite for a timeout of 3 days orbiter 2013-02-27 11:43:36 +01:00
  • c1b7e61882 added option to create empty vocabularies orbiter 2013-02-27 08:24:37 +01:00
  • e0edad689d fix link to IndexSchema_p.html bubu 2013-02-26 21:12:44 +01:00
  • d957739441 removed size request Michael Peter Christen 2013-02-26 17:53:44 +01:00
  • c95a84103a complete redesign of search process:
      - removed 'worker' processes
      - no internal time-out behaviour: methods either are successful or return null
      - waiting is only done on top-level
      - removed snippet-production; this is replaced by solr snippets
      - removed statistics based on solr size queries (they took VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you want statistics again, you must switch the RWI index back on in this mode.
      - fixed many bugs regarding the correct page counter
    Michael Peter Christen 2013-02-26 17:16:31 +01:00
  • 35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js Michael Peter Christen 2013-02-25 14:31:50 +01:00
  • 008288719c fix for schema export to consider also automatically generated coordinate fields Michael Peter Christen 2013-02-25 01:13:03 +01:00
  • 089dee1770
      - generalized SchemaConfiguration into super-class Configuration and adapted other classes which used the configuration-only access for that class
      - removed many warnings
      - adjusted logging
    Michael Peter Christen 2013-02-25 00:09:41 +01:00
  • c16de49f64 fix for webgraph delete query Michael Peter Christen 2013-02-24 18:17:58 +01:00
  • 56d5946a59
      - added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph); this is now off by default
      - completely redesigned this servlet
      - added description how to attach a remote solr
      - adjusted naming of servlet and menus
      - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again.
    Michael Peter Christen 2013-02-24 18:09:34 +01:00
  • 461d46101d
      - Removed log4j from the libraries. This is possible because the package log4j-over-slf4j is there. From slf4j, all logging is routed to the jdk logger, so all logging is now consistently done to the jdk logger.
      - Added some lines to the logging properties to suppress many solr logging statements. The number of log entries had already become a performance issue, so removing them from the log should increase performance.
    Michael Peter Christen 2013-02-23 16:45:05 +01:00
  • b349c8145b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-23 15:55:21 +01:00
  • 253a7aee88 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-02-23 14:33:29 +01:00
  • 36f9b0fc16 updated wstx-asl to 3.2.9 orbiter 2013-02-23 14:33:17 +01:00
  • 14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-23 08:48:33 +01:00
  • 58e1e6fa2b fixes to schema Michael Peter Christen 2013-02-23 08:14:10 +01:00
  • f291d60c5f on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index reger 2013-02-22 22:17:45 +01:00
  • d31a109efe remove obsolete Solr "commit within" input field from IndexFederated see 4111606654 reger 2013-02-22 22:03:32 +01:00
  • 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them, and the resulting search index now has the following properties:
      - the webgraph will have about 40 times as many entries as the default index
      - the complete index size will increase and may be about double the current size
    As testing showed, not much indexing performance is lost. The default index will be smaller (fields were moved out of it); thus searching can be faster. The new index means that some old parts of YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to:
      - search within link texts of linked but not indexed documents (about 20 times the size of the document index!!)
      - get a very detailed link graph
      - enhance ranking using a complete link graph
    Michael Peter Christen 2013-02-22 15:45:15 +01:00
  • 89ede0fe84 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-21 13:24:10 +01:00
  • 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented in the deep-embedded solr:
      - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1
      - added solr_40/webgraph subdirectory as second core
      - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html
      - added instance handling as an addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding
      - migrated also caching and sharding on top of the new instance handling
      - migrated the search apis to now handle the access to a specific core, the default core named 'collection1'
      - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address
      - migrated the solr backup and restore process: old backups cannot be used after this migration!
      - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this also solves some bugs when switching the index/network)
      - added another schema 'solr.webgraph.schema'; the old solr.keys.list is replaced by solr.collection.schema
    Michael Peter Christen 2013-02-21 13:23:55 +01:00
  • 1951ba61ae remove CPGEN from Windows batch files (classpath for all needed libraries is defined in manifest of yacycore.jar) reger 2013-02-17 03:26:46 +01:00
  • 594ed63f2a fixed interactive search which caused an error if pubDate is not present in a search result orbiter 2013-02-16 20:33:27 +01:00
  • 33bc255e85 prevent crawl starts with very large url lists from causing a time-out in the user front-end Michael Peter Christen 2013-02-15 01:58:28 +01:00
  • 98a4a4aa97 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-15 01:38:23 +01:00
  • b6de1f42dc Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for each solr core. As a result, the IndexFederated servlet had to be split into two parts; the new servlet for the schema editor is now in the IndexSchema servlet. Michael Peter Christen 2013-02-15 01:38:10 +01:00
  • efb6cf7d21 Merge branch 'master' of git@gitorious.org:yacy/rc1.git Marc Nause 2013-02-13 19:31:12 +01:00
  • ce5b7afab2 *) removed Skype online indicator (was not working anymore) *) updated ICQ URLs Marc Nause 2013-02-13 19:29:40 +01:00
  • 4111606654 removed the commitWithin attribute because that is not the right way for us to update the index. May also be superfluous with the solr 4.0 softcommit. Michael Peter Christen 2013-02-13 02:29:47 +01:00
  • c20fa3640d fix to unbalanced tag and license for null objects Michael Peter Christen 2013-02-13 01:23:05 +01:00
  • 3a6097966d added jsonp option to yjson result writer Michael Peter Christen 2013-02-13 01:11:57 +01:00
  • de58043205 Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr. Michael Peter Christen 2013-02-13 00:33:53 +01:00
  • d3508fa8ff fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html Michael Peter Christen 2013-02-13 00:01:38 +01:00
  • 1db23e9eac Moved methods from SolrServerConnector to AbstractSolrConnector with the result that most of these methods become superfluous in other classes. This is a generalization step towards multi-indexes in Solr. Michael Peter Christen 2013-02-12 22:03:10 +01:00
  • 02fa31b5bf better filesearch layout Michael Peter Christen 2013-02-12 12:21:29 +01:00
  • e55ec3071d reduced number of facets in yacyinteractive (only filetype necessary) Michael Peter Christen 2013-02-12 12:00:54 +01:00
  • 16d90859b7 reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key Michael Peter Christen 2013-02-12 11:52:33 +01:00
  • 0d888ff69e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-12 03:42:58 +01:00
  • c34af7fe94 extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such a way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries. Michael Peter Christen 2013-02-12 03:42:46 +01:00
  • c37d718f16 make sure yacy.running is deleted if not running (catch exception) - to prevent following log if YaCy was previously not properly shut down reger 2013-02-11 22:53:19 +01:00
  • 762b687e47 extended the serverObjects to be able to hold multiple values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests. Michael Peter Christen 2013-02-11 22:12:15 +01:00
  • d70d99fab5 added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format. Michael Peter Christen 2013-02-11 22:10:14 +01:00
  • 6a4878940b fix in html parser and bookmark generation Michael Peter Christen 2013-02-11 13:28:08 +01:00
  • 51e7ab4f70 moved bookmarks back to more prominent location (even if this does not fit to the 'Search Interfaces' headline) Michael Peter Christen 2013-02-09 06:57:20 +01:00
  • dee8b24d3c better error handling for bookmarks Michael Peter Christen 2013-02-09 06:55:57 +01:00
  • e1da39245a when searching the network, do not search on robinson peers with the old DHT search interface. Now use the solr interface. Michael Peter Christen 2013-02-08 18:30:08 +01:00
  • 6f6ddaf7e7 A robinson peer does not need to write RWI data if such peers are only searched using the solr interface. Searching public robinsons will be done with solr only in the future. Michael Peter Christen 2013-02-08 17:58:54 +01:00
  • ab4f74c82c fix for xml blacklist import Michael Peter Christen 2013-02-08 15:12:10 +01:00
  • 7806680ab8 fixed a problem with re-feeding of already indexed documents with coordinates attached. Michael Peter Christen 2013-02-08 12:45:54 +01:00
  • cb38e860cf After observing that Windows users simply forget that they started YaCy (YaCy is still running, and the user additionally expects that another double-click on the YaCy icon simply opens the search window again), I decided to add a function that meets this expectation: simply open the browser pop-up page again if the user starts YaCy while YaCy is still running. Michael Peter Christen 2013-02-07 23:39:00 +01:00
  • 27894d2c1a Merge branch 'master' of git@gitorious.org:yacy/rc1.git Marc Nause 2013-02-05 21:09:41 +01:00
  • 75f9568472 *) only install files from the RELEASE directory *) minor changes Marc Nause 2013-02-05 21:02:32 +01:00
  • eb80405a16 added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer Michael Peter Christen 2013-02-05 12:47:20 +01:00
  • 1e3d8cc235 show a link for the host in the host browser; see Michael Peter Christen 2013-02-04 21:24:57 +01:00
  • 19c46e4acf catch more exceptions Michael Peter Christen 2013-02-04 21:24:39 +01:00
  • 7de502f43d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-04 20:02:35 +01:00
  • 3bc5ee6e3d *) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore) Marc Nause 2013-02-04 19:57:28 +01:00
  • 4f270d89e2 another NPE Michael Peter Christen 2013-02-04 18:04:52 +01:00
  • 921091c3a6 use thread-safe http connection manager for authenticated remote solr connections Michael Peter Christen 2013-02-04 17:48:04 +01:00
  • e8f7b85b98 fixes to internal RWI usage if RWI is switched off (NPE etc) Michael Peter Christen 2013-02-04 17:11:02 +01:00
  • 3834829b37 bugfixes and more logging for solr connector Michael Peter Christen 2013-02-04 16:42:10 +01:00
  • bc00097cbf arrr... forgot the new library Michael Peter Christen 2013-02-04 12:02:37 +01:00
  • 09a2b09c48 guava update Michael Peter Christen 2013-02-04 11:21:05 +01:00
  • 80fe3d7860 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-04 10:57:54 +01:00
  • 4323621a76 update to Solr 4.1.0 Michael Peter Christen 2013-02-04 10:55:49 +01:00
  • 639c114199 remove jetty from classpath - as it was moved last commit sixcooler 2013-02-03 23:24:19 +01:00
  • 160ce568b3 move testing SolrServlet.main to test, making include of jetty*.jar in distribution and classpath obsolete reger 2013-02-03 22:32:38 +01:00
  • 07a20e8253 removed unused import orbiter 2013-02-02 10:52:39 +01:00
  • d1cb4cbc84 enhanced network scanner, is faster and more flexible now
      - start more processes
      - remove superfluous host name resolution
      - better/more flexible subnet ip range calculation
      - prefer ipv4 makes better usable ip pre-settings in servlet
      - extended servlet by new subnet /20 option
      - redesign of scanner start process in servlet (generalization)
    Michael Peter Christen 2013-02-02 09:51:43 +01:00
  • 592adf7ccb fix for domain navigation Michael Peter Christen 2013-02-02 07:21:18 +01:00
  • 4ca1b76627 less search overhead when first result set is smaller than requested Michael Peter Christen 2013-02-02 07:20:56 +01:00
  • f748b0aa7c NPE fix Michael Peter Christen 2013-02-02 07:20:02 +01:00
  • 7dfcc92b71 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-01-31 13:15:42 +01:00
  • 0b6566a389 optimizations when starting large crawl requests with many start urls in one request:
      - allow larger match-fields in html interface
      - delete all host hashes at once from zurl
      - when deleting by host, do not count size of deleted entries since that was the reason it took so long
    Michael Peter Christen 2013-01-31 13:15:28 +01:00
  • a2160054d7 ability to create vocabularies also without any objectspace: this iterates over all urls in the index to create terms orbiter 2013-01-30 19:33:48 +01:00
  • ecc10a752c fixes to index enumeration for vocabulary production orbiter 2013-01-29 18:14:14 +01:00
  • be5d3a1066 adding classpath to Manifest of yacycore.jar - this allows to start w/o giving explicit java -cp (just java -jar lib/yacycore.jar works) - especially helpful while running YaCy as Win service, making it obsolete to adjust classpath cfg of the service wrapper on upgrades of lib/*.jar's reger 2013-01-29 03:01:57 +01:00
  • be27567b53 allow more links when starting a crawl by file Michael Peter Christen 2013-01-28 17:50:23 +01:00
  • 3777b338c7 bugfix: location url for migrate urldb button onclick reger 2013-01-27 06:13:49 +01:00
  • 8447814a31 correct header menu in migrateurldb_p.html - update NetBeans project path reger 2013-01-26 23:43:09 +01:00
  • 99185d7048 one more fix for author_sxt Michael Peter Christen 2013-01-26 03:59:39 +01:00
  • b6ae6262f6 - add the copyField author_sxt only if author exists - set the solr default search field according to existing fields Michael Peter Christen 2013-01-26 03:34:46 +01:00
  • 088373b4ea catch exception if solr connection change fails Michael Peter Christen 2013-01-25 16:06:58 +01:00
  • 8a55fd96e9 Merge remote-tracking branch 'aleksejs/rutrans3' Michael Peter Christen 2013-01-25 13:57:10 +01:00
  • 3252223055 Russian translation fixes and additions Aleksej 2013-01-25 16:05:48 +04:00
  • 7293c0413b Russian localization:index.html fix Dmitriy Kazimirov 2013-01-25 16:08:08 +07:00
  • 3a13906121 clear some more caches if running out of memory sixcooler 2013-01-25 04:24:36 +01:00
  • 6fc4bdddbd *) fixed admin password configuration Marc Nause 2013-01-24 20:09:33 +01:00
  • e23a596c1d added a copyField for author_sxt for automated schema generation Michael Peter Christen 2013-01-24 18:25:28 +01:00
  • 8651ec35fe turned author_s into the multi-valued field author_sxt Michael Peter Christen 2013-01-24 18:24:31 +01:00
  • f1a4feda3e security fix for suggest (don't let users ask for too much) Michael Peter Christen 2013-01-24 17:57:28 +01:00
  • 244b157299 fix for external solr schema definition Michael Peter Christen 2013-01-24 16:34:15 +01:00
  • 4589afe056 fix NPE when solr does not deliver snippets Michael Peter Christen 2013-01-24 14:12:31 +01:00
  • 0fe7b6fd3b migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. removed superfluous methods and servlets. Michael Peter Christen 2013-01-24 12:39:19 +01:00
  • 1768c82010 removed field selection because that created documents with that field only which was not useful when re-writing the same document Michael Peter Christen 2013-01-24 03:26:38 +01:00
  • 8eebeea533 fix for search result link in ViewFile Michael Peter Christen 2013-01-24 01:50:59 +01:00
  • 5bed1a7893 Russian localization update Dmitriy Kazimirov 2013-01-23 17:11:45 +07:00
  • 31e854bef6 Merge remote-tracking branch 'copro/master' Michael Peter Christen 2013-01-23 14:41:17 +01:00
  • 4735bd47f4 - changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy now has an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore the commit strategy when doing a search from the web interface was changed (it's done every time before a search is done). Michael Peter Christen 2013-01-23 14:40:58 +01:00
  • 0025983993 Fix typo embedd -> embed Copro 2013-01-23 04:11:55 +01:00