aa3c26c62e added recrawl/reload to CrawlStartSite for a timeout of 3 days
orbiter
2013-02-27 11:43:36 +01:00
c1b7e61882 added option to create empty vocabularies
orbiter
2013-02-27 08:24:37 +01:00
e0edad689d fix link to IndexSchema_p.html
bubu
2013-02-26 21:12:44 +01:00
d957739441 removed size request
Michael Peter Christen
2013-02-26 17:53:44 +01:00
c95a84103a complete redesign of search process:
- removed 'worker' processes
- no internal time-out behaviour: methods either are successful or return null
- waiting is only done on top-level
- removed snippet-production; this is replaced by solr snippets
- removed statistics based on solr size queries (they took VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you would like to have statistics again, you must switch the RWI index back on in this mode.
- fixed many bugs regarding the page counter
Michael Peter Christen
2013-02-26 17:16:31 +01:00
35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js
Michael Peter Christen
2013-02-25 14:31:50 +01:00
008288719c fix for schema export to consider also automatically generated coordinate fields
Michael Peter Christen
2013-02-25 01:13:03 +01:00
089dee1770 - generalized SchemaConfiguration into super-class Configuration and adapted other classes which used the configuration-only access for that class
- removed many warnings
- adjusted logging
Michael Peter Christen
2013-02-25 00:09:41 +01:00
c16de49f64 fix for webgraph delete query
Michael Peter Christen
2013-02-24 18:17:58 +01:00
56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default
- completely redesigned this servlet
- added description how to attach a remote solr
- adjusted naming of servlet and menus
- moved the 'lazy initialization' attribute from IndexSchema back to IndexFederated (this is a general option)
Michael Peter Christen
2013-02-24 18:09:34 +01:00
461d46101d - Removed log4j from libraries. This can be removed because the package log4j-over-slf4j is there. From slf4j all logging is routed to the jdk logger. Now all logging is consistently done through the jdk logger.
- added some lines to the logging properties to suppress many solr logging statements. The number of log entries had already become a performance issue, so removing these from the log should increase performance.
Michael Peter Christen
2013-02-23 16:45:05 +01:00
b349c8145b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-23 15:55:21 +01:00
253a7aee88 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
orbiter
2013-02-23 14:33:29 +01:00
36f9b0fc16 updated wstx-asl to 3.2.9
orbiter
2013-02-23 14:33:17 +01:00
14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-23 08:48:33 +01:00
58e1e6fa2b fixes to schema
Michael Peter Christen
2013-02-23 08:14:10 +01:00
f291d60c5f on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index
reger
2013-02-22 22:17:45 +01:00
d31a109efe remove obsolete Solr "commit within" input field from IndexFederated; see 4111606654
reger
2013-02-22 22:03:32 +01:00
788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index now has the following properties:
- the webgraph index will have about 40 times as many entries as the default index
- the complete index size will increase and may be about double its current size
As testing showed, not much indexing performance is lost. The default index will be smaller (fields were moved out of it); thus searching can be faster. The new index means that some old parts of YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to:
- search within link texts of linked but not indexed documents (about 20 times the size of the document index!!)
- get a very detailed link graph
- enhance ranking using a complete link graph
Michael Peter Christen
2013-02-22 15:45:15 +01:00
89ede0fe84 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-21 13:24:10 +01:00
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To make a second core possible, multi-core functionality had to be implemented in the deep-embedded solr:
- migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html
- added instance handling in addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding
- migrated also caching and sharding on top of the new instance handling
- migrated the search apis to now handle access to a specific core, the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be used after this migration!
- redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this also solves some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema'; the old solr.keys.list is replaced by solr.collection.schema
Michael Peter Christen
2013-02-21 13:23:55 +01:00
1951ba61ae remove CPGEN from Windows batch files (classpath for all needed libraries is defined in manifest of yacycore.jar)
reger
2013-02-17 03:26:46 +01:00
594ed63f2a fixed interactive search which caused an error if pubDate is not present in a search result
orbiter
2013-02-16 20:33:27 +01:00
33bc255e85 prevent crawl starts with very large url lists from causing a time-out in the user front-end
Michael Peter Christen
2013-02-15 01:58:28 +01:00
98a4a4aa97 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-15 01:38:23 +01:00
b6de1f42dc Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for each solr core. As a result, the IndexFederated servlet had to be split into two parts; the new servlet for the schema editor is now the IndexSchema servlet.
Michael Peter Christen
2013-02-15 01:38:10 +01:00
efb6cf7d21 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
Marc Nause
2013-02-13 19:31:12 +01:00
ce5b7afab2 *) removed Skype online indicator (was not working anymore)
*) updated ICQ URLs
Marc Nause
2013-02-13 19:29:40 +01:00
4111606654 removed the commitWithin attribute because that is not the right way for us to update the index. It may also be superfluous with the solr 4.0 softcommit.
Michael Peter Christen
2013-02-13 02:29:47 +01:00
c20fa3640d fix to unbalanced tag and license for null objects
Michael Peter Christen
2013-02-13 01:23:05 +01:00
3a6097966d added jsonp option to yjson result writer
Michael Peter Christen
2013-02-13 01:11:57 +01:00
de58043205 Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.
Michael Peter Christen
2013-02-13 00:33:53 +01:00
d3508fa8ff fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html
Michael Peter Christen
2013-02-13 00:01:38 +01:00
1db23e9eac Moved methods from SolrServerConnector to AbstractSolrConnector with the result that most of these methods become superfluous in other classes. This is a generalization step towards multi-indexes in Solr.
Michael Peter Christen
2013-02-12 22:03:10 +01:00
02fa31b5bf better filesearch layout
Michael Peter Christen
2013-02-12 12:21:29 +01:00
e55ec3071d reduced number of facets in yacyinteractive (only filetype necessary)
Michael Peter Christen
2013-02-12 12:00:54 +01:00
16d90859b7 reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key
Michael Peter Christen
2013-02-12 11:52:33 +01:00
0d888ff69e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-12 03:42:58 +01:00
c34af7fe94 extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such a way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries.
Michael Peter Christen
2013-02-12 03:42:46 +01:00
c37d718f16 make sure yacy.running is deleted if not running (catch exception) - to prevent following log if YaCy was previously not properly shut down
reger
2013-02-11 22:53:19 +01:00
762b687e47 extended the serverObjects to be able to hold multiple values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests.
Michael Peter Christen
2013-02-11 22:12:15 +01:00
d70d99fab5 added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format.
Michael Peter Christen
2013-02-11 22:10:14 +01:00
6a4878940b fix in html parser and bookmark generation
Michael Peter Christen
2013-02-11 13:28:08 +01:00
51e7ab4f70 moved bookmarks back to more prominent location (even if this does not fit to the 'Search Interfaces' headline)
Michael Peter Christen
2013-02-09 06:57:20 +01:00
dee8b24d3c better error handling for bookmarks
Michael Peter Christen
2013-02-09 06:55:57 +01:00
e1da39245a when searching the network, do not search on robinson peers with the old DHT search interface. Now use the solr interface.
Michael Peter Christen
2013-02-08 18:30:08 +01:00
6f6ddaf7e7 A robinson peer does not need to write RWI data if such peers are only searched using the solr interface. Searching public robinsons will be done with solr only in the future.
Michael Peter Christen
2013-02-08 17:58:54 +01:00
ab4f74c82c fix for xml blacklist import
Michael Peter Christen
2013-02-08 15:12:10 +01:00
7806680ab8 fixed a problem with re-feeding of already indexed documents with coordinates attached.
Michael Peter Christen
2013-02-08 12:45:54 +01:00
cb38e860cf After observing that Windows users simply forget that they started YaCy while YaCy is still running, and that they additionally expect another double-click on the YaCy icon to simply open the search window (again), I decided to add a function that complies with the user's expectation: simply open the browser pop-up page again if the user starts YaCy while YaCy is still running.
Michael Peter Christen
2013-02-07 23:39:00 +01:00
27894d2c1a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
Marc Nause
2013-02-05 21:09:41 +01:00
75f9568472 *) only install files from the RELEASE directory
*) minor changes
Marc Nause
2013-02-05 21:02:32 +01:00
eb80405a16 added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer
Michael Peter Christen
2013-02-05 12:47:20 +01:00
1e3d8cc235 show a link for the host in the host browser; see
Michael Peter Christen
2013-02-04 21:24:57 +01:00
19c46e4acf catch more exceptions
Michael Peter Christen
2013-02-04 21:24:39 +01:00
7de502f43d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-04 20:02:35 +01:00
d1cb4cbc84 enhanced network scanner, is faster and more flexible now
- start more processes
- remove superfluous host name resolution
- better/more flexible subnet ip range calculation
- preferring ipv4 gives more usable ip pre-settings in the servlet
- extended servlet by new subnet /20 option
- redesign of scanner start process in servlet (generalization)
Michael Peter Christen
2013-02-02 09:51:43 +01:00
592adf7ccb fix for domain navigation
Michael Peter Christen
2013-02-02 07:21:18 +01:00
4ca1b76627 less search overhead when first result set is smaller than requested
Michael Peter Christen
2013-02-02 07:20:56 +01:00
f748b0aa7c NPE fix
Michael Peter Christen
2013-02-02 07:20:02 +01:00
7dfcc92b71 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-01-31 13:15:42 +01:00
0b6566a389 optimizations when starting large crawl requests with many start urls in one request:
- allow larger match-fields in html interface
- delete all host hashes at once from zurl
- when deleting by host, do not count size of deleted entries since that was the reason it took so long
Michael Peter Christen
2013-01-31 13:15:28 +01:00
a2160054d7 ability to create vocabularies also without any objectspace: this iterates over all urls in the index to create terms
orbiter
2013-01-30 19:33:48 +01:00
ecc10a752c fixes to index enumeration for vocabulary production
orbiter
2013-01-29 18:14:14 +01:00
be5d3a1066 adding classpath to Manifest of yacycore.jar
- this allows starting without giving an explicit java -cp (just java -jar lib/yacycore.jar works)
- especially helpful when running YaCy as a Windows service, since the classpath configuration of the service wrapper no longer has to be adjusted on upgrades of the lib/*.jar files
reger
2013-01-29 03:01:57 +01:00
be27567b53 allow more links when starting a crawl by file
Michael Peter Christen
2013-01-28 17:50:23 +01:00
99185d7048 one more fix for author_sxt
Michael Peter Christen
2013-01-26 03:59:39 +01:00
b6ae6262f6 - add the copyField author_sxt only if author exists
- set the solr default search field according to existing fields
Michael Peter Christen
2013-01-26 03:34:46 +01:00
088373b4ea catch exception if solr connection change fails
Michael Peter Christen
2013-01-25 16:06:58 +01:00
8a55fd96e9 Merge remote-tracking branch 'aleksejs/rutrans3'
Michael Peter Christen
2013-01-25 13:57:10 +01:00
3252223055 Russian translation fixes and additions
Aleksej
2013-01-25 16:05:48 +04:00
3a13906121 clear some more caches if running out of memory
sixcooler
2013-01-25 04:24:36 +01:00
6fc4bdddbd *) fixed admin password configuration
Marc Nause
2013-01-24 20:09:33 +01:00
e23a596c1d added a copyField for author_sxt for automated schema generation
Michael Peter Christen
2013-01-24 18:25:28 +01:00
8651ec35fe turned author_s into the multi-valued field author_sxt
Michael Peter Christen
2013-01-24 18:24:31 +01:00
f1a4feda3e security fix for suggest (don't let users ask for too much)
Michael Peter Christen
2013-01-24 17:57:28 +01:00
244b157299 fix for external solr schema definition
Michael Peter Christen
2013-01-24 16:34:15 +01:00
4589afe056 fix NPE when solr does not deliver snippets
Michael Peter Christen
2013-01-24 14:12:31 +01:00
0fe7b6fd3b migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. Removed superfluous methods and servlets.
Michael Peter Christen
2013-01-24 12:39:19 +01:00
1768c82010 removed field selection because that created documents with that field only, which was not useful when re-writing the same document
Michael Peter Christen
2013-01-24 03:26:38 +01:00
8eebeea533 fix for search result link in ViewFile
Michael Peter Christen
2013-01-24 01:50:59 +01:00
31e854bef6 Merge remote-tracking branch 'copro/master'
Michael Peter Christen
2013-01-23 14:41:17 +01:00
4735bd47f4 - changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy now has an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore, the commit strategy when doing a search from the web interface was changed (a commit is done every time before a search is done).
Michael Peter Christen
2013-01-23 14:40:58 +01:00