Commit Graph

  • 2ccf1dba71 upgrade to solr 3.6.1 Michael Peter Christen 2012-08-17 15:11:21 +02:00
  • e651d3e320 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-08-17 14:45:18 +02:00
  • 06a78eecb7 code simplification Michael Peter Christen 2012-08-17 14:43:32 +02:00
  • 54bea21c02 bugfix for solr connector, possibly a cause for http://forum.yacy-websuche.de/viewtopic.php?p=26893#p26893 Michael Peter Christen 2012-08-17 14:34:31 +02:00
  • 9bece5ac5f enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available Michael Peter Christen 2012-08-17 14:22:07 +02:00
  • 8a91f4fa42 local robots.txt: disallow external crawlers to follow the URL proxy cominch 2012-08-17 11:47:39 +02:00
  • 18f989dfb1 - refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata Michael Peter Christen 2012-08-17 01:34:38 +02:00
  • 395b78a0d8 using the solr search index to concurrently search within solr and the rwis during local search requests. Michael Peter Christen 2012-08-17 01:21:56 +02:00
  • 6197caf698 added clear-text search words in query params Michael Peter Christen 2012-08-16 23:05:37 +02:00
  • efafa79db5 - added a content-encoding: gzip to streamed http server responses - finish and close streamed http responses immediately - this applies only to the solr interface which should be much faster now! Michael Peter Christen 2012-08-16 22:35:19 +02:00
  • 23226676c6 FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this does not cause a catastrophe by the end of the weekend, we will do a YaCy 1.1 main release containing this as default. Michael Peter Christen 2012-08-16 18:17:47 +02:00
  • a1b2c9a67d doctype2mime fix, influences metadata conversion between old metadata and solr Michael Peter Christen 2012-08-16 17:49:35 +02:00
  • 7c31be1c80 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-08-16 17:45:26 +02:00
  • 6456a1656a changed local robots.txt to prevent external crawlers to submit random search queries cominch 2012-08-16 17:38:10 +02:00
  • a16206e38b more attempts to clean the index (cleaning is faster then) Michael Peter Christen 2012-08-16 17:24:25 +02:00
  • 703f427303 fixed some peer-ping connection details - larger time-out - removed too old seedlist - fixed a bug in connection test Michael Peter Christen 2012-08-16 17:11:54 +02:00
  • 597bb76e4f get the peer location more quickly Michael Peter Christen 2012-08-16 16:28:57 +02:00
  • 156d457aec fix for Index out of bounds exception in Network servlet orbiter 2012-08-16 07:47:52 +02:00
  • da93addec3 addon to e74d66e28c (removed htmlparser.jar): for Mac App orbiter 2012-08-16 07:28:38 +02:00
  • ae9cd7a118 fix xss bug #204 Lotus 2012-08-15 14:23:21 +02:00
  • 1641835fef replaced yacy xml encoding by solr xml encoding Michael Peter Christen 2012-08-14 13:29:11 +02:00
  • 89fe13e73d enhanced GSA and RSS output format: corrected date, added some missing fields, added xml encoding for utf8 Michael Peter Christen 2012-08-14 13:19:29 +02:00
  • ea49a8aa8c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-08-14 12:40:44 +02:00
  • d988ba50cf added a very rudimentary, incomplete, non-verified GSA response writer for solr. Try this: http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10 Michael Peter Christen 2012-08-14 12:40:26 +02:00
  • aab0b680c3 - added xslt support for solr result formats. try e.g. http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl - added servlet-side mime-type configuration for streamed servlets. this is used for the result formatters in solr result formats Michael Peter Christen 2012-08-14 11:12:50 +02:00
  • e74d66e28c augmented browsing: remove htmlparser library cominch 2012-08-14 10:09:46 +02:00
  • e2119f4e76 augmented browsing: replace htmlparser by jsoup, which is more stable and reliable cominch 2012-08-14 10:06:12 +02:00
  • ad62609ec7 added a possibility to define a custom network definition URL for remote management cominch 2012-08-13 16:57:53 +02:00
  • fb0f430685 Merge remote-tracking branch 'original yacy/master' cominch 2012-08-13 16:48:14 +02:00
  • 9448d9a8a2 ups Michael Peter Christen 2012-08-13 14:01:45 +02:00
  • e5ef840f40 - renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader Michael Peter Christen 2012-08-13 13:32:32 +02:00
  • 94a334f128 another fix to the Solr metadata reading process and to the shutdown process Michael Peter Christen 2012-08-13 11:13:53 +02:00
  • b51df6c7e8 - added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html Michael Peter Christen 2012-08-13 10:40:04 +02:00
  • da851c6071 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-08-11 01:21:18 +02:00
  • bd4f03bc85 removed unused class Michael Peter Christen 2012-08-11 01:05:40 +02:00
  • 39f8eb60c3 tried to prevent calls to bad-hack getSize() method and reduced overhead of that method a bit. orbiter 2012-08-10 18:10:25 +02:00
  • 9b88433f45 patch from hint in http://forum.yacy-websuche.de/viewtopic.php?p=26858#p26858 from gaston orbiter 2012-08-10 15:44:37 +02:00
  • e816b88b55 changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr. orbiter 2012-08-10 15:39:10 +02:00
  • 2571e0d47a removed unused classes orbiter 2012-08-10 14:47:44 +02:00
  • f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr. Michael Peter Christen 2012-08-10 13:26:51 +02:00
  • b2b480fff2 more abstraction of the YaCySchema -> Opensearch matching process Michael Peter Christen 2012-08-10 09:48:15 +02:00
  • aa0ef98ffa Merge branch 'master' of git://gitorious.org/~chalker/yacy/chalkers-yacy-rc1 Michael Peter Christen 2012-08-10 09:47:15 +02:00
  • 73f6d69d03 more abstraction for solr query params parsing Michael Peter Christen 2012-08-10 07:58:45 +02:00
  • 24462e9baa set the title every time, it is possible that it has changed Michael Peter Christen 2012-08-10 07:51:57 +02:00
  • dcc72799c4 better abstraction for result writers using controlled vocabularies and URIRefs Michael Peter Christen 2012-08-10 07:45:43 +02:00
  • 136fcb1ad9 refactoring Michael Peter Christen 2012-08-10 06:47:13 +02:00
  • a12f693ec9 added two response writers for embedded solr interface: a rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should be slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result from YaCy and also snippets are missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt codes for the new result writers are: wt=rss for opensearch, wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters have been added to the solr interface which can now also be used for a normal solr/xml search. Michael Peter Christen 2012-08-09 18:06:48 +02:00
  • 792ecf2444 Fix an error in Russian translation: "can not" => "can". Сковорода Никита Андреевич 2012-08-08 11:35:45 +04:00
  • bca4a16603 replaced the multivalue generic string field name suffix _ss by _txt because _ss is not part of the standard solr example schema. Michael Peter Christen 2012-08-06 17:58:09 +02:00
  • 67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-08-05 15:49:48 +02:00
  • d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. orbiter 2012-08-05 15:49:27 +02:00
  • 70b10e8316 added the JSON response writer to solr interface, add &wt=json to the servlet GET properties to use this format Michael Peter Christen 2012-08-01 00:14:56 +02:00
  • 3276508d1b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-31 23:49:56 +02:00
  • 3ce04cecf3 bad hack to prevent a bug appearing in solr Michael Peter Christen 2012-07-31 23:49:07 +02:00
  • f32aa9a49c prevent merge of blobs that can't be handled in memory sixcooler 2012-07-31 23:23:16 +02:00
  • bbd242afb4 fix for a NPE Michael Peter Christen 2012-07-30 14:51:01 +02:00
  • 8d944f6517 nowrap from gaston in forum http://forum.yacy-websuche.de/viewtopic.php?p=26815#p26815 Michael Peter Christen 2012-07-30 12:39:47 +02:00
  • 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically. Michael Peter Christen 2012-07-30 10:38:23 +02:00
  • ef488a15f7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-27 12:14:24 +02:00
  • 1687737771 Abstraction of HandleMap and HandleSet Michael Peter Christen 2012-07-27 12:13:53 +02:00
  • 76b037a20a check content domain fix: search image/media should not show pages containing image/media search text should show all/text but image/media sixcooler 2012-07-27 04:11:52 +02:00
  • 9cd409682f close augmented stream if filled from cache to get its content use augmented stream if proxyAugmentation is set only sixcooler 2012-07-26 18:09:40 +02:00
  • e432bb9cd9 better calculation of possible saving in HeapReader index data structure Michael Peter Christen 2012-07-26 10:05:06 +02:00
  • 9549984c65 documentation/comments Michael Peter Christen 2012-07-25 21:34:23 +02:00
  • beb6425f0c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-25 21:18:30 +02:00
  • 83c93e9209 no translation of queue-links sixcooler 2012-07-25 15:35:13 +02:00
  • 3bcd9d622b cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple. Michael Peter Christen 2012-07-25 14:31:54 +02:00
  • 6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup. Michael Peter Christen 2012-07-25 01:53:47 +02:00
  • 315d83cfa0 cleanup Michael Peter Christen 2012-07-24 22:16:56 +02:00
  • 1f41d9c6f5 bugfix for a NPE Michael Peter Christen 2012-07-24 17:29:32 +02:00
  • 76202f068e extended abstraction of local and remote solr index using one front-end for index administration and querying. Michael Peter Christen 2012-07-24 17:23:29 +02:00
  • d3f243e2e1 fixed node type calculation for principal peers Michael Peter Christen 2012-07-23 23:40:50 +02:00
  • 7ec7341f60 added user-authentication protection to solr search (same as implemented for yacysearch) Michael Peter Christen 2012-07-23 21:43:14 +02:00
  • e2a97ef8f6 better explain how to access the embedded solr Michael Peter Christen 2012-07-23 21:31:12 +02:00
  • 826967513b changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts. Michael Peter Christen 2012-07-23 16:28:39 +02:00
  • cba4ab862e fix for http://bugs.yacy.net/view.php?id=202 Michael Peter Christen 2012-07-23 00:36:18 +02:00
  • b76836db7b Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 Michael Peter Christen 2012-07-23 00:35:14 +02:00
  • 36c9875b6e removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid) reger 2012-07-23 00:00:40 +02:00
  • 0640a6f7e6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-22 21:50:44 +02:00
  • 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off orbiter 2012-07-22 13:18:45 +02:00
  • 6cc5d1094e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-07-21 13:34:57 +02:00
  • 05a3ffd03a patches to ensure that solr connectors are active only if they have a solr object assigned and vice versa orbiter 2012-07-20 11:47:50 +02:00
  • 5a3c829872 embedded solr is only initiated if it is activated with IndexFederated_p.html orbiter 2012-07-20 11:40:33 +02:00
  • 161005ceaa Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-20 09:04:14 +02:00
  • bf4968d748 source change in classpath Michael Peter Christen 2012-07-20 09:04:02 +02:00
  • 3a350a2f83 partial html fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454 Lotus 2012-07-20 08:53:12 +02:00
  • 49ee31f837 added classpath for htroot/solr orbiter 2012-07-20 00:59:58 +02:00
  • 97b7bcf2a6 added a solr search index - by default, a (empty) solr storage instance is created at SEGMENTS/solr_36 - the index is written if in /IndexFederated_p.html the flag "embedded solr search index" is switched on - a standard solr query interface is available now with a new servlet at http://127.0.0.1:8090/solr/select Michael Peter Christen 2012-07-19 11:34:05 +02:00
  • f0a079ac9f allow larger log entries Michael Peter Christen 2012-07-14 16:28:14 +02:00
  • 9b48c9fe2e removed a crawler overhead (terminated loop which searches greatest stack that has zero-waiting urls). This should cause a slightly faster crawl for crawl stacks with many different domains in the crawl queue. Michael Peter Christen 2012-07-14 13:11:04 +02:00
  • 784a4abb18 enhancement in internal data organization which should generate less synchronizations in database access Michael Peter Christen 2012-07-14 13:09:44 +02:00
  • f78ce93a80 collection of speed and memory saving hacks Michael Peter Christen 2012-07-13 21:15:38 +02:00
  • c00a3cf74d less usage of generic logger to avoid logger generation overhead orbiter 2012-07-12 19:54:54 +02:00
  • a196f24f60 prevent enqueueing of non-loggeable logging entries orbiter 2012-07-12 19:42:42 +02:00
  • 482afed07c reduced logging overhead (a bit) orbiter 2012-07-12 19:23:40 +02:00
  • e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-07-12 11:14:04 +02:00
  • bbfa497a3c replaced more size() > 0 by !isEmpty() orbiter 2012-07-12 11:12:21 +02:00
  • 58e7d1952f reduction of logging to prevent too much IO caused by logging Michael Peter Christen 2012-07-12 02:08:11 +02:00
  • 83da68c4c1 fixed a memory leak inside the logger which appeared if the log was written faster than the logger is able to print it out to its out stream. A very large collection of unwritten log outputs had been seen during strong crawling. The new ArrayBlockingQueue is limited to prevent this case. Michael Peter Christen 2012-07-12 01:23:04 +02:00
  • e3aa05b9dd added creation of subpath pattern when crawl start is 'from file' Michael Peter Christen 2012-07-11 23:18:57 +02:00
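
Several of the commits above (97b7bcf2a6, 70b10e8316, a12f693ec9, aab0b680c3) describe the embedded Solr query servlet and its response formats. The following is a minimal sketch of how such request URLs could be put together, assuming a local peer on the host and port quoted in those messages. The helper name `solr_select_url` is hypothetical and not part of YaCy; only the parameters `q`, `start`, `rows`, `wt` and `tr` and the `wt` values `json`, `rss`, `exml` and `xslt` are taken from the commit messages.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as quoted in the commit messages; host and port are assumed
# to be those of a local peer and may differ on other installations.
SOLR_SELECT = "http://localhost:8090/solr/select"


def solr_select_url(query, wt="json", start=0, rows=10, **extra):
    """Build a GET URL for the select servlet (hypothetical helper).

    wt selects the response writer named in the commits:
    "json", "rss" (opensearch), "exml" (enhanced solr xml) or "xslt".
    Extra keyword arguments are passed through as query parameters,
    for example tr="json.xsl" together with wt="xslt".
    """
    params = {"q": query, "start": start, "rows": rows, "wt": wt}
    params.update(extra)
    # urlencode percent-encodes reserved characters in the values.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    # RSS/opensearch output for the example query from a12f693ec9.
    print(solr_select_url("olympia", wt="rss"))
    # XSLT-transformed output as suggested in aab0b680c3.
    print(solr_select_url("*:*", wt="xslt", tr="json.xsl"))
    # Decode a generated URL again to inspect its parameters.
    print(parse_qs(urlsplit(solr_select_url("pdf")).query))
```

The sketch only constructs URLs and sends no request. According to 7ec7341f60 the solr search may be protected by user authentication, so fetching these URLs from a real peer can require credentials.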