Commit Graph

  • cf0acd2cb4 upgrade to solr 4.2.1 Michael Peter Christen 2013-04-06 16:11:24 +02:00
  • 40b3f2c5fe comment out dead menu link reger 2013-04-06 02:34:56 +02:00
  • bf1e1ddca1 fix typo in prev commit reger 2013-04-06 02:29:49 +02:00
  • d4d93be779 uncomment "used time" calculation for remote search log reger 2013-04-06 02:08:01 +02:00
  • 36202f27b0 improve remote search log, set "Returned Results" to transmitcount (instead of no value) reger 2013-04-05 03:33:33 +02:00
  • e89491271f - fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down) reger 2013-04-04 00:40:59 +02:00
  • 6a9d0b60a3 make sure configured port is reported on recreated mySeed.txt reger 2013-04-01 03:51:57 +02:00
  • 254074b11d Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-03-22 03:46:26 +01:00
  • 870aedf3c6 fixes for better search interface integration in yaml templates Michael Peter Christen 2013-03-20 16:19:49 +01:00
  • 735eb70525 better search timing; prevents '0 results' for very large local indexes (>> 10 million documents) Michael Peter Christen 2013-03-19 11:23:18 +01:00
  • 5512be6673 fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates for this confusion. Michael Peter Christen 2013-03-19 10:33:35 +01:00
  • 342ba1049b - callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1, because this leads to a very long execution time and in the end to the same OOM we would get by allocating the memory at the moment we need it, even when the resource observer states that this memory is not available. To compensate, the increase size is reduced. Michael Peter Christen 2013-03-19 10:32:01 +01:00
  • 65d73e5652 renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery orbiter 2013-03-19 00:59:47 +01:00
  • 31d16f20d7 fix invisible icon not found reger 2013-03-18 00:10:23 +01:00
  • 17ae51e741 increased the limit on the number of links from 1000 to 10000 for rss feeds and html documents orbiter 2013-03-17 22:13:56 +01:00
  • 243b66ae6d Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy orbiter 2013-03-17 13:39:31 +01:00
  • 7763f2554f add the new PPMbar in Crawler_p for a better style and better use. Frank 2013-03-17 11:43:12 +01:00
  • e4d26d1cb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-03-17 10:52:42 +01:00
  • 940c6849ee enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements) orbiter 2013-03-17 10:52:31 +01:00
  • d57b221921 add: reset Solr schema field selection to default button in IndexSchema_p reger 2013-03-17 03:46:29 +01:00
  • a725a4242f main release 1.4 Release_1.4 Michael Peter Christen 2013-03-15 10:25:47 +01:00
  • 9406a2e438 fixed NPE during index abstract computation Michael Peter Christen 2013-03-15 10:04:27 +01:00
  • 16e9d4d1dd added a restart hint Michael Peter Christen 2013-03-15 10:00:06 +01:00
  • d725782440 turned severe message to warning message about network failure events Michael Peter Christen 2013-03-15 09:40:02 +01:00
  • b3a54d5b1c fix for wrong class name in log Michael Peter Christen 2013-03-15 09:35:57 +01:00
  • 2d36a7eaf5 - do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries Michael Peter Christen 2013-03-15 00:14:28 +01:00
  • 4af0839be2 use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used Michael Peter Christen 2013-03-14 21:13:12 +01:00
  • b8ed66a55d added all clickdepth computations for source and target paths in webstructure core Michael Peter Christen 2013-03-14 17:54:33 +01:00
  • 6300730d7f refactoring of clickdepth computation as preparation for clickdepth computation of webgraph links Michael Peter Christen 2013-03-14 12:13:02 +01:00
  • 2080fc7406 removed unused tag fields Michael Peter Christen 2013-03-14 10:35:21 +01:00
  • 7804c12976 fix error msg in ConfigHeuristics_p reger 2013-03-14 03:30:25 +01:00
  • 230a12bfe2 adjust Opensearch discover function to new webgraph Solr schema reger 2013-03-14 03:10:54 +01:00
  • 6b13dd0d3d added clickdepth field writing for webgraph core (unfinished) orbiter 2013-03-14 01:35:38 +01:00
  • 47114910d5 fix for possible memory leaks orbiter 2013-03-13 17:55:37 +01:00
  • addba047e2 changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking Michael Peter Christen 2013-03-13 14:47:00 +01:00
  • 38f46eb33d set RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries) reger 2013-03-12 03:13:14 +01:00
  • 2962f2b9e9 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-03-12 02:51:17 +01:00
  • ab74d559fb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-03-11 18:23:43 +01:00
  • 4490133909 removed target_tag_s (superfluous) Michael Peter Christen 2013-03-11 10:46:29 +01:00
  • cd197bb555 fix for NPE if surrogates do not exist orbiter 2013-03-10 19:46:06 +01:00
  • 6ae30f9d0f replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter reger 2013-03-10 05:22:18 +01:00
  • 68e739a90b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-03-10 02:29:38 +01:00
  • 3d9ce9cd04 - added more selection criteria for network seed list - enhanced up script Michael Peter Christen 2013-03-10 02:26:24 +01:00
  • 168e8d9b4d added/fixed missing DOCTYPE line (submitted by Thomas) orbiter 2013-03-08 14:40:09 +01:00
  • 252bb51f98 fix for wrong mime type in noload crawler Michael Peter Christen 2013-03-07 15:31:00 +01:00
  • 25300913fa fixes to search debugging after testing with the different search debugging options Michael Peter Christen 2013-03-05 21:28:22 +01:00
  • 81380ae5c8 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-03-05 12:24:10 +01:00
  • c2fde018b5 concurrent snippet fetching from solr results which do not have snippets Michael Peter Christen 2013-03-05 12:24:01 +01:00
  • b1140e3d82 added debug switches for detailed search testing orbiter 2013-03-05 12:19:32 +01:00
  • cdbfddf091 added filter queries for better image, audio and video results orbiter 2013-03-04 21:18:54 +01:00
  • 587ef83eab added missing cleanup statements for short memory cases during search Michael Peter Christen 2013-03-04 13:01:24 +01:00
  • 2562f052b9 do not put the fulltext field text_t into the search cache because it is not used there and uses a lot of memory orbiter 2013-03-04 12:01:10 +01:00
  • 2b6c79d347 in method exists() also use the new caching-stacks for documents/metadata Michael Peter Christen 2013-03-04 01:13:17 +01:00
  • ae734b3f8d enhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again) Michael Peter Christen 2013-03-04 00:17:29 +01:00
  • 2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed search cache to prevent too much IO load on the peer during search. Michael Peter Christen 2013-03-04 00:07:52 +01:00
  • 0d7b4bc891 better protection against OOM during search flush and fixed missing result push Michael Peter Christen 2013-03-03 23:45:47 +01:00
  • 221ed7d764 - enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters Michael Peter Christen 2013-03-03 22:38:50 +01:00
  • 2714b59f38 *) For some reason this seems to fix a ClassCastException on my system (OpenJDK). Marc Nause 2013-03-03 20:38:20 +01:00
  • 3b1d9dc884 made index storage of DHT search results concurrent. This prevents blocking by high CPU usage during search. Also: removed query from Solr for DHT search results; results are taken from the pending queue. Michael Peter Christen 2013-03-02 10:25:52 +01:00
  • f13c0b2abd fix for search orbiter 2013-03-01 19:18:16 +01:00
  • 0f7ea7ad9f - enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process removal) - reduced the number of remote results in settings file because the processing of such mass document adds is too CPU-intensive (in Solr) orbiter 2013-03-01 15:27:17 +01:00
  • 7ff10bdb1b fix of page navigation for formatted totalcount numbers orbiter 2013-03-01 00:48:28 +01:00
  • 08d28eed1a translated 'Domain Navigator' as 'Anbieter Navigator' (provider navigator); its purpose is easier to explain that way orbiter 2013-02-28 23:55:46 +01:00
  • f327ffedb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-28 15:55:13 +01:00
  • 9c09fd7d0b better/fewer requests to local solr; each request is made in chunks of exactly the size needed to present the current search result page. As a consequence, the next solr requests are made automatically when switching to the next pages. orbiter 2013-02-28 14:04:08 +01:00
  • 840fa22135 disabled clickdepth computation during crawling since that is repeated during clean-up phase. Michael Peter Christen 2013-02-28 02:25:39 +01:00
  • a734fbc4a5 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-02-27 22:44:57 +01:00
  • d74472f562 corrected result counter orbiter 2013-02-27 22:40:23 +01:00
  • 2555542f7a removed the dns prefetch because that was not so useful orbiter 2013-02-27 20:58:34 +01:00
  • aa3c26c62e added recrawl/reload to CrawlStartSite for a timeout of 3 days orbiter 2013-02-27 11:43:36 +01:00
  • c1b7e61882 added option to create empty vocabularies orbiter 2013-02-27 08:24:37 +01:00
  • e0edad689d fix link to IndexSchema_p.html bubu 2013-02-26 21:12:44 +01:00
  • d957739441 removed size request Michael Peter Christen 2013-02-26 17:53:44 +01:00
  • c95a84103a complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter Michael Peter Christen 2013-02-26 17:16:31 +01:00
  • 35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js Michael Peter Christen 2013-02-25 14:31:50 +01:00
  • 008288719c fix for schema export to consider also automatically generated coordinate fields Michael Peter Christen 2013-02-25 01:13:03 +01:00
  • 089dee1770 - generalized SchemaConfiguration into super-class Configuration and adapted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging Michael Peter Christen 2013-02-25 00:09:41 +01:00
  • c16de49f64 fix for webgraph delete query Michael Peter Christen 2013-02-24 18:17:58 +01:00
  • 56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menus - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again. Michael Peter Christen 2013-02-24 18:09:34 +01:00
  • 461d46101d - Removed log4j from libraries. This can be removed because the package log4j-over-slf4j is there. From slf4j all loggings are routed to the jdk logger. Now all loggings are consistently done to the jdk logger. - added some lines to the logging properties to suppress many solr logging statements. The number of the logging entries had already become a performance issue, therefore removing these from the log should increase performance. Michael Peter Christen 2013-02-23 16:45:05 +01:00
  • b349c8145b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-23 15:55:21 +01:00
  • 253a7aee88 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-02-23 14:33:29 +01:00
  • 36f9b0fc16 updated wstx-asl to 3.2.9 orbiter 2013-02-23 14:33:17 +01:00
  • 14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-23 08:48:33 +01:00
  • 58e1e6fa2b fixes to schema Michael Peter Christen 2013-02-23 08:14:10 +01:00
  • f291d60c5f on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index reger 2013-02-22 22:17:45 +01:00
  • d31a109efe remove obsolete Solr "commit within" input field from IndexFederated see 4111606654 reger 2013-02-22 22:03:32 +01:00
  • 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph Michael Peter Christen 2013-02-22 15:45:15 +01:00
  • 89ede0fe84 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-21 13:24:10 +01:00
  • 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding on top of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema Michael Peter Christen 2013-02-21 13:23:55 +01:00
  • 1951ba61ae remove CPGEN from Windows batch files (classpath for all needed libraries is defined in manifest of yacycore.jar) reger 2013-02-17 03:26:46 +01:00
  • 594ed63f2a fixed interactive search which caused an error if pubDate is not present in a search result orbiter 2013-02-16 20:33:27 +01:00
  • 33bc255e85 prevent crawl starts with very large url lists from causing a time-out in the user front-end Michael Peter Christen 2013-02-15 01:58:28 +01:00
  • 98a4a4aa97 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-15 01:38:23 +01:00
  • b6de1f42dc Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for every solr core. This caused that the IndexFederated servlet had to be split into two parts, the new Servlet for the Schema editor is now in the IndexSchema Servlet. Michael Peter Christen 2013-02-15 01:38:10 +01:00
  • efb6cf7d21 Merge branch 'master' of git@gitorious.org:yacy/rc1.git Marc Nause 2013-02-13 19:31:12 +01:00
  • ce5b7afab2 *) removed Skype online indicator (was not working anymore) *) updated ICQ URLs Marc Nause 2013-02-13 19:29:40 +01:00
  • 4111606654 removed the commitWithin attribute because that is not the right way for us to update the index. It may also be superfluous with the solr 4.0 softcommit. Michael Peter Christen 2013-02-13 02:29:47 +01:00
  • c20fa3640d fix to unbalanced tag and license for null objects Michael Peter Christen 2013-02-13 01:23:05 +01:00
  • 3a6097966d added jsonp option to yjson result writer Michael Peter Christen 2013-02-13 01:11:57 +01:00
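Commits 3a6097966d and 65d73e5652 add a JSONP option to the yjson result writer and name the callback parameter 'callback', following the jQuery/backbone.js convention. The sketch below is a hedged illustration of what JSONP output amounts to, not YaCy's actual writer: the class JsonpSketch and its render method are hypothetical, and the identifier check on the callback name is an added design choice, not something the commits describe.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of JSONP output; class and method names are illustrative, not YaCy's API. */
public final class JsonpSketch {

    // Accept only plain (optionally dotted) JavaScript identifiers as callback names,
    // so a request cannot inject arbitrary script through the parameter.
    private static final Pattern SAFE_CALLBACK = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$.]*");

    /** Returns callback(json); if no valid 'callback' parameter is present, returns plain JSON. */
    public static String render(Map<String, String> params, String json) {
        String callback = params.get("callback");
        if (callback == null || !SAFE_CALLBACK.matcher(callback).matches()) {
            return json;
        }
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        System.out.println(render(Map.of("callback", "handle"), "{\"hits\":3}")); // handle({"hits":3});
        System.out.println(render(Map.of(), "{\"hits\":3}"));                    // {"hits":3}
    }
}
```

Falling back to plain JSON when the parameter is missing keeps one endpoint usable both by ordinary JSON clients and by cross-domain script-tag callers.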
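Commit 5512be6673 notes that after the migration to Solr 4.1.0 some result context fields arrive as String[] instead of String. A minimal sketch of compensating for that, assuming the value is obtained as a plain Object (SolrJ's SolrDocument.getFieldValue returns either a single value or a collection of values): the helper FieldValues.firstString is hypothetical and only shows one way to normalize such values, not the fix actually made in the GSA result writer.

```java
import java.util.Collection;
import java.util.Iterator;

/** Hypothetical helper; the name is illustrative, not part of YaCy or SolrJ. */
public final class FieldValues {

    /** Returns the first value of a stored field as a String, whether it arrives single- or multi-valued. */
    public static String firstString(Object value) {
        if (value == null) return null;
        if (value instanceof String) return (String) value;
        if (value instanceof String[]) {
            String[] values = (String[]) value;
            return values.length == 0 ? null : values[0];
        }
        if (value instanceof Collection) {
            Iterator<?> it = ((Collection<?>) value).iterator();
            if (!it.hasNext()) return null;
            Object first = it.next();
            return first == null ? null : first.toString();
        }
        return value.toString(); // any other single value, e.g. a number
    }

    public static void main(String[] args) {
        System.out.println(firstString("title"));                            // title
        System.out.println(firstString(new String[] {"title", "subtitle"})); // title
        System.out.println(firstString(java.util.List.of("title")));         // title
    }
}
```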
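Commit 9c09fd7d0b requests local Solr results in chunks sized exactly to the current result page. A hedged sketch of that idea, expressed through Solr's standard 'start' and 'rows' pagination parameters: the class PageChunks, the 0-based page numbering, and the assumption that results 0..fetched-1 are already held locally are illustrative, not taken from YaCy's code.

```java
/** Hypothetical sketch of page-sized Solr requests; names are illustrative, not YaCy's API. */
public final class PageChunks {

    /** A request window expressed as Solr's 'start' and 'rows' parameters. */
    public static final class Chunk {
        public final int start;
        public final int rows;

        Chunk(int start, int rows) {
            this.start = start;
            this.rows = rows;
        }

        @Override
        public String toString() {
            return "start=" + start + "&rows=" + rows;
        }
    }

    /**
     * Returns the window still missing for the given 0-based page, assuming
     * results 0..fetched-1 were already retrieved by earlier requests.
     */
    public static Chunk missingFor(int page, int itemsPerPage, int fetched) {
        int pageStart = page * itemsPerPage;
        int pageEnd = pageStart + itemsPerPage;
        int start = Math.max(fetched, pageStart); // skip what is already held locally
        return new Chunk(start, Math.max(0, pageEnd - start));
    }

    public static void main(String[] args) {
        System.out.println(missingFor(0, 10, 0));  // start=0&rows=10  (first page)
        System.out.println(missingFor(1, 10, 10)); // start=10&rows=10 (switching to page 2)
        System.out.println(missingFor(2, 10, 25)); // start=25&rows=5  (page 3 partly fetched)
    }
}
```

Requesting only the missing window means that switching to the next page naturally triggers the next small Solr request instead of one large up-front query.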