36202f27b0improve remote search log, set "Returned Results" to transmitcount (instead of no value)
reger
2013-04-05 03:33:33 +02:00
e89491271f- fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down)
reger
2013-04-04 00:40:59 +02:00
6a9d0b60a3make sure configured port is reported on recreated mySeed.txt
reger
2013-04-01 03:51:57 +02:00
254074b11dMerge branch 'master' of git://gitorious.org/yacy/rc1.git
reger
2013-03-22 03:46:26 +01:00
870aedf3c6fixes for better search interface integration in yaml templates
Michael Peter Christen
2013-03-20 16:19:49 +01:00
735eb70525better search timing; prevents '0 results' for very large local indexes >> 10 mio documents
Michael Peter Christen
2013-03-19 11:23:18 +01:00
5512be6673fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates this confusion.
Michael Peter Christen
2013-03-19 10:33:35 +01:00
342ba1049b- callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1 because this leads to very long execution time and at the end to the same OOM as if we allocate the memory at the moment we need it even if the resource observer states that this memory is not there. To compensate for this, the increase size is reduced.
Michael Peter Christen
2013-03-19 10:32:01 +01:00
65d73e5652renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery
orbiter
2013-03-19 00:59:47 +01:00
31d16f20d7fix invisible icon not found
reger
2013-03-18 00:10:23 +01:00
17ae51e741increased number of links limitation from 1000 to 10000 for rss feeds and html documents
orbiter
2013-03-17 22:13:56 +01:00
243b66ae6dMerge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy
orbiter
2013-03-17 13:39:31 +01:00
7763f2554fadd the new PPMbar in Crawler_p for a better style and better use.
Frank
2013-03-17 11:43:12 +01:00
e4d26d1cb4Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
orbiter
2013-03-17 10:52:42 +01:00
940c6849eeenhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)
orbiter
2013-03-17 10:52:31 +01:00
d57b221921add: reset Solr schema field selection to default button in IndexSchema_p
reger
2013-03-17 03:46:29 +01:00
9406a2e438fixed NPE during index abstract computation
Michael Peter Christen
2013-03-15 10:04:27 +01:00
16e9d4d1ddadded a restart hint
Michael Peter Christen
2013-03-15 10:00:06 +01:00
d725782440turned severe message to warning message about network failure events
Michael Peter Christen
2013-03-15 09:40:02 +01:00
b3a54d5b1cfix for wrong class name in log
Michael Peter Christen
2013-03-15 09:35:57 +01:00
2d36a7eaf5- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries
Michael Peter Christen
2013-03-15 00:14:28 +01:00
4af0839be2use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used
Michael Peter Christen
2013-03-14 21:13:12 +01:00
b8ed66a55dadded all clickdepth computations for source and target paths in webstructure core
Michael Peter Christen
2013-03-14 17:54:33 +01:00
6300730d7frefactoring of clickdepth computation as preparation for clickdepth computation of webgraph links
Michael Peter Christen
2013-03-14 12:13:02 +01:00
2080fc7406removed unused tag fields
Michael Peter Christen
2013-03-14 10:35:21 +01:00
7804c12976fix error msg in ConfigHeuristics_p
reger
2013-03-14 03:30:25 +01:00
230a12bfe2adjust Opensearch discover function to new webgraph Solr schema
reger
2013-03-14 03:10:54 +01:00
6b13dd0d3dadded clickdepth field writing for webgraph core (unfinished)
orbiter
2013-03-14 01:35:38 +01:00
47114910d5fix for possible memory leaks
orbiter
2013-03-13 17:55:37 +01:00
addba047e2changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking
Michael Peter Christen
2013-03-13 14:47:00 +01:00
38f46eb33dset RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries)
reger
2013-03-12 03:13:14 +01:00
2962f2b9e9Merge branch 'master' of git://gitorious.org/yacy/rc1.git
reger
2013-03-12 02:51:17 +01:00
ab74d559fbMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
orbiter
2013-03-11 18:23:43 +01:00
4490133909removed target_tag_s (superfluous)
Michael Peter Christen
2013-03-11 10:46:29 +01:00
cd197bb555fix for NPE if surrogates do not exist
orbiter
2013-03-10 19:46:06 +01:00
6ae30f9d0freplace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter
reger
2013-03-10 05:22:18 +01:00
68e739a90bMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-03-10 02:29:38 +01:00
3d9ce9cd04- added more selection criteria for network seed list - enhanced up script
Michael Peter Christen
2013-03-10 02:26:24 +01:00
168e8d9b4dadded/fixed missing DOCTYPE line (submitted by Thomas)
orbiter
2013-03-08 14:40:09 +01:00
252bb51f98fix for wrong mime type in noload crawler
Michael Peter Christen
2013-03-07 15:31:00 +01:00
25300913fafixes to search debugging after testing with the different search debugging options
Michael Peter Christen
2013-03-05 21:28:22 +01:00
81380ae5c8Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-03-05 12:24:10 +01:00
c2fde018b5concurrent snippet fetching from solr results which do not have snippets
Michael Peter Christen
2013-03-05 12:24:01 +01:00
cdbfddf091added filter queries for better image, audio and video results
orbiter
2013-03-04 21:18:54 +01:00
587ef83eabadded missing cleanup statements for short memory cases during search
Michael Peter Christen
2013-03-04 13:01:24 +01:00
2562f052b9do not put the fulltext field text_t into the search cache because it is not used there and uses a lot of memory
orbiter
2013-03-04 12:01:10 +01:00
2b6c79d347in method exists() also use the new caching-stacks for documents/metadata
Michael Peter Christen
2013-03-04 01:13:17 +01:00
ae734b3f8denhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again)
Michael Peter Christen
2013-03-04 00:17:29 +01:00
2d472a39f4DHT-transferred metadata and crawl receipts now also use the delayed search cache to prevent that too much IO load is on the peer during search.
Michael Peter Christen
2013-03-04 00:07:52 +01:00
0d7b4bc891better protection against OOM during search flush and fixed missing result push
Michael Peter Christen
2013-03-03 23:45:47 +01:00
221ed7d764- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters
Michael Peter Christen
2013-03-03 22:38:50 +01:00
2714b59f38*) For some reason this seems to fix a ClassCastException on my system (OpenJDK).
Marc Nause
2013-03-03 20:38:20 +01:00
3b1d9dc884made index storage from DHT search results concurrent. This prevents blocking by high CPU usage during search. Also: removed query from Solr for DHT search results; results are taken from the pending queue.
Michael Peter Christen
2013-03-02 10:25:52 +01:00
f13c0b2abdfix for search
orbiter
2013-03-01 19:18:16 +01:00
0f7ea7ad9f- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process removal) - reduced the number of remote results in settings file because the processing of such mass document adds is too CPU-intensive (in Solr)
orbiter
2013-03-01 15:27:17 +01:00
7ff10bdb1bfix of page navigation for formatted totalcount numbers
orbiter
2013-03-01 00:48:28 +01:00
08d28eed1aGerman translation of the Domain Navigator as 'Anbieter Navigator' (provider navigator); its benefit is easier to explain under this name
orbiter
2013-02-28 23:55:46 +01:00
f327ffedb4Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-28 15:55:13 +01:00
9c09fd7d0bbetter/fewer requests to local solr; the request is made in chunks which are exactly the size needed to present the current search result page. This also causes the next solr requests to be made automatically when switching to the next pages.
orbiter
2013-02-28 14:04:08 +01:00
840fa22135disabled clickdepth computation during crawling since that is repeated during clean-up phase.
Michael Peter Christen
2013-02-28 02:25:39 +01:00
a734fbc4a5Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
orbiter
2013-02-27 22:44:57 +01:00
d74472f562corrected result counter
orbiter
2013-02-27 22:40:23 +01:00
2555542f7aremoved the dns prefetch because that was not so useful
orbiter
2013-02-27 20:58:34 +01:00
aa3c26c62eadded recrawl/reload to CrawlStartSite for a timeout of 3 days
orbiter
2013-02-27 11:43:36 +01:00
c1b7e61882added option to create empty vocabularies
orbiter
2013-02-27 08:24:37 +01:00
e0edad689dfix link to IndexSchema_p.html
bubu
2013-02-26 21:12:44 +01:00
d957739441removed size request
Michael Peter Christen
2013-02-26 17:53:44 +01:00
c95a84103acomplete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter
Michael Peter Christen
2013-02-26 17:16:31 +01:00
35fa718b77testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js
Michael Peter Christen
2013-02-25 14:31:50 +01:00
008288719cfix for schema export to consider also automatically generated coordinate fields
Michael Peter Christen
2013-02-25 01:13:03 +01:00
089dee1770- generalized SchemaConfiguration into super-class Configuration and adopted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging
Michael Peter Christen
2013-02-25 00:09:41 +01:00
c16de49f64fix for webgraph delete query
Michael Peter Christen
2013-02-24 18:17:58 +01:00
56d5946a59- added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menus - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again.
Michael Peter Christen
2013-02-24 18:09:34 +01:00
461d46101d- Removed log4j from libraries. This can be removed because the package log4j-over-slf4j is there. From slf4j all loggings are routed to the jdk logger. Now all loggings are consistently done to the jdk logger. - added some lines to the logging properties to suppress many solr logging statements. The number of the logging entries had already become a performance issue, therefore removing these from the log should increase performance.
Michael Peter Christen
2013-02-23 16:45:05 +01:00
b349c8145bMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-23 15:55:21 +01:00
253a7aee88Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
orbiter
2013-02-23 14:33:29 +01:00
36f9b0fc16updated wstx-asl to 3.2.9
orbiter
2013-02-23 14:33:17 +01:00
14cceb6b17Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-23 08:48:33 +01:00
58e1e6fa2bfixes to schema
Michael Peter Christen
2013-02-23 08:14:10 +01:00
f291d60c5fon remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index
reger
2013-02-22 22:17:45 +01:00
d31a109eferemove obsolete Solr "commit within" input field from IndexFederated see 4111606654
reger
2013-02-22 22:03:32 +01:00
788288eb9eadded the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index now has the following properties: - webgraph size will have about 40 times as many entries as the default index - the complete index size will increase and may be about double the current size As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will allow some old parts of YaCy to be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph
Michael Peter Christen
2013-02-22 15:45:15 +01:00
89ede0fe84Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-21 13:24:10 +01:00
91a0401d59introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented in the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding on top of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema
Michael Peter Christen
2013-02-21 13:23:55 +01:00
1951ba61aeremove CPGEN from Windows batch files (classpath for all needed libraries is defined in manifest of yacycore.jar)
reger
2013-02-17 03:26:46 +01:00
594ed63f2afixed interactive search which caused an error if pubDate is not present in a search result
orbiter
2013-02-16 20:33:27 +01:00
33bc255e85prevent that crawl starts with very large url lists cause a time-out in the user front-end
Michael Peter Christen
2013-02-15 01:58:28 +01:00
98a4a4aa97Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen
2013-02-15 01:38:23 +01:00
b6de1f42dcFull redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for every solr core. As a result, the IndexFederated servlet had to be split into two parts; the new Servlet for the Schema editor is now in the IndexSchema Servlet.
Michael Peter Christen
2013-02-15 01:38:10 +01:00
efb6cf7d21Merge branch 'master' of git@gitorious.org:yacy/rc1.git
Marc Nause
2013-02-13 19:31:12 +01:00
ce5b7afab2*) removed Skype online indicator (was not working anymore) *) updated ICQ URLs
Marc Nause
2013-02-13 19:29:40 +01:00
4111606654removed the commitWithin attribute because that is not the right way for us to update the index. May also be superfluous with the solr 4.0 softcommit.
Michael Peter Christen
2013-02-13 02:29:47 +01:00
c20fa3640dfix to unbalanced tag and license for null objects
Michael Peter Christen
2013-02-13 01:23:05 +01:00
3a6097966dadded jsonp option to yjson result writer
Michael Peter Christen
2013-02-13 01:11:57 +01:00