Commit Graph

  • 5d442dad82 avoid NPE in regex checker orbiter 2013-04-20 10:53:49 +02:00
  • 24bcf54100 Merge branch 'master' of git://gitorious.org/~saranshupscale/yacy/yacy-india-rc1 Michael Peter Christen 2013-04-19 09:55:33 +02:00
  • b31793f5d6 Hello world Saransh Sharma 2013-04-19 13:12:23 +05:30
  • 50421171c3 added new schema fields: Michael Peter Christen 2013-04-18 17:21:17 +02:00
  • 566d6c980c checking of document signature for a double-document check now refers only to documents within the same domain Michael Peter Christen 2013-04-17 16:15:27 +02:00
  • 1d30082446 added hindi translation configuration Michael Peter Christen 2013-04-17 12:57:27 +02:00
  • ee9d50e4b8 Hindi Some parts only Saransh Sharma 2013-04-17 14:41:55 +05:30
  • d05dc07cff setting of new default values for ranking Michael Peter Christen 2013-04-16 15:02:00 +02:00
  • 97775fbebc fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/ Michael Peter Christen 2013-04-16 14:45:14 +02:00
  • ac5fa9fe48 fix for result counter logging Michael Peter Christen 2013-04-16 13:32:13 +02:00
  • 298bf2deb5 fix to ranking configuration servlet Michael Peter Christen 2013-04-16 12:38:16 +02:00
  • 2db058b551 added in RankingSolr_p.html a select box to switch between different ranking situations. By default, four situations can be configured. Michael Peter Christen 2013-04-16 11:38:51 +02:00
  • 6fbca35215 fixed api table navigation Michael Peter Christen 2013-04-16 01:39:30 +02:00
  • 7ab5093321 added new solr title_exact_signature_l and description_exact_signature_l to be able to identify unique title and unique description fields. Michael Peter Christen 2013-04-16 01:35:15 +02:00
  • f24ac518e6 redesign of exists()-query (can now be called with query) and the CachedSolrConnector which based its cache on the key value. This will be used to correct the title_unique_b and description_unique_b field. Michael Peter Christen 2013-04-15 14:08:30 +02:00
  • 27d6222880 added new field host_extent_i which, after a crawl and postprocessing, holds the number of documents for the host where the document is hosted. This is necessary for ranking and the norming of references per local host in the ranking computation. Michael Peter Christen 2013-04-14 20:52:40 +02:00
  • 579eb01a49 showing now the details of references count in host browser: external (ext), internal (int) and external hosts (hosts) for each indexed document. Michael Peter Christen 2013-04-14 11:30:57 +02:00
  • 0f4237d8e5 add admin option to delete load errors from index reger 2013-04-14 05:33:01 +02:00
  • 518b20147c skip postprocessing during document.store if no citation index connected (prevent null pointer exception) reger 2013-04-14 02:01:27 +02:00
  • ac478384d3 *) did some long overdue refactoring Marc Nause 2013-04-13 23:04:44 +02:00
  • e99c8789ff *) fixed encoding of query in link to map (in case geolocalization is enabled, "Show search results for "köln" on map") *) applied suggestions of Checkstyle plugin Marc Nause 2013-04-13 21:50:48 +02:00
  • ada3f27de7 added three new fields for a better ranking: references_internal_i, references_external_i and references_exthosts_i. These can be used to count and evaluate the number of external links to every web page. An experimental ranking function could be, e.g.: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1)) Michael Peter Christen 2013-04-12 16:17:14 +02:00
  • 082e3274d6 - setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if no GSA-style sort attribute was given - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not a scenario-adapted one (site, date), because that should be configurable in the web interface before it is actually used for ranking. Michael Peter Christen 2013-04-12 10:48:41 +02:00
  • a20941c067 resume paused crawls on startup; user expects that restarts 'heal' everything Michael Peter Christen 2013-04-11 15:07:08 +02:00
  • edc0b33f6d - showing references count and clickdepth in host browser - fixed generation and presentation of both values Michael Peter Christen 2013-04-11 14:46:13 +02:00
  • 2c3b024196 if the crawl was paused (automatically), show the reason for pausing in the Crawler_p servlet. orbiter 2013-04-09 18:55:26 +02:00
  • 566a3b0294 fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set reger 2013-04-08 21:25:21 +02:00
  • 989575b447 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-04-07 20:20:09 +02:00
  • 27907c9739 added missing library after solr upgrade Michael Peter Christen 2013-04-07 10:36:05 +02:00
  • f37b4c984c adjust Netbeans IDE project.xml classpath for Solr 4.2.1 jars reger 2013-04-06 23:00:48 +02:00
  • c6c01a3ca2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-04-06 16:11:33 +02:00
  • cf0acd2cb4 upgrade to solr 4.2.1 Michael Peter Christen 2013-04-06 16:11:24 +02:00
  • 40b3f2c5fe comment out dead menu link reger 2013-04-06 02:34:56 +02:00
  • bf1e1ddca1 fix typo in prev commit reger 2013-04-06 02:29:49 +02:00
  • d4d93be779 uncomment "used time" calculation for remote search log reger 2013-04-06 02:08:01 +02:00
  • 36202f27b0 improve remote search log, set "Returned Results" to transmitcount (instead of no value) reger 2013-04-05 03:33:33 +02:00
  • e89491271f - fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down) reger 2013-04-04 00:40:59 +02:00
  • 6a9d0b60a3 make sure configured port is reported on recreated mySeed.txt reger 2013-04-01 03:51:57 +02:00
  • 254074b11d Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-03-22 03:46:26 +01:00
  • 870aedf3c6 fixes for better search interface integration in yaml templates Michael Peter Christen 2013-03-20 16:19:49 +01:00
  • 735eb70525 better search timing; prevents '0 results' for very large local indexes >> 10 mio documents Michael Peter Christen 2013-03-19 11:23:18 +01:00
  • 5512be6673 fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates this confusion. Michael Peter Christen 2013-03-19 10:33:35 +01:00
  • 342ba1049b - callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1 because this leads to very long execution time and in the end to the same OOM as if we allocate the memory at the moment we need it, even if the resource observer states that this memory is not there. To compensate for this, the increase size is reduced. Michael Peter Christen 2013-03-19 10:32:01 +01:00
  • 65d73e5652 renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery orbiter 2013-03-19 00:59:47 +01:00
  • 31d16f20d7 fix invisible icon not found reger 2013-03-18 00:10:23 +01:00
  • 17ae51e741 increased number of links limitation from 1000 to 10000 for rss feeds and html documents orbiter 2013-03-17 22:13:56 +01:00
  • 243b66ae6d Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy orbiter 2013-03-17 13:39:31 +01:00
  • 7763f2554f add the new PPMbar in Crawler_p for a better style and better use. Frank 2013-03-17 11:43:12 +01:00
  • e4d26d1cb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-03-17 10:52:42 +01:00
  • 940c6849ee enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements) orbiter 2013-03-17 10:52:31 +01:00
  • d57b221921 add: button in IndexSchema_p to reset the Solr schema field selection to default reger 2013-03-17 03:46:29 +01:00
  • a725a4242f main release 1.4 Release_1.4 Michael Peter Christen 2013-03-15 10:25:47 +01:00
  • 9406a2e438 fixed NPE during index abstract computation Michael Peter Christen 2013-03-15 10:04:27 +01:00
  • 16e9d4d1dd added a restart hint Michael Peter Christen 2013-03-15 10:00:06 +01:00
  • d725782440 turned severe message to warning message about network failure events Michael Peter Christen 2013-03-15 09:40:02 +01:00
  • b3a54d5b1c fix for wrong class name in log Michael Peter Christen 2013-03-15 09:35:57 +01:00
  • 2d36a7eaf5 - do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries Michael Peter Christen 2013-03-15 00:14:28 +01:00
  • 4af0839be2 use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used Michael Peter Christen 2013-03-14 21:13:12 +01:00
  • b8ed66a55d added all clickdepth computations for source and target paths in webstructure core Michael Peter Christen 2013-03-14 17:54:33 +01:00
  • 6300730d7f refactoring of clickdepth computation as preparation for clickdepth computation of webgraph links Michael Peter Christen 2013-03-14 12:13:02 +01:00
  • 2080fc7406 removed unused tag fields Michael Peter Christen 2013-03-14 10:35:21 +01:00
  • 7804c12976 fix error msg in ConfigHeuristics_p reger 2013-03-14 03:30:25 +01:00
  • 230a12bfe2 adjust Opensearch discover function to new webgraph Solr schema reger 2013-03-14 03:10:54 +01:00
  • 6b13dd0d3d added clickdepth field writing for webgraph core (unfinished) orbiter 2013-03-14 01:35:38 +01:00
  • 47114910d5 fix for possible memory leaks orbiter 2013-03-13 17:55:37 +01:00
  • addba047e2 changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking Michael Peter Christen 2013-03-13 14:47:00 +01:00
  • 38f46eb33d set RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries) reger 2013-03-12 03:13:14 +01:00
  • 2962f2b9e9 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-03-12 02:51:17 +01:00
  • ab74d559fb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-03-11 18:23:43 +01:00
  • 4490133909 removed target_tag_s (superfluous) Michael Peter Christen 2013-03-11 10:46:29 +01:00
  • cd197bb555 fix for NPE if surrogates do not exist orbiter 2013-03-10 19:46:06 +01:00
  • 6ae30f9d0f replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter reger 2013-03-10 05:22:18 +01:00
  • 68e739a90b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-03-10 02:29:38 +01:00
  • 3d9ce9cd04 - added more selection criteria for network seed list - enhanced up script Michael Peter Christen 2013-03-10 02:26:24 +01:00
  • 168e8d9b4d added/fixed missing DOCTYPE line (submitted by Thomas) orbiter 2013-03-08 14:40:09 +01:00
  • 252bb51f98 fix for wrong mime type in noload crawler Michael Peter Christen 2013-03-07 15:31:00 +01:00
  • 25300913fa fixes to search debugging after testing with the different search debugging options Michael Peter Christen 2013-03-05 21:28:22 +01:00
  • 81380ae5c8 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-03-05 12:24:10 +01:00
  • c2fde018b5 concurrent snippet fetching from solr results which do not have snippets Michael Peter Christen 2013-03-05 12:24:01 +01:00
  • b1140e3d82 added debug switches for detailed search testing orbiter 2013-03-05 12:19:32 +01:00
  • cdbfddf091 added filter queries for better image, audio and video results orbiter 2013-03-04 21:18:54 +01:00
  • 587ef83eab added missing cleanup statements for short memory cases during search Michael Peter Christen 2013-03-04 13:01:24 +01:00
  • 2562f052b9 do not put the fulltext field text_t into the search cache because it is not used there and uses a lot of memory orbiter 2013-03-04 12:01:10 +01:00
  • 2b6c79d347 in method exists() also use the new caching-stacks for documents/metadata Michael Peter Christen 2013-03-04 01:13:17 +01:00
  • ae734b3f8d enhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again) Michael Peter Christen 2013-03-04 00:17:29 +01:00
  • 2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed search cache to prevent too much IO load on the peer during search. Michael Peter Christen 2013-03-04 00:07:52 +01:00
  • 0d7b4bc891 better protection against OOM during search flush and fixed missing result push Michael Peter Christen 2013-03-03 23:45:47 +01:00
  • 221ed7d764 - enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters Michael Peter Christen 2013-03-03 22:38:50 +01:00
  • 2714b59f38 *) For some reason this seems to fix a ClassCastException on my system (OpenJDK). Marc Nause 2013-03-03 20:38:20 +01:00
  • 3b1d9dc884 made index storage from DHT search results concurrent. This prevents blocking by high CPU usage during search. Also: removed query from Solr for DHT search results; results are taken from the pending queue. Michael Peter Christen 2013-03-02 10:25:52 +01:00
  • f13c0b2abd fix for search orbiter 2013-03-01 19:18:16 +01:00
  • 0f7ea7ad9f - enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the removal of the search process) - reduced the number of remote results in settings file because the processing of such mass document adds is too CPU-intensive (in Solr) orbiter 2013-03-01 15:27:17 +01:00
  • 7ff10bdb1b fix of page navigation for formatted totalcount numbers orbiter 2013-03-01 00:48:28 +01:00
  • 08d28eed1a translation of the Domain Navigator as "Anbieter Navigator" (provider navigator); its benefit is easier to explain that way orbiter 2013-02-28 23:55:46 +01:00
  • f327ffedb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-02-28 15:55:13 +01:00
  • 9c09fd7d0b better/fewer requests to local solr; the request is made in chunks which are exactly the size needed to present the current search result page. As a result, further solr requests are made automatically when switching to the next pages. orbiter 2013-02-28 14:04:08 +01:00
  • 840fa22135 disabled clickdepth computation during crawling since that is repeated during clean-up phase. Michael Peter Christen 2013-02-28 02:25:39 +01:00
  • a734fbc4a5 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-02-27 22:44:57 +01:00
  • d74472f562 corrected result counter orbiter 2013-02-27 22:40:23 +01:00
  • 2555542f7a removed the dns prefetch because that was not so useful orbiter 2013-02-27 20:58:34 +01:00
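
Note on commit ada3f27de7: the experimental ranking function quoted in that commit message is a Solr function query built from div, add and product. As a rough illustration only, the same arithmetic can be written out in Python. The function name and parameter names below are hypothetical and not part of YaCy or Solr; they simply stand in for the references_internal_i, references_external_i, references_exthosts_i and clickdepth_i fields named in the commit.

```python
def experimental_rank(references_internal: int,
                      references_external: int,
                      references_exthosts: int,
                      clickdepth: int) -> float:
    """Plain-Python reading of the function query from commit ada3f27de7:

    div(add(references_internal_i,
            product(references_external_i, references_exthosts_i)),
        add(clickdepth_i, 1))

    The names here are illustrative stand-ins for the Solr fields.
    """
    # add(internal, product(external, exthosts))
    numerator = references_internal + references_external * references_exthosts
    # add(clickdepth, 1): the +1 keeps the divisor non-zero at clickdepth 0
    denominator = clickdepth + 1
    return numerator / denominator


if __name__ == "__main__":
    # A page with 2 internal references, 3 external references coming
    # from 2 distinct external hosts, at click depth 1.
    print(experimental_rank(2, 3, 2, 1))
```

Read this way, the value grows with the number of references, weights external references by how many distinct hosts they come from, and shrinks for pages that sit deeper in the click path.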