Commit Graph

  • aeff31cd44 fix for workflow processor (cause: latest redesign for fewer threads) orbiter 2013-05-12 21:36:20 +02:00
  • 77faeada4d small memory leak patch Michael Peter Christen 2013-05-11 11:19:06 +02:00
  • b24d1d18e4 removed synchronization and concurrency in Fulltext class, concurrent deletions are now handled in ConcurrentUpdateSolrConnector Michael Peter Christen 2013-05-11 10:53:12 +02:00
  • f965d04496 added new peer icons for Mentor peers and Mentee peers (not used yet) Michael Peter Christen 2013-05-10 17:33:02 +02:00
  • b9b446bca6 - added ssl configuration sign (a lock) to network statistic/table - fixed a bug in bitfield Michael Peter Christen 2013-05-10 17:32:21 +02:00
  • 7095446ad3 added checkbox (near port) to switch on ssl support (https access) to the admin interface. Michael Peter Christen 2013-05-10 13:49:46 +02:00
  • e6c8b545c2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-05-10 12:16:55 +02:00
  • a83c2fe833 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-05-10 12:02:40 +02:00
  • 4baa0d4a97 Added a default keystore for ssl encryption of the YaCy web interface. This will enable https-access to YaCy, but this feature is disabled by default using the new server.https=false attribute. This has two purposes: - make it easier for everyone to use https (just set server.https=true) - provide the basis for secure yacy-to-yacy communication in the future orbiter 2013-05-10 12:02:31 +02:00
  • 0aef60f66e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-05-10 06:03:24 +02:00
  • da191c839d reduce SolrConnectorLogging setting (from default ALL to INFO) reger 2013-05-10 05:54:07 +02:00
  • aaddb4809c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-05-10 04:57:15 +02:00
  • 038f956821 fix for sitemap detection: the sitemap url was not visible if it appeared after the declaration of robots allow/deny for the crawler because the sitemap parser terminated after the allow/deny rules had been found. Now the parser reads the robots.txt until the end to discover also sitemap rules at the end of the file. Michael Peter Christen 2013-05-10 04:56:58 +02:00
  • 4fc6837690 - fix monitor url of crawl job in PerformanceQueues_p.html - reduce logging of every index add (switch embeddedsolr.add from info to debug) reger 2013-05-10 04:38:13 +02:00
  • 442ed50be0 removed some unnecessary synchronizations Michael Peter Christen 2013-05-09 03:06:48 +02:00
  • 9bd2aee180 migrated to solr 4.3.0 Michael Peter Christen 2013-05-09 02:17:53 +02:00
  • ad050ec88d - upgraded httpclient, httpcore and httpmime - removed httpclient 3.1 which has been used by solrj < 4.x.x and is now not used any more - fixed some parts in YaCy which used methods from httpclient 3.1 Michael Peter Christen 2013-05-09 00:22:45 +02:00
  • 4b100f8b48 Merge branch 'master' of ssh://gitorious.org/yacy/rc1 Michael Peter Christen 2013-05-08 23:46:03 +02:00
  • 3abf516ca7 merged classpath Please enter a commit message for your changes. Lines, Michael Peter Christen 2013-05-08 23:45:29 +02:00
  • a1c989002b fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4652 generate dht data even if dht receive and dht transmission is switched off orbiter 2013-05-08 16:48:45 +02:00
  • 48e9a54e80 updated pdf parser orbiter 2013-05-08 15:17:06 +02:00
  • e26bdd4a52 fixes to deletion methods (removed unnecessary concurrency and added removal of crawl queue entries) Michael Peter Christen 2013-05-08 13:26:25 +02:00
  • f2c9b0b5f2 better robustness of Concurrent Solr Connector against update/deletion thread failure Michael Peter Christen 2013-05-08 12:41:24 +02:00
  • f7f3e28c5e prevent the index size from being computed too often. Because the index size is now provided by solr, and the only way to get it is a match for [* TO *], a size computation is quite complex and time-consuming. Therefore this patch avoids calling the method wherever possible and, where a call is necessary, puts a DoS-preventing barrier in front of it. Michael Peter Christen 2013-05-08 11:50:46 +02:00
  • cca19d94d4 re-declared some fields to be of type string rather than text, which makes them more efficient and smaller Michael Peter Christen 2013-05-06 16:45:54 +02:00
  • cc90f82dbb increased default proxy client timeout to one minute Michael Peter Christen 2013-05-06 14:58:18 +02:00
  • ed1d5bace6 draw the names of other peers which receive/send dht into the network graphic Michael Peter Christen 2013-05-06 14:27:39 +02:00
  • b528448332 enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page. Michael Peter Christen 2013-05-05 23:39:46 +02:00
  • 58d85b5b80 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-05-05 21:05:47 +02:00
  • 24d2b4baee remove pre 1.0 migration statement which possibly overwrites user navigator setting reger 2013-05-05 05:00:42 +02:00
  • f1bb54943e typo Michael Peter Christen 2013-05-04 09:34:06 +02:00
  • d7fd346917 - added regular-expression based deletions - on-demand collection-list generation for collection-based deletions instead of a default collection-list presentation (this makes calling the interface much faster since the computation of collections lists for large indexes may take some seconds) Michael Peter Christen 2013-05-04 01:14:10 +02:00
  • 3841854c97 abstraction of catchall term Michael Peter Christen 2013-05-04 00:14:22 +02:00
  • ea85674be2 added the date to error documents Michael Peter Christen 2013-05-04 00:14:00 +02:00
  • 72003b109b Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-05-03 03:56:25 +02:00
  • 4fec35a665 adjust Test case EmbeddedSolrConnector reger 2013-05-03 03:55:14 +02:00
  • 6fafed2180 fix for the solr cache when a delete buffer is filled and a document that is in the delete queue is replaced with a new one. Michael Peter Christen 2013-05-03 02:03:30 +02:00
  • 20b767f35e preventing score computation in solr where applicable Michael Peter Christen 2013-05-03 02:02:35 +02:00
  • 7de5b9cfa0 fix for http://bugs.yacy.net/view.php?id=233 - check geolocation coordinates and accept only those that are well-formed - the solr push process no longer stops crawling if Solr does not accept the record after 20 requests. Instead, a severe log entry asks the user to file a bug report orbiter 2013-05-03 00:24:39 +02:00
  • e145afb8d6 fix for PerformanceMemory showing UNRESOLVED_PATTERN by removing solr-cache-stuff, which is not available anymore sixcooler 2013-05-02 15:47:21 +02:00
  • ee217dbdee remove sort order in all cases where not needed Michael Peter Christen 2013-04-30 11:44:56 +02:00
  • 70e981b333 prevent long-running deletion tasks from blocking a hard commit. Michael Peter Christen 2013-04-30 11:09:21 +02:00
  • bb4bf3d8fd infinity timeout bug protection patch Michael Peter Christen 2013-04-30 11:06:48 +02:00
  • 1b102d98d8 - added index deletion to index administration submenu - added index deletion processes to the process scheduler/recorder Michael Peter Christen 2013-04-30 02:11:28 +02:00
  • ee95e772cf Merge branch 'master' of git://gitorious.org/~saranshupscale/yacy/yacy-india-rc1 Michael Peter Christen 2013-04-30 00:20:42 +02:00
  • ab686900c1 New Hindi Translation Saransh Sharma 2013-04-30 03:33:21 +05:30
  • d1be4127e7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-04-29 19:31:40 +02:00
  • 0e2ee00fea added an index deletion servlet and some style changes for the 'dangerous' engage-button Michael Peter Christen 2013-04-29 19:30:53 +02:00
  • 1aac722cc6 added another solr connector, the ConcurrentUpdateSolrConnector which does not block when long-running updates to solr are made. This is realized using blocking queues which process all long-running tasks in the background. Also some bugfixes to existing connectors. Michael Peter Christen 2013-04-29 19:30:04 +02:00
  • 0af7803367 added more features to ScoreMap (pretty toString) Michael Peter Christen 2013-04-29 19:28:17 +02:00
  • f36a7da5f6 - re-introduced existById in solr connector. - introduced raw queries for the re-introduced byId queries (they are hopefully faster than full edismax queries) - removed the cached solr connector (testing this) to rely only on the solr built-in search caches. That should also save some RAM. We will see if this is usable. Michael Peter Christen 2013-04-28 21:20:14 +02:00
  • e4f7e5bcfe fixed bad css change Michael Peter Christen 2013-04-28 20:09:45 +02:00
  • 46fa800bc7 added httpstatus_i to automatically switched on fields (used in all search queries) reger 2013-04-27 03:11:44 +02:00
  • 3502b4c697 refactoring (renaming) of yacy-solr api Michael Peter Christen 2013-04-27 01:32:18 +02:00
  • 3a0fcfbeda Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-04-26 10:50:08 +02:00
  • 25499eead5 - added a new field for the regular expression in crawl start - added the field in crawl profile - adapted logging and error management - adapted duplicate document detection - added a new rule to the indexing process to reject non-matching content - full redesign of the expert crawl start servlet The new filter field can now be seen in /CrawlStartExpert_p.html in section "Document Filter", subsection item "Filter on Content of Document" Michael Peter Christen 2013-04-26 10:49:55 +02:00
  • 0a9b0992f3 RankingSolr_p: include warning if boost field not in local index reger 2013-04-26 02:26:38 +02:00
  • e1bfe9d07a - reduced the number of concurrently running processes to make YaCy better suited to smaller, single-core devices. - the workflow processor now starts no processes at all at startup. These are started as soon as the parser/condenser/indexing queues are filled. - better abstraction orbiter 2013-04-25 11:33:17 +02:00
  • c091000165 added collection attribute also to the rss feed reader Michael Peter Christen 2013-04-24 01:14:35 +02:00
  • 43ca359e24 Merge branch 'master' of ssh://gitorious.org/yacy/rc1 Michael Peter Christen 2013-04-23 21:01:08 +02:00
  • 2d60dfb3e1 Merge branch 'master' of git://gitorious.org/~saranshupscale/yacy/yacy-india-rc1 Michael Peter Christen 2013-04-23 21:00:49 +02:00
  • f7571386a3 added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time). orbiter 2013-04-23 20:42:54 +02:00
  • 04b61e08c8 More Translation Saransh Sharma 2013-04-23 19:31:17 +05:30
  • 3e79bd4b1f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-04-23 12:15:46 +02:00
  • d571e739b6 increased row limitation for authorized users from 10000 to 100000000 in solr interface orbiter 2013-04-23 12:15:33 +02:00
  • d937c55204 extended limitation of dom export size from 100000 to 100000000 Michael Peter Christen 2013-04-22 22:33:13 +02:00
  • fc2095ac67 some extensions to the raster plotter to transform an RGB picture to an indexed color scheme. This is needed for gif animations Michael Peter Christen 2013-04-22 14:33:04 +02:00
  • c1a2175fbc added transparency to gif image animation and the integration to the YaCy httpd for on-the-fly generated gifs (including animated gifs) Michael Peter Christen 2013-04-21 12:29:05 +02:00
  • a1fffe8e86 fixed default ranking values Michael Peter Christen 2013-04-21 12:27:27 +02:00
  • 5d442dad82 avoid NPE in regex checker orbiter 2013-04-20 10:53:49 +02:00
  • 24bcf54100 Merge branch 'master' of git://gitorious.org/~saranshupscale/yacy/yacy-india-rc1 Michael Peter Christen 2013-04-19 09:55:33 +02:00
  • b31793f5d6 Hello world Saransh Sharma 2013-04-19 13:12:23 +05:30
  • 50421171c3 added new schema fields: Michael Peter Christen 2013-04-18 17:21:17 +02:00
  • 566d6c980c checking of document signature for a double-document check now refers only to documents within the same domain Michael Peter Christen 2013-04-17 16:15:27 +02:00
  • 1d30082446 added hindi translation configuration Michael Peter Christen 2013-04-17 12:57:27 +02:00
  • ee9d50e4b8 Hindi Some parts only Saransh Sharma 2013-04-17 14:41:55 +05:30
  • d05dc07cff setting of new default values for ranking Michael Peter Christen 2013-04-16 15:02:00 +02:00
  • 97775fbebc fixed ranking for add-function queries: these did not work, so the option was removed. All function queries are now boosts (multiplying the score according to a function). This is also the recommended way to boost rankings based on functions, as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/ Michael Peter Christen 2013-04-16 14:45:14 +02:00
  • ac5fa9fe48 fix for result counter logging Michael Peter Christen 2013-04-16 13:32:13 +02:00
  • 298bf2deb5 fix to ranking configuration servlet Michael Peter Christen 2013-04-16 12:38:16 +02:00
  • 2db058b551 added in RankingSolr_p.html a select box to switch between different ranking situations. By default, four situations can be configured. Michael Peter Christen 2013-04-16 11:38:51 +02:00
  • 6fbca35215 fixed api table navigation Michael Peter Christen 2013-04-16 01:39:30 +02:00
  • 7ab5093321 added new solr title_exact_signature_l and description_exact_signature_l to be able to identify unique title and unique description fields. Michael Peter Christen 2013-04-16 01:35:15 +02:00
  • f24ac518e6 redesign of exists()-query (can now be called with query) and the CachedSolrConnector which based its cache on the key value. This will be used to correct the title_unique_b and description_unique_b field. Michael Peter Christen 2013-04-15 14:08:30 +02:00
  • 27d6222880 added new field host_extent_i which, after a crawl and postprocessing, holds the number of documents for the host where the document is hosted. This is necessary for ranking and the norming of references per local host in the ranking computation. Michael Peter Christen 2013-04-14 20:52:40 +02:00
  • 579eb01a49 showing now the details of references count in host browser: external (ext), internal (int) and external hosts (hosts) for each indexed document. Michael Peter Christen 2013-04-14 11:30:57 +02:00
  • 0f4237d8e5 add admin option to delete load errors from index reger 2013-04-14 05:33:01 +02:00
  • 518b20147c skip postprocessing during document.store if no citation index connected (prevent null pointer exception) reger 2013-04-14 02:01:27 +02:00
  • ac478384d3 *) did some long overdue refactoring Marc Nause 2013-04-13 23:04:44 +02:00
  • e99c8789ff *) fixed encoding of query in link to map (in case geolocalization is enabled, "Show search results for "köln" on map") *) applied suggestions of Checkstyle plugin Marc Nause 2013-04-13 21:50:48 +02:00
  • ada3f27de7 added three new fields for better ranking: references_internal_i, references_external_i and references_exthosts_i. These can be used to count and evaluate the number of external links to every web page. An experimental ranking function could be, e.g.: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1)) Michael Peter Christen 2013-04-12 16:17:14 +02:00
  • 082e3274d6 - setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if no GSA-style sort attribute was given - to avoid confusion about correct ranking attributes, only the default '0' ranking profile is used and not the scenario-adapted ones (site, date), because those should be configurable in the web interface before they are actually used for ranking. Michael Peter Christen 2013-04-12 10:48:41 +02:00
  • a20941c067 resume paused crawls on startup; user expects that restarts 'heal' everything Michael Peter Christen 2013-04-11 15:07:08 +02:00
  • edc0b33f6d - showing references count and clickdepth in host browser - fixed generation and presentation of both values Michael Peter Christen 2013-04-11 14:46:13 +02:00
  • 2c3b024196 if the crawl was paused (automatically), show the reason for pausing in the Crawler_p servlet. orbiter 2013-04-09 18:55:26 +02:00
  • 566a3b0294 fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set reger 2013-04-08 21:25:21 +02:00
  • 989575b447 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-04-07 20:20:09 +02:00
  • 27907c9739 added missing library after solr upgrade Michael Peter Christen 2013-04-07 10:36:05 +02:00
  • f37b4c984c adjust Netbeans IDE project.xml classpath for Solr 4.2.1 jars reger 2013-04-06 23:00:48 +02:00
  • c6c01a3ca2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-04-06 16:11:33 +02:00