Commit Graph

  • ebfaf753b7 - faster initialization of index files - removal of not used space if index files shrink (rare, but possible) Michael Peter Christen 2014-01-28 12:39:58 +01:00
  • ba5ab11cc4 less logging orbiter 2014-01-27 21:54:52 +01:00
  • 322854a5f8 fix auth for forced ping Michael Peter Christen 2014-01-27 15:56:02 +01:00
  • fbf4f77d80 fixed missing corona in network picture Michael Peter Christen 2014-01-27 15:43:08 +01:00
  • 4b7f2fcf38 updated bootstrap seedlist list Michael Peter Christen 2014-01-27 13:55:06 +01:00
  • d2b8f2b477 enhancements for staticIP and ipv6 handling Michael Peter Christen 2014-01-27 13:48:20 +01:00
  • a71718a459 add config value for ssl/https port (default=8443) adjust server routines to use config reger 2014-01-27 01:09:56 +01:00
  • 91d79c1ac4 disable wrong forward to https on port change reger 2014-01-26 21:50:42 +01:00
  • a3e2cca8e9 improve isOlder check to not overwrite node index with metadata on equal load date reger 2014-01-26 01:00:52 +01:00
  • 193b8235c2 remove double jquery-1.3.1.js and adjust header links to jquery-1.3.2 reger 2014-01-26 00:58:54 +01:00
  • 9b24dae2b7 add language navigation filter clause to rwi results reger 2014-01-25 22:59:23 +01:00
  • f307d65dcf prepare for a language navigator works fine to restrict language for local solrSearches. More work needs to be done to make rwi/remote searches respect the modifier.language restriction. reger 2014-01-24 03:11:25 +01:00
  • cf553e5045 added hint to web.xml and for completeness the full set of hardcoded mappings reger 2014-01-23 23:56:45 +01:00
  • 768b1306b8 Added a write-enabled checkbox for remote solr servers. It is now possible to assign every peer other YaCy peers as remote solr server which are only used for read operations during search. This also affects crawling: it will exclude urls from crawls which exist on remote solr/remote YaCy peers. orbiter 2014-01-23 22:48:31 +01:00
  • f7d6dd136f changed solr paths according to new default paths orbiter 2014-01-23 19:21:07 +01:00
  • c84bcc878a first try to add a generic solr servlet as luke request servlet Michael Peter Christen 2014-01-23 19:01:31 +01:00
  • a8fdaace31 changed the web.xml as well to migrate the solr servlet Michael Peter Christen 2014-01-23 18:41:45 +01:00
  • 4cb7e2a2ca refactoring: renamed the SolrServlet to SolrSelectServlet for better naming of more Solr Servlets Michael Peter Christen 2014-01-23 17:20:49 +01:00
  • dc06e407ce added two virtual instances of solr for both cores: collection1 and webgraph. These cores are now accessible at /solr/collection1/select instead of /solr/select?core=collection1 and /solr/webgraph/select instead of /solr/select?core=webgraph, in addition to the old behavior, which is kept for compatibility with old peers. These new paths are fully solr standard-conform and will allow cross-linking between YaCy peers using their public solr API. Michael Peter Christen 2014-01-23 17:14:13 +01:00
  • 8b14e92ba4 added button in host browser to re-load 404/failed documents Michael Peter Christen 2014-01-23 15:56:36 +01:00
  • f47067b0ce fix search navigator not showing activated nav introduced with 97e84439fb reger 2014-01-23 01:52:51 +01:00
  • 771d8261c1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-22 21:53:27 +01:00
  • c351e47a84 fix for bad-formatted lonlat orbiter 2014-01-22 21:33:11 +01:00
  • 4c603b216e optimize parse ServerSideInclude reger 2014-01-22 21:23:32 +01:00
  • 5ec0c969c9 fix for http://bugs.yacy.net/view.php?id=354 orbiter 2014-01-22 20:59:53 +01:00
  • 0002abd583 fix for OOM during remote search and too high load protection orbiter 2014-01-22 20:54:03 +01:00
  • 5a917e13c6 use less ram on dht-URL transfer by not using a URIMetadataNode[] sixcooler 2014-01-22 17:52:07 +01:00
  • c87cdfca2e do not set a load prerequisite that prevents the start of one-time-jobs Michael Peter Christen 2014-01-22 17:18:53 +01:00
  • 0512e46c6a bump to httpclient-4.3.2 sixcooler 2014-01-22 01:31:22 +01:00
  • 4d77ca52c9 workaround to let dht-out run on small systems like a Pi sixcooler 2014-01-22 01:26:44 +01:00
  • 9a96a7d73f put list quick navigator buttons below on BlackList_p editor replacing the dropdown -> go navigation reger 2014-01-21 21:35:48 +01:00
  • 6ada0daae9 making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering. Michael Peter Christen 2014-01-21 19:28:00 +01:00
  • 489c3fbc90 code simplifications / removed warnings Michael Peter Christen 2014-01-21 17:53:39 +01:00
  • 0168f80c28 new crawling factors can now be changed during runtime Michael Peter Christen 2014-01-21 17:52:16 +01:00
  • be5e808236 - removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request) Michael Peter Christen 2014-01-21 17:48:45 +01:00
  • 40a4030b55 configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi sixcooler 2014-01-21 17:04:22 +01:00
  • 6d8c023a5e lower client-connection for single-cpu-systems sixcooler 2014-01-21 16:56:44 +01:00
  • 77531850b5 reverted crawling strategy from latest commit. Michael Peter Christen 2014-01-21 16:05:55 +01:00
  • c0da966dfa enhanced crawler speed Michael Peter Christen 2014-01-20 21:46:40 +01:00
  • 79809342fa added synchronization to exists() call because the concurrent call to that method showed in thread dump close to deadlock situations. It's also better to synchronize IO operations because they become faster then. Michael Peter Christen 2014-01-20 21:09:03 +01:00
  • 9a6912f2e6 if a http client thread is still running but we do not wait for it any more, call an interrupt Michael Peter Christen 2014-01-20 18:39:36 +01:00
  • 0d235a565b cleanup crawl loader jobs Michael Peter Christen 2014-01-20 18:36:00 +01:00
  • 1ea17bd9f3 - removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class Michael Peter Christen 2014-01-20 18:31:46 +01:00
  • d3de309953 fix IOexception logging issue in DefaultServlet reason not sure but .logException triggers another exception reger 2014-01-20 08:12:35 +01:00
  • 97e84439fb adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since the specific heuristics Twitter & Blekko are no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. reger 2014-01-20 00:58:17 +01:00
  • d24a0ec32c upd heuristic default list (heuristicopensearch.conf) - Faroo Web taken out (requires api key) http://www.faroo.com/hp/api/api.html#description - update Faroo News to new url - Twitter taken out (change to Api 1.1 not supporting rss) https://dev.twitter.com/discussions/24239 reger 2014-01-20 00:03:55 +01:00
  • 022c6d3ce1 do YaCy p2p connections using a timeout-request which moves the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is an attempt to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked. Michael Peter Christen 2014-01-19 15:21:23 +01:00
  • 42f3733a05 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-19 14:47:24 +01:00
  • 25a6c05008 experimental removal of synchronization. This should work for all cases where the size() and isEmpty() methods are used only for statistics, which happens at many locations in YaCy. If these methods are used for structural reasons (like accessing the last element in an array) then it may fail or cause other problems. As far as visible, this is not the case. Michael Peter Christen 2014-01-19 14:47:11 +01:00
  • 5695280edd removed superfluous synchronization Michael Peter Christen 2014-01-19 14:44:58 +01:00
  • a1977b7a75 removed debug code Michael Peter Christen 2014-01-19 14:42:26 +01:00
  • fd4abc0565 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-19 01:50:55 +01:00
  • d5b8e473c8 added load limit for DHT transfer: RWI acceptance only if local load is not too high orbiter 2014-01-19 01:50:42 +01:00
  • 41c126978b fix bug: Crawl Start (Expert) crawls "?-URLs" even if told not to do so http://bugs.yacy.net/view.php?id=329 reger 2014-01-18 23:27:16 +01:00
  • 2614fa7aeb Skip remote Solr search if last try showed error As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is remembered in the seed.dna of the remote peer. This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status. Might also be set to "not available" on time-out of last try. reger 2014-01-18 18:48:52 +01:00
  • a07e9b3582 concurrency-solid version of transmission limitation orbiter 2014-01-18 12:55:05 +01:00
  • ec21f0494e removed -d64 jvm option because that causes problems on non-64 bit linux, see http://bugs.yacy.net/view.php?id=349 and http://bugs.yacy.net/view.php?id=339 orbiter 2014-01-18 12:54:14 +01:00
  • 60ead31273 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-18 10:50:36 +01:00
  • 52bf7d1ac8 reduce load during dht transfer orbiter 2014-01-18 10:50:24 +01:00
  • f0587d4af5 NP-fix, which was found on a Pi under 'heavy' load sixcooler 2014-01-18 00:03:44 +01:00
  • a9ed28c0b5 no commit if no action is requested Michael Peter Christen 2014-01-17 14:54:44 +01:00
  • 0bf3cab8c7 - better 'extra'-peer selection - logging of health status for 'extra'-peer selection - concurrency for remote peer IO and interrupting the threads if time-out occurs Michael Peter Christen 2014-01-17 14:54:19 +01:00
  • e3c4456c8e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-17 09:43:09 +01:00
  • 7f21d21d1d added synchronization to deeply-embedded solr connector EmbeddedSolrConnector because deadlock situations show that methods in lucene class seem to block. orbiter 2014-01-17 09:42:55 +01:00
  • 9b06774414 fix role name in GSA servlet reger 2014-01-17 01:00:02 +01:00
  • 0c754dd794 implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted in near clear text (B64enc). This has some implications, as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. reger 2014-01-17 00:02:23 +01:00
  • ba44eb1160 when scaling the number of remote peers, also consider the machine load and the number of cores Michael Peter Christen 2014-01-16 17:34:26 +01:00
  • f8ce7040ab remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware. Michael Peter Christen 2014-01-16 17:27:14 +01:00
  • 47a82e471c less blocking in SeedDB which caused deadlocks in peer ping Michael Peter Christen 2014-01-16 13:10:20 +01:00
  • ec10ed45bd better logging in logger Michael Peter Christen 2014-01-16 13:08:39 +01:00
  • a5d7961812 replaced old caching in SolrConnector with a new one which is better for concurrency and should prevent 100% CPU usage after a long run of a peer with a large number of documents. Michael Peter Christen 2014-01-15 23:13:22 +01:00
  • 84cf7e8e9f backmigration from solrj 4.6.0 to 4.5.1. This is necessary because solrj.4.6.0 has a bug which prevents the attachment of a remote solr (as tested with a SolrCloud). See bug report https://issues.apache.org/jira/browse/SOLR-5532 This bug shall be fixed in Solr 4.6.1. Fortunately, solrj-4.5.1 works together with solr-4.6.0 thus the current index does not need to be changed. Michael Peter Christen 2014-01-15 17:18:32 +01:00
  • 6e2fe777af simulate Authorization cookie for yacy servlet header reger 2014-01-10 19:31:36 +01:00
  • ea7cef5d05 fix NPE in TemplateEngine reger 2014-01-10 18:11:32 +01:00
  • cb6d0c2113 implementing YaCy legacy role names - taking out customized SecurityHandler code as the original/default seems to just work fine - with this individual sec. constraints can be applied via web.xml (using legacy role names) reger 2014-01-10 14:07:49 +01:00
  • 530b9f6de8 Merge origin/master reger 2014-01-10 12:38:00 +01:00
  • f09dbbef96 make SecurityHandler webappcontext ready reger 2014-01-10 12:36:42 +01:00
  • ca3411d805 added checkindex, solr index check Michael Peter Christen 2014-01-10 12:27:49 +01:00
  • 36594d0348 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-10 12:15:13 +01:00
  • 37f2a82a5d making root context (htroot) a WebAppContext - this allows additional features, like servlet configuration via web.xml and many more things. - currently the standard servlets are still configured in the code (so the supplied defaults/web.xml is not really needed, yet), but could be expanded - lookup for web.xml - 1. in /DATA/SETTINGS then in /defaults reger 2014-01-10 10:42:47 +01:00
  • f6099b730d disabled unused fields in default Solr collection schema reger 2014-01-10 10:26:45 +01:00
  • 28eae57e8b spend CrawlQueues a fremem routine - clears errorStack - will not get hit often (but better little than nothing on low mem) reger 2014-01-10 10:24:33 +01:00
  • b931bf6b48 fix use of url proxy access pattern pattern of transparent was used. reger 2014-01-08 08:12:56 +01:00
  • 280c4a3ac1 exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) reger 2014-01-08 04:46:21 +01:00
  • fbc1071f6d Merge origin/master reger 2014-01-07 22:48:45 +01:00
  • 7b800a0c8e fix: NPE on shutdown via script reger 2014-01-07 22:44:24 +01:00
  • bfaf42b0b8 added a script which can check the solr index for inconsistencies while the peer is down. This shall be used in emergency cases where a check or fix for a broken solr index is needed. Michael Peter Christen 2014-01-07 21:58:55 +01:00
  • ce4d42d77c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-07 21:52:38 +01:00
  • 644573cfc4 using the adminAccountUserName from yacy.conf within apicall.sh Michael Peter Christen 2014-01-07 21:52:19 +01:00
  • 6932aa4d7a use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. reger 2014-01-07 21:26:50 +01:00
  • c656e67c97 fix: display proper error msg on admin user change reger 2014-01-07 20:34:37 +01:00
  • 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make maintenance of index backups easier. It will also provide a tool to migrate from a freeworld index to a webportal index. orbiter 2014-01-07 17:53:49 +01:00
  • add0e42804 fix double-escaped urls from proxy-usage sixcooler 2014-01-07 01:04:33 +01:00
  • 865ce6f974 check blacklist proxyClient config sixcooler 2014-01-07 01:01:55 +01:00
  • 345f9aba27 make use of our DNS-cache again - this really speeds up the lookup sixcooler 2014-01-07 00:18:01 +01:00
  • e6d284fe1e better solution for prev. commit with MultiMapSolrParams.getFieldInt not returning default parameter reger 2014-01-06 18:19:54 +01:00
  • 0bc2fc14ab improve NPE chance on missing parameters java.lang.NullPointerException at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:145) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) reger 2014-01-06 17:52:21 +01:00
  • f06cef5d5b reimplement proxy access by configured whitelist pattern was currently limited to own ip. reger 2014-01-06 15:00:14 +01:00
  • 05d6cc6ea3 setting of IPv4Stack moved earlier it seems even better to call system.setproperty before isrunning check (if nothing helps we have to set it in startup script) reger 2014-01-06 11:28:05 +01:00
  • 1b6d173b14 update to Jetty 8.1.14 reger 2014-01-06 08:48:43 +01:00
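
Note on commit 0c754dd794: the message refers to the RFC 2617 password hash MD5(user:realm:pwd), often called HA1. The following is a minimal sketch of how such a hash can be computed with the standard Java library. It is not taken from the YaCy sources; the class name DigestHa1Sketch, the method names, and the sample user, realm, and password are all hypothetical, and UTF-8 is assumed as the string encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: class and method names are hypothetical, not YaCy code.
public class DigestHa1Sketch {

    // Returns the lowercase hex MD5 digest of the given string (UTF-8 assumed).
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // mask to 0..255 so negative bytes still print as two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // HA1 as named in the commit message: MD5(user:realm:pwd)
    public static String ha1(String user, String realm, String password) {
        return md5Hex(user + ":" + realm + ":" + password);
    }

    public static void main(String[] args) {
        // sample values are placeholders, not real credentials or a real realm
        System.out.println(ha1("admin", "example-realm", "secret"));
    }
}
```

Because the realm is part of the hashed string, a stored hash of this form is only valid for the realm it was computed with, which is one plausible reading of the "implications" the commit message mentions.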