Commit Graph

  • 0512e46c6a bump to httpclient-4.3.2 sixcooler 2014-01-22 01:31:22 +01:00
  • 4d77ca52c9 workaround to let dht-out run on small systems like a Pi sixcooler 2014-01-22 01:26:44 +01:00
  • 9a96a7d73f put list quick navigator buttons below on BlackList_p editor, replacing the dropdown -> go navigation reger 2014-01-21 21:35:48 +01:00
  • 6ada0daae9 making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering. Michael Peter Christen 2014-01-21 19:28:00 +01:00
  • 489c3fbc90 code simplifications / removed warnings Michael Peter Christen 2014-01-21 17:53:39 +01:00
  • 0168f80c28 new crawling factors can now be changed during runtime Michael Peter Christen 2014-01-21 17:52:16 +01:00
  • be5e808236 - removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request) Michael Peter Christen 2014-01-21 17:48:45 +01:00
  • 40a4030b55 configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi sixcooler 2014-01-21 17:04:22 +01:00
  • 6d8c023a5e lower client-connection for single-cpu-systems sixcooler 2014-01-21 16:56:44 +01:00
  • 77531850b5 reverted crawling strategy from latest commit. Michael Peter Christen 2014-01-21 16:05:55 +01:00
  • c0da966dfa enhanced crawler speed Michael Peter Christen 2014-01-20 21:46:40 +01:00
  • 79809342fa added synchronization to exists() call because the concurrent call to that method showed in thread dump close to deadlock situations. It's also better to synchronize IO operations because they become faster then. Michael Peter Christen 2014-01-20 21:09:03 +01:00
  • 9a6912f2e6 if a http client thread is still running but we do not wait for it any more, call an interrupt Michael Peter Christen 2014-01-20 18:39:36 +01:00
  • 0d235a565b cleanup crawl loader jobs Michael Peter Christen 2014-01-20 18:36:00 +01:00
  • 1ea17bd9f3 - removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class Michael Peter Christen 2014-01-20 18:31:46 +01:00
  • d3de309953 fix IOexception logging issue in DefaultServlet reason not sure but .logException triggers another exception reger 2014-01-20 08:12:35 +01:00
  • 97e84439fb adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. reger 2014-01-20 00:58:17 +01:00
  • d24a0ec32c upd heuristic default list (heuristicopensearch.conf) - Faroo Web taken out (requires api key) http://www.faroo.com/hp/api/api.html#description - update Faroo News to new url - Twitter taken out (change to Api 1.1 not supporting rss) https://dev.twitter.com/discussions/24239 reger 2014-01-20 00:03:55 +01:00
  • 022c6d3ce1 do YaCy p2p connections using a timeout-request which covers the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked. Michael Peter Christen 2014-01-19 15:21:23 +01:00
  • 42f3733a05 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-19 14:47:24 +01:00
  • 25a6c05008 experimental removal of synchronization. This should work for all cases where the size() and isEmpty() method is used only for statistics, which happens at many locations in YaCy. If these methods are used for structural reasons (like accessing the last element in an array) then it may fail or cause other problems. As far as visible, this is not the case. Michael Peter Christen 2014-01-19 14:47:11 +01:00
  • 5695280edd removed superfluous synchronization Michael Peter Christen 2014-01-19 14:44:58 +01:00
  • a1977b7a75 removed debug code Michael Peter Christen 2014-01-19 14:42:26 +01:00
  • fd4abc0565 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-19 01:50:55 +01:00
  • d5b8e473c8 added load limit for DHT transfer: RWI acceptance only if local load is not too high orbiter 2014-01-19 01:50:42 +01:00
  • 41c126978b fix bug: Crawl Start (Expert) crawls "?-URLs" even if told not to do so http://bugs.yacy.net/view.php?id=329 reger 2014-01-18 23:27:16 +01:00
  • 2614fa7aeb Skip remote Solr search if last try showed error As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is remembered in the seed.dna of the remote peer. This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status. Might also be set to "not available" on time-out of last try. reger 2014-01-18 18:48:52 +01:00
  • a07e9b3582 concurrency-solid version of transmission limitation orbiter 2014-01-18 12:55:05 +01:00
  • ec21f0494e removed -d64 jvm option because that causes problems on non-64 bit linux, see http://bugs.yacy.net/view.php?id=349 and http://bugs.yacy.net/view.php?id=339 orbiter 2014-01-18 12:54:14 +01:00
  • 60ead31273 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-18 10:50:36 +01:00
  • 52bf7d1ac8 reduce load during dht transfer orbiter 2014-01-18 10:50:24 +01:00
  • f0587d4af5 NP-fix, which was found on a Pi under 'heavy' load sixcooler 2014-01-18 00:03:44 +01:00
  • a9ed28c0b5 no commit if no action is requested Michael Peter Christen 2014-01-17 14:54:44 +01:00
  • 0bf3cab8c7 - better 'extra'-peer selection - logging of health status for 'extra'-peer selection - concurrency for remote peer IO and interrupting the threads if time-out occurs Michael Peter Christen 2014-01-17 14:54:19 +01:00
  • e3c4456c8e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2014-01-17 09:43:09 +01:00
  • 7f21d21d1d added synchronization to deeply-embedded solr connector EmbeddedSolrConnector because deadlock situations show that methods in lucene class seem to block. orbiter 2014-01-17 09:42:55 +01:00
  • 9b06774414 fix role name in GSA servlet reger 2014-01-17 01:00:02 +01:00
  • 0c754dd794 implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted in near clear text (B64enc). This has some implications, as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. reger 2014-01-17 00:02:23 +01:00
  • ba44eb1160 when scaling the number of remote peers, also consider the machine load and the number of cores Michael Peter Christen 2014-01-16 17:34:26 +01:00
  • f8ce7040ab remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware. Michael Peter Christen 2014-01-16 17:27:14 +01:00
  • 47a82e471c less blocking in SeedDB which caused deadlocks in peer ping Michael Peter Christen 2014-01-16 13:10:20 +01:00
  • ec10ed45bd better logging in logger Michael Peter Christen 2014-01-16 13:08:39 +01:00
  • a5d7961812 replaced old caching in SolrConnector with a new one which is better for concurrency and should prevent from 100% CPU usage after a long run of a peer with a large number of documents. Michael Peter Christen 2014-01-15 23:13:22 +01:00
  • 84cf7e8e9f backmigration from solrj 4.6.0 to 4.5.1. This is necessary because solrj.4.6.0 has a bug which prevents the attachment of a remote solr (as tested with a SolrCloud). See bug report https://issues.apache.org/jira/browse/SOLR-5532 This bug shall be fixed in Solr 4.6.1. Fortunately, solrj-4.5.1 works together with solr-4.6.0 thus the current index does not need to be changed. Michael Peter Christen 2014-01-15 17:18:32 +01:00
  • 6e2fe777af simulate Authorization cookie for yacy servlet header reger 2014-01-10 19:31:36 +01:00
  • ea7cef5d05 fix NPE in TemplateEngine reger 2014-01-10 18:11:32 +01:00
  • cb6d0c2113 implementing YaCy legacy role names - taking out customized SecurityHandler code as the original/default seems to just work fine - with this individual sec. constraints can be applied via web.xml (using legacy role names) reger 2014-01-10 14:07:49 +01:00
  • 530b9f6de8 Merge origin/master reger 2014-01-10 12:38:00 +01:00
  • f09dbbef96 make SecurityHandler webappcontext ready reger 2014-01-10 12:36:42 +01:00
  • ca3411d805 added checkindex, solr index check Michael Peter Christen 2014-01-10 12:27:49 +01:00
  • 36594d0348 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-10 12:15:13 +01:00
  • 37f2a82a5d making root context (htroot) a WebAppContext - this allows additional features, like servlet configuration via web.xml and many more things. - currently the standard servlets are still configured in the code (so the supplied defaults/web.xml is not really needed, yet), but could be expanded - lookup for web.xml - 1. in /DATA/SETTINGS then in /defaults reger 2014-01-10 10:42:47 +01:00
  • f6099b730d disabled unused fields in default Solr collection schema reger 2014-01-10 10:26:45 +01:00
  • 28eae57e8b spend CrawlQueues a fremem routine - clears errorStack - will not get hit often (but better little than nothing on low mem) reger 2014-01-10 10:24:33 +01:00
  • b931bf6b48 fix use of url proxy access pattern pattern of transparent was used. reger 2014-01-08 08:12:56 +01:00
  • 280c4a3ac1 exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) reger 2014-01-08 04:46:21 +01:00
  • fbc1071f6d Merge origin/master reger 2014-01-07 22:48:45 +01:00
  • 7b800a0c8e fix: NPE on shutdown via script reger 2014-01-07 22:44:24 +01:00
  • bfaf42b0b8 added a script which can check the solr index for inconsistencies while the peer is down. This shall be used in emergency cases where a check or fix for a broken solr index is needed. Michael Peter Christen 2014-01-07 21:58:55 +01:00
  • ce4d42d77c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-01-07 21:52:38 +01:00
  • 644573cfc4 using the adminAccountUserName from yacy.conf within apicall.sh Michael Peter Christen 2014-01-07 21:52:19 +01:00
  • 6932aa4d7a use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. reger 2014-01-07 21:26:50 +01:00
  • c656e67c97 fix: display proper error msg on admin user change reger 2014-01-07 20:34:37 +01:00
  • 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make maintenance of index backups easier. It will also provide a tool to migrate from a freeworld index to a webportal index. orbiter 2014-01-07 17:53:49 +01:00
  • add0e42804 fix double-escaped urls from proxy-usage sixcooler 2014-01-07 01:04:33 +01:00
  • 865ce6f974 check blacklist proxyClient config sixcooler 2014-01-07 01:01:55 +01:00
  • 345f9aba27 make use of our DNS-cache again - this really speeds up the lookup sixcooler 2014-01-07 00:18:01 +01:00
  • e6d284fe1e better solution for prev. commit with MultiMapSolrParams.getFieldInt not returning default parameter reger 2014-01-06 18:19:54 +01:00
  • 0bc2fc14ab improve NPE chance on missing parameters java.lang.NullPointerException at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:145) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) reger 2014-01-06 17:52:21 +01:00
  • f06cef5d5b reimplement proxy access by configured whitelist pattern was currently limited to own ip. reger 2014-01-06 15:00:14 +01:00
  • 05d6cc6ea3 setting of IPv4Stack moved earlier it seems even better to call system.setproperty before isrunning check (if nothing helps we have to set it in startup script) reger 2014-01-06 11:28:05 +01:00
  • 1b6d173b14 update to Jetty 8.1.14 reger 2014-01-06 08:48:43 +01:00
  • 41dc0f82c1 open service manager upon install failure in installYaCyWindowsService.bat likely service is already installed Service Manager allows then to modify settings reger 2014-01-06 07:22:23 +01:00
  • 30d925a96e reimplemented server access restriction via Jetty IPAccessHandler to allow only configured IP's to access. Handler is only loaded if a restriction is configured. reger 2014-01-06 07:00:16 +01:00
  • 3cb6c7861f fixed shutdown authentication problem orbiter 2014-01-06 01:48:54 +01:00
  • ed06b5b94b set a realm message to log-in input window which explains that a password for the account 'admin' can be (re-)set with the script bin/passwd.sh Michael Peter Christen 2014-01-05 17:43:34 +01:00
  • 7005ecdabd cleanup Michael Peter Christen 2014-01-05 15:06:40 +01:00
  • 2939b47986 removed non-working realm setting in http client (auth for localhost was added in previous commit) Michael Peter Christen 2014-01-05 15:04:18 +01:00
  • 9d52b337f3 added http authentication to YaCy http client for all localhost accesses to enable self-steering of the peer using the API table. This is necessary in case a password for the administration pages is set. orbiter 2014-01-05 14:46:11 +01:00
  • c951945666 modified log-in detail to enable admin-login from localhost with stored hash even if localhost access is disabled. This is urgently needed for the apicall.sh script since that is used for high-availability set-up (checkalive and indexdump for index mirroring) Michael Peter Christen 2014-01-05 11:50:23 +01:00
  • 9bd71fdbb4 made the access tracker class static because it shall be used by the jetty auth module Michael Peter Christen 2014-01-05 05:04:28 +01:00
  • 1c56befb93 fixed mess with test on localhost (which means local hosts for some cases) Michael Peter Christen 2014-01-05 04:55:30 +01:00
  • 7d6fc79eb8 refactoring (usage of constant names for attributes of authentication check) Michael Peter Christen 2014-01-05 04:23:44 +01:00
  • b9d36e45e0 removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process. Michael Peter Christen 2014-01-05 03:40:10 +01:00
  • e2ccb6ce9d modified DefaultServlet parameter on invoke templates call response with post=0 (if post empty) simulating previous behavior. reger 2014-01-04 20:49:26 +01:00
  • cabe0943cd fix opensearch resultcount in yacysearch.rss see merge request https://gitorious.org/yacy/rc1/merge_requests/24 use result count in searchtrailer.xml which is on p2p search more accurate (timing) reger 2014-01-04 17:14:10 +01:00
  • eaf596a257 adding proxy status to (private) status box (show also transparent and url proxy status) reger 2014-01-04 16:10:54 +01:00
  • 4c38bceafc handle http connect for proxy refactor header cleanup (reuse existing code) reger 2014-01-04 13:09:34 +01:00
  • cfabe8f67a harmonize access restriction for urlproxy servlet with proxy handler, what is currently - use switched on in config - access from a local IP / hostname reger 2014-01-03 12:28:40 +01:00
  • e3d8459906 extend ssl enabled msg on status page - post the portnr reger 2014-01-03 02:56:09 +01:00
  • e6b9643fd6 extended request for local peer check to by hostname resolved ip the current islocal() check did not detect a domain.com address as request for the local peer. reger 2014-01-03 01:13:56 +01:00
  • c797f108a1 add error response on denied proxy access send http 403 response reger 2014-01-02 09:11:08 +01:00
  • 0583f44306 reimplement proxy access log (to Jetty ProxyHandler) - using existing HTTPDProxyHandler logger - allow local loopback ip to access proxy reger 2014-01-02 03:37:33 +01:00
  • 8cbc1c970a Security Hot-Fix: for transparent proxy. reger 2014-01-01 20:48:35 +01:00
  • 58ecf5e4dd add to blacklist button in CrawlResults http://bugs.yacy.net/view.php?id=220 introduced Blacklist.add with sourcefile only parameter reger 2014-01-01 11:01:22 +01:00
  • 17b454f957 fix external link (open in new tab) reger 2014-01-01 10:33:20 +01:00
  • e9081c0f17 moved startup execAPIActions call after Jetty startup execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, but it's more robust to place the call after http is assigned to switchboard/serverSwitch. reger 2014-01-01 10:28:49 +01:00
  • 19c1a7a5ca change SolrServlet from Filter to Servlet (as no multicore required) this allows to simplify context/servlet initialization in Jetty init. reger 2014-01-01 10:20:32 +01:00
  • 14c977dd26 fix NPE GSAresponseWriter on query=null java.lang.NullPointerException at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.highlight(GSAResponseWriter.java:328) at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.write(GSAResponseWriter.java:263) at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:235) reger 2013-12-31 23:01:41 +01:00
  • c3dee2d6bd added security patch orbiter 2013-12-31 15:25:44 +01:00
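
Note on commit 0c754dd794: the message refers to the RFC 2617 password hash MD5(user:realm:pwd), commonly called HA1. The following is a minimal sketch of that computation only, not the code from the commit. The class name DigestHa1Sketch, the helper names md5Hex and ha1, the sample credentials, and the choice of UTF-8 for encoding the input are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not part of YaCy.
final class DigestHa1Sketch {

    // Returns the lowercase hex MD5 digest of the given string.
    // Assumption: the input is encoded as UTF-8 before hashing.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // HA1 as described in RFC 2617: MD5(user:realm:pwd).
    static String ha1(String user, String realm, String pwd) {
        return md5Hex(user + ":" + realm + ":" + pwd);
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not YaCy defaults.
        System.out.println(ha1("admin", "example-realm", "secret"));
    }
}
```

Because the realm is part of the hashed string, a stored HA1 value is only valid for the realm it was computed with; changing the realm would require recomputing the hash from the clear-text password.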