Commit Graph

  • 6944225037 - add GSA search /gsa/search servlet for Jetty to Server init - include SecurityHandler check for /gsa/ /solr/ - change one more YaCyDefaultServlet dependency from Jetty to std. javax.Servlet reger 2013-10-30 23:11:36 +01:00
  • ec3c0582ae update Maven pom and jar dependencies reger 2013-10-30 01:13:12 +01:00
  • 53cb30a221 reduce logging (by assigning logger to existing logger) - small additional cleanups reger 2013-10-30 00:51:04 +01:00
  • 332c6d4fe1 reactivate Domain handler for .yacy / .yacyh handling reger 2013-10-27 19:15:20 +01:00
  • b1ce70434e resolve merge conflict - add missing import statement reger 2013-10-27 15:24:04 +01:00
  • 7869a4c070 Merge origin/master into jetty - merge conflict resolve reger 2013-10-27 15:12:17 +01:00
  • f017066197 Merge origin/master into jetty reger 2013-10-27 15:09:24 +01:00
  • 06da6f517c add YaCyProxyServlet to handle /proxy.html?url=proxyurl - based on Jetty ProxyServlet - at this time use existing HTTPD ProxyHandler for url rewrite - add jetty-client jar (dependency in Jetty ProxyServlet) reger 2013-10-27 05:04:24 +01:00
  • 69599566f9 catch one more malformed url in proxy url rewrite reger 2013-10-27 04:42:33 +01:00
  • 605530fec5 catch proxy url rewrite exception malformed url (" http:\/\/" ) may cause error response testcase http://localhost:8090/proxy.html?url=http://dictionary.reference.com/browse/test reger 2013-10-27 04:06:11 +01:00
  • aaa945518d next intermediate release 1.64 orbiter 2013-10-26 01:31:26 +02:00
  • 25951cee14 - fixed opensearchdescription, this delivered an url with missing 'global' option - added display=2 to compare_yacy to remove the superfluous border Michael Peter Christen 2013-10-26 00:34:55 +02:00
  • f1bfe64361 integrated startpage to compare_yacy Michael Peter Christen 2013-10-26 00:33:36 +02:00
  • 2f57327f20 added boolean load property to CacheResource_p servlet which causes the servlet to load the page from the web. Michael Peter Christen 2013-10-26 00:15:25 +02:00
  • 9bb7eab389 hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit. Michael Peter Christen 2013-10-25 15:05:30 +02:00
  • 3c3cb78555 - removed a lot of garbage and bloated code from GuiHandler. - transformed log lines to String before they are stored because the storage space is about 1:250 (45kb for one line before transformation, 180 bytes afterwards) - this saves up to 10MB RAM so we can increase the number of lines to 1000 again. orbiter 2013-10-24 20:42:34 +02:00
  • 5afa6e3aee Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB. Michael Peter Christen 2013-10-24 17:39:50 +02:00
  • 030d0776ff Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250 Michael Peter Christen 2013-10-24 16:20:20 +02:00
  • 6aabc4e5c8 reduced logging line memory, 10000 lines had filled up 450MB! grrr. (thank you, a bomb from the past) Michael Peter Christen 2013-10-24 16:17:53 +02:00
  • 1a8783147b enhanced computation of number of solr documents. Michael Peter Christen 2013-10-24 15:48:05 +02:00
  • 4948c39e48 added concurrency for mass crawl check Michael Peter Christen 2013-10-23 11:27:19 +02:00
  • 1b4fa2947d - fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html) Michael Peter Christen 2013-10-23 00:16:54 +02:00
  • 82621bead0 When doing bootstrapping, always accept one seedlist-File without checking the date of the file. This should help to start the peer in case that the user has a completely wrong date setting. Michael Peter Christen 2013-10-22 15:34:51 +02:00
  • 16e3b357b3 replaced old tag cloud and adopted design a bit Michael Peter Christen 2013-10-22 14:20:17 +02:00
  • dc38d35986 added matching in url field in Table_API_p search Michael Peter Christen 2013-10-22 12:46:10 +02:00
  • 691d7e70fa added hint to development/commit rss feed Michael Peter Christen 2013-10-21 15:16:29 +02:00
  • b81859c751 Show an RSS icon in the right top corner of search results. This replaces the 'API' icon which was the link for the opensearch result which is an extension of RSS. Since it is more appropriate to visualize an RSS link with an RSS icon, this API icon was changed here. Michael Peter Christen 2013-10-21 15:10:58 +02:00
  • 1a09771be8 fixed sitemap crawl start Michael Peter Christen 2013-10-21 12:49:32 +02:00
  • b743e6d79f - prevent crawl filters from having empty (never-match) content - rewrite the description of the options "Restrict to start domain(s)" and "Restrict to sub-path(s)" to an explanation that the restriction applies to all links in the link list of the option "From Link-List of URL" if this option is selected - allow "Restrict to sub-path(s)" if the "From Link-List of URL" is selected. This is supported in the crawl start. orbiter 2013-10-18 14:14:13 +02:00
  • 20bbde8665 fix for mustmatch regex computation: result had correct semantics, but may have contained multiple same expressions within the disjunction of domain-restrictions. This fix removes the redundant restrictions and makes the regex shorter. orbiter 2013-10-18 13:55:37 +02:00
  • cb2dbcb843 add graceful Jetty shutdown option - as Jetty stop is not synced, yet - include jetty jars and servlet-3.0 api jar in Eclipse .classpath reger 2013-10-18 00:42:38 +02:00
  • f597fdb602 make it easier to filter properties (case insensitive) orbiter 2013-10-17 18:36:35 +02:00
  • f46c723398 allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking reger 2013-10-17 03:34:22 +02:00
  • da4ff5aefa add YaCy HttpCommand "authenticate" check to DefaultServlet reger 2013-10-17 00:06:17 +02:00
  • c833d02cf5 fixed webgraph postprocessing (did nothing and repeated to do this...) Michael Peter Christen 2013-10-16 11:49:04 +02:00
  • 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also without the harvestingkey, remove crawl profiles after postprocessing, speed-up for clickdepth computation. Michael Peter Christen 2013-10-16 11:27:06 +02:00
  • 299f51cb7f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-10-16 04:26:19 +02:00
  • 1adb4b8741 merge rc1/master reger 2013-10-16 03:02:21 +02:00
  • e7a596afda Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-10-16 02:28:13 +02:00
  • 37d24f3318 make use of declared static string ACTION_LOCATION reger 2013-10-16 02:25:39 +02:00
  • 77a73c7475 add YaCy HttpCommand "location" check to DefaultServlet reger 2013-10-16 01:48:44 +02:00
  • 7b69c438f7 more methods for the table class Michael Peter Christen 2013-10-15 16:46:59 +02:00
  • 820b896146 Replaced the inframe loading from yacy.net for donations with the loading of this iframe from the local host. To make this more flexible, this iframe is loaded once after startup from yacy.net. Michael Peter Christen 2013-10-15 16:46:06 +02:00
  • cc223b14a4 remove wrong content mod in SSI parser for virtual path /currentyacypeer/ (is handled on start of request handling) reger 2013-10-15 03:25:24 +02:00
  • dfb73c9519 bump to httpclient-4.3.1 - a bugfix release sixcooler 2013-10-14 23:32:24 +02:00
  • 5606291574 fix last commit (not needed test of GZipInputStream) reger 2013-10-14 04:29:34 +02:00
  • f9eed8cb44 add support for gzip encoded multipart forms (needed for transferRWI.html) - quick and dirty reuse of existing HTTPDemon implementation reger 2013-10-14 04:18:52 +02:00
  • cf32a92629 - add size check to multipart form data handling of YaCyDefaultServlet (same as in HTTPDemon.parseMultipart) - reduce Jetty logging - give build.run a bit more memory (set to YaCy.default 600m from 512m) reger 2013-10-13 20:56:03 +02:00
  • 705f147820 - add localpeername.yacy to list of local address detection for AbstractRemoteHandler - use proxy via header info as in legacy proxy handler reger 2013-10-13 18:06:42 +02:00
  • 0d4efabaa8 fix YaCy version string in proxy headers (config parameter vString no longer used) reger 2013-10-13 17:56:53 +02:00
  • 2226189743 disable domainhandler due to error - domainhandler causes closed response output stream in following handlers on addresses resolved to local peer (like in hello protocol preventing peer to switch to senior peer) reger 2013-10-13 07:24:33 +02:00
  • eea504c117 update Info.plist - small DefaultServlet refactoring reger 2013-10-12 23:01:14 +02:00
  • a44eede8b8 merge rc1/master reger 2013-10-11 01:50:25 +02:00
  • d9a02ed277 NPE fix for my last commit sixcooler 2013-10-11 00:44:04 +02:00
  • 54a0272338 searchpage javascript (latestinfo) causes reset of search statistic after moving to next page - disabled call via setTimeout in yacysearch.html reger 2013-10-10 23:23:58 +02:00
  • 61f627eb85 fix for ssl-connections from proxy-usage staying in close-wait-state + some extra 'close' in HttpClient sixcooler 2013-10-10 20:57:37 +02:00
  • 91fa99e9bb added new icon/image for latest commit Michael Peter Christen 2013-10-09 22:07:59 +02:00
  • 9fac9249bc - replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible. Michael Peter Christen 2013-10-09 22:07:32 +02:00
  • 0f6db6ad5b Merge remote-tracking branch 'jensbees/crawlexpert-post' Michael Peter Christen 2013-10-09 21:32:27 +02:00
  • 3fcf7a94c5 rolling back wrong merge bhoerdzn 2013-10-09 21:06:11 +02:00
  • 3252c1ec39 Merge upstream/master into crawlexpert-post Jens Bertram 2013-10-09 20:49:14 +02:00
  • d328cc4a83 fix for didyoumean, added also more asian alphabets Michael Peter Christen 2013-10-09 16:17:50 +02:00
  • 90c8577840 enhanced ranking; patches to replace old ranking Michael Peter Christen 2013-10-09 15:10:03 +02:00
  • 9f6b98d374 Merge master into crawlexpert-post Jens Bertram 2013-10-09 14:39:20 +02:00
  • 6e33be4ce6 reverting local changes to project.xml bhoerdzn 2013-10-09 14:23:06 +02:00
  • a3824dfbaa check URL on initial load, if set bhoerdzn 2013-10-09 13:52:44 +02:00
  • 52f49d475b add a hidden field for "crawlingstart" since jQuery omits the submit button value bhoerdzn 2013-10-09 13:38:20 +02:00
  • b0c0ec2dec link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler" bhoerdzn 2013-10-09 12:55:42 +02:00
  • d64d45361c use integer types for boolean values bhoerdzn 2013-10-09 12:42:04 +02:00
  • eda123d6fd remove debugging code intercepting post requests bhoerdzn 2013-10-09 11:51:07 +02:00
  • 5057f27bbd fix typo in parsing "cachePolicy" parameter bhoerdzn 2013-10-09 11:41:15 +02:00
  • 98f5c9018d Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load. bhoerdzn 2013-10-09 11:32:17 +02:00
  • a6a62986d4 correct state handling for country code restriction bhoerdzn 2013-10-09 10:42:35 +02:00
  • 4066b85155 correctly set initial state for load filters bhoerdzn 2013-10-09 10:36:08 +02:00
  • 8c91c3e7cd set form boolean values to 0 & 1 instead of false & true bhoerdzn 2013-10-09 10:05:51 +02:00
  • c27fabc88e fixed wrong parameter check bhoerdzn 2013-10-09 10:00:16 +02:00
  • 2214bf5396 Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation. bhoerdzn 2013-10-09 09:48:00 +02:00
  • e74f548551 make legacy http server (serverCore) implement YaCyHttpServer interface reger 2013-10-09 01:07:22 +02:00
  • 71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modularize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets reger 2013-10-09 00:40:48 +02:00
  • 1b61bd40ed - Added new solr field url_file_name_tokens_t which stores the file name tokens. This can be used to enhance the ranking. - Added also a rating_i field as basis for later usage. - enhanced the tokenization process. Michael Peter Christen 2013-10-08 23:48:13 +02:00
  • 6efa7532d2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-10-08 19:04:57 +02:00
  • 5f5a97bafc added the anchor text within web pages to the searchable entities of a web page. This can be of benefit for the ranking if these fields are used for boosts. orbiter 2013-10-08 18:41:07 +02:00
  • 705b3338ee list more fields available for search and for ranking boosts orbiter 2013-10-08 18:15:35 +02:00
  • d536092fe4 fix false fill of NAME_CACHE_MISS-DNS-Cache in case of a timeout, e.g. caused by massive requests when crawling from file sixcooler 2013-10-08 18:02:42 +02:00
  • 405878182f Use list template for all other option lists. Fixed some template expressions. bhoerdzn 2013-10-08 15:04:31 +02:00
  • 8e74098cd4 Use list template for "reloadIfOlderNumber". bhoerdzn 2013-10-08 13:26:09 +02:00
  • 52bad7b908 Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields. bhoerdzn 2013-10-08 13:24:27 +02:00
  • 78e7aadb26 removed unused initialization method Michael Peter Christen 2013-10-07 23:51:28 +02:00
  • e56aa4fe93 fixed search navigation Michael Peter Christen 2013-10-07 23:51:08 +02:00
  • 4fbc4740df removed warnings Michael Peter Christen 2013-10-07 23:41:50 +02:00
  • 202a9fbdad adding synonyms from German OpenThesaurus ready for use in YaCy Lotus 2013-10-07 22:02:42 +02:00
  • 21aa6a0321 migration to Solr 4.5.0 Michael Peter Christen 2013-10-07 17:09:40 +02:00
  • 45cf553bc3 try to guess default crawling mode, if none set bhoerdzn 2013-10-07 13:13:22 +02:00
  • b4f0c822f2 assign strings before checking contents bhoerdzn 2013-10-07 13:01:39 +02:00
  • ef31d0f279 fix for rss reader, see http://bugs.yacy.net/view.php?id=294 Michael Peter Christen 2013-10-07 12:59:54 +02:00
  • 499abe8f91 set default values for string parameters bhoerdzn 2013-10-07 12:32:23 +02:00
  • 85316b3ac6 Merge branch 'master' into crawlexpert-post Jens Bertram 2013-10-07 12:02:52 +02:00
  • 42ea56eaad made CrawlStartExpert_p aware of post variables; extended template where needed bhoerdzn 2013-10-07 11:25:59 +02:00
  • 101a6e6e14 Patch the citation index for links with canonical tags. This shall fulfill the following requirement: If a document A links to B and B contains a 'canonical C', then the citation rank computation shall consider that A links to C and B does not link to C. To do so, we first must collect all canonical links, find all references to them, get the anchor list of the documents and patch the citation reference of these links. Michael Peter Christen 2013-10-07 11:15:58 +02:00
  • daebeb93aa add call to AccessTracker to jetty security handler reger 2013-10-04 01:16:17 +02:00