Commit Graph

  • c152d996e6 reduced footprint of BookmarksDB which can take quite a lot of memory if the number of bookmarks is high (i.e. > 2000 URLs) Michael Peter Christen 2013-11-07 10:55:02 +01:00
  • 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes it completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes. Michael Peter Christen 2013-11-07 10:01:44 +01:00
  • 7b17cdf6dd add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reason) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type reger 2013-11-07 03:11:03 +01:00
  • 082c9a98c1 move writeHeaders from Jetty8 servlet to YaCyDefaultServlet - after removing Jetty server dependency (of Response using HttpServletResponse only) reger 2013-11-07 00:32:21 +01:00
  • 987f410011 URL-export:add query and fix for cast-class-exception sixcooler 2013-11-06 19:22:26 +01:00
  • ffe8276063 replaced referrer link masking to 'pure' links to the referring page (that was more useful during testing) Michael Peter Christen 2013-11-06 18:05:46 +01:00
  • a8253ca49c added missing unicode transformation in href link contents during parsing Michael Peter Christen 2013-11-06 18:05:02 +01:00
  • 0cf9e9580b added clickdepth and CR computation debug code to verify that the process is complete Michael Peter Christen 2013-11-06 15:01:40 +01:00
  • 7f768b42d3 we do not need the load-image flag any more since this is now controlled by parser switches Michael Peter Christen 2013-11-06 15:00:57 +01:00
  • b85f702f22 add AccessTracker logging to SolrServlet reger 2013-11-05 22:57:55 +01:00
  • de1f02420b implement HtmlResponseWriter to solrServlet (and rss / opensearch responsewriter) as in yacy select servlet. - set contenttype of HTML/GrepHTML-Responsewriter to "text/html" - set a contenttype to GSAsearchServlet reger 2013-11-04 21:11:12 +01:00
  • 234a974955 load image only if their parser flag is activated Michael Peter Christen 2013-11-04 11:59:28 +01:00
  • b2c329929f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-11-04 10:18:52 +01:00
  • 60187a4ec2 fix in html parser Michael Peter Christen 2013-11-04 10:16:20 +01:00
  • e1c1e57877 less overhead calling exist() with only one hash Michael Peter Christen 2013-11-04 09:37:31 +01:00
  • 3d5d366f1c fix html header in Solr HTMLResponseWriter - move 1st body content after </head> tag - add closing <span> tag reger 2013-11-04 03:12:02 +01:00
  • bfdb404867 implement a Jetty reconnect to work with Configbasic_p.html port change - instead of shutting down the server it should be sufficient to manipulate the Jetty http connector reger 2013-11-03 21:34:21 +01:00
  • 5a02d650ee avoid cloning Michael Peter Christen 2013-11-03 18:31:50 +01:00
  • 8ec350bad2 upd Maven pom (take back introduced java-templates) to handle filtering of yacyBuildProperties.java. To keep it compatible with ant filter directly from original source/.... location. reger 2013-11-03 02:38:36 +01:00
  • d6760df3e5 fix servlet class exist check to use default path only (in Jetty8YaCyDefaultServlet) - del redundant doget code in yacydefaultservlet - small declaration code opts - del obsolete libt/proxyservlet.java reger 2013-11-03 02:26:00 +01:00
  • 9da87c0c7f update Maven build script - use current YaCy version number - make use of libbuild\GitRevMavenTask (maven-plugin-gitrevisionnumber) - make yacyBuildProperties.java available for source filtering by Maven-plugin (copy to libbuild\java-templates) - update assembly definition to include lib\yacycore.jar without version number (needed this way by startupscript) reger 2013-11-02 06:27:18 +01:00
  • 62c591ffd1 add Maven plugin to return a YaCy style Git repository build release number and timestamp - it injects properties which can be used in pom via ${DSTAMP} ${releaseNr} if added as plugin via <plugin> <groupId>net.yacy</groupId> <artifactId>maven-plugin-gitrevisionnumber</artifactId> <version>1.0</version> <executions><execution> <phase>initialize</phase> <goals><goal>create</goal></goals> </execution></executions> </plugin> reger 2013-11-02 02:33:06 +01:00
  • b38de92a16 Merge origin/master into jetty reger 2013-11-02 00:48:42 +01:00
  • a09e70cd68 fix typo in GitRevTask (branch) reger 2013-11-02 00:18:24 +01:00
  • cc39667399 Speed enhancements and less CPU usage during Solr searches when using the embedded Solr (the default). This was obtained by circumventing solrj search encapsulation and the implementation of direct index access methods to Solr. The effect will not only be seen during search, but this has also a strong effect on suggestions (much more) and less CPU power usage during index distribution (which needs many search requests) Michael Peter Christen 2013-11-01 17:24:36 +01:00
  • 434e13b46d in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!) Michael Peter Christen 2013-11-01 13:30:53 +01:00
  • 176acce5cb version number change for next development cycle orbiter 2013-10-31 16:20:33 +01:00
  • 1ac504ae51 use html encoding for urls in metadata orbiter 2013-10-31 16:16:29 +01:00
  • 6944225037 - add GSA search /gsa/search servlet for Jetty to Server init - include SecurityHandler check for /gsa/ /solr/ - change one more YaCyDefaultServlet dependency from Jetty to std. javax.Servlet reger 2013-10-30 23:11:36 +01:00
  • ec3c0582ae update Maven pom and jar dependencies reger 2013-10-30 01:13:12 +01:00
  • 53cb30a221 reduce logging (by assigning logger to existing logger) - small additional cleanups reger 2013-10-30 00:51:04 +01:00
  • 332c6d4fe1 reactivate Domain handler for .yacy / .yacyh handling reger 2013-10-27 19:15:20 +01:00
  • b1ce70434e resolve merge conflict - add missing import statement reger 2013-10-27 15:24:04 +01:00
  • 7869a4c070 Merge origin/master into jetty - merge conflict resolve reger 2013-10-27 15:12:17 +01:00
  • f017066197 Merge origin/master into jetty reger 2013-10-27 15:09:24 +01:00
  • 06da6f517c add YaCyProxyServlet to handle /proxy.html?url=proxyurl - based on Jetty ProxyServlet - at this time use existing HTTPD ProxyHandler for url rewrite - add jetty-client jar (dependency in Jetty ProxyServlet) reger 2013-10-27 05:04:24 +01:00
  • 69599566f9 catch one more malformed url in proxy url rewrite reger 2013-10-27 04:42:33 +01:00
  • 605530fec5 catch proxy url rewrite exception malformed url (" http:\/\/" ) may cause error response testcase http://localhost:8090/proxy.html?url=http://dictionary.reference.com/browse/test reger 2013-10-27 04:06:11 +01:00
  • aaa945518d next intermediate release 1.64 orbiter 2013-10-26 01:31:26 +02:00
  • 25951cee14 - fixed opensearchdescription, this delivered an url with missing 'global' option - added display=2 to compare_yacy to remove the superfluous border Michael Peter Christen 2013-10-26 00:34:55 +02:00
  • f1bfe64361 integrated startpage to compare_yacy Michael Peter Christen 2013-10-26 00:33:36 +02:00
  • 2f57327f20 added boolean load property to CacheResource_p servlet which causes that the servlet loads the page from the web. Michael Peter Christen 2013-10-26 00:15:25 +02:00
  • 9bb7eab389 hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit. Michael Peter Christen 2013-10-25 15:05:30 +02:00
  • 3c3cb78555 - removed a lot of garbage and bloated code from GuiHandler. - transformed log lines to String before they are stored because the storage space is about 1:250 (45kb for one line before transformation, 180 bytes afterwards) - this saves up to 10MB RAM so we can increase the number of lines to 1000 again. orbiter 2013-10-24 20:42:34 +02:00
  • 5afa6e3aee Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB. Michael Peter Christen 2013-10-24 17:39:50 +02:00
  • 030d0776ff Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250 Michael Peter Christen 2013-10-24 16:20:20 +02:00
  • 6aabc4e5c8 reduced logging line memory, 10000 lines had filled up 450MB! grrr. (thank you, a bomb from the past) Michael Peter Christen 2013-10-24 16:17:53 +02:00
  • 1a8783147b enhanced computation of number of solr documents. Michael Peter Christen 2013-10-24 15:48:05 +02:00
  • 4948c39e48 added concurrency for mass crawl check Michael Peter Christen 2013-10-23 11:27:19 +02:00
  • 1b4fa2947d - fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html) Michael Peter Christen 2013-10-23 00:16:54 +02:00
  • 82621bead0 When doing bootstrapping, always accept one seedlist-File without checking the date of the file. This should help to start the peer in case that the user has a completely wrong date setting. Michael Peter Christen 2013-10-22 15:34:51 +02:00
  • 16e3b357b3 replaced old tag cloud and adopted design a bit Michael Peter Christen 2013-10-22 14:20:17 +02:00
  • dc38d35986 added matching in url field in Table_API_p search Michael Peter Christen 2013-10-22 12:46:10 +02:00
  • 691d7e70fa added hint to development/commit rss feed Michael Peter Christen 2013-10-21 15:16:29 +02:00
  • b81859c751 Show an RSS icon in the right top corner of search results. This replaces the 'API' icon which was the link for the opensearch result which is an extension of RSS. Since it is more appropriate to visualize an RSS link with an RSS icon, this API icon was changed here. Michael Peter Christen 2013-10-21 15:10:58 +02:00
  • 1a09771be8 fixed sitemap crawl start Michael Peter Christen 2013-10-21 12:49:32 +02:00
  • b743e6d79f - prevent that crawl filter have empty (never-match) content - rewrite the description of the options "Restrict to start domain(s)" and "Restrict to sub-path(s)" to an explanation, that the restriction applies to all links in the link list of the option "From Link-List of URL" if this option is selected - allow "Restrict to sub-path(s)" if the "From Link-List of URL" is selected. This is supported in the crawl start. orbiter 2013-10-18 14:14:13 +02:00
  • 20bbde8665 fix for mustmatch regex computation: result had correct semantic, but may have contained multiple same expressions within the disjunction of domain-restrictions. This fix removes the redundant restrictions and makes the regex shorter. orbiter 2013-10-18 13:55:37 +02:00
  • cb2dbcb843 add graceful Jetty shutdown option - as Jetty stop is not synced, yet - include jetty jars and servlet-3.0 api jar in Eclipse .classpath reger 2013-10-18 00:42:38 +02:00
  • f597fdb602 make it easier to filter properties (case insensitive) orbiter 2013-10-17 18:36:35 +02:00
  • f46c723398 allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking reger 2013-10-17 03:34:22 +02:00
  • da4ff5aefa add YaCy HttpCommand "authenticate" check to DefaultServlet reger 2013-10-17 00:06:17 +02:00
  • c833d02cf5 fixed webgraph postprocessing (did nothing and repeated to do this...) Michael Peter Christen 2013-10-16 11:49:04 +02:00
  • 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also without the harvestingkey, remove crawl profiles after postprocessing, speed-up for clickdepth computation. Michael Peter Christen 2013-10-16 11:27:06 +02:00
  • 299f51cb7f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-10-16 04:26:19 +02:00
  • 1adb4b8741 merge rc1/master reger 2013-10-16 03:02:21 +02:00
  • e7a596afda Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2013-10-16 02:28:13 +02:00
  • 37d24f3318 make use of declared static string ACTION_LOCATION reger 2013-10-16 02:25:39 +02:00
  • 77a73c7475 add YaCy HttpCommand "location" check to DefaultServlet reger 2013-10-16 01:48:44 +02:00
  • 7b69c438f7 more methods for the table class Michael Peter Christen 2013-10-15 16:46:59 +02:00
  • 820b896146 Replaced the inframe loading from yacy.net for donations with the loading of this iframe from the local host. To make this more flexible, this iframe is loaded once after startup from yacy.net. Michael Peter Christen 2013-10-15 16:46:06 +02:00
  • cc223b14a4 remove wrong content mod in SSI parser for virtual path /currentyacypeer/ (is handled on start of request handling) reger 2013-10-15 03:25:24 +02:00
  • dfb73c9519 bump to httpclient-4.3.1 - a bugfix release sixcooler 2013-10-14 23:32:24 +02:00
  • 5606291574 fix last commit (not needed test of GZipInputStream) reger 2013-10-14 04:29:34 +02:00
  • f9eed8cb44 add support for gzip encoded multipart forms (needed for transferRWI.html) - quick and dirty reuse of existing HTTPDemon implementation reger 2013-10-14 04:18:52 +02:00
  • cf32a92629 - add size check to multipart form data handling of YaCyDefaultServlet (same as in HTTPDemon.parseMultipart) - reduce Jetty logging - give build.run a bit more memory (set to YaCy.default 600m from 512m) reger 2013-10-13 20:56:03 +02:00
  • 705f147820 - add localpeername.yacy to list of local address detection for AbstractRemoteHandler - use proxy via header info as in legacy proxy handler reger 2013-10-13 18:06:42 +02:00
  • 0d4efabaa8 fix YaCy version string in proxy headers (config parameter vString no longer used) reger 2013-10-13 17:56:53 +02:00
  • 2226189743 disable domainhandler due to error - domainhandler causes closed response output stream in following handlers on addresses resolved to local peer (like in hello protocol preventing peer to switch to senior peer) reger 2013-10-13 07:24:33 +02:00
  • eea504c117 update Info.plist small DefaultServlet refactoring reger 2013-10-12 23:01:14 +02:00
  • a44eede8b8 merge rc1/master reger 2013-10-11 01:50:25 +02:00
  • d9a02ed277 NPE fix for my last commit sixcooler 2013-10-11 00:44:04 +02:00
  • 54a0272338 searchpage javascript (latestinfo) causes reset of search statistic after moving to next page - disabled call via setTimeout in yacysearch.html reger 2013-10-10 23:23:58 +02:00
  • 61f627eb85 fix for ssl-connections from proxy-usage staying in close-wait-state + some extra 'close' in HttpClient sixcooler 2013-10-10 20:57:37 +02:00
  • 91fa99e9bb added new icon/image for latest commit Michael Peter Christen 2013-10-09 22:07:59 +02:00
  • 9fac9249bc - replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible. Michael Peter Christen 2013-10-09 22:07:32 +02:00
  • 0f6db6ad5b Merge remote-tracking branch 'jensbees/crawlexpert-post' Michael Peter Christen 2013-10-09 21:32:27 +02:00
  • 3fcf7a94c5 rolling back wrong merge bhoerdzn 2013-10-09 21:06:11 +02:00
  • 3252c1ec39 Merge upstream/master into crawlexpert-post Jens Bertram 2013-10-09 20:49:14 +02:00
  • d328cc4a83 fix for didyoumean, added also more asian alphabets Michael Peter Christen 2013-10-09 16:17:50 +02:00
  • 90c8577840 enhanced ranking; patches to replace old ranking Michael Peter Christen 2013-10-09 15:10:03 +02:00
  • 9f6b98d374 Merge master into crawlexpert-post Jens Bertram 2013-10-09 14:39:20 +02:00
  • 6e33be4ce6 reverting local changes to project.xml bhoerdzn 2013-10-09 14:23:06 +02:00
  • a3824dfbaa check URL on initial load, if set bhoerdzn 2013-10-09 13:52:44 +02:00
  • 52f49d475b add a hidden field for "crawlingstart" since jQuery omits the submit button value bhoerdzn 2013-10-09 13:38:20 +02:00
  • b0c0ec2dec link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler" bhoerdzn 2013-10-09 12:55:42 +02:00
  • d64d45361c use integer types for boolean values bhoerdzn 2013-10-09 12:42:04 +02:00
  • eda123d6fd remove debugging code intercepting post requests bhoerdzn 2013-10-09 11:51:07 +02:00
  • 5057f27bbd fix typo in parsing "cachePolicy" parameter bhoerdzn 2013-10-09 11:41:15 +02:00
  • 98f5c9018d Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load. bhoerdzn 2013-10-09 11:32:17 +02:00