Commit Graph

  • 957f6297fb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-11-30 01:46:03 +01:00
  • 1e94719084 fix NPE on mime detection of unknown file extension reger 2013-11-29 23:23:47 +01:00
  • effea4bca0 Merge origin/master into jetty reger 2013-11-29 22:39:52 +01:00
  • b49e90d2e9 remove reference to solrServlet from YaCy servlet select - reference is not used - solrServlet is used in Jetty branch and adjustments there conflict with unused solrServlet here. reger 2013-11-29 22:10:14 +01:00
  • 38e1e3a707 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-11-29 02:46:38 +01:00
  • 2c2ebb0d92 tried some hardening in order not to leave any Solr searchers open sixcooler 2013-11-29 02:40:12 +01:00
  • cca79d12ef setting of some default values to make a client development start easy using the description at http://www.yacy-websuche.de/wiki/index.php/Dev:APIhello Michael Peter Christen 2013-11-29 01:28:48 +01:00
  • a16534cb0a tried to fix timeout and connection-lost problems when using an outside solr. Michael Peter Christen 2013-11-28 01:31:53 +01:00
  • c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over to second solr core in case of unrecoverable OOM. Michael Peter Christen 2013-11-28 01:10:25 +01:00
  • 9932c441c8 fixed a problem with Date fields parsing Solr results if a remote Solr is attached. Michael Peter Christen 2013-11-28 00:54:53 +01:00
  • 94db054aff memory-leak-fix: the DocListSearcher fires a query in its constructor and it is highly recommended to close every SolrRequest. Every request which is not closed leaves a Searcher with its caches which cannot be garbage-collected. sixcooler 2013-11-27 19:07:36 +01:00
  • 26bb1e37b7 implement core selection in SolrServlet - making initcore() obsolete reger 2013-11-27 02:51:02 +01:00
  • ae55d69ef6 include/exclude size NPE fix (recently added) Michael Peter Christen 2013-11-26 11:47:04 +01:00
  • 3d4b5e66ce disallow remote robots to crawl the HostBrowser servlet Michael Peter Christen 2013-11-26 07:06:25 +01:00
  • 234ca720f5 only admins should be able to force a commit Michael Peter Christen 2013-11-26 07:03:20 +01:00
  • 2c39b65409 fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to prevent words from being deleted from the sets from outside of the QueryGoal class. Michael Peter Christen 2013-11-26 02:24:47 +01:00
  • 5592ea57f0 hack to remove compiler warnings about deprecated classes. It would be better to remove the deprecated usage but to do this the Solr core must adopt the latest apache http core changes as well .. this is not our fault. Michael Peter Christen 2013-11-25 23:30:35 +01:00
  • 037cd0a57c using the BinaryResponseWriter which is supported within the YaCy solr servlet since YaCy 1.63. This is much more performant for the client than using the XMLResponseWriter because parsing of XML data is very CPU intensive. Older YaCy peers are still requested using the XMLResponseWriter but the majority of YaCy peers already respond with the binary writer. This makes remote searches much faster and less CPU intensive. orbiter 2013-11-25 21:31:40 +01:00
  • 61409788eb less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form. orbiter 2013-11-25 15:20:54 +01:00
  • f23471c471 add check to prevent index entries containing url_file_ext_s with ";jsession=xyz" note: check could be implemented in MultiProtocolURL (but at this time could not assess possible implications) reger 2013-11-25 00:14:53 +01:00
  • 5c4a3d1c01 Merge origin/master into jetty reger 2013-11-24 21:00:39 +01:00
  • 444a9ae674 remove unused options and attributes from DefaultServlet cleanup obsolete class files reger 2013-11-24 20:11:39 +01:00
  • 8da75a4b0c fix contentType definition for Solr html responsewriter from xml to html (hint: value is currently not used, but is in SolrServlet) reger 2013-11-24 04:31:08 +01:00
  • caa20d63d9 fixed seedlist (hash was missing) Michael Peter Christen 2013-11-22 14:15:52 +01:00
  • ccf2f4e43b refactoring of seed attributes (introduced more constants) Michael Peter Christen 2013-11-22 14:15:31 +01:00
  • 1f0bfa8fec added test to Base64Order (runs successfully!) Michael Peter Christen 2013-11-22 10:38:42 +01:00
  • c927b428d3 fixed json Michael Peter Christen 2013-11-22 10:07:08 +01:00
  • 64048ff217 fix for XSS Michael Peter Christen 2013-11-22 09:53:32 +01:00
  • b7f1e5af51 added new servlet which generates the same file as the principal peers upload to a bootstrap position you can call it either with http://localhost:8090/yacy/seedlist.html or to generate json (or jsonp) with http://localhost:8090/yacy/seedlist.json http://localhost:8090/yacy/seedlist.json?callback=seedlist orbiter 2013-11-19 15:56:10 +01:00
  • 3e552550d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-11-18 22:48:00 +01:00
  • c2d720cdaf purge a lucene cache - possible memory leak fix orbiter 2013-11-18 22:47:35 +01:00
  • e4f49fb175 for searchresults with empty title use filename as title - to not store a title in index which isn't extracted from source the title is empty check only added to ResultEntry class reger 2013-11-18 19:41:31 +01:00
  • b1dc9a6f52 - disable Jetty servlet defaultUseCache (prevent double caching) - include short memory status check for class cache in DefaultServlet - remove obsolete Resource interface for Jetty8YaCyDefaultServlet reger 2013-11-18 03:15:45 +01:00
  • f111f30ace Merge origin/master into jetty reger 2013-11-17 00:18:25 +01:00
  • f4172cbb3d fix for another XSS bug Michael Peter Christen 2013-11-17 00:17:25 +01:00
  • 94293176a3 use writeOptionHeaders with ServletResponse parameter only reger 2013-11-17 00:02:08 +01:00
  • ff86cb683f fixed some XSS bugs reported by Marius from http://ctf365.com/ orbiter 2013-11-16 20:34:31 +01:00
  • da33ee0d77 extended also timeout for webgraph postprocessing orbiter 2013-11-16 18:30:06 +01:00
  • 74f9e40747 extended timeout during postprocessing of 30 minutes. orbiter 2013-11-16 18:29:08 +01:00
  • 19a051bec8 more monitoring for postprocessing and enhanced layout in Crawler monitor page orbiter 2013-11-16 18:23:14 +01:00
  • 9cf9727685 fix for wrong counter Michael Peter Christen 2013-11-16 11:33:35 +01:00
  • fceac8cffd more monitoring for postprocessing Michael Peter Christen 2013-11-16 08:23:42 +01:00
  • 6842783761 fixed and enhanced postprocessing Michael Peter Christen 2013-11-16 08:23:21 +01:00
  • 219d5934a4 fixed termination bug in Solr Connector Michael Peter Christen 2013-11-16 08:22:29 +01:00
  • bf1bdd52a6 prevent requesting of 0-facets (which actually exist) Michael Peter Christen 2013-11-15 15:41:41 +01:00
  • 9d5895f643 enhanced and fixed postprocessing Michael Peter Christen 2013-11-15 15:41:12 +01:00
  • f86fe90eda enhanced mass storage speed to remote solr servers Michael Peter Christen 2013-11-15 15:40:07 +01:00
  • 6ed9821209 fixed several problems in solr connectors Michael Peter Christen 2013-11-15 15:39:35 +01:00
  • 191fd3d7e7 added an optimization option to HandleSet mass data storage structure Michael Peter Christen 2013-11-15 15:38:00 +01:00
  • 94b565ea0d fixed keepalive min value Michael Peter Christen 2013-11-15 15:37:01 +01:00
  • 5ec5be5769 fixed logging for remote solr configuration Michael Peter Christen 2013-11-15 15:36:24 +01:00
  • b26787dc2d - DefaultServlet: remove static gzip option YaCy doesn't use pre-gzip'ed static html pages - ProxyServlet: remove not needed procedure - Server init: skip one overlapping servlet context reger 2013-11-14 01:37:51 +01:00
  • 24a052ecb9 removed debug code for existsByIds Michael Peter Christen 2013-11-13 13:41:18 +01:00
  • 087df05e24 added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off. Michael Peter Christen 2013-11-13 13:38:01 +01:00
  • 1a4a69c226 set more logger to 'final static' Michael Peter Christen 2013-11-13 06:18:48 +01:00
  • c60947360d logger should be static Michael Peter Christen 2013-11-13 06:04:28 +01:00
  • 69b8d61c47 fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.) Michael Peter Christen 2013-11-12 15:54:54 +01:00
  • b085cb522b replaced old existsByIds for embedded Solr with obviously much faster new selection method (including still existing debug code to test that this is in fact better) orbiter 2013-11-11 11:25:01 +01:00
  • 1a6158e338 make test directory available in Maven pom - exclude reference to old slf4j-log4j12 reger 2013-11-10 22:20:35 +01:00
  • b4fdb8c887 cleanup test directory from Jetty 9 implementation samples - current Jetty implementation advances so that it seems not beneficial to keep the code as it makes the test unusable and use of Jetty 9 is due to Java 1.7 dependency not in sight. reger 2013-11-10 22:01:31 +01:00
  • b29d262e70 implement Jetty8HttpServerImpl.generateSocketAddress (code 1:1 copied from serverCore) reger 2013-11-10 18:59:18 +01:00
  • 4234b0ed6c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-11-10 18:50:43 +01:00
  • 909bbb49d8 added (partly commented) test code for url rewrite methods .. to be completed orbiter 2013-11-10 18:50:34 +01:00
  • 74c86a72a0 better default value for crawler user agent orbiter 2013-11-10 18:48:00 +01:00
  • 066a1ecf0a add highlight queryparams to solrservlet if missing - modify query params in Solr parameter map (instead of querystring) reger 2013-11-10 01:36:57 +01:00
  • 899e7e92b0 added debug code Michael Peter Christen 2013-11-09 02:37:12 +01:00
  • a5c1249ee2 reverted autowarming setting in solrconfig Michael Peter Christen 2013-11-09 01:43:44 +01:00
  • 4684330505 Merge origin/master into jetty reger 2013-11-07 21:44:14 +01:00
  • 1437c45383 merge rc1/master reger 2013-11-07 21:30:17 +01:00
  • 87a956e881 calculating and showing the number of files and the average size of a file in the HTCACHE in ConfigHTCache_p.html Michael Peter Christen 2013-11-07 12:13:12 +01:00
  • acc1f8a749 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-11-07 12:01:37 +01:00
  • 81d9e23532 fixed another memory leak in the PDF parser: the class org.apache.pdfbox.pdmodel.font.PDFont occupies 8MB of space which cannot be cleaned if PDFont.clearResources is called. The attempt to clean the class cache therefore causes that the class is loaded and this cache is initialized with some rubbish. I tried to prevent to instantiate this class by usage of a hacked findLoadedClass call to the SystemClassLoader (which is protected ...). Now, without using the PDF parser at all, 8MB of RAM space is not occupied, however, when the first PDF arrives this space will be taken and never given back to GC. WAKE UP YOU LAZY PDFBOX HACKER AND FIX THIS SHIT! Michael Peter Christen 2013-11-07 11:57:01 +01:00
  • c152d996e6 reduced footprint of BookmarksDB which can take quite a lot of memory if the number of bookmarks is high (i.e. > 2000 URLs) Michael Peter Christen 2013-11-07 10:55:02 +01:00
  • 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes them completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes. Michael Peter Christen 2013-11-07 10:01:44 +01:00
  • 7b17cdf6dd add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reason) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type reger 2013-11-07 03:11:03 +01:00
  • 082c9a98c1 move writeHeaders from Jetty8 servlet to YaCyDefaultServlet - after removing Jetty server dependency (of Response using HttpServletResponse only) reger 2013-11-07 00:32:21 +01:00
  • 987f410011 URL-export:add query and fix for cast-class-exception sixcooler 2013-11-06 19:22:26 +01:00
  • ffe8276063 replaced referrer link masking to 'pure' links to the referring page (that was more useful during testing) Michael Peter Christen 2013-11-06 18:05:46 +01:00
  • a8253ca49c added missing unicode transformation in href link contents during parsing Michael Peter Christen 2013-11-06 18:05:02 +01:00
  • 0cf9e9580b added clickdepth and CR computation debug code to verify that the process is complete Michael Peter Christen 2013-11-06 15:01:40 +01:00
  • 7f768b42d3 we do not need the load-image flag any more since this is now controlled by parser switches Michael Peter Christen 2013-11-06 15:00:57 +01:00
  • b85f702f22 add AccessTracker logging to SolrServlet reger 2013-11-05 22:57:55 +01:00
  • de1f02420b implement HtmlResponseWriter to solrServlet (and rss / opensearch responsewriter) as in yacy select servlet. - set contenttype of HTML/GrepHTML-Responsewriter to "text/html" - set a contenttype to GSAsearchServlet reger 2013-11-04 21:11:12 +01:00
  • 234a974955 load image only if their parser flag is activated Michael Peter Christen 2013-11-04 11:59:28 +01:00
  • b2c329929f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-11-04 10:18:52 +01:00
  • 60187a4ec2 fix in html parser Michael Peter Christen 2013-11-04 10:16:20 +01:00
  • e1c1e57877 less overhead calling exist() with only one hash Michael Peter Christen 2013-11-04 09:37:31 +01:00
  • 3d5d366f1c fix html header in Solr HTMLResponseWriter - move 1st body content after </head> tag - add closing <span> tag reger 2013-11-04 03:12:02 +01:00
  • bfdb404867 implement a Jetty reconnect to work with Configbasic_p.html port change - instead of shutting down the server it should be sufficient to manipulate the Jetty http connector reger 2013-11-03 21:34:21 +01:00
  • 5a02d650ee avoid cloning Michael Peter Christen 2013-11-03 18:31:50 +01:00
  • 8ec350bad2 upd Maven pom (take back introduced java-templates) to handle filtering of yacyBuildProperties.java. To keep it compatible with ant filter directly from original source/.... location. reger 2013-11-03 02:38:36 +01:00
  • d6760df3e5 fix servlet class exist check to use default path only (in Jetty8YaCyDefaultServlet) - del redundant doget code in yacydefaultservlet - small declaration code opts - del obsolete libt/proxyservlet.java reger 2013-11-03 02:26:00 +01:00
  • 9da87c0c7f update Maven build script - use current YaCy version number - make use of libbuild\GitRevMavenTask (maven-plugin-gitrevisionnumber) - make yacyBuildProperties.java available for source filtering by Maven-plugin (copy to libbuild\java-templates) - update assembly definition to include lib\yacycore.jar without version number (needed this way by startupscript) reger 2013-11-02 06:27:18 +01:00
  • 62c591ffd1 add Maven plugin to return a YaCy style Git repository build release number and timestamp - it injects properties which can be used in pom via ${DSTAMP} ${releaseNr} if added as plugin via <plugin> <groupId>net.yacy</groupId> <artifactId>maven-plugin-gitrevisionnumber</artifactId> <version>1.0</version> <executions><execution> <phase>initialize</phase> <goals><goal>create</goal></goals> </execution></executions> </plugin> reger 2013-11-02 02:33:06 +01:00
  • b38de92a16 Merge origin/master into jetty reger 2013-11-02 00:48:42 +01:00
  • a09e70cd68 fix typo in GitRevTask (branch) reger 2013-11-02 00:18:24 +01:00
  • cc39667399 Speed enhancements and less CPU usage during Solr searches when using the embedded Solr (the default). This was obtained by circumventing solrj search encapsulation and the implementation of direct index access methods to Solr. The effect will not only be seen during search, but this has also a strong effect on suggestions (much more) and less CPU power usage during index distribution (which needs many search requests) Michael Peter Christen 2013-11-01 17:24:36 +01:00
  • 434e13b46d in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!) Michael Peter Christen 2013-11-01 13:30:53 +01:00
  • 176acce5cb version number change for next development cycle orbiter 2013-10-31 16:20:33 +01:00
  • 1ac504ae51 use html encoding for urls in metadata orbiter 2013-10-31 16:16:29 +01:00