Commit Graph

  • 080d80c9de do not write an empty failreason in case that there is no fail. Because of the lazy instantiation rule this value was not actually written, but if lazy instantiation is switched on, then this causes that all crawl starts delete all crawl-start-hosts completely because this looks for filled error reasons. orbiter 2013-07-26 17:53:28 +02:00
  • 4c242f9af9 always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets Michael Peter Christen 2013-07-25 12:17:29 +02:00
  • 61e015268b fix in forced deletion: forced commit needed Michael Peter Christen 2013-07-25 09:53:19 +02:00
  • 83e2921b39 new test case for http://bugs.yacy.net/view.php?id=141 Michael Peter Christen 2013-07-25 09:31:48 +02:00
  • 304aacb2cc fix for http://bugs.yacy.net/view.php?id=267 Michael Peter Christen 2013-07-25 09:26:24 +02:00
  • c3b2301b2f fix for http://bugs.yacy.net/view.php?id=268 Michael Peter Christen 2013-07-25 09:21:37 +02:00
  • aa1a1f1d2c - small adjustment to make sure genericParser is tried last -- for some documents genericParser grabs document instead of specific available parser due to unordered pick of 1st to try parser (like .ps .rdf files and other) - remove redundant file extension registration reger 2013-07-23 20:24:13 +02:00
  • 3e901dcb06 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-23 19:33:07 +02:00
  • f50b596e0b do not run dht distribution if system load is over 2.5 orbiter 2013-07-23 19:32:32 +02:00
  • 9c681cc00d added segment sizes, postprocessing status and cpu load to crawler monitor orbiter 2013-07-23 19:10:11 +02:00
  • 86b514cf46 added load info to status_p.xml orbiter 2013-07-23 18:20:07 +02:00
  • 056b42f5aa - added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end orbiter 2013-07-23 18:03:33 +02:00
  • 6fb2811e68 fixes for problems with remote solr and non-activated webgraph index orbiter 2013-07-23 16:46:44 +02:00
  • af740f3058 changed optimization to a segment-size of index-size/5.000.000 + one if not idle + one (and force) if postprocessing sixcooler 2013-07-23 14:21:12 +02:00
  • 336f86394c replaced StringBuffer with StringBuilder Michael Peter Christen 2013-07-23 12:21:27 +02:00
  • aeac2fb763 replaced more containsKey() -> get() usages by a simple get(), followed by a test for NULL. This should increase the application speed and reduce the lookup time for the affected methods by 50% Michael Peter Christen 2013-07-23 12:16:51 +02:00
  • 5364c4dcc9 delayed first peer-ping to send the first ping out after the http got up; if the ping comes before the http is up, it cannot be recognized as senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266 orbiter 2013-07-22 18:21:37 +02:00
  • e24016e30a added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254 orbiter 2013-07-22 17:45:12 +02:00
  • c124037f19 removed forced non-soft commits to prevent index fragmentation orbiter 2013-07-22 17:28:20 +02:00
  • 31483c47e1 fixed problem with remote luke requests Michael Peter Christen 2013-07-22 15:55:20 +02:00
  • c15aa758dc removed failreason_t removal patch because that causes too much confusion using an external solr. to clean up the index after a schema change, use the index cleaner function from the online servlet Michael Peter Christen 2013-07-22 14:17:38 +02:00
  • 2b7a38640a extend content type detection on file extension for .tif .tiff .htm reger 2013-07-21 22:57:21 +02:00
  • ac1aad5064 added a getSegmentCount method and use it to disable optimize if wanted current segment count is below optimization level Michael Peter Christen 2013-07-18 14:31:42 +02:00
  • 36035e0a0a - used reger's LukeRequest to generalize the index info in SolrServerConnector - used the LukeRequest in SolrServerConnector to replace the index size method by a getNumDocs request to a LukeRequest result Michael Peter Christen 2013-07-18 13:26:07 +02:00
  • 39fceb5ccf fix for NPE & bug #264 Michael Peter Christen 2013-07-18 12:37:32 +02:00
  • 735a66eff3 enhancements to crawler Michael Peter Christen 2013-07-18 12:29:04 +02:00
  • 232100301c removed double-occurring value assignments orbiter 2013-07-17 19:09:25 +02:00
  • be0ff6018f Removed trailing spaces + some more final Roland Haeder 2013-07-15 18:22:35 +02:00
  • aaedc0405d Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Roland Haeder 2013-07-17 18:37:34 +02:00
  • 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Roland Haeder 2013-07-17 18:31:30 +02:00
  • 98e10f95e2 Added some cora package loggers Roland Haeder 2013-07-14 09:01:10 +02:00
  • 553f83a14e Recommended cleanup (please, one day, execute this cleanup) Roland Haeder 2013-07-14 08:04:10 +02:00
  • 03044589dd Fixed (?i) appearing in entries, fixed multiple equal lines in file. Felix Ableitner 2013-07-17 16:42:10 +02:00
  • 376f9cd9d0 Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure Felix Ableitner 2013-07-17 15:58:09 +02:00
  • 89c0aa0e74 added collection_sxt to error documents Michael Peter Christen 2013-07-17 15:20:56 +02:00
  • 0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-07-17 12:42:06 +02:00
  • 1fd006cc56 fixes using the embedded connector Michael Peter Christen 2013-07-17 12:41:54 +02:00
  • d0dc86cf3d logging of deadlocks (if any) during cleanup process orbiter 2013-07-17 12:38:58 +02:00
  • aba7cc5de7 added cpu load information to status page orbiter 2013-07-17 12:38:12 +02:00
  • c6a6f159e8 fix for crawl stack domain counter Michael Peter Christen 2013-07-16 18:18:55 +02:00
  • 93d1bac140 do a more frequent optimization, reduces IO after optimization Michael Peter Christen 2013-07-16 17:16:48 +02:00
  • 260d0c96c7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-16 10:49:36 +02:00
  • b71d13a014 added load and deadlock detector in Memory util orbiter 2013-07-16 10:49:20 +02:00
  • 2271b17191 Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git Lotus 2013-07-14 18:53:29 +02:00
  • af07007799 partly revert latest windows changes: YaCy has to be installed to a directory with write access for the running user. DATA folder is now used in the YaCy folder again. For using another location, the start script has to be heavily modified for loading proper start parameters after YaCy has been started once. Lotus 2013-07-14 18:43:32 +02:00
  • 290e24564b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-14 17:41:32 +02:00
  • 5533fc8e01 fix for bug 260 orbiter 2013-07-14 17:40:28 +02:00
  • b79471ee67 grr Michael Peter Christen 2013-07-14 10:15:47 +02:00
  • a79f288ac1 automatically running optimize on solr if user/search is idle for some time Michael Peter Christen 2013-07-14 10:02:08 +02:00
  • a9c8046c87 do a light optimization at the end of a crawl postprocessing orbiter 2013-07-13 19:09:46 +02:00
  • 1b43e02b86 Merge branch 'master' of git://gitorious.org/~quix0r/yacy/quix0rs-yacy-rc1 orbiter 2013-07-13 18:54:18 +02:00
  • a548354c71 replaced type of solr schema object sku of text_en_splitting_tight by string orbiter 2013-07-13 18:54:09 +02:00
  • 59b4fdd5ad Merge remote-tracking branch 'upstream/master' Roland Haeder 2013-07-13 15:12:51 +02:00
  • 5493389576 stealth mode shall only be available for authorized users, because unauthorized users can otherwise be monitored by authorized users orbiter 2013-07-13 14:49:36 +02:00
  • ebbb3bc5c1 Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet Roland Haeder 2013-07-13 13:12:36 +02:00
  • 2f1ec8d4a2 npe fix orbiter 2013-07-13 11:10:05 +02:00
  • bcc623a843 refactoring of load_delay: this is a matter of client identification Michael Peter Christen 2013-07-12 16:24:56 +02:00
  • 0d0b3a30f5 activate api actions after postprocessing of crawls orbiter 2013-07-12 16:05:48 +02:00
  • 3978c5ca5d fix for http://bugs.yacy.net/view.php?id=255 orbiter 2013-07-12 14:38:30 +02:00
  • 2be456e7fb added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle) orbiter 2013-07-12 14:29:22 +02:00
  • 575f913154 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-12 14:17:13 +02:00
  • c4efb612e2 added list of crawls to status_p.xml orbiter 2013-07-12 14:16:51 +02:00
  • 5f666220b7 added files to uninstall Lotus 2013-07-11 22:04:01 +02:00
  • bb6caa346c Do not allow automatic update in case YaCy is installed to the Program Files folder on Windows. There are no permissions to write that folder and update would fail. Lotus 2013-07-11 21:50:06 +02:00
  • 338cf0d133 Revert "Revert "Windows installer: update logo"" Lotus 2013-07-11 21:48:49 +02:00
  • c66631d407 Revert "Windows installer: update logo" Lotus 2013-07-11 21:48:13 +02:00
  • 41cc9be62b Windows installer: update logo Lotus 2013-07-11 21:46:46 +02:00
  • 4fdcfb8230 adapt windows start script parameters to linux start script parameter Lotus 2013-07-11 21:46:17 +02:00
  • dac88561ae minimum access time has a tight connection to ClientIdentification, therefore it is defined there. orbiter 2013-07-11 17:04:24 +02:00
  • a020697d64 Fixed problems with blacklist entry insertion. Felix Ableitner 2013-07-11 13:10:23 +02:00
  • 9a29ab469e another patch to prevent CLOSE_WAIT status on solr connections Michael Peter Christen 2013-07-11 12:53:39 +02:00
  • 5091d627bc fixed parsing of peer flags Michael Peter Christen 2013-07-11 12:53:16 +02:00
  • 87e9052081 added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof) Michael Peter Christen 2013-07-11 11:54:11 +02:00
  • 2a19a60074 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-11 10:22:55 +02:00
  • bff8c753c6 re-insert this file - was deleted by mistake + correct another case-typo sixcooler 2013-07-10 18:32:12 +02:00
  • e609ec388a metager whitelist update orbiter 2013-07-10 15:13:04 +02:00
  • 5c6946dd5f replaced usage of log4j by ConcurrentLog where possible Michael Peter Christen 2013-07-09 14:42:39 +02:00
  • 5878c1d599 - refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger Michael Peter Christen 2013-07-09 14:28:25 +02:00
  • 6d5533c9cd Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-07-09 11:52:20 +02:00
  • c79f687110 enhanced the network scanner: find more hosts automatically by removal of common subdomains before application of protocol-specific prefix orbiter 2013-07-09 11:42:13 +02:00
  • f4f6551c66 better handling of time-out at solrj in case that a commit is done in a fail-over case during add orbiter 2013-07-09 11:01:37 +02:00
  • b4677d1cad fix for bug #252 the naming of the servlet was wrong, the bug may not be present on systems where upper/lowercase matching is lazy (windows) orbiter 2013-07-09 10:50:47 +02:00
  • 2716dfc46c increase crawler speed by reduction of the busysleep time Michael Peter Christen 2013-07-08 23:40:31 +02:00
  • 07261fe274 Merge remote-tracking branch 'nutomics/blacklist_structure' Michael Peter Christen 2013-07-08 23:32:15 +02:00
  • dea71851d2 - better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index Michael Peter Christen 2013-07-08 16:29:30 +02:00
  • a34e137e27 fix for citation index generation in case that entry.referrerhash() is null. This is especially the case if ftp sites are crawled Michael Peter Christen 2013-07-08 16:26:11 +02:00
  • a2c8116a8f accept (but ignore) a '+' sign in front of search words Michael Peter Christen 2013-07-08 16:20:40 +02:00
  • 9f0cc9b401 enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains orbiter 2013-07-08 13:17:09 +02:00
  • d8354a389c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-07 21:31:28 +02:00
  • 6e120e90fe do not cut text on submit buttons Lotus 2013-07-07 19:17:29 +02:00
  • f8c28efd66 fix for rssTerminal coloring orbiter 2013-07-04 21:46:46 +02:00
  • 308d73f855 do not use remote proxy if not switched on - regardless of the proto sixcooler 2013-07-04 19:16:13 +02:00
  • 69906b1d2e Revert "do not use remote proxy if not switched on - regardless of the proto" sixcooler 2013-07-04 19:13:51 +02:00
  • 20f452d228 do not use remote proxy if not switched on - regardless of the proto sixcooler 2013-07-04 19:12:50 +02:00
  • 9551720d5c re-enable saved setting for proxy-crawl-profile sixcooler 2013-07-04 19:10:57 +02:00
  • d5d8936f9d For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method sixcooler 2013-07-04 19:08:53 +02:00
  • 44f8fcf62e Changed class structure of Blacklist. Felix Ableitner 2013-07-04 18:37:57 +02:00
  • 3054a6d4b9 added a patch from Sebastian M.B., submitted by email for coloring of rss terminal Michael Peter Christen 2013-07-04 17:12:19 +02:00
  • 78af998f8f Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594' Michael Peter Christen 2013-07-04 16:56:54 +02:00
  • 57ffdfad4c added a crawl option to obey html-meta-robots-noindex. This is on by default. Michael Peter Christen 2013-07-03 14:50:06 +02:00
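
Commit aeac2fb763 above replaces a containsKey() call followed by get() with a single get() and a null test. A minimal sketch of that pattern follows. The class and method names (LookupSketch, lookupTwice, lookupOnce) are hypothetical and are not taken from the YaCy code base. The two forms are only equivalent when the map never stores null values, which is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from the YaCy code base.
class LookupSketch {

    // Old pattern: containsKey() followed by get() looks the key up twice.
    static String lookupTwice(Map<String, String> map, String key, String fallback) {
        if (map.containsKey(key)) {
            return map.get(key);
        }
        return fallback;
    }

    // New pattern: one get(), then a null test.
    // Assumption: the map never stores null values, otherwise the two
    // methods can return different results.
    static String lookupOnce(Map<String, String> map, String key, String fallback) {
        String value = map.get(key);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> hosts = new HashMap<>();
        hosts.put("yacy.net", "seen");
        System.out.println(lookupTwice(hosts, "yacy.net", "unknown"));
        System.out.println(lookupOnce(hosts, "example.org", "unknown"));
    }
}
```

The single-lookup form halves the number of hash lookups on the hit path, which matches the 50% figure quoted in the commit message for the lookup itself.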