Commit Graph

  • aaedc0405d Fixes and avoidance of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Roland Haeder 2013-07-17 18:37:34 +02:00
  • 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Roland Haeder 2013-07-17 18:31:30 +02:00
  • 98e10f95e2 Added some cora package loggers Roland Haeder 2013-07-14 09:01:10 +02:00
  • 553f83a14e Recommended cleanup (please, one day, execute this cleanup) Roland Haeder 2013-07-14 08:04:10 +02:00
  • 03044589dd Fixed (?i) appearing in entries, fixed multiple equal lines in file. Felix Ableitner 2013-07-17 16:42:10 +02:00
  • 376f9cd9d0 Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure Felix Ableitner 2013-07-17 15:58:09 +02:00
  • 89c0aa0e74 added collection_sxt to error documents Michael Peter Christen 2013-07-17 15:20:56 +02:00
  • 0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-07-17 12:42:06 +02:00
  • 1fd006cc56 fixes using the embedded connector Michael Peter Christen 2013-07-17 12:41:54 +02:00
  • d0dc86cf3d logging of deadlocks (if any) during cleanup process orbiter 2013-07-17 12:38:58 +02:00
  • aba7cc5de7 added cpu load information to status page orbiter 2013-07-17 12:38:12 +02:00
  • c6a6f159e8 fix for crawl stack domain counter Michael Peter Christen 2013-07-16 18:18:55 +02:00
  • 93d1bac140 do a more frequent optimization, reduces IO after optimization Michael Peter Christen 2013-07-16 17:16:48 +02:00
  • 260d0c96c7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-16 10:49:36 +02:00
  • b71d13a014 added load and deadlock detector in Memory util orbiter 2013-07-16 10:49:20 +02:00
  • 2271b17191 Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git Lotus 2013-07-14 18:53:29 +02:00
  • af07007799 partly revert latest windows changes: YaCy has to be installed to a directory with write access for the running user. DATA folder is now used in the YaCy folder again. For using another location, the start script has to be heavily modified for loading proper start parameters after YaCy has been started once. Lotus 2013-07-14 18:43:32 +02:00
  • 290e24564b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-14 17:41:32 +02:00
  • 5533fc8e01 fix for bug 260 orbiter 2013-07-14 17:40:28 +02:00
  • b79471ee67 grr Michael Peter Christen 2013-07-14 10:15:47 +02:00
  • a79f288ac1 automatically running optimize on solr if user/search is idle for some time Michael Peter Christen 2013-07-14 10:02:08 +02:00
  • a9c8046c87 do a light optimization at the end of a crawl postprocessing orbiter 2013-07-13 19:09:46 +02:00
  • 1b43e02b86 Merge branch 'master' of git://gitorious.org/~quix0r/yacy/quix0rs-yacy-rc1 orbiter 2013-07-13 18:54:18 +02:00
  • a548354c71 replaced type of solr schema object sku of text_en_splitting_tight by string orbiter 2013-07-13 18:54:09 +02:00
  • 59b4fdd5ad Merge remote-tracking branch 'upstream/master' Roland Haeder 2013-07-13 15:12:51 +02:00
  • 5493389576 stealth mode shall only be available for authorized users, because unauthorized users can otherwise be monitored by authorized users orbiter 2013-07-13 14:49:36 +02:00
  • ebbb3bc5c1 Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet Roland Haeder 2013-07-13 13:12:36 +02:00
  • 2f1ec8d4a2 npe fix orbiter 2013-07-13 11:10:05 +02:00
  • bcc623a843 refactoring of load_delay: this is a matter of client identification Michael Peter Christen 2013-07-12 16:24:56 +02:00
  • 0d0b3a30f5 activate api actions after postprocessing of crawls orbiter 2013-07-12 16:05:48 +02:00
  • 3978c5ca5d fix for http://bugs.yacy.net/view.php?id=255 orbiter 2013-07-12 14:38:30 +02:00
  • 2be456e7fb added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle) orbiter 2013-07-12 14:29:22 +02:00
  • 575f913154 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-12 14:17:13 +02:00
  • c4efb612e2 added list of crawls to status_p.xml orbiter 2013-07-12 14:16:51 +02:00
  • 5f666220b7 added files to uninstall Lotus 2013-07-11 22:04:01 +02:00
  • bb6caa346c Do not allow automatic update in case YaCy is installed to the Program Files folder on Windows. There are no permissions to write that folder and update would fail. Lotus 2013-07-11 21:50:06 +02:00
  • 338cf0d133 Revert "Revert "Windows installer: update logo"" Lotus 2013-07-11 21:48:49 +02:00
  • c66631d407 Revert "Windows installer: update logo" Lotus 2013-07-11 21:48:13 +02:00
  • 41cc9be62b Windows installer: update logo Lotus 2013-07-11 21:46:46 +02:00
  • 4fdcfb8230 adapt windows start script parameters to linux start script parameter Lotus 2013-07-11 21:46:17 +02:00
  • dac88561ae minimum access time has a tight connection to ClientIdentification, therefore it is defined there. orbiter 2013-07-11 17:04:24 +02:00
  • a020697d64 Fixed problems with blacklist entry insertion. Felix Ableitner 2013-07-11 13:10:23 +02:00
  • 9a29ab469e another patch to prevent CLOSE_WAIT status on solr connections Michael Peter Christen 2013-07-11 12:53:39 +02:00
  • 5091d627bc fixed parsing of peer flags Michael Peter Christen 2013-07-11 12:53:16 +02:00
  • 87e9052081 added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof) Michael Peter Christen 2013-07-11 11:54:11 +02:00
  • 2a19a60074 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-11 10:22:55 +02:00
  • bff8c753c6 re-insert this file - was deleted by mistake + correct another case typo sixcooler 2013-07-10 18:32:12 +02:00
  • e609ec388a metager whitelist update orbiter 2013-07-10 15:13:04 +02:00
  • 5c6946dd5f replaced usage of log4j by ConcurrentLog where possible Michael Peter Christen 2013-07-09 14:42:39 +02:00
  • 5878c1d599 - refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger Michael Peter Christen 2013-07-09 14:28:25 +02:00
  • 6d5533c9cd Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-07-09 11:52:20 +02:00
  • c79f687110 enhanced the network scanner: find more hosts automatically by removal of common subdomains before application of protocol-specific prefix orbiter 2013-07-09 11:42:13 +02:00
  • f4f6551c66 better handling of time-out at solrj in case that a commit is done in a fail-over case during add orbiter 2013-07-09 11:01:37 +02:00
  • b4677d1cad fix for bug #252 the naming of the servlet was wrong, the bug may not be present on systems where upper/lowercase matching is lazy (windows) orbiter 2013-07-09 10:50:47 +02:00
  • 2716dfc46c increase crawler speed by reduction of the busysleep time Michael Peter Christen 2013-07-08 23:40:31 +02:00
  • 07261fe274 Merge remote-tracking branch 'nutomics/blacklist_structure' Michael Peter Christen 2013-07-08 23:32:15 +02:00
  • dea71851d2 - better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index Michael Peter Christen 2013-07-08 16:29:30 +02:00
  • a34e137e27 fix for citation index generation in case that entry.referrerhash() is null. This is especially the case if ftp sites are crawled Michael Peter Christen 2013-07-08 16:26:11 +02:00
  • a2c8116a8f accept (but ignore) a '+' sign in front of search words Michael Peter Christen 2013-07-08 16:20:40 +02:00
  • 9f0cc9b401 enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains orbiter 2013-07-08 13:17:09 +02:00
  • d8354a389c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-07-07 21:31:28 +02:00
  • 6e120e90fe do not cut text on submit buttons Lotus 2013-07-07 19:17:29 +02:00
  • f8c28efd66 fix for rssTerminal coloring orbiter 2013-07-04 21:46:46 +02:00
  • 308d73f855 do not use remote proxy if not switched on - regardless of the proto sixcooler 2013-07-04 19:16:13 +02:00
  • 69906b1d2e Revert "do not use remote proxy if not switched on - regardless of the proto" sixcooler 2013-07-04 19:13:51 +02:00
  • 20f452d228 do not use remote proxy if not switched on - regardless of the proto sixcooler 2013-07-04 19:12:50 +02:00
  • 9551720d5c re-enable saved setting for proxy-crawl-profile sixcooler 2013-07-04 19:10:57 +02:00
  • d5d8936f9d For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method sixcooler 2013-07-04 19:08:53 +02:00
  • 44f8fcf62e Changed class structure of Blacklist. Felix Ableitner 2013-07-04 18:37:57 +02:00
  • 3054a6d4b9 added a patch from Sebastian M.B., submitted by email for coloring of rss terminal Michael Peter Christen 2013-07-04 17:12:19 +02:00
  • 78af998f8f Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594' Michael Peter Christen 2013-07-04 16:56:54 +02:00
  • 57ffdfad4c added a crawl option to obey html-meta-robots-noindex. This is on by default. Michael Peter Christen 2013-07-03 14:50:06 +02:00
  • fd90fcc4e0 Fixes #196. Felix Ableitner 2013-07-02 20:45:41 +02:00
  • 5a5d411ec0 new robots_i attribute fields Michael Peter Christen 2013-07-02 14:29:13 +02:00
  • fa08bd9d5a hack to prevent long waiting times in crawler Michael Peter Christen 2013-07-01 13:24:52 +02:00
  • f1c5338210 preparation for greedy crawl profiles and refactoring Michael Peter Christen 2013-07-01 13:10:09 +02:00
  • e6f361f474 adding the canonical tag to crawl queues Michael Peter Christen 2013-07-01 13:09:41 +02:00
  • 40c5ee47c1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-06-30 12:07:25 +02:00
  • ae23a0badb updated copyright message; included LGPL for 'cora' and a warranty warning. orbiter 2013-06-30 11:30:39 +02:00
  • a6bf44212e bugfix: location (lat/lon) meta data retrieval (Double.NaN check) reger 2013-06-30 03:50:07 +02:00
  • 203921006a redesign of citation index storage Michael Peter Christen 2013-06-30 02:11:46 +02:00
  • 7c6ccc426c set crawlingQ to true by default because most webpages are dynamic and crawlingQ should only be switched off in case of crawler traps orbiter 2013-06-29 20:28:14 +02:00
  • 5de4267a9d windows installer: update to latest jre Lotus 2013-06-29 18:54:30 +02:00
  • 83763ee4a4 jpeg parser: extract GPS location from meta data reger 2013-06-29 00:35:43 +02:00
  • e92b9275ce Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-06-28 15:33:29 +02:00
  • 56cdcfa2fa fixed greedy learning mode - global is not a search attribute in searchitems Michael Peter Christen 2013-06-28 15:33:19 +02:00
  • 32aa1d4569 removed unused option for queries Michael Peter Christen 2013-06-28 15:32:36 +02:00
  • 0c5bed7e2c added configuration option for greedy learning function to ConfigPortal servlet Michael Peter Christen 2013-06-28 15:31:36 +02:00
  • 5d1f619f07 possibly helpful closing of solr-requests sixcooler 2013-06-28 15:19:50 +02:00
  • 9d291764d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-06-28 15:03:25 +02:00
  • e5abccdfe4 added optimize-option sixcooler 2013-06-28 14:51:37 +02:00
  • 8ea6ddf636 removed attributes from ConfigPortal.html which are redundant to ConfigSearchPage_p.html Michael Peter Christen 2013-06-28 14:17:14 +02:00
  • 64140f35cd fix for solr requests if no query part is given (prevent npe) Michael Peter Christen 2013-06-28 13:16:25 +02:00
  • 8caaf6203a fixed false multiple-generation of remote facet search which caused high cpu usage on remote side. Michael Peter Christen 2013-06-28 12:39:36 +02:00
  • 23fb458963 - fix to gsa searchresult answer in case that no query part is given - fix to gsa default number of results (is 'num') Michael Peter Christen 2013-06-28 12:22:33 +02:00
  • 823ae4d6a7 added url_protocol_s to error documents Michael Peter Christen 2013-06-26 16:51:36 +02:00
  • 660a196989 refactoring Michael Peter Christen 2013-06-26 09:27:22 +02:00
  • c4538d8d91 added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib Michael Peter Christen 2013-06-26 09:26:34 +02:00
  • 3760e2616b bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments reger 2013-06-25 23:24:02 +02:00
  • 9a6fcdf597 npe fix Michael Peter Christen 2013-06-25 16:36:16 +02:00