Commit Graph

  • 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods orbiter 2012-07-10 22:59:03 +02:00
  • 28b30231c3 fix for url matcher of multiple amp& in an url, see: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650 orbiter 2012-07-10 17:39:56 +02:00
  • aef9dd0350 - removed cleaning of blacklist cache on startup - added cleaning of blacklist cache if cache is modified in interface - extended cache saving to all cache types - moved cache location to DATA/LISTS - fixed static file path which was relative to the application path but should be relative to data path - which is different in debian and mac implementations Roland 'Quix0r' Haeder 2012-07-10 13:08:16 +02:00
  • c7afa8bc48 using SwitchboardConstants for solr attributes orbiter 2012-07-10 12:01:20 +02:00
  • a99ef68422 bump to httpclient-4.2.1 sixcooler 2012-07-09 18:58:33 +02:00
  • c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-07-09 14:33:11 +02:00
  • 5f3b8dc040 fix for RSS reader orbiter 2012-07-09 14:32:35 +02:00
  • 62202e2d71 refactoring of query attribute variable names for better consistency with (next) stored query words orbiter 2012-07-09 11:14:50 +02:00
  • 2160f9a819 Release 1.04 Release_1.04 Michael Peter Christen 2012-07-09 00:13:59 +02:00
  • 1addbc792c use less memory for md5 cache Michael Peter Christen 2012-07-08 22:05:04 +02:00
  • f32de94723 more logging Michael Peter Christen 2012-07-08 22:04:36 +02:00
  • d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of 240). Michael Peter Christen 2012-07-08 21:25:22 +02:00
  • 434ee90c59 added classification for control file types which shall not be loaded but placed onto the noload-queue Michael Peter Christen 2012-07-08 21:17:33 +02:00
  • 1517a3b7b9 added webm mime-type Michael Peter Christen 2012-07-08 17:59:20 +02:00
  • a90bcb48f6 added webm Michael Peter Christen 2012-07-08 17:58:05 +02:00
  • 801972fe6f fix for url camel case parser and sentence reader Michael Peter Christen 2012-07-08 16:48:09 +02:00
  • fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within small memory configurations Michael Peter Christen 2012-07-08 16:11:50 +02:00
  • 92731e5287 fix for sevenzip parser Michael Peter Christen 2012-07-08 16:11:19 +02:00
  • 45641b0c23 catch and log a warning in RasterPlotter Michael Peter Christen 2012-07-06 09:21:12 +02:00
  • 8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch - more logging for errors Michael Peter Christen 2012-07-06 09:05:41 +02:00
  • c3db015410 prevent loading of content from the cache when retrieval with IFFRESH is used and cache is stale. Should speed up snippet generation when cache strategy is IFFRESH. Michael Peter Christen 2012-07-06 08:29:41 +02:00
  • 91f14ea38e fix to solr configuration (case where the external solr was not online) Michael Peter Christen 2012-07-06 01:29:13 +02:00
  • 2c5b68d932 more abstraction of error message sixcooler 2012-07-05 14:50:37 +02:00
  • 9758c521ab abstraction of error message Michael Peter Christen 2012-07-05 14:27:28 +02:00
  • ef0d09f103 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-07-05 14:24:19 +02:00
  • b1e7c11fba fix for pattern matcher in html parser Michael Peter Christen 2012-07-05 14:24:03 +02:00
  • 8a6edc0031 fix for solr shutdown Michael Peter Christen 2012-07-05 14:23:43 +02:00
  • b8bcc06283 fix for urls beginning with "//" Michael Peter Christen 2012-07-05 14:23:29 +02:00
  • 9b6e4e46ca fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430 sixcooler 2012-07-05 14:06:00 +02:00
  • b0c408788b made class methods static where possible Michael Peter Christen 2012-07-05 12:38:41 +02:00
  • 5bd3c90907 - removed unnecessary semicolons - added default case for switch Michael Peter Christen 2012-07-05 11:18:31 +02:00
  • 132afaf687 removed inaccessible code Michael Peter Christen 2012-07-05 11:09:44 +02:00
  • 7c1ba99755 removed more unused method parameters Michael Peter Christen 2012-07-05 10:44:30 +02:00
  • 83701a1b4c removed unused ImageReference package Michael Peter Christen 2012-07-05 10:24:52 +02:00
  • 0301aba1e9 removed unused method parameters Michael Peter Christen 2012-07-05 10:23:07 +02:00
  • 241dd8410a removed snippet pattern filter - it was not used Michael Peter Christen 2012-07-05 09:21:27 +02:00
  • d3964253ae - added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements Michael Peter Christen 2012-07-05 09:14:04 +02:00
  • ea10766bfd cleaned unnecessary nested code Michael Peter Christen 2012-07-05 08:44:39 +02:00
  • 1481037820 replaced non-generic array with collection Michael Peter Christen 2012-07-05 01:02:51 +02:00
  • 4de50fe808 adding more principal peers for bootstrapping Michael Peter Christen 2012-07-05 00:43:41 +02:00
  • fc0f9543fe More SentenceReader cleanup orbiter 2012-07-05 00:20:58 +02:00
  • 586bb0eb6a Simplified SentenceReader (no more Reader inside..) orbiter 2012-07-04 22:06:20 +02:00
  • 7f851d62a7 replaced HashARC with SizeLimited Objects which are less costly orbiter 2012-07-04 21:56:25 +02:00
  • d4291ac1f3 more tolerance when creating solr document orbiter 2012-07-04 21:15:38 +02:00
  • 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was inefficient for the following reasons: - it used an overly complex input stream, generated from sources and UTF8 byte conversions; the BufferedReader added a strong overhead. - to feed data into the SentenceReader, multiple toString/getBytes calls were applied until a buffered Reader from an input stream was possible. These superfluous conversions have been removed. - the best source for the SentenceReader is a String. Therefore the production of Strings is now forced inside the Document class. orbiter 2012-07-04 21:15:10 +02:00
  • bb8dcb4911 automatically adapt size of word cache to available memory orbiter 2012-07-03 18:22:25 +02:00
  • ad09b786bf clean up parser data Michael Peter Christen 2012-07-03 17:20:41 +02:00
  • 276a66a793 Adding a limit of 1000 links that a parser shall store during indexing. A limit was necessary because some web pages have such huge numbers of links that they can easily cause an OOM just by the number of links. The question of whether 1000 links is sufficient or too few must be answered by testing this feature. Michael Peter Christen 2012-07-03 17:06:20 +02:00
  • 613b45f604 - better data structures in secondary search - fixed a big memory leak in secondary search Michael Peter Christen 2012-07-03 07:12:20 +02:00
  • de903a53a0 parser refactoring & hacks Michael Peter Christen 2012-07-03 06:06:38 +02:00
  • 8a82609360 - smaller caches to save memory - close cloneable iterators to free memory Michael Peter Christen 2012-07-02 15:40:40 +02:00
  • 7249d9c9de bugfix for concurrent seed loader Michael Peter Christen 2012-07-02 14:37:57 +02:00
  • c72d3b12cd concurrently initialize the seed list during p2p network bootstrap Michael Peter Christen 2012-07-02 14:27:37 +02:00
  • 1825f165b8 better integration of blacklist according to use case Michael Peter Christen 2012-07-02 13:57:29 +02:00
  • c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 Michael Peter Christen 2012-07-02 12:20:57 +02:00
  • ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. Michael Peter Christen 2012-07-02 10:27:46 +02:00
  • 0c345d1559 giving threads names so it's easier to see what's happening during debugging and within a thread dump Michael Peter Christen 2012-07-02 09:51:43 +02:00
  • 067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages) reger 2012-07-01 00:12:20 +02:00
  • 2f46085be0 more logging Michael Peter Christen 2012-06-30 10:30:01 +02:00
  • 6ea77e5d7a Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty Michael Peter Christen 2012-06-29 21:17:24 +02:00
  • 65f56b1fd4 Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty Michael Peter Christen 2012-06-29 21:16:20 +02:00
  • 03280fb161 removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index. Michael Peter Christen 2012-06-28 14:27:29 +02:00
  • 508a81b86c added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field. Michael Peter Christen 2012-06-28 13:27:45 +02:00
  • f3167def64 do not fill the keywords with title content if keywords do not exist. Michael Peter Christen 2012-06-27 13:07:02 +02:00
  • 9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantics of the browser's response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields. Michael Peter Christen 2012-06-27 12:17:58 +02:00
  • 97f60010d8 fix crawl start from file sixcooler 2012-06-26 16:11:39 +02:00
  • c03d306afa shorter autocommit time (now: 1 second) so that users can see results in solr the first time they try it out. The value can now be easily set to a higher number using the IndexFederated_p interface. Michael Peter Christen 2012-06-26 14:53:45 +02:00
  • 0294a53459 - add canonical field only if requested by solr schema - remove canonical url from in/outbound urls if present Michael Peter Christen 2012-06-26 14:51:57 +02:00
  • 3fd4a01286 added option to record urls that are forwarded to the solr index Michael Peter Christen 2012-06-26 13:54:48 +02:00
  • d763e4d94b fixed bad referer computation in SSIs which caused a NPE during host computation. This error was there before the latest IPv6 hack but did not cause a NPE. The IPv6 hack was not the cause of this bug, but it uncovered the misconfiguration of the 'referer' referrer. Michael Peter Christen 2012-06-26 11:18:29 +02:00
  • e6792ed37d Merge remote-tracking branch 'original yacy/master' cominch 2012-06-26 10:13:13 +02:00
  • 358b04885e more IPv6 hacks Michael Peter Christen 2012-06-26 00:25:46 +02:00
  • 96aeb127e3 generalized localhost naming. this is also a preparation for a better IPv6 implementation. Michael Peter Christen 2012-06-26 00:08:25 +02:00
  • 77f795756c fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr. Michael Peter Christen 2012-06-25 18:17:31 +02:00
  • 8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly Michael Peter Christen 2012-06-25 14:59:46 +02:00
  • 5d9bd4ddc2 Merge remote-tracking branch 'origin/master' Michael Peter Christen 2012-06-25 11:37:32 +02:00
  • b9dfca4b0a - fixed IndexFederated Servlet / an embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage) Michael Peter Christen 2012-06-25 11:34:38 +02:00
  • 2931726386 adjusted NetBeans classpath for new and updated libraries in lib reger 2012-06-24 22:50:08 +02:00
  • cc1b6762bb root, not yacy Michael Peter Christen 2012-06-24 10:58:09 +02:00
  • 2589158f44 changed recommended line in /etc/crontab for high-availability Michael Peter Christen 2012-06-24 10:57:18 +02:00
  • 4156d4e12b Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-06-23 21:22:46 +02:00
  • 7b53be141f upgraded to pdfbox 1.7.0; changes are listed in http://www.apache.org/dist/pdfbox/1.7.0/RELEASE-NOTES.txt, with many bugfixes, including performance-related ones Michael Peter Christen 2012-06-22 16:49:58 +02:00
  • fad3b14813 added jetty libraries, needed for future use as web server and as application server for the solr search interface Michael Peter Christen 2012-06-22 15:31:17 +02:00
  • a38b0a2c46 extended embedded solr tests to ensure that it will be usable within a jetty instance Michael Peter Christen 2012-06-22 11:40:02 +02:00
  • b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods Michael Peter Christen 2012-06-22 11:39:17 +02:00
  • a5eb91fa60 refactoring Michael Peter Christen 2012-06-22 00:49:32 +02:00
  • 1be0025a9c - added test for EmbeddedSolrConnector - added needed libraries for this test; this includes most (all) files needed for an embedded solr Michael Peter Christen 2012-06-22 00:36:49 +02:00
  • dbdd697f4d moved RDFaParser.xsl configuration file to defaults Michael Peter Christen 2012-06-21 16:09:12 +02:00
  • 90b82ce994 using guava for host resolution (non-blocking for ips) and time-out Michael Peter Christen 2012-06-21 16:04:48 +02:00
  • f094936b89 added new class libraries to mac app Michael Peter Christen 2012-06-21 14:59:55 +02:00
  • e12bb254b4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-21 14:55:50 +02:00
  • 3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will follow later) - added embedded solr connector which can connect to solr programmatically (without using a server in between) Michael Peter Christen 2012-06-21 14:55:38 +02:00
  • c1ba58ae51 Augmented browsing: Small CSS fix cominch 2012-06-21 14:22:32 +02:00
  • b2b205aa38 Augmented browsing: small js fix cominch 2012-06-21 12:02:14 +02:00
  • dc9ee0cdb3 Augmented browsing: CSS fix cominch 2012-06-21 11:19:55 +02:00
  • 74fcc6f8c5 Augmented browsing: small UI modifications cominch 2012-06-21 11:01:02 +02:00
  • 2fccc4e883 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-06-21 01:01:12 +02:00
  • c337190a00 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-20 20:18:10 +02:00
  • c63c3a4495 Show additional interaction elements in footer section on each page, if activated in ConfigPortal.html. This footer is also visible in augmented browsing proxy mode. cominch 2012-06-20 18:04:23 +02:00
  • 786be7d175 better integration of RDFaParser Michael Peter Christen 2012-06-20 16:39:04 +02:00