Commit Graph

  • 241dd8410a removed snippet pattern filter - it was not used Michael Peter Christen 2012-07-05 09:21:27 +02:00
  • d3964253ae added @SuppressWarnings to unused servlet method parameters; removed unnecessary casts; removed unnecessary throw statements Michael Peter Christen 2012-07-05 09:14:04 +02:00
  • ea10766bfd cleaned unnecessary nested code Michael Peter Christen 2012-07-05 08:44:39 +02:00
  • 1481037820 replaced non-generic array with collection Michael Peter Christen 2012-07-05 01:02:51 +02:00
  • 4de50fe808 adding more principal peers for bootstrapping Michael Peter Christen 2012-07-05 00:43:41 +02:00
  • fc0f9543fe More SentenceReader cleanup orbiter 2012-07-05 00:20:58 +02:00
  • 586bb0eb6a Simplified SentenceReader (no more Reader inside..) orbiter 2012-07-04 22:06:20 +02:00
  • 7f851d62a7 replaced HashARC with SizeLimited Objects which are less costly orbiter 2012-07-04 21:56:25 +02:00
  • d4291ac1f3 more tolerance when creating solr document orbiter 2012-07-04 21:15:38 +02:00
  • 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was inefficient for two reasons: - it used an overly complex input stream, generated from sources and UTF-8 byte conversions; the BufferedReader added considerable overhead. - to feed data into the SentenceReader, multiple toString/getBytes calls were applied until a buffered Reader from an input stream was possible. These superfluous conversions have been removed. The best source for the SentenceReader is a String, so the production of Strings is now forced inside the Document class. orbiter 2012-07-04 21:15:10 +02:00
  • bb8dcb4911 automatically adapt size of word cache to available memory orbiter 2012-07-03 18:22:25 +02:00
  • ad09b786bf clean up parser data Michael Peter Christen 2012-07-03 17:20:41 +02:00
  • 276a66a793 Adding a limit of 1000 links that a parser shall store during indexing. A limit was necessary because some web pages have such huge numbers of links that the links alone can easily cause an OOM. Whether a limit of 1000 links is sufficient or too low must be answered by testing this feature. Michael Peter Christen 2012-07-03 17:06:20 +02:00
  • 613b45f604 better data structures in secondary search; fixed a big memory leak in secondary search Michael Peter Christen 2012-07-03 07:12:20 +02:00
  • de903a53a0 parser refactoring & hacks Michael Peter Christen 2012-07-03 06:06:38 +02:00
  • 8a82609360 smaller caches to save memory; close cloneable iterators to free memory Michael Peter Christen 2012-07-02 15:40:40 +02:00
  • 7249d9c9de bugfix for concurrent seed loader Michael Peter Christen 2012-07-02 14:37:57 +02:00
  • c72d3b12cd concurrently initialize the seed list during p2p network bootstrap Michael Peter Christen 2012-07-02 14:27:37 +02:00
  • 1825f165b8 better integration of blacklist according to use case Michael Peter Christen 2012-07-02 13:57:29 +02:00
  • c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 Michael Peter Christen 2012-07-02 12:20:57 +02:00
  • ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. Michael Peter Christen 2012-07-02 10:27:46 +02:00
  • 0c345d1559 giving threads names so it's easier to see what's happening during debugging and within a thread dump Michael Peter Christen 2012-07-02 09:51:43 +02:00
  • 067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages) reger 2012-07-01 00:12:20 +02:00
  • 2f46085be0 more logging Michael Peter Christen 2012-06-30 10:30:01 +02:00
  • 6ea77e5d7a Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty Michael Peter Christen 2012-06-29 21:17:24 +02:00
  • 65f56b1fd4 Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty Michael Peter Christen 2012-06-29 21:16:20 +02:00
  • 03280fb161 removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement based on solr navigation using a segment field in the search index. Michael Peter Christen 2012-06-28 14:27:29 +02:00
  • 508a81b86c added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field. Michael Peter Christen 2012-06-28 13:27:45 +02:00
  • f3167def64 do not fill the keywords with title content if keywords do not exist. Michael Peter Christen 2012-06-27 13:07:02 +02:00
  • 9116013c64 - allow lazy initialization of solr values (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in RAM and on disk) if excessive 0-values or empty strings appear. - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantics of the browser's response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields. Michael Peter Christen 2012-06-27 12:17:58 +02:00
  • 97f60010d8 fix crawl start from file sixcooler 2012-06-26 16:11:39 +02:00
  • c03d306afa shorter autocommit time (now: 1 second) so that users can see results in solr the first time they try it out. The value can now be easily set to a higher number using the IndexFederated_p interface. Michael Peter Christen 2012-06-26 14:53:45 +02:00
  • 0294a53459 add canonical field only if requested by solr schema; remove canonical url from in/outbound urls if present Michael Peter Christen 2012-06-26 14:51:57 +02:00
  • 3fd4a01286 added option to record urls that are forwarded to the solr index Michael Peter Christen 2012-06-26 13:54:48 +02:00
  • d763e4d94b fixed bad referer computation in SSIs which caused an NPE during host computation. This error was there before the latest IPv6 hack but did not cause an NPE. The IPv6 hack was not the cause of this bug, but it uncovered the misconfiguration of the 'referer' referrer. Michael Peter Christen 2012-06-26 11:18:29 +02:00
  • e6792ed37d Merge remote-tracking branch 'original yacy/master' cominch 2012-06-26 10:13:13 +02:00
  • 358b04885e more IPv6 hacks Michael Peter Christen 2012-06-26 00:25:46 +02:00
  • 96aeb127e3 generalized localhost naming. this is also a preparation for a better IPv6 implementation. Michael Peter Christen 2012-06-26 00:08:25 +02:00
  • 77f795756c fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr. Michael Peter Christen 2012-06-25 18:17:31 +02:00
  • 8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly Michael Peter Christen 2012-06-25 14:59:46 +02:00
  • 5d9bd4ddc2 Merge remote-tracking branch 'origin/master' Michael Peter Christen 2012-06-25 11:37:32 +02:00
  • b9dfca4b0a fixed IndexFederated Servlet / an embedded Solr can now be selected; added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage) Michael Peter Christen 2012-06-25 11:34:38 +02:00
  • 2931726386 adjusted NetBeans classpath for new and updated libraries in lib reger 2012-06-24 22:50:08 +02:00
  • cc1b6762bb root, not yacy Michael Peter Christen 2012-06-24 10:58:09 +02:00
  • 2589158f44 changed recommended line in /etc/crontab for high-availability Michael Peter Christen 2012-06-24 10:57:18 +02:00
  • 4156d4e12b Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-06-23 21:22:46 +02:00
  • 7b53be141f upgraded to pdfbox 1.7.0; changes in http://www.apache.org/dist/pdfbox/1.7.0/RELEASE-NOTES.txt with many bugfixes, including performance-related ones Michael Peter Christen 2012-06-22 16:49:58 +02:00
  • fad3b14813 added jetty libraries, needed for future use as web server and as application server for the solr search interface Michael Peter Christen 2012-06-22 15:31:17 +02:00
  • a38b0a2c46 extended embedded solr tests to ensure that it will be usable within a jetty instance Michael Peter Christen 2012-06-22 11:40:02 +02:00
  • b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods Michael Peter Christen 2012-06-22 11:39:17 +02:00
  • a5eb91fa60 refactoring Michael Peter Christen 2012-06-22 00:49:32 +02:00
  • 1be0025a9c added test for EmbeddedSolrConnector; added needed libraries for this test; this includes most (all) files needed for an embedded solr Michael Peter Christen 2012-06-22 00:36:49 +02:00
  • dbdd697f4d moved RDFaParser.xsl configuration file to defaults Michael Peter Christen 2012-06-21 16:09:12 +02:00
  • 90b82ce994 using guava for host resolution (non-blocking for ips) and time-out Michael Peter Christen 2012-06-21 16:04:48 +02:00
  • f094936b89 added new class libraries to mac app Michael Peter Christen 2012-06-21 14:59:55 +02:00
  • e12bb254b4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-21 14:55:50 +02:00
  • 3f55dc7c1e added solr core and libraries that solr needs (lucene is missing, will follow later); added embedded solr connector which can connect to solr programmatically (without using a server in between) Michael Peter Christen 2012-06-21 14:55:38 +02:00
  • c1ba58ae51 Augmented browsing: Small CSS fix cominch 2012-06-21 14:22:32 +02:00
  • b2b205aa38 Augmented browsing: small js fix cominch 2012-06-21 12:02:14 +02:00
  • dc9ee0cdb3 Augmented browsing: CSS fix cominch 2012-06-21 11:19:55 +02:00
  • 74fcc6f8c5 Augmented browsing: small UI modifications cominch 2012-06-21 11:01:02 +02:00
  • 2fccc4e883 Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-06-21 01:01:12 +02:00
  • c337190a00 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-20 20:18:10 +02:00
  • c63c3a4495 Show additional interaction elements in footer section on each page, if activated in ConfigPortal.html. This footer is also visible in augmented browsing proxy mode. cominch 2012-06-20 18:04:23 +02:00
  • 786be7d175 better integration of RDFaParser Michael Peter Christen 2012-06-20 16:39:04 +02:00
  • fa98657bb3 Augmented Browsing: changed the settings page cominch 2012-06-20 09:10:39 +02:00
  • 751eeade0d Merge remote-tracking branch 'original yacy/master' cominch 2012-06-20 07:58:27 +02:00
  • 84a11ec48c Corrected loading of default page settings on ConfigPortal.html cominch 2012-06-20 07:55:28 +02:00
  • bea002dc15 correct table in new look of Crawler_p sixcooler 2012-06-19 13:13:00 +02:00
  • 15f4551d88 Release 1.03 Release_1.03 Michael Peter Christen 2012-06-19 08:51:26 +02:00
  • 8738336408 set Xms lower than Xmx Michael Peter Christen 2012-06-19 08:45:49 +02:00
  • de3ef8ad73 removed unimportant warnings Michael Peter Christen 2012-06-19 08:45:34 +02:00
  • f7c43e964c enable asserts only with debugging Michael Peter Christen 2012-06-19 08:23:10 +02:00
  • 82a682b31d fixed problem with seed when switching network Michael Peter Christen 2012-06-19 07:44:44 +02:00
  • b89a69ae2e Merge branch 'master' of git://gitorious.org/yacy/rc1.git reger 2012-06-19 03:04:44 +02:00
  • 6b4545d6b0 Only load tag information if necessary cominch 2012-06-19 01:40:22 +02:00
  • 011f8a5818 Auto Tagging: Add hyperlinks to tags (provisional) cominch 2012-06-19 01:24:06 +02:00
  • 8c544edee4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-18 21:26:06 +02:00
  • 7dc59979bc fix for npe, possibly for http://bugs.yacy.net/view.php?id=195 Michael Peter Christen 2012-06-18 21:25:39 +02:00
  • 1d4e206b2b bugfix in vocabulary generation Michael Peter Christen 2012-06-18 18:10:40 +02:00
  • 2c89975378 Merge remote-tracking branch 'original yacy/master' cominch 2012-06-18 16:16:46 +02:00
  • 71047fe63a Augmented browsing: CSS fix cominch 2012-06-18 16:16:31 +02:00
  • 52f5d40043 better abstraction of document model generation Michael Peter Christen 2012-06-18 15:55:18 +02:00
  • 8b7c4d3144 produce a rdf output containing the triplestore with yacydoc; ie: http://localhost:8090/api/yacydoc.rdf?urlhash=yOiCM7Fh1hyQ Michael Peter Christen 2012-06-18 15:47:54 +02:00
  • f7160dae5c Merge remote-tracking branch 'original yacy/master' cominch 2012-06-18 15:44:50 +02:00
  • e4555cbee3 Augmented browsing: Pass on additional action parameter cominch 2012-06-18 15:44:01 +02:00
  • 24bbe359ca integrate also geonames library files with fewer cities. these are more useful for tagging since fewer normal words are misidentified as locations Michael Peter Christen 2012-06-18 15:19:57 +02:00
  • 5a41e739b4 better apilink description Michael Peter Christen 2012-06-18 13:04:20 +02:00
  • e16e4bd2ba added ontology extraction in xml as api call for vocabularies Michael Peter Christen 2012-06-18 13:02:12 +02:00
  • 8cf47a8335 Merge remote-tracking branch 'original yacy/master' cominch 2012-06-18 12:11:07 +02:00
  • b85f01a14e Augmented browsing: small UI fix cominch 2012-06-18 12:01:03 +02:00
  • 223a5440ab prevent an empty pnd from being inserted into the vocabularies Michael Peter Christen 2012-06-18 01:22:39 +02:00
  • 8e97ada7c9 IPv6 bugfix Michael Peter Christen 2012-06-18 00:33:32 +02:00
  • 26cb1c65c2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-06-17 23:50:44 +02:00
  • 963f92ed9a merged files; changed behaviour of delete button in vocabulary edit; fixed size number in vocabulary listing Michael Peter Christen 2012-06-17 23:48:33 +02:00
  • d8815db877 Merge remote-tracking branch 'original yacy/master' cominch 2012-06-17 23:07:00 +02:00
  • e4dab19045 Augmented Browsing: added template for document info bar cominch 2012-06-17 23:05:53 +02:00
  • dd88d0ace2 more logging Michael Peter Christen 2012-06-17 19:03:53 +02:00
  • 743b0ec89f added size of vocabulary to vocabulary view; fixed bad terms in vocabulary-from-titles autogeneration Michael Peter Christen 2012-06-17 17:32:52 +02:00
  • be928815fc fixed wrong parsing of style and script Michael Peter Christen 2012-06-17 17:18:19 +02:00
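
Commit 276a66a793 above caps the number of links a parser stores at 1000 to avoid running out of memory on link-heavy pages. The following is a minimal sketch of that idea only, not the code from the commit: the class name BoundedLinkCollector, its methods, and the choice to count dropped links are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not taken from the YaCy sources: it keeps at most
// maxLinks entries and counts everything that arrives after the cap.
final class BoundedLinkCollector {
    private final int maxLinks;
    private final List<String> links = new ArrayList<>();
    private int dropped = 0;

    BoundedLinkCollector(int maxLinks) {
        if (maxLinks < 0) {
            throw new IllegalArgumentException("maxLinks must be >= 0");
        }
        this.maxLinks = maxLinks;
    }

    // Stores the link if the cap is not yet reached; returns whether it was stored.
    boolean add(String url) {
        if (links.size() >= maxLinks) {
            dropped++;
            return false;
        }
        links.add(url);
        return true;
    }

    // Read-only view of the stored links, in insertion order.
    List<String> links() {
        return Collections.unmodifiableList(links);
    }

    // Number of links that were rejected because the cap was reached.
    int dropped() {
        return dropped;
    }
}
```

Counting the dropped links is one way to support the testing the commit message asks for: if the counter is frequently non-zero on ordinary pages, the limit is probably too low.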