Commit Graph

  • 8879cc1db2 removed System.out.println orbiter 2011-04-28 14:08:02 +00:00
  • c493f101c0 added one more script file to release build script orbiter 2011-04-28 13:19:24 +00:00
  • 528da7c9ea removed unused class and added license header for new class orbiter 2011-04-28 13:14:30 +00:00
  • f6077b3cc0 added more attributes for html parser and enhanced data structures orbiter 2011-04-28 13:09:01 +00:00
  • 0b02083e97 * function for simple crawl of one url f1ori 2011-04-28 13:04:33 +00:00
  • d671de8c17 add ranking weight to json-search-results f1ori 2011-04-28 11:18:14 +00:00
  • 4eb9c1e7c3 not setting userAgent from Constructor as default for following calls sixcooler 2011-04-26 17:39:16 +00:00
  • d8e934c085 better abstraction of http client identification orbiter 2011-04-26 13:35:29 +00:00
  • a3e707283d not using HTTPConnector anymore sixcooler 2011-04-26 11:46:31 +00:00
  • 9f1f47ec67 added some comments to explain the isLocal patch orbiter 2011-04-21 21:59:56 +00:00
  • b77b8cac0c - enhanced html parser: recognized much more details in the content - added more properties to solr index - refactoring - more constants in switchboard - fix for some NPEs - recognition of more images - removed synchronization in HandleMap (obviously not necessary?) - added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as being local. orbiter 2011-04-21 13:58:49 +00:00
  • bc84d2bc9d *) fixed typo in stop script *) added <u> </u> tags for underlined text in Wiki Code *) minor code changes low012 2011-04-20 22:54:29 +00:00
  • b2281f0b7d YMark: intermediate work towards flexigrid support apfelmaennchen 2011-04-20 22:33:01 +00:00
  • 06d50fd801 *) fixed stupid bug (introduced in r7663 by myself) which caused wrong parsing of Wiki pages low012 2011-04-20 17:27:59 +00:00
  • 60412d2bb3 YMark: - more refactoring >> YMarkEntry - integration of SurrogateReader as bookmark importer - various small bug fixes e.g. get_xbel.xml apfelmaennchen 2011-04-18 21:42:14 +00:00
  • 7c149e0f9d *) ./stopYACY.sh -f kills YaCy in case regular shutdown does not work low012 2011-04-18 19:09:54 +00:00
  • 3d5104d357 - fixed a bug in crawl start with file name (npe in new url) - added deletion of solr index in IndexControlRWIs - added asynchronous adding of large url lists (happens when crawls are started with file) - fixed npe in Image display - replaced language warning with fine logging - added a domain name cache in Domains that helps to speed up the isLocal property (less DNS lookups) - added a new storage class for this new cache: KeyList. The domain key list is stored in DATA/WORK/globalhosts.list - added concurrent solr updates and chunked transfers (50 documents until a commit is done) for high-speed feeding (> 40000 ppm) - fixed a bug in content scraper that chopped off large parts of crawl lists (using crawl start from file) orbiter 2011-04-18 16:11:16 +00:00
  • 08108f0ece fix for http://bugs.yacy.net/view.php?id=12 orbiter 2011-04-17 22:53:15 +00:00
  • fd3baa9025 fix for http://bugs.yacy.net/view.php?id=24 orbiter 2011-04-17 22:37:04 +00:00
  • 2e9694c9e9 *) removed recursion which hopefully prevents exception *) fixed bug in creation of table of content which caused double entries if a page was previewed more than once low012 2011-04-17 21:02:18 +00:00
  • a2e86daae9 YMark: more bug fixes apfelmaennchen 2011-04-16 22:09:50 +00:00
  • 62855f9567 YMark: code clean up and some small fixes apfelmaennchen 2011-04-16 21:19:42 +00:00
  • 667e912b19 YMark: - some improvements to firefox json bookmark importer - test import with: /api/ymarks/test_import.html - view ymarks with: /api/ymarks/test_treeview.html apfelmaennchen 2011-04-16 09:09:33 +00:00
  • 0abd99621c correct slip of click in classpath from last commit - I wonder there are 7658'is around apflemaenchen, please don't take this amiss sixcooler 2011-04-16 03:08:25 +00:00
  • a0e4960a4d YMark: - first attempt for a firefox json bookmark importer - added JSON library json-simple-1.1.jar apfelmaennchen 2011-04-15 20:58:58 +00:00
  • 958ff4778e enhanced location search: search is now done using verify=false (instead of verify=cacheonly) which will cause that much more targets can be found. This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed. orbiter 2011-04-15 15:54:19 +00:00
  • 8d63f3b70f just cosmetics - keeping my baby clean :-) sixcooler 2011-04-15 00:48:39 +00:00
  • e402622584 removed httpclient-3.1 (this was added with last commit which was a mistake) the httpclient is required by solrj but no class from solrj is used which references to httpclient-3.1 Instead the YaCy http client library based on the apache http client 4.1 is used using a wrapper class which is in net.yacy.cora.services.federated.solr.SolrHTTPClient orbiter 2011-04-14 20:12:14 +00:00
  • 19fd13d3bc Added federated index storage to solr. YaCy supports now the storage to remote solr indexes. More federated storage (and search) methods may follow. orbiter 2011-04-14 20:05:04 +00:00
  • c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments orbiter 2011-04-13 22:04:23 +00:00
  • b788182954 some enhancements to scoring speed orbiter 2011-04-13 15:17:00 +00:00
  • 13724ddd43 * caching in proxy Florian Richter 2011-04-13 15:28:28 +02:00
  • 01690eab86 fix for mediawiki importer and wikicode parser orbiter 2011-04-13 13:22:27 +00:00
  • c5352e6872 added new SearchResult class (to be used later) orbiter 2011-04-13 06:16:31 +00:00
  • 4c013d9088 more UTF8 getBytes() performance hacks orbiter 2011-04-12 05:02:36 +00:00
  • b6d67507db * implement proxy Florian Richter 2011-04-09 11:48:44 +02:00
  • 78d6d6ca06 refactoring for ymarks apfelmaennchen 2011-04-08 21:15:10 +00:00
  • 399d7d6878 * fix permissions of bin/-folder in debian package f1ori 2011-04-07 07:31:17 +00:00
  • 9ac02caf00 different initialization of empty variables in alternative constructor. This leads to wrong interpretation of user credentials, resulting in unnecessary "@" in front of host, and different urlhash values. cominch 2011-04-06 10:59:31 +00:00
  • a47bdc405b better logging for robinson selection according to peer tag orbiter 2011-04-05 08:04:25 +00:00
  • cafcb1f9ed removed the DNS resolving for web structure computation from the indexing queue and placed it in a concurrent computation queue that does not block the crawler. Makes crawling faster and less DNS-speed-dependent orbiter 2011-04-04 22:01:07 +00:00
  • 57ce1fb491 reverted synchronization from SVN 7641 orbiter 2011-04-04 20:31:02 +00:00
  • 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10 orbiter 2011-04-04 12:20:20 +00:00
  • 7c8e764201 removed synchronization again... orbiter 2011-04-04 10:13:30 +00:00
  • 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling orbiter 2011-04-04 09:47:18 +00:00
  • b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer (directly or indirectly) and it grants a crawl-delay of 0. Then all forced pause mechanisms in YaCy are switched off and the domain is crawled at full speed. crawl delay values can be assigned to either - all yacy peers using the user-agent yacybot - a specific peer with peer name <peer-name>.yacy or - a specific peer with peer hash <peer-hash>.yacyh orbiter 2011-04-03 23:39:45 +00:00
  • 21fe5e6c6a * add bin-folder to debian package f1ori 2011-04-02 10:58:56 +00:00
  • e25c1f2ea3 *) preventing whitespace keys in config file low012 2011-04-02 09:25:07 +00:00
  • cb6f709a16 - enhancements in surrogate reading - better display of map in location search orbiter 2011-04-02 00:11:37 +00:00
  • 1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users) *) cleaned up code a little bit low012 2011-04-01 23:32:40 +00:00
  • 564184909a enhanced the surrogate parser: better reading of UTF-8 characters orbiter 2011-04-01 11:05:42 +00:00
  • 156cf02703 - added an index constraint 'has location' to the condenser - added evaluation of the 'has location' constraint to search using the /location operator orbiter 2011-03-31 09:41:30 +00:00
  • 41b8d7f655 fix for url normalization (no backpath resolving in post parameters) orbiter 2011-03-31 09:40:01 +00:00
  • 0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages - added parser for in-text appearing geo-locations - added geo-locations to rss search result - added evaluation of metadata-attached geo-locations in yacysearch_location to show search results within a map orbiter 2011-03-30 23:26:36 +00:00
  • 8412f8787d fix for http://bugs.yacy.net/view.php?id=8 orbiter 2011-03-30 08:17:25 +00:00
  • 9b25d07295 - added geo information parsing to html parser - extended metadata information in index with geolocalisation - added display of location in yacydoc and ViewFile orbiter 2011-03-30 00:49:47 +00:00
  • efcf37a953 * show info in log, if robots.txt is rejected due to wrong mime-type f1ori 2011-03-28 19:55:15 +00:00
  • cbf87fe72f write PID to yacy.running lotus 2011-03-26 15:11:29 +00:00
  • 351d264a48 * yacy domain handler for jetty * rewrite from / to /index.html Florian Richter 2011-03-26 00:18:48 +01:00
  • 06afa94f9d hups lotus 2011-03-24 06:24:37 +00:00
  • a9a9db98c8 better rename modified version lotus 2011-03-23 20:22:08 +00:00
  • e19ca27004 do not autocomplete on mouseover. this has resulted in unwanted autocomplete. fixes bug #3 lotus 2011-03-23 20:13:43 +00:00
  • 16cd919795 *) fixed Exceptions which caused 500 error when entering invalid URL mask or invalid prefer mask, invalid masks are ignored, error message is displayed on yacysearch.html (what about yacysearch.rss and yacysearch.json?) *) fixed "more options" link on yacysearch.html low012 2011-03-23 00:48:19 +00:00
  • 1a24917cea *) fixed NPE which occurred when empty String was entered as search word low012 2011-03-23 00:44:38 +00:00
  • 01b968d836 better concurrency in ViewImage icon cache and OOM protection for too large icon caches orbiter 2011-03-22 11:00:55 +00:00
  • b1a8d0c020 enhancements to web cache and less strict caching rules orbiter 2011-03-22 10:35:26 +00:00
  • f3baaca920 - enhancements to DNS IP caching and crawler speed - bugfixes (NPEs) orbiter 2011-03-22 09:34:10 +00:00
  • e7860b1239 *) <mode="Homer">D'oh!</Homer> low012 2011-03-21 22:23:20 +00:00
  • 82f1580a60 *) trying to fix ConcurrentModificationException low012 2011-03-21 22:20:19 +00:00
  • df71776929 * fix bug #7 * log requires poison to finish, so Base64Order main-function doesn't finish, when called from debian configure script f1ori 2011-03-21 19:42:22 +00:00
  • 9f0286b380 *) fixed potential "java.lang.IllegalArgumentException: Illegal group reference" which occurred if special characters which are also used as metacharacters in regular expression were used inside of <pre>...</pre> (see: http://veerasundar.com/blog/2010/01/java-lang-illegalargumentexception-illegal-group-reference-in-string-replaceall/) low012 2011-03-21 18:02:09 +00:00
  • 78d4c45d09 enhancement during search process: fast fail of search in case that all index feeder have terminated. This change should affect filtering and navigators and should cause that search navigation gets faster orbiter 2011-03-21 13:05:51 +00:00
  • ba03ca8620 added more configuration options for search: - removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal - added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal - added configuration of navigation options to ConfigPortal - added an option to switch off automatic index cleaning in case that a link verification method fails orbiter 2011-03-21 07:50:34 +00:00
  • e0c7d490f9 * fix bug #6 * exclude signature files from auto-deletion of unknown files in DATA/RELEASE f1ori 2011-03-20 17:59:58 +00:00
  • 18ec7fe53c added a clearall.sh script that deletes the complete index and everything else that belongs to crawling orbiter 2011-03-20 08:36:29 +00:00
  • d98884f1d5 added script for importmediawiki.sh in build.xml orbiter 2011-03-19 23:58:11 +00:00
  • a50f28e6e7 - fixed missing save operation for peer name change - fixed import of mediawiki dump files - added script to add mediawiki dump files orbiter 2011-03-19 23:52:09 +00:00
  • 2b5f8585bf performance hack for Balancer and ip address parsing orbiter 2011-03-17 21:09:18 +00:00
  • 43e1660512 fix/enhancement in Crawler: do not generate domain match pattern if crawl depth is 0 orbiter 2011-03-17 21:07:44 +00:00
  • b1d133b69f another enhancement to the ThreadDump function: better multiple dumps and filtering out of not interesting dump parts orbiter 2011-03-17 20:48:39 +00:00
  • f25cc4407d * authentication complete (using old credentials from config file) Florian Richter 2011-03-17 20:40:05 +01:00
  • a35d513bd8 fix for not-deleted .gap and .idx files see also: http://forum.yacy-websuche.de/viewtopic.php?p=22128#p22128 orbiter 2011-03-17 17:09:19 +00:00
  • 7cfd3762d9 * authentication implemented with own securityhandler Florian Richter 2011-03-16 17:39:31 +01:00
  • a6935e7dc8 fix for active dns resolving: do not resolve in case that the dns server is not available (offline mode) orbiter 2011-03-16 07:05:10 +00:00
  • 859c99886c fix for multiple thread dump orbiter 2011-03-15 23:05:51 +00:00
  • 61acf55da4 avoided using a synchronized(this) for the hash computation to prevent that the lock on the object is (accidentally) stolen by another thread and replaced this synchronization using the protocol object. Made also the protocol object final. orbiter 2011-03-15 09:52:39 +00:00
  • c2a968c23f fix for bug in formatting in ThreadDump and added hint for linux/Mac users that they may use the LOCKED feature using the start option -l orbiter 2011-03-15 08:39:05 +00:00
  • 2861d0888a *) simplified code *) fixed potential NumberFormatExceptions low012 2011-03-15 01:03:35 +00:00
  • 078ecacf61 avoid synchronization in DigestURI hash requests orbiter 2011-03-15 00:47:30 +00:00
  • 68ca0fbb2e * add copyright info * implement basic authentication * update jetty to 7.3.0 Florian Richter 2011-03-15 00:33:36 +01:00
  • 1989ebc24b removed more warnings orbiter 2011-03-14 22:52:30 +00:00
  • 0324de1467 removed debug line orbiter 2011-03-14 21:34:42 +00:00
  • 1aba7869bf patch for Windows: do not use the thread lock feature from previous commit if used on Windows orbiter 2011-03-14 21:33:36 +00:00
  • 0a11727374 added new feature for Thread dump: "THREADS WITH STATES: LOCK FOR OTHERS" will show only such threads that lock other threads. This is the 'opposite part' of the blocked threads. Because this uses a thread dump that is produced with a kill -3 on the PID of the process, and such thread dumps are written by the Java core outside of System.out and System.err, it is necessary to read the dump from a log in the file system. Such a log is only written if YaCy is started with startYACY.sh on a linux system. That means: this feature is only available on linux and Mac OS X if YaCy is started with ./startYACY.sh -l orbiter 2011-03-14 21:32:20 +00:00
  • b62b79675b removed type cast warnings orbiter 2011-03-14 21:08:18 +00:00
  • a07a1a8b1e removed type cast warnings orbiter 2011-03-14 21:07:15 +00:00
  • 8edaccfedf removed unused variables orbiter 2011-03-14 21:03:37 +00:00
  • e6c3507b17 disabled some of the previous changes (did not work in openjdk) orbiter 2011-03-14 20:48:36 +00:00
  • f9e5c21083 update to thread dump logs orbiter 2011-03-14 20:46:04 +00:00
  • ed3bcfaf71 * SSI work with jetty, it's pretty usable now Florian Richter 2011-03-14 21:17:01 +01:00
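
Note on commit 9f0286b380: the "Illegal group reference" exception it mentions is a general property of Java's String.replaceAll, which treats `$` and `\` in the replacement string as group references and escapes. The following is a minimal sketch of that failure mode and of one common remedy, Matcher.quoteReplacement. The class name, the helper names and the `@@PRE0@@` placeholder are hypothetical and chosen only for illustration; this is not the code from the commit, and the commit may have solved the problem differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not taken from the YaCy sources.
class QuoteReplacementSketch {

    // Naive version: the replacement is interpreted by the regex engine,
    // so a '$' followed by a non-digit raises IllegalArgumentException.
    static String restoreUnsafe(String text, String placeholder, String literal) {
        return text.replaceAll(placeholder, literal);
    }

    // Safer version: both the placeholder and the replacement are quoted,
    // so they are treated as plain text.
    static String restoreSafe(String text, String placeholder, String literal) {
        return text.replaceAll(Pattern.quote(placeholder),
                               Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        String page = "<pre>@@PRE0@@</pre>";   // hypothetical placeholder token
        String content = "echo $HOME";         // text that contains a '$'

        try {
            restoreUnsafe(page, "@@PRE0@@", content);
        } catch (IllegalArgumentException e) {
            System.out.println("unsafe variant failed: " + e.getMessage());
        }

        System.out.println(restoreSafe(page, "@@PRE0@@", content));
    }
}
```

When no regular expression is needed at all, String.replace(CharSequence, CharSequence) avoids the issue entirely, because it performs a literal substitution.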