Commit Graph

  • 4fc876f4a3 revert back to use EntityUtils.consumeQuietly - as it simply closes the underlying stream sgaebel 2021-10-31 00:27:10 +02:00
  • 4f0392e93e refactor use of AuthSchemeProvider sgaebel 2021-10-30 21:07:26 +02:00
  • b74f337859 removes double setting of UserAgent sgaebel 2021-10-30 20:07:36 +02:00
  • 965748fefb some refactoring using try with resources sgaebel 2021-10-30 16:15:57 +02:00
  • f4834e8e31 link fix Michael Christen 2021-10-29 11:10:23 +02:00
  • 7f5d3e3a12 fixed name Michael Christen 2021-10-29 11:07:34 +02:00
  • 552ab7051b fix for warc importer Michael Peter Christen 2021-10-25 19:35:15 +02:00
  • 3c86b7b780 attempt to make a Mac release using gradle. This is almost working, with many workarounds: (1) run rm lib/yacycore.jar; (2) run ./gradlew clean build bundleNative; (3) run ant clean all; (4) run rm lib/yacycore.jar again; (5) run ./fixMacBuild.sh Michael Peter Christen 2021-10-25 18:37:39 +02:00
  • 49cae8ca62 network bootstrapping addresses update Michael Peter Christen 2021-10-25 18:32:57 +02:00
  • 8e4383c49e downgrading gradle to 6.9 to be able to support org.mini2Dx.parcl Michael Peter Christen 2021-10-25 18:32:34 +02:00
  • 999c819e3e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Michael Peter Christen 2021-10-24 20:50:14 +02:00
  • fd770e90e2 spike to identify paths for YaCy within mac application bundles Michael Peter Christen 2021-10-24 20:49:59 +02:00
  • d19872fd26 making sure that crawl queues are closed correctly to prevent data loss Michael Peter Christen 2021-10-14 00:30:04 +02:00
  • 90507c0fdc comments out printing query params to std.out sgaebel 2021-10-04 18:03:06 +02:00
  • be0aebad84 fixes https://github.com/yacy/yacy_search_server/issues/424 Michael Peter Christen 2021-10-04 14:38:49 +02:00
  • 63ad8ce6b2 removed ymarks, which had not been used for a long time Michael Peter Christen 2021-09-16 22:23:51 +02:00
  • ef5a71a592 enhanced crawl start response time for very very large crawl start lists Michael Peter Christen 2021-09-16 21:01:01 +02:00
  • 1bab4ffe20 calculating the correct size of an export. This can be seen as a fix for https://github.com/yacy/yacy_search_server/issues/343; however, the export was not flawed. It only gave the impression that something was wrong: the export size must be smaller than the index size because the index also contains error documents. Now an information line is presented that shows, e.g.: "The local index currently contains 181,319 documents, only 106,887 exportable with status code 200 - the remaining are error documents." Michael Peter Christen 2021-09-16 01:05:09 +02:00
  • 4cadd557dc removed synchronization in table creation to avoid possible deadlocks when handling OnDemandOpenFileIndex which happens quite often during wide crawling Michael Peter Christen 2021-09-15 19:34:49 +02:00
  • 8084960392 disabled citation index that was created but never used Michael Peter Christen 2021-09-15 18:46:37 +02:00
  • 9b7668fa58 reduced memory footprint during indexing/crawling admin 2021-08-24 12:24:52 +02:00
  • fbf8ddd32d upgrade of jsoup 1.12.1 -> 1.14.2 admin 2021-08-24 12:23:57 +02:00
  • 53518a91ab In case of reload404, load only failed documents Ian Smirlis 2021-08-19 20:49:59 +03:00
  • 4c889b7ff9 fixed build paths Michael Peter Christen 2021-08-18 19:05:44 +02:00
  • 683cac125f updated bouncy castle 1.60 -> 1.69 Michael Peter Christen 2021-08-17 15:48:54 +02:00
  • e6a87e0426 enhanced crawler. A main problem when crawling is the long waiting time caused by crawl-delay values from robots.txt entries. That attribute is not supported by Google and is interpreted by Yandex and Bing in different ways. In large crawls there is always one host which blocks the whole crawl with extremely large values. YaCy still obeys crawl-delay but now limits it to 10 seconds. Additionally, the blocking logic when loading a new robots.txt was analyzed and a deadlock was removed. Furthermore, the construction of new queue lists was redesigned so that the loader is always provided with a large list of different hosts for host-balancing. Michael Peter Christen 2021-08-17 15:23:21 +02:00
  • e9c5e78868 replaced new Number(Number) with Number.valueOf to remove deprecation warnings for Java 9 Michael Peter Christen 2021-08-08 00:39:03 +02:00
  • 9e13d77de4 removed call to class.finalize() because of deprecation in java 9. Next: removal of finalize() implementation after testing with assert false Michael Peter Christen 2021-08-07 18:57:49 +02:00
  • 9ef4503672 fixed some newInstance() warnings .. by adding .getDeclaredConstructor() Michael Peter Christen 2021-08-07 18:46:53 +02:00
  • 82df012442 removed old lib Michael Peter Christen 2021-08-07 18:23:22 +02:00
  • 8a2adb2b15 upgraded commons-compress lib. Cause: alert in https://github.com/yacy/yacy_search_server/security/dependabot/pom.xml/org.apache.commons:commons-compress/open Michael Peter Christen 2021-08-07 18:21:54 +02:00
  • 9182b3dfca enhanced default value Michael Peter Christen 2021-08-05 09:18:05 +02:00
  • 294d56d4a2 addressing better GC behavior after removing Xms with earlier heap increase strategy Michael Peter Christen 2021-08-05 09:16:52 +02:00
  • 3959d43a5c fixed documentation link Michael Peter Christen 2021-08-03 16:57:24 +02:00
  • c4659f0fb0 removed Debian and Red Hat build process as announced in https://twitter.com/yacy_search/status/1414608643241152516 because of lack of community support for these kinds of distributions. We will still support tarball, Windows, Mac and Docker releases. Michael Peter Christen 2021-07-19 20:33:52 +02:00
  • 73360ed52b add gradle to gitignore Michael Peter Christen 2021-07-19 20:12:03 +02:00
  • 15b7461bc7 removed Xms java memory startup parameter. We will use the default value from now on. This is much better for resource economy and fits better into a container/docker/kubernetes strategy. Furthermore, a small memory footprint is essential for usage on small devices like the Raspberry Pi. Michael Peter Christen 2021-07-19 20:04:11 +02:00
  • c3b3087077 gradle cleanup admin 2021-07-14 14:07:49 +02:00
  • a13986d659 replaced maven with gradle admin 2021-07-14 13:58:30 +02:00
  • 1d41380f0a better support for mac-specific tray functions in java 9 Michael Peter Christen 2021-07-12 17:27:59 +02:00
  • 4377bd2b70 fix for wrong crawlName construction Michael Peter Christen 2021-06-30 18:03:54 +02:00
  • e81b770f79 enabled crawl starts with very large sets of start urls, e.g. a 10MB url list with approx. 0.5 million start points Michael Peter Christen 2021-06-30 10:45:58 +02:00
  • 4b73b3f9f2 docker has no latest-alpine frankenstein91 2021-06-20 22:27:50 +02:00
  • c623a3252e fix for jdk 14 bug Michael Peter Christen 2021-04-23 09:11:03 +02:00
  • dbd211a1ad removed/replaced reflection in memory tool Michael Peter Christen 2021-04-22 20:24:13 +02:00
  • 160f00e59e removed reconfigure script, which is seven years old and may not be up to the standards of the current password implementation. See https://github.com/yacy/yacy_search_server/issues/409 as a hint Michael Peter Christen 2021-04-15 20:41:01 +02:00
  • 1cdb21592b added hazelcast and some modifications to align legacy YaCy with YaCyGrid Michael Peter Christen 2021-04-15 20:39:22 +02:00
  • 42ea2a1c6f Merge pull request #405 from jfhs/jfhs/support-all-html-entities Michael Christen 2021-03-31 01:44:54 +02:00
  • b2af745dd6 Merge pull request #404 from lnceballosz/master Michael Christen 2021-03-30 23:48:21 +02:00
  • 10bddc2c2d Decode HTML entities in all property values by default jfhs 2021-03-30 21:30:52 +02:00
  • 2135d259e3 Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities jfhs 2021-03-30 20:58:47 +02:00
  • 8f876a8c72 added concurrency to enhance indexing speed during json surrogate import Michael Peter Christen 2021-03-30 12:07:36 +02:00
  • f8cbaeef93 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git Michael Peter Christen 2021-03-29 18:46:53 +02:00
  • a857e3d3d5 fix for json importer Michael Peter Christen 2021-03-29 18:46:42 +02:00
  • 7fecd859e5 fixes showing metadata from Searchresult by removing defType=edismax. Also removes defType=edismax from IndexBrowser, but still does not show dates sgaebel 2021-03-21 00:04:55 +01:00
  • 1546232c94 adds ranking for multi document queries only sgaebel 2021-03-20 17:10:21 +01:00
  • 93b353d22d does not boost or add fields for zero-row-queries (exists()) sgaebel 2021-03-20 16:29:16 +01:00
  • f16cd154f7 removes unused imports and variables sgaebel 2021-03-20 15:14:09 +01:00
  • c69c462a15 replaces an expensive getLoadTimeURL() with exists(); refactors urlExists to getHarvestProcess, as that is what it does sgaebel 2021-03-20 14:46:38 +01:00
  • a5488ac8f5 uses edismax queries on query counts > 1 only sgaebel 2021-03-20 01:04:15 +01:00
  • 26223dc25a replaces getLoadTime() with exists() using a simpler query; since solr-8.8.1, getLoadTime() causes high cpu usage sgaebel 2021-03-20 00:35:30 +01:00
  • 8e4d014c06 removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the log sgaebel 2021-03-18 22:05:15 +01:00
  • 88c6bc8cd7 adds missing solr lib: opentracing 0.33.0 sgaebel 2021-03-18 19:36:04 +01:00
  • 139b5a4033 improving license info in README Lina Ceballos 2021-03-11 12:23:53 +01:00
  • a96752f5ab adding SPDX license and copyright headers Lina Ceballos 2021-03-11 12:17:11 +01:00
  • 221038f16d creating LICENSES directory Lina Ceballos 2021-03-11 12:16:37 +01:00
  • e18d0ef544 trying to set a higher priority to the process that is involved in index export Michael Peter Christen 2021-03-09 00:04:05 +01:00
  • c552a2845f added new commons library (missed in latest commit) Michael Peter Christen 2021-03-08 13:39:48 +01:00
  • 8b4394a6c5 fixes for solr 8.8.1 migration: (1) replaced new guava 30 with older 25 because that is the correct dependency for solr 8.8.1; the newer one did not actually work. (2) The index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1 subfolder. The older solr_6_6 index is not touched but also not migrated; the index starts with fresh (empty) content. (3) Older indexes must be migrated by hand (export/import) until a better solution is found. (4) Large schema adaptations for lucene 8.8.1 Michael Peter Christen 2021-03-08 13:39:27 +01:00
  • 3befaaf4f1 reformatting pom.xml to make it easier to update it with recent library versions Michael Peter Christen 2021-03-08 00:41:41 +01:00
  • dffe9e1c23 Merge pull request #402 from SebastianoPistore/junitUpdate Michael Christen 2021-03-06 13:45:11 +01:00
  • 7c86826db3 new version for solr 8. ATTENTION: old indexes from solr 6 CANNOT be migrated to solr 8. DO NOT use this version if you still have a solr 6 index. Michael Peter Christen 2021-03-06 13:37:06 +01:00
  • ed9789214e fixed seed initialization problem Michael Peter Christen 2021-03-06 13:35:46 +01:00
  • f4f3808d43 added missing new dependencies for migration to Solr 8 after pulling https://github.com/yacy/yacy_search_server/pull/403 Michael Peter Christen 2021-03-06 13:35:32 +01:00
  • ffe8786d69 Merge pull request #403 from alsutton/address_security_issues Michael Christen 2021-03-06 12:58:56 +01:00
  • f4dd6e6d41 Update Lucene to 8.8.1 Al Sutton 2021-03-04 17:42:59 +00:00
  • 721dd3e1ba Update Guava to match version pulled through from solr dependencies Al Sutton 2021-03-04 17:32:07 +00:00
  • b5203de923 Update ant build solr dependency to 8.8.1 Al Sutton 2021-03-04 16:48:10 +00:00
  • 8ade8b8775 Remove forced clear to match new behaviour in 2da71c2a40 Al Sutton 2021-03-04 16:37:56 +00:00
  • 09695fc6d3 Update exceptions to match updated API Al Sutton 2021-03-04 16:34:02 +00:00
  • 69014a701e Update API Usage Al Sutton 2021-03-04 16:14:56 +00:00
  • 9ba0fa1beb Update dependencies to address vulnerabilities. Al Sutton 2021-03-04 13:41:43 +00:00
  • 78bd82f8ef Workaround for CVE-2020-15250 Sebastiano Pistore 2021-02-22 20:53:24 +01:00
  • b46513f4a1 added stub of rc3assembly style; a little bit late, but whatever Michael Peter Christen 2021-02-09 20:30:10 +01:00
  • 3da7628117 use environment variables to overwrite configuration variables. For example, you can run: export YACY_PORT=8092 && ./startYACY.sh. Just prepend "YACY_" to the uppercase version of the configuration variable name and replace all "." with "_". Michael Peter Christen 2021-02-09 20:26:49 +01:00
  • 13a2e6dc6e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Michael Peter Christen 2021-01-25 11:49:32 +01:00
  • 0ae8ccf657 Make it possible to set an empty password, disabling the authentication protocol completely. If you now set an empty password, the http server will not ask for authentication. This is required for environments where we attach an outside authentication service like keycloak or similar, using authentication in an ingress proxy. This change is part of the approach to run YaCy inside a kubernetes cluster where we do not want individual authentication of peers and want to apply an ingress authentication. Michael Peter Christen 2021-01-25 11:49:21 +01:00
  • 96592a10cf added option to set yacy configuration values using environment variables. To use that feature, set an environment variable with prefix "yacy." and suffix identical to the yacy configuration attribute name. Additionally, we implemented a way to set a peer name using the setting "network.unit.agent". This can therefore now be used to set a peer name with the java call parameter -Dyacy.network.unit.agent=anonymous. The purpose of this feature is the ability to set peer names in mass-deployed kubernetes clusters to the same name, to avoid flooding peer name statistics with auto-deployment-generated names. Michael Peter Christen 2021-01-24 22:50:37 +01:00
  • 198826c362 added network scanner process to discover all YaCy peers in the intranet. This will be used to wire YaCy peers in a kubernetes cluster Michael Peter Christen 2021-01-23 15:14:49 +01:00
  • d9602e8325 Implemented a new syntax in the template engine to simplify json APIs. Also added an example for one of the existing APIs. The problem is the comma separator between objects, which must not be there for the last entry in a sequence. The new syntax adds the separator symbol automatically. Michael Peter Christen 2021-01-18 00:01:08 +01:00
  • 5a7f12a9c1 allow network scans for non-standard http/https ports Michael Peter Christen 2021-01-11 00:28:24 +01:00
  • 022fb15670 fix for https://github.com/yacy/yacy_search_server/issues/385 Michael Peter Christen 2021-01-06 22:12:17 +01:00
  • 17672fcbb4 adding a hint on how to shrink the disk size after an index deletion. Implements https://github.com/yacy/yacy_search_server/issues/360 Michael Peter Christen 2021-01-06 22:02:00 +01:00
  • b8d264f7ec fixes logging sgaebel 2021-01-04 20:46:21 +01:00
  • 13e42c2dd2 added dockerfiles for 32 and 64 bit ARM/Raspberry Pi Michael Peter Christen 2020-12-31 00:02:23 +01:00
  • 062111a003 improved dockerfiles. They do not use git pull to get the latest YaCy code. Instead they copy from the local file system. Michael Peter Christen 2020-12-29 21:01:35 +01:00
  • 4c920d05b5 removed superfluous lines Michael Peter Christen 2020-12-29 20:19:58 +01:00
  • 48dd87e1e1 added a dockerignore file Michael Peter Christen 2020-12-29 20:19:45 +01:00
  • ca10f0afca fixed optional default PW Michael Peter Christen 2020-12-29 20:19:07 +01:00
  • 907f121d0c do not overwrite PW with random PW Michael Peter Christen 2020-12-29 20:18:25 +01:00
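
Commit 3da7628117 in the list above describes a naming rule for overriding configuration values from the environment: "YACY_" plus the uppercase configuration name, with every "." replaced by "_". The following is a minimal sketch of that rule only, not YaCy's actual code. The class name EnvOverrideSketch, the methods envName and resolve, and the fallback value "8090" are hypothetical and chosen for illustration; the keys "port" and "network.unit.agent" are taken from the commit messages.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating the naming rule from commit 3da7628117.
final class EnvOverrideSketch {

    // Maps a configuration key such as "network.unit.agent"
    // to an environment variable name such as "YACY_NETWORK_UNIT_AGENT".
    static String envName(String configKey) {
        return "YACY_" + configKey.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    // Returns the environment override if one is set, otherwise the given fallback.
    // The environment is passed in as a map so the lookup can be exercised without
    // touching the real process environment.
    static String resolve(String configKey, String fallback, Map<String, String> env) {
        String override = env.get(envName(configKey));
        return override != null ? override : fallback;
    }

    public static void main(String[] args) {
        // "8090" is an assumed fallback for illustration only.
        System.out.println(envName("port") + " = " + resolve("port", "8090", System.getenv()));
    }
}
```

Started with `export YACY_PORT=8092`, the lookup for "port" would return "8092"; without that variable it falls back to the supplied value.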
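
Commit e6a87e0426 in the list above states that YaCy still obeys crawl-delay values from robots.txt but limits them to 10 seconds. A minimal sketch of such a cap might look as follows. This is an illustration under assumptions, not the crawler's real implementation: the class CrawlDelaySketch, the constant MAX_CRAWL_DELAY_MILLIS, the method effectiveDelayMillis, and the choice of milliseconds as the unit are all hypothetical.

```java
// Hypothetical illustration of capping a robots.txt crawl-delay at 10 seconds.
final class CrawlDelaySketch {

    // Upper bound taken from the commit message (10 seconds), expressed in milliseconds.
    static final long MAX_CRAWL_DELAY_MILLIS = 10_000L;

    // Returns the delay to apply between requests to one host:
    // non-positive values mean "no delay", larger values are clamped to the cap.
    static long effectiveDelayMillis(long robotsCrawlDelayMillis) {
        if (robotsCrawlDelayMillis <= 0L) {
            return 0L;
        }
        return Math.min(robotsCrawlDelayMillis, MAX_CRAWL_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        // A host asking for a 5 minute delay is clamped to the cap.
        System.out.println(effectiveDelayMillis(300_000L));
    }
}
```

Clamping keeps a single host with an extreme value from stalling the whole crawl while still honoring modest delays unchanged.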