Commit Graph

  • 331e0a24fc Merge pull request #621 from OFA54/patch-1 Michael Christen 2023-12-20 23:34:35 +01:00
  • d825a85a01 Merge pull request #619 from pr0vieh/initrecrawl Michael Christen 2023-12-09 14:47:13 +01:00
  • 35620762ac bring defaults for recrawlindex to init config pr0vieh 2023-12-09 01:32:31 +01:00
  • d097a642c2 Merge pull request #615 from okybaca/logging2 Michael Christen 2023-12-03 16:40:21 +01:00
  • 6d5e9ff53f Merge pull request #616 from okybaca/logging3 Michael Christen 2023-12-03 16:39:29 +01:00
  • d5d4e8fe3a Merge pull request #617 from pr0vieh/master Michael Christen 2023-12-03 16:38:46 +01:00
  • dfb2b79609 Add setting for DHT receive loadprereq instead of hardcoded load < 2.0 pr0vieh 2023-12-03 01:27:36 +01:00
  • 5dee8dbcbd changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine okybaca 2023-12-02 12:24:36 +01:00
  • 4c603e23f0 Merge pull request #610 from okybaca/cr-text Michael Christen 2023-11-27 12:17:05 +01:00
  • 040cd8be6d Merge pull request #612 from okybaca/sitemap-fix Michael Christen 2023-11-27 12:16:43 +01:00
  • 0233ecd481 Merge pull request #614 from okybaca/logging Michael Christen 2023-11-27 12:15:34 +01:00
  • 7831f294a9 changed regular peerping messages to level fine okybaca 2023-11-27 08:12:03 +01:00
  • 553c859703 logging: moved some log-cluttering DHT messages to level 'fine' okybaca 2023-11-27 07:51:42 +01:00
  • 1c5fca9a58 changed network operation log category from YACY to NETWORK okybaca 2023-11-26 12:24:09 +01:00
  • 2f44fc0257 added some logging prefixes to yacy.logging okybaca 2023-11-25 18:39:08 +01:00
  • 89c2a92cfb tr.lng OFA 2023-11-18 01:03:28 +03:00
  • 3d3bdb0f5f added zim importer rule for mdwiki Michael Peter Christen 2023-11-16 23:11:57 +01:00
  • 4a611ac6a3 another possible fix for https://github.com/yacy/yacy_search_server/issues/500 Michael Peter Christen 2023-11-15 23:45:53 +01:00
  • 9c59c6814b updated apache libs okybaca 2023-11-15 10:22:00 +01:00
  • d72cd7916c Merge branch 'master' of https://github.com/yacy/yacy_search_server sgaebel 2023-11-14 20:43:42 +01:00
  • 0663ae3c99 adds synchronized dumplog sgaebel 2020-12-01 22:34:30 +01:00
  • cba84632ee UI: added a more descriptive message, CitationRank instead of cr okybaca 2023-11-14 00:05:23 +01:00
  • cff0991d85 test if this is helpful for https://github.com/yacy/yacy_search_server/issues/500 Michael Peter Christen 2023-11-13 16:41:19 +01:00
  • ceb07a5218 fixed problem with zim importer which crashed when non-valid urls appeared Michael Peter Christen 2023-11-13 11:12:10 +01:00
  • 656b3e3e77 updated guava to latest and added missing library for failureaccess Michael Peter Christen 2023-11-13 10:59:49 +01:00
  • 3268a93019 added a 'minified' option to YaCy dumps Michael Peter Christen 2023-11-13 10:27:50 +01:00
  • c20c4b8a21 modified export: added maximum number of docs per chunk. The export file can now be many files, called chunks. By default still only one chunk is exported. This function is required in case that the exported files shall be imported to an elasticsearch/opensearch index. The bulk import function of elasticsearch/opensearch is limited to 100MB. To make it possible to import YaCy files, those must be split into chunks. Right now we cannot estimate the chunk size as bytes, only as number of documents. The user must do experiments to find out the optimum chunk max size, like 50000 docs per chunk. Try this as first attempt. Michael Peter Christen 2023-11-12 22:11:55 +01:00
  • 655d8db802 detailed directions in index export to explain how the export can be imported again using elasticsearch/opensearch Michael Peter Christen 2023-11-12 15:26:18 +01:00
  • 24011dcbcc more file name extensions for json list surrogate files Michael Peter Christen 2023-11-06 22:44:18 +01:00
  • 34a9fc1a07 bugfixes to zim reader: Michael Peter Christen 2023-11-05 12:46:37 +01:00
  • 7db0534d8a Added a zim parser to the surrogate import option. You can now import zim files into YaCy by simply moving them to the DATA/SURROGATE/IN folder. They will be fetched and after parsing moved to DATA/SURROGATE/OUT. There are exceptions where the parser is not able to identify the original URL of the documents in the zim file. In that case the file is simply ignored. This commit also carries an important fix to the pdf parser and an increase of the maximum parsing speed to 60000 PPM which should make it possible to index up to 1000 files in one second. Michael Peter Christen 2023-11-05 02:16:40 +01:00
  • 70e29937ef added a check in zim importer which tests if import URLs actually exist Michael Peter Christen 2023-11-04 19:07:50 +01:00
  • 496f768c44 modified cache strategy for zim clusters Michael Peter Christen 2023-11-03 18:20:10 +01:00
  • fdc6311dc7 added parsing rules for wikibooks and wikinews in zim reader Michael Peter Christen 2023-11-02 00:27:24 +01:00
  • 2ea54b3503 fixed blob iterator in zim cluster definition Michael Peter Christen 2023-11-01 23:43:27 +01:00
  • 54fa5d3c2e added a cluster cache but it requires more testing Michael Peter Christen 2023-11-01 19:52:44 +01:00
  • 53b01dbf2e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Michael Peter Christen 2023-11-01 18:57:04 +01:00
  • 41856e9f34 added an optimized zim file entry iterator Michael Peter Christen 2023-11-01 18:50:28 +01:00
  • 1c0df28bfb added a zim importer that can be used for surrogate imports. Can not be used yet because it requires some security additions to verify that the given urls actually work. Michael Peter Christen 2023-11-01 18:48:40 +01:00
  • b9912ff50d repaired dockerfiles for aarch64 and armv7 Michael Peter Christen 2023-10-29 22:09:24 +00:00
  • 33b6878ded Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Michael Peter Christen 2023-10-29 14:58:47 +01:00
  • 68554cea07 Merge pull request #605 from okybaca/readme-docker-link Michael Christen 2023-10-29 14:56:26 +01:00
  • 06bfd5802f Merge pull request #603 from okybaca/dark-green-css Michael Christen 2023-10-29 14:55:58 +01:00
  • 43d5cd101e Merge pull request #607 from okybaca/wikilinks Michael Christen 2023-10-29 14:55:26 +01:00
  • 4add1f6bc7 replaced all the links to legacy legacy wiki to legacy wiki okybaca 2023-10-29 13:12:24 +01:00
  • e2c86a8eba added a ZIM cluster pointer cache Michael Peter Christen 2023-10-29 12:49:08 +01:00
  • 4a54b24703 fix for "negative seek offset" error during extension of heap files. This would have always happened when a heap file exceeds 2GB. Should fix https://github.com/yacy/yacy_search_server/issues/372 Michael Peter Christen 2023-10-29 09:32:21 +01:00
  • 69db75ce45 added a link to docker build guide okybaca 2023-10-29 02:35:57 +01:00
  • 9c8fb97985 introduced url list and title list caching and enhanced input stream performance in ZIM reader Michael Peter Christen 2023-10-29 00:43:12 +02:00
  • b0ae660790 added Zstandard compressed data decompression for ZIM files type 5; also: more generalization and performance enhancements Michael Peter Christen 2023-10-28 12:24:29 +02:00
  • ad8ee3a0b6 fixed typo in class name Michael Peter Christen 2023-10-28 08:57:42 +02:00
  • c4082c4ff2 refactoring of ZIM reader, simplification, removed unnecessary code Michael Peter Christen 2023-10-28 08:56:58 +02:00
  • c2b6b6e7b9 Fixed a large number of problems in the ZIM reader. This library was not prepared for large data because it was missing long data types for pointers. I had to modify the code-base in a fundamental way: proof-reading; unclustering; refactoring; naming adoption to https://wiki.openzim.org/wiki/ZIM_file_format; change of Exception handling; extension to more attributes as defined in spec (bugfix for mime type loading); bugfix to long parsing (prevented reading of large files). The code is furthermore very inefficient and requires more attention. However the format is very useful for YaCy as there are numerous data sources for ZIM-Files. Michael Peter Christen 2023-10-27 15:49:23 +02:00
  • 5ba5fb5d23 upgraded pdfbox to 3.0.0 Michael Peter Christen 2023-10-27 12:05:24 +02:00
  • c10944bd4a updated bcmail-jdk15on 1.75 to bcmail-jdk18on 1.67 Michael Peter Christen 2023-10-27 11:08:19 +02:00
  • 1fefae9baf integrated the source code of an openzim file format reader. These are the raw format reader files with no integration in YaCy yet, which will maybe follow as a next step. The zim file format is documented in https://openzim.org and the reader code was taken from the archived, non-maintained repository at https://github.com/openzim/zimreader-java Michael Peter Christen 2023-10-27 10:59:06 +02:00
  • ec2d14e973 fine tuning the dark-green color scheme okybaca 2023-10-26 12:35:22 +02:00
  • 4308aa5415 removed concept of empty passwords as "no passwords used", because we now start YaCy with a default password (yacy). This has an impact on all functions that check the current state of password-protection and that included the empty password situation, including the warnings to set a password in case that none is set (which cannot be the case any more). Michael Peter Christen 2023-10-25 22:56:06 +02:00
  • 2c60ff14bb fixed default pw comparison Michael Peter Christen 2023-10-25 13:59:02 +02:00
  • 4da320bebf added a warning message in ConfigBasic in case that the default password was not changed. Michael Peter Christen 2023-10-24 23:36:26 +02:00
  • 7830268be1 fix 756c817b5a must be applied to all code where a transaction token is generated. Michael Peter Christen 2023-10-21 13:00:49 +02:00
  • dc6f218520 set the default password for the admin account to "yacy" Michael Peter Christen 2023-10-21 12:09:19 +02:00
  • 756c817b5a fix for https://github.com/yacy/yacy_search_server/issues/544 Michael Peter Christen 2023-10-21 11:45:26 +02:00
  • bab1cfc7ea added required build tools installation Michael Christen 2023-10-20 16:09:47 +02:00
  • 03bf259601 fix for https://github.com/yacy/yacy_search_server/issues/363 : We still need to set the load in the process because a demand for higher crawl speed may require to increase the maximum load limit. However, following the criticism in the bug, we never reduce the load limit again. Michael Peter Christen 2023-10-16 18:26:47 +02:00
  • 5bc09af426 Merge pull request #600 from okybaca/scheduler-sort Michael Christen 2023-10-16 13:00:24 +02:00
  • 4c1eb34e85 modified link to Process Scheduler in left menu okybaca 2023-10-10 08:30:04 +02:00
  • aeb4c7a660 removed warnings during normal build Michael Peter Christen 2023-10-04 22:00:30 +02:00
  • 095a444aa7 removed wiki links and added more shields badges Michael Peter Christen 2023-09-30 18:16:38 +02:00
  • ca2a21008a added screenshots Michael Peter Christen 2023-09-30 13:07:18 +02:00
  • 961d3cc8af Merge pull request #597 from joestr/issue/574-fix-mac-script Michael Christen 2023-09-28 21:10:49 +02:00
  • a035b21f63 Merge pull request #598 from joestr/improvement/remove-travis-yml Michael Christen 2023-09-28 21:10:04 +02:00
  • b29c0ef133 remove .travis.yml since YaCy is not built on Travis CI anymore Joel Strasser 2023-09-27 21:29:22 +02:00
  • 09783ae89e apply patches from @HenryLoenwind Joel Strasser 2023-09-27 19:56:08 +02:00
  • 94db89a757 small remaining changes in readme Michael Peter Christen 2023-09-26 16:15:58 +02:00
  • 0c4478cd71 migrated jetty to 9.4.52.v20230823 Michael Peter Christen 2023-09-26 16:15:42 +02:00
  • 938724caa8 new development on-boarding process in eclipse with changes for ivy Michael Peter Christen 2023-09-26 16:07:59 +02:00
  • 8fc51f66c6 fixed a test class which prevented compilation on latest jvm mchristen 2023-09-26 15:39:34 +02:00
  • bda118af5d Merge pull request #594 from joestr/master Michael Christen 2023-09-26 09:39:15 +02:00
  • 53bafa1544 consistent formatting in string concatenation Joel Strasser 2023-09-25 23:31:55 +02:00
  • 22c4188001 additionally match release stub for YaCy version Joel Strasser 2023-09-25 22:41:04 +02:00
  • 4a5820eb03 7zip parser was removed previously. see also https://github.com/yacy/yacy_search_server/issues/491 Michael Peter Christen 2023-09-03 20:23:23 +02:00
  • ff8fe7b6a4 fix for ',' or '.' appearing within a word or number. This will not tokenize the query into parts around that character to make it possible to search for numbers or version numbers. Michael Peter Christen 2023-09-03 11:37:25 +02:00
  • 0689f4f0ae Check if the character is a minus sign and is followed by a letter or a digit. Treat it as part of the word/number. Michael Peter Christen 2023-09-03 10:22:03 +02:00
  • 5db97a8928 parser can now separate numbers from words also when they are not separated by space, e.g. 4.7Ohm Michael Peter Christen 2023-09-02 19:15:22 +02:00
  • 079eafe7f1 removed 7zip from eclipse classpath Michael Peter Christen 2023-09-02 11:44:33 +02:00
  • e3797de7de enhanced the word tokenizer to recognize numbers in a proper way Michael Peter Christen 2023-09-01 20:10:08 +02:00
  • 88cd17ea57 migrated solr from 8.9.0 to 8.11.2; activated also migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon. Michael Peter Christen 2023-09-01 18:24:52 +02:00
  • 0089f234f4 added npe protection Michael Peter Christen 2023-09-01 12:18:47 +02:00
  • 8285fe715a tab to spaces for classes supporting the condenser. This is a preparation step to make changes in condenser and parser more visible; no functional changes so far. Michael Peter Christen 2023-09-01 11:00:42 +02:00
  • ce4a2450da fixed workflow for ci process/2 Michael Peter Christen 2023-08-31 18:05:58 +02:00
  • a3ca4eac08 fixed workflow for ci process Michael Peter Christen 2023-08-31 18:03:04 +02:00
  • 6bd5f49c41 Migrated from java 8 to java 11. This step is required to upgrade certain packages, most important solr which will be migrated from 8.9 to 9.x Michael Peter Christen 2023-08-31 17:52:30 +02:00
  • 376bcfd54c Merge pull request #588 from okybaca/crawlurl Michael Christen 2023-08-28 22:29:04 +02:00
  • d353202489 Merge pull request #589 from okybaca/restartbuild Michael Christen 2023-08-28 22:22:26 +02:00
  • 1de37bc60b added restartYACY.sh so it's included in release package okybaca 2023-08-28 13:17:30 +02:00
  • 08b769f63a modified crawl list so the URL links to external URL okybaca 2023-08-28 13:01:45 +02:00
  • 195bd2e444 extended the maximum header size to 16k to prevent http error 431 Michael Peter Christen 2023-08-19 15:21:24 +02:00
  • 0554056c63 added .txt search result page (just replace '.html' with '.txt' in yacysearch.html page to get a url list) Michael Peter Christen 2023-08-19 14:57:31 +02:00
  • 117e2d0663 Merge pull request #580 from okybaca/restartyacy Michael Christen 2023-08-09 16:32:43 +02:00