mirror of https://github.com/yacy/yacy_search_server.git synced 2025-05-02 20:19:34 -04:00

8988 Commits

Author SHA1 Message Date
Michael Peter Christen
8176f75285 removed warnings 2025-04-10 12:48:04 +02:00
Henschi
3c50bc701f fix build command line (commit: ab3ef87abf1928a3174f7d46af4f05ee712bbdaa), test Html2ImageTest is green now 2025-04-04 15:55:38 +02:00
Henschi
83c842681d remove executable flag 2025-04-04 15:51:38 +02:00
Henschi
332a42be72 prevent NPE because summaryInfo.getTitle can return null 2025-04-03 20:49:54 +02:00
Henschi
7a9ffebaab fix cast brackets to suppress low memory warning 2025-04-03 20:13:10 +02:00
okybaca
39218b74b1 modified kelondro so it uses KELONDRO prefix when logging 2024-12-24 00:54:23 +01:00
Michael Christen
0d58f60f45
Merge pull request from okybaca/log-kelo
renamed INDEX-TRANSFER-DISPATCHER to DHT-OUT in the log
2024-11-25 12:40:34 +01:00
Michael Peter Christen
54fa724352 reverted deprecation change since we are still using java 11, not 19 2024-11-25 12:34:05 +01:00
Michael Peter Christen
6ef3a0fca5 code maintenance - removed warnings and replaced deprecated functions 2024-11-25 12:29:11 +01:00
okybaca
91c753ab96 renamed INDEX-TRANSFER-DISPATCHER to DHT-OUT in the log 2024-11-25 01:23:57 +01:00
Michael Peter Christen
feca150672 Automatically adjust crawling load limit to the local machine cpu cores
The settings in the default configuration file are historic. Many
machines have many more CPU cores today, so auto-scaling to the
hardware is better.
2024-11-25 00:30:36 +01:00
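The auto-scaling idea in the commit above can be sketched as follows. This is a minimal Python illustration, not YaCy's Java code: the function name `scaled_load_limit`, the per-core factor, and the floor value are hypothetical and chosen only for the example, since the commit message does not state the actual formula.

```python
import os


def scaled_load_limit(per_core=1.0, floor=2.0, cores=None):
    """Return a load limit that grows with the number of CPU cores.

    per_core and floor are hypothetical tuning values, not YaCy settings.
    """
    if cores is None:
        # os.cpu_count() can return None when the count cannot be determined
        cores = os.cpu_count() or 1
    # Never go below the floor, so small machines keep a usable limit
    return max(floor, per_core * cores)
```

With these assumed values, a 16-core machine gets a limit of 16.0 while a single-core machine stays at the floor of 2.0.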
Michael Peter Christen
7e8a1ef0e2 log file when OOM appears 2024-10-18 14:25:42 +02:00
Michael Peter Christen
a8c64b1af2 added an artificial snippet [Synonym Match] in case there is only a match in the synonyms 2024-09-21 21:56:12 +02:00
Michael Peter Christen
c88c30a5c5 added an option to ViewFile to see all solr fields which contain texts 2024-09-21 21:51:19 +02:00
Michael Peter Christen
3944984840 added snippet extraction with synonym matching 2024-08-26 23:44:42 +02:00
Michael Peter Christen
910a496c9f replaced http links with https 2024-07-21 18:02:58 +02:00
Michael Peter Christen
833d720989 upgraded ppt parser by migration of org.apache.poi from 3.17 to 5.3.0
This also fixes the security warning
https://github.com/yacy/yacy_search_server/security/dependabot/37
2024-07-21 15:28:13 +02:00
Michael Peter Christen
687820788d this assert does not work because of the 9_0_0 solr version format.
A 9_0 is expected but it does not work this way with this version.
2024-07-21 13:33:47 +02:00
Michael Christen
accf4e424b
Merge pull request from virginOne/master
Fix the issue of not being able to import the JSON format
2024-07-10 16:36:54 +02:00
zutto
5268ae2ce9 check the document protocol & host values before proceeding to form final url. 2024-06-29 10:11:58 +03:00
zutto
d958d1c0c4 ensure that returned SolrDocument is not null 2024-06-29 09:33:06 +03:00
Virgo
89c07f0900 Fix the issue of not being able to import the JSON format export of Solr index due to the inconsistency in time format between the exported JSON format and the Solr time format. 2024-06-21 10:03:08 +08:00
Michael Peter Christen
70454654f3 by default open the https url for a given host, not the http url
(http hardly exists any more)
2024-05-27 00:53:18 +02:00
Michael Peter Christen
71a6074cc5 added setting of cache configuration for solr
according to recommendation from
https://community.searchlab.eu/t/yacy-support-gpt-chatgpt-assistant/1622
However it is not clear if this configuration actually works (has an
effect at all) or is the solution for performance issues.
2024-05-26 12:59:59 +02:00
Michael Peter Christen
b8479430b6 400 is too small 2024-05-25 01:21:06 +02:00
Michael Peter Christen
5f4ea9ac5d reduced memory amount for network image
also reduced the number of memory allocations for image storage
2024-05-25 01:10:23 +02:00
Michael Peter Christen
fe4c0aa890 refactoring of RAG reverse proxy: extracted the ollama code into
its own classes
2024-05-21 00:06:19 +02:00
Michael Peter Christen
f1c70dce33 Merge branch 'master' of github.com:yacy/yacy_search_server 2024-05-19 17:35:24 +02:00
Michael Peter Christen
8eb0d490aa migrated solr to 9.0
This is a major step because solr removed support for embedded solr
instances in 9.0 and we want to keep it because we want to ship
YaCy with an embedded solr. It was necessary to add parts of solr
code into YaCy to make this migration possible. Furthermore, with
Solr 9.1 they removed even more parts which are required for embedded
operation, therefore we cannot migrate any further yet without big
changes.
If you are running a YaCy instance with Solr 8.x, the migration should
be done automatically. If not, you must first migrate to YaCy
version 1.93 with Solr 8.x so that the data is migrated to Solr 8.
2024-05-19 17:34:57 +02:00
Michael Peter Christen
b8417e5619 removed Mac specific code which is not working any more on recent Macs 2024-05-19 17:29:16 +02:00
Michael Peter Christen
13fbff0bff Added a RAG Proxy for AI Chat with YaCy
RAG (Retrieval Augmented Generation) is a method to combine a search
engine with an LLM (Large Language Model). When a new prompt is
submitted, a search engine injects knowledge from a search into the
content. This is done using a reverse proxy between the Chat Client and
the LLM. In this case, we used the following software:

LLM Backend - Ollama:
https://github.com/ollama/ollama
Install ollama and then load two required LLM models
with the following commands:
ollama pull phi3:3.8b
ollama pull llama3:8b

Chat Client - susi_chat:
https://github.com/susiai/susi_chat
just clone the repository and then open the file
susi_chat/chat_terminal/index.html
in your browser. This displays a chat terminal.
In this terminal, run the following command:
host http://localhost:8090
This sets the LLM backend to your YaCy peer.

Then start YaCy. It will provide the LLM endpoint to the client
while using ollama in the backend. It then injects search results
only from the local Solr index, not from the p2p network (so far).
2024-05-19 17:19:09 +02:00
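The injection step described in the commit above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code of YaCy's RAG proxy: the function name `augment_messages`, the message shape (a list of dicts with `role` and `content`), the search result fields (`title`, `url`, `snippet`), and the prompt wording are all hypothetical.

```python
def augment_messages(messages, search_results, max_results=3):
    """Return a copy of the chat messages with search context injected.

    The context is prepended to the last user message. The message and
    result shapes are assumptions made for this sketch.
    """
    if not messages or not search_results:
        return [dict(m) for m in messages]

    # Turn the top search hits into a short plain-text context block
    context_lines = [
        f"- {r['title']} ({r['url']}): {r['snippet']}"
        for r in search_results[:max_results]
    ]
    context = "Use the following search results as context:\n" + "\n".join(context_lines)

    # Copy every message so the caller's list is left untouched
    augmented = [dict(m) for m in messages]
    for message in reversed(augmented):
        if message.get("role") == "user":
            message["content"] = context + "\n\nQuestion: " + message["content"]
            break
    return augmented
```

A reverse proxy built along these lines would call such a function on each incoming chat request and forward the augmented messages to the LLM backend.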
Michael Peter Christen
b295e38969 fine-tuned the import process of jsonl files; this had been missing
to actually be able to make searches and browse the index with the host
browser
2024-05-10 12:13:44 +02:00
Michael Christen
d097a642c2
Merge pull request from okybaca/logging2
Logging unclutter
2023-12-03 16:40:21 +01:00
Michael Christen
6d5e9ff53f
Merge pull request from okybaca/logging3
changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine
2023-12-03 16:39:29 +01:00
pr0vieh
dfb2b79609 Add setting for DHT receive loadprereq instead of hardcoded load < 2.0 2023-12-03 01:27:36 +01:00
okybaca
5dee8dbcbd changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine 2023-12-02 12:24:36 +01:00
Michael Christen
4c603e23f0
Merge pull request from okybaca/cr-text
UI: added a more descriptive message, CitationRank instead of cr
2023-11-27 12:17:05 +01:00
okybaca
7831f294a9 changed regular peerping messages to level fine 2023-11-27 08:12:03 +01:00
okybaca
553c859703 logging: moved some log-cluttering DHT messages to level 'fine' 2023-11-27 07:51:42 +01:00
okybaca
1c5fca9a58 changed network operation log category from YACY to NETWORK 2023-11-26 12:24:09 +01:00
Michael Peter Christen
3d3bdb0f5f added zim importer rule for mdwiki 2023-11-16 23:11:57 +01:00
Michael Peter Christen
4a611ac6a3 another possible fix for
https://github.com/yacy/yacy_search_server/issues/500
2023-11-15 23:45:53 +01:00
sgaebel
d72cd7916c Merge branch 'master' of https://github.com/yacy/yacy_search_server 2023-11-14 20:43:42 +01:00
sgaebel
0663ae3c99 adds synchronized dumplog 2023-11-14 20:42:00 +01:00
okybaca
cba84632ee UI: added a more descriptive message, CitationRank instead of cr 2023-11-14 00:05:23 +01:00
Michael Peter Christen
cff0991d85 test if this is helpful for https://github.com/yacy/yacy_search_server/issues/500 2023-11-13 16:41:19 +01:00
Michael Peter Christen
ceb07a5218 fixed problem with zim importer which crashed when non-valid urls appeared 2023-11-13 11:12:10 +01:00
Michael Peter Christen
3268a93019 added a 'minified' option to YaCy dumps 2023-11-13 10:27:50 +01:00
Michael Peter Christen
c20c4b8a21 modified export: added maximum number of docs per chunk
The export file can now be many files, called chunks.
By default still only one chunk is exported.
This function is required in case that the exported files shall be
imported to an elasticsearch/opensearch index. The bulk import function
of elasticsearch/opensearch is limited to 100MB. To make it possible to
import YaCy files, those must be split into chunks. Right now we
cannot estimate the chunk size in bytes, only as a number of documents.
The user must experiment to find the optimum maximum chunk size,
like 50000 docs per chunk. Try this as a first attempt.
2023-11-12 22:11:55 +01:00
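The chunking described in the commit above can be sketched as follows. This is a minimal Python illustration, not YaCy's export code: the function names `chunk_documents` and `chunk_sizes_in_bytes` are hypothetical, and measuring size via one JSON document per line is an assumption about the export layout.

```python
import json


def chunk_documents(docs, max_docs_per_chunk=50000):
    """Yield lists of at most max_docs_per_chunk documents, in order."""
    if max_docs_per_chunk < 1:
        raise ValueError("max_docs_per_chunk must be at least 1")
    chunk = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == max_docs_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly smaller, chunk
        yield chunk


def chunk_sizes_in_bytes(docs, max_docs_per_chunk):
    """Return the serialized size of each chunk, one JSON document per line."""
    return [
        sum(len(json.dumps(doc).encode("utf-8")) + 1 for doc in chunk)  # +1 for the newline
        for chunk in chunk_documents(docs, max_docs_per_chunk)
    ]
```

Measuring the sizes this way on a sample of documents is one way to run the experiment the commit message suggests: lower the document count per chunk until every chunk stays under the 100MB bulk import limit.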
Michael Peter Christen
24011dcbcc more file name extensions for json list surrogate files 2023-11-06 22:44:18 +01:00