278 Commits

Author SHA1 Message Date
26bc5ff50f Add max size check to FxCache & method to remove item from FxCache 2018-01-05 17:01:41 +01:00
1c2df49a3e Make mlockall() call configurable 2018-01-02 12:23:24 +01:00
6e583b59a4 Add linkdb lookup page 2017-12-28 14:10:06 +01:00
191e975655 Don't log ENOLINKTEXT_AREATAG as error in Multicast and don't try sending it to a twin.
Added LogDebug option for Msg20.
2017-12-16 16:00:24 +01:00
6ed890237f Toned down error logs for Msg20 when they are really not errors. If inlink text cannot be fetched because the link is in an area tag, a new error code ENOLINKTEXT_AREATAG is used instead of EBADENGINEER. Added trace log for Msg25. 2017-12-16 11:54:13 +01:00
6ff1f91726 Moved ranking/scoring parameters into a single struct BaseScoringParameters
The ranking parameters/weights/flags/etc were spread over many variables and difficult to keep track of. Collected into a single struct for easier overview.
2017-12-11 14:44:58 +01:00
dd1e08f0cc Use BaseScoringParameters struct more places 2017-12-08 13:54:15 +01:00
55308013c3 Add overridable TitleRec version 2017-12-01 12:28:31 +01:00
58e052d8a7 Initial implementation of DocProcess with DocDelete, DocRebuild, DocReindex 2017-11-27 11:03:24 +01:00
9a3f87b56e Add ContentTypeBlockList to block by http content-type 2017-11-15 12:36:18 +01:00
e4c6595ab8 Add code to run through external filter when the html document has a low word count (configurable) 2017-11-09 17:14:17 +01:00
97ba7bc72d renamed new term checklist classes and added option to check for potential spam docs 2017-11-08 14:14:24 +01:00
1ec5c427b9 Support rebuilding spiderdb from titledb document URLs only
Only rebuilds spiderdb with the first-URL of each titlerec. The links in documents are not added. Useful for cleaning out obsolete/useless/unwanted links in spiderdb.
2017-10-31 15:48:49 +01:00
071907bf19 added option to disable spidering of adult content 2017-10-30 20:00:13 +01:00
045e98353b bit of code cleanup in AdultCheck. Added trace log option 2017-10-26 15:13:50 +02:00
de5362ddbd Add ResultOverride for exact url 2017-10-26 11:09:44 +02:00
6dd90e04c5 Add custom title/summary for results that are blocked by robots.txt 2017-10-24 10:41:19 +02:00
7de31a3066 Add g_conf.m_logTracePageSpiderdbLookup for enabling/disabling trace logs for PageSpiderdbLookup 2017-10-11 11:23:57 +02:00
958dfb0df1 Add SpiderdbHostDelete feature 2017-09-18 16:22:11 +02:00
baf96a3f7f Add RobotsCheckList feature so that we don't spider pages we're not allowed to 2017-09-12 12:05:48 +02:00
548f964fbb Enable block of IP based urls 2017-09-06 16:52:18 +02:00
8a72b10b33 Made bigram weight configurable 2017-09-01 13:57:46 +02:00
319a564ae3 Merge branch 'master' into dev-docdelete 2017-08-24 16:42:08 +02:00
a0b00870bc Renamed UrlBlock* to UrlMatch* 2017-08-24 14:46:02 +02:00
9fcf540d3f New logtrace for DocDelete 2017-08-21 18:29:31 +02:00
d9375bd968 Fix compilation error from previous commit 2017-08-17 11:04:10 +02:00
d04e94a538 Add spidered url cache 2017-08-17 11:01:28 +02:00
02e6879985 Nuke doledb periodically to hide the bugs/limitations in the spidering stuff 2017-08-07 15:26:29 +02:00
d37ab5b89c Add caching to GbDns 2017-07-25 12:53:27 +02:00
0d16c5bb4e Merge branch 'master' into dev-dns 2017-07-17 16:04:36 +02:00
267c620dbf Added trace-log option to query-reindex 2017-07-17 15:24:41 +02:00
d2c1d24f49 Merge branch 'master' into dev-dns 2017-07-17 15:09:02 +02:00
cbd7ecea39 Separate log setting for reindex (was caught by setting for repair) 2017-07-17 14:20:40 +02:00
2e6b846386 Add first version of DnsBlockList based on UrlBlockList 2017-06-30 12:49:02 +02:00
da72115e55 Add more logs/config for RdbIndex & RdbMap addList timings from merge 2017-06-08 20:35:35 +02:00
c0dd3265d2 Abort if we have not seen ourself alive for the past x minutes (defaults to 5) 2017-06-02 12:20:50 +02:00
9f3de50ad2 Add debug vagus log (config). Change some vagus log to info instead 2017-06-01 13:49:47 +02:00
cc0967510b Add logging when loop callback hit time threshold. Remove some unused function, remove undefined function (only defined in header) 2017-05-30 12:12:32 +02:00
bceaee7c17 changed Titledb log to trace log and made Titledb trace log configurable 2017-05-23 11:07:27 +02:00
3558e64edf Move log debug variables around (more alphabetical?) 2017-05-22 14:53:55 +02:00
51fcc7696e Separate multicast debug messages from net 2017-05-22 12:22:37 +02:00
2bbad454d9 Configurable max udp slots 2017-05-17 11:10:36 +02:00
78bbebfedb Implemented trace log for Query.cpp 2017-05-15 14:42:07 +02:00
08ab074237 Removed unused Conf::m_logDebugThread 2017-05-15 13:09:11 +02:00
befb7447a6 Removed unused Conf::m_logDebugQuota 2017-05-15 13:07:33 +02:00
1d769203e7 Merge branch 'nomerge2' 2017-05-08 11:59:43 +02:00
18bb35eeb2 Removed configuration items for sending email (was handled by PingServer) 2017-05-07 14:24:42 +02:00
3415d008ba Removed configuration associated with PingServer 2017-05-07 14:18:35 +02:00
d3ecbda478 Use Vagus for some host/instance information
hosts.conf CRC and total_docs_indexed is now exchanged over Vagus. rest will follow in separate commits.
The vagus cluster identifier defaults to "gb-"$USER
2017-05-01 17:21:34 +02:00
94bb71587c Merge remote-tracking branch 'origin/master' into nomerge2 2017-04-28 19:21:38 +02:00