26bc5ff50f
Add max size check to FxCache & method to remove item from FxCache
2018-01-05 17:01:41 +01:00
1c2df49a3e
Make mlockall() call configurable
2018-01-02 12:23:24 +01:00
6e583b59a4
Add linkdb lookup page
2017-12-28 14:10:06 +01:00
191e975655
Don't log ENOLINKTEXT_AREATAG as error in Multicast and don't try sending it to a twin.
...
Added LogDebug option for Msg20.
2017-12-16 16:00:24 +01:00
6ed890237f
Toned down error logs for Msg20 when they are really not errors. If inlink text cannot be fetched because the link is in an area tag, a new error code ENOLINKTEXT_AREATAG is used instead of EBADENGINEER. Added trace log for Msg25.
2017-12-16 11:54:13 +01:00
6ff1f91726
Moved ranking/scoring parameters into a single struct BaseScoringParameters
...
The ranking parameters/weights/flags/etc were spread over many variables and difficult to keep track of. Collected into a single struct for easier overview.
2017-12-11 14:44:58 +01:00
dd1e08f0cc
Use BaseScoringParameters struct in more places
2017-12-08 13:54:15 +01:00
55308013c3
Add overridable TitleRec version
2017-12-01 12:28:31 +01:00
58e052d8a7
Initial implementation of DocProcess with DocDelete, DocRebuild, DocReindex
2017-11-27 11:03:24 +01:00
9a3f87b56e
Add ContentTypeBlockList to block by http content-type
2017-11-15 12:36:18 +01:00
e4c6595ab8
Add code to run through external filter when the html document has a low word count (configurable)
2017-11-09 17:14:17 +01:00
97ba7bc72d
renamed new term checklist classes and added option to check for potential spam docs
2017-11-08 14:14:24 +01:00
1ec5c427b9
Support rebuilding spiderdb from titledb document URLs only
...
Only rebuilds spiderdb with the first-URL of each titlerec. The links in documents are not added. Useful for cleaning out obsolete/useless/unwanted links in spiderdb.
2017-10-31 15:48:49 +01:00
071907bf19
added option to disable spidering of adult content
2017-10-30 20:00:13 +01:00
045e98353b
bit of code cleanup in AdultCheck. Added trace log option
2017-10-26 15:13:50 +02:00
de5362ddbd
Add ResultOverride for exact url
2017-10-26 11:09:44 +02:00
6dd90e04c5
Add custom title/summary for results that are blocked by robots.txt
2017-10-24 10:41:19 +02:00
7de31a3066
Add g_conf.m_logTracePageSpiderdbLookup for enabling/disabling trace logs for PageSpiderdbLookup
2017-10-11 11:23:57 +02:00
958dfb0df1
Add SpiderdbHostDelete feature
2017-09-18 16:22:11 +02:00
baf96a3f7f
Add RobotsCheckList feature so that we don't spider pages we're not allowed to
2017-09-12 12:05:48 +02:00
548f964fbb
Enable block of IP based urls
2017-09-06 16:52:18 +02:00
8a72b10b33
Made bigram weight configurable
2017-09-01 13:57:46 +02:00
319a564ae3
Merge branch 'master' into dev-docdelete
2017-08-24 16:42:08 +02:00
a0b00870bc
Renamed UrlBlock* to UrlMatch*
2017-08-24 14:46:02 +02:00
9fcf540d3f
New logtrace for DocDelete
2017-08-21 18:29:31 +02:00
d9375bd968
Fix compilation error from previous commit
2017-08-17 11:04:10 +02:00
d04e94a538
Add spidered url cache
2017-08-17 11:01:28 +02:00
02e6879985
Nuke doledb periodically to hide the bugs/limitations in the spidering stuff
2017-08-07 15:26:29 +02:00
d37ab5b89c
Add caching to GbDns
2017-07-25 12:53:27 +02:00
0d16c5bb4e
Merge branch 'master' into dev-dns
2017-07-17 16:04:36 +02:00
267c620dbf
Added trace-log option to query-reindex
2017-07-17 15:24:41 +02:00
d2c1d24f49
Merge branch 'master' into dev-dns
2017-07-17 15:09:02 +02:00
cbd7ecea39
Separate log setting for reindex (was caught by setting for repair)
2017-07-17 14:20:40 +02:00
2e6b846386
Add first version of DnsBlockList based on UrlBlockList
2017-06-30 12:49:02 +02:00
da72115e55
Add more logs/config for RdbIndex & RdbMap addList timings from merge
2017-06-08 20:35:35 +02:00
c0dd3265d2
Abort if we have not seen ourself alive for the past x minutes (defaults to 5)
2017-06-02 12:20:50 +02:00
9f3de50ad2
Add debug vagus log (config). Change some vagus logs to info instead
2017-06-01 13:49:47 +02:00
cc0967510b
Add logging when loop callback hits time threshold. Remove some unused functions, remove undefined function (only declared in header)
2017-05-30 12:12:32 +02:00
bceaee7c17
changed Titledb log to trace log and made Titledb trace log configurable
2017-05-23 11:07:27 +02:00
3558e64edf
Move log debug variables around (more alphabetical?)
2017-05-22 14:53:55 +02:00
51fcc7696e
Separate multicast debug messages from net
2017-05-22 12:22:37 +02:00
2bbad454d9
Configurable max udp slots
2017-05-17 11:10:36 +02:00
78bbebfedb
Implemented trace log for Query.cpp
2017-05-15 14:42:07 +02:00
08ab074237
Removed unused Conf::m_logDebugThread
2017-05-15 13:09:11 +02:00
befb7447a6
Removed unused Conf::m_logDebugQuota
2017-05-15 13:07:33 +02:00
1d769203e7
Merge branch 'nomerge2'
2017-05-08 11:59:43 +02:00
18bb35eeb2
Removed configuration items for sending email (was handled by PingServer)
2017-05-07 14:24:42 +02:00
3415d008ba
Removed configuration associated with PingServer
2017-05-07 14:18:35 +02:00
d3ecbda478
Use Vagus for some host/instance information
...
hosts.conf CRC and total_docs_indexed are now exchanged over Vagus. The rest will follow in separate commits.
The vagus cluster identifier defaults to "gb-"$USER
2017-05-01 17:21:34 +02:00
94bb71587c
Merge remote-tracking branch 'origin/master' into nomerge2
2017-04-28 19:21:38 +02:00