133 Commits

Author SHA1 Message Date
5cf404e413 XmlDoc::hashNeighborhoods(): goto-loop -> for-loop 2018-01-05 14:12:47 +01:00
bd5fe9397c Remove useTimeAxis feature 2017-12-11 12:25:27 +01:00
5e667a758f index prefix sitenoindex instead of site when we're not supposed to index the page 2017-12-05 17:46:15 +01:00
9be350d909 Revert "Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid)"
This reverts commit 4465c993e0.
2017-11-30 14:48:26 +01:00
4465c993e0 Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid) 2017-11-30 14:01:53 +01:00
45aad17d8e indexing: handle URL components better
Changed hardcoded filtering of "com" "uk" "ru" etc to use the TLD list from Domains.cpp
Also doesn't filter those terms out in the path-part of the URL
2017-11-10 15:57:23 +01:00
175098a73a Store empty titlerec & index middomain url for root url disallowed by robots.txt 2017-10-23 16:40:41 +02:00
84b9fe27cc Remove gbispermalink 2017-10-06 17:46:14 +02:00
bc9f783b47 Don't index mid domain for redirect/canonical error documents 2017-10-06 17:04:57 +02:00
bc84788c11 Remove gbisadult 2017-10-06 17:04:57 +02:00
32e6072855 Removed superfluous intermediate local variables 2017-08-29 16:33:28 +02:00
8501e6a8bc Moved is-url-blocked check to separate file 2017-08-24 15:12:04 +02:00
a0b00870bc Renamed UrlBlock* to UrlMatch* 2017-08-24 14:46:02 +02:00
fd8690d395 Remove some unused fields 2017-07-05 11:38:12 +02:00
3efde8a60d Merge branch 'master' into dev-dns 2017-07-05 11:27:56 +02:00
9f2f00eec6 Remove indexing of status doc, remove commented out code, simplify some statements, remove unused variables 2017-07-05 11:10:25 +02:00
335f479705 Don't set S(ynonym) bit in bigram posdb entries.
The S bit was set for bigrams so the scoring later on could perform some non-obvious logic on it (see commit 9f072606d3). But that logic is gone and there is no need to overload that bit in the entry with two meanings.
2017-06-29 15:21:23 +02:00
86f5cbd2b9 Avoid testing on undefined values
'm_contentLen' is apparently not set if XmlDoc is loaded from Titledb
2017-06-27 14:29:04 +02:00
3404178dad Fix issue with page term display that is not displaying some terms that are hashed 2017-05-31 11:41:19 +02:00
7db9c4354e Removed non-reentrant version of iptoa()
Mass-change. In many places it could have been done in a better way (e.g. calculate nice name for UdpSlot peer once and not for every log line).
2017-05-10 17:54:00 +02:00
f55ba149d4 Revert "Let's hash content type for xml/json document as well"
This reverts commit 81f3fb8234.
2017-05-01 15:06:34 +02:00
81f3fb8234 Let's hash content type for xml/json document as well 2017-05-01 14:36:35 +02:00
9b930818a0 Use size_utf8Content instead of m_contentLen 2017-05-01 14:21:09 +02:00
836da2ae32 Don't index body/inlink text when we have no content 2017-05-01 14:21:09 +02:00
4c860a94a4 Don't hash gbcontenthash when contentLen is 0 2017-05-01 14:21:09 +02:00
daf99a093e Remove temp variable 2017-05-01 14:21:09 +02:00
fc2ca29591 Don't index mid domain url for document with no content 2017-05-01 14:21:09 +02:00
cf79f0c670 Remove now unused code (we have an early return for json/xml content type) 2017-05-01 14:21:09 +02:00
e9003b1540 Remove commented out code 2017-05-01 14:21:09 +02:00
71b32ddf3c Code style changes 2017-05-01 14:21:09 +02:00
d257d74cf0 Remove unused input variables 2017-05-01 14:21:09 +02:00
9a54b7de4d Remove commented out code 2017-05-01 14:21:09 +02:00
26d3341d03 Detect inlinks with siteranks>15 (corrupt data in titledb?) 2017-05-01 11:54:30 +02:00
f9c1a65c6b Access using functions instead of direct variable access 2017-04-07 15:55:02 +02:00
7f9b04e1dd Remove commented out code 2017-03-17 12:56:03 +01:00
b54cd86191 Remove privacore specific tld checks 2017-03-06 16:13:00 +01:00
aebc2e8ff8 Cater for multiple criteria types for UrlBlockList to speed things up 2017-03-06 16:12:43 +01:00
9c97fbd0cc Revert "new UrlBlockList temporarily disabled".
This reverts commit 4e97f2939d.
2017-02-06 15:54:04 +01:00
489f401966 Removed this!=NULL check in LinkInfo::getNextInlink()
Compilers optimize away that check nowadays.
Also added explicit NULL checks in call sites where 'this' might be NULL.
2017-02-05 22:52:07 +01:00
cdb7f08134 Removed unused parameters to XmlDoc::hashJSONFields2() 2017-01-29 01:20:12 +01:00
23263f32cc Removed unused parameters to XmlDoc::hashString3() 2017-01-29 01:14:33 +01:00
4e97f2939d new UrlBlockList temporarily disabled 2017-01-21 23:12:42 +01:00
8fea89b797 Add url block list 2017-01-18 17:17:33 +01:00
026f1c1556 Use StackBuf<> instead of direct char[]+SafeBuf 2017-01-06 12:19:21 +01:00
96ddc6243f Removed commented-out code 2017-01-05 14:12:40 +01:00
24a2049bf0 fix buffer overrun in tld indexing for very long tlds 2016-12-16 11:07:02 +01:00
6c6622a61f We don't have any date json field anymore 2016-12-08 17:18:46 +01:00
a10d38ae57 Remove hopcount from Linkdb::makeKey_uk 2016-11-18 15:47:37 +01:00
aa90b4b47d Code style changes 2016-11-15 15:13:21 +01:00
c15af1adfe Remove commented out code 2016-11-15 15:13:21 +01:00