5cf404e413
XmlDoc::hashNeighborhoods(): goto-loop -> for-loop
2018-01-05 14:12:47 +01:00
bd5fe9397c
Remove useTimeAxis feature
2017-12-11 12:25:27 +01:00
5e667a758f
index prefix sitenoindex instead of site when we're not supposed to index the page
2017-12-05 17:46:15 +01:00
9be350d909
Revert "Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid)"
This reverts commit 4465c993e0.
2017-11-30 14:48:26 +01:00
4465c993e0
Modify code so we don't index the body of xml & json document (we should index other prefixed terms like gbdocid)
2017-11-30 14:01:53 +01:00
45aad17d8e
indexing: handle URL components better
Changed hardcoded filtering of "com" "uk" "ru" etc to use the TLD list from Domains.cpp
Also doesn't filter those terms out in the path-part of the URL
2017-11-10 15:57:23 +01:00
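The filtering rule described in the commit above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `kTlds`, `UrlPart` and `shouldIndexUrlTerm` are hypothetical, and the tiny set stands in for the project's real TLD list, whose lookup API is not shown in this log.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the project's TLD list; the real list is far longer.
static const std::set<std::string> kTlds = {"com", "uk", "ru", "org", "net"};

// Which part of the URL a term was extracted from.
enum class UrlPart { Host, Path };

// Returns true if 'term' should be indexed as a URL component.
// Terms from the path are always kept; terms from the host are dropped
// only when they appear in the TLD list (instead of a hardcoded list).
bool shouldIndexUrlTerm(const std::string &term, UrlPart part) {
    if (part == UrlPart::Path) {
        return true;
    }
    return kTlds.count(term) == 0;
}
```

With this sketch, "com" is skipped when it comes from the host name but kept when it appears in the path.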
175098a73a
Store empty titlerec & index middomain url for root url disallowed by robots.txt
2017-10-23 16:40:41 +02:00
84b9fe27cc
Remove gbispermalink
2017-10-06 17:46:14 +02:00
bc9f783b47
Don't index mid domain for redirect/canonical error documents
2017-10-06 17:04:57 +02:00
bc84788c11
Remove gbisadult
2017-10-06 17:04:57 +02:00
32e6072855
Removed superfluous intermediate local variables
2017-08-29 16:33:28 +02:00
8501e6a8bc
Moved is-url-blocked check to separate file
2017-08-24 15:12:04 +02:00
a0b00870bc
Renamed UrlBlock* to UrlMatch*
2017-08-24 14:46:02 +02:00
fd8690d395
Remove some unused fields
2017-07-05 11:38:12 +02:00
3efde8a60d
Merge branch 'master' into dev-dns
2017-07-05 11:27:56 +02:00
9f2f00eec6
Remove indexing of status doc, remove commented out code, simplify some statements, remove unused variables
2017-07-05 11:10:25 +02:00
335f479705
Don't set S(ynonym) bit in bigram posdb entries.
The S bit was set for bigrams so the scoring later on could perform some non-obvious logic on it (see commit 9f072606d3). But that logic is gone and there is no need to overload that bit in the entry with two meanings.
2017-06-29 15:21:23 +02:00
86f5cbd2b9
Avoid testing on undefined values
'm_contentLen' is apparently not set if XmlDoc is loaded from Titledb
2017-06-27 14:29:04 +02:00
3404178dad
Fix issue with page term display that is not displaying some terms that are hashed
2017-05-31 11:41:19 +02:00
7db9c4354e
Removed non-reentrant version of iptoa()
Mass-change. In many places it could have been done in a better way (e.g. calculate a nice name for the UdpSlot peer once and not for every log line).
2017-05-10 17:54:00 +02:00
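As a rough illustration of what a reentrant replacement can look like, the sketch below has the caller supply the output buffer instead of sharing one static buffer between calls. The names `ipToText` and `describePeer` are hypothetical and are not the project's actual functions; the address is assumed to be a host-order 32-bit value with the most significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Reentrant formatter (hypothetical name): writes into a caller-owned buffer,
// so two results can be alive at once and threads do not share state.
// 'buf' must hold at least 16 bytes ("255.255.255.255" plus the terminator).
const char *ipToText(uint32_t ip, char *buf, size_t bufLen) {
    assert(buf != nullptr && bufLen >= 16);
    std::snprintf(buf, bufLen, "%u.%u.%u.%u",
                  static_cast<unsigned>((ip >> 24) & 0xff),
                  static_cast<unsigned>((ip >> 16) & 0xff),
                  static_cast<unsigned>((ip >> 8) & 0xff),
                  static_cast<unsigned>(ip & 0xff));
    return buf;
}

// Builds a printable peer name once, so callers can reuse it across
// several log lines instead of formatting the address every time.
std::string describePeer(uint32_t ip, unsigned port) {
    char ipBuf[16];
    return std::string(ipToText(ip, ipBuf, sizeof(ipBuf))) + ":" + std::to_string(port);
}
```

Because each call writes into its own buffer, two formatted addresses can appear in the same log statement without one overwriting the other.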
f55ba149d4
Revert "Let's hash content type for xml/json document as well"
This reverts commit 81f3fb8234.
2017-05-01 15:06:34 +02:00
81f3fb8234
Let's hash content type for xml/json document as well
2017-05-01 14:36:35 +02:00
9b930818a0
Use size_utf8Content instead of m_contentLen
2017-05-01 14:21:09 +02:00
836da2ae32
Don't index body/inlink text when we have no content
2017-05-01 14:21:09 +02:00
4c860a94a4
Don't hash gbcontenthash when contentLen is 0
2017-05-01 14:21:09 +02:00
daf99a093e
Remove temp variable
2017-05-01 14:21:09 +02:00
fc2ca29591
Don't index mid domain url for document with no content
2017-05-01 14:21:09 +02:00
cf79f0c670
Remove now unused code (we have an early return for json/xml content type)
2017-05-01 14:21:09 +02:00
e9003b1540
Remove commented out code
2017-05-01 14:21:09 +02:00
71b32ddf3c
Code style changes
2017-05-01 14:21:09 +02:00
d257d74cf0
Remove unused input variables
2017-05-01 14:21:09 +02:00
9a54b7de4d
Remove commented out code
2017-05-01 14:21:09 +02:00
26d3341d03
Detect inlinks with siteranks>15 (corrupt data in titledb?)
2017-05-01 11:54:30 +02:00
f9c1a65c6b
Access using functions instead of direct variable access
2017-04-07 15:55:02 +02:00
7f9b04e1dd
Remove commented out code
2017-03-17 12:56:03 +01:00
b54cd86191
Remove privacore specific tld checks
2017-03-06 16:13:00 +01:00
aebc2e8ff8
Cater for multiple criteria types for UrlBlockList to speed things up
2017-03-06 16:12:43 +01:00
9c97fbd0cc
Revert "new UrlBlockList temporarily disabled".
This reverts commit 4e97f2939d.
2017-02-06 15:54:04 +01:00
489f401966
Removed this!=NULL check in LinkInfo::getNextInlink()
Compilers optimize away that check nowadays.
Also added explicit NULL checks at call sites where 'this' might be NULL.
2017-02-05 22:52:07 +01:00
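The reasoning in the commit above is that calling a member function through a null pointer is undefined behavior, so a compiler may assume `this` is never null and drop the test. A minimal sketch of the replacement pattern follows, with the null check moved to the caller. `Inlink`, `InlinkList` and `countInlinks` are simplified, hypothetical stand-ins and do not reproduce the real `LinkInfo` class.

```cpp
#include <cassert>
#include <cstddef>

struct Inlink {
    int siteRank;
};

// Simplified stand-in for a container of inlinks.
class InlinkList {
public:
    InlinkList(const Inlink *items, size_t count) : m_items(items), m_count(count) {}

    // No 'this != NULL' test here: the compiler may assume 'this' is valid.
    // Pass nullptr as 'prev' to get the first entry.
    const Inlink *getNextInlink(const Inlink *prev) const {
        if (m_count == 0) {
            return nullptr;
        }
        if (!prev) {
            return &m_items[0];
        }
        size_t next = static_cast<size_t>(prev - m_items) + 1;
        return next < m_count ? &m_items[next] : nullptr;
    }

private:
    const Inlink *m_items;
    size_t m_count;
};

// Call site: the list pointer itself may be null, so it is checked
// explicitly before any member function is invoked.
int countInlinks(const InlinkList *list) {
    if (!list) {
        return 0;
    }
    int n = 0;
    for (const Inlink *k = list->getNextInlink(nullptr); k; k = list->getNextInlink(k)) {
        n++;
    }
    return n;
}
```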
cdb7f08134
Removed unused parameters to XmlDoc::hashJSONFields2()
2017-01-29 01:20:12 +01:00
23263f32cc
Removed unused parameters to XmlDoc::hashString3()
2017-01-29 01:14:33 +01:00
4e97f2939d
new UrlBlockList temporarily disabled
2017-01-21 23:12:42 +01:00
8fea89b797
Add url block list
2017-01-18 17:17:33 +01:00
026f1c1556
Use StackBuf<> instead of direct char[]+SafeBuf
2017-01-06 12:19:21 +01:00
96ddc6243f
Removed commented-out code
2017-01-05 14:12:40 +01:00
24a2049bf0
fix buffer overrun in tld indexing for very long tlds
2016-12-16 11:07:02 +01:00
6c6622a61f
We don't have any date json field anymore
2016-12-08 17:18:46 +01:00
a10d38ae57
Remove hopcount from Linkdb::makeKey_uk
2016-11-18 15:47:37 +01:00
aa90b4b47d
Code style changes
2016-11-15 15:13:21 +01:00
c15af1adfe
Remove commented out code
2016-11-15 15:13:21 +01:00