4a6e93fc78
Add configurable xpath for summary
2016-09-01 13:43:07 +02:00
8f8bd0dadf
Get multi-layer tagrec (site -> host -> domain).
E.g.: www.youtube.com/users/abc -> www.youtube.com -> youtube.com
2016-09-01 13:43:07 +02:00
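A minimal sketch of the site -> host -> domain lookup order described above. It assumes a hypothetical `lookupLayers()` helper that derives the host by stripping the path and the domain by keeping the last two host labels. That domain rule is a simplification for illustration only: real code has to consult the TLD list, since a host like `news.bbc.co.uk` would otherwise yield `co.uk`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: drop an optional "scheme://" prefix and any path.
std::string hostOf(const std::string &url) {
    std::string::size_type start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    std::string::size_type end = url.find('/', start);
    return url.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

// Hypothetical helper: naive domain = last two labels of the host.
// Assumption for illustration; a real implementation must use a TLD list.
std::string naiveDomainOf(const std::string &host) {
    std::string::size_type last = host.rfind('.');
    if (last == std::string::npos || last == 0) return host;
    std::string::size_type prev = host.rfind('.', last - 1);
    return prev == std::string::npos ? host : host.substr(prev + 1);
}

// Layers ordered from most to least specific, without duplicates.
std::vector<std::string> lookupLayers(const std::string &site) {
    std::vector<std::string> layers{site};
    std::string host = hostOf(site);
    if (host != layers.back()) layers.push_back(host);
    std::string domain = naiveDomainOf(host);
    if (domain != layers.back()) layers.push_back(domain);
    return layers;
}
```

For `www.youtube.com/users/abc` this yields the three layers from the commit message; for a bare `youtube.com` the de-duplication leaves a single entry, so the same record is not looked up twice.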
e30dbc47f7
Handle unknown sizes by passing in a boolean instead of using -1
2016-08-31 11:13:31 +02:00
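A sketch of why a separate flag can be preferable to a `-1` sentinel, assuming a hypothetical sizing function (not the code this commit changed). With a sentinel the size has to be signed, and converting `-1` to `std::size_t` silently produces `SIZE_MAX`; passing the known/unknown state as a `bool` keeps the size unsigned and makes every value, including 0, a real size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: the caller states explicitly whether `size` is known
// instead of encoding "unknown" as -1 in a signed integer.
std::size_t bytesToReserve(std::size_t size, bool sizeIsKnown, std::size_t fallback) {
    return sizeIsKnown ? size : fallback;
}

// The pitfall the boolean avoids: -1 converted to an unsigned size is SIZE_MAX.
constexpr std::size_t kSentinelAsSize = static_cast<std::size_t>(-1);
static_assert(kSentinelAsSize == SIZE_MAX, "-1 wraps to the largest size_t");
```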
55f15747dc
Use rdbid_t in Safebuf
2016-08-25 00:24:07 +02:00
dc8dcba664
Removed always-false if()
2016-08-23 10:32:28 +02:00
86d988c866
Removed unused static
2016-08-19 16:10:35 +02:00
13840a06f7
Add constness to some tld/domain/url functions
2016-08-11 12:58:51 +02:00
5c73e9e611
More thread safety: gmtime() -> gmtime_r(), asctime() -> asctime_r()
2016-08-09 16:44:39 +02:00
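For reference, a minimal sketch of the thread-safe pattern using the POSIX `gmtime_r()` and `asctime_r()` calls; `utcTimestamp()` is a hypothetical helper, not a function from this codebase. `gmtime()` and `asctime()` return pointers to static storage that another thread may overwrite, whereas the `_r` variants fill buffers owned by the caller.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: format a time_t as UTC without touching shared
// static buffers, so concurrent calls from several threads are safe.
std::string utcTimestamp(std::time_t t) {
    struct tm tmBuf;   // per-call broken-down time, not shared
    char text[26];     // POSIX requires at least 26 bytes for asctime_r()
    if (gmtime_r(&t, &tmBuf) == nullptr) return std::string();
    if (asctime_r(&tmBuf, text) == nullptr) return std::string();
    return std::string(text);  // asctime format ends with '\n'
}
```

`utcTimestamp(0)` yields `"Thu Jan  1 00:00:00 1970\n"`. Since `asctime_r()` is marked obsolescent in POSIX.1-2008, `strftime()` on the `gmtime_r()` result is the usual alternative when a different format is wanted.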
ba7bdb619f
Code style changes
2016-08-08 14:23:37 +02:00
c7b18667ae
Add comment to refer to removed logic
2016-08-08 14:23:37 +02:00
82789ae96c
Remove flushMsg4Buffers as it was never called
2016-08-08 14:23:37 +02:00
abfba8cad7
Remove support for param "raw"
2016-08-03 17:22:37 +02:00
4f86e31625
Removed unused config option. Added a temporary (?) config option to enable the no-merge changes; the first change causes the full document to be indexed every time (no posdb delete keys are used)
2016-08-03 16:21:33 +02:00
9d35b9a151
Code style changes
2016-08-03 12:28:45 +02:00
4e4e3371e6
Log function will now return void instead of a boolean
2016-08-01 18:12:10 +02:00
af4e70a051
Revert "Copied over changes from broken 'nomerge' branch"
This reverts commit 9e48989e93.
2016-08-01 14:52:50 +02:00
9e48989e93
Copied over changes from broken 'nomerge' branch
2016-08-01 14:52:09 +02:00
da81afbf85
Fix comments
2016-07-29 13:53:56 +02:00
94675ea077
Rename THIS to that. In XmlDoc, set m_errno if g_errno is set but m_errno is not
2016-07-29 13:53:56 +02:00
3c2773a928
Removed gbstrlen()
gbstrlen() just checked for NULL and called gbshutdownAbort(). Dereferencing NULL on modern platforms causes a SIGSEGV, which is caught by our signal handler and... gbshutdownAbort() is called. So gbstrlen() was superfluous and complicated static analysis.
2016-07-28 17:04:35 +02:00
fa328000f3
Code style changes
2016-07-28 12:35:59 +02:00
d4c4d55510
Add trace logs for XmlDoc::getUrlFilterNum
2016-07-26 15:19:01 +02:00
89a247c110
Make log lines more consistent so we can grep for all URLs that are unwanted for indexing
2016-07-26 15:19:01 +02:00
37d203f540
Add privacore tld blacklist
2016-07-26 15:19:01 +02:00
fd6e8cbb21
Remove more QA-specific code
2016-07-26 15:19:01 +02:00
46892c9588
Removed qa.cpp and its immediate callers
2016-07-26 11:32:11 +02:00
35d4becbbc
Remove g_stats.addSpiderPoint and related spider statistics
2016-07-25 17:03:51 +02:00
570838d7f6
Remove m_isCustomCrawl & logic surrounding it
2016-07-22 13:59:55 +02:00
026b286c98
Remove commented out code
2016-07-22 13:59:55 +02:00
8e4b84aa96
Code style changes. Change log level to WARN for errors
2016-07-20 18:16:22 +02:00
ce7511a1a9
Removed Cygwin support
2016-07-14 14:06:09 +02:00
d6d7d34510
Add trace logs to XmlDoc::getLangId
2016-07-14 10:34:00 +02:00
3bc0e3a5ba
Temporary fix for conversion of HTML entities to UTF-8 where the UTF-8 string is longer than the HTML entity name. This would cause crashes in htmlDecode, e.g. for &nLt; (≪⃒)
2016-07-13 17:16:10 +02:00
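The size problem can be checked with a small sketch. It assumes the HTML5 mapping of `&nLt;` to the two code points U+226A U+20D2 (≪⃒) and a hypothetical `fitsInPlace()` check; it is not the htmlDecode() code itself. Each of those code points takes 3 bytes in UTF-8, so the decoded text needs 6 bytes while the entity text `&nLt;` occupies only 5, and a decoder that assumes its output is never longer than its input would overrun.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical check: can the decoded code points overwrite the entity text
// in place without writing past it?
bool fitsInPlace(const std::string &entity, const std::u32string &codePoints) {
    std::size_t needed = 0;
    for (char32_t cp : codePoints) needed += utf8Length(cp);
    return needed <= entity.size();
}
```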
0836f9be7f
Code style changes
2016-07-12 16:52:31 +02:00
4c9048236f
Add gb spider statistics
2016-07-04 11:31:02 +02:00
ab1e3836e7
Removed silly pointer underflow check
If it underflowed, then the document text would be within the first 500 bytes from
address 0, which never happens on modern machines where the first page (4KB) is
normally no-access.
2016-07-01 13:54:09 +02:00
b87804c05b
Removed unnecessary null check
2016-07-01 12:35:11 +02:00
55e93de571
Fix null pointer dereference
2016-07-01 12:33:30 +02:00
8fe2aebf05
Remove commented out code
2016-06-30 11:06:13 +02:00
40412c5d45
Use EMFILE instead of 24 & E2BIG instead of 7
2016-06-29 17:16:53 +02:00
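A minimal sketch of the same idea, assuming a hypothetical `describeErrno()` helper: comparing against the `<cerrno>` constants keeps the code correct on platforms where the numbers differ (24 and 7 are the Linux values of `EMFILE` and `E2BIG`).

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical classifier: use the symbolic errno names, not magic numbers.
std::string describeErrno(int err) {
    switch (err) {
    case EMFILE: return "too many open files";
    case E2BIG:  return "argument list too long";
    default:     return std::strerror(err);  // fall back to the libc message
    }
}
```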
45f88eb8e5
Remove EHITPROCESSLIMIT. g_errno is never set to EHITPROCESSLIMIT
2016-06-29 16:56:53 +02:00
afdd437956
Remove EHITCRAWLLIMIT. g_errno is never set to EHITCRAWLLIMIT
2016-06-29 16:54:18 +02:00
76dc08637f
Code style changes
2016-06-29 12:05:02 +02:00
7f86968533
Fix valgrind uninitialized data warning
2016-06-28 15:49:23 +02:00
14591acae7
Moved up NULL-check to before first use in XmlDoc::set2
2016-06-28 14:12:53 +02:00
bd9ff20d78
Fix unreachable-code warnings
2016-06-28 11:48:30 +02:00
992631a482
Remove unused rootlang
2016-06-24 17:18:14 +02:00
ee0c1d029c
Remove commented out code
2016-06-24 14:39:32 +02:00
81e3af7cc3
Fix compilation warnings
2016-06-22 15:22:40 +02:00
56a81b765a
Fix misleading-indentation warning
2016-06-22 10:40:55 +00:00