1199 Commits

SHA1 Message Date
4a6e93fc78 Add configurable xpath for summary 2016-09-01 13:43:07 +02:00
8f8bd0dadf Get multi-layer tagrec (site -> host -> domain).
E.g.: www.youtube.com/users/abc -> www.youtube.com -> youtube.com
2016-09-01 13:43:07 +02:00
e30dbc47f7 Handle unknown sizes by passing in a boolean instead of using -1 2016-08-31 11:13:31 +02:00
55f15747dc Use rdbid_t in Safebuf 2016-08-25 00:24:07 +02:00
dc8dcba664 Removed always-false if() 2016-08-23 10:32:28 +02:00
86d988c866 Removed unused static 2016-08-19 16:10:35 +02:00
13840a06f7 Add constness to some tld/domain/url functions 2016-08-11 12:58:51 +02:00
5c73e9e611 More thread safety: gmtime() -> gmtime_r(), asctime() -> asctime_r() 2016-08-09 16:44:39 +02:00
ba7bdb619f Code style changes 2016-08-08 14:23:37 +02:00
c7b18667ae Add comment to refer to removed logic 2016-08-08 14:23:37 +02:00
82789ae96c Remove flushMsg4Buffers as it was never called 2016-08-08 14:23:37 +02:00
abfba8cad7 Remove support for param "raw" 2016-08-03 17:22:37 +02:00
4f86e31625 Removed unused config option. Added temporary (?) config option to enable no-merge changes. The first change causes the full document to be indexed every time (no posdb delete keys used) 2016-08-03 16:21:33 +02:00
9d35b9a151 Code style changes 2016-08-03 12:28:45 +02:00
4e4e3371e6 Log function will now return void instead of a boolean 2016-08-01 18:12:10 +02:00
af4e70a051 Revert "Copied over changes from broken 'nomerge' branch"
This reverts commit 9e48989e93.
2016-08-01 14:52:50 +02:00
9e48989e93 Copied over changes from broken 'nomerge' branch 2016-08-01 14:52:09 +02:00
da81afbf85 Fix comments 2016-07-29 13:53:56 +02:00
94675ea077 Rename THIS to that. set m_errno if g_errno is set but m_errno is not for XmlDoc 2016-07-29 13:53:56 +02:00
3c2773a928 Removed gbstrlen()
gbstrlen() just checked for NULL and called gbshutdownAbort(). Dereferencing NULL on modern platforms causes a SIGSEGV, which is caught by our signal handler, and ... gbshutdownAbort() is called. So gbstrlen() was superfluous and complicated static analysis.
2016-07-28 17:04:35 +02:00
fa328000f3 Code style changes 2016-07-28 12:35:59 +02:00
d4c4d55510 Add trace logs for XmlDoc::getUrlFilterNum 2016-07-26 15:19:01 +02:00
89a247c110 Modify log line to be more similar so we can grep all URLs that are unwanted for indexing 2016-07-26 15:19:01 +02:00
37d203f540 Add privacore tld blacklist 2016-07-26 15:19:01 +02:00
fd6e8cbb21 Remove more qa specific code 2016-07-26 15:19:01 +02:00
46892c9588 Removed qa.cpp and immediate callers 2016-07-26 11:32:11 +02:00
35d4becbbc Remove g_stats.addSpiderPoint and related spider statistics 2016-07-25 17:03:51 +02:00
570838d7f6 Remove m_isCustomCrawl & logic surrounding it 2016-07-22 13:59:55 +02:00
026b286c98 Remove commented out code 2016-07-22 13:59:55 +02:00
8e4b84aa96 Code style changes. Change log level to WARN for errors 2016-07-20 18:16:22 +02:00
ce7511a1a9 Removed cygwin support 2016-07-14 14:06:09 +02:00
d6d7d34510 Add trace logs to XmlDoc::getLangId 2016-07-14 10:34:00 +02:00
3bc0e3a5ba Temporary fix for conversion of HTML entities to UTF-8, where the size of the UTF-8 string is larger than the length of the HTML entity name. Would cause crashes in htmlDecode for e.g. ≪⃒ 2016-07-13 17:16:10 +02:00
0836f9be7f Code style changes 2016-07-12 16:52:31 +02:00
4c9048236f Add gb spider statistics 2016-07-04 11:31:02 +02:00
ab1e3836e7 Removed silly pointer underflow check
If it underflowed then it means the document text is in the first 500 bytes from
address 0, which never happens on modern machines where the first page (4KB) is
normally no-access.
2016-07-01 13:54:09 +02:00
b87804c05b Removed unnecessary null check 2016-07-01 12:35:11 +02:00
55e93de571 Fix null pointer dereference 2016-07-01 12:33:30 +02:00
8fe2aebf05 Remove commented out code 2016-06-30 11:06:13 +02:00
40412c5d45 Use EMFILE instead of 24 & E2BIG instead of 7 2016-06-29 17:16:53 +02:00
45f88eb8e5 Remove EHITPROCESSLIMIT. g_errno is never set to EHITPROCESSLIMIT 2016-06-29 16:56:53 +02:00
afdd437956 Remove EHITCRAWLLIMIT. g_errno is never set to EHITCRAWLLIMIT 2016-06-29 16:54:18 +02:00
76dc08637f Code style changes 2016-06-29 12:05:02 +02:00
7f86968533 Fix valgrind uninitialized data warning 2016-06-28 15:49:23 +02:00
14591acae7 Moved up NULL-check to before first use in XmlDoc::set2 2016-06-28 14:12:53 +02:00
bd9ff20d78 Fix unreachable-code warnings 2016-06-28 11:48:30 +02:00
992631a482 Remove unused rootlang 2016-06-24 17:18:14 +02:00
ee0c1d029c Remove commented out code 2016-06-24 14:39:32 +02:00
81e3af7cc3 Fix compilation warnings 2016-06-22 15:22:40 +02:00
56a81b765a Fix misleading-indentation warning 2016-06-22 10:40:55 +00:00
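
Note on commit 5c73e9e611 (gmtime() -> gmtime_r(), asctime() -> asctime_r()): gmtime() and asctime() return pointers to shared static storage, so two threads formatting timestamps at the same time can overwrite each other's results. The re-entrant variants write into buffers supplied by the caller. The following is a minimal sketch of that pattern, not code taken from the repository; the helper name formatUtcTimestamp is hypothetical and exists only for illustration.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not from the repository): format a timestamp as UTC
// text using only the re-entrant POSIX calls, so no static storage is shared
// between threads.
std::string formatUtcTimestamp(time_t t) {
    struct tm tmBuf;   // caller-owned storage filled in by gmtime_r()
    if (gmtime_r(&t, &tmBuf) == nullptr) {
        return std::string();
    }

    char text[26];     // asctime_r() needs a buffer of at least 26 bytes
    if (asctime_r(&tmBuf, text) == nullptr) {
        return std::string();
    }

    std::string result(text);
    if (!result.empty() && result.back() == '\n') {
        result.pop_back();   // drop the trailing newline asctime_r() appends
    }
    return result;
}
```

Each caller owns both buffers for the duration of the call, so the function can be used from several threads without locking. It returns an empty string if either conversion fails.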