136 Commits

Author SHA1 Message Date
217eace473 Fix compilation warning (warning: enum constant in boolean context) 2017-07-20 11:44:30 +02:00
4027028bbd Fix comments where an ancient mass-replace had changed 'int' and 'long' to int32_t in comments 2017-04-11 14:18:45 +02:00
cdaacd1bb5 Fix buffer overflow when adding a long UTF8 url 2017-03-17 13:18:08 +01:00
b54cd86191 Remove privacore specific tld checks 2017-03-06 16:13:00 +01:00
8183e3cfa3 Move some string util functions to GbUtils 2017-03-06 16:10:58 +01:00
8a5a906f4e Added more safety checks in Url::isAdult()
Somehow invalid URLs ended up in titlerec and eventually made it to Url, which handles them badly.
2017-03-03 13:06:19 +01:00
8fea89b797 Add url block list 2017-01-18 17:17:33 +01:00
29feddf799 Don't use bitwise operators on booleans.
&= doesn't provide short-circuit evaluation which seems to be the intent here.
2017-01-17 00:57:01 +01:00
ec30353afe Remove stripPound variable (default to always true now). Don't store url fragments. Fix relative url resolution for url fragments (we drop them completely now). 2016-12-22 12:27:32 +01:00
d812cb7465 Code style changes 2016-12-22 12:27:32 +01:00
5ae21e0296 Fix relative url resolution except url fragments (#) 2016-12-22 12:27:32 +01:00
37fd0907b5 Only copy url when we need to 2016-12-22 12:27:32 +01:00
181922298c Remove ftp default port as we don't index ftp sites anyway 2016-12-22 12:27:32 +01:00
456812cafb Code style changes 2016-12-22 12:27:32 +01:00
7a320c773b NUL-terminate the url correctly 2016-12-16 15:35:03 +01:00
8dd0036b8b nul-terminate temporary buffer in Url::set() 2016-12-08 15:47:50 +01:00
ec516df857 Removed nul-termination-hack from Url::set() 2016-12-08 15:44:57 +01:00
ae7dae3124 Moved <base> handling logic to Url::calculateBaseUrl() 2016-12-01 12:48:32 +01:00
8cf0b4990c #include cleanup in Url.cpp 2016-12-01 11:47:44 +01:00
8cea6ff906 Added adult checks on TLDs themselves (eg .xxx) 2016-11-29 15:55:23 +01:00
07b6909676 Renamed Url::isSpam() to isAdult() 2016-11-29 15:06:14 +01:00
d3995a07e4 Moved Url::isSpam() logic to AdultCheck.cpp: isAdultUrl() 2016-11-29 15:01:31 +01:00
416db85d5a Moved isAdult() from Lang.cpp to AdultCheck.cpp 2016-11-29 14:50:17 +01:00
4cc6435fed Made Url::isSpam (private version) static 2016-11-29 14:33:54 +01:00
e74665cc78 Use tabs & print probabledocids as well 2016-11-17 12:35:55 +01:00
5a596df57d Add print_urlinfo tool to print out hashes 2016-11-16 14:58:00 +01:00
d8d6e39441 More constness in Url.* 2016-11-13 14:44:30 +01:00
401c751a83 More constness in Url.* 2016-11-13 14:36:08 +01:00
84acb2d50d Eliminated goto from Url.cpp 2016-11-12 21:06:30 +01:00
d52dabc828 #include cleanup of Url.h 2016-11-12 20:01:44 +01:00
f8f230be34 minor code simplifications in Url 2016-10-22 20:39:21 +02:00
765b7094cc init class members in Url 2016-09-30 14:23:18 +02:00
2b84ca4c0b Remove use of ?: which is a gcc extension 2016-09-20 13:41:23 +02:00
ea5fcd4c7e #include cleanup in Speller.* 2016-09-07 18:46:28 +02:00
d602bd59ba Make url extension initialization thread-safe 2016-08-19 16:09:34 +02:00
13840a06f7 Add constness to some tld/domain/url functions 2016-08-11 12:58:51 +02:00
3c2773a928 Removed gbstrlen()
gbstrlen() just checked for NULL and called gbshutdownAbort(). Dereferencing NULL on modern platforms causes a SIGSEGV, which is caught by our signal handler and ... gbshutdownAbort() is called. So gbstrlen() was superfluous and complicated static analysis.
2016-07-28 17:04:35 +02:00
37d203f540 Add privacore tld blacklist 2016-07-26 15:19:01 +02:00
4dccc715bd #include cleanup 2016-07-11 13:53:44 +02:00
4d01557b00 Fix bug when handling extra long url (url longer than MAX_URL_LEN) 2016-07-05 12:19:33 +02:00
f3f5eefcb6 First batch of changes streamlining emergency shutdown code 2016-06-20 12:30:26 +02:00
994e0777b5 Remove redundant ismedia url filter. We're not inserting media urls into spiderdb anymore 2016-06-08 11:32:55 +02:00
a908e4a6e9 Add constness to getHostFast 2016-06-01 15:35:25 +02:00
5d24124a78 Fix UrlParser logic to be more similar to Url 2016-05-31 12:04:33 +02:00
190d68512d Fix conversion from string literal to 'char *' for SiteGetter 2016-05-31 11:26:01 +02:00
228fd7bd53 Fixes for new tlds.
They can now contain '-' and numbers.
Fix punycode url encoding: set max length before encoding each url chunk.

Conflicts:
	Url.cpp
2016-05-30 15:26:44 +02:00
6a63a2ce89 Add criteria for stripping session parameter. Add missing tests. 2016-05-30 12:49:43 +02:00
2adc21655e Fix bug in UrlComponent where allow criteria is wrong. Fix code style to make it clearer. 2016-05-30 12:49:43 +02:00
d1c9e5a0ea constness in Url 2016-05-27 18:20:01 +02:00
593b9922aa constness in Url 2016-05-27 18:07:43 +02:00