No longer maintained. Please read our shutdown message.
2023-12-03 23:55:58 -05:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
doxygen Try with wildcard for doxygen symbols 2016-09-23 22:29:38 +02:00
html added more options to rank test page 2017-12-15 16:17:36 +01:00
misc Remove now unused Msg1 2017-04-04 16:22:54 +02:00
sto lemma: generate lemmas for proper nouns too (normally genetive-case -> unmarked-case) 2018-07-06 14:06:28 +02:00
test Fix RdbBucketsTest seg fault (missing return statements), call pytest instead of py.test in system test Makefile 2023-12-03 23:55:58 -05:00
third-party Updat sparsepp to use latest commit 2017-10-23 12:13:55 +02:00
tokenizer tokenizer compilation fix 2018-07-26 17:31:36 +02:00
tools Fix compliation of filter_titledb 2018-08-31 12:11:16 +02:00
ucdata Added missing ucdata/ files and updated .gitignore to not ignore ucdata/*.dat 2018-03-20 14:19:25 +01:00
unicode Spelling fix unicode readme 2018-07-24 17:21:54 +02:00
word_variations added missing utf8_fast.h 2018-08-09 20:51:23 +02:00
.clang-format Add clang format file (initial edition) 2015-12-22 10:30:57 +01:00
.gitignore Merge branch 'master' into tokenizer 2018-04-20 14:28:47 +02:00
.gitmodules Add sparsepp as third-party project 2017-09-18 16:22:11 +02:00
.travis.yml Fix compilation issue on travis 2018-02-14 12:40:55 +01:00
.valgrindrc Add valgrind rc file for default configuration 2016-05-19 11:00:04 +02:00
Abbreviations.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Abbreviations.h unicode: Preliminary commit, work-in-progress 2018-02-06 17:06:35 +01:00
adultphrases.txt.example new adult detection code 2017-10-26 12:20:04 +02:00
adultwords.txt.example new adult detection code 2017-10-26 12:20:04 +02:00
antiword fix calls to antiword and pdftohtml etc. 2014-06-15 17:44:52 -07:00
BaseScoringParameters.cpp Merge branch 'master' into lemma 2018-06-22 16:01:03 +02:00
BaseScoringParameters.h Merge branch 'master' into lemma 2018-06-22 16:01:03 +02:00
BigFile.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
BigFile.h Removed last use of O_NONBLOCK in BigFile 2016-12-20 14:21:23 +01:00
BitOperations.h Add constness 2016-10-07 23:14:37 +02:00
Bits.cpp bugfix 183b99c925: Bits::setInLinkBits() was coredumping 2018-09-06 13:11:24 +02:00
Bits.h const for Bits:.setInLinkBits() 2018-09-04 16:18:29 +02:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
browser.py Add code to run through external filter when the html document have low word count (configurable) 2017-11-09 17:14:17 +01:00
ByteOrderMark.cpp unicode: Preliminary commit, work-in-progress 2018-02-06 17:06:35 +01:00
ByteOrderMark.h unicode: Preliminary commit, work-in-progress 2018-02-06 17:06:35 +01:00
Clusterdb.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Clusterdb.h Made ClusterDb::get*() take properly typed key96_t* 2018-07-02 16:46:48 +02:00
cmpversiongte Fix GCC version checks. Was resultign int 'syntax error' from bc 2017-09-22 13:58:26 +02:00
Collectiondb.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
Collectiondb.h Parmaeter for 'dedup URLs by default' (ddud) so we can control it per collection 2018-05-01 16:13:42 +02:00
collnum_t.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
Conf.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Conf.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
ContentMatchList.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
ContentMatchList.h Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
ContentTypeBlockList.cpp Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
ContentTypeBlockList.h Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
control.deb package bldg updates 2014-06-16 21:50:32 -06:00
ConvertSpiderdb.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
ConvertSpiderdb.h Added spiderdb->sqlite conversion command 2017-10-09 17:15:58 +02:00
copyright.head package bldg updates 2014-06-16 21:50:32 -06:00
copyright.tail package bldg updates 2014-06-16 21:50:32 -06:00
CountryCode.cpp Initial implementation of CountryLanguage 2018-07-25 13:35:21 +02:00
CountryCode.h Initial implementation of CountryLanguage 2018-07-25 13:35:21 +02:00
CountryLanguage.cpp Use host instead of url to get country tld 2018-07-25 14:03:24 +02:00
CountryLanguage.h Use host instead of url to get country tld 2018-07-25 14:03:24 +02:00
DailyMerge.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
DailyMerge.h char -> bool in DailyMerge 2016-10-20 11:16:12 +02:00
default.css Support rebulding spiderdb from titledb documnet URLs only 2017-10-31 15:48:49 +01:00
Dir.cpp Disable gcc deprecation warnings in places where we can't don anything about it 2018-10-09 15:41:00 +02:00
Dir.h Remove unused DirIterator 2017-04-06 19:45:27 +02:00
Dns_internals.h Moved implementation details in Dns out of header file 2016-11-12 17:08:38 +01:00
Dns.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Dns.h Renamed Dns::getResponsibleHost() to getIPLookupHost() 2016-11-24 16:31:08 +01:00
DnsBlockList.cpp Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
DnsBlockList.h Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
DnsProtocol.h More #include cleanup in UdpServer.h 2016-09-26 15:37:13 +02:00
DocDelete.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
DocDelete.h Split up parameter for DocProcess delay as well 2018-04-13 11:54:31 +02:00
Docid2Siteflags.cpp Assume docid2flagsandsitemap.dat is sorted 2018-07-12 17:05:28 +02:00
Docid2Siteflags.h Assume docid2flagsandsitemap.dat is sorted 2018-07-12 17:05:28 +02:00
Docid.cpp Support dynamic TLD list 2018-08-31 13:32:06 +02:00
Docid.h Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
DocProcess.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
DocProcess.h Add a tmperr file so we can keep track of tmp errors that happen during docprocess 2018-04-13 13:19:17 +02:00
DocRebuild.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
DocRebuild.h Split up parameter for DocProcess delay as well 2018-04-13 11:54:31 +02:00
DocReindex.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
DocReindex.h Split up parameter for DocProcess delay as well 2018-04-13 11:54:31 +02:00
DocumentIndexChecker.h added getFileNum method to DocumentIndexChecker 2017-04-21 17:21:53 +02:00
Doledb.cpp Added SpiderLoop::nukeWinnerListCache() so Doledb doesn't have to poke directly inside spiderloop 2017-10-20 14:25:07 +02:00
Doledb.h Nuke doledb periodically to hide the bugs/limitations in the spidering stuff 2017-08-07 15:26:29 +02:00
Domains.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Domains.h Support dynamic TLD list 2018-08-31 13:32:06 +02:00
DumpSpiderdbSqlite.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
DumpSpiderdbSqlite.h Readd ./gb dump s command 2018-02-15 17:21:20 +01:00
EGStack.cpp work-in-progress: new tokenizer 2018-03-01 16:38:19 +01:00
EGStack.h work-in-progress: new tokenizer 2018-03-01 16:38:19 +01:00
Entities.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Entities.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
entities.json entities.* fixes 2016-06-09 13:43:43 +02:00
Errno.cpp Add check for ipblocklist while spidering 2018-02-05 15:23:04 +01:00
Errno.h Add check for ipblocklist while spidering 2018-02-05 15:23:04 +01:00
fctypes.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
fctypes.h #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
File.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
File.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
FxBlobCache.cpp Fix clang++ warning: assigning field to itself 2018-02-28 11:19:52 +01:00
FxBlobCache.h Changed spiderloop:winnerlistcache from RdbCache to FxBlobCache 2017-10-23 16:52:09 +02:00
FxBlobCacheInstantiation.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
FxCheckAdult.cpp Simplified Phrases class a bit 2018-03-12 16:18:17 +01:00
FxCheckAdult.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
FxCheckSpam.cpp Simplified Phrases class a bit 2018-03-12 16:18:17 +01:00
FxCheckSpam.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
FxClient.cpp Disable m_max_outstanding when it's 0 (as described in configuration) 2018-06-06 16:05:45 +02:00
FxClient.h Add reinitializeSettings to QueryLanguage 2018-04-05 14:01:09 +02:00
FxExplicitKeywords.cpp Removed debuglog left over in FxExplicitKeywords.cpp 2018-01-19 14:30:43 +01:00
FxExplicitKeywords.h Support extra keywords/temrs on pages 2018-01-19 14:24:01 +01:00
FxLanguage.cpp Add bestEffort flag 2018-03-05 17:29:48 +01:00
FxLanguage.h Add bestEffort flag 2018-03-05 17:29:48 +01:00
FxTermCheckList.cpp Changed Phrases::getPhraseIds2() to a simpler and looser-coupled getPhraseId() 2018-03-19 10:15:38 +01:00
FxTermCheckList.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
g_hashtab.inc Use static const g_hashtab instead of calculating/generating it upon each startup 2016-09-20 14:21:03 +02:00
gb-1.0.spec make it so we don't need --nodeps with 2014-05-25 22:08:46 -04:00
gb.deb.rules if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
gb.pem add old gb.pem file, not used by gigablast 2013-10-09 17:37:01 -06:00
GbCache.h Fix compilation of unit test 2018-07-05 14:38:09 +02:00
gbclean.sh Add x bit on gbclean.sh 2018-04-04 12:18:03 +02:00
GbCompress.cpp Moved gbcompress/gbuncompress from XmlDoc.* to separate file 2017-01-09 14:00:34 +01:00
GbCompress.h Moved gbcompress/gbuncompress from XmlDoc.* to separate file 2017-01-09 14:00:34 +01:00
gbconvert.sh Add code to run through external filter when the html document have low word count (configurable) 2017-11-09 17:14:17 +01:00
GbCopyFile.cpp Add checks for when src & dst have different sizes after moving/copying 2017-03-14 15:32:32 +01:00
GbCopyFile.h Added copyFile() function 2016-10-24 14:52:25 +02:00
GbDns.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
GbDns.h Delay reload of dns settings until there is no pending requests, and at the same time pause requests 2018-02-08 16:32:55 +01:00
GbEncoding.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
GbEncoding.h Move getCharsetFast to GbEncoding 2017-06-26 14:16:47 +02:00
GbFormat.h Moved format #define out from HttpRequest.h so more code doesn't need to know the innaers of HttpRequest 2016-08-31 19:14:16 +02:00
GbMakePath.cpp bugfix makePath(): was referring to 1 beyond the end 2016-10-24 13:29:02 +02:00
GbMakePath.h Added makePath() function 2016-10-21 13:21:49 +02:00
gbmemcpy.h Moved 'gbmemcpy' macro to separate file 2018-07-26 17:01:42 +02:00
GbMoveFile2.cpp bugfix moveFile2Phase2() 2016-10-25 11:53:09 +02:00
GbMoveFile2.h Added moveFile2Phase1()/moveFile2Phase2() 2016-10-24 15:15:35 +02:00
GbMoveFile.cpp Add checks for when src & dst have different sizes after moving/copying 2017-03-14 15:32:32 +01:00
GbMoveFile.h Added GbMoveFile.* 2016-10-18 13:59:44 +02:00
GbMutex.cpp Added GbMutex class 2016-08-15 16:24:56 +02:00
GbMutex.h Added GbMutex class 2016-08-15 16:24:56 +02:00
GbRegex.cpp Fix clang warning: no newline at end of file 2017-03-27 14:57:20 +02:00
GbRegex.h Add thin wrapper around pcre.h 2017-02-07 16:38:54 +01:00
GbSignature.cpp Fix clang++ warning: function 'signature_verification_failed' could be declared with attribute 'noreturn' 2018-02-28 11:26:38 +01:00
GbSignature.h Fix clang++ warning: function 'signature_verification_failed' could be declared with attribute 'noreturn' 2018-02-28 11:26:38 +01:00
gbstart.sh Add hostid to eventlog 2018-07-23 14:57:09 +02:00
GbThreadQueue.cpp Fix unit test after merging 2017-05-08 14:10:10 +02:00
GbThreadQueue.h Make GbThreadQueue::m_stop atomic 2017-04-25 11:22:36 +02:00
GbUtil.cpp Fix split function 2018-07-10 16:27:28 +02:00
GbUtil.h Split string by string delimiter 2018-07-06 14:23:16 +02:00
generate_entities.py Update generate_entities.py to run in python 3 2023-12-03 19:51:02 -05:00
generate_query_stop_word_languages.sh Improved qstopword generation + dependencies 2017-12-29 14:56:10 +01:00
generate_query_stop_words.sh NULL-terminate query-stop-word tables 2017-12-22 15:48:03 +01:00
generate_tld_list.sh Modifications to tld list 2018-07-17 13:07:35 +02:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
GigablastRequest.h Add constructor for GigablastRequest instead of memset which will break a preallocated vector in Msg4 2017-08-30 16:41:37 +02:00
hash.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
hash.h #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
HashTable.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
HashTable.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
HashTableT.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
HashTableT.h Rename getNumSlotsUsed to getNumUsedSlots to keep things consistent 2017-04-07 15:55:02 +02:00
HashTableX.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
HashTableX.h memcpy.->memmove 2018-07-26 14:34:06 +02:00
HighFrequencyTermShortcuts.cpp Fix allocation of working-area in PosdbTable::intersectLists10_r() 2017-06-12 16:14:28 +02:00
HighFrequencyTermShortcuts.h If a word in a query is a high-freq-term then ignore it if possible 2017-02-13 16:44:34 +01:00
Highlight.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Highlight.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
Hostdb.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Hostdb.h Size file/pathbuffers correctly 2018-10-09 15:39:31 +02:00
HostFlags.cpp Move declaration of g_recoveryMode to HostFlags.cpp 2017-05-09 10:57:40 +02:00
HostFlags.h Keep our own Host::m_flags up-to-date 2017-05-04 14:37:19 +02:00
HttpMime.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
HttpMime.h Remove getContentTypePrivate and move logic into parseContentType function 2017-11-19 10:25:28 +01:00
HttpRequest.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
HttpRequest.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
HttpServer.cpp Removed non-const version of Url::getHost() 2018-08-24 13:13:09 +02:00
HttpServer.h const for getMsgSize() 2017-03-10 11:52:34 +01:00
iana_charset.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
iana_charset.h Align misc/parse_iana_charsets.pl with modified iana_charset.* 2016-11-11 15:31:23 +01:00
Images.cpp Removed non-const pointer-returning methods from Url class 2018-08-24 13:37:43 +02:00
Images.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
init.gb.conf minor make install changes 2014-05-22 18:46:38 -07:00
InstanceInfoExchange.cpp Log an all-clear message once when all instances are alive 2017-08-11 14:50:24 +02:00
InstanceInfoExchange.h Removed parameter from InstanceInfoExchange::weAreAlive() 2017-05-02 11:57:22 +02:00
IOBuffer.h swap(IOBuffer,IOBuffer) should be inline 2017-05-01 15:58:37 +02:00
ip.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
ip.h Removed non-renetrant version of iptoa() 2017-05-10 17:54:00 +02:00
IPAddressChecks.cpp bugfix is_internal_net_ip/is_trusted_protocol_ip (harmless) 2017-01-30 11:24:29 +01:00
IPAddressChecks.h Revert "removed duplicate function is_trusted_protocol_ip from IPAddressChecks" 2016-10-20 15:34:20 +02:00
IpBlockList.cpp Fix bug where wrong function name is used to overload parent function 2018-06-21 11:02:59 +02:00
IpBlockList.h Missed in previous commit 2018-06-21 11:08:53 +02:00
Jenkinsfile Archive gbclean.sh script as well 2018-04-04 12:06:33 +02:00
JobScheduler.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
JobScheduler.h Add http interface to docprocess for ease of testing 2017-12-01 17:59:55 +01:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Json.h #include cleanup in Json.* 2018-07-20 16:03:46 +02:00
Lang.cpp Cater for languages not in gb (eg: nynorsk/bokmaal) 2018-03-09 10:24:22 +01:00
Lang.h Use lang_t enum more than just plain uint8_t 2017-11-21 16:18:45 +01:00
LanguageResultOverride.cpp Add ResultOverride for exact url 2017-10-26 11:09:44 +02:00
LanguageResultOverride.h Fix clang++ warning: 'ResultOverride' defined as a class here but previously declared as a struct 2018-02-28 11:30:06 +01:00
Lemma.cpp Only load sto lexicon once 2018-07-17 13:49:53 +02:00
Lemma.h Only load sto lexicon once 2018-07-17 13:49:53 +02:00
Lexicons.cpp support older version of unique_ptr<> 2018-07-17 14:32:41 +02:00
Lexicons.h Only load sto lexicon once 2018-07-17 13:49:53 +02:00
libiconv64.a added 64 bit libiconv64.a 2014-11-14 17:34:11 -08:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libjpeg.so.62 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libnetpbm.so.10 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpng12.so.0 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libtiff.so.4 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
LICENSE License update 2017-10-26 10:31:55 +02:00
Linkdb.cpp #include cleanup in linkspam.h 2018-07-20 16:12:23 +02:00
Linkdb.h Removed non-const pointer-returning methods from Url class 2018-08-24 13:37:43 +02:00
linkspam.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
linkspam.h #include cleanup in linkspam.h 2018-07-20 16:12:23 +02:00
Log.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
Log.h Add logHexTrace 2017-10-11 11:23:57 +02:00
Loop.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Loop.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
main.cpp Revert "Fixed long time bug regarding startup on clean DB" 2018-10-12 15:39:30 +02:00
Makefile Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
matches2.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
matches2.h Add const 2017-08-17 11:04:28 +02:00
Matches.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
Matches.h Fixed signed/unsigned comparison resulting from tokenizer known there cannot be negativer number of tokens 2018-03-19 15:33:26 +01:00
MatchList.cpp Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
MatchList.h Add g_contentRetryProxyList & rename BlockList to MatchList 2018-05-31 12:44:20 +02:00
max_coll_len.h Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files 2016-11-12 20:44:42 +01:00
max_hosts.h increased max hosts from 512 to 768 2017-01-25 14:18:28 +01:00
max_niceness.h Moved MAX_NICENESS from Threads.h to a separate max_niceness.h file 2016-04-28 14:28:14 +02:00
max_url_len.h Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files 2016-11-12 20:44:42 +01:00
max_words.h #include cleanup in Phrases.* 2016-08-05 14:31:28 +02:00
Mem.cpp Disable gcc deprecation warnings in places where we can't don anything about it 2018-10-09 15:41:00 +02:00
Mem.h Made Mem more private 2017-02-19 16:41:55 +01:00
MemoryMappedFile.cpp Renamed MemoryMappedFile.cc to MemoryMappedFile.cpp (for consistent extensions) 2018-07-20 15:59:06 +02:00
MemoryMappedFile.h Assume docid2flagsandsitemap.dat is sorted 2018-07-12 17:05:28 +02:00
MergeSpaceCoordinator.cpp Use a thread name for mergecoordinator-hold-lock thread 2017-10-09 12:14:46 +02:00
MergeSpaceCoordinator.h Check disk free bytes in mergespacecoordinator 2016-10-18 15:23:27 +02:00
Msg0.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Msg0.h Encapsulate Msg0 better 2017-11-03 15:30:32 +01:00
Msg2.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Msg2.h Tracked down one source of double callback calls 2016-08-30 16:37:53 +02:00
Msg3.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Msg3.h Use fileId instead of fileNum (which is an index info m_fileInfo) 2017-05-10 16:25:46 +02:00
Msg3a.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Msg3a.h Fix race condition in Msg3a 2018-05-22 14:28:46 +02:00
Msg4In.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Msg4In.h Move Msg4In related functions to Msg4In namespace 2017-04-26 11:25:29 +02:00
Msg4Out.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Msg4Out.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
Msg5.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Msg5.h Fix merge list when posdb is merging 2018-01-05 11:21:51 +01:00
Msg13.cpp Moved serialization/deserialization functions to separate file 2018-07-30 13:07:02 +02:00
Msg13.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
Msg20.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Msg20.h Remove hop count (not stored in sqlite based spiderdb) 2018-02-02 15:50:03 +01:00
Msg22.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Msg22.h Removed unused Msg22::getAvailDocIdOnly() 2018-02-12 15:08:51 +01:00
Msg25.cpp Removed non-const pointer-returning methods from Url class 2018-08-24 13:37:43 +02:00
Msg25.h Remove unused Msg25::m_adBanTable 2018-07-04 11:48:24 +02:00
Msg39.cpp Moved serialization/deserialization functions to separate file 2018-07-30 13:07:02 +02:00
Msg39.h Removed Msg39Request::m_stripe 2018-04-26 16:16:10 +02:00
Msg40.cpp Removed obsolete comment about gigabits 2018-10-04 15:41:24 +02:00
Msg40.h Removed unused Msg40::printSearchResult9() 2018-01-29 16:27:53 +01:00
Msg51.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Msg51.h Removed the msg5* parameter to Msg0::getList() 2017-11-03 15:13:08 +01:00
MsgC.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
MsgC.h Make MsgC/Msge1 coorperation more clear 2016-11-24 15:33:16 +01:00
Msge0.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Msge0.h #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
Msge1.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
Msge1.h Remove declared but not defined functions 2017-07-11 15:00:12 +02:00
msgtype_t.h Dropped PingInfo + PingServer entierely. 2017-05-05 15:00:31 +02:00
Multicast.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Multicast.h Fix Msg0/Multicast to handle reads from spiderdb when some hosts have spidering disabled 2017-10-26 14:22:18 +02:00
mysynonyms.txt mysyn fixes 2015-04-22 08:34:29 -06:00
nodeid_t.h MOved BACKBIT/BACKBITCOMP from Words.h to nodeid_t.h 2018-03-02 17:36:31 +01:00
PageAddColl.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageAddUrl.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageBasic.cpp Resurrect /admin/status (it is useful for automated tests) 2018-01-12 15:24:24 +01:00
PageCrawlBot.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
PageCrawlBot.h Remove diffbot specific CollectionRec::m_notifyUrl 2016-07-22 13:59:55 +02:00
PageDocProcess.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageDoledbIPTable.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageGet.cpp Removed 'strip' parameter from stripHtml() 2018-08-07 14:59:12 +02:00
PageHealthCheck.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageHosts.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageInject.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
PageInject.h Made setInjectionRequestFromParms() local to PageInject.cpp 2018-07-30 11:12:04 +02:00
PageLinkdbLookup.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
PageParser.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageParser.h Remove unused /print api & cleanup PageParser 2018-05-23 14:15:36 +02:00
PagePerf.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageReindex.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
PageReindex.h Use lang_t enum more than just plain uint8_t 2017-11-21 16:18:45 +01:00
PageResults.cpp Removed non-functional ban-all-these-domains functionality in PageResults 2018-08-30 16:49:50 +02:00
PageResults.h Use multiple/all langauge weigths returned by the external query-language-server 2018-05-15 16:30:52 +02:00
PageRoot.cpp Removed freestanding dequote() 2018-08-07 15:19:40 +02:00
PageRoot.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Pages.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Pages.h Remove unused variable 2018-03-05 11:30:39 +01:00
PageSockets.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageSpider.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
PageSpiderdbLookup.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
PageStats.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
PageTemperatureRegistry.cpp use mmap'ping for page temperatures, and exploit that it is sorted 2018-07-13 13:41:45 +02:00
PageTemperatureRegistry.h use mmap'ping for page temperatures, and exploit that it is sorted 2018-07-13 13:41:45 +02:00
PageThreads.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PageTitledb.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Parms.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
Parms.h New site median page temperature client & settings 2018-06-15 11:38:32 +02:00
Phrases.cpp bugfix/workaround for bigram hashes 2018-08-02 13:14:44 +02:00
Phrases.h Removed obsolete, incorrect or unuseful comments from Phrases.h 2018-03-19 10:15:38 +01:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Pops.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Pos.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Pos.h Fixed signed/unsigned comparison resulting from tokenizer known there cannot be negativer number of tokens 2018-03-19 15:33:26 +01:00
Posdb.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Posdb.h Merge branch 'master' into lemma 2018-06-25 15:36:39 +02:00
PosdbTable.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
PosdbTable.h Changed posdbtable to use enum lang_t 2018-05-03 12:00:25 +02:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
Process.h Add logging when loop callback hit time threshold. Remove some unused function, remove undefined function (only defined in header) 2017-05-30 12:12:32 +02:00
Profiler.cpp Made Profiler.cpp compile with newer glibc 2018-10-09 14:46:21 +02:00
Profiler.h Got rid of PTRFMT/PTRTYPE (except in Mem.cpp), and use %p instead 2018-07-23 15:24:16 +02:00
Proxy.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
Proxy.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
Punycode.cpp Start to detect non-asci urls and encode them to ascii. 2015-09-12 15:47:33 -06:00
Punycode.h Standardize header guards 2016-03-08 22:16:02 +01:00
query_stop_words.da.txt Extended query-stop-words danish with interrogative relative pronouns and query-general-adverbs 2017-12-29 15:58:34 +01:00
query_stop_words.de.txt MOved query-stop-words arrays out from StopWords.cpp to textfiles and generate the static arrays at build time 2017-12-22 15:43:59 +01:00
query_stop_words.en.txt MOved query-stop-words arrays out from StopWords.cpp to textfiles and generate the static arrays at build time 2017-12-22 15:43:59 +01:00
query_stop_words.xx.txt Split out danish query-stop-words 2017-12-29 15:04:15 +01:00
Query.cpp Closed off Bits member access so we know what can be changed freely 2018-09-04 14:50:38 +02:00
Query.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
QueryLanguage.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
QueryLanguage.h qlangserver: return weights as doubles instead of ints 2018-05-15 15:09:51 +02:00
Rdb.cpp Size file/pathbuffers correctly 2018-10-09 15:39:31 +02:00
Rdb.h Moved Rdb initialization from main to Rdb.cpp in new function initialiseAllPrimaryRdbs() 2017-09-26 16:29:49 +02:00
RdbBase.cpp Size file/pathbuffers correctly 2018-10-09 15:39:31 +02:00
RdbBase.h Added RdbBase::unlink() 2017-10-30 14:02:28 +01:00
RdbBuckets.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbBuckets.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
RdbCache.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
RdbCache.h #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
RdbDump.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbDump.h Move addList to a thread for RdbDump 2017-06-09 15:02:18 +02:00
rdbid_t.h Send site-default-pagem-temperature from spidering to query-host 2018-02-22 15:07:08 +01:00
RdbIndex.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
RdbIndex.h Uhm, dependencies didn't catch the compilation error? 2017-10-06 14:51:14 +02:00
RdbIndexQuery.cpp Cater for newly dumped file in a different way while merging list 2017-05-31 16:33:23 +02:00
RdbIndexQuery.h Cater for newly dumped file in a different way while merging list 2017-05-31 16:33:23 +02:00
RdbList.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
RdbList.h Fix merge list when posdb is merging 2018-01-05 11:21:51 +01:00
RdbMap.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
RdbMap.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbMem.cpp Fix clang++ warning: no newline at end of file 2018-02-28 11:19:52 +01:00
RdbMem.h Add lock to RdbMem 2017-05-11 14:10:11 +02:00
RdbMerge.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbMerge.h Fix typo in variable 2017-06-06 15:19:34 +02:00
RdbScan.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbScan.h Removed unused 'allowPageCache' parameter to RdbScan::setRead() 2017-01-29 01:23:39 +01:00
RdbTree.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
RdbTree.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
README.md Update README.md 2018-11-08 13:40:03 +01:00
Rebalance.cpp #include cleanup in Rebalance.* 2018-07-20 16:05:57 +02:00
Rebalance.h #include cleanup in Rebalance.* 2018-07-20 16:05:57 +02:00
repair_mode.h Use repair_mode_t instead of magic constants 2017-01-02 14:48:04 +01:00
Repair.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Repair.h Support rebulding spiderdb from titledb documnet URLs only 2017-10-31 15:48:49 +01:00
ResultOverride.cpp Add custom title/summary for results that are blocked by robots.txt 2017-10-24 10:41:19 +02:00
ResultOverride.h Add custom title/summary for results that are blocked by robots.txt 2017-10-24 10:41:19 +02:00
RobotRule.cpp Fix cppcheck warning -> Array index 'needlePos' is used before limits check 2016-08-08 14:23:37 +02:00
RobotRule.h Normalize url. Don't encode character which are not suppose to be encoded. 2016-05-10 16:15:31 +02:00
Robots.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
Robots.h Code style changes 2016-12-02 16:29:31 +01:00
RobotsBlockedResultOverride.cpp Initialize m_loading 2017-11-03 23:22:50 +01:00
RobotsBlockedResultOverride.h Move reloading of RobotsBlockedResultOverride to a separate thread 2017-11-02 16:19:44 +01:00
robotsblockedresultoverride.txt Add custom title/summary for results that are blocked by robots.txt 2017-10-24 10:41:19 +02:00
RobotsCheckList.cpp Move reloading of RobotsCheckList to a separate thread 2017-11-02 16:19:44 +01:00
RobotsCheckList.h Move reloading of RobotsCheckList to a separate thread 2017-11-02 16:19:44 +01:00
runCoverityAnalysis.sh Don't cat logs anymore 2017-03-22 13:47:47 +01:00
runSonarQubeAnalysis.sh Add coverity scan to travis 2016-11-08 15:59:24 +01:00
S99gb added S99gb for loading at boot. 2014-06-23 07:32:38 -06:00
SafeBuf.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
SafeBuf.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Sanity.cpp Move g_errno_init into g_errno_location itself 2016-07-28 11:23:30 +02:00
Sanity.h pclint plus beta 9 knows about [[noreturn]] 2017-01-08 14:11:10 +01:00
ScalingFunctions.cpp Added scale_logarithmically() 2017-04-28 15:26:52 +02:00
ScalingFunctions.h Added scale_logarithmically() 2017-04-28 15:26:52 +02:00
ScopedLock.h Added GbMutex class 2016-08-15 16:24:56 +02:00
ScoringWeights.cpp Merge branch 'master' into lemma 2018-06-22 16:01:03 +02:00
ScoringWeights.h Merge branch 'master' into lemma 2018-06-22 16:01:03 +02:00
SearchInput.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
SearchInput.h Use multiple/all langauge weigths returned by the external query-language-server 2018-05-15 16:30:52 +02:00
Sections.cpp Closed off Bits member access so we know what can be changed freely 2018-09-04 14:50:38 +02:00
Sections.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
Serialize.cpp Moved serialization/deserialization functions to separate file 2018-07-30 13:07:02 +02:00
Serialize.h Moved serialization/deserialization functions to separate file 2018-07-30 13:07:02 +02:00
SiteGetter.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
SiteGetter.h #include cleanup in SiteGetter.h 2018-07-20 16:08:27 +02:00
sitelinks.txt fixed missing sites in sitelinks.txt 2015-03-05 20:32:01 -08:00
SiteMedianPageTemperature.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
SiteMedianPageTemperature.h New site median page temperature client & settings 2018-06-15 11:38:32 +02:00
SiteMedianPageTemperatureRegistry.cpp bugfix SiteMedianPageTemperatureRegistry::add(), incorrect condition in if / wrong use of iterator 2018-04-06 13:22:01 +02:00
SiteMedianPageTemperatureRegistry.h Reworked SiteMedianPageTemperatureRegistry to how it will work in the future 2018-02-16 17:09:18 +01:00
SiteNumInlinks.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
SiteNumInlinks.h Initial implementation of SiteNumInlinks client 2018-05-25 16:48:34 +02:00
sonar-project.properties Add output directory for sonarqube build-wrapper 2016-11-07 12:22:18 +01:00
sort.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
sort.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Speller.cpp Removed check for apporved checksum in Speller.cpp and Wiktionary.cpp 2018-04-24 12:28:54 +02:00
Speller.h #include cleanup in HashTableX.h 2018-07-20 16:29:06 +02:00
spider_status_t.h Made spider status ¤defines into strongly typed enum 2017-09-26 13:29:36 +02:00
Spider.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
Spider.h #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
SpiderCache.cpp Got rid of PTRFMT/PTRTYPE (except in Mem.cpp), and use %p instead 2018-07-23 15:24:16 +02:00
SpiderCache.h Uhm, dependencies didn't catch the compilation error? 2017-10-06 14:51:14 +02:00
SpiderColl.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
SpiderColl.h Run SpiderdbRdbSqliteBridge::getList in a thread 2018-03-19 10:15:38 +01:00
SpiderdbRdbSqliteBridge.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
SpiderdbRdbSqliteBridge.h Split out logic used for waiting tree list into SpiderdbRdbSqliteBridge::getFirstIps and use SELECT DISTINCT instead. (Don't remove multi ip select this time) 2018-02-22 17:37:22 +01:00
SpiderdbSqlite.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
SpiderdbSqlite.h Make sure the we have spiderdb sqlite if we still have spiderdb rdb files 2018-02-16 11:23:08 +01:00
SpiderdbUtil.cpp Added ScopedSqlitedbLock 2017-11-03 13:58:42 +01:00
SpiderdbUtil.h Deleted unwanted spiderrequests as we scan through them 2017-10-27 14:17:53 +02:00
SpiderLoop.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
SpiderLoop.h Changed spiderloop:winnerlistcache from RdbCache to FxBlobCache 2017-10-23 16:52:09 +02:00
SpiderProxy.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
SpiderProxy.h #include cleanup in Msg13.* 2016-11-11 15:51:30 +01:00
Statistics.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
Statistics.h Add Url validity check (currently disabled) 2017-09-26 22:25:19 +02:00
Stats.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
Stats.h Removed PingInfo::m_socketsClosedFromHittingLimit 2017-05-01 12:08:24 +02:00
StopWords.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
StopWords.h #include cleanup of StopWords.h 2016-08-11 16:43:25 +02:00
Summary.cpp Fix typos in comment 2018-09-06 14:42:00 +02:00
Summary.h Optimized avoid-cookies-warning in summary generation 2018-08-31 16:11:29 +02:00
SummaryCache.cpp Sync lock changes from nomerge2 to master 2017-03-24 14:05:44 +01:00
SummaryCache.h Sync lock changes from nomerge2 to master 2017-03-24 14:05:44 +01:00
Synonyms.cpp bugfix/workaround for bigram hashes 2018-08-02 13:14:44 +02:00
Synonyms.h tokenizer: first shot at somethign that appears to work 2018-03-09 16:24:39 +01:00
Tagdb.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
Tagdb.h Removed non-const version of Url::getHost() 2018-08-24 13:13:09 +02:00
TcpServer.cpp Use openssl's ERR_remove_thread_state() or nothing at all depending on version 2018-10-09 15:40:23 +02:00
TcpServer.h Remove commented out code 2017-11-16 17:19:43 +01:00
TcpSocket.h Make sure EDOCBADCONTENTTYPE doesn't return as EDOCTOOBIG 2017-11-20 15:22:47 +01:00
termid_mask.h Split our TERMID_MASK definition 2016-09-06 12:07:02 +02:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp Select best of og:title, og_site_name, <title> and meta title 2018-09-06 16:52:15 +02:00
Title.h Select best of og:title, og_site_name, <title> and meta title 2018-09-06 16:52:15 +02:00
Titledb.cpp Added w more start-up tests for Url::getDomain() and getDomFast() 2018-08-23 15:44:26 +02:00
Titledb.h #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
TitleRecVersion.h Make sure we strip session id even if query parameter is separated by '?' 2018-04-12 15:24:55 +02:00
TitleSummaryCodepointFilter.h Moved isUtf8UnwantedSymbols() to separate header 2018-02-03 20:58:45 +01:00
tlds-additional-2nd-level-domains.txt Modifications to tld list 2018-07-17 13:07:35 +02:00
tlds-alpha-by-domain.txt Moved hardcoded TLD list from Doamins.cpp to external text files 2018-04-20 13:43:49 +02:00
tlds-official-2nd-level-domains.txt Modifications to tld list 2018-07-17 13:07:35 +02:00
TopTree.cpp Moved Titledb::...ProbableDocId... methods to separate namespace 2018-08-31 12:11:16 +02:00
TopTree.h Cleanup after gbsortbyint/gbrevsortbyint are no longer supported 2018-01-12 15:04:24 +01:00
types.h Moved collnum_t type to separate header file 2017-10-06 14:32:45 +02:00
UdpProtocol.h Use key96_t instead of key_t and redefining std lib key_t (which breaks std lib functionality that uses key_t) 2016-09-02 14:49:06 +02:00
UdpServer.cpp Removed local/global time distinction 2018-08-07 14:38:37 +02:00
UdpServer.h Add sanity for UdpSlot, message size shouldn't change across datagrams 2017-05-31 15:01:19 +02:00
UdpSlot.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
UdpSlot.h Comment to avoid confusion again 2017-05-26 13:46:33 +02:00
UdpStatistic.cpp Renamed macro/constant RDBIDOFFSET to MSG0RDBIDOFFSET (which is what it is) 2017-10-26 13:51:32 +02:00
UdpStatistic.h Rename MsgType.h to msgtype_t.h (to keep things consistent) 2016-11-28 10:48:25 +01:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Removed unused Url::getIp() method 2018-08-27 14:35:30 +02:00
Url.h Removed unused Url::getIp() method 2018-08-27 14:35:30 +02:00
UrlBlockCheck.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
UrlBlockCheck.h Add dump unwanted spiderdb records. Clean up unwanted spiderdb records during merge 2017-08-30 13:19:11 +02:00
UrlComponent.cpp #include cleanup in fctypes.* 2018-08-07 15:44:56 +02:00
UrlComponent.h Make sure we strip session id even if query parameter is separated by '?' 2018-04-12 15:24:55 +02:00
UrlMatch.cpp Add logic for matchpartial, matchsuffix, matchprefix to host, domain, path 2018-07-12 13:40:43 +02:00
UrlMatch.h Add logic for matchpartial, matchsuffix, matchprefix to host, domain, path 2018-07-12 13:40:43 +02:00
UrlMatchList.cpp Remove unused UrlMatchCriterias 2018-07-26 17:38:03 +02:00
UrlMatchList.h Retry when redirected url match urlretryproxylist 2018-05-31 11:04:42 +02:00
urlmatchlist.txt.example Update urlmatchlist.txt.example 2018-07-26 17:44:05 +02:00
UrlParser.cpp Made a const version of getDomainOfIp() to avoid const casts elsewhere 2018-08-31 12:11:16 +02:00
UrlParser.h Add pathparam to UrlMatchList 2018-02-27 11:48:16 +01:00
UrlRealtimeClassification.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
UrlRealtimeClassification.h Split out generic server communication from UrlRealtimeClassification into FxClient 2018-02-27 15:28:12 +01:00
UrlResultOverride.cpp Move reloading of UrlResultOverride to a separate thread 2017-11-02 16:19:44 +01:00
UrlResultOverride.h Move reloading of UrlResultOverride to a separate thread 2017-11-02 16:19:44 +01:00
urlresultoverride.txt.example Add ResultOverride for exact url 2017-10-26 11:09:44 +02:00
utf8_convert.cpp Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works) 2023-12-03 19:52:06 -05:00
utf8_convert.h Add more encodings to dump_wrong_encoding 2018-04-16 11:34:04 +02:00
utf8_fast.cpp Detect a few cases when generating bigram across '.' shouldn't be done 2018-09-04 13:05:24 +02:00
utf8_fast.h Detect a few cases when generating bigram across '.' shouldn't be done 2018-09-04 13:05:24 +02:00
utf8.cpp tokenizer: uise phase-2 tokens 2018-03-13 17:12:42 +01:00
utf8.h tokenizer: uise phase-2 tokens 2018-03-13 17:12:42 +01:00
valgrind.cfg valgrind: Suppress a bunch and memcheck warnings originating from cld3->protobuf. Posisbly inappropriate suppressions but easily deleted when full-run leak diagnostics are wanted 2017-09-12 15:17:33 +02:00
Version.cpp Add print version to tools 2018-04-24 15:39:40 +02:00
Version.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
WantedChecker.cpp wantedcheck shlib: check single content, example with cellery 2017-09-12 16:24:40 +02:00
WantedChecker.h wantedcheck shlib: check single content, example with cellery 2017-09-12 16:24:40 +02:00
WantedCheckerApi.h wantedcheck shlib: check single content, example with cellery 2017-09-12 16:24:40 +02:00
WantedCheckExampleLib.cpp wantedcheck shlib: check single content, example with cellery 2017-09-12 16:24:40 +02:00
Wiki.cpp Size file/pathbuffers correctly 2018-10-09 15:39:31 +02:00
Wiki.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-lang.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-syns.dat when user searches for a word without the 2014-06-01 09:37:00 -07:00
Wiktionary.cpp Size file/pathbuffers correctly 2018-10-09 15:39:31 +02:00
Wiktionary.h More const in Synonyms and Wiktionary 2016-12-22 12:52:27 +01:00
WordVariationsConfig.h hackish implementation for lexicon-based lemmatization 2018-05-25 14:51:56 +02:00
Xml.cpp #include cleanup of Titledb.h 2018-08-31 12:11:16 +02:00
Xml.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
XmlDoc_Indexing.cpp Removed non-const pointer-returning methods from Url class 2018-08-24 13:37:43 +02:00
XmlDoc.cpp Select best of og:title, og_site_name, <title> and meta title 2018-09-06 16:52:15 +02:00
XmlDoc.h Index lemmas only once per document 2018-06-25 16:10:24 +02:00
XmlNode.cpp Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
XmlNode.h Got rid of gb-include.h 2018-07-26 17:29:51 +02:00
zconf.h updated to a new libz64.a. updated zconf.h and 2014-11-17 14:53:15 -08:00

Warning: Do not use this code.

Findx is shutting down. Please read https://privacore.github.io/

Gigablast - an open source search engine

An open source web and enterprise search engine and spider/crawler.

This is a fork of the original Gigablast project available at https://github.com/gigablast/open-source-search-engine/. This version is heavily modified by Privacore, and tailored for our use. It is not a drop-in replacement for the original Gigablast.

Modifications by Privacore

Our aim is not to maintain backwards compatibility with the original Gigablast data files.

| Feature | Description |
| --- | --- |
| Multi-threading | Many improvements have been made with regard to multi-threading and general optimizations. |
| Stability | Numerous general bugfixes and major improvements in thread safety. |
| Data formats | Posdb is being changed to store the entries for a page in a single Posdb file, rather than spreading the entries across multiple files, merging the data in memory, and handling delete keys at query time. A new index file will point to the file containing the newest version of a document. Spiderdb is modified to use an sqlite3 database instead of the RDB format. |
| Data file merging | Our version uses a dedicated drive for merging, instead of merging and deleting part files on-the-fly on the same data drive. We create a completely merged file on the merge drive, temporarily make GB use that file for queries, delete the original files, copy the newly merged file back to the 'production drive', switch query handling back to that drive, and delete the temporary file. The merge drive must be big enough to hold at least 1 instance's posdb data. |
| Alerting | Start script improved to send alerts if GB crashes (and to avoid successive coredumps, but stay down for analysis). |
| Trace log | Lots of options to add very detailed trace logging to different parts of the code. |
| Summaries | Improvements in search results summary generation. |
| Language detection | Google's CLD2 library integrated to improve language detection. |
| Code removed | About half of the original source has been removed, e.g. diffbot/eventguru/buzzlogic/seo specific integrations. |
| Disk space | Lots of 'junk' removed from the Posdb data files, reducing space usage significantly. This means that if you use our version with old Gigablast data files, data will not be cleaned up correctly when re-indexing a page. You will need to rebuild the Posdb data files. |
| Ranking | Ranking weights made configurable. |

... and much more...
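
The merge-drive sequence described under "Data file merging" can be pictured with a small shell sketch. This is an illustration only, not how GB implements it: the "merge" here is plain concatenation, and the function name, the file names (`part*.dat`, `merged.dat`) and the `ACTIVE` variable are hypothetical.

```shell
# Illustration only: mimics the merge-drive sequence with plain files.
# "Merging" is simple concatenation, a stand-in for GB's real merge.
# Directory layout, file names and the ACTIVE variable are hypothetical.

merge_via_merge_drive() {
    prod_dir="$1"    # production data drive
    merge_dir="$2"   # dedicated merge drive

    # 1. Create the completely merged file on the merge drive.
    cat "$prod_dir"/part*.dat > "$merge_dir/merged.dat"

    # 2. Temporarily serve queries from the merged file on the merge drive.
    ACTIVE="$merge_dir/merged.dat"

    # 3. Delete the original part files on the production drive.
    rm -f "$prod_dir"/part*.dat

    # 4. Copy the merged file back to the production drive.
    cp "$merge_dir/merged.dat" "$prod_dir/merged.dat"

    # 5. Switch query handling back to the production drive.
    ACTIVE="$prod_dir/merged.dat"

    # 6. Delete the temporary file on the merge drive.
    rm -f "$merge_dir/merged.dat"
}
```

The point of the ordering is that a complete merged file exists on one of the two drives at every step, so queries always have something to read while the part files are being replaced.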

Migrating Gigablast to our fork

| Step | Description |
| --- | --- |
| Backup! | There, you have been warned. |
| Build | `git clone https://github.com/privacore/open-source-search-engine.git`<br>`git submodule init`<br>`git submodule update`<br>`make -j4`<br>`make dist` |
| Copy | Stop your running GB instances. Copy the files contained in the new gb-[date]-[rev].tar.gz file to your GB instance 0. |
| Install | Go to your GB instance 0 and run `./gb install` to copy the binary and needed files to all instances. |
| Remove files | Remove the posdb files from your collections. |
| Convert files | Convert the spiderdb files to sqlite3 format by running `./gb convertspiderdb`. |
| Start | Run `./gb start` from your instance 0 and you should be on your way. |
| Rebuild | Rebuild the posdb data files through the web UI. This is needed because we store less data in posdb than the original version, and GB cannot clean this 'junk' data up when re-indexing pages. |
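
The command-line parts of the steps above could be collected into a script along these lines. This is a rough sketch under assumptions: the `DRY_RUN` variable, the `run` helper and the two function names are invented for illustration, and the `cd` into the cloned directory is implied rather than stated in the table. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the scripted parts of the migration table.
# Assumptions (not from the README): DRY_RUN, run(), and both function names.

DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# "Build" step: clone, fetch submodules, compile, create the dist tarball.
build_fork() {
    run git clone https://github.com/privacore/open-source-search-engine.git
    run cd open-source-search-engine   # implied by the table, not stated
    run git submodule init
    run git submodule update
    run make -j4
    run make dist
}

# "Install" to "Rebuild" steps, run from GB instance 0 after the tarball
# contents have been copied there by hand.
migrate_instance0() {
    run ./gb install
    echo "# manual: remove the posdb files from your collections"
    run ./gb convertspiderdb
    run ./gb start
    echo "# manual: rebuild the posdb data files through the web UI"
}
```

Calling `build_fork` and then `migrate_instance0` with the default settings prints the sequence for review; setting `DRY_RUN=0` would actually execute it, which should only be done after the backup step.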

SUPPORTED PLATFORMS

Primary:

  • Ubuntu 16.04, g++ 5.4.0, Python 2.7.6

Secondary:

  • OpenSuSE 13.2, GCC 4.8.3
  • OpenSuSE 42.2, GCC 6.2.1
  • Fedora 25, GCC 6.3.1

DEPENDENCIES

Compilation

Ubuntu

  • g++
  • make
  • cmake
  • python
  • libpcre3-dev
  • libssl-dev
  • libprotobuf-dev
  • protobuf-compiler
  • libsqlite3-dev

OpenSuSE

  • g++
  • make
  • cmake
  • python
  • pcre-devel
  • libssl-dev
  • protobuf-devel
  • libprotobuf13

Fedora

  • g++
  • make
  • cmake
  • python
  • pcre-devel
  • openssl-devel
  • protobuf-devel
  • protobuf-compiler
  • sqlite-devel
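
As a convenience, the three lists above can be turned into one install command per distribution. The helper below is a sketch: the function name is made up, the package names are copied verbatim from the lists (some may need adjusting for your release), and the choice of `apt-get`, `zypper` and `dnf` as package managers is an assumption. It only prints the command; it does not run it.

```shell
# Print (not run) an install command for the compilation dependencies.
# Package names are copied as-is from the lists above and are not verified
# against any particular release; deps_install_cmd is a hypothetical name.
deps_install_cmd() {
    case "$1" in
        ubuntu)
            echo "apt-get install g++ make cmake python libpcre3-dev libssl-dev libprotobuf-dev protobuf-compiler libsqlite3-dev"
            ;;
        opensuse)
            echo "zypper install g++ make cmake python pcre-devel libssl-dev protobuf-devel libprotobuf13"
            ;;
        fedora)
            echo "dnf install g++ make cmake python pcre-devel openssl-devel protobuf-devel protobuf-compiler sqlite-devel"
            ;;
        *)
            echo "unsupported distribution: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `deps_install_cmd ubuntu` prints the `apt-get` line, which can then be reviewed and run with root privileges.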

Runtime

  • Multi-instance installations require Vagus for keeping track of which instances are dead and alive.

Ubuntu

  • libssl1.0.0
  • libpcre3
  • libprotobuf9v5

RUNNING GIGABLAST

See html/faq.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/faq.html

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

SUPPORT

Privacore does not provide paid support for Gigablast. We refer you to the original project at https://github.com/gigablast/open-source-search-engine/ and its owner, Matt Wells. He has a Pro version you can buy, which includes support options.

We provide limited support for our fork, primarily for active contributors.