antiword-dir
Initial file population.
2013-08-02 13:12:24 -07:00
doxygen
Try with wildcard for doxygen symbols
2016-09-23 22:29:38 +02:00
html
added more options to rank test page
2017-12-15 16:17:36 +01:00
misc
Remove now unused Msg1
2017-04-04 16:22:54 +02:00
sto
lemma: generate lemmas for proper nouns too (normally genetive-case -> unmarked-case)
2018-07-06 14:06:28 +02:00
test
Fix RdbBucketsTest seg fault (missing return statements), call pytest instead of py.test in system test Makefile
2023-12-03 23:55:58 -05:00
third-party
Updat sparsepp to use latest commit
2017-10-23 12:13:55 +02:00
tokenizer
tokenizer compilation fix
2018-07-26 17:31:36 +02:00
tools
Fix compliation of filter_titledb
2018-08-31 12:11:16 +02:00
ucdata
Added missing ucdata/ files and updated .gitignore to not ignore ucdata/*.dat
2018-03-20 14:19:25 +01:00
unicode
Spelling fix unicode readme
2018-07-24 17:21:54 +02:00
word_variations
added missing utf8_fast.h
2018-08-09 20:51:23 +02:00
.clang-format
Add clang format file (initial edition)
2015-12-22 10:30:57 +01:00
.gitignore
Merge branch 'master' into tokenizer
2018-04-20 14:28:47 +02:00
.gitmodules
Add sparsepp as third-party project
2017-09-18 16:22:11 +02:00
.travis.yml
Fix compilation issue on travis
2018-02-14 12:40:55 +01:00
.valgrindrc
Add valgrind rc file for default configuration
2016-05-19 11:00:04 +02:00
Abbreviations.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Abbreviations.h
unicode: Preliminary commit, work-in-progress
2018-02-06 17:06:35 +01:00
adultphrases.txt.example
new adult detection code
2017-10-26 12:20:04 +02:00
adultwords.txt.example
new adult detection code
2017-10-26 12:20:04 +02:00
antiword
fix calls to antiword and pdftohtml etc.
2014-06-15 17:44:52 -07:00
BaseScoringParameters.cpp
Merge branch 'master' into lemma
2018-06-22 16:01:03 +02:00
BaseScoringParameters.h
Merge branch 'master' into lemma
2018-06-22 16:01:03 +02:00
BigFile.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
BigFile.h
Removed last use of O_NONBLOCK in BigFile
2016-12-20 14:21:23 +01:00
BitOperations.h
Add constness
2016-10-07 23:14:37 +02:00
Bits.cpp
bugfix 183b99c925: Bits::setInLinkBits() was coredumping
2018-09-06 13:11:24 +02:00
Bits.h
const for Bits:.setInLinkBits()
2018-09-04 16:18:29 +02:00
bmptopnm
Initial file population.
2013-08-02 13:12:24 -07:00
browser.py
Add code to run through external filter when the html document have low word count (configurable)
2017-11-09 17:14:17 +01:00
ByteOrderMark.cpp
unicode: Preliminary commit, work-in-progress
2018-02-06 17:06:35 +01:00
ByteOrderMark.h
unicode: Preliminary commit, work-in-progress
2018-02-06 17:06:35 +01:00
Clusterdb.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Clusterdb.h
Made ClusterDb::get*() take properly typed key96_t*
2018-07-02 16:46:48 +02:00
cmpversiongte
Fix GCC version checks. Was resultign int 'syntax error' from bc
2017-09-22 13:58:26 +02:00
Collectiondb.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
Collectiondb.h
Parmaeter for 'dedup URLs by default' (ddud) so we can control it per collection
2018-05-01 16:13:42 +02:00
collnum_t.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
Conf.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Conf.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
ContentMatchList.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
ContentMatchList.h
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
ContentTypeBlockList.cpp
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
ContentTypeBlockList.h
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
control.deb
package bldg updates
2014-06-16 21:50:32 -06:00
ConvertSpiderdb.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
ConvertSpiderdb.h
Added spiderdb->sqlite conversion command
2017-10-09 17:15:58 +02:00
copyright.head
package bldg updates
2014-06-16 21:50:32 -06:00
copyright.tail
package bldg updates
2014-06-16 21:50:32 -06:00
CountryCode.cpp
Initial implementation of CountryLanguage
2018-07-25 13:35:21 +02:00
CountryCode.h
Initial implementation of CountryLanguage
2018-07-25 13:35:21 +02:00
CountryLanguage.cpp
Use host instead of url to get country tld
2018-07-25 14:03:24 +02:00
CountryLanguage.h
Use host instead of url to get country tld
2018-07-25 14:03:24 +02:00
DailyMerge.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
DailyMerge.h
char -> bool in DailyMerge
2016-10-20 11:16:12 +02:00
default.css
Support rebulding spiderdb from titledb documnet URLs only
2017-10-31 15:48:49 +01:00
Dir.cpp
Disable gcc deprecation warnings in places where we can't don anything about it
2018-10-09 15:41:00 +02:00
Dir.h
Remove unused DirIterator
2017-04-06 19:45:27 +02:00
Dns_internals.h
Moved implementation details in Dns out of header file
2016-11-12 17:08:38 +01:00
Dns.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Dns.h
Renamed Dns::getResponsibleHost() to getIPLookupHost()
2016-11-24 16:31:08 +01:00
DnsBlockList.cpp
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
DnsBlockList.h
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
DnsProtocol.h
More #include cleanup in UdpServer.h
2016-09-26 15:37:13 +02:00
DocDelete.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
DocDelete.h
Split up parameter for DocProcess delay as well
2018-04-13 11:54:31 +02:00
Docid2Siteflags.cpp
Assume docid2flagsandsitemap.dat is sorted
2018-07-12 17:05:28 +02:00
Docid2Siteflags.h
Assume docid2flagsandsitemap.dat is sorted
2018-07-12 17:05:28 +02:00
Docid.cpp
Support dynamic TLD list
2018-08-31 13:32:06 +02:00
Docid.h
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
DocProcess.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
DocProcess.h
Add a tmperr file so we can keep track of tmp errors that happen during docprocess
2018-04-13 13:19:17 +02:00
DocRebuild.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
DocRebuild.h
Split up parameter for DocProcess delay as well
2018-04-13 11:54:31 +02:00
DocReindex.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
DocReindex.h
Split up parameter for DocProcess delay as well
2018-04-13 11:54:31 +02:00
DocumentIndexChecker.h
added getFileNum method to DocumentIndexChecker
2017-04-21 17:21:53 +02:00
Doledb.cpp
Added SpiderLoop::nukeWinnerListCache() so Doledb doesn't have to poke directly inside spiderloop
2017-10-20 14:25:07 +02:00
Doledb.h
Nuke doledb periodically to hide the bugs/limitations in the spidering stuff
2017-08-07 15:26:29 +02:00
Domains.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Domains.h
Support dynamic TLD list
2018-08-31 13:32:06 +02:00
DumpSpiderdbSqlite.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
DumpSpiderdbSqlite.h
Readd ./gb dump s command
2018-02-15 17:21:20 +01:00
EGStack.cpp
work-in-progress: new tokenizer
2018-03-01 16:38:19 +01:00
EGStack.h
work-in-progress: new tokenizer
2018-03-01 16:38:19 +01:00
Entities.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Entities.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
entities.json
entities.* fixes
2016-06-09 13:43:43 +02:00
Errno.cpp
Add check for ipblocklist while spidering
2018-02-05 15:23:04 +01:00
Errno.h
Add check for ipblocklist while spidering
2018-02-05 15:23:04 +01:00
fctypes.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
fctypes.h
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
File.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
File.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
FxBlobCache.cpp
Fix clang++ warning: assigning field to itself
2018-02-28 11:19:52 +01:00
FxBlobCache.h
Changed spiderloop:winnerlistcache from RdbCache to FxBlobCache
2017-10-23 16:52:09 +02:00
FxBlobCacheInstantiation.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
FxCheckAdult.cpp
Simplified Phrases class a bit
2018-03-12 16:18:17 +01:00
FxCheckAdult.h
tokenizer: first shot at somethign that appears to work
2018-03-09 16:24:39 +01:00
FxCheckSpam.cpp
Simplified Phrases class a bit
2018-03-12 16:18:17 +01:00
FxCheckSpam.h
tokenizer: first shot at somethign that appears to work
2018-03-09 16:24:39 +01:00
FxClient.cpp
Disable m_max_outstanding when it's 0 (as described in configuration)
2018-06-06 16:05:45 +02:00
FxClient.h
Add reinitializeSettings to QueryLanguage
2018-04-05 14:01:09 +02:00
FxExplicitKeywords.cpp
Removed debuglog left over in FxExplicitKeywords.cpp
2018-01-19 14:30:43 +01:00
FxExplicitKeywords.h
Support extra keywords/temrs on pages
2018-01-19 14:24:01 +01:00
FxLanguage.cpp
Add bestEffort flag
2018-03-05 17:29:48 +01:00
FxLanguage.h
Add bestEffort flag
2018-03-05 17:29:48 +01:00
FxTermCheckList.cpp
Changed Phrases::getPhraseIds2() to a simpler and looser-coupled getPhraseId()
2018-03-19 10:15:38 +01:00
FxTermCheckList.h
tokenizer: first shot at somethign that appears to work
2018-03-09 16:24:39 +01:00
g_hashtab.inc
Use static const g_hashtab instead of calculating/generating it upon each startup
2016-09-20 14:21:03 +02:00
gb-1.0.spec
make it so we don't need --nodeps with
2014-05-25 22:08:46 -04:00
gb.deb.rules
if netpbm pkg already installed use it.
2014-07-06 09:54:28 -07:00
gb.pem
add old gb.pem file, not used by gigablast
2013-10-09 17:37:01 -06:00
GbCache.h
Fix compilation of unit test
2018-07-05 14:38:09 +02:00
gbclean.sh
Add x bit on gbclean.sh
2018-04-04 12:18:03 +02:00
GbCompress.cpp
Moved gbcompress/gbuncompress from XmlDoc.* to separate file
2017-01-09 14:00:34 +01:00
GbCompress.h
Moved gbcompress/gbuncompress from XmlDoc.* to separate file
2017-01-09 14:00:34 +01:00
gbconvert.sh
Add code to run through external filter when the html document have low word count (configurable)
2017-11-09 17:14:17 +01:00
GbCopyFile.cpp
Add checks for when src & dst have different sizes after moving/copying
2017-03-14 15:32:32 +01:00
GbCopyFile.h
Added copyFile() function
2016-10-24 14:52:25 +02:00
GbDns.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
GbDns.h
Delay reload of dns settings until there is no pending requests, and at the same time pause requests
2018-02-08 16:32:55 +01:00
GbEncoding.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
GbEncoding.h
Move getCharsetFast to GbEncoding
2017-06-26 14:16:47 +02:00
GbFormat.h
Moved format #define out from HttpRequest.h so more code doesn't need to know the innaers of HttpRequest
2016-08-31 19:14:16 +02:00
GbMakePath.cpp
bugfix makePath(): was referring to 1 beyond the end
2016-10-24 13:29:02 +02:00
GbMakePath.h
Added makePath() function
2016-10-21 13:21:49 +02:00
gbmemcpy.h
Moved 'gbmemcpy' macro to separate file
2018-07-26 17:01:42 +02:00
GbMoveFile2.cpp
bugfix moveFile2Phase2()
2016-10-25 11:53:09 +02:00
GbMoveFile2.h
Added moveFile2Phase1()/moveFile2Phase2()
2016-10-24 15:15:35 +02:00
GbMoveFile.cpp
Add checks for when src & dst have different sizes after moving/copying
2017-03-14 15:32:32 +01:00
GbMoveFile.h
Added GbMoveFile.*
2016-10-18 13:59:44 +02:00
GbMutex.cpp
Added GbMutex class
2016-08-15 16:24:56 +02:00
GbMutex.h
Added GbMutex class
2016-08-15 16:24:56 +02:00
GbRegex.cpp
Fix clang warning: no newline at end of file
2017-03-27 14:57:20 +02:00
GbRegex.h
Add thin wrapper around pcre.h
2017-02-07 16:38:54 +01:00
GbSignature.cpp
Fix clang++ warning: function 'signature_verification_failed' could be declared with attribute 'noreturn'
2018-02-28 11:26:38 +01:00
GbSignature.h
Fix clang++ warning: function 'signature_verification_failed' could be declared with attribute 'noreturn'
2018-02-28 11:26:38 +01:00
gbstart.sh
Add hostid to eventlog
2018-07-23 14:57:09 +02:00
GbThreadQueue.cpp
Fix unit test after merging
2017-05-08 14:10:10 +02:00
GbThreadQueue.h
Make GbThreadQueue::m_stop atomic
2017-04-25 11:22:36 +02:00
GbUtil.cpp
Fix split function
2018-07-10 16:27:28 +02:00
GbUtil.h
Split string by string delimiter
2018-07-06 14:23:16 +02:00
generate_entities.py
Update generate_entities.py to run in python 3
2023-12-03 19:51:02 -05:00
generate_query_stop_word_languages.sh
Improved qstopword generation + dependencies
2017-12-29 14:56:10 +01:00
generate_query_stop_words.sh
NULL-terminate query-stop-word tables
2017-12-22 15:48:03 +01:00
generate_tld_list.sh
Modifications to tld list
2018-07-17 13:07:35 +02:00
giftopnm
Initial file population.
2013-08-02 13:12:24 -07:00
GigablastRequest.h
Add constructor for GigablastRequest instead of memset which will break a preallocated vector in Msg4
2017-08-30 16:41:37 +02:00
hash.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
hash.h
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
HashTable.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
HashTable.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
HashTableT.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
HashTableT.h
Rename getNumSlotsUsed to getNumUsedSlots to keep things consistent
2017-04-07 15:55:02 +02:00
HashTableX.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
HashTableX.h
memcpy.->memmove
2018-07-26 14:34:06 +02:00
HighFrequencyTermShortcuts.cpp
Fix allocation of working-area in PosdbTable::intersectLists10_r()
2017-06-12 16:14:28 +02:00
HighFrequencyTermShortcuts.h
If a word in a query is a high-freq-term then ignore it if possible
2017-02-13 16:44:34 +01:00
Highlight.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Highlight.h
tokenizer: first shot at somethign that appears to work
2018-03-09 16:24:39 +01:00
Hostdb.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Hostdb.h
Size file/pathbuffers correctly
2018-10-09 15:39:31 +02:00
HostFlags.cpp
Move declaration of g_recoveryMode to HostFlags.cpp
2017-05-09 10:57:40 +02:00
HostFlags.h
Keep our own Host::m_flags up-to-date
2017-05-04 14:37:19 +02:00
HttpMime.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
HttpMime.h
Remove getContentTypePrivate and move logic into parseContentType function
2017-11-19 10:25:28 +01:00
HttpRequest.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
HttpRequest.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
HttpServer.cpp
Removed non-const version of Url::getHost()
2018-08-24 13:13:09 +02:00
HttpServer.h
const for getMsgSize()
2017-03-10 11:52:34 +01:00
iana_charset.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
iana_charset.h
Align misc/parse_iana_charsets.pl with modified iana_charset.*
2016-11-11 15:31:23 +01:00
Images.cpp
Removed non-const pointer-returning methods from Url class
2018-08-24 13:37:43 +02:00
Images.h
tokenizer: first shot at somethign that appears to work
2018-03-09 16:24:39 +01:00
init.gb.conf
minor make install changes
2014-05-22 18:46:38 -07:00
InstanceInfoExchange.cpp
Log an all-clear message once when all instances are alive
2017-08-11 14:50:24 +02:00
InstanceInfoExchange.h
Removed parameter from InstanceInfoExchange::weAreAlive()
2017-05-02 11:57:22 +02:00
IOBuffer.h
swap(IOBuffer,IOBuffer) should be inline
2017-05-01 15:58:37 +02:00
ip.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
ip.h
Removed non-renetrant version of iptoa()
2017-05-10 17:54:00 +02:00
IPAddressChecks.cpp
bugfix is_internal_net_ip/is_trusted_protocol_ip (harmless)
2017-01-30 11:24:29 +01:00
IPAddressChecks.h
Revert "removed duplicate function is_trusted_protocol_ip from IPAddressChecks"
2016-10-20 15:34:20 +02:00
IpBlockList.cpp
Fix bug where wrong function name is used to overload parent function
2018-06-21 11:02:59 +02:00
IpBlockList.h
Missed in previous commit
2018-06-21 11:08:53 +02:00
Jenkinsfile
Archive gbclean.sh script as well
2018-04-04 12:06:33 +02:00
JobScheduler.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
JobScheduler.h
Add http interface to docprocess for ease of testing
2017-12-01 17:59:55 +01:00
jpegtopnm
Initial file population.
2013-08-02 13:12:24 -07:00
Json.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Json.h
#include cleanup in Json.*
2018-07-20 16:03:46 +02:00
Lang.cpp
Cater for languages not in gb (eg: nynorsk/bokmaal)
2018-03-09 10:24:22 +01:00
Lang.h
Use lang_t enum more than just plain uint8_t
2017-11-21 16:18:45 +01:00
LanguageResultOverride.cpp
Add ResultOverride for exact url
2017-10-26 11:09:44 +02:00
LanguageResultOverride.h
Fix clang++ warning: 'ResultOverride' defined as a class here but previously declared as a struct
2018-02-28 11:30:06 +01:00
Lemma.cpp
Only load sto lexicon once
2018-07-17 13:49:53 +02:00
Lemma.h
Only load sto lexicon once
2018-07-17 13:49:53 +02:00
Lexicons.cpp
support older version of unique_ptr<>
2018-07-17 14:32:41 +02:00
Lexicons.h
Only load sto lexicon once
2018-07-17 13:49:53 +02:00
libiconv64.a
added 64 bit libiconv64.a
2014-11-14 17:34:11 -08:00
libiconv.a
Initial file population.
2013-08-02 13:12:24 -07:00
libiconv.la
Initial file population.
2013-08-02 13:12:24 -07:00
libjpeg.so.62
thumbnail generation support back in.
2014-04-24 10:13:45 -07:00
libnetpbm.so.10
thumbnail generation support back in.
2014-04-24 10:13:45 -07:00
libpng12.so.0
thumbnail generation support back in.
2014-04-24 10:13:45 -07:00
libtiff.so.4
thumbnail generation support back in.
2014-04-24 10:13:45 -07:00
LICENSE
License update
2017-10-26 10:31:55 +02:00
Linkdb.cpp
#include cleanup in linkspam.h
2018-07-20 16:12:23 +02:00
Linkdb.h
Removed non-const pointer-returning methods from Url class
2018-08-24 13:37:43 +02:00
linkspam.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
linkspam.h
#include cleanup in linkspam.h
2018-07-20 16:12:23 +02:00
Log.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
Log.h
Add logHexTrace
2017-10-11 11:23:57 +02:00
Loop.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Loop.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
main.cpp
Revert "Fixed long time bug regarding startup on clean DB"
2018-10-12 15:39:30 +02:00
Makefile
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
matches2.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
matches2.h
Add const
2017-08-17 11:04:28 +02:00
Matches.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
Matches.h
Fixed signed/unsigned comparison resulting from tokenizer known there cannot be negativer number of tokens
2018-03-19 15:33:26 +01:00
MatchList.cpp
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
MatchList.h
Add g_contentRetryProxyList & rename BlockList to MatchList
2018-05-31 12:44:20 +02:00
max_coll_len.h
Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files
2016-11-12 20:44:42 +01:00
max_hosts.h
increased max hosts from 512 to 768
2017-01-25 14:18:28 +01:00
max_niceness.h
Moved MAX_NICENESS from Threads.h to a separate max_niceness.h file
2016-04-28 14:28:14 +02:00
max_url_len.h
Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files
2016-11-12 20:44:42 +01:00
max_words.h
#include cleanup in Phrases.*
2016-08-05 14:31:28 +02:00
Mem.cpp
Disable gcc deprecation warnings in places where we can't don anything about it
2018-10-09 15:41:00 +02:00
Mem.h
Made Mem more private
2017-02-19 16:41:55 +01:00
MemoryMappedFile.cpp
Renamed MemoryMappedFile.cc to MemoryMappedFile.cpp (for consistent extensions)
2018-07-20 15:59:06 +02:00
MemoryMappedFile.h
Assume docid2flagsandsitemap.dat is sorted
2018-07-12 17:05:28 +02:00
MergeSpaceCoordinator.cpp
Use a thread name for mergecoordinator-hold-lock thread
2017-10-09 12:14:46 +02:00
MergeSpaceCoordinator.h
Check disk free bytes in mergespacecoordinator
2016-10-18 15:23:27 +02:00
Msg0.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Msg0.h
Encapsulate Msg0 better
2017-11-03 15:30:32 +01:00
Msg2.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Msg2.h
Tracked down one source of double callback calls
2016-08-30 16:37:53 +02:00
Msg3.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Msg3.h
Use fileId instead of fileNum (which is an index info m_fileInfo)
2017-05-10 16:25:46 +02:00
Msg3a.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Msg3a.h
Fix race condition in Msg3a
2018-05-22 14:28:46 +02:00
Msg4In.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Msg4In.h
Move Msg4In related functions to Msg4In namespace
2017-04-26 11:25:29 +02:00
Msg4Out.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Msg4Out.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
Msg5.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Msg5.h
Fix merge list when posdb is merging
2018-01-05 11:21:51 +01:00
Msg13.cpp
Moved serialization/deserialization functions to separate file
2018-07-30 13:07:02 +02:00
Msg13.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
Msg20.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Msg20.h
Remove hop count (not stored in sqlite based spiderdb)
2018-02-02 15:50:03 +01:00
Msg22.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Msg22.h
Removed unused Msg22::getAvailDocIdOnly()
2018-02-12 15:08:51 +01:00
Msg25.cpp
Removed non-const pointer-returning methods from Url class
2018-08-24 13:37:43 +02:00
Msg25.h
Remove unused Msg25::m_adBanTable
2018-07-04 11:48:24 +02:00
Msg39.cpp
Moved serialization/deserialization functions to separate file
2018-07-30 13:07:02 +02:00
Msg39.h
Removed Msg39Request::m_stripe
2018-04-26 16:16:10 +02:00
Msg40.cpp
Removed obsolete comment about gigabits
2018-10-04 15:41:24 +02:00
Msg40.h
Removed unused Msg40::printSearchResult9()
2018-01-29 16:27:53 +01:00
Msg51.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Msg51.h
Removed the msg5* parameter to Msg0::getList()
2017-11-03 15:13:08 +01:00
MsgC.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
MsgC.h
Make MsgC/Msge1 coorperation more clear
2016-11-24 15:33:16 +01:00
Msge0.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Msge0.h
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
Msge1.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
Msge1.h
Remove declared but not defined functions
2017-07-11 15:00:12 +02:00
msgtype_t.h
Dropped PingInfo + PingServer entierely.
2017-05-05 15:00:31 +02:00
Multicast.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Multicast.h
Fix Msg0/Multicast to handle reads from spiderdb when some hosts have spidering disabled
2017-10-26 14:22:18 +02:00
mysynonyms.txt
mysyn fixes
2015-04-22 08:34:29 -06:00
nodeid_t.h
MOved BACKBIT/BACKBITCOMP from Words.h to nodeid_t.h
2018-03-02 17:36:31 +01:00
PageAddColl.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageAddUrl.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageBasic.cpp
Resurrect /admin/status (it is useful for automated tests)
2018-01-12 15:24:24 +01:00
PageCrawlBot.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
PageCrawlBot.h
Remove diffbot specific CollectionRec::m_notifyUrl
2016-07-22 13:59:55 +02:00
PageDocProcess.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageDoledbIPTable.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageGet.cpp
Removed 'strip' parameter from stripHtml()
2018-08-07 14:59:12 +02:00
PageHealthCheck.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageHosts.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageInject.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
PageInject.h
Made setInjectionRequestFromParms() local to PageInject.cpp
2018-07-30 11:12:04 +02:00
PageLinkdbLookup.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
PageParser.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageParser.h
Remove unused /print api & cleanup PageParser
2018-05-23 14:15:36 +02:00
PagePerf.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageReindex.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
PageReindex.h
Use lang_t enum more than just plain uint8_t
2017-11-21 16:18:45 +01:00
PageResults.cpp
Removed non-functional ban-all-these-domains functionality in PageResults
2018-08-30 16:49:50 +02:00
PageResults.h
Use multiple/all langauge weigths returned by the external query-language-server
2018-05-15 16:30:52 +02:00
PageRoot.cpp
Removed freestanding dequote()
2018-08-07 15:19:40 +02:00
PageRoot.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Pages.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Pages.h
Remove unused variable
2018-03-05 11:30:39 +01:00
PageSockets.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageSpider.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
PageSpiderdbLookup.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
PageStats.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
PageTemperatureRegistry.cpp
use mmap'ping for page temperatures, and exploit that it is sorted
2018-07-13 13:41:45 +02:00
PageTemperatureRegistry.h
use mmap'ping for page temperatures, and exploit that it is sorted
2018-07-13 13:41:45 +02:00
PageThreads.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PageTitledb.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Parms.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
Parms.h
New site median page temperature client & settings
2018-06-15 11:38:32 +02:00
Phrases.cpp
bugfix/workaround for bigram hashes
2018-08-02 13:14:44 +02:00
Phrases.h
Removed obsolete, incorrect or unuseful comments from Phrases.h
2018-03-19 10:15:38 +01:00
pngtopnm
Initial file population.
2013-08-02 13:12:24 -07:00
pnmscale
Initial file population.
2013-08-02 13:12:24 -07:00
Pops.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Pops.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Pos.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Pos.h
Fixed signed/unsigned comparison resulting from tokenizer known there cannot be negativer number of tokens
2018-03-19 15:33:26 +01:00
Posdb.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Posdb.h
Merge branch 'master' into lemma
2018-06-25 15:36:39 +02:00
PosdbTable.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
PosdbTable.h
Changed posdbtable to use enum lang_t
2018-05-03 12:00:25 +02:00
ppmtojpeg
Initial file population.
2013-08-02 13:12:24 -07:00
Process.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
Process.h
Add logging when loop callback hit time threshold. Remove some unused function, remove undefined function (only defined in header)
2017-05-30 12:12:32 +02:00
Profiler.cpp
Made Profiler.cpp compile with newer glibc
2018-10-09 14:46:21 +02:00
Profiler.h
Got rid of PTRFMT/PTRTYPE (except in Mem.cpp), and use %p instead
2018-07-23 15:24:16 +02:00
Proxy.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
Proxy.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
pstotext
Initial file population.
2013-08-02 13:12:24 -07:00
Punycode.cpp
Start to detect non-asci urls and encode them to ascii.
2015-09-12 15:47:33 -06:00
Punycode.h
Standardize header guards
2016-03-08 22:16:02 +01:00
query_stop_words.da.txt
Extended query-stop-words danish with interrogative relative pronouns and query-general-adverbs
2017-12-29 15:58:34 +01:00
query_stop_words.de.txt
MOved query-stop-words arrays out from StopWords.cpp to textfiles and generate the static arrays at build time
2017-12-22 15:43:59 +01:00
query_stop_words.en.txt
MOved query-stop-words arrays out from StopWords.cpp to textfiles and generate the static arrays at build time
2017-12-22 15:43:59 +01:00
query_stop_words.xx.txt
Split out danish query-stop-words
2017-12-29 15:04:15 +01:00
Query.cpp
Closed off Bits member access so we know what can be changed freely
2018-09-04 14:50:38 +02:00
Query.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
QueryLanguage.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
QueryLanguage.h
qlangserver: return weights as doubles instead of ints
2018-05-15 15:09:51 +02:00
Rdb.cpp
Size file/pathbuffers correctly
2018-10-09 15:39:31 +02:00
Rdb.h
Moved Rdb initialization from main to Rdb.cpp in new function initialiseAllPrimaryRdbs()
2017-09-26 16:29:49 +02:00
RdbBase.cpp
Size file/pathbuffers correctly
2018-10-09 15:39:31 +02:00
RdbBase.h
Added RdbBase::unlink()
2017-10-30 14:02:28 +01:00
RdbBuckets.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbBuckets.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
RdbCache.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
RdbCache.h
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
RdbDump.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbDump.h
Move addList to a thread for RdbDump
2017-06-09 15:02:18 +02:00
rdbid_t.h
Send site-default-pagem-temperature from spidering to query-host
2018-02-22 15:07:08 +01:00
RdbIndex.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
RdbIndex.h
Uhm, dependencies didn't catch the compilation error?
2017-10-06 14:51:14 +02:00
RdbIndexQuery.cpp
Cater for newly dumped file in a different way while merging list
2017-05-31 16:33:23 +02:00
RdbIndexQuery.h
Cater for newly dumped file in a different way while merging list
2017-05-31 16:33:23 +02:00
RdbList.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
RdbList.h
Fix merge list when posdb is merging
2018-01-05 11:21:51 +01:00
RdbMap.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
RdbMap.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbMem.cpp
Fix clang++ warning: no newline at end of file
2018-02-28 11:19:52 +01:00
RdbMem.h
Add lock to RdbMem
2017-05-11 14:10:11 +02:00
RdbMerge.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbMerge.h
Fix typo in variable
2017-06-06 15:19:34 +02:00
RdbScan.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbScan.h
Removed unused 'allowPageCache' parameter to RdbScan::setRead()
2017-01-29 01:23:39 +01:00
RdbTree.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
RdbTree.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
README.md
Update README.md
2018-11-08 13:40:03 +01:00
Rebalance.cpp
#include cleanup in Rebalance.*
2018-07-20 16:05:57 +02:00
Rebalance.h
#include cleanup in Rebalance.*
2018-07-20 16:05:57 +02:00
repair_mode.h
Use repair_mode_t instead of magic constants
2017-01-02 14:48:04 +01:00
Repair.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Repair.h
Support rebuilding spiderdb from titledb document URLs only
2017-10-31 15:48:49 +01:00
ResultOverride.cpp
Add custom title/summary for results that are blocked by robots.txt
2017-10-24 10:41:19 +02:00
ResultOverride.h
Add custom title/summary for results that are blocked by robots.txt
2017-10-24 10:41:19 +02:00
RobotRule.cpp
Fix cppcheck warning -> Array index 'needlePos' is used before limits check
2016-08-08 14:23:37 +02:00
RobotRule.h
Normalize url. Don't encode characters which are not supposed to be encoded.
2016-05-10 16:15:31 +02:00
Robots.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
Robots.h
Code style changes
2016-12-02 16:29:31 +01:00
RobotsBlockedResultOverride.cpp
Initialize m_loading
2017-11-03 23:22:50 +01:00
RobotsBlockedResultOverride.h
Move reloading of RobotsBlockedResultOverride to a separate thread
2017-11-02 16:19:44 +01:00
robotsblockedresultoverride.txt
Add custom title/summary for results that are blocked by robots.txt
2017-10-24 10:41:19 +02:00
RobotsCheckList.cpp
Move reloading of RobotsCheckList to a separate thread
2017-11-02 16:19:44 +01:00
RobotsCheckList.h
Move reloading of RobotsCheckList to a separate thread
2017-11-02 16:19:44 +01:00
runCoverityAnalysis.sh
Don't cat logs anymore
2017-03-22 13:47:47 +01:00
runSonarQubeAnalysis.sh
Add coverity scan to travis
2016-11-08 15:59:24 +01:00
S99gb
added S99gb for loading at boot.
2014-06-23 07:32:38 -06:00
SafeBuf.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
SafeBuf.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Sanity.cpp
Move g_errno_init into g_errno_location itself
2016-07-28 11:23:30 +02:00
Sanity.h
pclint plus beta 9 knows about [[noreturn]]
2017-01-08 14:11:10 +01:00
ScalingFunctions.cpp
Added scale_logarithmically()
2017-04-28 15:26:52 +02:00
ScalingFunctions.h
Added scale_logarithmically()
2017-04-28 15:26:52 +02:00
ScopedLock.h
Added GbMutex class
2016-08-15 16:24:56 +02:00
ScoringWeights.cpp
Merge branch 'master' into lemma
2018-06-22 16:01:03 +02:00
ScoringWeights.h
Merge branch 'master' into lemma
2018-06-22 16:01:03 +02:00
SearchInput.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
SearchInput.h
Use multiple/all language weights returned by the external query-language-server
2018-05-15 16:30:52 +02:00
Sections.cpp
Closed off Bits member access so we know what can be changed freely
2018-09-04 14:50:38 +02:00
Sections.h
tokenizer: first shot at something that appears to work
2018-03-09 16:24:39 +01:00
Serialize.cpp
Moved serialization/deserialization functions to separate file
2018-07-30 13:07:02 +02:00
Serialize.h
Moved serialization/deserialization functions to separate file
2018-07-30 13:07:02 +02:00
SiteGetter.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
SiteGetter.h
#include cleanup in SiteGetter.h
2018-07-20 16:08:27 +02:00
sitelinks.txt
fixed missing sites in sitelinks.txt
2015-03-05 20:32:01 -08:00
SiteMedianPageTemperature.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
SiteMedianPageTemperature.h
New site median page temperature client & settings
2018-06-15 11:38:32 +02:00
SiteMedianPageTemperatureRegistry.cpp
bugfix SiteMedianPageTemperatureRegistry::add(), incorrect condition in if / wrong use of iterator
2018-04-06 13:22:01 +02:00
SiteMedianPageTemperatureRegistry.h
Reworked SiteMedianPageTemperatureRegistry to how it will work in the future
2018-02-16 17:09:18 +01:00
SiteNumInlinks.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
SiteNumInlinks.h
Initial implementation of SiteNumInlinks client
2018-05-25 16:48:34 +02:00
sonar-project.properties
Add output directory for sonarqube build-wrapper
2016-11-07 12:22:18 +01:00
sort.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
sort.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Speller.cpp
Removed check for approved checksum in Speller.cpp and Wiktionary.cpp
2018-04-24 12:28:54 +02:00
Speller.h
#include cleanup in HashTableX.h
2018-07-20 16:29:06 +02:00
spider_status_t.h
Made spider status #defines into strongly typed enum
2017-09-26 13:29:36 +02:00
Spider.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
Spider.h
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
SpiderCache.cpp
Got rid of PTRFMT/PTRTYPE (except in Mem.cpp), and use %p instead
2018-07-23 15:24:16 +02:00
SpiderCache.h
Uhm, dependencies didn't catch the compilation error?
2017-10-06 14:51:14 +02:00
SpiderColl.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
SpiderColl.h
Run SpiderdbRdbSqliteBridge::getList in a thread
2018-03-19 10:15:38 +01:00
SpiderdbRdbSqliteBridge.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
SpiderdbRdbSqliteBridge.h
Split out logic used for waiting tree list into SpiderdbRdbSqliteBridge::getFirstIps and use SELECT DISTINCT instead. (Don't remove multi ip select this time)
2018-02-22 17:37:22 +01:00
SpiderdbSqlite.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
SpiderdbSqlite.h
Make sure we have spiderdb sqlite if we still have spiderdb rdb files
2018-02-16 11:23:08 +01:00
SpiderdbUtil.cpp
Added ScopedSqlitedbLock
2017-11-03 13:58:42 +01:00
SpiderdbUtil.h
Deleted unwanted spiderrequests as we scan through them
2017-10-27 14:17:53 +02:00
SpiderLoop.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
SpiderLoop.h
Changed spiderloop:winnerlistcache from RdbCache to FxBlobCache
2017-10-23 16:52:09 +02:00
SpiderProxy.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
SpiderProxy.h
#include cleanup in Msg13.*
2016-11-11 15:51:30 +01:00
Statistics.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
Statistics.h
Add Url validity check (currently disabled)
2017-09-26 22:25:19 +02:00
Stats.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
Stats.h
Removed PingInfo::m_socketsClosedFromHittingLimit
2017-05-01 12:08:24 +02:00
StopWords.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
StopWords.h
#include cleanup of StopWords.h
2016-08-11 16:43:25 +02:00
Summary.cpp
Fix typos in comment
2018-09-06 14:42:00 +02:00
Summary.h
Optimized avoid-cookies-warning in summary generation
2018-08-31 16:11:29 +02:00
SummaryCache.cpp
Sync lock changes from nomerge2 to master
2017-03-24 14:05:44 +01:00
SummaryCache.h
Sync lock changes from nomerge2 to master
2017-03-24 14:05:44 +01:00
Synonyms.cpp
bugfix/workaround for bigram hashes
2018-08-02 13:14:44 +02:00
Synonyms.h
tokenizer: first shot at something that appears to work
2018-03-09 16:24:39 +01:00
Tagdb.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
Tagdb.h
Removed non-const version of Url::getHost()
2018-08-24 13:13:09 +02:00
TcpServer.cpp
Use openssl's ERR_remove_thread_state() or nothing at all depending on version
2018-10-09 15:40:23 +02:00
TcpServer.h
Remove commented out code
2017-11-16 17:19:43 +01:00
TcpSocket.h
Make sure EDOCBADCONTENTTYPE doesn't return as EDOCTOOBIG
2017-11-20 15:22:47 +01:00
termid_mask.h
Split out TERMID_MASK definition
2016-09-06 12:07:02 +02:00
tifftopnm
Initial file population.
2013-08-02 13:12:24 -07:00
Title.cpp
Select best of og:title, og_site_name, <title> and meta title
2018-09-06 16:52:15 +02:00
Title.h
Select best of og:title, og_site_name, <title> and meta title
2018-09-06 16:52:15 +02:00
Titledb.cpp
Added w more start-up tests for Url::getDomain() and getDomFast()
2018-08-23 15:44:26 +02:00
Titledb.h
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
TitleRecVersion.h
Make sure we strip session id even if query parameter is separated by '?'
2018-04-12 15:24:55 +02:00
TitleSummaryCodepointFilter.h
Moved isUtf8UnwantedSymbols() to separate header
2018-02-03 20:58:45 +01:00
tlds-additional-2nd-level-domains.txt
Modifications to tld list
2018-07-17 13:07:35 +02:00
tlds-alpha-by-domain.txt
Moved hardcoded TLD list from Domains.cpp to external text files
2018-04-20 13:43:49 +02:00
tlds-official-2nd-level-domains.txt
Modifications to tld list
2018-07-17 13:07:35 +02:00
TopTree.cpp
Moved Titledb::...ProbableDocId... methods to separate namespace
2018-08-31 12:11:16 +02:00
TopTree.h
Cleanup after gbsortbyint/gbrevsortbyint are no longer supported
2018-01-12 15:04:24 +01:00
types.h
Moved collnum_t type to separate header file
2017-10-06 14:32:45 +02:00
UdpProtocol.h
Use key96_t instead of key_t and redefining std lib key_t (which breaks std lib functionality that uses key_t)
2016-09-02 14:49:06 +02:00
UdpServer.cpp
Removed local/global time distinction
2018-08-07 14:38:37 +02:00
UdpServer.h
Add sanity for UdpSlot, message size shouldn't change across datagrams
2017-05-31 15:01:19 +02:00
UdpSlot.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
UdpSlot.h
Comment to avoid confusion again
2017-05-26 13:46:33 +02:00
UdpStatistic.cpp
Renamed macro/constant RDBIDOFFSET to MSG0RDBIDOFFSET (which is what it is)
2017-10-26 13:51:32 +02:00
UdpStatistic.h
Rename MsgType.h to msgtype_t.h (to keep things consistent)
2016-11-28 10:48:25 +01:00
unifiedDict.txt
Initial file population.
2013-08-02 13:12:24 -07:00
Url.cpp
Removed unused Url::getIp() method
2018-08-27 14:35:30 +02:00
Url.h
Removed unused Url::getIp() method
2018-08-27 14:35:30 +02:00
UrlBlockCheck.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
UrlBlockCheck.h
Add dump unwanted spiderdb records. Clean up unwanted spiderdb records during merge
2017-08-30 13:19:11 +02:00
UrlComponent.cpp
#include cleanup in fctypes.*
2018-08-07 15:44:56 +02:00
UrlComponent.h
Make sure we strip session id even if query parameter is separated by '?'
2018-04-12 15:24:55 +02:00
UrlMatch.cpp
Add logic for matchpartial, matchsuffix, matchprefix to host, domain, path
2018-07-12 13:40:43 +02:00
UrlMatch.h
Add logic for matchpartial, matchsuffix, matchprefix to host, domain, path
2018-07-12 13:40:43 +02:00
UrlMatchList.cpp
Remove unused UrlMatchCriterias
2018-07-26 17:38:03 +02:00
UrlMatchList.h
Retry when redirected url match urlretryproxylist
2018-05-31 11:04:42 +02:00
urlmatchlist.txt.example
Update urlmatchlist.txt.example
2018-07-26 17:44:05 +02:00
UrlParser.cpp
Made a const version of getDomainOfIp() to avoid const casts elsewhere
2018-08-31 12:11:16 +02:00
UrlParser.h
Add pathparam to UrlMatchList
2018-02-27 11:48:16 +01:00
UrlRealtimeClassification.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
UrlRealtimeClassification.h
Split out generic server communication from UrlRealtimeClassification into FxClient
2018-02-27 15:28:12 +01:00
UrlResultOverride.cpp
Move reloading of UrlResultOverride to a separate thread
2017-11-02 16:19:44 +01:00
UrlResultOverride.h
Move reloading of UrlResultOverride to a separate thread
2017-11-02 16:19:44 +01:00
urlresultoverride.txt.example
Add ResultOverride for exact url
2017-10-26 11:09:44 +02:00
utf8_convert.cpp
Get compiling on Fedora 39 (bunch of missing includes, might be a better way to clean these up but at this point just trying to get to compile to see how well this thing works)
2023-12-03 19:52:06 -05:00
utf8_convert.h
Add more encodings to dump_wrong_encoding
2018-04-16 11:34:04 +02:00
utf8_fast.cpp
Detect a few cases when generating bigram across '.' shouldn't be done
2018-09-04 13:05:24 +02:00
utf8_fast.h
Detect a few cases when generating bigram across '.' shouldn't be done
2018-09-04 13:05:24 +02:00
utf8.cpp
tokenizer: use phase-2 tokens
2018-03-13 17:12:42 +01:00
utf8.h
tokenizer: use phase-2 tokens
2018-03-13 17:12:42 +01:00
valgrind.cfg
valgrind: Suppress a bunch of memcheck warnings originating from cld3->protobuf. Possibly inappropriate suppressions but easily deleted when full-run leak diagnostics are wanted
2017-09-12 15:17:33 +02:00
Version.cpp
Add print version to tools
2018-04-24 15:39:40 +02:00
Version.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
WantedChecker.cpp
wantedcheck shlib: check single content, example with cellery
2017-09-12 16:24:40 +02:00
WantedChecker.h
wantedcheck shlib: check single content, example with cellery
2017-09-12 16:24:40 +02:00
WantedCheckerApi.h
wantedcheck shlib: check single content, example with cellery
2017-09-12 16:24:40 +02:00
WantedCheckExampleLib.cpp
wantedcheck shlib: check single content, example with cellery
2017-09-12 16:24:40 +02:00
Wiki.cpp
Size file/pathbuffers correctly
2018-10-09 15:39:31 +02:00
Wiki.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
wikititles.txt.part1
Initial file population.
2013-08-02 13:12:24 -07:00
wikititles.txt.part2
Initial file population.
2013-08-02 13:12:24 -07:00
wiktionary-buf.txt
when user searches for a word without the
2014-06-01 09:37:00 -07:00
wiktionary-lang.txt
when user searches for a word without the
2014-06-01 09:37:00 -07:00
wiktionary-syns.dat
when user searches for a word without the
2014-06-01 09:37:00 -07:00
Wiktionary.cpp
Size file/pathbuffers correctly
2018-10-09 15:39:31 +02:00
Wiktionary.h
More const in Synonyms and Wiktionary
2016-12-22 12:52:27 +01:00
WordVariationsConfig.h
hackish implementation for lexicon-based lemmatization
2018-05-25 14:51:56 +02:00
Xml.cpp
#include cleanup of Titledb.h
2018-08-31 12:11:16 +02:00
Xml.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
XmlDoc_Indexing.cpp
Removed non-const pointer-returning methods from Url class
2018-08-24 13:37:43 +02:00
XmlDoc.cpp
Select best of og:title, og_site_name, <title> and meta title
2018-09-06 16:52:15 +02:00
XmlDoc.h
Index lemmas only once per document
2018-06-25 16:10:24 +02:00
XmlNode.cpp
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
XmlNode.h
Got rid of gb-include.h
2018-07-26 17:29:51 +02:00
zconf.h
updated to a new libz64.a. updated zconf.h and
2014-11-17 14:53:15 -08:00