privacore-open-source-search-engine

Forks/privacore-open-source-search-engine

forked from Mirrors/privacore-open-source-search-engine

Author	SHA1	Message	Date
Ivan Skytte Jørgensen	8501e6a8bc	Moved is-url-blocked check to separate file	2017-08-24 15:12:04 +02:00
Ivan Skytte Jørgensen	a0b00870bc	Renamed UrlBlock* to UrlMatch*	2017-08-24 14:46:02 +02:00
Brian Rasmusson	9ac463a2c1	aaaand changed to actually use the new Titledb trace option	2017-05-23 12:05:51 +02:00
Brian Rasmusson	14ff90b147	changed Titledb log to trace log and made Titledb trace log configurable	2017-05-23 11:07:51 +02:00
Brian Rasmusson	6c30ebdc6a	Added more checks for corrupted titledb records	2017-05-21 23:22:07 +02:00
Ai Lin Chia	b6c0cc4535	Remove commented out code	2017-05-09 16:30:42 +02:00
Ai Lin Chia	e797620783	Remove unused allowPageCache	2017-05-08 16:01:19 +02:00
Ai Lin Chia	5d83be6c4a	Remove unused maxCacheAge from Msg5	2017-05-08 16:01:19 +02:00
Ai Lin Chia	63796110ae	Remove unused retryNum from Msg5::getList	2017-05-08 16:01:19 +02:00
Ai Lin Chia	1c8be52e78	Remove unused cacheKeyPtr from Msg5::getList	2017-05-08 16:01:19 +02:00
Ai Lin Chia	22a9893419	Reduce Msg5::getList to only one interface	2017-05-08 16:01:19 +02:00
Ivan Skytte Jørgensen	a20cc2b93f	Dropped unused 'syncPoint' parameter to Msg5::getList()	2017-05-08 12:59:06 +02:00
Ai Lin Chia	b854390ef4	Remove collectionless flag from Rdb::init	2017-03-27 23:12:18 +02:00
Ai Lin Chia	02dbdfddef	Fix clang warning: no newline at end of file	2017-03-27 14:57:20 +02:00
Ai Lin Chia	224bd301b3	Move titledb init variable to static function	2017-03-13 12:12:47 +01:00
Ai Lin Chia	978988373c	Rename spider-dedup threads to merge-filter. Add filterTitledbList (uncalled) to filter based on urlblocklist	2017-03-06 16:17:49 +01:00
Ivan Skytte Jørgensen	106b0e2baa	More constness in Titledb.*	2016-11-28 14:56:03 +01:00
Ivan Skytte Jørgensen	5af39f1358	Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files So not most files have to include Collectiondb.h and Url.h	2016-11-12 20:44:42 +01:00
Ivan Skytte Jørgensen	aba937780d	Stop #including Conf.h from header files	2016-11-12 20:24:20 +01:00
Ivan Skytte Jørgensen	504845bc9b	Removed default values for parameters to Rdb::init()	2016-11-07 16:01:28 +01:00
Ivan Skytte Jørgensen	5825378da5	Use merge-space while merging No more BigFile .part* deletion during a merge to preserve disk space. Instead MergeSpaceCoordinator is used for coordinating access to a large and possibly cheap storage with room for a whole resulting mergefile. When a merge file has been finished the reads are allowed from that and reads from the source files disallowed, which are then deleted. Then the file is renamed/moved from merge-space to regular collection storage using the 2-phase commit feature of GbMoveFile.cpp, and finally reads are done from the finished file. Details: RdbBase: Use MergeSpaceCoordinator and merge space for temporary target merge file. RdbBase: better cleanup of crashed merges RdbBase: more mutex locking while manipulating m_fileInfo array RdbBase: keep track of threads/jobs RdbMerge: ditto RdbMerge: Don't call file->chopHead() Msg5/Msg3: no more "compensate for merge" flag Msg3: Skip over RdbBase files that have reads disallowed	2016-10-31 18:16:40 +01:00
Brian Rasmusson	d1060ad62e	meh..	2016-10-21 22:37:13 +02:00
Brian Rasmusson	06232ea970	removed var reassignment without use in Titledb::init	2016-10-21 21:51:10 +02:00
Ivan Skytte Jørgensen	3e4613cbef	Removed 'dir' parameter from Rdb::init() 'dir' parameter was only used for a sanity-check. All callers specified g_hostdb.m_dir; and RdbBase et al uses g_hostdb.m_dir directly so there wasn't much point in keeping that parameter	2016-10-17 13:05:19 +02:00
Ivan Skytte Jørgensen	17fdf98102	Removed isTitledb parameter to Rdb::init()	2016-10-03 17:27:24 +02:00
Ivan Skytte Jørgensen	4e0a489629	Don't call g_jobScheduler.disallow_new_jobs() anymore in *db::verify() when just using null callbacks to msg5::getlist()	2016-09-22 16:44:38 +02:00
Ai Lin Chia	a040a78c99	Use key96_t instead of key_t and redefining std lib key_t (which breaks std lib functionality that uses key_t)	2016-09-02 14:49:06 +02:00
Ai Lin Chia	af48ba7e17	Remove always true dedup from Rdb::init	2016-08-31 11:13:31 +02:00
Ai Lin Chia	b0d66c7eb4	Remove always true isTreeBalanced from Rdb::init	2016-08-31 11:13:31 +02:00
Ai Lin Chia	01c655dd5b	Remove unused loadFromDiskCache & pc from Rdb::init	2016-08-31 11:13:31 +02:00
Ai Lin Chia	230e552393	Remove unused maxCacheMem & maxCacheNodes from Rdb::init	2016-08-31 11:13:31 +02:00
Ivan Skytte Jørgensen	e6f510c594	Removed Msg5::m_addToCache (and associated parameter in getList())	2016-08-04 12:36:38 +02:00
Ai Lin Chia	4e4e3371e6	Log function will now return void instead of a boolean	2016-08-01 18:12:10 +02:00
Ivan Skytte Jørgensen	8e4254f52c	Removed default parameter values from Msg5::getList() (variant #2 )	2016-08-01 13:49:48 +02:00
Ai Lin Chia	f982c35079	Remove EURLHASNOIP. g_errno is never set to EURLHASNOIP	2016-06-29 16:39:51 +02:00
Ivan Skytte Jørgensen	8c4f30cba6	Fix const-string	2016-06-28 14:39:50 +02:00
Brian Rasmusson	f3f5eefcb6	First batch of changes streamlining emergency shutdown code	2016-06-20 12:30:26 +02:00
Ai Lin Chia	d40ecb2f8e	Replace INT32/INT64 and likes with PRId32 and likes. Add space before definition.	2016-05-20 09:18:32 +02:00
Ai Lin Chia	83bb83a554	Remove unused msg5b	2016-05-17 10:32:56 +02:00
Ai Lin Chia	a97fd76a85	Add configurable parameters for posdb & titledb maxTreeMem	2016-05-13 11:12:53 +02:00
Ai Lin Chia	b73bc3c819	Remove commented out code. Simplify statements. Replace INT32/INT64 with PRId32/PRId64.	2016-05-12 12:45:48 +02:00
Ivan Skytte Jørgensen	be9840d1c0	Replaced Threads.* with a jobscheduler Threads were being created and destroyed which can be expensive. The thread-per-job model has been changed to a job scheduler that manages the job queues and threads in pools. The submission of a job now specifies start/finish routines, state, and precisely what kind of job it is. The job scheduler then takes care of the rest. It is hidden how many queues and pools there are.	2016-04-29 14:27:27 +02:00
Matt	09de59f026	do not store cblock, etc. tags into tagdb to save disk space. added tagdb file cache for better performance, less disk accesses. will help reduce disk load. put file cache sizes in master controls and if they change then update the cache size dynamically.	2015-09-10 12:46:00 -06:00
Matt	4e8a42e024	text replacements for bad int32_t substitutions	2014-11-17 18:24:38 -08:00
Matt	96b8197ad3	now it compiles with -m32	2014-11-10 14:45:11 -08:00
Matt Wells	e7dd8f7956	replace long long with int64_t	2014-10-30 13:36:39 -06:00
Matt Wells	b13f3d24d7	replaced unsigned long long with uint64_t	2014-10-30 13:30:39 -06:00
Matt Wells	edbd61b0c5	thread fixes. if pthread_create fails then keep thread queue and just return. will try to relaunch later. do not count delete keys towards shard rebalance count.	2014-03-15 20:07:02 -07:00
Matt Wells	27e8e810d2	use collnum instead of coll string. more stable since resetting collections keeps string the same but changes the collnum.	2014-03-06 15:48:11 -08:00
Matt Wells	4606e88721	code cleanups. xmldoc::injectDoc(), and it'll add a SpiderRequest as well. better collectiondb init code.	2014-01-18 21:19:26 -08:00


58 Commits