58 Commits

Author SHA1 Message Date
8501e6a8bc Moved is-url-blocked check to separate file 2017-08-24 15:12:04 +02:00
a0b00870bc Renamed UrlBlock* to UrlMatch* 2017-08-24 14:46:02 +02:00
9ac463a2c1 aaaand changed to actually use the new Titledb trace option 2017-05-23 12:05:51 +02:00
14ff90b147 changed Titledb log to trace log and made Titledb trace log configurable 2017-05-23 11:07:51 +02:00
6c30ebdc6a Added more checks for corrupted titledb records 2017-05-21 23:22:07 +02:00
b6c0cc4535 Remove commented out code 2017-05-09 16:30:42 +02:00
e797620783 Remove unused allowPageCache 2017-05-08 16:01:19 +02:00
5d83be6c4a Remove unused maxCacheAge from Msg5 2017-05-08 16:01:19 +02:00
63796110ae Remove unused retryNum from Msg5::getList 2017-05-08 16:01:19 +02:00
1c8be52e78 Remove unused cacheKeyPtr from Msg5::getList 2017-05-08 16:01:19 +02:00
22a9893419 Reduce Msg5::getList to only one interface 2017-05-08 16:01:19 +02:00
a20cc2b93f Dropped unused 'syncPoint' parameter to Msg5::getList() 2017-05-08 12:59:06 +02:00
b854390ef4 Remove collectionless flag from Rdb::init 2017-03-27 23:12:18 +02:00
02dbdfddef Fix clang warning: no newline at end of file 2017-03-27 14:57:20 +02:00
224bd301b3 Move titledb init variable to static function 2017-03-13 12:12:47 +01:00
978988373c Rename spider-dedup threads to merge-filter. Add filterTitledbList (uncalled) to filter based on urlblocklist 2017-03-06 16:17:49 +01:00
106b0e2baa More constness in Titledb.* 2016-11-28 14:56:03 +01:00
5af39f1358 Moved MAX_COLL_LEN and MAX_URL_LEN to separate header files
So most files no longer have to include Collectiondb.h and Url.h
2016-11-12 20:44:42 +01:00
aba937780d Stop #including Conf.h from header files 2016-11-12 20:24:20 +01:00
504845bc9b Removed default values for parameters to Rdb::init() 2016-11-07 16:01:28 +01:00
5825378da5 Use merge-space while merging
No more BigFile .part* deletion during a merge to preserve disk space. Instead, MergeSpaceCoordinator is used for coordinating access to a large and possibly cheap storage with room for a whole resulting merge file.
When a merge file has been finished, reads are allowed from it and reads from the source files are disallowed; the source files are then deleted. Then the file is renamed/moved from merge-space to regular collection storage using the 2-phase commit feature of GbMoveFile.cpp, and finally reads are done from the finished file.

Details:
  RdbBase: Use MergeSpaceCoordinator and merge space for temporary target merge file.
  RdbBase: better cleanup of crashed merges
  RdbBase: more mutex locking while manipulating m_fileInfo array
  RdbBase: keep track of threads/jobs
  RdbMerge: ditto
  RdbMerge: Don't call file->chopHead()
  Msg5/Msg3: no more "compensate for merge" flag
  Msg3: Skip over RdbBase files that have reads disallowed
2016-10-31 18:16:40 +01:00
d1060ad62e meh.. 2016-10-21 22:37:13 +02:00
06232ea970 removed var reassignment without use in Titledb::init 2016-10-21 21:51:10 +02:00
3e4613cbef Removed 'dir' parameter from Rdb::init()
The 'dir' parameter was only used for a sanity check. All callers specified g_hostdb.m_dir, and RdbBase et al. use g_hostdb.m_dir directly, so there wasn't much point in keeping that parameter
2016-10-17 13:05:19 +02:00
17fdf98102 Removed isTitledb parameter to Rdb::init() 2016-10-03 17:27:24 +02:00
4e0a489629 Don't call g_jobScheduler.disallow_new_jobs() anymore in *db::verify() when just using null callbacks to msg5::getlist() 2016-09-22 16:44:38 +02:00
a040a78c99 Use key96_t instead of key_t and redefining std lib key_t (which breaks std lib functionality that uses key_t) 2016-09-02 14:49:06 +02:00
af48ba7e17 Remove always true dedup from Rdb::init 2016-08-31 11:13:31 +02:00
b0d66c7eb4 Remove always true isTreeBalanced from Rdb::init 2016-08-31 11:13:31 +02:00
01c655dd5b Remove unused loadFromDiskCache & pc from Rdb::init 2016-08-31 11:13:31 +02:00
230e552393 Remove unused maxCacheMem & maxCacheNodes from Rdb::init 2016-08-31 11:13:31 +02:00
e6f510c594 Removed Msg5::m_addToCache (and associated parameter in getList()) 2016-08-04 12:36:38 +02:00
4e4e3371e6 Log function will now return void instead of a boolean 2016-08-01 18:12:10 +02:00
8e4254f52c Removed default parameter values from Msg5::getList() (variant ) 2016-08-01 13:49:48 +02:00
f982c35079 Remove EURLHASNOIP. g_errno is never set to EURLHASNOIP 2016-06-29 16:39:51 +02:00
8c4f30cba6 Fix const-string 2016-06-28 14:39:50 +02:00
f3f5eefcb6 First batch of changes streamlining emergency shutdown code 2016-06-20 12:30:26 +02:00
d40ecb2f8e Replace INT32/INT64 and likes with PRId32 and likes. Add space before definition. 2016-05-20 09:18:32 +02:00
83bb83a554 Remove unused msg5b 2016-05-17 10:32:56 +02:00
a97fd76a85 Add configurable parameters for posdb & titledb maxTreeMem 2016-05-13 11:12:53 +02:00
b73bc3c819 Remove commented out code. Simplify statements. Replace INT32/INT64 with PRId32/PRId64. 2016-05-12 12:45:48 +02:00
be9840d1c0 Replaced Threads.* with a jobscheduler
Threads were being created and destroyed, which can be expensive. The
thread-per-job model has been changed to a job scheduler that manages the job
queues and threads in pools. The submission of a job now specifies start/finish
routines, state, and precisely what kind of job it is. The job scheduler then
takes care of the rest. How many queues and pools there are is hidden.
2016-04-29 14:27:27 +02:00
09de59f026 do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
b13f3d24d7 replaced unsigned long long with uint64_t 2014-10-30 13:30:39 -06:00
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00