Commit Graph

  • dd9b4e0ca2 fix little core Matt Wells 2015-08-17 15:04:16 -0700
  • 30693c3cf7 use setBuf() func instead Matt Wells 2015-08-16 22:19:30 -0700
  • 28644f127e fix problem of saving rdbmap when coring in a malloc/free. Matt Wells 2015-08-16 22:14:53 -0700
  • be1ebfbcd0 do not execute backtrace function if core was in Mem.cpp basically otherwise we don't save state. Matt Wells 2015-08-16 20:29:14 -0700
  • 3a67480b63 for BigFile::m_fileBuf array of Files make sure to clear it for files that do not exist so File::m_calledSet is false on them. so BigFile::getFile(j) returns a File ptr whose m_calledSet is false if the file does not exist on disk. and BigFile::removePart(j) sets ((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false. Matt Wells 2015-08-16 19:40:08 -0700
  • 63c7752734 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2015-08-16 17:14:33 -0700
  • e671be17ca fix log msg Matt Wells 2015-08-16 17:14:21 -0700
  • b709f736f4 show max mem alloc slots in pagestats.cpp Matt 2015-08-16 17:32:47 -0600
  • ffa6c09c74 fix BigFile::addPart(n) when adding parts out of order. Matt Wells 2015-08-16 15:13:59 -0700
  • f8fb266844 fix new merging algo. Matt Wells 2015-08-16 10:11:21 -0700
  • 178721d35b speed up getFileSize() by using stat() func again. despam logs at startup. do not perm check every coll dir, only first 100, on startup to make things faster. Matt Wells 2015-08-15 22:21:15 -0700
  • bff643b555 use a linked list of merge candidates to make attemptMergeAll() much much faster. Matt 2015-08-15 19:26:37 -0600
  • d9422d8b0e get rid of limits on file sizes. dynamically allocate file names and fixed-size File array in BigFile class. should save gigabytes of memory in many-collection systems with 1+ million files or so. Matt 2015-08-14 20:14:50 -0600
  • f7f577cf98 the new disk page cache. temporarily disabled. Matt 2015-08-14 15:52:24 -0600
  • 3213858545 Merge branch 'diffbot-testing' into diffbot Matt 2015-08-14 13:08:48 -0600
  • 0d2aa33afb undo #define thing Matt 2015-08-14 13:08:11 -0600
  • a1ed368d82 bring back max mem control into master controls. it's useful to limit per process mem usage to prevent oom killer because we can't save if we get killed. overhaul diskpagecache to just use rdbcache. much simpler and faster, but disabled for now until debugged more. reduce min files to merge for crawlbot collections so they stay more tightly merged to conserve fds and mem. improved logDebugDisk msgs. overhauled File.cpp fd pool. now it is way faster and doesn't use any extra mem. much simpler too. although could be sped up a little by using a linked list, but probably is not significant enough to warrant doing right now. increase mem ptr table from 3M to 8M slots. should really make dynamic though. fix core from null msg20s[0]->m_r. only call attemptMergeAll once every 60 seconds really. do not attempt merge if already merging. Matt 2015-08-14 12:58:54 -0600
  • f09a94fc4e Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-08-13 23:31:17 -0600
  • 36b8d384bd Fixes to injector script. New colors and metrics on performance graph. Zak Betz 2015-08-13 23:29:20 -0600
  • 5c67cbe65d undo Matt 2015-08-12 08:43:44 -0700
  • 444ebeeb65 one scp install per host Matt 2015-08-12 08:39:01 -0700
  • 5c2a2ce496 fix core Matt 2015-08-12 08:36:23 -0700
  • 866e712322 clarify log msg Matt Wells 2015-08-10 11:28:51 -0700
  • c4189c64e6 Merge branch 'diffbot-testing' into diffbot Matt Wells 2015-08-10 11:05:31 -0700
  • 8a6be3b4ac fix contention of one collection starving others for urls from the same ip. Matt Wells 2015-08-10 11:04:49 -0700
  • adc9d3bc89 Merge branch 'testing' into diffbot-testing Matt 2015-08-08 19:22:50 -0600
  • 3477d39608 fix cores Matt 2015-08-08 19:22:01 -0600
  • 840ca3fea1 fix rdbmap reduce mem thing Matt Wells 2015-08-08 15:43:09 -0700
  • f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem according to the profiler when we got tens of thousands of collections. Matt Wells 2015-08-08 12:27:18 -0700
  • c2bf461d27 call reduceMemFootprint() after writing rdb map to save mem immediately rather than on restart of gb Matt Wells 2015-08-08 11:23:14 -0700
  • 890170aa90 fix core from archive.org yml file checking. show site ip in inlinker table for easier spam removal. Matt 2015-08-02 12:50:29 -0600
  • c1ec4dedbb fix for bad query formation. text:""foo bar"" Matt 2015-08-02 11:34:55 -0600
  • 37591be421 Merge branch 'diffbot-testing' of https://github.com/gigablast/open-source-search-engine Kevin Truong 2015-07-31 18:12:56 -0700
  • b6207ec344 Fixes #3012. Allow facet ranges to work on negative numbers. Kevin Truong 2015-07-31 18:11:37 -0700
  • 18d1a787bb fix core dump from meta data in title rec that was just a \0 from injecting content that way Matt 2015-07-31 18:42:21 -0600
  • e18fca88f4 Merge branch 'diffbot' into diffbot-testing Matt Wells 2015-07-31 08:56:47 -0700
  • 85c7fbae70 fix infinite loop bug from EBADRDBID Matt Wells 2015-07-31 08:56:26 -0700
  • afc1b43619 Merge branch 'ia' into ia-zak Matt 2015-07-30 10:22:23 -0600
  • 5af61ff59a fix core from boolean queries Matt 2015-07-30 10:21:30 -0600
  • 72768c093d Merge branch 'diffbot-sam' of github.com:gigablast/open-source-search-engine into diffbot-sam Matt 2015-07-23 17:24:41 -0600
  • 86946392d0 reverted stepping. Useless sam 2015-07-23 10:53:59 -0700
  • da41d53575 Merge branch 'diffbot-testing' into diffbot-sam Matt 2015-07-23 09:27:00 -0600
  • e165b5d668 speed up bool queries Matt Wells 2015-07-22 13:00:45 -0700
  • e9f86f362e Merge branch 'ia' into ia-zak Matt 2015-07-22 12:02:19 -0600
  • dead58329e Add a script for interacting with hosts.conf files. Zak Betz 2015-07-21 10:17:01 -0600
  • 090e1b35d5 fix score info reporting for new bool query min score based on # of query terms contained. Matt 2015-07-20 14:37:37 -0600
  • 69c791e5aa for now at least do not use siterank for ranking boolean search results. Matt 2015-07-20 11:50:31 -0600
  • 1c93a88d82 use the # of matched terms as the score of a doc when doing a boolean query. later: use proximity scoring for non-field query terms. Matt 2015-07-20 11:09:56 -0600
  • ff7639e323 do not get synonyms for boolean operators. just skip synonyms if ignoreWord is set at all. Matt 2015-07-19 13:07:05 -0600
  • 646bc91c59 fix more possible unicode errors Matt 2015-07-19 12:05:09 -0600
  • b9fc583cae fix core Matt 2015-07-18 18:01:11 -0600
  • 16fd428887 fix more cores from the dynamic query size changes. add how many query terms we truncated in the json/xml replies. document those fields as well. Matt 2015-07-18 14:15:47 -0600
  • dab0726fac typo fix Matt Wells 2015-07-17 10:43:38 -0600
  • 5e7a06229c print special message if no seeds were able to be crawled. Matt 2015-07-17 08:42:01 -0600
  • 3ffa651b63 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-07-16 12:39:33 -0600
  • 15eb7f659d Fix some malformed html on hosts page. Fix core when no collection record in injection request. Add a script to test disk speed. Zak Betz 2015-07-16 12:02:14 -0600
  • 7e526863d7 do not include 'diffbot uri' in urls.csv. should not have been there. Matt 2015-07-16 10:11:04 -0600
  • 0d3cfc2796 single words in quotes - keep them in quotes so we do not get synonym forms Matt 2015-07-15 09:58:25 -0600
  • f1b0bd0149 quick fix for tree sanity checker Matt 2015-07-15 09:46:27 -0600
  • 0d1acb09bc try to fix tree if corruption detected when dumping to disk Matt Wells 2015-07-14 22:27:43 -0600
  • b0a6e590d6 treat estimatedDate like date sam 2015-07-14 17:18:16 -0700
  • 8048517463 gbss fix Matt 2015-07-14 18:17:52 -0600
  • 016fa88b29 treat estimatedDate like date sam 2015-07-14 17:17:21 -0700
  • 9946b4b4be add gbssDiffbotType and gbssIsSeedUrl:1 to spider status docs. Matt 2015-07-14 17:59:50 -0600
  • a697b3d5a5 Fix Bad File Descriptor loop bug when downloading a static file on a slow disk. Zak Betz 2015-07-14 17:00:09 -0600
  • fa38d97ec4 Merge branch 'diffbot' into diffbot-testing Matt 2015-07-14 11:45:05 -0600
  • f173b41e92 additional log info Matt 2015-07-14 11:44:14 -0600
  • baff94875d fix another core from dynamic query sizing Matt 2015-07-14 09:23:44 -0600
  • c8cf0e5440 fix some mem leaks from allowing really big queries. added a max query term control to search controls to limit users doing really big queries. but default it very high to 1M. Matt 2015-07-13 23:17:53 -0600
  • f3d35b557f should solve defect #3002 sam 2015-07-13 18:08:25 -0700
  • fc4b4db425 fix core related to increasing max query length Matt 2015-07-13 19:00:47 -0600
  • c3a0f21600 nomenclature changes Matt 2015-07-13 18:42:13 -0600
  • acf389debd Merge branch 'ia' into ia-zak Matt Wells 2015-07-13 18:39:00 -0600
  • 1ba57f9278 fix pesky memory leak finally Matt Wells 2015-07-13 17:47:34 -0600
  • c03594034d bump up some limits for extraordinarily long queries Matt Wells 2015-07-13 17:43:28 -0600
  • 34ec49e804 get mike's super long query working Matt 2015-07-13 14:59:44 -0600
  • 0e009fa6bc fix cores from dynamic # query terms fix Matt 2015-07-10 20:49:40 -0600
  • f088e734f6 allow up to 3000 query terms. really we can allow much more since we are mostly dynamically allocating, only a few smaller arrays use the 3000 on the stack. Matt 2015-07-10 19:02:30 -0600
  • 5d57862046 do not core on gigabits overflow issue Matt 2015-07-10 11:00:16 -0600
  • 795bdf2a78 Merge branch 'ia-zak' of github.com:gigablast/open-source-search-engine into ia-zak Matt 2015-07-08 21:36:58 -0600
  • 46af0e1bce if url too long return the EURLTOOBIG error code. it prints 'Too many chars in url' as the official error msg. Matt 2015-07-08 21:36:18 -0600
  • f4effaecb8 Fix memory leak. Zak Betz 2015-07-08 21:27:57 -0600
  • 8f19b1a0e2 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-07-08 14:03:53 -0600
  • 6e21bc7d7c Injection script fixes. Temporary fix for core when injecting large warc. Zak Betz 2015-07-08 14:03:39 -0600
  • 6cd91ae32c Merge branch 'ia' into ia-zak Matt 2015-07-08 13:49:25 -0600
  • a15d2470f5 Merge branch 'testing' into diffbot-testing Matt 2015-07-08 13:48:58 -0600
  • 581f287113 api doc update for facets Matt 2015-07-08 13:48:16 -0600
  • 3395ee8111 fix core in sections Matt 2015-07-08 08:15:30 -0600
  • a7ae510e31 Fix string faceting display for json metadata. Add unit test for faceted metadata. Zak Betz 2015-07-06 23:05:18 -0600
  • 97f2052d63 remove the debug log sam 2015-07-06 18:05:35 -0700
  • bcd53016e9 Fixes #2947. Fixed a bug with counting facet 'totalDocsWithField' and 'totalDocsWithFieldAndValue' Kevin Truong 2015-07-06 17:42:47 -0700
  • 6745c72232 implemented stepping sam 2015-07-06 17:28:17 -0700
  • 87fcda0f93 Fix atotime5 to parse ISO8601. Fix qa test for warcs and arcs. Fix inject script. Zak Betz 2015-07-06 00:51:18 -0600
  • a844f618cc Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-07-05 17:07:40 -0600
  • fa78164a30 Injection script now keeps track of injection date and won't reinject something that hasn't changed. Zak Betz 2015-07-05 17:07:21 -0600
  • adae6689e6 fix add url from root page. fix core from corruption Matt 2015-07-04 21:52:11 -0600
  • 815bd7ce0a quite a few bug fixes. Matt 2015-07-02 17:42:05 -0600
  • f664f1788d Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak Zak Betz 2015-07-02 12:18:08 -0600
  • 6de4199ee8 Fix linkdb core. Make file and line number the label for StackBuf. Zak Betz 2015-07-02 12:17:10 -0600
  • 1966f36c00 fix clock candidate bug Matt 2015-07-01 20:34:39 -0600