bd82145626 Merge branch 'diffbot-testing' into testing
mwells
2014-04-05 12:34:46 -0700
89f5c8c059 Merge branch 'diffbot-matt' into diffbot-testing
mwells
2014-04-05 11:34:27 -0700
61b4ec4ca6 added some qa testing logic. qa.cpp.
mwells
2014-04-05 11:33:42 -0700
0988a134d0 Merge remote-tracking branch 'origin/diffbot' into diffbot-dan
Daniel Steinberg
2014-04-01 19:48:24 -0700
4856cc4c60 ||, not &&
Daniel Steinberg
2014-04-01 10:45:54 -0700
3e38bd169e and return an error
Daniel Steinberg
2014-04-01 10:43:17 -0700
94b169b8dc only delete if there were no io errors
Daniel Steinberg
2014-04-01 10:42:12 -0700
6568858e81 implement something that works like mv, which tries rename first, and if that fails copies the bytes. rename doesn't work across devices
Daniel Steinberg
2014-03-31 20:44:39 -0700
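The mv-like behaviour described in 6568858e81, together with the follow-ups 94b169b8dc (only delete on no I/O errors) and 3e38bd169e (return an error), can be sketched as follows. This is a minimal illustration under POSIX, not the committed code. The name `moveFile` and its errno-style return value are hypothetical. `rename()` fails with `EXDEV` when source and destination are on different filesystems, and that is the case this fallback handles.

```cpp
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: behaves like mv. Returns 0 on success, else an errno value.
int moveFile(const char *src, const char *dst) {
    // Fast path: rename() is atomic but only works within one filesystem.
    if (std::rename(src, dst) == 0) return 0;
    if (errno != EXDEV) return errno;  // a real failure, not a cross-device move

    // Slow path: copy the bytes, then remove the source.
    int in = ::open(src, O_RDONLY);
    if (in < 0) return errno;
    int out = ::open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        int e = errno;
        ::close(in);
        return e;
    }

    char buf[64 * 1024];
    int err = 0;
    for (;;) {
        ssize_t n = ::read(in, buf, sizeof(buf));
        if (n == 0) break;  // EOF
        if (n < 0) {
            if (errno == EINTR) continue;
            err = errno;
            break;
        }
        ssize_t off = 0;
        while (off < n) {  // write() may accept fewer bytes than asked
            ssize_t w = ::write(out, buf + off, static_cast<size_t>(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                err = errno;
                break;
            }
            off += w;
        }
        if (err != 0) break;
    }
    ::close(in);
    if (::close(out) != 0 && err == 0) err = errno;

    // Only delete the source if every read and write succeeded; otherwise
    // keep it and report the error so no data is lost.
    if (err != 0) return err;
    if (::unlink(src) != 0) return errno;
    return 0;
}
```

Keeping the source on any I/O error means a failed cross-device copy leaves the original file intact. The caller then gets a nonzero errno value back instead of a silent partial move.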
d6434191d1 nomenclature changes to reduce collisions. name collection 'qatest123' for doing smoke tests, not 'test'.
Matt Wells
2014-03-31 15:02:17 -0700
9c8410767d fix critical title alloc/free bug in title.cpp.
Matt Wells
2014-03-28 08:01:01 -0700
c1671015c8 Merge branch 'diffbot-dan' into diffbot-testing
Matt Wells
2014-03-27 12:19:50 -0700
582349334f do not use certain other json fields when computing checksum for deduping. like stats, querystring, ...
Matt Wells
2014-03-27 12:20:53 -0700
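Commit 582349334f excludes volatile JSON fields from the dedup checksum. The sketch below shows the idea under several assumptions. The reply is taken as already parsed into (name, serialized value) pairs. FNV-1a stands in for whatever hash Gigablast actually uses. The function name `dedupChecksum` is hypothetical. Only `stats` and `querystring` come from the commit message; the rest of its list is elided there.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: (field name, serialized JSON value).
using Field = std::pair<std::string, std::string>;

// 64-bit FNV-1a, used here only as an example of a stable hash.
static uint64_t fnv1aByte(uint64_t h, unsigned char c) {
    return (h ^ c) * 1099511628211ULL;
}

static uint64_t fnv1aStr(uint64_t h, const std::string &s) {
    for (unsigned char c : s) h = fnv1aByte(h, c);
    return h;
}

// Checksum for dedup: ignored fields are skipped, so two documents that
// differ only in, say, "stats" produce the same checksum.
uint64_t dedupChecksum(const std::vector<Field> &fields,
                       const std::set<std::string> &ignored) {
    uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (const Field &f : fields) {
        if (ignored.count(f.first) != 0) continue;
        h = fnv1aStr(h, f.first);
        h = fnv1aByte(h, 0);  // separator so "ab"+"c" differs from "a"+"bc"
        h = fnv1aStr(h, f.second);
        h = fnv1aByte(h, 0);
    }
    return h;
}
```

The checksum depends on field order. A real implementation would need the serializer to emit fields in a stable order, or would sort them first.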
402377d2e6 fix bug of gbmin, gbmax etc. not working. floats were being rounded down to ints in most cases it seems. so .9 -> 0 etc.
Matt Wells
2014-03-26 11:56:06 -0700
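Commit 402377d2e6 describes `gbmin`/`gbmax` bounds losing their fractional part (.9 -> 0). One common way this happens is parsing the bound with an integer routine. The sketch below reproduces that failure mode. It is an assumption about the bug class, not the actual Gigablast code path, and the function names are hypothetical.

```cpp
#include <cstdlib>

// Buggy variant: an integer parser stops at the '.', so ".9" becomes 0.
long buggyParseBound(const char *s) { return std::atol(s); }

// Fixed variant: parse the bound as a floating-point value.
double parseBound(const char *s) { return std::strtod(s, nullptr); }

// Range check of a document's numeric field against [min, max].
bool inRange(double value, double min, double max) {
    return value >= min && value <= max;
}
```

With the buggy parser, a `gbmin` of .9 acts as a minimum of 0, so a document whose value is 0.5 wrongly passes the filter.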
d67f09feeb also include a timestamp field with an RFC 1123 formatted date
Daniel Steinberg
2014-03-25 21:45:21 -0700
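Commit d67f09feeb adds a timestamp formatted as an RFC 1123 date, the format HTTP uses (e.g. `Thu, 01 Jan 1970 00:00:00 GMT`). The function name below is hypothetical. `gmtime_r` is POSIX rather than standard C++.

```cpp
#include <ctime>
#include <string>

// Formats a Unix timestamp as an RFC 1123 date string. RFC 1123 dates in
// HTTP are always expressed in GMT, hence gmtime_r rather than localtime_r.
std::string rfc1123Date(time_t t) {
    struct tm tmUtc;
    gmtime_r(&t, &tmUtc);
    char buf[64];
    // %a and %b give English day/month names only in the "C" locale; a
    // program that calls setlocale() would need to format them by hand.
    size_t n = strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", &tmUtc);
    return std::string(buf, n);
}
```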
0efac8c156 Defect #2080: seed URLs duplicated
Daniel Steinberg
2014-03-25 17:25:55 -0700
e1b1b15a38 bigger buffer
Daniel Steinberg
2014-03-25 16:34:40 -0700
9846061dff when restarting a bulk job, copy bulkurls.txt to /tmp, and then transfer it back to the new collection folder
Daniel Steinberg
2014-03-25 16:20:24 -0700
ab90c06d8d add TODO for regex checking
Daniel Steinberg
2014-03-25 13:05:43 -0700
1ff6c1fae0 Merge remote-tracking branch 'origin/diffbot' into diffbot-dan
Daniel Steinberg
2014-03-25 12:53:37 -0700
b8836745f0 use SpiderRequest instead of isonsamedomain flag to determine whether to output data in CSV (Defect #2122)
Daniel Steinberg
2014-03-25 12:51:08 -0700
b6e5424e32 do not download bulkjob urls in crawlbot. just return a fake http reply. however, do use crawl-delay throttling logic. deduping is already turned off for bulk jobs so it should be ok.
mwells
2014-03-21 12:40:38 -0700
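Commit b6e5424e32 combines two ideas: bulk-job URLs get a fabricated HTTP reply instead of a download, but they still pass through crawl-delay throttling. The sketch below is hypothetical throughout. The class name, the per-host millisecond bookkeeping and the exact reply bytes are assumptions, since the log does not show what Gigablast fabricates.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-host throttle enforcing a fixed crawl delay.
class CrawlDelayThrottle {
public:
    explicit CrawlDelayThrottle(int64_t crawlDelayMs) : m_delayMs(crawlDelayMs) {}

    // Returns true, and records the launch time, if the host may be hit now.
    bool tryLaunch(const std::string &host, int64_t nowMs) {
        auto it = m_lastMs.find(host);
        if (it != m_lastMs.end() && nowMs - it->second < m_delayMs) return false;
        m_lastMs[host] = nowMs;
        return true;
    }

private:
    int64_t m_delayMs;
    std::unordered_map<std::string, int64_t> m_lastMs;  // host -> last launch
};

// Minimal synthetic reply standing in for a real download in bulk jobs.
std::string fakeHttpReply() {
    return "HTTP/1.0 200 OK\r\n"
           "Content-Length: 0\r\n"
           "\r\n";
}
```

The throttle is consulted before the fake reply is produced, so bulk jobs keep the same per-host pacing as ordinary crawls even though nothing is downloaded.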
b33121af7d make all field names lower case without spaces when we hash them to make the prefixhash. since json names often have mixed case field names and spaces.
Matt Wells
2014-03-20 16:08:02 -0700
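Commit b33121af7d normalizes JSON field names before hashing them into the prefix hash. The sketch below shows that ordering: normalize first, then hash. The function names are hypothetical, and FNV-1a is a stand-in because the log does not name Gigablast's hash function.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Lower-cases a field name and drops spaces, so "Offer Price",
// "offerPrice" and "offerprice" all become "offerprice".
std::string normalizeFieldName(const std::string &name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        if (c == ' ') continue;
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Hypothetical prefix hash: 64-bit FNV-1a over the normalized name.
uint64_t prefixHash(const std::string &fieldName) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : normalizeFieldName(fieldName)) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}
```

A consequence of this design is that a query on `offerprice` matches whichever spelling the source JSON used. The trade-off is that two genuinely distinct names that differ only in case or spacing now collide.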
98a10d4936 Merge branch 'testing' into diffbot-testing
Matt Wells
2014-03-20 15:50:49 -0700
bbc8fc0c79 always show admin link
Matt Wells
2014-03-20 15:48:51 -0700
67202f3731 Merge branch 'diffbot' into diffbot-testing
Matt Wells
2014-03-20 15:39:03 -0700
99bd9319fd temp hack to reduce network comm between trinity and neo
Matt Wells
2014-03-20 15:42:34 -0700
5ed19026d9 temp debug comments
Matt Wells
2014-03-20 15:33:37 -0700
b8d0e95035 Merge branch 'diffbot' into diffbot-testing
Matt Wells
2014-03-20 10:26:55 -0700
b31eaee9fd simple bool queries work
mwells
2014-03-18 12:07:29 -0700
d4302e3301 fix core
Matt Wells
2014-03-18 11:12:50 -0700
3b97682cc3 more bool query fixes
Matt Wells
2014-03-18 10:44:56 -0700
6e23d37e47 Merge branch 'diffbot' into diffbot-testing
Matt Wells
2014-03-17 17:27:28 -0700
54cc8088fb more bool query fixes. hopefully this will do it, but still can do some optimizations for speed.
mwells
2014-03-17 17:00:08 -0700
9d3c35ad17 nothing
Matt Wells
2014-03-17 13:53:19 -0700
4abf56a75d cleanups
Matt Wells
2014-03-16 18:06:22 -0700
d2511d0bef host table cleanups
Matt Wells
2014-03-16 17:14:47 -0700
5057fdaf14 aesthetic cleanups
Matt Wells
2014-03-16 17:12:04 -0700
d320bf9d75 spidering back on in main's coll.conf
Matt Wells
2014-03-16 15:06:39 -0700
c513ad9418 Merge branch 'diffbot' into testing
Matt Wells
2014-03-16 14:51:22 -0700
acd05aa740 fix a few minor bugs. /master/->/admin/ and crawl type mismatch.
Matt Wells
2014-03-16 10:34:58 -0700
edbd61b0c5 thread fixes. if pthread_create fails then keep thread queue and just return. will try to relaunch later. do not count delete keys towards shard rebalance count.
Matt Wells
2014-03-15 20:07:02 -0700
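The first part of edbd61b0c5 says that a failed `pthread_create` should leave the thread queue intact so the launch can be retried later. The sketch below shows that rule. The `Job` struct, the `launchNext` function and the injectable launcher parameter are hypothetical and exist only to make the behaviour testable. The key point is that the job is popped only after `pthread_create` returns 0. `pthread_create` reports failure (e.g. `EAGAIN`) through its return value, not through errno.

```cpp
#include <deque>
#include <pthread.h>

// Hypothetical queued job: a thread entry point and its argument.
struct Job {
    void *(*fn)(void *);
    void *arg;
};

// Same signature as pthread_create, so a test can inject failures.
using LaunchFn = int (*)(pthread_t *, const pthread_attr_t *,
                         void *(*)(void *), void *);

// Tries to start the job at the head of the queue. On failure the job
// stays queued and the caller simply retries on a later pass. On success
// the new thread id is stored in *tid for the caller to join or detach.
bool launchNext(std::deque<Job> &queue, pthread_t *tid,
                LaunchFn launch = pthread_create) {
    if (queue.empty()) return false;
    const Job &job = queue.front();
    if (launch(tid, nullptr, job.fn, job.arg) != 0) return false;  // keep job
    queue.pop_front();  // dequeue only once the thread really exists
    return true;
}
```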
5ca411e3e2 tuning the rebalance loop
Matt Wells
2014-03-15 14:56:11 -0700
86147fe22c tight merge during rebalance to save disk space, so neg recs annihilate pos recs.
Matt Wells
2014-03-14 23:37:30 -0700
6c704f6fdf Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Matt Wells
2014-03-14 22:16:40 -0700
e37eebd76f when rebalancing wait for merge to complete before scanning more
Matt Wells
2014-03-14 22:16:25 -0700
82ac3fab6c merge fixes
Matt Wells
2014-03-14 22:15:08 -0700
df46a6fc1d Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot-matt
Matt Wells
2014-03-14 19:32:10 -0700
1f162ce7b2 update localhosts.conf too
Matt Wells
2014-03-14 19:20:23 -0700
553aefdb55 keep files tightly merged when doing rebalanced to avoid running out of disk space
Matt Wells
2014-03-14 19:19:41 -0700
cb483c42ea more fixes for bool searching before using a slightly different and simpler approach
mwells
2014-03-13 16:00:23 -0700
7812f5c746 more bool fixes. still needs a little more work
mwells
2014-03-13 13:54:23 -0700
3b2d981dff more fixes for new boolean logic.
mwells
2014-03-13 13:09:33 -0700
fb0123ad53 nothing
Matt Wells
2014-03-13 11:27:28 -0700
9acb7ef0f4 fix core &token= core
Matt Wells
2014-03-13 07:57:06 -0700
7b5816f194 updated error message
Daniel Steinberg
2014-03-12 20:56:27 -0700
018258bcaa Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Matt Wells
2014-03-12 20:55:21 -0700
fbd1bcd349 initial attempt at new boolean query logic. supports unlimited # of boolean query terms. already docid phased from phasing logic already there but could be phased more to save more mem and speed up a little more.
Matt Wells
2014-03-12 20:53:44 -0700
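Commit fbd1bcd349 mentions boolean queries with an unlimited number of terms. One way to avoid a fixed term limit is to keep per-document term matches in a growable bit vector rather than a fixed-width machine word, and to evaluate the parsed query against it. The sketch below does this with a postfix (RPN) evaluator. It is an illustration of that general technique, not Gigablast's algorithm. The `Op` encoding and `matches` are hypothetical, and the docid phasing the commit mentions is not modelled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical postfix encoding of a parsed boolean query.
struct Op {
    enum Kind { Term, And, Or, Not } kind;
    size_t term;  // index into hasTerm when kind == Term
};

// Evaluates the query for one document. hasTerm[t] says whether the doc
// contains query term t. std::vector<bool> grows as needed, so the number
// of terms is not capped by the width of an integer bitmask.
bool matches(const std::vector<Op> &rpn, const std::vector<bool> &hasTerm) {
    std::vector<bool> stack;
    for (const Op &op : rpn) {
        if (op.kind == Op::Term) {
            stack.push_back(hasTerm.at(op.term));
            continue;
        }
        if (stack.empty()) throw std::invalid_argument("malformed query");
        bool a = stack.back();
        stack.pop_back();
        if (op.kind == Op::Not) {
            stack.push_back(!a);
            continue;
        }
        if (stack.empty()) throw std::invalid_argument("malformed query");
        bool b = stack.back();
        stack.pop_back();
        stack.push_back(op.kind == Op::And ? (a && b) : (a || b));
    }
    if (stack.size() != 1) throw std::invalid_argument("malformed query");
    return stack.back();
}
```

For example, `(cat AND dog) OR NOT fish` with terms numbered cat=0, dog=1, fish=2 becomes the postfix sequence `cat dog AND fish NOT OR`.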
3e7243c6ce fix add url core
Matt Wells
2014-03-12 08:28:42 -0700
34f7540160 fix addurl core
Matt Wells
2014-03-12 08:11:48 -0700
7ec1513d41 updates
Matt Wells
2014-03-12 08:09:45 -0700
f27d549fc6 Defect #2122: If a crawl and there are no urlCrawlPattern or urlCrawlRegEx values, only return URLs from that domain
Daniel Steinberg
2014-03-11 19:46:38 -0700
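The Defect #2122 rule in f27d549fc6 can be sketched as a filter predicate. If a crawl has neither a `urlCrawlPattern` nor a `urlCrawlRegEx`, only URLs from the seed's domain are returned. The helper names are hypothetical. As a simplification, the sketch compares exact hosts. Treating `www.example.com` and `example.com` as one domain would need a registered-domain check based on a public-suffix list. The URL parsing is also deliberately minimal.

```cpp
#include <string>

// Hypothetical minimal host extraction from "scheme://host[:port]/path".
// Userinfo, IPv6 literals and similar cases are not handled.
std::string hostOf(const std::string &url) {
    size_t start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    size_t end = url.find_first_of(":/?#", start);
    return url.substr(start, end == std::string::npos ? std::string::npos
                                                      : end - start);
}

// Returns false only when the same-domain rule applies and the URL is
// off-domain. When either pattern is set, this rule does not apply and the
// patterns are assumed to be checked elsewhere.
bool shouldReturnUrl(const std::string &url, const std::string &seedUrl,
                     const std::string &urlCrawlPattern,
                     const std::string &urlCrawlRegEx) {
    if (!urlCrawlPattern.empty() || !urlCrawlRegEx.empty()) return true;
    return hostOf(url) == hostOf(seedUrl);
}
```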