4cfb934aaf
Addressed format-overflow compiler warnings
2021-06-20 08:16:15 +00:00
6d7e14d2a4
Corrects compiler warning: C++11 requires a space between string literal and macro
2021-06-19 17:17:06 +00:00
b1ace63607
codespell: spelling corrections
2021-05-06 01:52:55 +10:00
fe448173d5
Merge branch 'ia' into testing
2015-11-09 11:14:00 -07:00
aeca57e9f4
Pass in the buffer size of an injection request so that if the content
length header field is bigger than the actual buffer we won't index
random memory. Fixes bug with truncated warc captures.
2015-10-28 00:38:08 -06:00
3e19d43aa5
fix core
2015-10-14 12:03:12 -06:00
a4901431be
a couple little fixes to pass smokes
2015-10-14 11:53:05 -06:00
1708d0608c
some fixes for detecting corrupted injection requests.
seems to be very common.
2015-10-07 21:47:10 -06:00
eefbe95ce9
Merge branch 'diffbot-testing' into ia-zak
2015-09-21 10:13:29 -06:00
bcdecc63c6
expose "urlip" injection parm to provide ip of url
being injected to save gigablast from an ip lookup
if you want.
2015-09-16 09:43:15 -06:00
32d7f5cb97
better warc injection load balancing
2015-09-15 15:04:26 -06:00
e2c61c7a78
Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak
2015-09-11 14:22:27 -06:00
f01db79e5f
show inject requests in the spider queue table now
2015-09-11 14:16:26 -06:00
a270e163de
Fix coring on udp timeout when clustering search results.
Add ability to force update a list of items in warc injector.
2015-09-11 11:05:57 -06:00
e7f1c75855
Add logic to limit number of msg7s to 100 per host, then we drop the
requests.
2015-09-03 22:17:16 -06:00
36b8d384bd
Fixes to injector script.
New colors and metrics on performance graph.
2015-08-13 23:29:20 -06:00
15eb7f659d
Fix some malformed html on hosts page.
Fix core when no collection record in injection request.
Add a script to test disk speed.
2015-07-16 12:02:14 -06:00
46af0e1bce
if url too long return the EURLTOOBIG error code.
it prints 'Too many chars in url' as the official error msg.
2015-07-08 21:36:18 -06:00
815bd7ce0a
quite a few bug fixes.
2015-07-02 17:42:05 -06:00
32987e76ee
Add json metadata field to page inject.
Fix memory leak when spidering warc files.
Add script to inject warcs from internet archives search results.
2015-06-14 20:58:41 -06:00
ee5ffef834
fix core
2015-05-05 02:53:42 +00:00
08e01b5ac8
fix more bugs. new injections seem somewhat stable now.
2015-05-03 21:58:26 -07:00
ff969d92bb
can inject a single doc now
2015-05-03 21:14:28 -07:00
bc54282339
complete overhaul of injection pipeline now compiles.
should distribute injection requests evenly over the cluster.
uses new InjectionRequest class which sets from httprequest
using parms in Parms.cpp. and easily serializes into a udp request.
very nice. we should use this model going forward.
2015-05-03 19:07:44 -07:00
b39a065259
checkpoint #2
2015-05-03 17:51:47 -07:00
0df4abc759
checkpoint
2015-05-04 00:17:17 +00:00
b0abe597e7
more fixes from qa test.
2015-05-02 14:34:07 -07:00
16b73a9bdd
now we pass both injection tests in qa.cpp
2015-05-02 12:32:13 -07:00
ecb6d081d5
fix indexArc()
2015-05-01 23:24:40 -07:00
5c89bde956
now all container doc logic is in xmldoc
and out of pageinject. compiles. needs testing.
2015-05-01 20:32:54 -07:00
0ca27638bc
checkpoint. moved warc and arc looping into xmldoc.
now will pass any container doc from pageinject into
xmldoc. simplifies pageinject.cpp a lot. and sets up
a framework for dealing with container docs.
2015-05-01 19:11:13 -07:00
ce030fcfb0
now .arc and .arc.gz injections work
2015-04-30 20:25:26 -07:00
b4d0c53904
fix single url injects
2015-04-30 19:09:07 -07:00
fbfdde5195
fix for old delimiterized injects. was coring in gb smokes.
2015-04-30 19:07:12 -07:00
e387c0f154
yay test warc injecting working
2015-04-30 18:45:46 -07:00
f1663402d9
compiles again now
2015-04-30 18:23:46 -07:00
2479dd330d
ok, move all the warc/arc parsing/indexing logic into
pageinject.cpp and out of xmldoc.cpp. it makes more
sense there. since really all we need to do is download
the warc's content and it is like injecting a delimiterized
document in the loop already in pageinject.cpp.
2015-04-29 21:39:18 -07:00
45c0909cb7
injecting warc files nicely now
2015-04-29 19:55:06 -07:00
21948e15f6
more fixes
2015-04-28 23:30:14 -07:00
9370c8f52e
more fixes
2015-04-28 23:20:16 -07:00
0eb415d408
added preliminary support for spidering .warc.gz and .arc.gz files
2015-04-27 21:41:22 -06:00
38caa517f2
add switches to disable injections or querying
from the master controls, for all collections.
2015-03-04 10:49:37 -08:00
b89f071f7c
quite a few bug fixes from adding the new query
syntax qa test.
2014-12-11 18:24:28 -08:00
0460335861
more permission system updates
2014-12-08 09:49:17 -08:00
a7462ed1f4
fix injection stuff
2014-12-04 09:29:17 -07:00
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
29f928a71e
import fixes
2014-09-25 20:48:34 -07:00
d4182cf4ed
fix importing function some
2014-09-25 20:33:42 -07:00
fce036868b
only host #0 should read the import data.
2014-09-25 07:55:30 -07:00