Commit Graph

29 Commits

SHA1 Message Date
45b8bb3421 log msg cleanups 2014-05-11 21:55:44 -07:00
acd05aa740 fix a few minor bugs.
/master/->/admin/ and crawl type mismatch.
2014-03-16 10:34:58 -07:00
8aa0662a27 Merge branch 'diffbot' into testing
Conflicts:

	Make.depend
	PageResults.cpp
	Parms.cpp
	Spider.cpp
	Spider.h
	gb.conf
2014-03-08 09:38:44 -07:00
0e48bbcea9 fix a core from bad return values 2014-02-12 13:21:30 -08:00
eb044c765c remove login link on root pages.
add hand cursor to logout link.
2014-02-12 00:47:58 -07:00
8d5e1cb547 added url download support 2014-01-20 23:17:04 -08:00
d03028ea93 bulk api post truncation fix 2013-12-17 10:03:46 -08:00
5e4b5a112c Merge branch 'master' into diffbot
Conflicts:

	PageResults.cpp
	Threads.cpp
	XmlDoc.cpp
	XmlDoc.h
2013-12-07 11:34:26 -07:00
5da41cd113 fix a couple different cores. 2013-11-24 19:46:44 -07:00
7020f66daa bulk api nominal updates 2013-11-13 14:30:51 -08:00
e395628d5a use &format=0 1 or 2 for html/xml/json now.
use &icc=1 to get dump of json objects in serps.
2013-11-08 18:00:30 -08:00
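The mapping described in this commit (0 = html, 1 = xml, 2 = json) can be sketched as below. This is only an illustration: the names `SerpFormat` and `parseFormatParm` are hypothetical and do not come from the repository, and rejecting any other value is an assumption, since the log does not say how unknown values are handled.

```cpp
#include <string>

// Hypothetical enum; the numeric values follow the commit message:
// &format=0 -> html, &format=1 -> xml, &format=2 -> json.
enum class SerpFormat { Html = 0, Xml = 1, Json = 2 };

// Returns true and sets `out` when `value` is "0", "1" or "2".
// Any other value leaves `out` untouched and returns false
// (assumed behaviour, not taken from the source).
bool parseFormatParm(const std::string& value, SerpFormat& out) {
    if (value == "0") { out = SerpFormat::Html; return true; }
    if (value == "1") { out = SerpFormat::Xml;  return true; }
    if (value == "2") { out = SerpFormat::Json; return true; }
    return false;
}
```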
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
2c7035ac2b do not truncate diffbot reply 2013-11-05 11:17:54 -08:00
22f9e9355d /v2/bulk api fixes 2013-10-22 18:51:09 -07:00
d16e5d37f1 tested robots crawl-delay directive
by forcing a 10.1 second delay for
diffbot.com in XmlDoc.cpp.
seemed to work after a few fixes.
however, it is ultimately only
an IP-based crawl delay, although
the delay applies to all subdomains
on the same domain, it's just that each IP
has its own timer for that delay.
2013-10-22 17:41:52 -07:00
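A minimal sketch of the behaviour this commit describes, where the crawl-delay value comes from the site's robots.txt but the timer itself is kept per IP. The class `IpCrawlTimer` and its methods are hypothetical and are not the code in XmlDoc.cpp or Spider.cpp; times are plain millisecond counters passed in by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch: one timer per IP address, not per hostname.
// Every subdomain that resolves to the same IP shares that IP's timer.
class IpCrawlTimer {
public:
    // Earliest time (ms) at which `ip` may be fetched again; 0 if never fetched.
    int64_t nextAllowedMs(const std::string& ip) const {
        auto it = nextAllowed_.find(ip);
        return it == nextAllowed_.end() ? 0 : it->second;
    }

    // Try to start a fetch at `nowMs`. On success the timer for `ip` is
    // re-armed with the crawl delay taken from the domain's robots.txt.
    bool tryFetch(const std::string& ip, int64_t nowMs, int64_t crawlDelayMs) {
        assert(crawlDelayMs >= 0);
        if (nowMs < nextAllowedMs(ip)) return false;  // still inside the delay
        nextAllowed_[ip] = nowMs + crawlDelayMs;
        return true;
    }

private:
    std::unordered_map<std::string, int64_t> nextAllowed_;
};
```

With a 10100 ms delay, a second fetch of the same IP at 5000 ms is refused, while a fetch of a different IP serving the same domain is allowed at that moment, which matches the "each IP has its own timer" caveat.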
8c3a61f070 /v2/crawl api 2013-10-22 12:25:37 -07:00
ea859ef685 added 'gb emailmandrill' for testing.
got it working. it posts json, not url encoded.
2013-10-09 17:35:51 -06:00
3702a05d64 add sendEmailThroughMandrill() to send
through mail chimp http api.
2013-10-08 18:01:38 -07:00
76c9f47498 file download api updates.
to include collection name in filename
being downloaded.
2013-09-30 11:10:43 -06:00
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:

	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
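The "weak overridable placeholder stubs" idea can be sketched as follows, assuming a GCC or Clang toolchain, where `__attribute__((weak))` is available. The function names `printSeoLinks` and `buildPage` are hypothetical; the log does not list the actual seo functions.

```cpp
#include <string>

// Hypothetical placeholder stub. The weak attribute (a GCC/Clang extension)
// lets a strong definition of the same symbol in another object file,
// for example seo.o, replace this one at link time.
__attribute__((weak)) bool printSeoLinks(std::string& page) {
    (void)page;    // default stub: adds nothing to the page
    return false;  // signals that no seo module was linked in
}

// Callers do not need to know which definition the linker picked.
std::string buildPage() {
    std::string page = "<html>";
    if (!printSeoLinks(page)) page += "<!-- seo module absent -->";
    page += "</html>";
    return page;
}
```

When built without seo.o, `buildPage()` uses the stub. Linking an object that defines a non-weak `printSeoLinks` with the same signature would take over without any change to the caller.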
fd081478de fix crawlbot to work on a distributed network
as far as adding/deleting/resetting colls
and updating parms. ideally we'd have a Colldb
Rdb where each key was a parm. that would make
syncing easier if a host went down, then it would
get the negative/positive colldb parm keys later.
so it could sync up on all your operations as long
as all your operations in terms of adding and deleting
database key/value pairs.
2013-09-26 22:41:05 -06:00
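The Colldb idea floated in this commit, where every parm change is a positive (set) or negative (delete) record that a host can replay after downtime, might look roughly like this. It is a sketch of the concept only: `ParmRecord`, `ParmTable`, `applyRecord` and `replayMissed` are hypothetical names, and the commit itself says this design was not implemented at the time.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one parm of one collection.
struct ParmRecord {
    std::string coll;   // collection name
    std::string parm;   // parm name
    std::string value;  // ignored for negative records
    bool positive;      // true = add/update, false = delete
};

// (collection, parm) -> value
using ParmTable = std::map<std::pair<std::string, std::string>, std::string>;

// Apply one record: a positive record sets the parm, a negative one removes it.
void applyRecord(ParmTable& table, const ParmRecord& r) {
    auto key = std::make_pair(r.coll, r.parm);
    if (r.positive) table[key] = r.value;
    else            table.erase(key);
}

// A host that was down replays, in order, the records it missed.
void replayMissed(ParmTable& table, const std::vector<ParmRecord>& missed) {
    for (const auto& r : missed) applyRecord(table, r);
}
```

Because every operation is expressed as adding or deleting a key/value pair, a host that replays the same records in the same order ends up with the same table as the hosts that stayed up.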
02bf6ab3cc new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
93ce424d99 start working on the main gui for
crawlbot which is /crawlbot
2013-09-13 16:22:07 -07:00
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
82ee2dfed7 fix cores when spider is unzipping
gzipped web pages.
2013-08-28 22:49:22 -06:00
e9297df240 listen on DNS port 5998 not 6000. 6000 seemed
to cause issues on a particular install for
some reason.
2013-08-19 15:02:27 -06:00
0b94b31fbc Fix potential core issue in proxy. 2013-08-08 15:14:36 -06:00
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00