12 KiB
Overview
Vagus is a building block for keeping track of which services/instances/servers/things are alive. It does not actively probe instances, so the instances have to tell it that it is alive. Vagus supports multiple clusters/namespaces and the easiest installation is to just install one on every host. As well as distributing alive-information it also supports piggy-backing small pieces of information per instance.
+-------------------+ +-------------------+
| host A | | host B |
| | inter-vagus protocol | |
| vagus A...........|..........................|.vagus B |
| ^ | | ^ |
| +--- instance 1 | | +--- instance 4 |
| +----instance 2 | | +----instance 5 |
| +----instance 3 | | +----instance 6 |
| | | |
+-------------------+ +-------------------+
Goals
- Rapidly stabilize efter power loss
- Rapidly distribute alive-information to the whole system
- Scale to at least 50 hosts
- Scale to at least 1000 instances in total
- Be able to cross networks
- Use UDP multicast and broadcast to lower the load on the network
Non-goals
- Detect "split brain"
- Keeping history of what has once been alive
- Knowing what should be alive
- HA, quorums, vector clocks, etc.
Assumptions
- Secure network: It is assumed that the network is secure. Vagus is not designed to operate over an insecure network.
- Trusted clients: It is assumed that the clients are non-malicious.
- Synchronized clocks: It is assumed that the host clocks are synchronized to a tolerance well below any timeouts used in the messages. NTP is sufficient. Cron-driven 'rdate' is not recommended.
Protocols
Vagus uses two protocols:
- client protocol
- inter-vagus protocol
Client -> Vagus protocol
The client protocol is text-based. Vagus acts as a TCP server. The client sends a single line with a command terminated by NL (or CR+NL). The server responds with a multi-line response terminated by an empty line. If a connection is lost to a client it is not interpreted as the instances reported by that client is dead. If a command is malformatted the server simply cuts the connection.
The clients are normally instances. The main operation on the protocol is clients issuing keepalive
or keepalivepoll
commands repeatedly in an interval of their of choice.
Parameters used in the commands:
- cluster-identifier: Vagus supports multiple clusters/namespaces. The cluster identifier is chosen by the clients and is typically the service name. Eg. if a client is a WebWaffle service then "webwaffle" would be an obvious cluster name. The identifier cannot contain a colon.
- instance-identifier: The id of the instance. It must be unique within a cluster. The identifier cannot contain a colon.
- keepalive lifetime: The valid lifetime of a keepalive expressed in milliseconds.
- optional: extra information
If present, this extra information is distributed and is available via the
keepalivepoll
andpoll
commands. - vagus-identifier: A unique identifier of a vagus. It must be globally unique across all clusters. It is typically the hostname.
getversion command
This commands asks the server about the supported protocol version
Example:
Client:
getversion <NL>
Server:
1 <NL>
<NL>
keepalive command
This commands servers as an instance registration and as a keepalive.
Command parameters (colon-separated)
- cluster-identifier
- instance-identifier
- keepalive lifetime
- optional: extra information
If a client specifies eg. 2500ms as the valid lifetime of the keepalive then if a new keepalive has not been received within 2500ms the registration times out and is deleted.
Example 1 (plain):
Client:
keepalive giraffes:1:2500
Server:
<NL>
Example 2 (with piggy-backed information):
Client:
keepalive giraffes:1:2500:durian+icecream
Server:
<NL>
poll command
This commands retrieves a list of instances in a cluster
Command parameter:
- cluster-identifier
Command response:
- List of instances in the cluster and optionally extra-information
Example:
Client:
poll giraffes <NL>
Server:
3 <NL>
1:durian+icecream <NL>
2 <NL>
5 <NL>
289 <NL>
<NL>
keepalivepoll command
This commands functions as both an instance registration/keepalive and a poll. It is more efficient to issue this command than two separate keepalive
and poll
commands.
Command parameters (colon-separated)
- cluster-identifier
- instance-identifier
- keepalive lifetime
- optional: extra information
Command response:
- List of instances in the cluster and optionally extra-information
Example:
Client:
keepalivepoll giraffes:1:2500
Server:
3 <NL>
1:durian+icecream <NL>
2 <NL>
5 <NL>
289 <NL>
<NL>
getclusters command
This commands retrieves a list of clusters Vagus knows about. It is unspecified if a cluster without any live instances will be listed.
Example:
Client:
getclusters <NL>
Server:
giraffes <NL>
penguins <NL>
armadillos <NL>
<NL>
getvaguslist command
This commands retrieves a list of vagus processes Vagus knows about together when when it last heard from it and its end-of-life.
Example:
Client:
getvaguslist <NL>
Server:
hosta:1493239621752:1493239631000 <NL>
hostb:1493239621726:1493239631000 <NL>
hostd:1493239621747:1493239631000 <NL>
<NL>
vagushint command
This command tells Vagus where another vagus may exist. This can be useful if the client has knowledge of networks where broadcast/multicast may not reach and are not specified in the Vagus's configuration. Only UDP unicast and TCP can be specified. UDP broadcast and multicast can only be specified in the Vagus configuration file.
Example 1 (UDP on IPv4):
Client:
vagushint udp4:192.0.2.7:8720 <NL>
Server:
<NL>
Example 2 (TCP on IPv6):
Client:
vagushint tcp6:[2001:0db8:1::7]:8720 <NL>
Server:
<NL>
pollx command
This commands retrieves a list of instances in a cluster with additional information
Command parameter:
- cluster-identifier
Command response:
- List of instances in the cluster and optionally extra-information
Example:
Client:
poll giraffes <NL>
Server:
3:potato:1496396190.05 <NL>
1:potato:1496396190.03:durian+icecream <NL>
1:tomato:1496396191.99:durian+icecream <NL>
2 <NL>
5 <NL>
289 <NL>
<NL>
where the extra information compared to the normal poll command is the vagus-identifier and the lifetime of the instance.
Vagus <-> Vagus protocol
The vagus processes exchange information about the alive instances they know about. A Vagus sends its alive-instance information at regular interval to all specified destinations.
The inter-Vagus protocol is binary for compactness. It has a single message type.
message ::= "vagus"
| <length>
| <message_type>
| <payload>
length ::= 32bit integer
message_type ::= 8bit enumeration
The signature "vagus" is for easily identifying packets. All integers are in network order. The only message type currently defined is 1 (announcement).
Announcement message (type 0x01) payload:
announcement ::= end_of_life
| vagus_id_length | vagus_id
| cluster_id_length | cluster
| {instance_information}
instance_information ::= instance_id_length | instance_id
| end_of_life
| extra_information
end_of_life ::= 64bit integer (milliseconds since 1970)
vagus_id_length ::= 8bit integer
vagus_id ::= string
cluster_id_length ::= 8bit integer
cluster_id ::= string
extra_information ::= extra_information_len | info
extra_information_len ::= 8bit integer
info ::= string
Vagus configuration
Vagus can be configured, but provides built-in defaults that will work in most simple setups.
main section
This section contains the general configuration of Vagus.
identity
Default value: the same as the hostname. This is the identity of the vagus. The hostname is usually a good choice.
instance-timeout-min
Default value: 500 The minimum timeout for an instance, in milliseconds. If a client specifies an instance lifetime short than this the the actual value is raised to this minimum.
instance-timeout-max
Default value: 600000 The minimum timeout for an instance, in milliseconds. If a client specifies an instance lifetime longer than this the the actual value is lowered to this minimum.
announcement-interval-min
Default value: 500 The minimum interval between sending a batch of announcements to other vagus processes, in milliseconds.
announcement-interval-max
Default value: 10000 The maximuminterval between sending a batch of announcements to other vagus processes, in milliseconds.
client-port
Default value: 8720 Which TCP port to listen for client commands.
udp section
This section contains configuration for how to use UDP unicast and broadcast.
If no peer
nor any broadcast
items are specified then the default is broadcast: *
port
Default value: 8721 The UDP port to listen on and send to/from.
peer
Default value: none This item can be specified multiple times. It specifies where to send unicasts. The value can be either an IPv4 address or an IPv6 address. Examples:
peer: 192.0.2.7
peer: 2001:0db8:1::7
broadcast
Default value: none (but see comment on whole section above) This item can be specified multiple times. It specifies where to send broadcasts. The value comes in three forms:
*
(asterisk) Specifies to send broadcast to 255.255.255.255- dotted-quad Specifies to send a broadcast to that address
- interface name (must start with a letter) Specifies to send broadcast on this network interface on all subnets. Only IPv4 addresses (dotted-quad) is supported.
Broadcast within a subnet usually work fine. If you want to cross to another subnet you may need to tell the router to allow that.
Examples:
broadcast: *
broadcast: eth0
broadcast: 192.0.2.255
broadcast: eth0:192.0.2.255
udp-multicast section
This section specifies how to use UDP multicasts. Note: multicast usually work fine on a single subnet. When crossing subnets you need to tell you network administrator to allow it. Or install a multicast router yourself (see http://troglobit.github.io/smcroute.html for ideas) on your router.
port
Default value: 8721 Specifies which port listen and send to.
ttl
Default value: 3 This option specifies which TTL to set in the generated multicast datagrams. For your conveniance these are the normally used values of multicast TTLs:
- 0: Restricted to the same host. Won't be output by any interface.
- 1: Restricted to the same subnet. Won't be forwarded by a router.
- 2..31: Restricted to the same site/organization/department.
- 32...: region, continents and larger scopes
multicast
Default value: none
This item can be specified multiple times.
The value takes the form of :
. The interface name can be *
in which case Vagus uses all interfaces and all networks (except loopback).
Examples:
multicast eth0:224.1.1.1
multicast *:224.1.1.1
multicast eth0:ff02::1
tcp section
port
Default value: 8721 Specifies which port listen on and connect to.
peer
Default value: none Specifies the IP-address of a peer.
Examples:
peer: 192.0.2.7
peer: 2001:0db8:1::7
Example configuration:
[main]
identity: box-in-the-broom-cupboard
instance-timeout-min: 500
instance-timeout-max: 600000
announcement-interval-min: 500
announcement-interval-max: 10000
client-port: 8720
[udp]
port: 8721
peer: 192.0.2.7
peer: 2001:0db8:1::7
broadcast: *
broadcast: eth0
broadcast: 192.0.2.255
[udp-multicast]
port: 8722
ttl: 3
multicast: eth0:224.1.1.1
multicast: *:224.1.1.1
multicast: eth0:ff02::1
[tcp]
port: 8721
peer: 192.0.2.7
peer: 2001:0db8:1::7