Logiweb

Logiweb Help: The Logiweb protocol

Home. Help index. Up.

Overview

Logiweb servers communicate with each other and with Logiweb clients using 'Logiweb messages'.

A Logiweb message is a sequence of bytes, i.e. a sequence of bits whose length is a multiple of eight. On this page, messages are written as sequences of bits using mixed endian notation.

In the following, the syntax of messages is presented bottom up, meaning that the actual definition of the syntax of a messages comes last.

Table of contents

Cardinals and bit vectors
Identifiers
Message identifiers
Logiweb identifier
Event identifiers
Class identifiers
Operation identifiers
References
Pages
Parameters
Messages
   Semantics of nop requests
   Semantics of event responses
   Semantics of ping requests and responses
   Semantics of prefix messages
   Semantics of attribute requests and responses
   Semantics of notify messages

Cardinals and bit vectors

bit ::= 0 | 1
byte ::= bit*8
bytes ::= byte*
septet ::= bit*7
middle-septet ::= 1 septet
end-septet ::= 0 septet
cardinal ::= middle-septet* end-septet
vector ::= length bytes
length ::= cardinal

A byte denotes a value in 0..255. As an example, mixed endian 0000 0010 denotes 2.

A middle or end septet denotes a value in 0..127. As examples, the middle septet 1000 0010 and the end septet 0000 0010 both denote 2.

In this context, a 'Cardinal' is a non-negative integer. Cardinals are expressed little endian base 128. As an example,

1000 0011 0000 0010

denotes 3+128*2 = 3+256 = 259.

A 'vector' encodes a sequence of bits. The 'length' field of the vector indicates the number of bits in the vector. The 'bytes' field of the vector contains the actual bits, padded with zero to seven bits to make the length of the the 'bytes' field divisible by eight. As an example, consider the vector

<1,1,0,0, 0,0,0,0,  1,1,0,0, 0,1,0,0>

In mixed endian binary (each byte written in 'reverse' order with the MSB first and LSB last), the vector reads:

0000 0011  0010 0011

The vector above encode the three-bit sequence <1,1,0>. The length field contains binary 3. This is easiest to see in the mixed endian notation. The bytes field contains the three bits <1,1,0> followed by garbage. This is easiest to see in the bit-vector notation.

The grammar of messages is *not* context free because the 'vector' syntax class cannot be expressed in a context free manner.

Identifiers

x00 ::= 0000 0000 | 1000 0000 x00
x01 ::= 0000 0001 | 1000 0001 x00
x02 ::= 0000 0010 | 1000 0010 x00
x03 ::= 0000 0011 | 1000 0011 x00
x04 ::= 0000 0100 | 1000 0100 x00
x05 ::= 0000 0101 | 1000 0101 x00
x06 ::= 0000 0110 | 1000 0110 x00
x07 ::= 0000 0111 | 1000 0111 x00

Each non-negative integer can be represented in more than one way. As an example, the syntax class x05 covers all bit-vectors that represent the number five.

Identifiers are cardinals used for identifying something. When parsing an identifier one may choose to parse it as a cardinal, convert it to a number, and then do further processing based on that number.

Message identifiers

id-request-nop ::= x00
id-respond-event ::= x01
id-request-ping ::= x02
id-respond-ping ::= x03
id-request-attribute ::= x04
id-respond-attribute ::= x05
id-message-notify ::= x06
id-message-prefix ::= x07

At the time of writing, the Logiweb message protocol defines eight kinds of messages. These eight kinds of messages are named 'request-nop','respond-event', 'request-ping', 'respond-ping', 'request-attribute', 'respond-attribute', 'message-notify', and 'message-prefix'; they are identified by the identifiers above.

Logiweb identifier

L ::= 1100 1100
o ::= 1110 1111
g ::= 1110 0111
i ::= 1110 1001
w ::= 1111 0111
e ::= 1110 0101
b ::= 1110 0010
id-version ::= 0000 0001
id-logiweb ::= L o g i w e b id-version

The Logiweb identifier is a cardinal that a Logiweb server places in ping response messages in order to identify itself.

Event identifiers

sorry ::= x00
received ::= x01
rejected ::= x02
event ::= sorry | received | rejected

When a Logiweb server receives a Logiweb message from some client, it responds to that client with zero or one message. A Logiweb server never sends out more than one message in response to a single incomming message. This convention is made to avoid abuse of Logiweb servers for denial of service attacks.

When and if a Logiweb server responds to an incomming message, it occasionally responds using a 'respond-event' message, where the 'respond-event' message contains one 'event' out of the three possible events above.

A Logiweb server responds with a 'received' event when it wants to tell a client that it has received one message from that client. When a Logiweb server uses this response, it does not say anything about what it did to the incomming message. One reason for not saying anything could be that a more informative answer could be a give-away to malicious clients.

'Received' events are useful when using datagram protocols like UDP for carrying Logiweb messages. When a client receives a 'received' event it knows that the UDP packet found its way to the server and does not have to be retransmitted.

If a Logiweb server responds with a 'received' event to a message that is labelled by a prefix, then the received event will be labelled by the same prefix to allow identification (c.f. the treatment of prefix messages later).

A Logiweb server responds with a 'rejected' event when it wants to tell a client that it has received one message from that client and that it is never ever going to process it (so the client can spare resending it). A 'rejected' event typically means that the implementer of the client has made a bug such as letting the client send malformed messages.

A Logiweb server responds with a 'sorry' event when it wants to tell a client that it has received one message from that client but was unable to process it at this time. As an example, if a (possibly malicious) client starts sending billions of requests that require a response that is longer than the request, then the outgoing connection from the Logiweb server may saturate. In this case the Logiweb server may start responding with 'sorry' events since 'sorry' events are only two bytes long.

Class identifiers

update ::= x00
type ::= x01
left ::= x02
right ::= x03
sibling ::= x04
url ::= x05
class ::= update | type | left | right | sibling | url

When requesting information from a Logiweb server, the kind of information requested is identified by a 'class identifier'. At the time of writing, one may request six classes of information, the most important one being the 'url' class. The other classes are useful for Logiweb servers that cross index or mirror each other.

Operation identifiers

remove ::= x00
add ::= x01
operation ::= remove | add

One may request a Logiweb server to 'add' or 'remove' information from the internal state of the server. The most important use is to 'add' an association from a particular reference to a particular url or to 'remove' such an association.

Needless to say, Logiweb servers must be suspicious towards such requests and must have a strategy for rejecting malicious or false requests for update of the data base. The present implementation of the Logiweb server takes the very simple approach to take note of all 'add' and 'remove' requests that come from the host of the server itself, and to reject all others. A Logiweb server should always respond with a 'received' message to a request for changing the state as any response that is more informative could be a give-away to malicious clients.

References

reference ::= version key timestamp
version ::= id-version
key ::= byte*20
timestamp ::= mantissa exponent
mantissa ::= cardinal
exponent ::= cardinal

A reference consists of a version, a RIPEMD-160 hash key, and a time stamp. The version is a cardinal and must be equal to one. The RIPEMD-160 hash key contains 20 bytes (160 bits). The timestamp indicates the time of publication expressed in Logiweb time, i.e. as the number of seconds that have elapsed since International Atomic Time (TAI) 00:00:00 on Modified Julian Day (MJD) 0. The number of seconds is expressed on the form m*10^(-e) where m and e are cardinals.

Pages

vector ::= bibliography bytes
bibliography ::= reference reference* nil
nil ::= 0000 0000
default-format ::= dictionary flat-tree ignored
dictionary ::= entry* x00
entry ::= id arity
id ::= cardinal (whose value is non-zero)
arity ::= cardinal
flat-tree ::= cardinal*
ignored ::= bytes

A vector is the format in which Logiweb pages are stored on disk and transmitted over networks.

The syntax for vectors is irrelevant to the Logiweb protocol, but is included here because it just takes up a few lines in the present context.

A vector consists of a bibliography and some bytes. A bibliography consists of at least one reference.

Reference zero (the first reference) is the reference of the page itself. The key of reference zero must be the RIPEMD-160 hash key for all the bytes following the key (i.e. the timestamp of reference zero, the remainder of the bibliography, and the bytes following the bibliography).

Until further, the bytes following the bibliography have the format stated as default-format.

The dictionary is an association list that associates 'arities' to 'ids'.

The flat-tree is the Polish prefix representation of a body.

Each cardinal in the flat-tree represents one symbol where a symbol is a pair (r,i) of a reference cardinal r and an id i, which is also a cardinal. Each cardinal in the flat-tree has value i*b+x where i is the id of the symbol, b is the smallest power of two that is greater than or equal to the length of the bibliography, and x is an index into the bibliography. The index x is converted to a reference cardinal by looking up the x'th reference in the bibliography, appending a one-byte to it, and converting it to a cardinal using little endian radix 256 encoding.

Unpacking the Polish prefix representation into a tree requires knowledge of all arities of all symbols, i.e. knowledge of the dictionary of all pages mentioned in the bibliography.

Parameters

address ::= vector
value ::= vector
index ::= cardinal
attribute-count ::= cardinal

These four syntax classes are just aliases for cardinals and vectors; they are introduced to enhance the readability of the definitions below.

Messages

request-nop ::= id-request-nop
respond-event ::= id-respond-event event
request-ping ::= id-request-ping
respond-ping ::= id-respond-ping id-logiweb timestamp
request-attribute ::= id-request-attribute address class index
respond-attribute ::= id-respond-attribute address class index length attribute-count timestamp value
message-notify ::= id-message-notify address class operation value
message-prefix ::= id-message-prefix cardinal message
message ::= request-nop | respond-event | request-ping | respond-ping | request-attribute | respond-attribute | message-notify | message-prefix

Semantics of nop requests

A nop (No Operation) request asks a server to do nothing. A server does not respond to nop messages. Nop messages may be used for padding when using a connection based protocol for carrying Logiweb messages.

Semantics of event responses

Event responses are treated above in connection with the definition of the 'event' syntax class.

Semantics of ping requests and responses

A ping request asks 'who are you, and what time is it'. The associated ping response answers the first question using the Logiweb identifier and answers the second question by a timestamp. When mirroring a Logiweb server it is reasonable to start asking what time it is and then querying all information residing on the server. Later on, one may then do incremental mirroring by asking for all information that has changed since last.

Semantics of prefix messages

A prefix message attaches a label to a message. When a server receives message with a label, it copies the label into the response. This is useful when a client needs to know which responses match which requests. Requests tend to be self-contained so, in general, one does not need to have the request to understand the response. But occasionally it may be convenient or even necessary to be able to match requests and responses.

The prefix message is recursively defined so that one may attach an arbitrary number of labels to a message. Servers are expected to handle messages that are up to 65536 bytes long as a minimum. If the buffers of the server is exhausted by a single ingoing or outgoing message (e.g. because of too many labels), then the server is expected to respond with a 'rejected' message.

The ability to attach labels recursively may be used for relays that let Logiweb messages pass through a firewall since each relay may want to add a label to each incomming message and strip it off the associated response.

Semantics of attribute requests and responses

The request-attribute and respond-attribute messages allows a client to query the state of the server.

The state of the server associates values to addresses where values as well as addresses are bit vectors. For each address, the state may store values of several classes (at present, the classes are 'update', 'type', 'left', 'right', 'sibling', and 'url', c.f. the definition of 'class identifiers' above).

Furthermore, for each address and class, the state may associate more than one value, in which case each value is identified by an 'index' which is a cardinal. The indexing is chronological such that the value with index 1 is the oldest, the one with index 2 is the second oldest and so on. When requesting a value with index zero or an index larger than the number of values on store, the Logiweb server provides the newest available value.

A request-attribute message contains an address, a class, and an index, and asks the server to provide the associated value. If a Logiweb server is too pressed to respond to a request-attribute message, it should send a 'sorry' event response. Otherwise, it should respond with a respond-attribute message.

A respond-attribute message contains many fields. First, it contains the address, class, and index of the request-attribute.

If the state of the server did contain values for the given address and class, then the respond-attribute message contains the following: The 'length' field is a copy of the 'length' field of the given 'address'. The 'attribute-count' field contains the total number of attributes with the given address and class. The 'value' field contains the returned value. The 'timestamp' field indicates the time at which that value entered the state of the server. If the requested index is between one and attribute-count, inclusive, then the returned value is the n'th-oldest value on the given address of the given class where n is the value of the given 'index'. Otherwise, the returned value is the newest value on store.

If the state of the server did not contain values for the given address and class, but the server did contain a node for the given address, then the respond-attribute message contains the following: The 'length' field is a copy of the 'length' field of the given 'address'. The 'attribute-count' field is zero. The 'value' field contain the empty bit-vector. The 'timestamp' field is the current server time.

If the state did not contain a node for the given address, then the server finds the longest prefix of the address for which the server does have a node. Call that node the 'closest node'. In that case, the respond-attribute message contains the following: The 'length' field is the length of the prefix. The 'attribute-count' field is the number of sibling attributes of the closest node. If the closest node has no sibling attributes, then the 'value' field is the empty bit-vector and the 'timestamp' field contains the current server time. Otherwise, the 'value' field contains a random among the sibling attributes and the 'timestamp' field contains the timestamp of that attribute.

In the last possibility above, the respond-attribute message is used to redirect the client to another Logiweb server. The present implementation of the Logiweb server never does indirection since interserver communication is not yet implemented. When being redirected from server to server, the 'length' field should be strictly increasing for each redirection.

Semantics of notify messages

A notify message suggests that the server should consider to add or remove an association from the given address and class to the given value. What the server does to such suggestions is highly implementation dependent. The present server just does as told if the suggestion comes from the host on which the server runs and ignores the suggestion otherwise. Other strategies could be to verify the information, possibly in a separate, low priority process, and then add the information when verified.

Klaus Grue, GRD-2006-01-09