
Network Message, Volume 3: Headers and Encoding

Abstract

This article is the third in a multi-part blog series intended to introduce the reader to Farsight Security’s NMSG suite. It explores some of the low-level implementation details of the NMSG protocol, including header composition and data encoding.

Before reading this article, it is recommended that you read Farsight’s Network Message, Volume 1: Introduction to NMSG and Farsight’s Network Message, Volume 2: Introduction to nmsgtool. This article covers NMSG (protocol) version 2 and nmsg (C library) version 0.9.1.

The NMSG header

NMSG units begin with a small 10-octet header, as depicted below:

      0                   1                   2                   3   
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'N'      |      'M'      |      'S'      |      'G'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Flags     |    Version    |            Length             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Length (cont)         |           Payload(s)
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . . . . . . . . .

The NMSG header always starts with the four-octet magic value: N M S G. The Flags octet is next. Depending on whether the payload is a fragment and/or compressed, it can have one, both, or neither of the following flags set:

  • NMSG_FLAG_ZLIB: Payload(s) is/are compressed.
  • NMSG_FLAG_FRAGMENT: Payload is a fragment.

The Version octet should be 2. The final header field, Length, is an unsigned four-octet integer in network byte order that holds the length in octets of the payload(s).
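
The header layout can be sketched in a few lines of Python. This is an illustration, not the nmsg library's own code: the function names are hypothetical, and the flag bit values (0x01 and 0x02) are assumptions that should be checked against the constants shipped with the library.

```python
import struct

NMSG_MAGIC = b"NMSG"
NMSG_VERSION = 2

# Assumed bit values for illustration; confirm against the nmsg constants.
NMSG_FLAG_ZLIB = 0x01
NMSG_FLAG_FRAGMENT = 0x02

# "!" selects network byte order with no padding:
# 4s = magic, B = flags, B = version, I = 32-bit payload length.
HEADER = struct.Struct("!4sBBI")


def build_header(payload_len, flags=0):
    """Return the 10-octet header for a payload of payload_len octets."""
    return HEADER.pack(NMSG_MAGIC, flags, NMSG_VERSION, payload_len)


def parse_header(data):
    """Validate and decode the header found at the start of data."""
    if len(data) < HEADER.size:
        raise ValueError("short read: need 10 octets of header")
    magic, flags, version, length = HEADER.unpack_from(data)
    if magic != NMSG_MAGIC:
        raise ValueError("bad magic value")
    if version != NMSG_VERSION:
        raise ValueError("unsupported NMSG version")
    return {
        "compressed": bool(flags & NMSG_FLAG_ZLIB),
        "fragment": bool(flags & NMSG_FLAG_FRAGMENT),
        "version": version,
        "length": length,
    }
```

A reader would typically call parse_header() on the first 10 octets of a unit and then read exactly "length" further octets before handing them to the protobuf decoder.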

NMSG payload(s) are encoded using Google Protocol Buffers. They are introduced in the next section.

Google Protocol Buffers

Google Protocol Buffers (sometimes referred to as “protobufs”) are an efficient, language- and platform-neutral way to serialize arbitrary structured data. Protobufs are comparable to XML but smaller, faster, and more efficient. This makes them an ideal solution for encoding the variably typed data that flows through our Security Information Exchange (SIE).

To use protobufs in a program (or library code such as nmsg), the programmer first needs to define what the source data looks like. Again using XML as the model, this definition is similar to an XML schema. It is written using a simple specification language and saved to a text file with a .proto extension. Once defined, this file is compiled using one of the protobuf compilers. This produces header and source files containing the API to serialize the data.

The nmsg library is written in C so it uses the protobuf-c compiler to generate the API code for its protobuf serialization code.

If you want to learn more, Google maintains great documentation. The following protobuf-heavy sections will make more sense if you are familiar with the .proto specification language.

NMSG Protobuf Data

After the header, the first protobuf-encoded message will be either of type Nmsg (which carries one or more NmsgPayload messages) or of type NmsgFragment (which carries one fragment of a larger NMSG). Both are discussed below.

The .proto definition for Nmsg is shown below:

    message Nmsg
    {
        repeated NmsgPayload    payloads     = 1;
        repeated uint32         payload_crcs = 2;
        optional uint32         sequence     = 3;
        optional uint64         sequence_id  = 4;
    }
  • payloads: The actual NMSG payloads. The .proto definition for these is explained below.
  • payload_crcs: CRCs of the payloads, used for error detection.
  • sequence: The optional sequence number.
  • sequence_id: The optional sequence number space. This is a randomized 64-bit number identifying the sequence number space that the ‘sequence’ parameter exists in. The sequence_id is used by NMSG consumers to uniquely ID sequence number “flows”.
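
To illustrate how a consumer might use these two fields together, here is a minimal sketch of per-flow gap detection. It is not taken from the nmsg library: the class and method names are hypothetical, and it assumes that sequence increases by one per Nmsg container and wraps at 2^32.

```python
class SequenceTracker:
    """Tracks the next expected sequence number for each sequence_id."""

    def __init__(self):
        self._expected = {}  # sequence_id -> next expected sequence

    def observe(self, sequence_id, sequence):
        """Return how many containers appear to be missing before this one."""
        expected = self._expected.get(sequence_id)
        # Assumption: the counter is 32 bits wide and wraps around to zero.
        self._expected[sequence_id] = (sequence + 1) & 0xFFFFFFFF
        if expected is None:
            return 0  # first container seen in this sequence number space
        return (sequence - expected) & 0xFFFFFFFF
```

Because each flow is keyed by its sequence_id, two senders that both happen to start counting at zero are never confused with one another. This sketch does not handle reordered or duplicated containers, which would show up as a very large gap.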

If the NMSG_FLAG_FRAGMENT flag is set in the NMSG header, then the data part is an NmsgFragment protobuf message, as shown below:

    message NmsgFragment
    {
        required uint32         id           = 1;
        required uint32         current      = 2;
        required uint32         last         = 3;
        required bytes          fragment     = 4;
        optional uint32         crc          = 5;
    }
  • id: Fragment ID used by all fragments in this group (chosen at random by the NMSG library).
  • current: The position of this fragment in the list.
  • last: The position of the last fragment in the list.
  • fragment: The actual fragment bytes.
  • crc: The CRC of the reassembled NMSG.
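
The fields above are enough to reassemble a fragmented NMSG. The following is a minimal sketch, not the library's implementation: the class name is hypothetical, it assumes that current counts up from zero to last, and it omits the CRC check and any timeout for groups that never complete.

```python
class Reassembler:
    """Collects NmsgFragment pieces and joins them once a group is complete."""

    def __init__(self):
        self._groups = {}  # fragment id -> {current: fragment bytes}

    def add(self, frag_id, current, last, fragment):
        """Store one fragment; return the reassembled bytes, or None if incomplete."""
        if current > last:
            raise ValueError("current fragment index exceeds last")
        parts = self._groups.setdefault(frag_id, {})
        parts[current] = fragment
        # Assumption: fragments are numbered 0..last inclusive.
        if not all(i in parts for i in range(last + 1)):
            return None
        del self._groups[frag_id]
        return b"".join(parts[i] for i in range(last + 1))
```

Keying on id lets fragments from several groups interleave on the same stream, and indexing by current means they can arrive out of order.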

The NmsgPayload messages contain payload data and are defined as follows:

    message NmsgPayload
    {
        required uint32         vid          = 1;
        required uint32         msgtype      = 2;
        required int64          time_sec     = 3;
        required fixed32        time_nsec    = 4;
        optional bytes          payload      = 5;
        optional uint32         source       = 7;
        optional uint32         operator     = 8;
        optional uint32         group        = 9;
    }
  • vid: The Farsight-assigned NMSG vendor ID. Each vendor has a printable name (base, SIE, etc.) and a corresponding numeric code, which is the value used here.
  • msgtype: A vendor-specific message type code that signals the serialization type used to encode the payload. Like vid, msgtype has a printable name (dns, encode, ipconn, etc.) and a corresponding numeric code. The message types are described in more detail below. The vid together with the msgtype can be used to determine the type of data contained in the payload.
  • time_sec: The seconds portion of the timestamp of when the payload was generated.
  • time_nsec: The nanoseconds portion of the timestamp of when the payload was generated.
  • payload: The actual NMSG payload data.
  • source: Optional user-defined unsigned 32-bit value, used to uniquely identify an organization submitting data to SIE.
  • operator: Optional unsigned 32-bit value, used to further differentiate the sender of the data. Value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.opalias file.
  • group: Optional user-defined unsigned 32-bit value, used for fine-grained winnowing. Value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.gralias file.
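
To make the on-the-wire form of an NmsgPayload concrete, the sketch below hand-encodes its first five fields using the standard protobuf wire format (varints, a little-endian fixed32, and a length-delimited bytes field). In practice the generated protobuf code does this work. The function names here are hypothetical, and the vid and msgtype values in the usage note are arbitrary placeholders rather than real assigned codes.

```python
import struct

# Protobuf wire types used by the NmsgPayload fields encoded here.
VARINT, LENGTH_DELIMITED, FIXED32 = 0, 2, 5


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more octets follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """Each field is prefixed with (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_payload(vid, msgtype, time_sec, time_nsec, payload=None):
    """Serialize fields 1 through 5 of an NmsgPayload message."""
    out = encode_tag(1, VARINT) + encode_varint(vid)
    out += encode_tag(2, VARINT) + encode_varint(msgtype)
    # int64: negative values are sent as 64-bit two's complement varints.
    out += encode_tag(3, VARINT) + encode_varint(time_sec & 0xFFFFFFFFFFFFFFFF)
    # fixed32 is always four octets, little-endian.
    out += encode_tag(4, FIXED32) + struct.pack("<I", time_nsec)
    if payload is not None:
        out += encode_tag(5, LENGTH_DELIMITED) + encode_varint(len(payload)) + payload
    return out
```

For example, encode_payload(1, 11, 1, 2, b"hi") yields the 15 octets 08 01 10 0b 18 01 25 02 00 00 00 2a 02 68 69. The optional source, operator, and group fields (numbers 7, 8, and 9) would be appended as further varint fields in the same way.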

Base Message Modules

Accompanying nmsg are the vendor base encoding modules. These provide protobuf serialization for a handful of common use cases. Currently included are the following modules:

  • dns: For encoding DNS RRs, RRsets, and question RRs.
  • dnsqr: For capturing DNS query/response state. This message type is used by Farsight’s Passive DNS sensors.
  • email: For describing email message metadata relating to unsolicited email messages (colloquially referred to as “spam”).
  • encode: For encapsulating data in other generic formats for transport across SIE. Supported are text, JSON, YAML, MsgPack, and XML.
  • http: For representing hits to HTTP sinkholes.
  • ipconn: For describing an IP connection, a five tuple that includes the transport layer protocol.
  • linkpair: For representing links between web pages.
  • logline: For representing a single line from a log file (e.g., syslog).
  • ncap: For representing legacy NCAP data.
  • packet: For representing an IPv4 or IPv6 packet.
  • pkt: A legacy encoder for representing packet data, deprecated in favor of packet.
  • xml: For representing XML data.

SIE Message Modules

Farsight maintains a separate package, sie-nmsg, that contains a group of message module plug-ins specifically designed for Farsight’s SIE. These plug-ins are:

  • delay: A legacy encoder used to generate a reduction of SIE Channel 202 containing transaction latencies.
  • dnsdedupe: For encoding de-duplicated and de-duplicated/verified Passive DNS traffic.
  • newdomain: For encoding Newly Observed Domains (NOD) traffic.
  • qr: A legacy encoder intended for use with an early version of the DNSDB lookup server.
  • reputation: For encoding Distributed Reputation Whiteboard data, an experimental service developed by Farsight to facilitate the real-time sharing of reputation data without a priori knowledge of data types.

Coming up

The next article in the NMSG series will introduce the libnmsg C programming API.

Mike Schiffman is a Protocol Legerdemainist for Farsight Security, Inc.

Read the next part in this series: Farsight’s Network Message, Volume 4: The C Programming API