Farsight TXT Record

Network Message, Volume 3: Headers and Encoding

Published on: Feb 11, 2015

Abstract

This article is the third in a multi-part blog series intended to introduce and acquaint the user with Farsight Security’s NMSG suite. This article explores some of the low-level implementation details of the NMSG protocol, including header composition and data encoding.

Before reading this article, it is recommended that you read Farsight’s Network Message, Volume 1: Introduction to NMSG and Farsight’s Network Message, Volume 2: Introduction to nmsgtool. This article covers NMSG (protocol) version 2 and nmsg (C library) version 0.9.1.

The NMSG header

NMSG units begin with a small 10 octet header as depicted below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'N'      |      'M'      |      'S'      |      'G'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |    Version    |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Length (cont)         |          Payload(s)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . . . . . . . . .

The NMSG header always starts with the four octet magic value: NMSG. The Flags octet is next, and depending on whether the payload(s) is a fragment and/or compressed, it can be one, both, or none of the following:

  • NMSG_FLAG_ZLIB: Payload(s) is/are compressed.
  • NMSG_FLAG_FRAGMENT: Payload is a fragment.

The Version octet should be 2. The final header field, Length, is an unsigned four octet integer in network byte order that holds the length in octets of the payload(s).
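The header layout described above can be sketched in a few lines of Python using the standard struct module. This is only an illustration of the field order and byte order, not the nmsg library's own code. The function names pack_header and parse_header are hypothetical, and the numeric values chosen for the two flag bits are assumptions, since the article names the flags but does not give their values.

```python
import struct

NMSG_MAGIC = b"NMSG"
NMSG_VERSION = 2
HEADER_LEN = 10  # 4 (magic) + 1 (flags) + 1 (version) + 4 (length)

# Assumed bit values: the article names these flags but not their values.
NMSG_FLAG_ZLIB = 0x01
NMSG_FLAG_FRAGMENT = 0x02

# ">" selects network byte order with no padding:
# 4s = magic, B = flags, B = version, I = unsigned 32-bit length
_HEADER = struct.Struct(">4sBBI")


def pack_header(flags, payload_len):
    """Build the 10 octet header for a payload of payload_len octets."""
    return _HEADER.pack(NMSG_MAGIC, flags, NMSG_VERSION, payload_len)


def parse_header(data):
    """Validate and decode the header at the start of data."""
    if len(data) < HEADER_LEN:
        raise ValueError("need at least 10 octets for an NMSG header")
    magic, flags, version, length = _HEADER.unpack_from(data)
    if magic != NMSG_MAGIC:
        raise ValueError("bad magic value: %r" % magic)
    if version != NMSG_VERSION:
        raise ValueError("unsupported NMSG version: %d" % version)
    return {
        "compressed": bool(flags & NMSG_FLAG_ZLIB),
        "fragment": bool(flags & NMSG_FLAG_FRAGMENT),
        "version": version,
        "length": length,
    }
```

A reader would typically call parse_header on the first 10 octets of a unit and then read exactly "length" more octets to obtain the payload(s).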

NMSG payload(s) are encoded using Google Protocol Buffers. They are introduced in the next section.

Google Protocol Buffers

Google Protocol Buffers (sometimes referred to as “protobufs”) are an efficient, language and platform neutral way to serialize arbitrary structured data. Protobufs are comparable to XML but smaller, faster, and more efficient. This makes them an ideal solution to encode the variably typed data that flows through our Security Information Exchange (SIE).

To use protobufs in a program (or library code such as nmsg), the programmer first needs to define what the source data looks like. Again using XML as the model, this definition is similar to an XML schema. It is written using a simple specification language and saved to a text file with a .proto extension. Once defined, this file is compiled using one of the protobuf compilers. This produces header and source files containing the API to serialize the data.

The nmsg library is written in C, so it uses the protobuf-c compiler to generate the API code for its protobuf serialization code.

If you want to learn more, Google maintains great documentation. The following protobuf-heavy sections will make more sense if you are familiar with the .proto specification language.

NMSG Protobuf Data

After the header, the first protobuf encoded message will either be of type Nmsg (which carries one or more NmsgPayload messages) or NmsgFragment (which carries one fragment of a larger NMSG). Both are discussed below.

The .proto definition for Nmsg is shown below:

message Nmsg
{
    repeated NmsgPayload payloads = 1;
    repeated uint32 payload_crcs = 2;
    optional uint32 sequence = 3;
    optional uint64 sequence_id = 4;
}

  • payloads: The actual NMSG payloads. The .proto for these is explained below.
  • payload_crcs: CRCs of the payloads, used for error detection.
  • sequence: The optional sequence number.
  • sequence_id: The optional sequence number space. This is a randomized 64-bit number identifying the sequence number space that the ‘sequence’ parameter exists in. The sequence_id is used by NMSG consumers to uniquely ID sequence number “flows”.
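To make the Nmsg layout more concrete, the following Python sketch walks the top-level fields of a serialized protobuf message by hand. It relies only on the general protobuf wire format, in which each field starts with a varint key holding the field number and a wire type. The helper names read_varint, iter_fields, and summarize_nmsg are hypothetical and are not part of the nmsg library; production code would normally use bindings generated from the .proto file instead.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint (uint32, uint64, int64, ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:      # length-delimited (bytes, sub-messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:      # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        yield field_number, wire_type, value


def summarize_nmsg(buf):
    """Collect the Nmsg fields by the field numbers in the .proto above."""
    summary = {"payloads": [], "sequence": None, "sequence_id": None}
    for number, _wire_type, value in iter_fields(buf):
        if number == 1:
            summary["payloads"].append(value)  # serialized NmsgPayload
        elif number == 3:
            summary["sequence"] = value
        elif number == 4:
            summary["sequence_id"] = value
        # field 2 (payload_crcs) is skipped in this sketch
    return summary
```

Each entry collected under "payloads" is itself a serialized NmsgPayload, so iter_fields can be applied to it again to reach the inner fields.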

If the NMSG_FLAG_FRAGMENT flag is set in the NMSG header, then the data part is an NmsgFragment protobuf message, as shown below:

message NmsgFragment
{
    required uint32 id = 1;
    required uint32 current = 2;
    required uint32 last = 3;
    required bytes fragment = 4;
    optional uint32 crc = 5;
}

  • id: Fragment ID used by all fragments in this group (chosen at random by the NMSG library).
  • current: The current fragment in the list.
  • last: The last fragment in the list.
  • fragment: The actual fragment bytes.
  • crc: The CRC of the reassembled NMSG.
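The fragment fields suggest a simple reassembly scheme: group fragments by id, wait until every index up to last has arrived, then concatenate the fragment bytes in order. The Python sketch below illustrates that idea only. The class name FragmentReassembler is hypothetical, the code assumes that current counts from 0 up to and including last, and it skips the CRC check because the article does not specify the CRC algorithm. It is not the nmsg library's implementation.

```python
class FragmentReassembler:
    """Collect NmsgFragment pieces and join them once a group is complete."""

    def __init__(self):
        # Maps fragment group id -> {current index: fragment bytes}
        self._groups = {}

    def add(self, frag_id, current, last, fragment):
        """Store one fragment.

        Returns the reassembled bytes when the group is complete,
        otherwise None. Assumes indices run from 0 to last inclusive.
        """
        if not 0 <= current <= last:
            raise ValueError("fragment index out of range")
        parts = self._groups.setdefault(frag_id, {})
        parts[current] = fragment
        if len(parts) < last + 1:
            return None  # still waiting for more fragments
        del self._groups[frag_id]
        return b"".join(parts[i] for i in range(last + 1))
```

Fragments may arrive out of order, so the sketch keys each piece by its index rather than appending in arrival order. A real consumer would also want to expire incomplete groups after a timeout.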

The NmsgPayload messages contain payload data and are defined as follows:

message NmsgPayload
{
    required uint32 vid = 1;
    required uint32 msgtype = 2;
    required int64 time_sec = 3;
    required fixed32 time_nsec = 4;
    optional bytes payload = 5;
    optional uint32 source = 7;
    optional uint32 operator = 8;
    optional uint32 group = 9;
}

  • vid: The vendor ID. Each vendor has a printable name (base, SIE, etc) and corresponding codes which are used here.
  • msgtype: A vendor-specific message type code that signals the serialization type used to encode the payload. Like vid, msgtype has a printable name (dns, encode, ipconn, etc) and corresponding codes. They are defined in more detail below. The vid together with the msgtype can be used to determine the type of data contained in the payload.
  • time_sec: Seconds timestamp of when the payload was generated.
  • time_nsec: Nanoseconds timestamp of when the payload was generated.
  • payload: The actual NMSG payload data.
  • source: Optional user-defined unsigned 32-bit value, used to uniquely identify an organization submitting data to SIE.
  • operator: Optional unsigned 32-bit value, used to further differentiate the sender of the data. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.opalias file.
  • group: Optional user-defined unsigned 32-bit value, used for fine-grained winnowing. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.gralias file.
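Because the payload time is split across time_sec and time_nsec, a consumer usually recombines the two before displaying or comparing timestamps. The Python sketch below shows one way to do that. It assumes time_sec counts seconds since the Unix epoch in UTC, which the article does not state explicitly, and the function name payload_timestamp is hypothetical.

```python
from datetime import datetime, timezone

NSEC_PER_SEC = 1_000_000_000


def payload_timestamp(time_sec, time_nsec):
    """Combine time_sec and time_nsec from an NmsgPayload.

    Returns (dt, total_ns): a UTC datetime truncated to microseconds
    (the finest resolution datetime supports) and the full-precision
    timestamp as an integer count of nanoseconds.
    """
    if not 0 <= time_nsec < NSEC_PER_SEC:
        raise ValueError("time_nsec must be in [0, 1e9)")
    total_ns = time_sec * NSEC_PER_SEC + time_nsec
    dt = datetime.fromtimestamp(time_sec, tz=timezone.utc)
    dt = dt.replace(microsecond=time_nsec // 1000)
    return dt, total_ns
```

Keeping the integer nanosecond value alongside the datetime avoids losing precision when payloads need to be ordered or compared.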

Base Message Modules

Accompanying nmsg are the vendor base encoding modules. These provide protobuf serialization for a handful of common use cases. Currently included are the following modules:

  • dns: For encoding DNS RRs, RRsets, and question RRs.
  • dnsqr: For capturing DNS query/response state. This message type is used by Farsight’s Passive DNS sensors.
  • email: For describing email message metadata relating to unsolicited email messages (colloquially referred to as “spam”).
  • encode: For encapsulating data in other generic formats for transport across SIE. Supported are text, JSON, YAML, MsgPack, and XML.
  • http: For representing hits to HTTP sinkholes.
  • ipconn: For describing an IP connection, a five tuple that includes the transport layer protocol.
  • linkpair: For representing links between web pages.
  • logline: For representing a single line from a log file (e.g., syslog).
  • ncap: For representing legacy NCAP data.
  • packet: For representing an IPv4 or IPv6 packet.
  • pkt: A legacy encoder for representing packet data, deprecated in favor of packet.
  • xml: For representing XML data.

SIE Message Modules

Farsight maintains a separate package, sie-nmsg, that contains a group of message module plug-ins specifically designed for Farsight’s SIE. These plug-ins are:

  • delay: A legacy encoder used to generate a reduction of SIE Channel 202 containing transaction latencies.
  • dnsdedupe: For encoding de-duplicated and de-duplicated/verified Passive DNS traffic.
  • newdomain: For encoding Newly Observed Domains (NOD) traffic.
  • qr: A legacy encoder intended for use with an early version of the DNSDB lookup server.
  • reputation: For encoding Distributed Reputation Whiteboard data, an experimental service developed by Farsight to facilitate the real-time sharing of reputation data without a priori knowledge of data types.

Coming up

The next article in the NMSG series will introduce the libnmsg C programming API.

Mike Schiffman is a Protocol Legerdemainist for Farsight Security, Inc.

Read the next part in this series: Farsight’s Network Message, Volume 4: The C Programming API