DNSDB Farsight Compatible Regular Expressions (FCRE) Reference Guide

Farsight Compatible Regular Expressions (FCRE)

DNSDB Farsight Compatible Regular Expressions (FCRE) provides regular expression (regexp) functionality for searching DNS hostnames and rdata values in DNSDB. The regexp searches are evaluated against the DNS master file form of the hostnames and rdata values, which by design contains only printable ASCII characters. All non-printable characters, including octets outside the ASCII range, are converted to “\DDD” escape sequences, where “DDD” is a three-digit decimal number per RFC 1035. This is only applicable to RData (RHS) queries.
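As a rough illustration of the conversion described above, the following Python sketch renders raw octets as printable ASCII with \DDD escapes. The function name to_escaped_form is hypothetical and not part of DNSDB. The sketch covers only the \DDD conversion of non-printable octets and ignores any other master file quoting rules.

```python
def to_escaped_form(data: bytes) -> str:
    """Render bytes as printable ASCII, using a backslash plus three
    decimal digits for every octet that is not printable ASCII."""
    out = []
    for octet in data:
        if 0x20 <= octet <= 0x7E:      # printable ASCII, space included
            out.append(chr(octet))
        else:                          # control characters and octets >= 0x7F
            out.append("\\%03d" % octet)
    return "".join(out)


print(to_escaped_form(b"a\tb"))          # the tab (decimal 9) becomes \009
print(to_escaped_form(b"caf\xc3\xa9"))   # each non-ASCII octet is escaped separately
```

A regexp is then written against the escaped text, not against the raw octets.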

For this limited use case, DNSDB FCRE provides a simplified subset of the POSIX Extended Regular Expression syntax, with the most notable restrictions being:

  1. Only printable characters are allowed in a regexp.
  2. Hexadecimal or octal escape sequences are not allowed in a regexp.
  3. Only special characters may be escaped with ‘\’. Note that ‘]’ and ‘}’ are not considered special characters, but ‘[’ and ‘{’ are.
  4. POSIX collating elements (e.g., [=ch=], [.a.]) in character classes are not supported. The sequences [= and [. are not allowed in character classes.
  5. As in POSIX regexps, the character ‘\’ has no special meaning within a character class, so the class [\w] matches the characters ‘\’ or ‘w’.
  6. Capturing groups and backreferences are not supported.

Note that restriction (3) means that PCRE extensions such as ‘\w’ and ‘\d’ are not allowed in FCRE regexps.
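The restrictions above can be pre-checked on the client side before a query is sent. The following Python sketch is an assumption-laden illustration, not the DNSDB validator: the names SPECIAL and find_problems are hypothetical, and only restrictions (1), (3), (4), and (5) are checked.

```python
SPECIAL = set("\\^$[.()|*?+{")  # characters that may follow an escaping backslash


def find_problems(pattern: str) -> list:
    """Return descriptions of restriction violations found in pattern."""
    problems = []
    if any(not (0x20 <= ord(ch) <= 0x7E) for ch in pattern):
        problems.append("non-printable character")
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # Restriction 3: only special characters may be escaped.
            if i + 1 >= n:
                problems.append("trailing backslash")
            elif pattern[i + 1] not in SPECIAL:
                problems.append("escaped non-special character: \\" + pattern[i + 1])
            i += 2
        elif ch == "[":
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":   # a leading ']' is a literal member
                i += 1
            # Scan to the closing ']'; a backslash in here is an ordinary character.
            while i < n and pattern[i] != "]":
                if pattern.startswith(("[=", "[."), i):
                    problems.append("collating element in character class")
                    i += 2
                elif pattern.startswith("[:", i):
                    end = pattern.find(":]", i + 2)
                    i = end + 2 if end != -1 else n
                else:
                    i += 1
            i += 1  # step over the closing ']'
        else:
            i += 1
    return problems


print(find_problems(r"^www\.[^.]+\.com\.$"))  # no problems reported
print(find_problems(r"\d+"))                  # PCRE-style escape is rejected
print(find_problems(r"[\w]"))                 # allowed: backslash is literal in a class
```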

Regexp Syntax

A regular expression is a string of printable characters, with the following characters given special meaning:

  • \ — Escape the next character, which must be a special character. A regexp may not end with an unescaped ‘\’, or contain an unescaped ‘\’ followed by a character other than ‘\’ or the characters listed below, except inside of a character class.

  • ^ — Matches the beginning of the subject string.

  • $ — Matches the end of the subject string.

  • [ — Begin a character class.

  • . — A special character class matching any character.

  • ( — Begin a sub-pattern. Sub-patterns may occur within other sub-patterns.

  • ) — End a sub-pattern.

  • | — Specify an alternative match. A pattern or subpattern matches if the pattern before or after the ‘|’ matches.

  • * — Match the previous character, character class, or subpattern zero or more times.

  • ? — Match the previous character, character class, or subpattern at most once.

  • + — Match the previous character, character class, or subpattern at least once.

  • { — If followed by a character other than a decimal digit, is treated as a literal ‘{’ character. Such a ‘{’ may be escaped with ‘\’ even though it is not technically a special character in this context.

    If followed by a decimal digit, it begins a bounded match specification. “{n}” matches exactly n repetitions of the previous character, character class, or subpattern. “{n,m}” with m >= n matches at least n but at most m repetitions.
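The bounded match forms can be tried out locally. The sketch below uses Python's re module purely as a stand-in, on the assumption that it behaves the same way as FCRE for these particular constructs; it is not the FCRE engine, and the sample strings are made up.

```python
import re

samples = ["ab", "aab", "aaab", "aaaab"]

# "{n}" requires exactly n repetitions; "{n,m}" allows between n and m.
exactly_two = re.compile(r"^a{2}b$")
two_or_three = re.compile(r"^a{2,3}b$")

matched_exact = [s for s in samples if exactly_two.search(s)]
matched_range = [s for s in samples if two_or_three.search(s)]
print(matched_exact)   # ['aab']
print(matched_range)   # ['aab', 'aaab']

# A "{" that is not followed by a decimal digit is a literal character.
literal_brace = re.search(r"a{x", "a{x") is not None
print(literal_brace)   # True
```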

Character Class Syntax

A character class is a set of characters enclosed between an opening ‘[’ and a closing ‘]’. Within the character class, the following characters are handled specially:

  • ^ — If the first character after the opening ‘[’, denotes a negated character class, i.e. a class which matches any character not listed in the remainder of the class.
  • ] — If the first character after the opening ‘[’ or ‘[^’, encodes a literal ‘]’ as a member of the class. A ‘]’ after the first character after the opening ‘[’ or ‘[^’ ends the character class.
  • - — If the first character after the opening ‘[’ or ‘[^’ or the last character before the closing ‘]’, encodes a literal ‘-’ as a member of the character class. If between two characters A and B, encodes the range of characters between A and B, inclusive, as members of the character class. The character A must occur before B in ASCII encoding.

The sequences [. and [= are not allowed between the opening [ or [^ and the closing ], to prevent confusion with unsupported POSIX collation sequences and collation classes.
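The placement rules for ^, ], and - can be checked with a few small patterns. This sketch uses Python's re module as a stand-in, assuming it treats these three cases the same way as FCRE. Python differs from FCRE in that it treats a backslash inside a character class as an escape, so the examples avoid backslashes in classes.

```python
import re

negated = re.compile(r"^[^0-9]+$")   # leading '^' negates: anything except digits
bracket = re.compile(r"^[]x]+$")     # leading ']' is a literal member of the class
dash = re.compile(r"^[-a-c]+$")      # leading '-' is literal; a-c is a range

print(bool(negated.search("abc")))    # True
print(bool(negated.search("ab1")))    # False, contains a digit
print(bool(bracket.search("x]]x")))   # True
print(bool(dash.search("a-b-c")))     # True
print(bool(dash.search("a-d")))       # False, 'd' is outside the range a-c
```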

If the sequence [: appears in a character class, it must be the beginning of one of the following POSIX character classes:

  • [:alnum:] — Alphanumeric characters 0-9, A-Z, and a-z
  • [:alpha:] — Alphabetic characters A-Z, a-z
  • [:blank:] — Blank characters (space and tab)
    • Only printable characters occur in searchable strings and space is the only printable whitespace character, thus use of [:blank:] is equivalent to a space character.
    • Tabs in data appear as the escape sequence \009. Because the backslash itself must be escaped in a regexp, match them with \\009.
  • [:cntrl:] — Control characters
    • Only printable characters occur in searchable strings, so [:cntrl:] will not match any characters.
    • Control characters in data will appear as \DDD escape sequences. To match one of those, you will need to backslash-quote the backslash. Match with \\[[:digit:]]{3} in a regular expression.
  • [:digit:] — Decimal digits 0-9
  • [:graph:] — Any printable character other than space.
    • Only printable characters occur in searchable strings, thus a character class containing [:graph:] is equivalent to [^ ] (negated character class containing only a space).
  • [:lower:] — Lower case alphabetic characters a-z
    • Hostnames will be folded to lower case, thus use of [:lower:] is equivalent to [:alpha:].
  • [:print:] — Any printable character
    • Only printable characters occur in searchable strings, so [:print:] will match any character.
  • [:punct:] — Punctuation characters (printable characters other than space and [:alnum:])
  • [:space:] — Any whitespace character (tab, newline, vertical tab, form feed, carriage return, and space)
    • The space character is the only printable whitespace character, thus use of [:space:] is equivalent to a space character.
    • Tabs in data appear as the escape sequence \009 and can be matched with \\009. The other whitespace characters can be matched in the same way, using their three-digit decimal escape sequences.
  • [:upper:] — Upper case alphabetic characters A-Z
    • Since all of our data is indexed as lower-case, this is not useful as it is equivalent to [:lower:].
  • [:xdigit:] — Hexadecimal digits 0-9, a-f, A-F

The above named character classes must appear inside an enclosing [ and ], e.g. [[:digit:][:punct:]] to match a digit or punctuation character. Without the enclosing brackets, [:digit:] is an ordinary character class that matches any one of the characters :, d, i, g, or t.
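Python's re module does not understand POSIX named classes, so testing such patterns locally requires expanding them first. The sketch below is one way to do that under stated assumptions: POSIX_CLASSES and expand_named_classes are hypothetical names, only a subset of the classes is covered, and the plain textual replacement assumes every named class already sits inside enclosing brackets.

```python
import re
import string

# Explicit members for some of the named classes (a subset, for illustration).
POSIX_CLASSES = {
    "[:alnum:]": "0-9A-Za-z",
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:xdigit:]": "0-9A-Fa-f",
    "[:punct:]": re.escape(string.punctuation),
}


def expand_named_classes(pattern: str) -> str:
    """Replace each named class with its members so Python's re can parse it."""
    for name, members in POSIX_CLASSES.items():
        pattern = pattern.replace(name, members)
    return pattern


digits_or_punct = expand_named_classes("^[[:digit:][:punct:]]+$")
print(bool(re.search(digits_or_punct, "10.2.0.0/16")))   # True
print(bool(re.search(digits_or_punct, "abc")))           # False

# A \DDD escape sequence in the data: an escaped backslash, then three digits.
escape_seq = expand_named_classes(r"\\[[:digit:]]{3}")
print(escape_seq)                                        # \\[0-9]{3}
print(bool(re.search(escape_seq, "a\\009b")))            # True
```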

Neither the above character classes nor a character range may begin or end a character range. For example, the character class expressions [0-[:alpha:]] and [a-n-z] are invalid.

All other characters between the opening [ or [^ and the closing ] are added to the character class, including the backslash \ character.

There is no way to express a character class containing a single ^ character: an escaped \^ should be used instead of a character class.

Important notes

  • Regular expression searches are not case sensitive.
  • Regular expression patterns are not “anchored” front and back by default. (This is a major difference from glob searches.)
  • To exactly match a literal . (such as between labels in a DNS name), you need to backslash-quote the ., for example google\.com. This is not necessary if the . is inside a character class, for example foo[._-]bar (the - is placed last so that it is a literal rather than a range operator). If you don’t backslash-quote the ., for example google.com, then it will match ‘googlexcom’, ‘google_com’, etc.
  • All rrnames (i.e. hostnames) in the DNS dataset end in a ., which must be accounted for in regular expressions.
  • All well-formed rdata we currently index in the DNS dataset ends in a . or a ", which should be accounted for in regular expressions.
  • There must be at least two consecutive non-wildcard characters in the pattern.
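The effect of these notes can be seen on a handful of made-up names. The sketch uses Python's re module with re.IGNORECASE as an approximation of FCRE behaviour; the names list and the search_all helper are illustrative assumptions, not DNSDB data or API.

```python
import re

# Hypothetical rrnames, each with the trailing dot used in the DNS dataset.
names = ["google.com.", "googlexcom.net.", "mail.google.com."]


def search_all(pattern):
    """Return the names matched by an unanchored, case-insensitive search."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]


unescaped = search_all(r"google.com")        # '.' matches any character
escaped = search_all(r"google\.com")         # literal dot, still unanchored
upper = search_all(r"GOOGLE\.COM")           # case does not matter
anchored = search_all(r"^google\.com\.$")    # anchored, trailing dot included
no_dot = search_all(r"google\.com$")         # forgets the trailing dot

print(unescaped)   # all three names
print(escaped)     # ['google.com.', 'mail.google.com.']
print(anchored)    # ['google.com.']
print(no_dot)      # []
```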

Examples

Some example regular expressions and some of the matching values:

  • www\..*\.com — Hostnames with a label ending in “www”, followed later by a label starting with “com”. The pattern is not anchored, so the match may occur anywhere in the name.
    • www.example.com.
    • dev-www.subdomain.example.com.
    • www.example.com.cdn.net.
    • stage-www.dev.community.org.
  • ^www\..*\.com$ — Hostnames starting with “www.” and ending in “.com”.
    • No Results
      • Hostnames in the DNS dataset contain a trailing “.”, which must be accounted for in regexps. A search for “^www\..*\.com$” will not match any hostnames.
  • ^www\..*\.com\.$ — Hostnames starting with “www.” and ending in “.com.”
    • www.example.com.
    • www.subdomain.example.com.
  • ^www\.[^.]+\.com\.$ — Hostnames starting with “www.” and ending in “.com.”, with exactly one label and no other dots in between.
    • www.example.com.
    • www.other-domain.com.
  • ^((dev|stage)-)?www\.[^.]+\.(net|edu)\.$ — Hostnames starting with “www” optionally preceded by a “dev-” or “stage-” prefix in a .net or .edu domain.
    • www.college.edu.
    • dev-www.isp.net.
  • ^"v=spf1 .* ~all"$ — TXT records encoding an SPF policy with a ~all default
    • "v=spf1 a mx ~all"
    • "v=spf1" " a " "10.2.0.0/16" " ~all"
  • (^|[-._])star([-_]?)z[-._] — Hostnames that start with “star”, or have “star” as a label or otherwise separate from other letters/digits, followed by an optional dash or underscore, then a z, then a period, dash or underscore. This might be used to look for a visibly embedded trademark.
    • star-z.at.
    • edge-star-z-mini-shv-02-mia3.goldmansachs.de.
    • starz.webex.com.
    • shooting-starz.tv.
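Several of the examples above can be checked locally before a query is run. The sketch below again uses Python's re module as an approximation of FCRE, which is an assumption that holds only for the constructs used here; the CASES table is illustrative and pairs some of the patterns with sample values taken from the lists above.

```python
import re

# Pattern -> sample values expected to match (hostnames carry the trailing dot).
CASES = {
    r"www\..*\.com": ["www.example.com.cdn.net.", "stage-www.dev.community.org."],
    r"^www\.[^.]+\.com\.$": ["www.example.com.", "www.other-domain.com."],
    r"^((dev|stage)-)?www\.[^.]+\.(net|edu)\.$": ["www.college.edu.", "dev-www.isp.net."],
    r'^"v=spf1 .* ~all"$': ['"v=spf1 a mx ~all"'],
    r"(^|[-._])star([-_]?)z[-._]": [
        "star-z.at.",
        "edge-star-z-mini-shv-02-mia3.goldmansachs.de.",
        "starz.webex.com.",
        "shooting-starz.tv.",
    ],
}

for pattern, values in CASES.items():
    rx = re.compile(pattern, re.IGNORECASE)
    for value in values:
        status = "match" if rx.search(value) else "NO MATCH"
        print(f"{status:8}  {pattern}  {value}")

# Forgetting the trailing dot yields nothing, as noted for the "No Results" case.
missing_dot = re.search(r"^www\..*\.com$", "www.example.com.")
print(missing_dot)   # None
```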

Additional Information