
As part of our research for “Farsight Security Global Internationalized Domain Name (IDN) Homograph, Q2 2018 Report”, Farsight Security discovered a bug in the popular libidn and libidn2 C libraries, which are used to build Internationalized Domain Name in Applications (IDNA)-aware software. Depending on how the code is written, this bug could lead to a security vulnerability in trusting applications. It occurs in the Punycode decoder when pathological inputs decode to illegal Unicode code point values.
While we worked closely with the vendor to report and patch the vulnerability, it is important for application programmers and end-users to patch their code.
To get the most from this article, the reader should be familiar with the following technologies:
The functions responsible for decoding Punycode into Unicode in both libidn and libidn2 can be coerced to generate invalid Unicode code point values yet return successfully. These resultant code point values are larger than the maximum valid Unicode code point of 0x10FFFF (1,114,111) and, depending on how they are subsequently treated by application code, may result in a program crash or other undefined behavior, including possible arbitrary code execution.
The simplest Punycode string that triggers this behavior is xn--0000h, which decodes to a single “code point” value of U+127252 (1,208,914), which is not a legal Unicode code point. This is shown below using a simple test program, “punydecode” (available in Appendix A).
$ echo "xn--0000h" | punydecode -
0000h:1:U+127252
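To illustrate how such a value could cause trouble downstream, consider application code that converts decoded code points to UTF-8. The following is a minimal sketch, not code from either library: sketch_utf8_encode() is a hypothetical helper that refuses anything above 0x10FFFF (and, as an extra precaution not discussed in this article, the surrogate range). An encoder written without that first check would, for U+127252, emit a lead byte 0xF4 followed by 0xA7, which is not a valid UTF-8 sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu   /* largest valid Unicode code point */

/*
 * Hypothetical helper: encode one code point as UTF-8 into buf.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not
 * a value that UTF-8 is allowed to represent.
 */
size_t
sketch_utf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(buf != NULL);

    /* Reject out-of-range values and UTF-16 surrogates up front. */
    if (cp > UNICODE_MAX || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Called with 0xA2 this sketch writes the two bytes 0xC2 0xA2; called with 0x127252 it writes nothing and returns 0, leaving the caller to treat the label as malformed.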
The libidn and libidn2 libraries are open source implementations of IDNA (libidn implements IDNA2003 while libidn2 implements IDNA2008). They both provide APIs to encode and decode internationalized domain names.
Inside the latest versions of both libraries (1.35 for libidn and 2.0.5 for libidn2) are two almost identical¹ functions responsible for decoding Punycode strings into Unicode code points. Libidn calls this function punycode_decode() while libidn2 calls it _idn2_punycode_decode()².
From here on out, we will refer to both functions as simply the “Punycode decoder”.
The Punycode decoder is an implementation of the algorithm described in section 6.2 of RFC 3492. As it walks the input string, the Punycode decoder fills the output array with decoded code point values. The output array itself is typed to hold unsigned 32-bit integers while the Unicode code point space fits within 21 bits. This leaves a remainder of 11 unused bits that can result in the production of invalid Unicode code points if accidentally set. The vulnerability is enabled by the lack of a sanity check to ensure decoded code points do not exceed the Unicode code point maximum of 0x10FFFF. As such, for offending input, unchecked decoded values are copied directly to the output array and returned to the caller.
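The behavior described above can be reproduced with a small stand-alone decoder. What follows is a sketch written from the pseudocode in RFC 3492 section 6.2; it is not the libidn or libidn2 source, and the name sketch_punycode_decode() and its check_range flag are assumptions made purely for illustration. With check_range set to 0 it mirrors the missing sanity check; with 1 it rejects any value above 0x10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bootstring parameters for Punycode (RFC 3492, section 5). */
#define BASE 36u
#define TMIN 1u
#define TMAX 26u
#define SKEW 38u
#define DAMP 700u
#define INITIAL_BIAS 72u
#define INITIAL_N 128u

/* Digit value of an ASCII character, or -1 if it is not a Punycode digit. */
static int
digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0' + 26;
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

/* Bias adaptation (RFC 3492, section 6.1). */
static uint32_t
adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/*
 * Decode a label (without the "xn--" prefix) into out[]. On entry *out_len
 * is the capacity of out[] in elements; on success it is the number of code
 * points written. Returns 0 on success, -1 on bad input or lack of space.
 */
int
sketch_punycode_decode(const char *in, uint32_t *out, size_t *out_len,
    int check_range)
{
    size_t in_len, max_out, b = 0, pos, n_out = 0, j;
    uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;

    assert(in != NULL && out != NULL && out_len != NULL);
    in_len = strlen(in);
    max_out = *out_len;

    /* Copy the basic code points that precede the last delimiter. */
    for (j = 0; j < in_len; j++)
        if (in[j] == '-') b = j;
    if (b > max_out) return -1;
    for (j = 0; j < b; j++) {
        if ((unsigned char)in[j] >= 0x80) return -1;
        out[n_out++] = (unsigned char)in[j];
    }

    /* Decode each variable-length integer into an insertion. */
    for (pos = b > 0 ? b + 1 : 0; pos < in_len; n_out++) {
        uint32_t oldi = i, w = 1, k, t, count = (uint32_t)n_out + 1;
        int d;

        for (k = BASE;; k += BASE) {
            if (pos >= in_len) return -1;
            if ((d = digit_value(in[pos++])) < 0) return -1;
            if ((uint32_t)d > (UINT32_MAX - i) / w) return -1;
            i += (uint32_t)d * w;
            t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
            if ((uint32_t)d < t) break;
            if (w > UINT32_MAX / (BASE - t)) return -1;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, count, oldi == 0);
        if (i / count > UINT32_MAX - n) return -1;
        n += i / count;
        i %= count;

        /* The sanity check whose absence is described above. */
        if (check_range && n > 0x10FFFF) return -1;

        if (n_out >= max_out) return -1;
        memmove(out + i + 1, out + i, (n_out - i) * sizeof *out);
        out[i++] = n;
    }
    *out_len = n_out;
    return 0;
}
```

Working through the input 0000h by hand: the five digits have values 26, 26, 26, 26 and 7 with weights 1, 35, 1225, 12250 and 122500, giving a delta of 1,208,786. Adding the initial value of 128 yields 1,208,914, or 0x127252, which the sketch stores when check_range is 0 and rejects when it is 1.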
The bug can be fixed simply by checking for excessive code point values prior to insertion into the output array. Something as simple as the following will work:
/* decoding of basic string */
if (code_point > 0x10FFFF)
    return punycode_bad_input;
/* insertion into the output array */
A similar patch has been pushed to the libidn and libidn2 repositories and should be readily available.
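Until patched packages are deployed, application code could also protect itself by validating the decoder's output before using it. The following is a minimal sketch under that assumption; first_invalid_code_point() is a hypothetical helper and is not part of either library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu   /* largest valid Unicode code point */

/*
 * Hypothetical helper: scan n decoded values and return the index of the
 * first one that exceeds UNICODE_MAX, or n if every value is in range.
 */
size_t
first_invalid_code_point(const uint32_t *cp, size_t n)
{
    size_t i;

    assert(cp != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (cp[i] > UNICODE_MAX)
            break;
    return i;
}
```

A caller would run this over the output array immediately after a successful decode and treat any return value smaller than the array length as a decode failure.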
For the remediation and disclosure of this security condition, Farsight worked directly with Tim Rühsen, the maintainer of libidn and libidn2. We would like to thank him for his prompt and detailed responses at every point in the process.
Finally, Farsight did not discover this vulnerability through a code audit, but rather through an encounter with a malformed IDN in the wild. While we won’t (currently) release details on the domain in question, we feel it’s important to inform others that there are live hostnames out there that may trigger this bug, and thus that it is important to upgrade dependent libidn / libidn2 packages.
The following program can be used to check Punycode input strings for overflow. It expects input as single Punycode-encoded labels with or without the ACE prefix and can read from a file or a pipeline.
If there is no error, the output is colon-separated as per the following:
input punycode:code point count:code points
For conforming inputs, punydecode will prepend a lowercase u+ before each code point:
$ echo "xn--8a" | punydecode -
8a:1:u+00a2
For offending inputs, it will prepend an uppercase U+ before each code point:
$ echo "xn--0000h" | punydecode -
0000h:1:U+127252
Additionally, the program tests the reversibility of the input Punycode string and will emit an “encode mismatch” error if the decoded code points don’t encode to the original Punycode.
To build punydecode.c, you’ll need “idn2.h”, “puny_decode.c”, “puny_encode.c”, and “punycode.h” from libidn2 to reside in the same directory. You can build with something like:
gcc -Wall -O0 -ggdb punydecode.c puny_decode.c puny_encode.c -o punydecode
/*
* alabel punycode decoder
*
* Copyright (c) 2018 by Farsight Security, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "idn2.h"
#include "punycode.h"

int
main(int argc, char **argv)
{
    int rc;
    FILE *f;
    char *line = NULL, *p, alabel[BUFSIZ];
    size_t line_cap = 0, i, ulabel_len, alabel_len;
    uint32_t ulabel[BUFSIZ];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <infile> || cat <infile> | %s -\n",
            argv[0], argv[0]);
        return (EXIT_FAILURE);
    }
    if (strcmp(argv[1], "-") == 0)
        f = stdin;
    else {
        f = fopen(argv[1], "r");
        if (f == NULL) {
            fprintf(stderr, "error: unable to open %s: %s\n",
                argv[1], strerror(errno));
            return (EXIT_FAILURE);
        }
    }
    while (getline(&line, &line_cap, f) > 0) {
        /* strip the line terminator, if there is one */
        line[strcspn(line, "\r\n")] = '\0';
        p = line;
        /* skip the ACE prefix, if present */
        if (strncmp(p, "xn--", 4) == 0)
            p += 4;
        /*
         * The length arguments are in/out parameters: reset them to the
         * buffer capacities (in elements, not bytes) for every label.
         */
        ulabel_len = sizeof (ulabel) / sizeof (ulabel[0]);
        alabel_len = sizeof (alabel) - 1;
        rc = _idn2_punycode_decode(strlen(p), p, &ulabel_len, ulabel);
        if (rc != IDN2_OK) {
            fprintf(stderr, "%s:decode err: %d\n", p, rc);
            continue;
        }
        fprintf(stderr, "%s:%zu:", p, ulabel_len);
        for (i = 0; i < ulabel_len; i++) {
            if (ulabel[i] > 0x10FFFF)
                /* overflow */
                fprintf(stderr, "U+%04x", (unsigned int)ulabel[i]);
            else
                fprintf(stderr, "u+%04x", (unsigned int)ulabel[i]);
            if (i + 1 < ulabel_len)
                fprintf(stderr, ",");
        }
        /* check reversibility */
        rc = _idn2_punycode_encode(ulabel_len, ulabel, &alabel_len, alabel);
        if (rc != IDN2_OK) {
            fprintf(stderr, "%s:encode err: %d\n", p, rc);
            continue;
        }
        /* the encoder does not NUL-terminate its output */
        alabel[alabel_len] = '\0';
        if (alabel_len > 0 && strcasecmp(alabel, p) != 0)
            fprintf(stderr, ":encode mismatch %s\n", alabel);
        else
            fprintf(stderr, "\n");
    }
    free(line);
    if (f != stdin)
        fclose(f);
    return (EXIT_SUCCESS);
}
¹ The only difference is libidn’s support for case-awareness. Since IDNA2008 removes support for uppercase characters, libidn2 has no such support.
² This function is ostensibly private and not directly usable through the libidn2 API. In fact, access to it is “protected” by a call to the libunistring function u8_to_u32(), which validates the Punycode before handing it off to _idn2_punycode_decode(). However, the function is not static in scope and is externally accessible. According to the libidn2 README, the library is intended to be a drop-in replacement for libidn:
“This library is backwards (API) compatible with the libidn library. Replacing the idna.h header with idn2.h into a program is sufficient to switch the application from IDNA2003 to IDNA2008 as supported by this library.”
As such, if an application programmer upgrades from libidn to libidn2, has an IDNA-based application that directly calls punycode_decode(), and does something like the following, the program will be vulnerable to the overflow:
extern _IDN2_API int
_idn2_punycode_encode (size_t input_length, const uint32_t input[],
size_t * output_length, char output[]);
extern int
_idn2_punycode_decode (size_t input_length, const char input[], size_t *
output_length, uint32_t output[]);
#define punycode_decode _idn2_punycode_decode
#define punycode_encode _idn2_punycode_encode
/* ...libidn-based code here ...*/
Furthermore, if an application programmer is concerned about bloat and/or performance, the Punycode source files might be cherry-picked directly from the library, bypassing any protections afforded by u8_to_u32().
Mike Schiffman is an IDNA2020 Hopeful for Farsight Security, Inc.