Correctly Validating IP Addresses: Why encoding matters for input validation.

Published: 2021-05-10
Last Updated: 2021-05-10 12:30:47 UTC
by Johannes Ullrich (Version: 1)
1 comment(s)

Recently, a number of libraries suffered from a very similar security flaw: IP addresses expressed in octal were not correctly interpreted. The result was that an attacker was able to bypass input validation rules that restricted IP addresses to specific subnets. 

The vulnerability was documented in (this list is unlikely to be complete):

  • Node.js netmask package [1]
  • Perl (various packages) [2]
  • Python stdlib ipaddress module [3]

All of these vulnerabilities were caused by a similar problem: These libraries attempted to parse IP addresses as a string. Later, standard-based "socket" libraries were used to establish the actual connection. The socket libraries have their own "inet_aton" function to convert an IP address string to a long unsigned integer.

It isn't an accident that security researchers started to look at these libraries right now. More and more of our applications have surrendered data to cloud services and implement APIs to access the data. Server-Side Request Forgery (SSRF) is becoming more of an issue as a result. And to restrict which IP address a particular application may connect to, we often use the libraries above.

As a simple test string, use "". The leading "0" indicates that the address is octal. In "dotted decimal," the address translates to Google's name server

So if you have to validate an IP address: How to do it? First of all: Try to stick to one of the (hopefully now fixed) standard libraries. But in general, it can be helpful to stick to the same library for input validation and to establish the connection. This way, the IP address is not interpreted differently. There is also an argument to be made to just not allow octal. I leave that decision up to you. Standards do allow that notation.

For example, most languages under the hood rely on the C "Sockets" library to establish connections [4]. This library also includes functions to convert IP addresses expressed as a string into an unsigned 32-bit integer (inet_aton) and back (inet_ntoa)  [5]. Take a look if your language is implementing these functions.  Part of the problem of the node.js vulnerability was that node.js implemented its own "ip2long" function. 

So how do you verify if an IP address is inside a subnet? Well, there is a reason netmasks are called netmasks: They are bitmasks. Bitmasking is a very efficient and simple way to verify IP addresses.

First of all: Is this IP address a valid IPv4 address?

My favorite "trick" is to convert the address to an integer with inet_aton. Next, convert it back to a string with inet_ntoa. If the string didn't change: You got a valid address. This will work nicely if you only allow dotted decimal notation. If not: just convert the string with inet_aton and keep it an integer. Not all languages may use the Sockets library for "inet_aton." Some may use their own (buggy?) implementation. So best to check quickly: = = 134744072 .

If you get 168430090, then you got it wrong (that would be MySQL/MariaDB, for example, will get you the wrong answer:

MariaDB [(none)]> select inet_aton('')-inet_aton('');
| inet_aton('')-inet_aton('') |
|                                                     0 |

Output encoding matters for proper input validation!

So now, how do we verify if IP address "A" is in subnet B/Z ?

"z" is the number of bits that need to be the same. So let's create a mask:

mask = 2^32-2^(32-Z)

next, let's do a bitwise "AND" before comparing the result:

If ( A & mask == B & mask ) {
    IP Address A is in B/Z

So in short:

  1. Try to use the same library to validate the IP address and to establish connections. That is probably the easiest way to avoid issues. Sadly, modern languages add additional layers, and it isn't always clear what happens in the background. 
  2. Test! inet_aton('') != inet_aton('').
  3. IPs are long integers, not strings.

So why do I sometimes see IP addresses like "" on

Remember, we started collecting data 20 years ago. I have learned since :) But some of my early sins still haunt me. Over the last few years, we moved to store the IP addresses as 32-bit unsigned integers, but "back in the day," we used 15 digit strings and zero-padded them for easy sorting... sorry. should be mostly gone by now and if you find a case where this is still happening, let me know.

What about IPv6?

That will be a different diary :)

What about hex notation?

There are a number of additional formats that can be used for IP addresses. For a complete collection of different representations of IP addresses [6]. Test them. Let me know what you find :) .



Johannes B. Ullrich, Ph.D. , Dean of Research,

1 comment(s)
ISC Stormcast For Monday, May 10th, 2021


Diary Archives