PowerShell regex to accurately match IPv4 address (0-255 only)

From Svendsen Tech PowerShell Wiki

The final result here is a PowerShell/.NET regex that matches only 000-255, four times, separated by periods. The first version I put up could successfully be used for validation, but not extraction. The new version can be used both for validation and extraction of IPv4 addresses from text.

Now there's also a version for validating IPv6 addresses in this article - and one for listing and validating subnet masks in this article.

So I was watching the advanced PowerShell 3 lessons on Microsoft Virtual Academy yesterday, and one thing stuck with me and kept nagging at me until I wrote this article. They used a regex validation pattern as an example of checking that a string could be expected to be an IPv4 address. The regex they used was:

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

What this really does is check for four sequences of digits (0-9) that can be from 1 to 3 in length, separated by (literal) periods, and separated from possibly surrounding text by a word boundary, \b. A word boundary is defined as sort of "in between the switch from a \w to a \W character in a string" (it's considered "zero-length"), and vice versa (from \W to \w). Most commonly it probably matches between a space or punctuation character, and a number or letter (but that doesn't cover all the cases).
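
To make the word-boundary behavior concrete, here is a small sketch using the four-group \d{1,3} pattern. The sample strings are made up for illustration, and the variable name $Loose is just a label chosen here.

```powershell
# Loose pattern: four groups of 1-3 digits separated by literal periods.
$Loose = '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'

# "=" and ";" are \W and the digits are \w, so both boundaries exist.
'ip=10.0.0.1;' -match $Loose      # True

# A letter and a digit are both \w, so there is no boundary before "10".
'x10.0.0.1' -match $Loose         # False

# An underscore also counts as \w, so there is no boundary here either.
'host_10.0.0.1' -match $Loose     # False
```

So \b protects against digits glued to letters or underscores, but says nothing about the numeric range of each group.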

This will also happily match 500.900.999.350, and similar, so I decided to write a more sophisticated regex that actually validates that each number is between 0 and 255. I refused to search the web, since I wanted to use my brain, and it took me about 10 minutes to come up with this to match a number between 0 and 255:

^0*(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-5][0-5]|2[0-4][0-9])$

There's more to it than this, though, such as that "10.1" on its own can be a valid IP, or that "192.11010305" can represent 192.168.1.1 (11010305 = 168 * 65536 + 1 * 256 + 1), but I'm considering the "normal" structure here: 000-255 four times, separated by periods.

This test doesn't on its own absolutely verify it, but it shows that it matches 0-255, and not higher than 255:

PS D:\> 0..260 | ?{ $_ -notmatch '^0*(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-5][0-5]|2[0-4][0-9])$' }
256
257
258
259
260

That's pretty close, but it is still (potentially) flawed since it accepts an arbitrary amount of leading zeroes before the number, so it would accept 000001, 000100, 0255, etc. This might be what you want, but if not, to increase the precision a bit, I cooked this up:

PS D:\> @('00001', '0100', '0255') + @(0..258) |
    ?{ $_ -notmatch '^(?:0?0?[0-9]|0?[1-9][0-9]|1[0-9]{2}|2[0-5][0-5]|2[0-4][0-9])$' }
00001
0100
0255
256
257
258

I think that's perfect? 100% validation?

The rest should be child's play, so let's put this together to match an actual IP address instead of a number between 0 and 255 on its own:

PS D:\> $Octet = '(?:0?0?[0-9]|0?[1-9][0-9]|1[0-9]{2}|2[0-5][0-5]|2[0-4][0-9])'
PS D:\> [regex] $IPv4Regex = "^(?:$Octet\.){3}$Octet$"
PS D:\> '1.10.100.0' -match $IPv4Regex
True
PS D:\> '1.10.100.255' -match $IPv4Regex
True
PS D:\> '1.10.100.256' -match $IPv4Regex
False
PS D:\> '1.256.100.50' -match $IPv4Regex
False

Validation-ready result

Here is a full text version of a regex that works for validation on one line for you:

^(?:(?:0?0?\d|0?[1-9]\d|1\d\d|2[0-5][0-5]|2[0-4]\d)\.){3}(?:0?0?\d|0?[1-9]\d|1\d\d|2[0-5][0-5]|2[0-4]\d)$

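I think that's completely accurate for matching an IPv4 address in the normal dotted form, and only an IPv4 address. One .NET caveat: \d also matches non-ASCII Unicode digits, and $ also matches just before a trailing newline, so use [0-9] and \z instead if your input is untrusted. You could replace the starting and ending anchors (^ and $) with "\b" if that's more appropriate for your scenario. But read on for a slightly improved version at the bottom of the article.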
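
As a usage sketch, the validation regex can be wrapped in a small function. The function name Test-IPv4String and the sample inputs are hypothetical, not something from the lessons or a built-in cmdlet; the pattern is the one shown above, assembled from an octet sub-pattern.

```powershell
# Hypothetical helper; the name is illustrative only.
function Test-IPv4String {
    param([string] $Text)

    # One octet: 0-255 with at most three digits.
    $Octet = '(?:0?0?\d|0?[1-9]\d|1\d\d|2[0-5][0-5]|2[0-4]\d)'

    # Single-quoted pieces keep the trailing $ a literal regex anchor.
    $Pattern = '^(?:' + $Octet + '\.){3}' + $Octet + '$'

    return $Text -match $Pattern
}

Test-IPv4String '192.168.0.1'        # True
Test-IPv4String '010.001.000.255'    # True  (leading zeroes up to three digits)
Test-IPv4String '1.2.3'              # False (only three octets)
Test-IPv4String '1.2.3.256'          # False (octet out of range)
Test-IPv4String '0001.2.3.4'         # False (four digits in an octet)
```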

My main, comprehensive PowerShell regex article is here.

Extracting IPv4 addresses from text

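Now what happens if you use this with [regex]::Matches() in an attempt to extract IP addresses? By "accident" I discovered a small problem with the above regex: alternation is ordered, so the engine takes the first alternative that lets the overall match succeed, not the longest one. Without a trailing anchor, the single-digit alternative (0?0?\d) wins for the last octet before the two- and three-digit alternatives are even tried, so the regex behaves as if it were sort of "non-greedy".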

I found out by testing in order to document extracting IPs from text.
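
The ordering effect can be reproduced in isolation. This is a minimal sketch with toy patterns (not the IPv4 regex itself) that shows why the anchored version validated fine while the unanchored one came up short.

```powershell
# Shorter alternative first: the engine stops at the first one that works.
if ('33' -match '\d|\d\d') { $Matches[0] }          # 3

# Longer alternative first: both digits are taken.
if ('33' -match '\d\d|\d') { $Matches[0] }          # 33

# With an end anchor, the engine backtracks into the second alternative,
# which is why the anchored validation regex was not affected.
if ('33' -match '^(?:\d|\d\d)$') { $Matches[0] }    # 33
```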

First I created a multi-line string with IPv4 addresses with some random crap text around them, where the first three lines look like what you see below.

PS C:\> $String = (Invoke-PSipcalc -Net 10.20.30.40/28 -Enumerate | % IPEnumerated | 
% -Begin { $RandomString = 'foo', 'car', 'baz', 'boo', 'blorp' } -Process {
    ($RandomString | Get-Random) + " $_ " + ($RandomString | Get-Random) }) -join "`n"

PS C:\> $String -split '\n' | select -first 3
car 10.20.30.33 foo
boo 10.20.30.34 blorp
car 10.20.30.35 baz

Then I ran the regex from this page against it and discovered the flaw when used for this purpose:

PS C:\> $IPv4Regex = '((?:(?:0?0?\d|0?[1-9]\d|1\d\d|2[0-5][0-5]|2[0-4]\d)\.){3}(?:0?0?\d|0?[1-9]\d|1\d\d|2[0-5][0-5]|2[0-4]\d))'
PS C:\> [regex]::Matches($String, $IPv4Regex) | %{ $_.Groups[1].Value } | select -first 3
10.20.30.3
10.20.30.3
10.20.30.3

Uh oh, it doesn't capture the last digit.

Fortunately, my brain had this idea to change the order of the elements, and thus this new, slightly altered regex was born, and put to the test.

PS C:\> $IPv4RegexNew = '((?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d))'
PS C:\> [regex]::Matches($String, $IPv4RegexNew) | %{ $_.Groups[1].Value }
10.20.30.33
10.20.30.34
10.20.30.35
10.20.30.36
10.20.30.37
10.20.30.38
10.20.30.39
10.20.30.40
10.20.30.41
10.20.30.42
10.20.30.43
10.20.30.44
10.20.30.45
10.20.30.46

I had to test with three digits as well, and all seems well:

PS C:\> $String = (Invoke-PSipcalc -Net 10.20.30.180/28 -Enumerate | % IPEnumerated | 
% -Begin {
    $RandomString = 'foo', 'car', 'baz', 'boo', 'blorp' } -Process {
        ($RandomString | Get-Random) + " $_ " + ($RandomString | Get-Random) }) -join "`n"

PS C:\> [regex]::Matches($String, $IPv4RegexNew) | %{ $_.Groups[1].Value } |
    select -first 4
10.20.30.177
10.20.30.178
10.20.30.179
10.20.30.180

End result (regex)

Here is a full text version of a PowerShell/.NET regex that works for both validation and extraction of IPv4 addresses from text, on one line for you.

Remember that you may need to anchor, by putting "^" first and "$" last (complete match - respectively "starts with" and "ends with"), or by putting "\b" on either side to match a word boundary, as I spoke about earlier in the article.

(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)

To capture it as a group with [regex]::Matches(), add parentheses around the entire regex (anything inside () is captured and can be back-referenced). If the regex has no surrounding parts, you can skip the extra parentheses and use $_.Groups[0].Value, which always holds the whole match.
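
Here is a self-contained extraction sketch that does not depend on Invoke-PSipcalc. The sample text and the variable names are invented for illustration. It also shows why the "\b" wrapping matters when extracting: without it, an out-of-range address can yield a truncated match.

```powershell
$Octet = '(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)'
$IPv4  = "(?:$Octet\.){3}$Octet"
$Text  = 'ok 10.20.30.40, bad 10.20.30.256, ok 192.168.1.254'

# Unanchored: the out-of-range address produces a truncated match.
$Unanchored = [regex]::Matches($Text, $IPv4) | ForEach-Object { $_.Value }
$Unanchored
# 10.20.30.40
# 10.20.30.25
# 192.168.1.254

# Wrapped in \b: the out-of-range address is skipped entirely.
$Bounded = [regex]::Matches($Text, "\b$IPv4\b") | ForEach-Object { $_.Value }
$Bounded
# 10.20.30.40
# 192.168.1.254
```

$_.Value on a match object is the same string as $_.Groups[0].Value, so no capturing parentheses are needed here.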

The System.Net.IPAddress class can also be used for validation

It's been bothering me in the back of my head ever since writing this article that I don't even mention it, so here we go:

PS C:\temp> [ipaddress]'10.0.0.1'


Address            : 16777226
AddressFamily      : InterNetwork
ScopeId            :
IsIPv6Multicast    : False
IsIPv6LinkLocal    : False
IsIPv6SiteLocal    : False
IsIPv6Teredo       : False
IsIPv4MappedToIPv6 : False
IPAddressToString  : 10.0.0.1



PS C:\temp> [ipaddress]'10.0.0.256'
Cannot convert value "10.0.0.256" to type "System.Net.IPAddress". Error: "An invalid IP address was specified."
At line:1 char:1
+ [ipaddress]'10.0.0.256'
+ ~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvalidCastParseTargetInvocation

PS C:\temp>

A failed cast (string IP to IPAddress class) raises a terminating error, so you can wrap the cast in try/catch to validate. Setting $ErrorActionPreference = "Stop" beforehand does no harm, but the cast failure is caught either way.
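
A minimal sketch of that idea follows. The function name Test-IPv4Cast is hypothetical, and the address-family check is an addition of mine, since the same cast also accepts IPv6 text. Note that the .NET parser may be more permissive than the regex above (for example, with shorthand forms such as "10.1"), so the two approaches are not interchangeable.

```powershell
# Hypothetical helper; relies on the cast failure being caught by try/catch.
function Test-IPv4Cast {
    param([string] $Text)
    try {
        $Address = [ipaddress] $Text
        # The cast also accepts IPv6, so check the address family explicitly.
        return $Address.AddressFamily -eq [System.Net.Sockets.AddressFamily]::InterNetwork
    }
    catch {
        return $false
    }
}

Test-IPv4Cast '10.0.0.1'      # True
Test-IPv4Cast '10.0.0.256'    # False
Test-IPv4Cast '::1'           # False (valid IPv6, not IPv4)

# Alternative without exceptions: IPAddress.TryParse with a [ref] output variable.
$Parsed = $null
$Ok = [ipaddress]::TryParse('10.0.0.1', [ref] $Parsed)
$Ok                           # True
```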

Be well.