Sort strings with numbers more humanely in PowerShell

From Svendsen Tech PowerShell Wiki

In this article I describe how to sort any string containing a number, or a mix of such strings and numerical types, in a human-friendly way in PowerShell – including IPv4 addresses, for that matter. The technique (zero-padding) can be re-used in other .NET languages and other programming languages as well.

The code in this article is compatible with PowerShell version 2 and up.

When numbers are stored as strings, a computer compares them character by character, so “10” sorts before “2”. A workaround for getting them sorted correctly is to make them so-called “zero-padded”. This means e.g. “001, 002 … 010” or “01, 02 … 10” instead of simply “1, 2 … 10”.

We’ve all (at least you are about to, or just did) seen the incorrect sorting where “2” is put after “19”, followed by “20”, “21”, “22” and so on. In my example below the range stops at 20, so “3” comes right after “20”, but that just further clarifies the point.

Example of wrong sorting

To reproduce the problem, I use the range operator “..” to enumerate the numbers from 1 to 20 (“1..20”), cast each one to a string to get the incorrect sorting, and finally sort them with the cmdlet Sort-Object, using no parameters. Sort-Object is actually a very flexible cmdlet.

PS C:\temp> 1..20 | %{ [System.String] $_ } | Sort-Object
1
10
11
12
13
14
15
16
17
18
19
2
20
3
4
5
6
7
8
9



The most flexible and versatile solution I could cook up

The most flexible and versatile solution, as far as I can determine (I still haven’t finished high school, so what do I know), is to use a regular expression match evaluator that zero-pads every number with the format operator, “-f”, and then to sort on the padded values, leaving the original data unchanged, of course.

The only decision you should have to make with this approach is how many digits to account for. If you know the maximum number of digits is two, you can use: "{0:D2}" -f $NumberHere

Or rather, if you use the advanced function further down, you set this with the -MaximumDigitCount parameter.

To make it dynamic, putting the number in a variable works, and that way you can adapt it easily.

I wanted to use a "likely ridiculously large enough number", such as 8192, but the maximum (found by trial and error...) is 100 for the -f operator in .NET/PowerShell.

I will therefore just use the maximum of 100. If you’re processing very large files, you might want to limit it to a more “close-to-reality” number, such as maybe 10 or whatever, depending on your data, but I wouldn’t worry about it until I had to if I were you.

$MaximumDigits = 100
"{0:D$MaximumDigits}" -f $NumberHere

I’ll briefly demonstrate how this works at the basic level, since that makes the method easier to understand.

PS C:\> "{0:D2}" -f 1
01

PS C:\> ("{0:D2}" -f 1).GetType().FullName
System.String

PS C:\> "{0:D3}" -f 1
001

PS C:\> "{0:D4}" -f 1
0001

PS C:\> "{0:D5}" -f 1
00001

… And so on.
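To tie this back to sorting, here is a minimal sketch of what the padding buys us. The list of numbers and the variable names are just made up for illustration; the point is that the padded strings sort in numerical order even though they are compared as strings.

```powershell
# An unsorted list of numbers (illustrative data only).
$Numbers = 20, 3, 11, 1, 2, 10

# Zero-pad each number to two digits; the results are strings.
$Padded = $Numbers | ForEach-Object { '{0:D2}' -f $_ }

# A plain string sort now gives the order a human expects.
$Sorted = $Padded | Sort-Object
$Sorted -join ' '    # 01 02 03 10 11 20
```

Note that this sketch changes the data itself (the leading zeros stay), which is why the approach further down only pads inside the sort key.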

Then we implement this inside the match evaluator, and after that we should be able to throw almost anything at it and get it sorted correctly.
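As a rough sketch of the match evaluator part on its own: PowerShell lets you pass a script block where [Regex]::Replace expects a match evaluator, and the script block receives the match object as $Args[0]. The file name and the width of five digits are arbitrary choices for this illustration.

```powershell
# A sample string with two separate runs of digits (made-up example).
$Text = 'file10_v2.txt'

# The script block is called once per match; whatever it returns
# replaces the matched digits. Here each run is zero-padded to 5 digits.
$Padded = [Regex]::Replace($Text, '\d+', { '{0:D5}' -f [Int] $Args[0].Value })

$Padded    # file00010_v00002.txt
```

Everything that is not a digit is left alone, which is what makes the padded string usable as a sort key for mixed text.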

As I finished the paragraph above, I decided to implement it as a small advanced function, so there will be a short delay now, but fortunately you don’t read this in real-time.

The ultimate string or number sorting function

It took me about 9 minutes to write this function, and then an hour to make it OK, document it and have it (hopefully) bug-free. A lot of that hour was spent wondering why it suddenly didn't work, and then guessing the maximum for "-f and 'D##'", which is "100" - go figure - I tried 8192... optimistic, ha. There is some documentation at Microsoft here.

function Sort-STNumerical {
    <#
        .SYNOPSIS
            Sort a collection of strings containing numbers, or a mix of these and
            numerical data types, in a human-friendly way.

            This will sort "anything" you throw at it correctly.

            Author: Joakim Borger Svendsen, Copyright 2019-present, Svendsen Tech.

            MIT License

        .PARAMETER InputObject
            Collection to sort.

        .PARAMETER MaximumDigitCount
            Maximum number of consecutive digits to account for, in order for them to be sorted
            correctly. Default: 100. This is the .NET framework maximum as of 2019-05-09.
            For IPv4 addresses "3" is sufficient, but "overdoing" does no or little harm. It might
            eat some more resources, which can matter on really huge files/data sets.

        .EXAMPLE
            $Strings | Sort-STNumerical

            Sort strings containing numbers in a way that magically makes them sorted human-friendly
            
        .EXAMPLE
            $Result = Sort-STNumerical -InputObject $Numbers
            $Result

            Sort numbers in a human-friendly way.
    #>
    [CmdletBinding()]
    Param(
        [Parameter(
            Mandatory = $True,
            ValueFromPipeline = $True,
            ValueFromPipelineByPropertyName = $True)]
        [System.Object[]]
        $InputObject,
        
        [ValidateRange(2, 100)]
        [Byte]
        $MaximumDigitCount = 100)
    
    Begin {
        [System.Object[]] $InnerInputObject = @()
    }
    
    Process {
        $InnerInputObject += $InputObject
    }

    End {
        $InnerInputObject |
            Sort-Object -Property `
                @{ Expression = {
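                    # Zero-pad every run of digits. The cast is to [Int64] rather than
                    # [Int] so that runs of up to 18 digits do not overflow.
                    [Regex]::Replace($_, '(\d+)', {
                        "{0:D$MaximumDigitCount}" -f [Int64] $Args[0].Value })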
                    }
                },
                @{ Expression = { $_ } }
    }
}

To use it, put it in a file and dot-source it, or just paste the function into your terminal for a “one-off”.

You can download it from GitHub here.

If you put it in a file, you use it by dot-sourcing it, like this:

. .\Sort-STNumerical.ps1

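Notice the two periods and the space between them. The first period is the dot-sourcing operator, which runs the file in your current scope so that everything it defines stays available in your session. The second period is just part of the relative path “.\”. In this case it imports the function Sort-STNumerical. ST is short for Svendsen Tech, as a sort of namespace for (some of) my functions.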
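If you want to see dot-sourcing in isolation first, here is a small sketch. The file name Demo-DotSource.ps1 and the function Get-DemoGreeting are hypothetical and only exist for this illustration; the script writes a one-line file to the temp folder, dot-sources it and then calls the function that the file defined.

```powershell
# Create a throwaway script file that defines one function (hypothetical names).
$ScriptPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Demo-DotSource.ps1'
Set-Content -Path $ScriptPath -Value 'function Get-DemoGreeting { "Hello from the dot-sourced file" }'

# Dot-source it: period, space, then the path to the file.
. $ScriptPath

# The function now exists in the current scope.
$Greeting = Get-DemoGreeting
$Greeting    # Hello from the dot-sourced file

# Clean up the temporary file.
Remove-Item -Path $ScriptPath
```

Had the file been run with “& $ScriptPath” or just its path instead, the function would have been defined in a child scope and gone again once the script finished.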

Examples of Sort-STNumerical

Here is the same example as above, but now we use Sort-STNumerical, not the built-in Sort-Object.

PS C:\temp> 1..20 | %{ [System.String] $_ } | Sort-STNumerical
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

And we see it sorts correctly.
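For a quick one-off on strings that mix text and numbers, the same idea can be sketched inline with Sort-Object and a calculated property, without loading the function. The file names are made up, and the width of ten digits is an assumption that simply needs to be at least as large as the longest run of digits in your data.

```powershell
# Sample file names where plain string sorting would put file10 before file2.
$Names = 'file10.txt', 'file2.txt', 'file1.txt', 'file20.txt'

# Sort on a zero-padded copy of each name; the original strings are untouched.
$Sorted = $Names | Sort-Object -Property @{
    Expression = {
        [Regex]::Replace($_, '\d+', { '{0:D10}' -f [Int64] $Args[0].Value })
    }
}

$Sorted -join ', '    # file1.txt, file2.txt, file10.txt, file20.txt
```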

What about those IPv4 addresses?

Those work as well. I put three IPs, as strings, in the variable $IPAddresses to demonstrate the default string sorting, which humans perceive as incorrect.

PS C:\temp> $IPAddresses
10.0.0.1
10.0.0.19
10.0.0.2

PS C:\temp> $IPAddresses | Sort-Object # incorrectly sorted
10.0.0.1
10.0.0.19
10.0.0.2

PS C:\temp> $IPAddresses[0].GetType().FullName
System.String

PS C:\temp> $IPAddresses | Sort-STNumerical
10.0.0.1
10.0.0.2
10.0.0.19

PS C:\temp> Sort-STNumerical -InputObject $IPAddresses
10.0.0.1
10.0.0.2
10.0.0.19

The IPv4 System.Version trick

For IPv4 addresses stored as strings there is an easier alternative: cast them to the [System.Version] type in the sort. A version object compares its parts as numbers, which is exactly what humans want when sorting IP addresses.

That can be accomplished in this way:

PS C:\temp> $IPAddresses | Sort-Object -Property @{ Expression = { [Version] $_ } }
10.0.0.1
10.0.0.2
10.0.0.19
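Here is a slightly larger sketch of the same trick, with made-up addresses that differ in more than the last octet. It relies on [System.Version] accepting two to four numeric parts separated by periods, which fits IPv4 addresses but not arbitrary strings, so for anything else the zero-padding approach is the one to use.

```powershell
# Sample addresses that differ in several octets (illustrative data only).
$IPs = '192.168.10.5', '192.168.2.40', '10.0.0.19', '10.0.0.2'

# Sort on a [Version] cast; each octet is compared as a number.
$Sorted = $IPs | Sort-Object -Property @{ Expression = { [Version] $_ } }

$Sorted -join ', '    # 10.0.0.2, 10.0.0.19, 192.168.2.40, 192.168.10.5

# The output is still the original strings, not Version objects.
$Sorted[0].GetType().FullName    # System.String
```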