Remove first or last n characters from a string in PowerShell

From Svendsen Tech PowerShell Wiki

This guide for developers shows how to remove the first or last n characters from a string in PowerShell. Four different methods are benchmarked and compared to find the fastest one, on both single strings and string arrays/collections, in a few different scenarios.

tl;dr: System.String.SubString() or System.String.Remove() win (almost) consistently and performed almost equivalently during my testing. Use one of them if the action is repeated a lot and/or performance is important. This holds even on arrays/collections, where a foreach loop calling SubString() can outperform the -replace operator applied to the whole collection. I will show you code.

There is also an exception: with a collection (1000 repeated, identical strings), when removing the first 1 or 50 characters from a 100-character string, the -replace operator does actually outperform SubString(). It does not when removing the last character(s).

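Beware, however, that if you use a pipeline ForEach-Object instead of a foreach loop, it was in most of my tests no faster than -replace, and far slower when removing the first characters. The one exception is removing the last character from 1000-character strings, where -replace on the collection was the slowest of the three. In my experience, foreach loops outperform ForEach-Object consistently, at the expense of requiring the entire collection up front, and probably using more memory.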
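As a minimal sketch of the three collection approaches mentioned above (the file names and the value of $n are made up for illustration and are not part of the benchmark):

```powershell
# Illustrative values only; $Names and $n are not taken from the benchmark.
$Names = 'alpha.txt', 'beta.txt', 'gamma.txt'
$n = 4   # strip the 4-character ".txt" suffix

# foreach statement: needs the whole collection up front
$ViaForeach = foreach ($Name in $Names) {
    $Name.Substring(0, $Name.Length - $n)
}

# ForEach-Object: processes items one at a time through the pipeline
$ViaPipeline = $Names | ForEach-Object { $_.Substring(0, $_.Length - $n) }

# -replace applied to an array operates on each element
$ViaReplace = $Names -replace ".{$n}$"

$ViaForeach -join ','    # alpha,beta,gamma
```

All three variables end up holding the same three strings; only the time taken differs.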




Code to test speed differences

Study this code to see how I remove the first and last n characters. There will be little spoonfeeding (I said "for developers" initially).

The "Measure-These" command comes from my benchmarking module, imported on the first line.

Import-Module Benchmark # My PowerShell benchmarking module. Useful.
[string] $String = '0123456789' * 5 + 'abcdefghij' * 5
[int] $n = 1
[int] $StringLength = $String.Length
$RemoveLast = $true
Write-Host -ForegroundColor Cyan "String length: $($String.Length). Removing $n $(if($RemoveLast){'_LAST_'}else{'_FIRST_'}) characters."
if ($RemoveLast) {
    # Remove last
    Measure-These -Count 1000 -ScriptBlock `
        { $String2 = $String.Substring(0, ($String.Length - $n)) },
        { $String2 = $String.Remove($String.Length - $n) },
        { $String2 = -join $String[0..($String.Length - 1 - $n)] },
        { $String2 = $String -replace ".{$n}$" } -Title `
        SubString, StringRemove, IndexIntoString, Regex | ft -AutoSize
}
else {
    # Remove first
    Measure-These -Count 1000 -ScriptBlock `
        { $String2 = $String.Substring($n) },
        { $String2 = $String.Remove(0, $n) },
        { $String2 = -join $String[$n..($String.Length - 1)] },
        { $String2 = $String -replace "^.{$n}" } -Title `
        SubString, StringRemove, IndexIntoString, Regex | ft -AutoSize
}
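Before timing anything, it is worth confirming that the four expressions really produce the same result. A small sketch, assuming a made-up 10-character string and n = 3 (neither is part of the benchmark). For removing the last characters, Remove() is called with the start index $Sample.Length - $n, since Remove(startIndex) deletes everything from that index to the end:

```powershell
$Sample = 'abcdefghij'   # illustrative 10-character string
$n = 3

# Remove the last $n characters, four ways
$Last = @(
    $Sample.Substring(0, $Sample.Length - $n)
    $Sample.Remove($Sample.Length - $n)
    -join $Sample[0..($Sample.Length - 1 - $n)]
    $Sample -replace ".{$n}$"
)

# Remove the first $n characters, four ways
$First = @(
    $Sample.Substring($n)
    $Sample.Remove(0, $n)
    -join $Sample[$n..($Sample.Length - 1)]
    $Sample -replace "^.{$n}"
)

$Last  | Select-Object -Unique   # abcdefg
$First | Select-Object -Unique   # defghij
```

Note that the index-range variant assumes $n is smaller than the string length: with $n equal to the length, 0..-1 expands to the indices 0 and -1 rather than to an empty range.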

I will be changing the numbers on the following lines to perform a few tests with different string lengths and number of characters to remove, as well as first or last in the string.

[string] $String = '0123456789' * 5 + 'abcdefghij' * 5
[int] $n = 1
$RemoveLast = $false # $true or $false

Short string, few characters

This shows a string length of 100, and removing only one character. With very few characters to remove, regex performs its best among the length and removal-count variants I tested. It is still outperformed by System.String.SubString() and System.String.Remove(), by a factor of roughly 1.2 to 1.8 in the runs below, except on collections, where -replace beats them when you remove the first characters.

You will have to test for your scenario, or just go with the “likely best thing”, which is one of the .NET System.String methods.
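If you do not have the benchmarking module at hand, a rough comparison for your own scenario can be sketched with the built-in Measure-Command cmdlet. The sample string, $n and the iteration count below are placeholders to replace with your own data, and a single run like this is noisier than the averaged output shown in this article:

```powershell
# Illustrative input; substitute data from your own scenario.
$Sample = 'x' * 100
$n = 1
$Iterations = 10000

# Time each approach over the same number of iterations
$SubstringTime = Measure-Command {
    foreach ($i in 1..$Iterations) { $null = $Sample.Substring($n) }
}
$RegexTime = Measure-Command {
    foreach ($i in 1..$Iterations) { $null = $Sample -replace "^.{$n}" }
}

'Substring: {0:N2} ms' -f $SubstringTime.TotalMilliseconds
'Regex:     {0:N2} ms' -f $RegexTime.TotalMilliseconds
```

Run it a few times and compare the trend rather than any single number.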

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 100. Removing 1 _LAST_ characters.

Title/no.       Average (ms) Count Sum (ms)  Maximum (ms) Minimum (ms)
---------       ------------ ----- --------  ------------ ------------
SubString       0.03820       1000 38.19810  2.49940      0.02740     
StringRemove    0.03714       1000 37.13880  2.65950      0.02670     
IndexIntoString 0.49590       1000 495.89780 8.36510      0.36070     
Regex           0.06534       1000 65.34290  2.70940      0.05060     
 

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 100. Removing 1 _FIRST_ characters.

Title/no.       Average (ms) Count Sum (ms)  Maximum (ms) Minimum (ms)
---------       ------------ ----- --------  ------------ ------------
SubString       0.03275       1000 32.75270  2.42420      0.02670     
StringRemove    0.03026       1000 30.25790  0.55800      0.02670     
IndexIntoString 0.49758       1000 497.57560 4.30990      0.35920     
Regex           0.03862       1000 38.61850  0.34690      0.03360     
 

For some reason, during my testing, the Remove() method seemingly (it’s probably not statistically significant) outperforms SubString by a smidge in some cases, while it's the other way around in other cases.

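Also notice how the "Regex" average line is significantly faster when you remove the first character compared to when you remove the last. This must be due to some optimization in, or just the behavior of, the .NET regex engine. A plausible explanation (I have not verified it) is that a pattern starting with ^ can only match at position 0, so the engine makes a single attempt, while the pattern ending in $ is tried from successive start positions until it reaches the end of the string. That would also fit the later results, where the cost of removing the last characters with regex grows with the string length while removing the first ones stays about the same.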

Also checking arrays/collections

From now on I also show the behavior when you remove the first and last n characters from a collection of strings, below the other output. Check the descriptions to the far left and you’ll see the word "array" in the titles.

The following code is appended to the script shown earlier in this article:

# This creates an array containing 1000 repeated instances of $String
$StringArray = @($String) * 1000

$CheckArray = $true
if ($CheckArray) {
    if ($RemoveLast) {
        # Remove last
        Measure-These -Count 100 -ScriptBlock `
            {
                $String2Array = foreach ($String in $StringArray) {
                    $String.Substring(0, ($String.Length - $n))
                }
            },
            { $String2Array = $StringArray | foreach { $_.SubString(0, ($_.Length - $n)) } },
            { $String2Array = $StringArray -replace ".{$n}$" } -Title `
            ArraySubString, ArrayForeachObjSubStr, ArrayRegex | ft -AutoSize
    }
    else {
        # Remove first
        Measure-These -Count 100 -ScriptBlock `
            {
                $String2Array = foreach ($String in $StringArray) {
                    $String.Substring($n)
                }
            },
            { $String2Array = $StringArray | foreach { $_.SubString($n) } },
            { $String2Array = $StringArray -replace "^.{$n}" } -Title `
            ArraySubString, ArrayForeachObjSubStr, ArrayRegex | ft -AutoSize
    }
}
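The benchmark array holds identical strings, so every element is long enough. In a real collection the lengths usually differ, and SubString() throws when a string is shorter than $n, whereas -replace simply leaves a non-matching element unchanged. A sketch of a guarded loop, using made-up strings and assuming that short strings should pass through untouched:

```powershell
$Lines = 'abcdef', 'ab', 'abcd'   # illustrative strings of different lengths
$n = 3

# 'ab'.Substring(3) would throw, so strings shorter than $n are passed through as-is
$Trimmed = foreach ($Line in $Lines) {
    if ($Line.Length -ge $n) { $Line.Substring($n) } else { $Line }
}

# -replace leaves an element unchanged when the pattern does not match
$ViaReplace = $Lines -replace "^.{$n}"

$Trimmed -join '|'      # def|ab|d
$ViaReplace -join '|'   # def|ab|d
```

The extra length check costs a little time per element, so it is worth re-measuring if the guard ends up in a hot loop.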
 

Longer string, still few characters removed

String length: 1000. Removing last 1 character. SubString() and Remove() together at the top. Regex -replace quite a bit slower with a longer string.

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 1000. Removing 1 _LAST_ characters.

Title/no.       Average (ms) Count Sum (ms)    Maximum (ms) Minimum (ms)
---------       ------------ ----- --------    ------------ ------------
SubString       0.03544       1000 35.44030    3.11640      0.02810     
StringRemove    0.03260       1000 32.59500    2.65630      0.02670     
IndexIntoString 4.40322       1000 4,403.22240 9.42120      3.26680     
Regex           0.21610       1000 216.10290   2.44990      0.20270     



Title/no.             Average (ms) Count Sum (ms)     Maximum (ms) Minimum (ms)
---------             ------------ ----- --------     ------------ ------------
ArraySubString        6.74887        100 674.88730    13.41710     4.21740     
ArrayForeachObjSubStr 72.16376       100 7,216.37570  99.13540     63.04710    
ArrayRegex            175.56913      100 17,556.91260 190.24250    171.24350   

Still a string length of 1000. Removing the first 1 character. Same results as before, except that, as with the 100-character string, the regex performs much better when removing the first character compared to the last.

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 1000. Removing 1 _FIRST_ characters.

Title/no.       Average (ms) Count Sum (ms)    Maximum (ms) Minimum (ms)
---------       ------------ ----- --------    ------------ ------------
SubString       0.03380       1000 33.79730    2.35590      0.02710     
StringRemove    0.04274       1000 42.74340    3.40630      0.02710     
IndexIntoString 4.47560       1000 4,475.59980 9.13500      3.25450     
Regex           0.04671       1000 46.71060    2.60890      0.03540     



Title/no.             Average (ms) Count Sum (ms)    Maximum (ms) Minimum (ms)
---------             ------------ ----- --------    ------------ ------------
ArraySubString        6.27379        100 627.37900   14.23900     3.69980     
ArrayForeachObjSubStr 71.28306       100 7,128.30570 106.98760    61.86010    
ArrayRegex            7.44445        100 744.44460   20.03420     2.43140     

Cutting the string in half

String length: 100. Removing the last 50 characters. SubString() and Remove() together at the top as usual. Regex is notably slower than when removing only 1 character, especially on the collection (but still narrowly beats ForEach-Object).

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 100. Removing 50 _LAST_ characters.

Title/no.       Average (ms) Count Sum (ms)  Maximum (ms) Minimum (ms)
---------       ------------ ----- --------  ------------ ------------
SubString       0.03145       1000 31.44510  0.33940      0.02740     
StringRemove    0.03502       1000 35.01660  2.47370      0.02630     
IndexIntoString 0.26918       1000 269.17730 3.76230      0.19870     
Regex           0.09482       1000 94.81610  2.13790      0.08710     



Title/no.             Average (ms) Count Sum (ms)    Maximum (ms) Minimum (ms)
---------             ------------ ----- --------    ------------ ------------
ArraySubString        4.73858        100 473.85780   14.09450     3.50170     
ArrayForeachObjSubStr 56.36237       100 5,636.23690 62.15760     53.57200    
ArrayRegex            55.72091       100 5,572.09060 61.06240     54.88730    

String length still 100, now removing the first 50 characters. Regex picks up speed, and gosh darn it if it doesn't actually beat SubString on the 1000-string collection with a short string and cutting it in half. This goes against what I initially thought, but I guess the numbers don’t lie. I ran it multiple times.

PS C:\temp> C:\Dropbox\PowerShell\temp2.ps1
String length: 100. Removing 50 _FIRST_ characters.

Title/no.       Average (ms) Count Sum (ms)  Maximum (ms) Minimum (ms)
---------       ------------ ----- --------  ------------ ------------
SubString       0.03439       1000 34.38780  2.33960      0.02670     
StringRemove    0.03822       1000 38.22220  2.88400      0.02630     
IndexIntoString 0.26908       1000 269.08080 3.88160      0.20090     
Regex           0.04431       1000 44.31000  2.84020      0.03390     



Title/no.             Average (ms) Count Sum (ms)    Maximum (ms) Minimum (ms)
---------             ------------ ----- --------    ------------ ------------
ArraySubString        3.55440        100 355.44000   8.20310      2.76330     
ArrayForeachObjSubStr 55.17603       100 5,517.60250 71.78420     52.21950    
ArrayRegex            2.24021        100 224.02060   5.58950      1.80290     
 

I guess that covers most real-world cases(?). Be well.