The "+" that follows is a quantifier meaning "one or more". By default it tries to match as many characters as it can, which is called greedy matching. If you want non-greedy matching, use "\s+?". The only difference is the trailing question mark, which after a quantifier means "make the quantifier non-greedy". Non-greedy means it will try to match as few characters as possible, instead of as many as possible, while still getting a complete regexp match. A question mark used anywhere else, that is after a character, a group (in parentheses) or a character class rather than after a quantifier, makes the preceding element optional; the regexp will match whether the element is there or not.
For instance, you might split on newlines, while possibly accounting for and removing carriage returns, with the regex:
$MultiLineString -split "\r?\n"
If you have mangled or wrongly formatted data, this can also be useful; it will split on any consecutive newlines and/or carriage returns it finds:
$MultiLineString -split "[\r\n]+"
Another quantifier is "*", which means "zero or more". So ".*?" means "match zero or more of any character" (except newlines, unless you use the SingleLine option: "(?s)"). It will try to match as few as possible while still having a successful complete regexp match, including the surrounding regexp parts. Be aware that ".*" '''always''' matches, even if you pass in an empty string:
PS C:\> '' -match '.*'
True
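To make the greedy, non-greedy and optional cases concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration, and it uses the .NET [regex]::Match() method only because that makes the matched text easy to inspect.

```powershell
# Sample input chosen for illustration; it is not from the examples above.
$text = '<a><b>'

# Greedy: ".+" grabs as much as it can, so the match runs to the LAST ">".
$greedy = [regex]::Match($text, '<.+>').Value

# Non-greedy: ".+?" stops at the FIRST ">" that completes the match.
$lazy = [regex]::Match($text, '<.+?>').Value

# "?" directly after a plain character makes that character optional.
$optional = @('color', 'colour', 'colr') | Where-Object { $_ -match '^colou?r$' }

$greedy              # <a><b>
$lazy                # <a>
$optional -join ','  # color,colour
```

The only difference between the first two patterns is the question mark after the quantifier; the third pattern has no quantifier before the "?", so there it means "optional".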
Here's an example where I split a string with different whitespace using the regexp "'''\s+'''" which, as I described above, means "match one or more whitespace characters, greedily (as many as you can)":
PS C:\> "a `t b`t`t`t c `n `td e`tf" -split '\s+'
a
b
c
d
e
f
And to verify that there aren't any surrounding spaces or whitespace, I wrap it in a ForEach-Object that prepends and appends a single quote:
PS C:\> "a `t b`t`t`t c `n `td e`tf" -split '\s+' | Foreach { "'" + $_ + "'" }
'a'
'b'
'c'
'd'
'e'
'f'
To output it on one line, separated by commas, you could enclose the whole pipeline in parentheses and use the ''-join'' operator, like this:
PS C:\> ("a `t b`t`t`t c `n `td e`tf" -split '\s+' | Foreach { "'$_'" } ) -join ', '
'a', 'b', 'c', 'd', 'e', 'f'
PS C:\> ("a `t b`t`t`t c `n `td e`tf" -split '\s+')[2,4]
c
e
To get the first three elements, you could index and use the range operator (".."):
PS C:\> ("a `t b`t`t`t c `n `td e`tf" -split '\s+')[0..2]
a
b
c
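If you need several slices of the same result, it may be tidier to store the split output in a variable first. The sketch below assumes a variable name of my choosing ($parts) and also shows Select-Object -First and a negative index as alternatives to the range operator.

```powershell
# $parts is a hypothetical variable name used for illustration.
$parts = "a `t b`t`t`t c `n `td e`tf" -split '\s+'

# The range operator builds the index list 0, 1, 2.
$firstThree = $parts[0..2]

# The same three elements, picked from the pipeline instead.
$viaPipeline = $parts | Select-Object -First 3

# Negative indexes count from the end of the array.
$last = $parts[-1]

$firstThree -join ','    # a,b,c
$viaPipeline -join ','   # a,b,c
$last                    # f
```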
PS C:\> 'foo  bar  baz' -split ' '
foo

bar

baz
PS C:\>
The obvious trick for filtering out the empty elements here is simply to split on one or more spaces in sequence, matching as many as possible:
PS C:\> 'foo  bar  baz' -split ' +'
foo
bar
baz
PS C:\>
You can also use Where-Object and something like this to filter out empty elements:
# Skip elements/lines that do NOT contain non-whitespace.
# "\S" is the complement/opposite of \s, so it includes
# anything that is not defined as whitespace.
PS C:\> 'foo  bar  baz' -split ' ' | Where { $_ -match '\S' }
foo
bar
baz
# Skip elements that have zero length.
PS C:\> 'foo  bar  baz' -split ' ' | Where { $_.length -gt 0 }
foo
bar
baz
# Since an empty string is considered false, you can also simply use:
PS C:\> 'foo  bar  baz' -split ' ' | Where { $_ }
foo
bar
baz
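One case the examples above do not cover is a string that starts or ends with the delimiter: splitting on ' +' then still yields an empty first or last element. The sketch below uses made-up input and variable names, and shows two ways around it that I believe behave as commented: trimming before splitting, and the unary form of -split.

```powershell
# Made-up input with leading, trailing and doubled spaces.
$padded = '  foo  bar '

# Splitting on ' +' still leaves an empty first and last element,
# because the string starts and ends with a delimiter.
$raw = $padded -split ' +'

# Option 1: trim the string first, then split.
$trimmed = $padded.Trim() -split ' +'

# Option 2: the unary form of -split splits on whitespace and
# ignores leading and trailing whitespace.
$unary = -split $padded

$raw.Count        # 4
$trimmed.Count    # 2
$unary.Count      # 2
```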
There are a few exceptions, most notably that a "^" character first in a character class will negate the rest of the characters in the character class. So [^abc] will match anything except the letters a, b and c.
The character class [foo\s+] will match one instance of either whitespace (\s), the letter "f", the letter "o" or the literal symbol "+".
Another meta character inside a character class is "-", which indicates a range, like "[a-f0-9]" for matching hex digits or "[a-z]" for matching the lowercase letters of the English alphabet.
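Here is a small sketch of negation, ranges and a literal "+" inside a character class. The input strings and variable names are invented for illustration. Keep in mind that -match, -replace and -split are case-insensitive by default.

```powershell
# Negated class: remove everything that is NOT a, b or c.
$onlyAbc = 'a1b2c3' -replace '[^abc]', ''

# Range class: the whole string must consist of hex digits.
$isHex  = 'deadbeef' -match '^[a-f0-9]+$'
$notHex = 'xyz'      -match '^[a-f0-9]+$'

# Inside a class, "+" is a literal plus sign, not a quantifier,
# so this splits on a single whitespace character or a single "+".
$pieces = 'x+y z' -split '[\s+]'

$onlyAbc             # abc
$isHex               # True
$notHex              # False
$pieces -join ','    # x,y,z
```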
To split on, for instance, the characters "-", "," and "_" (dash, comma, underscore), you can use the character class [,_-]. Notice how I said "-" is special inside character classes, but there's a special case when it comes first or last, where it's interpreted literally (makes sense, right?). To avoid future errors where you might add something before or after the dash, I recommend always escaping it. You can also put it anywhere if you escape it. A small quirk here is that you need to escape it using a backslash, "\", not the PowerShell escape character "`". This is demonstrated in the examples below.
PS C:\> 'a-b,c,d_e' -split '[,_\-]'
a
b
c
d
e
Like I mentioned above, you can escape the character class range operator with a backslash ("\"). Here I ''-join'' the string, and effectively replace the characters I split on with a hash sign ("#"):
PS C:\> 'a-b,c,d_e' -split '[,\-_]' -join '#'
a#b#c#d#e
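To see why escaping the dash matters, consider what happens if it ends up unescaped in the middle of the class. The sketch below is my own illustration (the test character "5" and the variable names are arbitrary): "," is U+002C and "_" is U+005F, so [,-_] becomes a range that also covers digits and a good deal of punctuation.

```powershell
# Unescaped dash between "," and "_": this is a RANGE from "," to "_",
# and the digit "5" (U+0035) falls inside it.
$rangeHit = '5' -match '[,-_]'

# Escaped dash: the class contains only ",", "-" and "_",
# so "5" no longer matches.
$escapedHit = '5' -match '[,\-_]'

# The escaped class still matches the three intended delimiters.
$delims = @(',', '-', '_') | Where-Object { $_ -match '^[,\-_]$' }

$rangeHit           # True
$escapedHit         # False
$delims -join ' '   # , - _
```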
PS C:\> '3.2.1' -split '.'
PS C:\> ('3.2.1' -split '.').count
6
PS C:\> ('3.2.1' -split '.')[0].length
0
PS C:\> ('3.2.1' -split '.')[0].GetType().FullName
System.String
To split on a literal period/dot, you need to escape it, and not using the PowerShell escape character, "`", but a backslash: "\". Here's the result we wanted in this case:
PS C:\> '3.2.1' -split '\.'
3
2
1
PS C:\>
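If you would rather not hand-escape the delimiter, there are a few alternatives that should give the same result. This is a sketch with variable names of my choosing; it relies on [regex]::Escape(), the SimpleMatch option of -split, and the fact that a dot inside a character class is literal.

```powershell
$version = '3.2.1'

# Let .NET escape the delimiter for you (handy when it comes from a variable).
$viaEscape = $version -split ([regex]::Escape('.'))

# Tell -split to treat the delimiter as plain text instead of a regex.
# The 0 means "no limit on the number of substrings".
$viaSimpleMatch = $version -split '.', 0, 'SimpleMatch'

# Inside a character class the dot loses its special meaning.
$viaClass = $version -split '[.]'

$viaEscape -join ','        # 3,2,1
$viaSimpleMatch -join ','   # 3,2,1
$viaClass -join ','         # 3,2,1
```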
PS C:\> 'a,b;c,d'.Split(',') -join ' | '
a | b;c | d
PS C:\> 'a,b;c,d'.Split(',;') -join ' | '
a | b | c | d
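One hedge on the second .Split() call above: which overload PowerShell picks for a plain string argument depends on the underlying .NET version. To my knowledge, Windows PowerShell treats ',;' as a set of separator characters, while PowerShell 6 and later can bind it to a newer string overload and treat ",;" as a single two-character delimiter. Casting to [char[]] is one way to state the intent explicitly; the variable names below are hypothetical.

```powershell
# Cast the string to a character array so each character
# is treated as its own separator.
$separators = [char[]]',;'

$fields = 'a,b;c,d'.Split($separators)

$fields -join ' | '    # a | b | c | d
```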
You can also remove empty elements using [StringSplitOptions]::RemoveEmptyEntries. Below, I demonstrate how you get empty elements, and how they can be removed by adding a parameter to the Split() method.
PS C:\> 'a,,,d'.Split(',') -join ' | '
a |  |  | d
PS C:\> 'a,,,d'.Split(',', [StringSplitOptions]::RemoveEmptyEntries) -join ' | '
a | d
An alternative way of effectively removing the empty elements caused by these doubled-up delimiters (commas) is to use -replace to replace multiple commas with a single comma. This of course doesn't consider quoted fields containing delimiters/commas in the data.
PS C:\> ('a,,,d' -replace ',+', ',').Split(',') -join ' | '
a | d
# or with -split:
PS C:\> 'a,,,d' -replace ',+', ',' -split ',' -join ' | '
a | d
# or slicker still:
PS C:\> 'a,,,d' -split ',+' -join ' | '
a | d
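The approaches are not fully interchangeable when the data starts or ends with a delimiter. The sketch below uses made-up input with a leading comma (the variable names are mine as well): collapsing or splitting on ',+' still leaves one empty element at the front, whereas RemoveEmptyEntries and a Where-Object filter drop it.

```powershell
# Made-up input with a leading comma and a doubled comma.
$data = ',a,,d'

# Regex split on runs of commas: the leading comma still yields
# an empty first element.
$viaRegex = $data -split ',+'

# RemoveEmptyEntries drops every empty element, wherever it is.
# The [char[]] cast makes the separator type explicit.
$viaMethod = $data.Split([char[]]',', [StringSplitOptions]::RemoveEmptyEntries)

# Filtering after a plain split gives the same result.
$viaFilter = $data -split ',' | Where-Object { $_ }

$viaRegex.Count           # 3
$viaMethod -join ' | '    # a | d
$viaFilter -join ' | '    # a | d
```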