I'm dealing with a specific, real-world problem: parsing an IRC log. A small spanner in the works is that I needed to run a regular expression against the contents of a file, and the match needed to span multiple lines. This means the entire file has to be in memory as a single string - which is roughly what PowerShell's Get-Content gives you anyway once you collect its output. The log file is 43 MB in size.
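As an aside, here is why the whole file must be available as one string. The (?s) modifier (DOTALL) used in the patterns below is what lets . match newlines; a line-by-line read would never let the pattern span lines. This is a minimal sketch in Python, purely for illustration, with made-up sample text:

```python
import re

# Hypothetical three-line snippet shaped like the IRC log.
text = "Delectus> -*)\nsome chatter\nVinneren er...\nDelectus> -*> Alice med 42 poeng"

# Without DOTALL, '.' stops at newlines, so the pattern cannot span lines.
plain = re.search(r"-\*\).*Vinneren", text)

# With (?s) (DOTALL), '.' matches '\n' too, so the match crosses lines.
dotall = re.search(r"(?s)-\*\).*Vinneren", text)

print(plain is None)        # the non-DOTALL search finds nothing
print(dotall is not None)   # the DOTALL search succeeds
```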
Both Perl and PowerShell use a single core; the CPU is a quad-core Q9550 at 2.83 GHz. The computer is mostly idle, though I am running a music player, chat clients, mail clients and so on, as usual. This is far from an ideal test environment, but I ran the scripts several times consecutively and the numbers were always very similar, so I'm satisfied they are representative. For proper benchmarking you would want thousands of runs and would consider averages, standard deviations, medians and all that jazz. Here, that doesn't seem necessary.
By the way, since writing this article I have written a generic PowerShell benchmarking module.

Here is the PowerShell script:

$IRCLog = [string] (Get-Content e:\temp\irc.log)
$Regex = [regex] '(?s)Delectus> -\*\).*?Vinneren er\.{3}.+?Delectus> -\*> (.+?) med\s+\d+\s+poeng'

$IRCWinners = @{}
[regex]::Matches($IRCLog, $Regex) | ForEach-Object {
    $IRCWinners.($_.Groups[1].Value) += 1
}
$IRCWinners.GetEnumerator() |
    Sort-Object @{e={$_.Value}; Ascending=$false}, @{e={$_.Name}; Ascending=$true}
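For readers less familiar with the pipeline above, the same count-then-sort logic can be sketched in Python (illustration only; the sample log text here is made up):

```python
import re
from collections import Counter

# Hypothetical miniature log; the real file is 43 MB of IRC chatter.
log = (
    "Delectus> -*) round start\nVinneren er...\nDelectus> -*> Alice med 10 poeng\n"
    "Delectus> -*) round start\nVinneren er...\nDelectus> -*> Bob med 7 poeng\n"
    "Delectus> -*) round start\nVinneren er...\nDelectus> -*> Alice med 12 poeng\n"
)

pattern = re.compile(
    r"(?s)Delectus> -\*\).*?Vinneren er\.{3}.+?Delectus> -\*> (.+?) med\s+\d+\s+poeng"
)

# Count one win per captured name, like the PowerShell hashtable increment.
winners = Counter(m.group(1) for m in pattern.finditer(log))

# Sort by count descending, then name ascending, like the Sort-Object call.
for name, count in sorted(winners.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name:<35}{count}")
```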
Then I measured how long it took to run a few times and got numbers similar to this:
PS E:\temp> (Measure-Command { .\IRC-Winner-Extract.ps1 > foo.txt }).TotalSeconds
24,6283702
About 24 seconds. Not too impressive?
I replaced the line where I read the file:
$IRCLog = [string] (Get-Content e:\temp\irc.log)
With this:
$IRCLog = [System.IO.File]::ReadAllText('e:\temp\irc.log')
Then I ran the code again and measured how long it took:
PS E:\temp> (Measure-Command { .\IRC-Winner-Extract.ps1 > foo.txt }).TotalSeconds
3,7554382
About 3.76 seconds on that run, roughly 6.5 times faster than with Get-Content. Subsequent runs always gave similar results.
I suppose one might conclude that Get-Content is slow for this type of operation. Even slower is passing "-Delimiter `0" as a parameter to Get-Content; after a few minutes I didn't even have the patience to wait for the script to finish. I know this article is pretty haphazardly put together; my apologies.

Here is the Perl code:
use warnings;
use strict;

my $regex = qr'(?s)Delectus> -\*\).*?Vinneren er\.{3}.+?Delectus> -\*> (.+?) med\s+\d+\s+poeng';

open my $fh, '<', 'irc.log' or die "Failed to open file: $!\n$^E";
my $irc_log = do { local $/; <$fh> };

my %winners;
$winners{$1} += 1 while $irc_log =~ m/$regex/g;

foreach my $key (sort { $winners{$b} <=> $winners{$a} or $a cmp $b } keys %winners) {
    printf '%-35s%s', $key, "$winners{$key}\n";
}
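The do { local $/; <$fh> } idiom temporarily undefines Perl's input record separator so the whole file is read in one go ("slurp mode"), analogous to .NET's ReadAllText. As a sketch, the equivalent in Python is simply read() on a file handle (using a throwaway temp file so the example is self-contained):

```python
import os
import tempfile

# Create a small throwaway file to slurp.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("line one\nline two\n")

# Read the whole file as a single string, analogous to Perl's
# do { local $/; <$fh> } or [System.IO.File]::ReadAllText(...).
with open(path, encoding="utf-8") as f:
    irc_log = f.read()

os.remove(path)
print(repr(irc_log))
```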
Then I ran it from PowerShell using Measure-Command:
PS E:\temp> (Measure-Command { perl .\IRC-Winner-Extract.pl > foo2.txt }).TotalSeconds
0,3780953
About 0.38 seconds. It seems Perl is faster for this specific problem.
I've run the scripts numerous times and the numbers are always close to the results posted here. You'll see that the first time a script is run, it takes a while longer; there's probably some caching going on. All the example times above are from consecutive runs, so whatever caching there is should be in effect.
For comparison, I also tried the File::Slurp module. The code to read the file is now:

use File::Slurp;
my $irc_log = read_file('e:/temp/irc.log');
And giving it a few spins, we get:
PS E:\temp> 1..3 | %{ (Measure-Command { perl .\IRC-Winner-Extract.pl > foo2.txt }).TotalSeconds }
1,3639568
0,648764
0,6400252
PS E:\temp> 1..3 | %{ (Measure-Command { perl .\IRC-Winner-Extract.pl > foo2.txt }).TotalSeconds }
0,6360413
0,6393869
0,6458041
About 0.64 seconds with File::Slurp.
On larger files you should see greater time savings with File::Slurp. The optional parameter to read_file(), scalar_ref => 1, shaves off a relatively significant amount of time:

PS E:\temp> 1..5 | % { (Measure-Command { perl .\IRC-Winner-Extract.pl }).TotalSeconds }
0,5304287
0,5253521
0,5210729
0,5206464
0,5241358
About 0.52 seconds.
Finally, how much of the Perl timings is just the overhead of starting the interpreter itself? Conclusion: mostly insignificant in most situations.
PS E:\> 1..10 | %{ (Measure-Command { perl -e 1 }).TotalSeconds }
0,0310774
0,0273425
0,0264974
0,0275955
0,0282403
0,0282031
0,0286994
0,0288328
0,0287991
0,0301134
I calculated the average on 10 and 1000 iterations:
PS E:\> 1..10 | %{ (Measure-Command { perl -e 1 }).TotalSeconds } | Measure-Object -Sum | %{ $_.Sum / $_.Count }
0,02691432
PS E:\> 1..1000 | %{ (Measure-Command { perl -e 1 }).TotalSeconds } | Measure-Object -Sum | %{ $_.Sum / $_.Count }
0,027541721
Very similar to the individual results.
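The averaging above is just Sum / Count over the per-run timings. Sketched in Python, using the first five perl -e 1 timings from the transcript above:

```python
# Per-run startup timings in seconds, taken from the measurements above.
timings = [0.0310774, 0.0273425, 0.0264974, 0.0275955, 0.0282403]

# Same arithmetic as Measure-Object -Sum followed by Sum / Count.
average = sum(timings) / len(timings)

print(f"{average:.7f}")
```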