Get Folder Size with PowerShell, Blazingly Fast

From Svendsen Tech PowerShell Wiki

This Get-FolderSize script uses the very fast Scripting.FileSystemObject COM object, with an optional fallback to robocopy.exe in logging-only mode (no actual copying). The fallback handles directories where you do not have access to one or more files or subfolders, in which case the COM object returns $null. Both methods are much faster than Get-ChildItem and .NET, and both have the extremely useful benefit of avoiding the long path limitation in .NET (and therefore PowerShell).

Limit which output fields are displayed with Select-Object if you only want path and total bytes, for instance.
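As a minimal sketch of that filtering, the snippet below builds a stand-in object with some of the property names the script emits (the values are made up for illustration) and trims it down with Select-Object. It does not call Get-FolderSize itself, so it runs without the script loaded.

```powershell
# Stand-in for one object emitted by Get-FolderSize (values are made up).
$FolderSize = New-Object PSObject -Property @{
    Path        = 'C:\temp'
    TotalBytes  = [decimal] 14204279
    TotalMBytes = [math]::Round(([decimal] 14204279 / 1MB), 4)
    TotalGBytes = [math]::Round(([decimal] 14204279 / 1GB), 4)
    DirCount    = $null
    FileCount   = $null
}

# Keep only the path and the byte total.
$Trimmed = $FolderSize | Select-Object -Property Path, TotalBytes
$Trimmed
```

With the script loaded, the equivalent would presumably be "Get-FolderSize -Path C:\temp | Select-Object Path, TotalBytes".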

It works against UNC paths. Sometimes I've had to do something like a "Get-ChildItem -Directory \\srv\share | % FullName | Get-FolderSize" because robocopy couldn't list the (DFS) root directly for some reason (v3 and up syntax in that example).

Should work with PowerShell version 2 and up. PowerShell version 2 comes in-the-box with Server 2008 R2 and Win 7.

The fallback to robocopy.exe was inspired by Boe Prox' post on using robocopy.exe to avoid the 248-260 max path character limitation (PathTooLong exception) seen with Get-ChildItem, and to still be able to list items in deep directory structures. Read Boe's post for more background information and details.

The default is now the COM method which does not have long path problems. The robocopy fallback does not have problems with long paths either, and although slower than COM, it enumerates the files and folders blazingly fast compared to the alternatives.

The alternatives are Get-ChildItem in conjunction with Measure-Object or similar (fine for certain situations), and .NET methods for listing files and folders with some form of recursion. Both of these alternatives also suffer from the long path limitation.

CPU usage seems to be at around 35-50 % while it's processing with robocopy.exe (-RoboOnly will force it) on my desktop with four cores, which is also running other stuff.

Parsing text is somewhat of a skill of mine, so when I wrote the robocopy part, I cooked up some logic and a regex to extract what I consider the relevant details, in addition to the folder size, from the robocopy summary. The code can also serve as a basic example of how to parse the robocopy.exe summary report.
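To illustrate the parsing idea in isolation, here is a small sketch that runs against a hand-written sample of an English robocopy summary (the numbers are invented, and the sample only approximates the real layout). The pattern follows the same shape as the Dirs/Files/Bytes regexes in the script: capture the Total column, skip three columns, then capture the FAILED column.

```powershell
# Hand-written sample resembling the English robocopy summary table.
$SampleSummary = @'
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         3         3         0         0         0         0
   Files :        10         9         0         0         1         0
   Bytes :   1048576   1046528         0         0      2048         0
'@

$Result = @{}
foreach ($Label in 'Dirs', 'Files', 'Bytes') {
    # Total column, three skipped columns, FAILED column, Extras column.
    $Pattern = "$Label\s*:\s*(?<Total>\d+)(?:\s+\d+){3}\s+(?<Failed>\d+)\s+\d+"
    if ($SampleSummary -match $Pattern) {
        $Result[$Label] = New-Object PSObject -Property @{
            Total  = [decimal] $Matches['Total']
            Failed = [decimal] $Matches['Failed']
        }
    }
}

$Result['Bytes']
```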

One thing I quickly became aware of when a Norwegian coworker tested the code is that robocopy's output needs to be in English for the fallback to work. When he got the output in Norwegian, he got the warning about an unexpected format. I tried tweaking the current "culture" (language) in the PS session, but haven't had any luck so far in forcing English output from robocopy.exe. I think it's hardcoded in the .exe file (but I'm not sure). You could edit the regexes in the script to support "your" language for the header line and the start of the other lines.

I have also come across "Support Tools" folders (in the Program Files folder) in $Env:Path containing an incompatible version of robocopy that didn't support the /bytes parameter. I renamed that copy so the one in the Windows directory was picked up instead, which works. The script works fine with the native robocopy.exe on 2008 R2 and 2012 R2 (English only!).
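If you suspect a stray copy is being picked up, a quick check along these lines might help. Find-ExecutableInPath is a hypothetical helper name, not part of the script; it simply lists every directory in the search path that contains the named file, in search order, so the first hit is the one that would run.

```powershell
# Hypothetical helper (not part of Get-FolderSize): list every copy of an
# executable found in the given directories, in search order.
function Find-ExecutableInPath {
    param(
        [string] $Name = 'robocopy.exe',
        # Directories to search, in order. Defaults to the Windows-style PATH.
        [string[]] $Directory = ($Env:Path -split ';')
    )
    foreach ($Dir in $Directory) {
        if (-not $Dir) { continue }   # skip empty PATH entries
        $Candidate = Join-Path -Path $Dir -ChildPath $Name
        if (Test-Path -LiteralPath $Candidate -PathType Leaf) {
            $Candidate
        }
    }
}

# First line of output is the copy that would be used.
Find-ExecutableInPath -Name 'robocopy.exe'
```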


Screenshot Example

Here's an example with the default behaviour, which is COM first, with a fallback to robocopy if access is denied to one or more files or folders. The COM output is less detailed, but you can add -RoboOnly (see below) to always have full details, if needed, (possibly/usually) at the expense of speed.

[Screenshot: Get-FolderSizeExample-new-FSO-robocopy.png]

If you need all the data for all objects, regardless of full access, you can use the -RoboOnly parameter. This will usually be a bit slower than COM, but after I increased the thread count to 16 from the default of 8, robocopy actually outperforms COM in some cases according to the screenshot below. You can set the number of threads, between 1-128, with the -RoboThreadCount parameter.

Using the -RoboOnly parameter will cause the script to never use COM, only robocopy, always giving file count and directory count as well as sizes.

To use only COM and never fall back to robocopy, use the parameter -ComOnly.

Here is a demonstration. As we can see, the reported sizes are identical.

[Screenshot: Get-FolderSize-Example-new-FSO-comonly-roboonly.png]

Download

Get-FolderSize.ps1.txt - Right click and download. Remember to unblock. Dot-source (. .\Get-FolderSize.ps1) to get the cmdlet/function Get-FolderSize into your current PowerShell session. You can put it in your profile (md (split-path $profile -parent); ise $profile # if it does not already exist).

GetSTFolderSize.zip - download, remember to unblock (Unblock-File is in PSv3 and up) before extracting, and copy to a module folder (see $Env:PSModulePath). Or use the PowerShell Gallery if you have PSv5+ (see below). When you use the module, the function that's exported is named "Get-STFolderSize", not "Get-FolderSize" (to avoid name collisions with stuff other people might have written).

  • 2017-08-31: v1.2.1. Both Resolve-Path and UNC paths now work. Replaced Path with ProviderPath as per feedback from some users. Updated here and in gallery. Updated stand-alone file as well.
  • 2017-08-12: v1.2. Introduced -LiteralPath. In Get-STFolderSize. Also uploaded to the gallery. Stand-alone file not updated yet.
  • 2017-05-01: The regular Get-FolderSize.ps1 stand-alone script file now also supports relative paths, etc.
  • 2017-04-23: v1.1. GetSTFolderSize module updated. Support for relative paths and wildcards and whatever Resolve-Path supports.
  • 2016-10-31: Uploading module version 1.0, with the function name "Get-STFolderSize". Minorly polished code. Improved regex validation.
  • 2016-05-20: Added -RoboThreadCount. 16 threads gave faster results than the default 8 on my 4 core rig. Can be from 1-128. Tweakable for faster results.
  • 2016-05-19: Added a -ComOnly parameter for people who might be paranoid about falling back to robocopy (I've always tested on a non-dangerous folder first myself whenever using it in a new environment - the fear being that the logging option isn't supported, but fortunately it appears to be ubiquitously).
  • 2016-02-23: More regex tweaking. Changed int64 to decimal to support larger sizes since someone apparently has _very_ large data storage units. Changed "time elapsed" to also respect the -Precision (default 4), and print 0.nnnn seconds.
  • 2016-02-22: Changed a few \s+ to \s* to support larger byte sizes, but went overboard and sacrificed some validation. Will work with well-formatted output. It's updated in the module 1.0 version and up.
  • 2016-02-19: Added catching a PERMISSIONDENIED exception, and falling back to robocopy also then. Not sure why I haven't seen this before. Noticed it on a 2012 R2 server. In retrospect it was probably needless, since robocopy also should fail in that case.
  • 2016-02-04: Fixed a bug and uploaded a new version.
  • 2016-02-04: Uploaded new version with COM as default approach, and fallback to robocopy.


If you have Windows Management Framework 5 or higher (WMF 5 is available for Windows 7 and up), you can install my GetSTFolderSize module from the PowerShell gallery, a Microsoft site and online repository for PowerShell modules and scripts.

To install this module with WMF 5 and up (to get the latest GetSTFolderSize module version available), simply run this command (requires an internet connection):

Install-Module -Name GetSTFolderSize

Be aware that the function has the name "Get-STFolderSize" when you use the module version. The "ST" prefix was added to avoid name collisions with functions other people may have written and given the obvious name.

More about the COM method

The "old" COM object method GetFolder() in the Scripting.FileSystemObject class is what I use.

It reads a property I initially thought was immediately available, but after some testing against remote servers I realized it is calculated on demand (it's just really fast). This property contains the total size of all the files in a folder, including its subfolders and their files.

It works like this:

PS C:\> $FSO = New-Object -ComObject Scripting.FileSystemObject
PS C:\> $FSO.GetFolder('C:\temp').Size
14204279

This is described in further detail towards the bottom of this Microsoft Technet article.

However, if you don't have access to one or more subfolders of the folder you want the size of, the size returned will be $null. I initially wrote a version using only robocopy, but have now (2016-02-04) updated the script to a version that tries COM first, checks for $null, and then falls back to robocopy if the COM Scripting.FileSystemObject method doesn't work. This produces the fastest result. On 2016-05-19 I also made falling back to robocopy optional by adding a -ComOnly parameter.
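The try-first-then-fall-back flow can be sketched on its own. Get-SizeWithFallback and the two stub script blocks below are hypothetical names made up for this illustration; the stubs stand in for the COM call and the robocopy parser so the snippet runs anywhere.

```powershell
# Hypothetical helper, not part of the script: try a fast method first and
# fall back to a slower one when the fast method yields $null or throws.
function Get-SizeWithFallback {
    param(
        [string] $Path,
        [scriptblock] $Primary,
        [scriptblock] $Fallback
    )
    $Method = 'Primary'
    $Size = $null
    try {
        $Size = & $Primary $Path
    }
    catch {
        Write-Verbose "Primary method threw: $($_.Exception.Message)"
    }
    if ($null -eq $Size) {
        $Method = 'Fallback'
        $Size = & $Fallback $Path
    }
    New-Object PSObject -Property @{
        Path       = $Path
        TotalBytes = $Size
        Method     = $Method
    }
}

# Stubs: the "COM" stub returns $null for anything but C:\open.
$ComStub  = { param($p) if ($p -eq 'C:\open') { 14204279 } else { $null } }
$RoboStub = { param($p) 999 }

$Open   = Get-SizeWithFallback -Path 'C:\open'   -Primary $ComStub -Fallback $RoboStub
$Locked = Get-SizeWithFallback -Path 'C:\locked' -Primary $ComStub -Fallback $RoboStub
```

On Windows, the primary script block could presumably be something like { param($p) (New-Object -ComObject Scripting.FileSystemObject).GetFolder($p).Size }, matching the COM call shown earlier.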

Briefly about Get-ChildItem and Measure-Object

If you know you won't have long path problems and you think this cmdlet is overkill, don't trust it because it's too many lines of unknown code, or for whatever other reason want to use Get-ChildItem and Measure-Object instead, I'll briefly demonstrate that approach as well for the sake of completeness.

Also, in at least PowerShell version 4, Get-ChildItem seems to have picked up speed significantly since the v2 days, although it is still about 120 % slower than my Get-FolderSize script with either method against the test folder. I discovered this while testing to prepare this documentation.

The syntax in this example is, however, PSv2-compatible. With PowerShell v3 and up, you can replace the Where-Object that filters out directories with "Get-ChildItem -File" (dir -File / ls -File / gci -File). If you leave it in, directories should have a length of 0 anyway, so it shouldn't matter regardless.

PS C:\> $a = Get-Date; $MObjSizeSum = dir -LiteralPath T:\Movies -Recurse | `
    Where { -not $_.PSIsContainer } | measure -Sum -Property Length;
    "Time elapsed: $(((Get-Date)-$a).TotalSeconds.ToString('N')) seconds."

Time elapsed: 1.37 seconds.

PS C:\> $MObjSizeSum.Sum / 1GB
1248.42245620396
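For PowerShell v3 and up, the -File variant mentioned above might look like the following sketch. It builds a small throwaway directory tree under the temp folder (the folder and file names are arbitrary) so that the result can be checked against known file sizes.

```powershell
# Build a small throwaway directory tree so the example is self-contained.
$Root = Join-Path ([System.IO.Path]::GetTempPath()) ("gfs-demo-" + [guid]::NewGuid())
$Sub = Join-Path $Root 'sub'
New-Item -ItemType Directory -Path $Sub -Force | Out-Null
$BytesA = New-Object byte[] 1000
$BytesB = New-Object byte[] 2500
[System.IO.File]::WriteAllBytes((Join-Path $Root 'a.bin'), $BytesA)
[System.IO.File]::WriteAllBytes((Join-Path $Sub 'b.bin'), $BytesB)

# PSv3+ syntax: -File replaces the Where-Object filter on PSIsContainer.
$SizeSum = (Get-ChildItem -LiteralPath $Root -Recurse -File |
    Measure-Object -Property Length -Sum).Sum

# Clean up, then show the total (1000 + 2500 bytes).
Remove-Item -LiteralPath $Root -Recurse -Force
$SizeSum
```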

Here is a slightly different version without recursion (just omit the -Recurse flag to Get-ChildItem) and without timing, plus a few examples of how you might display or use the sizes in MB and GB.

PS C:\> $DirSizeBytes = (Get-ChildItem -LiteralPath T:\Movies |
    Where { $_.PSIsContainer -eq $false } | Measure-Object -Property Length -Sum).Sum
PS C:\> $DirSizeBytes
28277899720
PS C:\> $DirSizeMB = $DirSizeBytes / 1MB
PS C:\> $DirSizeGB = $DirSizeBytes / 1GB
PS C:\> $DirSizeMB
26967.9066848755
PS C:\> $DirSizeGB
26.3358463719487
PS C:\> if ($DirSizeGB -gt 25) { "Dir size is greater than 25 GB" }
Dir size is greater than 25 GB

Relative paths

Relative path and wildcard support was introduced in GetSTFolderSize version 1.1. It works like this:

PS C:\> Measure-Command { $Sizes = Get-STFolderSize -Path .\temp\testdir\sub* } |
    Select TotalSeconds | Format-Table -AutoSize

TotalSeconds
------------
   0.2049735
 
PS C:\> $Sizes | Format-Table -AutoSize Path, TotalMBytes

Path                    TotalMBytes
----                    -----------
C:\temp\testdir\subdir0           1
C:\temp\testdir\subdir1           2
C:\temp\testdir\subdir2           3
C:\temp\testdir\subdir3           4
C:\temp\testdir\subdir4           5
C:\temp\testdir\subdir5           6
C:\temp\testdir\subdir6           7
C:\temp\testdir\subdir7           8
C:\temp\testdir\subdir8           9
C:\temp\testdir\subdir9          10
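The script resolves -Path values with Resolve-Path and reads the ProviderPath property. The sketch below shows that expansion on its own, against a throwaway directory tree under the temp folder (the names are arbitrary), without calling Get-STFolderSize.

```powershell
# Create three sibling directories in a throwaway location.
$Root = Join-Path ([System.IO.Path]::GetTempPath()) ("gfs-rel-" + [guid]::NewGuid())
foreach ($i in 0..2) {
    New-Item -ItemType Directory -Path (Join-Path $Root "subdir$i") -Force | Out-Null
}

Push-Location -LiteralPath $Root
try {
    # A relative path with a wildcard expands to full provider paths.
    $Resolved = @(Resolve-Path -Path (Join-Path '.' 'subdir*') |
        Select-Object -ExpandProperty ProviderPath)
}
finally {
    Pop-Location
    Remove-Item -LiteralPath $Root -Recurse -Force
}

$Resolved
```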

Source Code

Here's the actual script code for the Get-FolderSize script.

#requires -version 2


<#
.SYNOPSIS
    Gets folder sizes using COM and by default with a fallback to robocopy.exe, with the
    logging only option, which makes it not actually copy or move files, but just list them, and
    the end summary result is parsed to extract the relevant data.

    There is a -ComOnly parameter for using only COM, and a -RoboOnly parameter for using only
    robocopy.exe with the logging only option.

    The robocopy output also gives a count of files and folders, unlike the COM method output.
    The default number of threads used by robocopy is 8, but I set it to 16 since this cut the
    run time down to almost half in some cases during my testing. You can specify a number of
    threads between 1-128 with the parameter -RoboThreadCount.

    Both of these approaches are apparently much faster than .NET and Get-ChildItem in PowerShell.

    The properties of the objects will be different based on which method is used, but
    the "TotalBytes" property is always populated if the directory size was successfully
    retrieved. Otherwise you should get a warning (and the sizes will be zero).
    
    Online documentation: http://www.powershelladmin.com/wiki/Get_Folder_Size_with_PowerShell,_Blazingly_Fast
    
    MIT license. http://www.opensource.org/licenses/MIT
    
    Copyright (C) 2015-2017, Joakim Svendsen
    All rights reserved.
    Svendsen Tech.
    
.PARAMETER Path
    Path or paths to measure size of.

.PARAMETER LiteralPath
    Path or paths to measure size of. Wildcard characters in the names
    are taken literally, as with Get-ChildItem -LiteralPath.

.PARAMETER Precision
    Number of digits after decimal point in rounded numbers.

.PARAMETER RoboOnly
    Do not use COM, only robocopy, for always getting full details.

.PARAMETER ComOnly
    Never fall back to robocopy, only use COM.

.PARAMETER RoboThreadCount
    Number of threads used when falling back to robocopy, or with -RoboOnly.
    Default: 16 (gave the fastest results during my testing).

.EXAMPLE
    . .\Get-FolderSize.ps1
    PS C:\> 'C:\Windows', 'E:\temp' | Get-FolderSize

.EXAMPLE
    Get-FolderSize -Path Z:\Database -Precision 2

.EXAMPLE
    Get-FolderSize -Path Z:\Database -RoboOnly -RoboThreadCount 64

.EXAMPLE
    Get-FolderSize -Path Z:\Database -RoboOnly

.EXAMPLE
    Get-FolderSize A:\FullHDFloppyMovies -ComOnly

#>
function Get-FolderSize {
    [CmdletBinding(DefaultParameterSetName = "Path")]
    param(
        [Parameter(ParameterSetName = "Path",
                   Mandatory = $true,
                   ValueFromPipeline = $true,
                   ValueFromPipelineByPropertyName = $true,
                   Position = 0)]
            [Alias('Name', 'FullName')]
            [string[]] $Path,
        [int] $Precision = 4,
        [switch] $RoboOnly,
        [switch] $ComOnly,
        [Parameter(ParameterSetName = "LiteralPath",
                   Mandatory = $true,
                   Position = 0)] [string[]] $LiteralPath,
        [ValidateRange(1, 128)] [byte] $RoboThreadCount = 16)
    begin {
        if ($RoboOnly -and $ComOnly) {
            Write-Error -Message "You can't use both -ComOnly and -RoboOnly. Default is COM with a fallback to robocopy." -ErrorAction Stop
        }
        if (-not $RoboOnly) {
            $FSO = New-Object -ComObject Scripting.FileSystemObject -ErrorAction Stop
        }
        function Get-RoboFolderSizeInternal {
            [CmdletBinding()]
            param(
                # Paths to report size, file count, dir count, etc. for.
                [string[]] $Path,
                [int] $Precision = 4)
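            begin {
                # A 'return' in begin does not stop the process block from running,
                # so remember whether robocopy exists and check that in process.
                $RoboCopyFound = [bool] (Get-Command -Name robocopy -ErrorAction SilentlyContinue)
                if (-not $RoboCopyFound) {
                    Write-Warning -Message "Fallback to robocopy failed because robocopy.exe could not be found. Path(s): $($Path -join ', '). $([datetime]::Now)."
                }
            }
            process {
                if (-not $RoboCopyFound) {
                    return
                }
                foreach ($p in $Path) {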
                    Write-Verbose -Message "Processing path '$p' with Get-RoboFolderSizeInternal. $([datetime]::Now)."
                    $RoboCopyArgs = @("/L","/S","/NJH","/BYTES","/FP","/NC","/NDL","/TS","/XJ","/R:0","/W:0","/MT:$RoboThreadCount")
                    [datetime] $StartedTime = [datetime]::Now
                    [string] $Summary = robocopy $p NULL $RoboCopyArgs | Select-Object -Last 8
                    [datetime] $EndedTime = [datetime]::Now
                    [regex] $HeaderRegex = '\s+Total\s*Copied\s+Skipped\s+Mismatch\s+FAILED\s+Extras'
                    [regex] $DirLineRegex = 'Dirs\s*:\s*(?<DirCount>\d+)(?:\s+\d+){3}\s+(?<DirFailed>\d+)\s+\d+'
                    [regex] $FileLineRegex = 'Files\s*:\s*(?<FileCount>\d+)(?:\s+\d+){3}\s+(?<FileFailed>\d+)\s+\d+'
                    [regex] $BytesLineRegex = 'Bytes\s*:\s*(?<ByteCount>\d+)(?:\s+\d+){3}\s+(?<BytesFailed>\d+)\s+\d+'
                    [regex] $TimeLineRegex = 'Times\s*:\s*(?<TimeElapsed>\d+).*'
                    [regex] $EndedLineRegex = 'Ended\s*:\s*(?<EndedTime>.+)'
                    if ($Summary -match "$HeaderRegex\s+$DirLineRegex\s+$FileLineRegex\s+$BytesLineRegex\s+$TimeLineRegex\s+$EndedLineRegex") {
                        New-Object PSObject -Property @{
                            Path = $p
                            TotalBytes = [decimal] $Matches['ByteCount']
                            TotalMBytes = [math]::Round(([decimal] $Matches['ByteCount'] / 1MB), $Precision)
                            TotalGBytes = [math]::Round(([decimal] $Matches['ByteCount'] / 1GB), $Precision)
                            BytesFailed = [decimal] $Matches['BytesFailed']
                            DirCount = [decimal] $Matches['DirCount']
                            FileCount = [decimal] $Matches['FileCount']
                            DirFailed = [decimal] $Matches['DirFailed']
                            FileFailed  = [decimal] $Matches['FileFailed']
                            TimeElapsed = [math]::Round([decimal] ($EndedTime - $StartedTime).TotalSeconds, $Precision)
                            StartedTime = $StartedTime
                            EndedTime   = $EndedTime

                        } | Select-Object -Property Path, TotalBytes, TotalMBytes, TotalGBytes, DirCount, FileCount, DirFailed, FileFailed, TimeElapsed, StartedTime, EndedTime
                    }
                    else {
                        Write-Warning -Message "Path '$p' output from robocopy was not in an expected format."
                    }
                }
            }
        }
    }
    process {
        if ($PSCmdlet.ParameterSetName -eq "Path") {
            $Paths = @(Resolve-Path -Path $Path | Select-Object -ExpandProperty ProviderPath -ErrorAction SilentlyContinue)
        }
        else {
            $Paths = @(Get-Item -LiteralPath $LiteralPath | Select-Object -ExpandProperty FullName -ErrorAction SilentlyContinue)
        }
        foreach ($p in $Paths) {
            Write-Verbose -Message "Processing path '$p'. $([datetime]::Now)."
            if (-not (Test-Path -LiteralPath $p -PathType Container)) {
                Write-Warning -Message "$p does not exist or is a file and not a directory. Skipping."
                continue
            }
            # We know we can't have -ComOnly here if we have -RoboOnly.
            if ($RoboOnly) {
                Get-RoboFolderSizeInternal -Path $p -Precision $Precision
                continue
            }
            $ErrorActionPreference = 'Stop'
            try {
                $StartFSOTime = [datetime]::Now
                $TotalBytes = $FSO.GetFolder($p).Size
                $EndFSOTime = [datetime]::Now
                if ($null -eq $TotalBytes) {
                    if (-not $ComOnly) {
                        Get-RoboFolderSizeInternal -Path $p -Precision $Precision
                        continue
                    }
                    else {
                        Write-Warning -Message "Failed to retrieve folder size for path '$p': $($Error[0].Exception.Message)."
                    }
                }
            }
            catch {
                if ($_.Exception.Message -like '*PERMISSION*DENIED*') {
                    if (-not $ComOnly) {
                        Write-Verbose "Caught a permission denied. Trying robocopy."
                        Get-RoboFolderSizeInternal -Path $p -Precision $Precision
                        continue
                    }
                    else {
                        Write-Warning "Failed to process path '$p' due to a permission denied error: $($_.Exception.Message)"
                        # Skip the generic warning below to avoid reporting the same error twice.
                        continue
                    }
                }
                Write-Warning -Message "Encountered an error while processing path '$p': $($_.Exception.Message)"
                continue
            }
            $ErrorActionPreference = 'Continue'
            New-Object PSObject -Property @{
                Path = $p
                TotalBytes = [decimal] $TotalBytes
                TotalMBytes = [math]::Round(([decimal] $TotalBytes / 1MB), $Precision)
                TotalGBytes = [math]::Round(([decimal] $TotalBytes / 1GB), $Precision)
                BytesFailed = $null
                DirCount = $null
                FileCount = $null
                DirFailed = $null
                FileFailed  = $null
                TimeElapsed = [math]::Round(([decimal] ($EndFSOTime - $StartFSOTime).TotalSeconds), $Precision)
                StartedTime = $StartFSOTime
                EndedTime = $EndFSOTime
            } | Select-Object -Property Path, TotalBytes, TotalMBytes, TotalGBytes, DirCount, FileCount, DirFailed, FileFailed, TimeElapsed, StartedTime, EndedTime
        }
    }
    end {
        if (-not $RoboOnly) {
            [void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($FSO)
        }
        [gc]::Collect()
        [gc]::WaitForPendingFinalizers()
    }
}