This article was originally written some time between 2011 and 2015. Last update: 2022-01-26.
If you have an ANSI-encoded file, or a file encoded using some other (supported) encoding, and want to convert it to UTF-8 (or another supported encoding), this article is for you. I ran into this when working with data exported from Excel, which was in latin1/ISO-8859-1 by default, and I couldn't find a way to specify UTF-8 in Excel.
The problem occurred when I wanted to work on the CSV file using the PowerShell cmdlet Import-Csv, which, as far as I can tell, doesn't work correctly with latin1-encoded files exported from Excel or ANSI files created with Notepad if they contain non-US (non-ASCII) characters. Update 2022-01-26: It's a known bug that has probably been fixed. The bug occurs when the file is missing the UTF-8 BOM (more on that below). The bug was submitted to Microsoft Connect years ago; the link is further down.
A command you may be looking for is Set-Content. Type "Get-Help Set-Content -Online" at a PowerShell prompt to read the help text, and see the example below.
Also see the part about using Get-Content file.csv | ConvertFrom-Csv. There is a separate article on how to convert using iconv on Linux.
Internally in PowerShell, a string is a sequence of 16-bit UTF-16 code units (.NET System.Char values). Most characters fit in a single code unit, but Unicode code points outside the Basic Multilingual Plane take two code units (a surrogate pair), so a code unit is not the same thing as a code point or a Unicode scalar value. A string is implemented directly using the .NET System.String type, which is a reference type (read more about that in my deep copying article). A string can be very long (its length is a 32-bit integer in .NET, so the ceiling is set by that and by available memory), and it is immutable, meaning it can't be changed without creating an entirely new, altered version/"copy" of the string.
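A small sketch of what that means in practice. The variable names are only for illustration; the .NET members used (Length, ConvertFromUtf32, ConvertToUtf32, ToUpper) are standard.

```powershell
# Length counts 16-bit UTF-16 code units, not user-perceived characters.
$vowel = [string][char]0x00E5               # U+00E5 (the Norwegian a-ring), one code unit
$emoji = [char]::ConvertFromUtf32(0x1F600)  # U+1F600, outside the 16-bit range

$vowel.Length   # 1
$emoji.Length   # 2 (a surrogate pair)

# The two code units of the pair combine into a single code point.
$codePoint = [char]::ConvertToUtf32($emoji, 0)
'{0:X}' -f $codePoint   # 1F600

# Strings are immutable: methods return a new string and leave the original alone.
$original = 'foo'
$upper = $original.ToUpper()
$original   # foo
$upper      # FOO
```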
A quick way to see which encodings Set-Content supports is to pass it an invalid value:
PS C:\> 'foo' | Set-Content -Encoding whatever
Set-Content : Cannot bind parameter 'Encoding'. Cannot convert value "whatever" to type "Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding" due to invalid enumeration values. Specify one of the following enumeration values and try again. The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".
At line:1 char:30
+ 'foo' | Set-Content -Encoding <<<< whatever
    + CategoryInfo          : InvalidArgument: (:) [Set-Content], ParameterBindingException
    + FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.SetContentCommand
Notice the part of the error message that lists the possible enumeration values: Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii.
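Here I create a small test file with Notepad and display it with Get-Content (gc):
PS C:\> notepad .\norwegian-vowels.txt
PS C:\> gc .\norwegian-vowels.txt
vowel,position
æ,27
ø,28
å,29
Import-Csv, on the other hand, shows question marks instead of the vowels:
PS C:\> Import-Csv .\norwegian-vowels.txt

vowel position
----- --------
?     27
?     28
?     29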
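To convert the file, I pipe the output of Get-Content to Set-Content with -Encoding utf8. Then I just pass the new file to Import-Csv to verify it's displayed correctly.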
PS C:\> Get-Content .\norwegian-vowels.txt | Set-Content -Encoding utf8 norwegian-vowels-utf8.txt
PS C:\> Import-Csv .\norwegian-vowels-utf8.txt

vowel position
----- --------
æ     27
ø     28
å     29
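If you'd rather not depend on Get-Content guessing the source encoding, the same conversion can be sketched with explicit encodings through .NET. This is an alternative I'm adding for illustration, not part of the workflow above: the function name Convert-FileToUtf8 and its parameters are made up, and the default source encoding of ISO-8859-1 is an assumption you should adjust to your file.

```powershell
# Hypothetical helper (not a built-in cmdlet): re-encodes a text file as UTF-8 with a BOM.
function Convert-FileToUtf8 {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        # Assumption: the source is latin1; pass another name such as 'utf-16' if not.
        [string] $SourceEncoding = 'iso-8859-1'
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so make both paths absolute first.
    $inPath  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    $source = [System.Text.Encoding]::GetEncoding($SourceEncoding)
    $target = New-Object System.Text.UTF8Encoding($true)   # $true = write the BOM

    # Decode with the source encoding, then write the same text back as UTF-8.
    $text = [System.IO.File]::ReadAllText($inPath, $source)
    [System.IO.File]::WriteAllText($outPath, $text, $target)
}
```

Usage would look like: Convert-FileToUtf8 -Path .\norwegian-vowels.txt -Destination .\norwegian-vowels-utf8.txt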
In looking at why Import-Csv doesn't work as expected, I found that the missing element is simply the UTF-8 BOM (see https://en.wikipedia.org/wiki/Byte_order_mark.php ).
The Get-Content cmdlet correctly determines the encoding as UTF-8 whether or not the BOM is present, whereas Import-Csv only works if the BOM is present.
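To check whether a given file actually starts with the UTF-8 BOM (the three bytes EF BB BF), something along these lines should do. It's a minimal sketch; Test-Utf8Bom is a name I made up, not a built-in cmdlet.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)

    # Resolve to an absolute path because .NET ignores the PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Only the first three bytes are needed, so don't read the whole file.
        $buffer = New-Object byte[] 3
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }
    return ($read -eq 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF)
}
```

For example, Test-Utf8Bom -Path .\norwegian-vowels-utf8.txt tells you whether the converted file got a BOM.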
I tried specifying the encoding to Import-Csv, and that does not work either:
PS C:\> Import-Csv -Encoding UTF8 .\norwegian-vowels.txt
You can eliminate the interim file encoding step like this:
PS C:\> Get-Content .\norwegian-vowels.txt | ConvertFrom-Csv
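My understanding of why this helps is that ConvertFrom-Csv works on strings that are already decoded, so only Get-Content's reading of the file matters. A small sketch, with in-memory lines standing in for the Get-Content output (the sample data mirrors the test file; the variable names are just for illustration):

```powershell
# Lines as Get-Content would emit them. The [char] codes avoid depending on
# the encoding of the script file itself.
$lines = @(
    'vowel,position'
    ('{0},27' -f [char]0xE6)   # æ
    ('{0},28' -f [char]0xF8)   # ø
    ('{0},29' -f [char]0xE5)   # å
)

# ConvertFrom-Csv parses the strings; no file encoding is involved at this point.
$rows = $lines | ConvertFrom-Csv

$rows[0].vowel      # æ
$rows[2].position   # 29
```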
I submitted a bug to Microsoft Connect: https://connect.microsoft.com/PowerShell/feedback/details/1371244/import-csv-does-not-correctly-detect-encoding-for-utf-8-files-without-bom