The original Zip specification does not support international characters. Later extensions to the specification allow filenames and comments to be encoded in Unicode, the widespread standard for encoding, representing and handling text in most of the world's writing systems. Because Unicode support was added years after the original version, it is not universal among zip tools. Some tools will not show or extract Unicode paths and file names correctly, if at all.
All Xceed Zip components handle Unicode characters automatically by default, so this is not an issue when an application uses Xceed components to create, update and extract zip archives. Problems can arise when 3rd party zip tools, whose Unicode support varies, are used for some of these operations. This is an interoperability problem.
For each item, a zip archive contains meta-data: the name and path of the file or folder, an optional comment, dates, the compression method, etc. Unicode applies to the text-based parts of this meta-data.
A zip archive is made up of its items (files or folders), stored one after the other. Each item consists of a header followed by the item's compressed data. The header contains a standard set of fields followed by a sequence of optional extra headers, and the meta-data is located in those headers. With regard to Unicode, the most important field is the filename field, which is part of the standard set of fields for each item in an archive.
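To make the header layout concrete, here is a minimal Python sketch (not Xceed code) that reads the fixed fields of the first local file header in an archive, followed by its filename and extra-header bytes. The standard `zipfile` module is used only to produce a small test archive in memory. The field order follows the PKWARE APPNOTE layout, and the helper name `read_local_header` is hypothetical.

```python
import io
import struct
import zipfile


def read_local_header(data: bytes, offset: int = 0) -> dict:
    """Decode the 30 fixed bytes of a local file header, then name and extra."""
    (signature, version, flags, method, mtime, mdate, crc, csize, usize,
     name_len, extra_len) = struct.unpack_from("<IHHHHHIIIHH", data, offset)
    if signature != 0x04034B50:
        raise ValueError("not a local file header")
    start = offset + 30
    name = data[start:start + name_len]                       # filename field
    extra = data[start + name_len:start + name_len + extra_len]  # extra headers
    return {"flags": flags, "method": method, "crc": crc,
            "compressed_size": csize, "uncompressed_size": usize,
            "filename": name, "extra": extra}


# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "hello")

header = read_local_header(buffer.getvalue())
print(header["filename"])  # b'readme.txt'
```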
The Zip format has historically supported only the original IBM PC character set, commonly referred to as IBM Code Page 437. This limits file name characters to the original MS-DOS range of values, so file names in other character encodings or languages are not properly supported. Several independent solutions exist: Unicode extra headers store a UTF-8 copy of the filename or comment in an optional extra header, Unicode text encoding writes these fields directly in UTF-8 instead of code page 437, and a proprietary solution is also available.
Every flavor of the Xceed Zip components supports all these solutions. Each one can be enabled independently of the others, in any combination.
Not every 3rd party zip tool supports the Unicode options mentioned above. Some tools support none of them, and others support only some. Unicode support is often not clearly advertised, and when it is, the method used is often not explained.
Interoperability is the ability to make different systems work together. For Zip and Unicode, it applies when zip archives created with Xceed Zip for .NET are extracted by other zip tools. It also applies when zip archives created by other zip tools are extracted by Xceed Zip for .NET.
Xceed Zip for .NET, like all zip components from Xceed, supports every Unicode option defined in the Zip specification as well as a proprietary solution. It should therefore handle Unicode characters correctly and automatically in any archive created by a 3rd party zip tool, unless that tool uses its own proprietary solution.
In the opposite direction, when archives created with Xceed Zip for .NET are opened by 3rd party tools, interoperability problems are likely if filenames or comments contain non-ASCII characters, because not every tool supports the Unicode options. The symptom is that these filenames contain incorrect characters when viewed or extracted by tools that don't support Unicode.
Even when Unicode options are enabled in archives created with Xceed Zip for .NET, filenames will appear incorrectly in any 3rd party zip tool that doesn't support the specific option used.
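A reader that handles archives from other tools must decide which name source to trust. The Python sketch below shows one plausible precedence. It is assumed for illustration and is not taken from Xceed's implementation. The reader first tries an Info-ZIP Unicode Path extra header (ID 0x7075) whose CRC-32 matches the main-header name. It then checks the UTF-8 language encoding flag (general purpose bit 11), and finally falls back to code page 437. The function names are hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra header
UTF8_FLAG = 0x0800        # general purpose bit 11: name is UTF-8


def iter_extra_fields(extra: bytes):
    """Yield (header_id, payload) pairs from a raw extra-header block."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id, extra[pos + 4:pos + 4 + size]
        pos += 4 + size


def resolve_filename(raw_name: bytes, flags: int, extra: bytes) -> str:
    """Pick the best available filename (assumed precedence)."""
    for header_id, payload in iter_extra_fields(extra):
        # Version 1 payload: version byte, CRC-32 of raw_name, UTF-8 name.
        if header_id == UNICODE_PATH_ID and len(payload) >= 5 and payload[0] == 1:
            (name_crc,) = struct.unpack_from("<I", payload, 1)
            if name_crc == zlib.crc32(raw_name):
                return payload[5:].decode("utf-8")
            # CRC mismatch: the main-header name changed after the extra
            # header was written, so the extra header is stale and ignored.
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8")
    return raw_name.decode("cp437")


name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"
legacy = name.encode("cp437", errors="replace")
payload = struct.pack("<BI", 1, zlib.crc32(legacy)) + name.encode("utf-8")
extra = struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload
print(resolve_filename(legacy, 0, extra))  # FileNameWithNonASCIIStartÀÉяテEnd.dat
```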
To show the effects of interoperability problems, we will use an example that creates a zip archive containing two files. The first file is named "FileNameWithNonASCIIStartÀÉяテEnd.dat" and has the comment "私のコメント". The second file is named "FileNameWithOEMCharsStartéàùïEnd.dat" and has no comment.
Code: Creating an archive using the default Unicode behavior
When the archive is opened in a modern 3rd party zip tool such as WinZip, both files are listed and their names are displayed correctly. This is because, by default, Xceed Zip for .NET creates archives with the ExtraHeader.UTF8Filename and ExtraHeader.UTF8Comment options. WinZip supports these extra headers, so the Unicode characters appear correctly.
Image: Default behavior seen in WinZip
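The extra-header approach can be sketched at the byte level. The Python function below builds an Info-ZIP Unicode Path extra header (ID 0x7075) as described in the PKWARE APPNOTE. The header contains a version byte, a CRC-32 of the filename stored in the main header, and then the UTF-8 name. This sketch assumes that Xceed's ExtraHeader.UTF8Filename option writes exactly this header, and the function name is hypothetical. The Info-ZIP Unicode Comment extra header (ID 0x6375) follows the same layout, with the CRC-32 computed over the main-header comment.

```python
import struct
import zlib


def unicode_path_extra(header_name: bytes, unicode_name: str) -> bytes:
    """Build an Info-ZIP Unicode Path extra header (ID 0x7075)."""
    utf8_name = unicode_name.encode("utf-8")
    # Version 1, then the CRC-32 of the legacy name stored in the main header.
    payload = struct.pack("<BI", 1, zlib.crc32(header_name)) + utf8_name
    # Every extra header starts with a 2-byte ID and a 2-byte payload size.
    return struct.pack("<HH", 0x7075, len(payload)) + payload


name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"
legacy_name = name.encode("cp437", errors="replace")  # what the main header holds
extra = unicode_path_extra(legacy_name, name)
```

The CRC-32 lets a reader detect a stale extra header. If another tool renames the item and updates only the main header, the CRC no longer matches and the UTF-8 copy can be ignored.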
If we open the same zip file with Windows Explorer on Windows 7 instead, we see something completely different. First, the "FileNameWithNonASCIIStartÀÉяテEnd.dat" file seems to be gone. It is present in the zip file, but Explorer does not display it because the file name was written in the main header as "FileNameWithNonASCIIStart?É??End.dat". Characters that do not exist in the OEM code page were written as '?'; É survived because it exists in code page 437. This is the standard behavior of .NET's System.Text.Encoding class. Windows Explorer does not support the Unicode (UTF-8) extra header, so it uses the filenames in the main header. Since '?' is illegal in Windows filenames, Windows Explorer chooses not to display the file at all, a rather unfortunate behavior.
Image: Default behavior seen in Windows Explorer 7
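The '?' substitution can be reproduced outside .NET. In the Python sketch below, the `replace` error handler plays the role of .NET's default replacement fallback when encoding to code page 437. This is an analogy, not the .NET code path. Only characters missing from code page 437 are replaced. This is why É survives, and why the second file's name comes through intact: all its accented characters exist in code page 437.

```python
# Characters that Windows does not allow in file names.
INVALID_WINDOWS_CHARS = set('<>:"/\\|?*')

first = "FileNameWithNonASCIIStartÀÉяテEnd.dat"
second = "FileNameWithOEMCharsStartéàùïEnd.dat"

# Encode to IBM code page 437 as a legacy main header would;
# characters with no code page 437 equivalent become '?'.
first_header = first.encode("cp437", errors="replace").decode("cp437")
second_header = second.encode("cp437", errors="replace").decode("cp437")

print(first_header)   # FileNameWithNonASCIIStart?É??End.dat
print(second_header)  # FileNameWithOEMCharsStartéàùïEnd.dat
print(any(c in INVALID_WINDOWS_CHARS for c in first_header))  # True
```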
In Windows 8, Microsoft appears to have improved its built-in Zip support to handle both Unicode extra headers and Unicode text encoding. As a result, the default behavior of Xceed Zip for .NET produces the expected results.
Image: Default behavior seen in Windows Explorer 8
Xceed Zip for .NET provides control over how non-ASCII characters are converted when writing zip files. The ZipArchive.UnknownCharacterFallbackConversion property specifies which ASCII character replaces a Unicode character that cannot be expressed in the main header's encoding. The default value is '?'.
We can update the previous example to improve interoperability on Windows by replacing '?' with a character that is legal in Windows filenames.
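The idea behind a configurable fallback character can be sketched in Python with a custom codec error handler. The handler name `underscore_fallback` and the choice of '_' are hypothetical. They illustrate replacing unencodable characters with a character that is legal in Windows file names. They do not show how UnknownCharacterFallbackConversion is implemented.

```python
import codecs


def underscore_fallback(error: UnicodeEncodeError):
    # Replace each character in the unencodable run with '_' and resume after it.
    return "_" * (error.end - error.start), error.end


codecs.register_error("underscore_fallback", underscore_fallback)

name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"
header_name = name.encode("cp437", errors="underscore_fallback").decode("cp437")
print(header_name)  # FileNameWithNonASCIIStart_É__End.dat
```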
Code: Create a zip archive aiming for maximum interoperability
Windows Explorer now displays all the files. The name of "FileNameWithNonASCIIStartÀÉяテEnd.dat" is still mangled, but at least the file can now be extracted. Zip tools that support Unicode extra headers, such as WinZip, continue to display the names properly. For this reason, we consider this approach to provide maximum interoperability while preserving backward compatibility.
Image: Maximum Interoperability seen in Windows Explorer 7
On Windows 8, thanks to its improved Unicode support, Windows Explorer displays the items without issue.
Image: Maximum Interoperability seen in Windows Explorer 8
Some Zip tools do not support Unicode extra headers but do support Unicode text encoding. With Unicode text encoding, filenames are written in UTF-8 directly in the main headers.
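Python's standard `zipfile` module uses Unicode text encoding. When a name is not pure ASCII, the module writes it as UTF-8 in the headers and sets general purpose bit 11 (0x800). This is the flag the Zip specification defines for UTF-8 names and comments. The sketch below uses `zipfile` only to show what such an archive contains; it is not Xceed code.

```python
import io
import zipfile

name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    info = zipfile.ZipInfo(name)
    # zipfile stores per-item comments as raw bytes; encode them as UTF-8 too.
    info.comment = "私のコメント".encode("utf-8")
    archive.writestr(info, b"data")

with zipfile.ZipFile(buffer) as archive:
    entry = archive.infolist()[0]
    print(hex(entry.flag_bits & 0x800))  # 0x800: UTF-8 flag is set
    print(entry.filename)                # FileNameWithNonASCIIStartÀÉяテEnd.dat
```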
Code: Create a zip archive using Unicode text encoding
The 7-Zip tool supports Unicode text encoding but not Unicode extra headers. When the zip file is viewed in 7-Zip, the items are shown correctly.
Image: Unicode text encoding seen in 7-Zip
However, when the zip file is viewed in Windows Explorer on Windows 7, the disadvantage of the Unicode text encoding option becomes apparent: it is not backward compatible with tools that do not support it. Windows Explorer rightly expects file names encoded in the OEM code page, because that is the only encoding it knows. But the filename field in the main header has been written in UTF-8 instead, so the filename shows up mangled.
By comparison, the Unicode extra header option is backward compatible. The extra header mechanism is part of the original Zip specification and, by design, any unknown extra header is ignored. This is why Windows Explorer could ignore the Unicode extra headers in the previous examples. With Unicode text encoding, the Unicode data cannot be ignored because it is part of the main header.
For this reason, we do not consider the Unicode text encoding option good for interoperability, and suggest using it only when the situation warrants it.
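The mangling can be reproduced in a few lines of Python. The sketch below encodes the name in UTF-8, as Unicode text encoding does. It then decodes those bytes as code page 437, which is what a reader unaware of the UTF-8 flag would do. Each non-ASCII character turns into two or three unrelated symbols.

```python
name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"

utf8_bytes = name.encode("utf-8")         # written by Unicode text encoding
legacy_view = utf8_bytes.decode("cp437")  # read by a code page 437-only tool

print(legacy_view)  # FileNameWithNonASCIIStart├Ç├ë╤ÅπâåEnd.dat
```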
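The backward compatibility of extra headers can be illustrated with a hypothetical legacy reader that recognizes no extra headers at all. Every extra header carries its own ID and size, so the Python sketch below skips the Unicode Path header (ID 0x7075) without understanding it. The reader then falls back on the code page 437 name from the main header. The names `KNOWN_EXTRA_IDS` and `legacy_read` are hypothetical.

```python
import struct
import zlib

KNOWN_EXTRA_IDS = frozenset()  # hypothetical reader: understands no extra headers


def legacy_read(raw_name: bytes, extra: bytes):
    """Return the main-header name and the IDs of the extra headers skipped."""
    skipped = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        if header_id not in KNOWN_EXTRA_IDS:
            skipped.append(header_id)  # unknown: jump over it using its size
        pos += 4 + size
    return raw_name.decode("cp437"), skipped


name = "FileNameWithNonASCIIStartÀÉяテEnd.dat"
legacy_name = name.encode("cp437", errors="replace")
payload = struct.pack("<BI", 1, zlib.crc32(legacy_name)) + name.encode("utf-8")
extra = struct.pack("<HH", 0x7075, len(payload)) + payload

shown, skipped = legacy_read(legacy_name, extra)
print(shown, [hex(i) for i in skipped])  # FileNameWithNonASCIIStart?É??End.dat ['0x7075']
```

A main header written in UTF-8 has no equivalent escape hatch. The legacy reader has no other name to fall back on, so it decodes the UTF-8 bytes as code page 437.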
Image: Unicode text encoding seen in Windows Explorer 7
Windows Explorer in Windows 8, with its improved Unicode support, correctly handles the Unicode text encoding and the items show up as they should.
Image: Unicode text encoding seen in Windows Explorer 8
Because of the different Unicode options available and the awkward way they were introduced into the Zip specification, there is no universal, foolproof way to support non-ASCII characters in zip file meta-data.
If you are producing zip files meant to be consumed by Xceed Zip for .NET or another Zip component sold by Xceed Inc., the default behavior of the component is adequate and no special option needs to be enabled.
If your zip files will be consumed by 3rd party zip tools, run a few tests using the examples above to determine which Unicode options those tools support. Then tailor your code to use the Unicode options that best fit them.