Search This Blog

Wednesday, January 11, 2006

Dilemma - Filenames

Somebody just kill me, now.

So, I've been considering supporting Unicode filenames in LibQ, but that brings up not nice problem. First and foremost, not all operating systems and/or file systems support Unicode filenames. Windows NT supports true Unicode (UCS-2) filenames on NTFS, but converts (and degrades) the filenames to ANSI (similar to ASCII, both not supporting foreign language characters) on file systems like FAT and FAT32 that don't support Unicode file names. I'm told some Linux file systems support Unicode file names, but I've yet to find an interface to manipulate such files.

I could get around this problem by automatically converting Unicode filenames on unsupporting OS or file systems to something like UTF-7 or UTF-8, which will fit in a char array, but this brings up its own problems. First, as previously mentioned, NT automatically converts Unicode filenames to ANSI code page on unsupporting file systems, replacing characters that can't be converted to some default character (and it's not feasible to detect file systems that don't support Unicode, to protect against this). Second, this would create an odd duality for files saved with UTF filenames. You could open them by the Unicode version of their name (as it would automatically convert the name to UTF before calling the OS functions), but the names could appear garbled in directory listings (as there'd be no good way to detect that a filename is UTF, and convert it back to Unicode).

Last but not least, I'm having difficulty figuring out the API to convert Unicode strings to UTF-7/8 on POSIX.

No comments: