What is unknown 8bit encoding?
A special-purpose character set called “unknown-8bit” is defined to be an unknown 8-bit character set, encoded into a sequence of octets. It can be used as a label for text in any character set, from any language and in any encoding, when the real character set is not known.
What does iconv do in Linux?
The iconv command converts the encoding of characters read from either standard input or the specified file from one coded character set to another and then writes the results to standard output.
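As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not how iconv itself is implemented; the function name `convert_encoding` and the sample text are made up for this example. The bytes are decoded using the source character set and then re-encoded in the target one.

```python
def convert_encoding(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding, then re-encode them in the target."""
    text = data.decode(source)   # bytes -> Unicode text
    return text.encode(target)   # Unicode text -> bytes in the new encoding


# Hypothetical sample: text stored as Latin-1, converted to UTF-8.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = convert_encoding(latin1_bytes, "latin-1")
```

The conversion only works when the source encoding is named correctly, which is exactly what is missing for a file labelled “unknown-8bit”.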
How to convert unknown 8bit charset to UTF-8?
After some searching I tried converting the file in a terminal, but “unknown-8bit” is not a supported encoding name. You can use enca or chardet to guess the encoding; enca will probably be more successful. If you know the language the document was written in, you can also guess the likely encodings and try converting with each until the result reads correctly.
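The guess-and-try approach can be sketched in Python using only the standard library. This is an assumption-laden example, not the output of enca or chardet: the helper `decodable_with`, the sample bytes, and the candidate list are all hypothetical. It keeps every candidate encoding that decodes the bytes without error, so you can inspect the results by eye.

```python
def decodable_with(data: bytes, candidates: list[str]) -> dict[str, str]:
    """Return {encoding: decoded text} for each candidate that decodes cleanly."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)  # strict decoding raises on bad bytes
        except UnicodeDecodeError:
            continue  # this candidate cannot be the right encoding
    return results


# Hypothetical sample: Russian text saved in Windows-1251.
sample = "Привет".encode("cp1251")
results = decodable_with(sample, ["utf-8", "cp1251", "koi8_r", "iso8859_5"])
for name, text in results.items():
    print(name, "->", text)
```

Strict decoding rules out UTF-8 here, but most single-byte code pages accept almost any byte sequence, so several candidates usually survive and a human (or a language-aware check) still has to pick the readable one.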
Is there an extension to convert files to UTF-8?
There’s also a Brackets extension to convert files to UTF-8 encoding so you don’t have to leave your editor. But if other tools are having trouble identifying/converting the original file’s encoding, I’m guessing this extension will have the same problem.
Is there a way to convert unknown encoding to known encoding?
There is no reliable way to convert from an unknown encoding to a known one. In your case, if you know the original text is in Farsi / Persian, maybe you can identify a number of possible encodings, and iterate over those until you see the output you expect.
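Iterating until the expected output appears could look like the following Python sketch. Everything named here is an assumption for illustration: the helper `first_matching`, the candidate list, and the expected word. The idea is to stop at the first candidate encoding whose decoded text contains a word you expect to find in the document.

```python
def first_matching(data: bytes, candidates: list[str], expected: list[str]):
    """Return (encoding, text) for the first candidate whose decoding contains
    one of the expected words, or None if no candidate matches."""
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # not decodable with this candidate, try the next one
        if any(word in text for word in expected):
            return name, text
    return None


# Hypothetical sample: a word written in Arabic script, saved as Windows-1256.
sample = "سلام".encode("cp1256")
match = first_matching(sample, ["utf-8", "iso8859_6", "cp1256"], ["سلام"])
print(match)
```

This only works if you already know at least one word that must occur in the text, and it remains a guess: a short expected word can match under the wrong encoding by coincidence.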
Why is the unknown 8bit code page called unknown?
A file in an unknown 8-bit code page is reported as “unknown-8bit” for a reason: identifying the code page is not an easy problem without some idea of the language. That is not to say it is impossible, but to work efficiently such a heuristic detector would need a large vocabulary for each of the most-used languages, a large list of code pages, and some knowledge of grammar.