Encoding

We won't go into why file encodings exist or into their history. They are here to stay, and we must support different encodings when reading and writing files.

The file service can read and write text files in almost any encoding. Some functions for reading and writing use a default encoding, while others allow you to set a specific encoding.

Here is a list of all of the supported encodings:

List of supported encodings

Text File BOM

A text file BOM (Byte Order Mark) is a special marker placed at the beginning of a text file to indicate its encoding and endianness (byte order). It's commonly used in Unicode-based encoding schemes such as UTF-8 and UTF-16. The BOM serves to inform applications how the bytes in the file should be interpreted.

In UTF-8, the BOM is optional and often omitted, because UTF-8 is a byte-oriented encoding with no byte order issues. In UTF-16, which can be either big-endian or little-endian, the BOM tells the reader which byte order was used; without it, the reader has to guess or be told.
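To make the byte order issue concrete, here is a small Python sketch (Python is used only for illustration; it is not the file service API). It encodes the same character in both UTF-16 byte orders and shows what happens when the bytes are read with the wrong one.

```python
text = "A"  # U+0041

big_endian = text.encode("utf-16-be")     # most significant byte first
little_endian = text.encode("utf-16-le")  # least significant byte first

print(big_endian.hex(" "))     # 00 41
print(little_endian.hex(" "))  # 41 00

# Decoding with the wrong byte order does not raise an error here;
# it silently yields a different character.
misread = big_endian.decode("utf-16-le")
print(hex(ord(misread)))  # 0x4100
```

Because the misread is silent, a reader that cannot rely on a BOM needs some other way to learn the byte order.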

The BOM is a fixed sequence of bytes at the very beginning of the file. For UTF-8, it is the three bytes 0xEF 0xBB 0xBF. For UTF-16, it is either 0xFE 0xFF (big-endian) or 0xFF 0xFE (little-endian).
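The byte sequences above are enough to write a simple BOM check. The following is a minimal Python sketch, not the file service's implementation; the function names `detect_bom` and `decode_with_bom` are hypothetical, and only the three BOMs mentioned in the text are handled (UTF-32 is deliberately left out).

```python
import codecs

# BOM byte sequences named in the text, mapped to a Python codec name.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no listed BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM if there is one, else fall back to `default`."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or default)
```

For example, `detect_bom(b"\xef\xbb\xbfhi")` returns `("utf-8", 3)`, and `decode_with_bom(b"\xfe\xff\x00h\x00i")` returns `"hi"`.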

While a BOM helps applications interpret text files correctly, it can also cause problems when the consuming software does not expect or support it: the BOM bytes may then be treated as part of the content, for example as stray characters at the start of the first line. Whether to write a BOM is therefore a matter of debate, and the answer depends on which tools will read the file.
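A typical symptom of an unexpected BOM is an invisible extra character glued to the first value in a file. The Python sketch below illustrates this with Python's standard `utf-8` and `utf-8-sig` codecs; it says nothing about how the file service itself behaves, and the sample content is made up.

```python
raw = b"\xef\xbb\xbfname,age\n"  # UTF-8 content that starts with a BOM

plain = raw.decode("utf-8")         # keeps the BOM as the character U+FEFF
tolerant = raw.decode("utf-8-sig")  # drops a leading BOM if one is present

first_plain = plain.split(",")[0]
first_tolerant = tolerant.split(",")[0]

print(repr(first_plain))     # '\ufeffname'
print(repr(first_tolerant))  # 'name'

# Writing works the other way around: "utf-8-sig" prepends the BOM.
with_bom = "name".encode("utf-8-sig")
print(with_bom[:3].hex(" "))  # ef bb bf
```

A comparison such as `first_plain == "name"` fails even though both strings look the same when printed, which is why BOM-related bugs can be hard to spot.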

The file service supports BOMs. You can force a BOM, or use the BOM if one is already there.
