4. Encoding Rules

Historically, diffs have lacked any encoding information. A diff generated on one computer could use an encoding for diff content or filenames that would make it difficult to parse or apply on another computer.

To address this, DiffX has explicit support for encodings.

DiffX files follow these simple rules:

  1. DiffX files have no default encoding. Tools should always set an explicit encoding (utf-8 is strongly recommended).

    If no encoding is specified, all content must be treated as 8-bit binary data, and tools should avoid guessing the encoding of any content. This matches the behavior of existing Unified Diff files.

  2. Section headers are always encoded as ASCII (no non-ASCII content is allowed in headers).

  3. Sections inherit the encoding of their parent section, unless overridden with the encoding option.

  4. Preambles and metadata in content sections are encoded using their section’s encoding.

  5. Diff sections do not inherit their parent section’s encoding, for compatibility with standard diff behavior. Instead, diff content should always be treated as 8-bit binary data, unless an explicit encoding option is defined for the section.
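The inheritance rules above can be sketched as a small resolver. This is a minimal illustration, not part of any DiffX library: the function names, the `kind` strings, and the idea of passing the parent's encoding as an argument are all assumptions made for this example. A return value of `None` stands for "treat as 8-bit binary data".

```python
from typing import Optional, Union

# Hypothetical section kind used only in this sketch.
DIFF = "diff"


def resolve_encoding(kind: str,
                     own: Optional[str],
                     parent: Optional[str]) -> Optional[str]:
    """Return the encoding for a section's content, or None for raw bytes."""
    if own is not None:
        # Rules 3 and 5: an explicit encoding option always wins.
        return own
    if kind == DIFF:
        # Rule 5: diff sections never inherit from their parent.
        return None
    # Rule 3: other sections inherit the parent's encoding, which may
    # itself be None when nothing was ever specified (rule 1).
    return parent


def decode_content(data: bytes,
                   encoding: Optional[str]) -> Union[str, bytes]:
    """Decode content if an encoding applies; otherwise keep the bytes."""
    if encoding is None:
        return data
    return data.decode(encoding)


def decode_header(line: bytes) -> str:
    """Rule 2: headers are ASCII only; non-ASCII raises UnicodeDecodeError."""
    return line.decode("ascii")
```

With this sketch, a metadata section under a `utf-8` parent resolves to `utf-8`, while a diff section under the same parent resolves to `None` unless it carries its own encoding option.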

Tip

DiffX parsers should prioritize content (such as filenames) in metadata sections over scraping content in diff sections, in order to avoid encoding issues.
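
One way a parser might apply this tip is sketched below. The `choose_filename` helper and the `"path"` metadata key are assumptions for illustration only; a real parser would use whatever key its metadata format defines. The scraped value is kept as bytes because diff content has no guaranteed encoding.

```python
from typing import Optional, Union


def choose_filename(metadata: dict,
                    scraped: Optional[bytes]) -> Union[str, bytes, None]:
    """Prefer the decoded filename from metadata over one scraped from a diff."""
    path = metadata.get("path")  # "path" is an assumed key name
    if isinstance(path, str) and path:
        # Metadata was decoded with a known encoding, so trust it first.
        return path
    # Fall back to the raw bytes scraped from the diff, left undecoded.
    return scraped
```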