pydiffx.utils.unified_diffs

Utilities for parsing Unified Diffs.

Functions

get_unified_diff_hunks(lines[, ignore_garbage])

Return information on each hunk in a Unified Diff.

pydiffx.utils.unified_diffs.get_unified_diff_hunks(lines, ignore_garbage=False)

Return information on each hunk in a Unified Diff.

This iterates through the hunks in the diff, collecting information on each one. Parsing stops at the first line that is not part of a hunk, unless ignore_garbage=True is passed.

Parameters
  • lines (list of bytes) – The list of lines in the diff. This should generally be the result of using split_lines().

  • ignore_garbage (bool, optional) –

    Whether to ignore garbage lines found outside of a hunk.

    If True, all lines will be processed for hunk data.

    If False (the default), reading will stop once something other than a hunk is found.
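
The effect of ignore_garbage can be illustrated with a simplified stand-in scanner. This is only a sketch of the behavior described above, not pydiffx's implementation: count_hunks, HUNK_HEADER_RE, and the sample lines are hypothetical, and ValueError stands in for the library's own error type.

```python
import re

# Matches a hunk header such as b"@@ -1,3 +1,4 @@ optional context".
HUNK_HEADER_RE = re.compile(
    rb'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')


def count_hunks(lines, ignore_garbage=False):
    """Count hunks in a list of byte strings (hypothetical stand-in)."""
    num_hunks = 0
    i = 0

    while i < len(lines):
        header = HUNK_HEADER_RE.match(lines[i])
        i += 1

        if header is None:
            if ignore_garbage:
                # Skip the non-hunk line and keep looking for hunks.
                continue

            # Default behavior: stop at the first non-hunk line.
            break

        # A line count omitted from the header defaults to 1.
        orig_left = int(header.group(2) or 1)
        mod_left = int(header.group(4) or 1)

        # Consume the hunk body until both sides are accounted for.
        while orig_left > 0 or mod_left > 0:
            if i >= len(lines):
                raise ValueError('hunk terminated prematurely')

            prefix = lines[i][:1]

            if prefix == b' ':
                orig_left -= 1
                mod_left -= 1
            elif prefix == b'-':
                orig_left -= 1
            elif prefix == b'+':
                mod_left -= 1
            elif prefix != b'\\':
                # "\ No newline at end of file" markers inside a hunk
                # are skipped; anything else is treated as malformed.
                raise ValueError('invalid line in hunk: %r' % lines[i])

            i += 1

        num_hunks += 1

    return num_hunks


lines = [
    b'@@ -1,2 +1,2 @@',
    b' a',
    b'-b',
    b'+c',
    b'some garbage',
    b'@@ -10,1 +10,2 @@',
    b' x',
    b'+y',
]

print(count_hunks(lines))                       # stops at the garbage line
print(count_hunks(lines, ignore_garbage=True))  # continues past it
```

With the default, scanning ends at the first line that is neither a hunk header nor part of a hunk body, so the second hunk is never seen. With ignore_garbage=True, that line is skipped and the second hunk is counted as well.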

Returns

A dictionary containing the results. This will have the following keys:

hunks (list of dict):

    The list of hunks. Each dictionary will contain:

    context (bytes):

        Optional context shown after the @@ header. This may be None.

    lines_of_context_pre (int):

        The number of lines of context before the first changed line in the hunk.

    lines_of_context_post (int):

        The number of lines of context after the last changed line in the hunk.

    modified (dict):

        Information on the modified side of the hunk. This will contain the following keys:

        start_line (int):

            The 0-based line number in the modified file where the hunk begins. This will be the line number of the first line shown in the hunk, which may be a line of context.

        num_lines (int):

            The number of lines shown for the modified side of the hunk in the diff, including any lines of context, unchanged lines, or changed lines.

        first_changed_line (int):

            The 0-based line number in the modified file where the first change in the hunk (a + line) occurs. This will always be after any leading lines of context.

        last_changed_line (int):

            The 0-based line number in the modified file where the last change in the hunk (a + line) occurs. This will always be before any trailing lines of context.

        num_lines_changed (int):

            The number of lines that were changed on the modified side of the hunk (the number of + lines).

    orig (dict):

        Information on the original side of the hunk. This will contain the following keys:

        start_line (int):

            The 0-based line number in the original file where the hunk begins. This will be the line number of the first line shown in the hunk, which may be a line of context.

        num_lines (int):

            The number of lines shown for the original side of the hunk in the diff, including any lines of context, unchanged lines, or changed lines.

        first_changed_line (int):

            The 0-based line number in the original file where the first change in the hunk (a - line) occurs. This will always be after any leading lines of context.

        last_changed_line (int):

            The 0-based line number in the original file where the last change in the hunk (a - line) occurs. This will always be before any trailing lines of context.

        num_lines_changed (int):

            The number of lines that were changed on the original side of the hunk (the number of - lines).

num_processed_lines (int):

    The number of lines read in the diff to produce these results. Callers can use this to start parsing the rest of a diff after these lines.

total_deletes (int):

    The total number of deleted lines found.

total_inserts (int):

    The total number of inserted lines found.
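
As a rough illustration of how a caller might read this structure, the sketch below walks a hand-written dictionary shaped like the keys described above. The values are not captured from a real run. They are worked out by hand for a single hunk headed "@@ -10,7 +10,8 @@ def foo():" (three lines of context, one deletion, two insertions, three lines of context), assuming that orig describes the - side and modified the + side. The helper names describe_side and summarize_hunks are hypothetical.

```python
# Hand-written stand-in for a result; not output from pydiffx.
result = {
    'hunks': [
        {
            'context': b'def foo():',
            'lines_of_context_pre': 3,
            'lines_of_context_post': 3,
            'orig': {
                'start_line': 9,           # header says 10 (1-based)
                'num_lines': 7,
                'first_changed_line': 12,  # after 3 lines of context
                'last_changed_line': 12,
                'num_lines_changed': 1,
            },
            'modified': {
                'start_line': 9,
                'num_lines': 8,
                'first_changed_line': 12,
                'last_changed_line': 13,
                'num_lines_changed': 2,
            },
        },
    ],
    'num_processed_lines': 10,
    'total_deletes': 1,
    'total_inserts': 2,
}


def describe_side(side):
    """Format one side's changed range as 1-based, inclusive line numbers."""
    first = side['first_changed_line'] + 1
    last = side['last_changed_line'] + 1

    return '%d-%d (%d changed)' % (first, last, side['num_lines_changed'])


def summarize_hunks(result):
    """Return one human-readable summary string per hunk."""
    summaries = []

    for hunk in result['hunks']:
        context = hunk['context']

        # context may be None when nothing follows the @@ header.
        if context:
            label = context.decode('utf-8', 'replace')
        else:
            label = 'no context'

        summaries.append('[%s] orig %s -> modified %s' % (
            label,
            describe_side(hunk['orig']),
            describe_side(hunk['modified'])))

    return summaries


for summary in summarize_hunks(result):
    print(summary)
```

Because the documented line numbers are 0-based, the sketch adds 1 before display so that the ranges match what an editor or the @@ header would show.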

Return type

dict
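
The num_processed_lines key is what lets a caller pick up where hunk parsing stopped. A minimal sketch of that hand-off follows, using a hand-written stand-in for the result and assuming the count includes the @@ header line. The helper split_after_hunks and the sample lines are hypothetical.

```python
def split_after_hunks(lines, result):
    """Split lines into those consumed by hunk parsing and the remainder."""
    num_processed = result['num_processed_lines']

    return lines[:num_processed], lines[num_processed:]


lines = [
    b'@@ -1,2 +1,2 @@',
    b' a',
    b'-b',
    b'+c',
    b'diff --git a/foo b/foo',
    b'--- a/foo',
]

# Stand-in for the dictionary returned for the lines above: one header
# line plus three hunk lines (assumed, not captured from a real run).
result = {'num_processed_lines': 4}

hunk_lines, remaining = split_after_hunks(lines, result)

# remaining now starts at the next file's header and can be handed to
# whatever parses file-level diff headers.
print(remaining[0])
```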

Raises

pydiffx.errors.MalformedHunkError – A line was found within a hunk that was not valid and could not be parsed, or a hunk was terminated prematurely.