Benchmarks

All of this is only ivory tower stuff until we get into hard numbers of performance, right? Agreed! Here are some benchmarks comparing a "pure" situation with the piecewise mutable one.

The theoretical "best case" of piecewise-mutable (and worst case of pure) is when you only modify a small part of a data type repeatedly. This forces many unnecessary copies in the pure form.

The theoretical "worst case" of piecewise-mutable (and best case of pure) is if you only ever modify the entire data type.

Both of these situations are tested below:

(Only bars of the same color are comparable, and shorter bars are better, performance-wise)

Benchmarks

There are four setups here, compared and contrasted between pure and mutable versions.

A large ADT with 256 fields, generated by repeated nestings of data V4 a = V4 !a !a !a !a
1. Updating only a single part (one field out of 256)
2. Updating the entire ADT (all 256 fields)
A composite data type of four Vectors of 500k elements each, so 2 million elements total.
1. Updating only a single part (one item out of 2 million)
2. Updating all elements of all four vectors (all 2 million items)

We can see four conclusions:

For a large ADT, updating a single field (or multiple fields, interleaved) is going to be faster with mutable. This speedup is between x4 and x5, suggesting it is a speedup arising from the fact that the top-level type has four fields.
For a large ADT, updating the whole ADT (so just replacing the entire thing, no actual copies) is faster just as a pure value by a large factor (which is a big testament to GHC).
For a small ADT with huge vectors, updating a single field is much faster with mutable.
For a small ADT with huge vectors, updating the entire value (so, the entire vectors and entire ADT) is actually faster with mutable as well.

Interestingly, the "update entire structure" case (which should be the worst-case for mutable and the best-case for pure values) actually becomes faster with mutable when you get to the region of many values... somewhere between 16 and 2 million, apparently. However, this may just be from the efficiency of modifying vectors sequentially.