Text slicing strategy for CRDT-based collaborative editors

2020-06-04

Motivation🔗

What are splicing edits? Splicing edits are essentially a way to model text moving as a move instead of (as is currently done in most collaborative editors) as pairs of deletion/insertion. In this post, we will refer to splices as "moves" and "move-edits".

For motivation why we need this, the referenced articles mention check list applications and bullet points.

My motivation for this comes from the desire to have collaborative editors with better intention preservation for merging edits of users which have been disconnected for longer amounts of time. Let's consider for instance that we have a document with two users concurrently editing it.

Initial document:

User A is reordering a line while user B is correcting a typo in the same line:
illustration of insert and move

In a traditional collaborative editor, user A's reordering gets converted to a erase/insert pair.

A possible result in a traditional collaborative editor is therefore (if inserts concurrently with erases get preserved, this is the guaranteed result):
broken document state with the insert on a separate line

A more intention-preserving way would be to just reorder the paragraphs and have the typo-correction move with the word it applies to:
expected document state

Basic approach: just move nodes🔗

Let's initially just consider the case of preserving non-moving edits (inserts, erases) while a move happens concurrently. We are relying on the OpSets approach which guarantees total order over edits for us, so we do not need to care about the order in which edits arrive in different places as they will be put in the same order in the end.

The idea is that if text keeps its identity when moved, edits will still find the text to which they apply, rather than just a tombstone.

Following this rule, we have already solved the case of a move happening concurrently with non-moving edits.

Dealing with concurrent moves: First move moves, any later move copies🔗

Things get more tricky when we look at concurrent moves. If we were to say that the original nodes now exist in several different locations, that would force the user to always change these locations at the same time. This would be equivalent to having multi-cursor editing which cannot be disabled and where the position of the cursors cannot be chosen, which is clearly undesirable.

One possible solution is to apply a simple rule here: the first move in the document history "owns" the IDs of the text fragments it touches. Any "later" (in document-history) move will be converted to an erase/insert pair.

This implies that edits touching the moved range will end up at the first moved-to position and the other moved-to positions will end up with a copy.

As it is not clear from just a start- and end-point what a user was actually moving at the time the edit was made, for each move-edit we need to also store the text that was moved to allow falling back to copying. This means that we cannot make the move operation any smaller than the corresponding erase/insert.

Detecting concurrent moves🔗

To detect concurrent moves, we can attach to each move edit which moves in the source (= erase) range it is already aware of, so we can detect concurrent moves by encountering a difference in moved-by edits between the document and the ones referenced in the move edit for a text range. This can grow large if moves are frequent, but there can be mitigating circumstances based on the overall setup, e.g. if we have guaranteed per-client order of arrival, we can only store the last relevant move-edit in the range per client.

The approach just presented suffers from over-pessimization in case we have a move whose source range is completely inside another concurrent move. In this case we can actually preserve the semantics of both moves, so a good implementation also neeeds to handle this case, both if the contained move appears before and if it appears after the containing one in the document history.

If two concurrent move-sources overlap, but none of them is contained in the other, the later one can be applied as a combination of a copy (for the overlapping part) and a move (for the non-overlapping part).

Dealing with Undo/Redo🔗

Some common strategies for undos are (0) to just consider the undone edit as non-existent (and to discard any dependent edits), (1) insert the edit as a tombstone for inserts and to apply dependent inserts based on that tombstone. I have a strong preference towards (1) as I consider it to be more intention preserving, which is why I will not discuss anything related to (0).

With the move-edits introduced above, we add another element to this. When we undo the first move, the next move on the text should become the winning one. There can be edits which build on the moved version and some which build on the copied version.

For implementations, this means that a lookup from the move-insert to the original ID needs to be supported.

Supporting erase + insert as move (CTRL+X use case)🔗

I rarely see people using drag & drop for text (this might be because I live in a text drag & drop-free bubble), so to actually make use of text moves, another trick has to be introduced.

By transforming any erase into a move when it is inserted, we can support other workflows like:

select text range, CTRL+X, CTRL+V somewhere else
select text range, CTRL+C, Backspace/Delete/insertion of text, CTRL+V somewhere else

The second might seem esoteric, but some people have CTRL+C, Backspace, and CTRL+V on their mouse, therefore this allows them to do much more flexible reordering of text than drag & drop while still not having to use the keyboard.

Recap🔗

To add move-edits to a CRDT-based collaborative editor, only one additional operation needs to be added: The move-insert, which refers to an erase.

Conceptually, a move-insertion converts an erase into a cut operation for the parts of the erase for which this is possible and for the non-cuttable parts of the erase into a copy.

To allow to easily reconstruct what the user doing the move-insert actually erased, either the erase or the move-insert need to contain the IDs (& ranges if a block-based approach is used) of the erased text.

Future work🔗

This approach needs an implementation to see how this approach is perceived by real users. This also has the potential to get more insights on how people would use splicing as currently they probably refrain from doing these kinds of changes when they expect someone to concurrently edit things.

Additionally, in the "Moving Elements in List CRDTs" paper, the duplication of elements when moving twice is criticized as unituitive and it definitely is the case for an element moving in a list. To get the advantages of both approaches, probably a combination is needed with e.g. both a text-move and a paragraph-move operation. How this can be done in a way which is easy to expose and communicate to the user is an open question.

An extremely interesting idea, especially for not purely text-oriented applications, is to actually have objects which can be reused in different places and where changes in any of those places would be applied to all usages. E.g. presentations are often made with different slides which need to have matching elements and these are currently often duplicated, which makes them hard to change consistently.

Loading comments...