Diff-able documents #16290

davidbrochart · 2024-05-06T14:11:27Z

Problem

There is work happening on supporting suggestions in RTC, just like in Google Docs. In the backend, the architecture consists of creating a fork of a shared document, that is used to receive the suggestion changes as well as the changes from the root document. When the suggestion is accepted, the fork is merged back into the root document.
In the current solution, the frontend is quite limited: one chooses to view either the root document or the fork. This has the advantage of fitting easily into the existing architecture (we just unplug the shared document provider from the root and plug it into the fork), but the UI is quite poor: one cannot see the differences e.g. side by side (like in nbdime) or in a unified view like in Google Docs.

Proposed Solution

To be able to show document diffs, I think that we would need a document model that can hold multiple shared documents. This would be very dependent on the underlying CRDT technology (i.e. Yjs), because we need to interpret deltas in order to show diffs (e.g. this cell was deleted, this text was inserted, etc.).
This issue is to start a discussion on whether to change the existing document models so that they support diffs, or to start new document models. Starting new models would probably result in a lot of code duplication from the existing ones, but changing existing models would make them rely on CRDTs even more.
I might miss other points, so I'd like to hear what people think about that.

fcollonval · 2024-05-06T16:11:28Z

I don't understand the solution.

Displaying a diff between two models, is comparing instantaneous snapshots of two models and producing a diff model (unrelated to the document model) for it (this is for example what nbdime is doing). What is the advantage of having a parent document?

An advantage I see to manipulate CRDT models is in the creation of the diff model as it could be simplified if the common base version is known and the modifications compressed/simplified between that common version and the current versions.

davidbrochart · 2024-05-06T17:15:06Z

But nbdime is trying to infer changes, right? With CRDTs, and the root-fork model we have more information, because the diff is just the result of all deltas. For instance, moving a cell is a CRDT operation that is not ambiguous, but without this information we just have two cells that seem identical at different indices, that look like a move operation. There might be other cases that are impossible to infer exactly, while it is possible with CRDTs because we have the history of all changes.
Another case that comes to mind, some text that is deleted and replaced with some other text, maybe several times. We will be able to show the deletions (with strikethrough) and the additions separately.
Also, without CRDTs there is a performance cost, as the diff has to be computed each time there is a change, and this is going to be more and more complex with time. With CRDTs, we just interpret deltas as they come, live.

fcollonval · 2024-05-07T08:06:31Z

But nbdime is trying to infer changes, right?

Yes

With CRDTs, and the root-fork model we have more information, because the diff is just the result of all deltas. For instance, moving a cell is a CRDT operation that is not ambiguous, but without this information we just have two cells that seem identical at different indices, that look like a move operation. There might be other cases that are impossible to infer exactly, while it is possible with CRDTs because we have the history of all changes. Another case that comes to mind, some text that is deleted and replaced with some other text, maybe several times. We will be able to show the deletions (with strikethrough) and the additions separately.

I agree with that. This was my last comment.

Note

Move object (aka cell) does not exist in yjs

Also, without CRDTs there is a performance cost, as the diff has to be computed each time there is a change, and this is going to be more and more complex with time. With CRDTs, we just interpret deltas as they come, live.

You will need to compute a diff anyway.

For example if you have the text evolution:

Hello John
Hello Mary
Hello Johnny

When diffing 3 and 1, the user should see: addition of 'ny' at position 10. I don't know how easy that can be inferred from modification 1 (replace 'John' by 'Mary') and modification 2 (replace 'Mar' by 'Johnn').

Nevertheless, could you elaborate how having a document model that holding multiple shared documents will help?

fcollonval · 2024-05-07T08:12:24Z

One additional point, producing an easy-to-understand diff model is hard as a efficient diff model for the computer (like CRDTs) may not be easy to understand for the end user. I'm typically thinking of character based diff vs word/semantic group diff. You can look for pathological examples in the work done in VS Code for their second diff viewer: https://code.visualstudio.com/updates/v1_81#_diff-editor

davidbrochart · 2024-05-07T08:43:16Z

Move object (aka cell) does not exist in yjs

Hmm wasn't it released as an alpha in Yjs? At least y-crdt has move_to for Array, see this issue.

When diffing 3 and 1, the user should see: addition of 'ny' at position 10

Or should the user see "Hello ~~John~~ ~~Mary~~ Johnny", meaning all intermediate deltas should be displayed? Then the processing doesn't increase with time, since we're just incrementally displaying the deltas.

Nevertheless, could you elaborate how having a document model that holding multiple shared documents will help?

I'm not sure about that, that's also why I would like your feedback. Do you think we could keep the current document models, and have a new diff-able "meta" document model that would hold instances of individual document models? It seems to me that at the minimum we need a document model that knows how to display diffs between individual document models.

fcollonval · 2024-05-14T15:29:07Z

Hmm wasn't it released as an alpha in Yjs? At least y-crdt has move_to for Array, see this issue.

I don't see it in YArray. And the reference move PR on Yjs, I have been monitoring has not seen much activity.

Or should the user see "Hello ~~John~~ ~~Mary~~ Johnny", meaning all intermediate deltas should be displayed? Then the processing doesn't increase with time, since we're just incrementally displaying the deltas.

I don't think so. Because I gave you a case with only few changes. But if we get 10 modifications on the same line it won't be readable. And again, I'm not sure the CRDT will be user friendly like ~~John~~ ~~Mary~~ Johnny.

Nevertheless, could you elaborate how having a document model that holding multiple shared documents will help?

I'm not sure about that, that's also why I would like your feedback. Do you think we could keep the current document models, and have a new diff-able "meta" document model that would hold instances of individual document models? It seems to me that at the minimum we need a document model that knows how to display diffs between individual document models.

We certainly gonna need some kind of new model. But does it need to keep track of the individual document model. I don't know. By default I would say no on the ground of keeping thing simple and less coupling is simpler.

davidbrochart · 2024-05-15T08:11:30Z

I don't think so. Because I gave you a case with only few changes. But if we get 10 modifications on the same line it won't be readable. And again, I'm not sure the CRDT will be user friendly like ~~John~~ ~~Mary~~ Johnny.

In Google Docs, the "Editing" mode corresponds to what you are suggesting: only the final result is displayed. But the "Suggesting" mode shows all intermediate changes, just like I shown.

We certainly gonna need some kind of new model. But does it need to keep track of the individual document model. I don't know. By default I would say no on the ground of keeping thing simple and less coupling is simpler.

I guess my question was really focused on a suggesting mode, where a user must be notified of other users making suggestions on the same document. In Google Docs, suggestion notifications appear on the right with a window allowing to accept it or discard it. So I would say that the new document model has to keep track of all the fork models that correspond to suggestions.

echarles · 2024-05-15T08:30:12Z

Thx for starting the discussion @davidbrochart. Is the discussion around a general diffing feature, or specific to the suggesting feature. I would assume those feature to be quite different (even if they have intersections). We could add commenting to the picture if we want to expand.

davidbrochart · 2024-05-15T08:44:32Z

My use-case is implementing an RTC suggestion system, where users can see the diffs in a unified view. Whether this can go into a general diffing system is the question.
For instance, nbdime is a diffing system for notebooks, but it doesn't fit the requirements of being real-time, nor providing a unified view (diffs are shown side by side).

davidbrochart added the enhancement label May 6, 2024

jupyterlab-probot bot added the status:Needs Triage Applied to new issues that need triage label May 6, 2024

JasonWeill added status:Needs Discussion and removed status:Needs Triage Applied to new issues that need triage labels May 7, 2024

JasonWeill mentioned this issue May 7, 2024

Weekly Triage meetings: Jan–Jun 2024 jupyterlab/frontends-team-compass#228

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diff-able documents #16290

Diff-able documents #16290

davidbrochart commented May 6, 2024

fcollonval commented May 6, 2024

davidbrochart commented May 6, 2024

fcollonval commented May 7, 2024

fcollonval commented May 7, 2024 •

edited

davidbrochart commented May 7, 2024 •

edited

fcollonval commented May 14, 2024

davidbrochart commented May 15, 2024

echarles commented May 15, 2024

davidbrochart commented May 15, 2024

Diff-able documents #16290

Diff-able documents #16290

Comments

davidbrochart commented May 6, 2024

Problem

Proposed Solution

fcollonval commented May 6, 2024

davidbrochart commented May 6, 2024

fcollonval commented May 7, 2024

fcollonval commented May 7, 2024 • edited

davidbrochart commented May 7, 2024 • edited

fcollonval commented May 14, 2024

davidbrochart commented May 15, 2024

echarles commented May 15, 2024

davidbrochart commented May 15, 2024

fcollonval commented May 7, 2024 •

edited

davidbrochart commented May 7, 2024 •

edited