Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add offset to pull() #23

Open
Fishrock123 opened this issue Jun 5, 2019 · 7 comments
Open

Proposal: add offset to pull() #23

Fishrock123 opened this issue Jun 5, 2019 · 7 comments

Comments

@Fishrock123
Copy link
Owner

From the collab summit, after talking with @jasnell extensively about QUIC, I think it would be useful to have pull support an offset to ask the source to read from (the source may choose to still return whatever data it chooses).

This should allow for ACKs in a network implementation (i.e. if you need to ACK, you just request the last / desired offset again).

It also would support a level of content-addressability. A sink or downstream intermediate could be the one to inform a file source of where to read from.

Relatedly, this may also make disk-based sources require less state...

Fishrock123 added a commit that referenced this issue Jun 5, 2019
Fishrock123 added a commit to Fishrock123/fs-source that referenced this issue Jun 6, 2019
@Fishrock123
Copy link
Owner Author

Example in fs-source: Fishrock123/fs-source#1

@jasnell
Copy link
Collaborator

jasnell commented Jun 6, 2019

For QUIC, the pattern will be:

  1. Sink pulls multiple chunks of data from Source
  2. At some arbitrary point in time in the future, Sink notifies Source that a given chunk has been acknowledged (by offset and length). These offsets may not necessarily match the precise offsets and lengths from the read. That is, the Sink may perform three pulls covering ranges 0-9,10-19,20-29 and may give two acks covering ranges 0-14,15-29.
  3. Once a given range has been Ack'd, it should no longer be possible to pull from that range again. For instance, once the above Acks are received, pulling from any offset <=29 should be an error.
  4. At some arbitrary point in time in the future, the Sink may attempt to re-pull chunks that have not been acked.

To illustrate a different way:

Imagine a Source with a 50 byte buffer of data available to send.

Sink performs five pulls of 10 bytes each.

Within 1 second, Sink receives Acks for 0-29 and 40-49, but no Ack for range 30-39. The Sink would then re-pull 30-39 and send it again, but any attempt to send from an already acknowledged range would ideally be an error.

To simplify this case, it is acceptable to say that pulls for any range larger than the lowest unacknowledged range are ok to read from again... that is, given the above scenario, even tho we already have an Ack for range 40-49, it is ok for the Sink to pull that range again. This allows for a more naive retransmission approach.

It must be possible to repeatedly pull a unacknowledged chunks as many times as necessary.

PULL 0-9, CONTINUE!
PULL 10-19, CONTINUE!
PULL 20-29, CONTINUE!
PULL 30-39, CONTINUE!
PULL 40-49, END!
...
ACK 0-19
ACK 20-29
ACK 40-49
...
<QUIC DATA LOSS TIMER FIRES>
PULL 30-49, END!
PULL 0-29, FAIL! (Data is not available)

In this case, the ACK is an explicit statement that that chunk of data was successfully received and processed and can be discarded. I'm not sure I follow the idea that if you need to ACK, you just request the last / desired offset again

@Fishrock123
Copy link
Owner Author

Fishrock123 commented Jun 6, 2019

  1. Presently incompatible with bob
    • This was done to ensure that flow could always be a single logical ongoing operation
  2. I am unsure what if any implication this has for us at this level.
    • An intermediate passthrough should be able to handle this?
  3. An intermediate passthrough will have to handle this.
    • Erroring would kill the whole stream, though. Maybe that component would have to only feed back a network error packet, or did you mean something more?
    • EDIT: Would also require an extra (probably optional) parameter to pull().
  4. Offset should handle this part, assuming your source supports it.

I'm not sure I follow the idea that if you need to ACK, you just request the last / desired offset again

I misunderstood ACK as the retransmission request, from this example it seems it is the finalization notification, i.e. cache clear notification.

That would require something else other than just offset...

Could that be an extension? (If you even think the idea of extensions is reasonable!) - i.e. Is this applicable to other stream types?

@Fishrock123
Copy link
Owner Author

e.g.

const socket_outgoing = Stream(ack_handler, udp_sink)

Stream(fs_source, socket_outgoing).start()

@jasnell
Copy link
Collaborator

jasnell commented Jun 6, 2019

This could be implemented with an additional flag parameter on Pull().

class Base {
 public:
  typedef enum PullFlags {
    BOB_FLAG_CLEAR = 0;		// Source may clear the data once read
    BOB_FLAG_RETAIN = 1;    // Source should retain the data once read
    BOB_FLAG_IGNORE = 2;    // Sink will not be waiting for the pushed data
  };
  virtual ~Base() = 0;

  virtual Base* BindSource(Base* source) = 0;
  virtual void BindSink(Base* sink) = 0;
  virtual void Next(
    int status,
    void** error,
    char* data,
    size_t bytes,
    uint64_t offset) = 0;
  virtual void Pull(
    void** error,
    char* data,
    size_t size,
    uint64_t offset,
    int flags = 0) = 0;
};

Given this definition if I call:

source->Pull(&err, &buf, 10, 0)

It continues to act exactly as it does today...

But if I call:

source->Pull(&err, &buf, 10, 0, BOB_FLAG_RETAIN)

The Source [edited] is being instructed to hold on to that chunk of data.

To Ack (and release the chunk) without having to wait for it to be delivered, I would call

source->Pull(&err, &buf, 10, 0, BOB_FLAG_CLEAR | BOB_FLAG_IGNORE)

The Source implementation would determine for itself if ranges can be cleared out of order, or whether it even supports the retain and ignore flags. If it doesn't, then a simple error can be returned.

@Fishrock123
Copy link
Owner Author

The Sink is being instructed to hold on to that chunk of data.

(I assume you mean source.)

The source may not support that - i.e. it may not buffer - what happens then?


Also I made a slight mistake for 3. - we would need some extra, probably optional, parameter to pull()...

@jasnell
Copy link
Collaborator

jasnell commented Jun 6, 2019

Yes, I meant Source. And if the Source does not support retaining the data, then upon seeing the BOB_FLAG_RETAIN flag used, it can simply error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants