This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

Set metadata directly in the LDPC #141

Open
sylvainlb opened this issue Feb 21, 2015 · 15 comments

Comments

@sylvainlb

It would save a lot of web requests if we could include in the LDPC some metadata about the LDPRs it contains. The basic example would be an rdfs:label, which would make it possible to list the elements for the user without fetching all of them first.

@bblfish
Member

bblfish commented Feb 21, 2015

Yes, the problem is: how can the server decide which metadata it should set for all resources?
With NQuads or N3 one could just quote part of the content of the files (but which part?). E.g.:

<> ldp:contains <res1> .
<res1> log:includes { <res1> a foaf:PersonalProfilePage . }

But that could slow down quite dramatically the response of the server - and the response feels arbitrary.

The best solution imho is to have as metadata just what is usually considered file system metadata. Ie: data that answers:
• what type of file it is ( image, video, text, data )
• who made the file ( the creator )
• what size is it
• ...

We currently do serve some of this information with a pretty simple ontology that other rww servers use, but that is not very satisfactory, as you can see if you curl an LDPC:

<test/> <http://www.w3.org/ns/posix/stat#mtime> "1382465690000"^^<http://www.w3.org/2001/XMLSchema#integer> ;
    a <http://www.w3.org/ns/ldp#Container> .
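
As a rough illustration of the "file system metadata" idea, here is a minimal Python sketch that describes a directory as a container and attaches stat-style metadata to each member. It is not the server's actual code: the function name `container_listing` is hypothetical, and the use of `stat:size` next to the `stat:mtime` shown above is an assumption about the vocabulary.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote

STAT = "http://www.w3.org/ns/posix/stat#"
XSD = "http://www.w3.org/2001/XMLSchema#"
LDP = "http://www.w3.org/ns/ldp#"


def container_listing(directory: Path) -> str:
    """Describe a directory as an LDP container, one stat block per member."""
    lines = [f"<> a <{LDP}Container> ."]
    for entry in sorted(directory.iterdir()):
        info = entry.stat()
        # Relative IRI of the member; sub-directories get a trailing slash.
        name = quote(entry.name) + ("/" if entry.is_dir() else "")
        # Milliseconds since the epoch, matching the mtime literal above.
        mtime_ms = int(info.st_mtime * 1000)
        lines.append(f"<> <{LDP}contains> <{name}> .")
        lines.append(f'<{name}> <{STAT}mtime> "{mtime_ms}"^^<{XSD}integer> ;')
        # stat:size is assumed here; it is not shown in the output above.
        lines.append(f'    <{STAT}size> "{info.st_size}"^^<{XSD}integer> .')
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "todo1.ttl").write_text("<> a <#Todo> .")
        print(container_listing(Path(tmp)))
```

Only data the file system already knows is emitted, so the server never has to open or interpret the contained documents.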

One may also want to use some of the vocabulary from the Activity Streams 2.0 Working Draft, which is, I think, an rdfized version of Atom.

But then the other solution would be to make use of the QUERY/SEARCH method that I have implemented instead, though that would require that a SEARCH on a container can return information about the resources it ldp:contains. If the SEARCH query is not too complex, that should at least allow the client to get what it needs for a particular query. Note that this means the ETag of a container must change if any of its contents change.
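
One way to satisfy that ETag requirement, sketched in Python under stated assumptions: the container's ETag is derived from the ETags of its members, so any change to a member yields a new container ETag. The function name `container_etag` and the choice of SHA-256 are illustrative only and are not taken from any existing server.

```python
import hashlib
from typing import Dict


def container_etag(member_etags: Dict[str, str]) -> str:
    """Derive a container ETag from a mapping of member URI -> member ETag."""
    digest = hashlib.sha256()
    # Sort by URI so the result does not depend on insertion order.
    for uri in sorted(member_etags):
        digest.update(uri.encode("utf-8"))
        digest.update(b"\0")  # separator, avoids ambiguous concatenations
        digest.update(member_etags[uri].encode("utf-8"))
        digest.update(b"\0")
    # ETags are quoted strings in HTTP; a truncated digest keeps them short.
    return '"' + digest.hexdigest()[:16] + '"'


if __name__ == "__main__":
    before = container_etag({"/todos/1": '"a1"', "/todos/2": '"b1"'})
    after = container_etag({"/todos/1": '"a2"', "/todos/2": '"b1"'})
    print(before != after)
```

The cost is that the server must know every member's ETag whenever the container is requested, which is part of the trade-off being discussed here.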

Note that usually, I think, LDP Containers are not what other resources of interest are pointing at. They will be pointing at the contents of LDPRs that are contained within. So one other answer may be to just create an LDPR and put all the contents in there. For example if I create my personal profile, I don't think it is a good idea to think of my Profile document - call it <card> - as a container, such that when I add a relation of foaf:knows I am doing anything like POSTing a document to a container. Because there are then an infinite number of relations I may want to add to my <card> and each of those relations. Note also that most people will not be pointing to my container in order to point at me, but should be using <card#me> directly. The LDP group got quite tangled in those scenarios, in the end creating a number of what they called LDP Direct Containers in order to allow each one to create a different type of relation. That is the way to craziness it seems to me.

As you can see there are a number of ways of getting things done here.
What is your particular use case?

@sylvainlb
Author

Right. Probably a combination of basic metadata and a QUERY for more advanced info would work.

My use case is a todo list. As explained in #140, I'm using an LDPC so that we can include todos from other servers. Currently I need to fetch the LDPC, then fetch every single todo so that I can display its rdfs:label in the HTML.
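
To make the request count concrete, here is a small Python sketch with an in-memory stand-in for the server. Everything in it is hypothetical (the `CountingClient` class, the URIs, the dictionary layout); it only contrasts one request per todo with a container that already carries the labels.

```python
from typing import Dict, List

# Hypothetical store: the container only lists its members.
PLAIN_STORE: Dict[str, dict] = {
    "/todos/": {"contains": ["/todos/1", "/todos/2", "/todos/3"]},
    "/todos/1": {"label": "Buy milk"},
    "/todos/2": {"label": "Write report"},
    "/todos/3": {"label": "Call Bob"},
}

# Hypothetical store: the container also carries each member's label.
INLINED_STORE: Dict[str, dict] = {
    "/todos/": {
        "contains": ["/todos/1", "/todos/2", "/todos/3"],
        "labels": {
            "/todos/1": "Buy milk",
            "/todos/2": "Write report",
            "/todos/3": "Call Bob",
        },
    },
}


class CountingClient:
    """Stand-in for an HTTP client that counts the GETs it performs."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self.store = store
        self.requests = 0

    def get(self, uri: str) -> dict:
        self.requests += 1
        return self.store[uri]


def list_labels(client: CountingClient, container_uri: str) -> List[str]:
    # One GET for the container, then one GET per member.
    container = client.get(container_uri)
    return [client.get(member)["label"] for member in container["contains"]]


def list_labels_inlined(client: CountingClient, container_uri: str) -> List[str]:
    # One GET in total: the labels travel with the container.
    container = client.get(container_uri)
    return [container["labels"][member] for member in container["contains"]]
```

With three todos the first variant issues four requests and the second issues one; the gap grows linearly with the size of the list.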

@bblfish
Member

bblfish commented Feb 21, 2015

I suppose you want to have each TODO in its own file so that you can give each one its own ACLs.
Otherwise one could just create a file of TODOs and PATCH that file with new TODOs. That would also make it easy to link to TODOs on other servers.

@sylvainlb
Author

Yes, actually that's what I first did. But I figured it would make more sense to have each todo be an LDPR, for ACL reasons, but also for the modularity reason explained in #140.

In general, I wonder if there is a best practice on how to decide at which level to put LDPC and LDPR.

@bblfish
Member

bblfish commented Feb 21, 2015

These are all good questions and we should somehow capture this on the wiki.
I think the choice between putting info in one LDPR and creating more of them has to be answered by looking at HTTP and caching. The more LDPRs you have, the more requests it may take to download them. But if they don't change a lot, you'll get a lot of improvement with caching. So presumably each LDPR should be rather atomic, which would go in the direction of one todo per LDPR.
For pictures it is really easy: you cannot put two pictures at one URL. For data it is always more complex. I suppose each document should be consistent, in that if any data in it changes, anyone following that document should get a new copy.
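
The caching argument can be sketched in Python with a simulated server and a client that revalidates with `If-None-Match`. The classes `FakeServer` and `CachingClient` are hypothetical stand-ins, not a real HTTP stack; the point is only that unchanged LDPRs cost a cheap 304 instead of a full body.

```python
from typing import Dict, Optional, Tuple


class FakeServer:
    """Simulated origin server: URI -> (etag, body)."""

    def __init__(self, resources: Dict[str, Tuple[str, str]]) -> None:
        self.resources = dict(resources)
        self.bodies_sent = 0

    def get(self, uri: str, if_none_match: Optional[str] = None):
        etag, body = self.resources[uri]
        if if_none_match == etag:
            return 304, etag, None  # Not Modified: no body on the wire
        self.bodies_sent += 1
        return 200, etag, body


class CachingClient:
    """Client that keeps (etag, body) per URI and revalidates on each GET."""

    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.cache: Dict[str, Tuple[str, str]] = {}

    def get(self, uri: str) -> str:
        cached = self.cache.get(uri)
        known_etag = cached[0] if cached else None
        status, etag, body = self.server.get(uri, known_etag)
        if status == 304 and cached is not None:
            return cached[1]  # reuse the cached body
        self.cache[uri] = (etag, body)
        return body
```

If each todo is its own LDPR, editing one todo invalidates only that todo's cache entry; with a single large document, every edit forces clients to download everything again.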

@sylvainlb
Author

That makes sense. Thinking of documents and user interest in data is probably the way to go. In this example it is probably one LDPR per todo. So getting the list with all labels in a single request makes sense.

Is it possible to get all the content of an LDPC in one graph in one request?

@bblfish
Member

bblfish commented Feb 21, 2015

> Is it possible to get all the content of an LDPC in one graph in one request?

I think that will require adding the SEARCH feature that goes beyond the LDPC.

But then I don't think a SPARQL 1.1 CONSTRUCT query can do this btw... The CONSTRUCT query does not allow subgraphs in the result - which is pretty weird given that this has been available in N3 since the beginning.

What would be needed would be something like this

CONSTRUCT { <> ldp:contains ?ldpr .
             ?ldpr log:semantics { ?a ?r ?b} }
WHERE {
   <> ldp:contains ?ldpr.
   ?ldpr a ldp:Source;
   Graph ?ldpr { ?a ?r ?b }
}

But that does not work.
Perhaps @betehess can think of something.

Sadly, the time to bring these issues up was before the end of the WG. Still, they can be brought up for the next LDP WG, for the Social Web one, and for others.

@bblfish
Member

bblfish commented Feb 21, 2015

One could argue that if one requested the LDPC in a format that allowed for subgraphs, then that would be equivalent to allowing the LDPC to return the content of the contained resources. But json-ld is a quad format, and it is specified for LDP, and there is no requirement that all the contents should also be downloaded. In any case requiring that behavior for all LDPCs would be pretty bad, as it would probably make it impossible for a server to respond for large containers - unless paging is included...

@bblfish
Member

bblfish commented Feb 21, 2015

OK, so a SEARCH with a SPARQL CONSTRUCT won't do. The alternative would be a well-defined SPARQL SELECT that just returns the LDPRs matching a certain pattern; one could then do a GET on all of those in one go using SPDY. We don't support SPDY yet, but that may not be that far off.

@betehess

While at Pellucid, we had to tackle the exact same problem. Our solution was very simple: the data for the TODOs would be inlined along with the data related to the container. No named graph. No reification.

This works very well if you assume that everybody does Linked Data, which is a bit more than just passing some RDF around. The assumption is that the authoritative source for the data attached to an RDF resource is the corresponding document for that resource.

@bblfish
Member

bblfish commented Feb 21, 2015

That works, @betehess, if you are in a closed world, but not if you are in an open world where others may add data to the database, as that would otherwise potentially lead to contradictions in the data. You cannot disquote data automatically. Merges between graphs can only occur in RDF if both graphs are thought to be true, and one cannot assume that for all data.

@betehess

In that case I think I have misunderstood the initial question.

@bblfish
Member

bblfish commented Feb 21, 2015

Mhh, I should not put what I said in terms of "closed" and "open" world as that is a different issue.

Rather, as the conclusion of my argument pointed out, the ability to inline the data without quotation (i.e. without named graphs) in an automatic way requires the server to know that all the contents are in principle coherent and true. This requires a high degree of oversight of the data, which might just be the case in a company like Pellucid. But on the web, if I want an LDPC to be writeable by different agents, or if I don't want to do a full coherence check of all the data in an LDPC (which would require specialised knowledge for each vocabulary), then I need to be more careful about when I merge data. I described these issues in detail on the LDP mailing list as they came up.
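
The difference between merging and quoting can be shown with plain Python sets standing in for RDF graphs. This is only a sketch under simplifying assumptions: triples are tuples of strings, and the names `merge`, `quote_graphs`, and `sources_of` are hypothetical helpers, not part of any RDF library.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def merge(dataset: Dict[str, Set[Triple]]) -> Set[Triple]:
    """Inline everything into one graph: the server asserts every triple itself."""
    merged: Set[Triple] = set()
    for triples in dataset.values():
        merged |= triples
    return merged


def quote_graphs(dataset: Dict[str, Set[Triple]]) -> Set[Quad]:
    """Keep each triple attached to the document (graph name) that states it."""
    return {(s, p, o, g) for g, triples in dataset.items() for (s, p, o) in triples}


def sources_of(quads: Set[Quad], triple: Triple) -> Set[str]:
    """Which documents state the given triple?"""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


# Two agents write contradictory statements about the same task.
DATASET: Dict[str, Set[Triple]] = {
    "/todos/1": {("/shared#task", "ex:status", "done")},
    "/todos/2": {("/shared#task", "ex:status", "open")},
}
```

After `merge(DATASET)` the container itself claims the task is both done and open, and nothing records who said which. After `quote_graphs(DATASET)` each claim is still attributed to its document, so the server asserts nothing about its truth.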

In a generic way a client ( or a server ) cannot tell from a container being an LDP Basic Container, what types of constraints need to be put on the data for it to be coherent. So as far as BasicContainers go it is not really possible to automate the process of inlining without making assumptions. Those assumptions may hold but they have to be passed out of band, or one would need another ontology and some way of giving that info to the server, and that would require a more complex server, as well as agreement of how such containers would work.

To keep things simple and flexible I think it is easier to allow a query on a container to run over the named graphs it contains, and then for explicit GETs to be made for each resource by the client. But that does require adding that feature to the server.
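
A minimal Python sketch of that approach, assuming the container's members are held as named graphs keyed by URI: the query runs over each graph separately and returns only the names of the matching resources, which the client then fetches with explicit GETs. The function `select_graphs` and the sample data are hypothetical and do not reflect the server's SEARCH implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def select_graphs(
    dataset: Dict[str, Set[Triple]],
    predicate: str,
    obj: Optional[str] = None,
) -> List[str]:
    """Return the names of the graphs holding a triple that matches the pattern.

    Each graph is matched on its own; nothing is merged across graphs.
    """
    return sorted(
        name
        for name, triples in dataset.items()
        if any(p == predicate and (obj is None or o == obj) for (_, p, o) in triples)
    )


# Hypothetical container contents, one named graph per contained LDPR.
CONTAINER: Dict[str, Set[Triple]] = {
    "/todos/1": {("/todos/1#it", "rdf:type", "ex:Todo"),
                 ("/todos/1#it", "rdfs:label", "Buy milk")},
    "/todos/2": {("/todos/2#it", "rdf:type", "ex:Todo"),
                 ("/todos/2#it", "rdfs:label", "Call Bob")},
    "/notes/1": {("/notes/1#it", "rdf:type", "ex:Note")},
}

if __name__ == "__main__":
    # The client would follow up with one GET per returned URI.
    print(select_graphs(CONTAINER, "rdf:type", "ex:Todo"))
```

The response stays small and the server makes no claim about the contents; the cost is the follow-up GETs, which caching or a multiplexing transport can soften.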

@betehess

> Mhh, I should not put what I said in terms of "closed" and "open" world as that is a different issue.

That is what I initially thought. I still have some trouble connecting your comments with the problem outlined by Sylvain.

The thing is: the burden of "assessing the truth" is always on the client. Using named graphs, or triple/graph reification techniques, or a provenance ontology, etc. only helps you encode assessments about the truth; it does not help you make decisions about the truth. You cannot change that.

At the end of the day, the only good way to know the truth is to interact with the relevant resources on the Web. That is something outside of the RDF model/semantics, and that's where Linked Data begins. In that world, I do not see what named graphs really bring to the party, and inlining linked data is just plain enough. That's what we were doing at my previous job, and it really was ok.

Anyway, that is my take on that and I don't think I can provide a better feedback than that.

@bblfish
Member

bblfish commented Feb 21, 2015

> The thing is: the burden of "assessing the truth" is always on the client. Using named graphs, or triple/graph reification techniques, or a provenance ontology, etc. only helps you encode assessments about the truth; it does not help you make decisions about the truth. You cannot change that.

Indeed. That was my argument at Scala eXchange 2014 in London.
That is why the server cannot inline information in the LDPC, which would require it to make a decision as to the truth of the contents. It is in no danger if it quotes the content of the graph. (If you look carefully, that is what the Atom Syntax does with its content element. It quotes the content of the resources.)

> At the end of the day, the only good way to know the truth is to interact with the relevant resources on the Web. That is something outside of the RDF model/semantics, and that's where Linked Data begins. In that world, I do not see what named graphs really bring to the party, and inlining linked data is just plain enough. That's what we were doing at my previous job, and it really was ok.

Named graphs, or just N3 graphs, allow you to quote, or in other words to work with contexts (which can also be thought of as a form of modal logic). It is really just the same as using a literal, except that it is easy to parse. It is really worth looking at Guha's thesis "Contexts: A Formalization and Some Applications". It's the difference between saying "Lois Lane believes Clark Kent is a boring journalist" and "Lois Lane believes Superman is not a journalist". Even though Clark Kent and Superman are names that co-refer, and in logic you can substitute co-referential terms salva veritate, you cannot do so in belief contexts. And this is what one has to take care of in these areas.


3 participants