Regexes in ACLs

Henry Story edited this page Feb 1, 2016 · 15 revisions

Describing groups of resources using regexes is very powerful, and it can be useful for setting access control rules over large sets of resources.

rww-play's WACAuthZ class shows how easy it is to build a naïve implementation, which is useful for testing out use cases but probably too powerful for the main ones.

The problems with full regexes are:

  • one needs to know the full URL of the resource to create such regexes
  • different languages have different implementations of regexes
  • they can be Turing complete

This should not stop one. Regexes are already standardised by the W3C in RDF via the POWDER spec, which also provides simpler, less powerful vocabularies for use cases that do not require full regex power. So one could invent a simple pattern language based on globbing, such as "/*" for all resources in a folder, or "/**" for all resources in a folder and its sub-folders. This could look like the following:

[] acl:accessToClass [ acl:urlPattern [  acl:base <.>; acl:match "*.acl" ]];
   acl:mode acl:Read;
   acl:agentClass foaf:Agent .

The advantage of such a pattern language is that it allows the pattern to be relative to a resource, and so to be written out even by a client that does not know the full URL of the resource.
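As a minimal sketch of how a server might evaluate such a pattern, the Python below resolves acl:base against the URL of the acl document and then matches the remainder of the resource URL against the glob. The glob semantics are an assumption, since this page does not define them: "*" is taken to stay within one path segment and "**" to cross "/" boundaries. The function names glob_to_regex and matches are hypothetical.

```python
import re
from urllib.parse import urljoin


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex.

    Assumed semantics (not defined by this page): "*" matches within one
    path segment, "**" also crosses "/" boundaries.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(acl_doc_url: str, base: str, pattern: str, resource_url: str) -> bool:
    """True if resource_url falls under the pattern, relative to the base."""
    # Resolve a relative base such as "." or "/" against the acl document.
    absolute_base = urljoin(acl_doc_url, base)
    if not resource_url.startswith(absolute_base):
        return False
    relative = resource_url[len(absolute_base):]
    return glob_to_regex(pattern).fullmatch(relative) is not None
```

Under these assumptions, a rule with base <.> and match "*.acl" written in https://jack.example/container/.acl would cover https://jack.example/container/cat.acl but not acls in sub-folders, whereas base </> with match "**.acl" would cover both.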

Use cases

ACLs for all resources on a server

To set access control rules on all resources on a server one could use the following pattern:

[] acl:accessToClass [ acl:urlPattern [ acl:base </>; acl:match "**" ]];
   acl:mode acl:Write, acl:Read;
   acl:agent <card#me> .

This can be very useful in allowing one to give read/write access to the owner of a domain in one acl, without needing that acl to be repeated in the acl file for every resource.

Describing the group of all ACLs

An example given in issue 51, "Do we need wac:Control", is the following, which would allow one to define the class of all acls on a server:

[] acl:accessToClass [ acl:urlPattern [ acl:base </>; acl:match "**.acl" ]];
   acl:mode acl:Read;
   acl:agentClass foaf:Agent .

This does lead one to the question of how the client can know what the set of all acls on a server is, given that each server can have a different name for them. The use case for this would be, say, a client that wants to make all acls world readable.

One answer would be that the server somehow makes available in a .well-known the group of all acls like this:

<#acls> a AclGroup;
    acl:urlPattern [ acl:base </>; acl:match "**.acl" ] .

But it would be great if one could find a way to do the same without resorting to .well-known files.

Algorithm

Any resource <doc>'s acl link relation (given by the HTTP header Link: <doc.acl>; rel="acl") specifies the acl starting point for that resource. This must be the case, as it is the only way a client can find out about an acl for a resource. The client and the server must therefore start from that acl and follow any wac:include relations, whose targets should be logically mergeable monotonically to create the complete acl for that resource. Because these merges are monotonic, the server can stop as soon as it finds an acl that gives the user permission, in the knowledge that no other included acl can undo that statement. Note: if it can be shown that ACLs can be inconsistent, then a SoLiD server may want to check the consistency of these acls before allowing a write to succeed.

Given the above, here is one way for a container to deal with defaults for new resources. It works by reference, using a relation wac:defaultInclude that points from an acl file to another one, which will be included by default in the acl of any new resource created.

This would work as follows. Consider an ldp:Container named </container/> whose acl file </container/.acl> includes the statement
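The traversal described above can be sketched in Python as follows. This is only an illustration under stated assumptions: acls are modelled as an in-memory dictionary rather than parsed RDF, each rule is reduced to an agent and a set of modes (resource patterns are ignored), and the string "foaf:Agent" stands in for an acl:agentClass foaf:Agent rule that applies to everyone. The function name is_allowed and the dictionary layout are hypothetical.

```python
from collections import deque


def is_allowed(start_acl, agent, mode, acls):
    """Walk wac:include links from start_acl and stop at the first grant.

    acls maps an acl URL to {"includes": [...], "rules": [...]}; this
    in-memory shape is an assumption made for the sketch.
    """
    seen = set()
    queue = deque([start_acl])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # guard against include cycles
        seen.add(url)
        doc = acls.get(url)
        if doc is None:
            continue  # a missing acl document grants nothing
        for rule in doc["rules"]:
            if mode in rule["modes"] and rule["agent"] in (agent, "foaf:Agent"):
                # Monotonic merging: no later acl can revoke this grant.
                return True
        queue.extend(doc["includes"])
    return False
```

Returning early is only safe because of the monotonicity assumption; if included acls could withdraw permissions, the whole chain would have to be merged before deciding.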

<.> wac:defaultInclude <default.acl> .

On creation of a new resource, e.g. by POSTing some content to </container/> with Slug "cat", thereby creating a new resource </container/cat> and the associated </container/cat.acl>, the server, on finding the above wac:defaultInclude statement in the container's acl, will add the statement:

<> wac:include <default.acl> .

in the created resource's acl.
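A rough sketch of that creation step in Python is shown below. It assumes the acl naming convention used in the example (the resource URL with ".acl" appended), and it represents acls as plain dictionaries in a hypothetical store; neither the function name create_resource nor the dictionary keys come from an existing implementation.

```python
from urllib.parse import urljoin


def create_resource(container_url, slug, container_acl, store):
    """Create a resource under the container and seed its acl.

    container_acl is {"url": ..., "defaultInclude": ...}, a hypothetical
    in-memory view of the container's acl document.
    """
    resource_url = urljoin(container_url, slug)
    acl_url = resource_url + ".acl"  # naming convention assumed from the example
    includes = []
    default = container_acl.get("defaultInclude")
    if default is not None:
        # <default.acl> is relative to the container's acl document.
        includes.append(urljoin(container_acl["url"], default))
    # Only a reference is stored; the default rules are not copied.
    store[acl_url] = {"includes": includes, "rules": []}
    return resource_url, acl_url
```

Because the new acl only points at the default, later edits to the default acl take effect for every resource that still includes it.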

Pros

  • If someone wants to change all the resources that have a certain default, they can do so just by changing <default.acl>.
  • If someone wants a particular resource to not have the defaults, they can just remove the wac:include triple.
  • Reduces storage requirements
  • Is very maintainable
  • Is monotonic
  • client and server verification mechanisms follow the same discovery process

Cons

  • verifying an ACL will often require fetching one new default acl, but that is local so it should be very fast.
  • this does require some form of regular expression (e.g. a simple version based on globbing), as the default.acl pulled in via wac:include needs to make statements about sets of resources such as
 [] acl:accessToClass [ acl:regex "https://jack.example/.*[.]acl" ];
   acl:mode acl:Read;
   acl:agentClass foaf:Agent .

Issues

Truth

It does mean that on a naive reading of acl:regex some acls will not actually be true of all files matched by the regular expression, as they are only valid if the resource's acl includes them using wac:include. Perhaps there is a way of thinking of the acl:regex relation that does not create such false statements. Perhaps it should be read as defining the subclass of resources that fit the given pattern and whose acls are linked via a chain of wac:includes to the resource that contains the regular expression. On this reading one cannot deduce that https://jack.example/cat.acl is readable by everyone from the acl shown in the Cons section above alone. One also needs to know:

  • that <https://jack.example/cat.acl> exists
  • that <https://jack.example/cat.acl> has an acl link header to a resource that through a chain of wac:includes refers back to <default.acl>
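The restricted reading suggested above amounts to a conjunction of three checks, which can be sketched in Python as follows. The data structures are assumptions made for illustration: acl_link maps each existing resource to the acl named by its Link header, and includes maps an acl URL to the acls it wac:includes. The function names reaches and rule_applies are hypothetical.

```python
import re


def reaches(start_acl, target_acl, includes):
    """True if target_acl is start_acl or reachable via wac:include links."""
    seen, stack = set(), [start_acl]
    while stack:
        url = stack.pop()
        if url == target_acl:
            return True
        if url not in seen:
            seen.add(url)
            stack.extend(includes.get(url, []))
    return False


def rule_applies(resource, regex, rule_doc, acl_link, includes):
    """Check whether a regex rule written in rule_doc covers the resource."""
    if resource not in acl_link:  # the resource must exist
        return False
    if re.fullmatch(regex, resource) is None:  # it must fit the pattern
        return False
    # Its own acl must lead back to the document holding the rule.
    return reaches(acl_link[resource], rule_doc, includes)
```

On this reading, a resource that matches the regex but whose acl never includes the document holding the rule is simply outside the class the rule talks about, so no false statement is made about it.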

Better Pattern Languages

The problems with full regexes are:

  • one needs to know the full URL of the resource
  • different languages have different implementations of regexes
  • they can be Turing complete

This should not stop one. Regexes are already standardised by the W3C in RDF via the POWDER spec, which also provides simpler, less powerful vocabularies for use cases that do not require full regex power. So one could invent a simple pattern language based on globbing, such as "/*" for all resources in a folder, or "/**" for all resources in a folder and its sub-folders. This could look like the following:

[] acl:accessToClass [ acl:urlPattern [  acl:base <.>; acl:match "*.acl" ]];
   acl:mode acl:Read;
   acl:agentClass foaf:Agent .

This should be read as saying that everybody can read all resources that match the pattern "*.acl" in the current directory and for which this is an acl, reachable through a wac:include chain from the resource's acl. (In this case, this is a rule on acls.)

To allow the owner to read and write all files in this folder and its sub-folders, one could use

[] acl:accessToClass [ acl:urlPattern [  acl:base <.>; acl:match "**" ]];
   acl:mode acl:Read, acl:Write;
   acl:agent </card#i> .

assuming of course the user's WebID is </card#i>. Note again that the acl:urlPattern gives a class that is larger than the class of resources for which this is true, as the only resources for which the rule is valid are those whose acls link to the acl in which it is written.

The advantage of such a pattern language is that it allows the pattern to be relative to a resource, and so to be written out even by a client that does not know the full URL of the resource.