how to parse a csv file with a metadata file #13

Open
bblfish opened this issue Jul 5, 2022 · 6 comments

Comments

bblfish commented Jul 5, 2022

I have a csv file and a metadata file on my file system. With csv2rdf I can write

csv2rdf -t data/pplEx.csv -u pplEx.csv-meta.json -m minimal

in order to transform data/pplEx.csv using the pplEx.csv-meta.json metadata file. This will return ntriples for the csv file.

I can't work out how to do the same with rdf-tabular. All the examples use HTTP URLs, which gives me the impression that one first has to set up special headers in the HTTP response for the CSV. Is that right, or have I missed the command-line option needed when debugging?

I wanted to see if rdf-tabular had more advanced features than csv2rdf. For example I was interested to see what I need to do to get foreign keys to work. This is the csv file

Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,

this is the metadata file

{
  "@context": [ "http://www.w3.org/ns/csvw", { "@language": "en" } ],
  "dc:title": "example people data",
  "tableSchema": {
    "@id": "http://example.com/",
    "columns": [
      {
        "name": "Id"
      }, {
        "name": "Name",
        "datatype": "string"
      }, {
        "name": "DoB",
        "datatype": {
          "base": "date",
          "format": "dd-MM-yyyy"
        }
      }, {
        "name": "Sex",
        "datatype": "string"
      }, {
        "name": "mother"
      }
    ],
    "primaryKey": "Id",
    "foreignKeys": [{
      "columnReference": "mother",
      "reference": {
        "schemaReference": "http://example.com/",
        "columnReference": "Id"
      }
    }]
  }
}
Member

gkellogg commented Jul 5, 2022

Typically, the metadata is looked for in the same place as the CSV (or vice versa). For example, if you were to clone the repo and install the rdf-tabular, rdf-turtle, and rdf gems, you can do the following:

rdf serialize --input-format tabular --output-format ttl etc/doap.csv

You should also be able to do essentially the same thing on the distiller, using the text forms for both the CSV and Metadata.
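
For a CSV on the local file system, "the same place" can be made concrete with a small sketch. It assumes the two default locations from the CSVW convention, `<csv name>-metadata.json` and `csv-metadata.json`, both resolved next to the CSV. The helper name `candidate_metadata_paths` is hypothetical, and whether rdf-tabular checks both locations for local files is not confirmed in this thread.

```python
from pathlib import Path

def candidate_metadata_paths(csv_path):
    """Return the conventional CSVW metadata locations for a local CSV file."""
    csv_path = Path(csv_path)
    return [
        # File-specific metadata: "<csv name>-metadata.json" next to the CSV.
        csv_path.with_name(csv_path.name + "-metadata.json"),
        # Directory-wide metadata shared by every CSV in that directory.
        csv_path.with_name("csv-metadata.json"),
    ]

# Report which of the conventional locations exist for the example file.
for path in candidate_metadata_paths("data/pplEx.csv"):
    print(path, "(found)" if path.exists() else "(missing)")
```

If neither file is present, the processor has no metadata to apply and will likely fall back to its defaults for the table.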

Member

gkellogg commented Jul 5, 2022

Note that your metadata will need a url property referencing the CSV (a requirement I never was really on board with), and you may have issues with your foreign keys referencing a non-specified table. Start off simple, and add features as you go. The CSVW Repo has a bunch of examples.
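
One way to satisfy the `url` requirement without editing the metadata by hand is sketched below. It is only an illustration: the helper name `write_sibling_metadata` is hypothetical, the relative `url` value is an assumption about how the CSV should be referenced, and the foreign key definition is left untouched.

```python
import json
from pathlib import Path

def write_sibling_metadata(metadata_path, csv_path):
    """Copy a metadata file next to the CSV, adding a `url` that points at it."""
    csv_path = Path(csv_path)
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    # Relative URL, intended to resolve against the metadata file's own location.
    metadata["url"] = csv_path.name
    # Conventional file-specific name: "<csv name>-metadata.json".
    target = csv_path.with_name(csv_path.name + "-metadata.json")
    target.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

Calling `write_sibling_metadata("pplEx.csv-meta.json", "data/pplEx.csv")` would write `data/pplEx.csv-metadata.json`, after which the `rdf serialize` command can be pointed at the CSV alone.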

Author

bblfish commented Jul 5, 2022

Yes, I am trying to put together the simplest foreign key example: the one where the key is in the same table schema as the table itself. I could not find that described anywhere in the W3C docs. I did, I think, find an example in the GitHub repo, but I did not seem to get any interesting result with csv2rdf. So I am not sure...

My use case is also to think of csvw as a schema language which could be reused on an open-ended number of matching CSV files. So for that it helps to be able to pass the metadata file from the command line, not just the http header. In any case I think for command line exploration of the tool being able to pass it as an argument is really useful. It would have been too difficult to get going without that. (I opened a similar issue on the python implementation)

Author

bblfish commented Jul 5, 2022

After placing the metadata file with the name pplex.csv-metadata.json in the same directory as the CSV file, I was able to produce the RDF with

rdf serialize --input-format tabular --output-format ttl  pplex.csv 

What I was hoping to find out with the foreignKeys construct was whether it would align the blank nodes without specifying any URL.
But I guess foreignKeys only really works with the valueUrl and aboutUrl fields set.

I am trying to put together a demonstration where I can show that just by declaring <#Id> to be an owl:InverseFunctionalProperty I can avoid having to specify URLs for subjects or values. That is, I was trying to get the following output from the table above:

[]  <#Name> "Anaïs" ;
    <#DoB> "2014-09-10"^^<http://www.w3.org/2001/XMLSchema#date> ;
    <#Id> 3 ;
    <#Sex> "female" ;
    <#mother> [ <#Id>  4 ] .

But I don't think that is possible even with virtual columns.
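
To make the target output concrete, here is a minimal sketch that produces the blank-node form for every row using only Python's standard library. It is not a CSVW processor and does not use rdf-tabular or csv2rdf. The function names are hypothetical, the `dd-MM-yyyy` format is translated by hand to `%d-%m-%Y`, and string values are not escaped, so it only suits simple data like the table above.

```python
import csv
import io
from datetime import datetime

CSV_TEXT = """Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,
"""

XSD_DATE = "<http://www.w3.org/2001/XMLSchema#date>"

def row_to_turtle(row):
    """Render one CSV row as a blank node identified only by its <#Id> value."""
    # "dd-MM-yyyy" in the metadata corresponds to "%d-%m-%Y" here.
    dob = datetime.strptime(row["DoB"], "%d-%m-%Y").date().isoformat()
    lines = [
        f'[]  <#Name> "{row["Name"]}" ;',
        f'    <#DoB> "{dob}"^^{XSD_DATE} ;',
        f'    <#Id> {int(row["Id"])} ;',
        f'    <#Sex> "{row["Sex"]}"',
    ]
    if row["mother"]:
        # The foreign key becomes a nested blank node carrying only the key.
        lines[-1] += " ;"
        lines.append(f'    <#mother> [ <#Id> {int(row["mother"])} ]')
    lines[-1] += " ."
    return "\n".join(lines)

def table_to_turtle(csv_text):
    """Render every row of the CSV text, separated by blank lines."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join(row_to_turtle(row) for row in rows)

print(table_to_turtle(CSV_TEXT))
```

The blank nodes in this output are not linked syntactically. Merging the nested `[ <#Id> 4 ]` with the row for Gordana relies on a reasoner treating <#Id> as inverse functional, which is the point of the demonstration.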

Member

gkellogg commented Jul 5, 2022

That's probably worth filing in w3c/csvw for some hypothetical future group to take up, and to get more visibility from those who watch it.

Author

bblfish commented Jul 5, 2022

I wrote something up here: w3c/csvw#885
