The CSV on the Web W3C recommendations specify how to describe CSV files published on the Web using JSON-LD descriptors that contain important metadata, such as column names and data types. The goal of this project is to implement a validator of CSV files based on these recommendations.
This project achieves the goal and adds several extra features. Most of the project is implemented in Java; it is developed and tested with Java 17.0.1. The core functionality is in a library, which can be used through the included command line application. The project also contains a web application, implemented in JavaScript, that provides a graphical user interface. In the backend of the web application, a Java web worker uses the library to process requests from the web. The project uses Docker to simplify development and deployment, but it can be used without Docker as well.
The library contains the core functionality of this project. It performs the validation and other supporting tasks. The library is used via the manager in the class Manager in the main package io.github.janjanda.otava.library. The manager provides simple public methods for users and hides the internal complexity of the library. The names of the methods are self-explanatory, and there are comments with additional instructions. The library is ready for international environments: users can set a locale in the manager, either by using one of the implemented locales in the subpackage locales or by implementing their own. The default locale is English, and there is also a Czech locale.
There are three styles of validation: tables only, descriptors only, and full validation. Each style has a corresponding method with comments and descriptions in the manager. The validation itself is implemented in many individual classes, which makes it easy to extend. Each class represents a so-called validation check, which checks a particular feature of tables and descriptors. Some checks do not make sense if other checks fail. For this reason, a check can depend on other checks. These dependencies form a tree, and the checks in the tree are executed from the leaves to the root. If several checks depend on the same check, that check appears in several places in the tree, but it is executed only once for efficiency. The manager can print the validation tree for each style of validation. The validation methods require one parameter: a validation request. A validation request contains the documents and other input for the validation, and it has a convenient builder. The validation returns a validation report with results and other data. Each validation check creates one result. The report can be shown in several formats. If a validation cannot be performed, an exception is thrown. Like the report, the exception can be shown in several formats.
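The execution order described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class CheckTreeSketch, the type Check, and the method run are hypothetical names invented for this example, and the sketch only models the ordering (dependencies first, each check at most once), not the handling of failed checks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CheckTreeSketch {

    /** Hypothetical validation check: a name plus the checks it depends on. */
    static final class Check {
        final String name;
        final List<Check> dependencies;

        Check(String name, Check... dependencies) {
            this.name = name;
            this.dependencies = List.of(dependencies);
        }
    }

    /** Executes the tree under root from the leaves up and returns the execution order. */
    static List<String> run(Check root) {
        Set<Check> executed = new HashSet<>();
        List<String> order = new ArrayList<>();
        visit(root, executed, order);
        return order;
    }

    private static void visit(Check check, Set<Check> executed, List<String> order) {
        if (executed.contains(check)) {
            return; // shared dependency: already executed through another branch
        }
        for (Check dependency : check.dependencies) {
            visit(dependency, executed, order); // dependencies run before the check itself
        }
        executed.add(check);
        order.add(check.name); // a real check would validate and produce one result here
    }

    public static void main(String[] args) {
        Check parse = new Check("parse");
        Check columns = new Check("columns", parse);
        Check types = new Check("types", parse);
        Check root = new Check("root", columns, types);
        System.out.println(run(root)); // prints [parse, columns, types, root]
    }
}
```

In this example the check named parse appears under both columns and types, yet it is executed only once.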
It is important to understand the documents in the library. There are four types of documents: active table, passive table, active descriptor, and passive descriptor. An active table attempts to locate and load its descriptor; a passive table does not. An active descriptor attempts to locate and load its tables; a passive descriptor does not. Every document is either local or online. Every document has a name, which is the actual location of the document (a path in a filesystem or a URL). Every document may also have an alias, which is an alternative name. If an alias is present, it is used in the validation instead of the real name. This is useful when a user wants to validate a table that has not been published yet: the name of the table is the path to the actual file, and the alias is the future URL of the table. The table is located and loaded according to its name, but in the validation it behaves as if it were at the location given by the alias. The same can be done with a descriptor. Tables are either in memory or out of memory. In-memory tables are loaded completely into memory; out-of-memory tables are not. This can be used to tune the performance of the library.
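The relationship between a name and an alias can be sketched as follows. This is only an illustration under assumptions: the class DocumentAliasSketch, the record Document, and its methods are hypothetical and are not part of the library's API.

```java
import java.util.Optional;

public class DocumentAliasSketch {

    /** Hypothetical document model: the real location plus an optional alias. */
    record Document(String name, Optional<String> alias) {

        /** The content is always located and loaded according to the real name. */
        String loadLocation() {
            return name;
        }

        /** The validation sees the alias if one is present, otherwise the name. */
        String validationLocation() {
            return alias.orElse(name);
        }
    }

    public static void main(String[] args) {
        // A local table that has not been published yet, with its future URL as the alias.
        Document table = new Document(
                "tables/table005.csv",
                Optional.of("https://example.org/tree-ops.csv"));
        System.out.println(table.loadLocation());       // prints tables/table005.csv
        System.out.println(table.validationLocation()); // prints https://example.org/tree-ops.csv
    }
}
```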
The command line application gives users access to the features of the library. It can be compiled and executed, respectively, with the following commands in the directory java-code.
mvn package
java -jar cli/target/cli-1.0-jar-with-dependencies.jar -help
The option -help shows the available options and their syntax with arguments. The names and descriptions of the command line options refer to the corresponding concepts of the library described in the previous section.
Perform full validation with the provided active online descriptor, and write the report as plain text.
java -jar cli/target/cli-1.0-jar-with-dependencies.jar -full -text -aod https://w3c.github.io/csvw/tests/test011/tree-ops.csv-metadata.json
The following examples are executed in the directory java-code/library/src/test/resources.
Perform full validation with the provided passive local in-memory table and the provided passive local descriptor, and write the report as plain text. The table and the descriptor each have both a name and an alias, separated by a semicolon, because the links in the descriptor would not be valid otherwise. The arguments are quoted so that the shell does not interpret the semicolon as a command separator.
java -jar ../../../../cli/target/cli-1.0-jar-with-dependencies.jar -full -text -plit "tables/table005.csv;https://example.org/tree-ops.csv" -pld "metadata/metadata001.json;https://example.org/metadata001.json"
Perform tables only validation with the provided tables, and write the report as plain text. It is possible to validate multiple documents together.
java -jar ../../../../cli/target/cli-1.0-jar-with-dependencies.jar -tables -text -plit tables/table002.csv -plit tables/table013.csv -plit tables/table014.csv -plit tables/table015.csv
This section describes the deployment process of the web application in ten easy steps.
1. Make sure that Docker Engine and Docker Compose are running and ready on your machine.
2. Choose a directory for the application.
3. Open your command line in that directory.
4. Run git clone https://github.com/JanJanda/otava.git to download the source code of the project.
5. Use cd otava to go to the directory of the project.
6. Edit the downloaded file docker-compose.yml like so: Come up with an original password, and replace every occurrence of my-password in the file with your password.
7. Optional: Increase the replicas value under web_worker in the file docker-compose.yml to spawn more workers and increase validation performance.
8. Run docker compose up in the directory otava to start everything and wait. It can take a long time. When the application is ready, the command line prints "Database is ready" and stops printing other text.
9. Open your web browser and go to the address localhost to see the web application.
10. Open another command line in the otava directory and run docker compose down to stop everything.
The application uses the created directory database-volume to store its data. An instruction manual is in the web application. Repeat steps 7-10 to start the application again. If you want to deploy a new version of the application, delete the two built otava images, run git pull in the directory otava to download the new version from this repository, and then follow steps 7-10.
There is a C4 diagram of the web application in the directory web-app. It can be rendered with Structurizr. There is OpenAPI documentation of the server web interface in the directory web-app/server. It can be rendered with Swagger.