


Industrial-Symbiosis-Word-Vectors

Data accompanying the paper "Machine Learning Assisted Industrial Symbiosis: Testing the Ability of Word Vectors to Estimate Similarity for Material Substitutions" by Davis, C. & Aid, G. (2021).

We have included here the derived data used to produce the paper's figures and tables.

Derived data used to create figures and charts

Data used to derive the article figures and Table 3, as well as further figures in the Supporting Information document, are stored in this Industrial Symbiosis Data GitHub repository at https://github.com/isdata-org/Industrial-Symbiosis-Word-Vectors. The repository holds several sets of tabular (CSV) data, which are explained here as well as in the repository's introduction.

Figure 3, comparing NPMI against the cosine similarity of word vectors for the TEDA EIP, was created using the following three files in the repository: cosine sim site matrix.csv (cosine similarity values), npmi site matrix.csv (NPMI values), and verified site matrix.csv (boolean matrix of verified exchanges).
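A minimal Python sketch of how the three matrices could be paired into (NPMI, cosine similarity, verified) points, which is what a scatter plot like Figure 3 needs. It assumes each CSV is a square matrix with material labels in the header row and in the first column, and that the verified matrix stores TRUE/FALSE (or 1/0) values; the actual file layout has not been checked here, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Cell values keyed by (row label, column label); labels assumed to be material names.
Matrix = Dict[Tuple[str, str], str]


def load_site_matrix(lines: Iterable[str]) -> Matrix:
    """Parse a labeled matrix: a header row of column labels and a first
    column of row labels (an assumed layout, not confirmed against the files)."""
    reader = csv.reader(lines)
    header = next(reader)
    col_labels = header[1:]  # header[0] is assumed to be an empty corner cell
    cells: Matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row_label, values = row[0], row[1:]
        for col_label, value in zip(col_labels, values):
            cells[(row_label, col_label)] = value.strip()
    return cells


def paired_points(cosine: Matrix, npmi: Matrix, verified: Matrix) -> List[Tuple[float, float, bool]]:
    """Return (npmi, cosine, is_verified) for each pair found in all three
    matrices, skipping cells that are empty or not numeric."""
    points: List[Tuple[float, float, bool]] = []
    shared = set(cosine) & set(npmi) & set(verified)
    for key in sorted(shared):
        try:
            cos_value = float(cosine[key])
            npmi_value = float(npmi[key])
        except ValueError:
            continue  # e.g. a blank diagonal or an "NA" cell
        # Assumed encoding of the boolean matrix: "TRUE"/"FALSE" or "1"/"0".
        is_verified = verified[key].strip().lower() in {"true", "1"}
        points.append((npmi_value, cos_value, is_verified))
    return points
```

If the assumed layout holds, the real files could be passed in as open handles, e.g. `load_site_matrix(open("cosine sim site matrix.csv", newline=""))`, and the resulting points plotted with any plotting library, colored by the verified flag.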

Figure 4, showing Precision, Recall, and F-measure, was produced from the data labeled NPMI vs Word Vectors cutoff metrics.csv, which contains Precision, Recall, and F-measure statistics along with counts of true positives, false positives, true negatives, and false negatives.
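These metrics follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-measure as their harmonic mean. As an illustrative sketch only (not the paper's code), the Python below counts true/false positives and negatives at a single cutoff, treating a pair as a predicted exchange when its score is at or above the cutoff, and then derives the three metrics. The `>=` cutoff rule, reading F-measure as the balanced F1 score, and the function names are assumptions.

```python
from typing import Dict, Sequence


def confusion_counts(scores: Sequence[float], labels: Sequence[bool], cutoff: float) -> Dict[str, int]:
    """Count TP/FP/TN/FN when every pair with score >= cutoff is predicted
    to be an exchange (the >= convention is an assumption)."""
    tp = fp = tn = fn = 0
    for score, is_verified in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_verified:
            tp += 1
        elif predicted:
            fp += 1
        elif is_verified:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def precision_recall_f1(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and balanced F-measure (F1); a zero denominator yields 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f1}
```

For example, with scores `[0.9, 0.8, 0.4, 0.2]`, verified flags `[True, False, True, False]` and a cutoff of 0.5, each count is 1 and precision, recall, and F-measure are all 0.5.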

Figure 5, showing the Receiver Operating Characteristic (ROC) curve, was produced from the data labeled NPMI vs Word Vectors cutoff metrics.csv in the repository.
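An ROC curve plots the true positive rate, TP / (TP + FN), against the false positive rate, FP / (FP + TN), across cutoffs. Assuming the cutoff metrics file holds one row of counts per cutoff, a sketch like the following (hypothetical function names, standard library only) turns those rows into ROC points and estimates the area under the curve with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(counts: Sequence[Tuple[int, int, int, int]]) -> List[Tuple[float, float]]:
    """Map (tp, fp, tn, fn) rows, one per cutoff, to (FPR, TPR) points sorted
    by FPR, with the (0, 0) and (1, 1) end points added."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for tp, fp, tn, fn in counts:
        tpr = tp / (tp + fn) if tp + fn else 0.0  # recall / sensitivity
        fpr = fp / (fp + tn) if fp + tn else 0.0  # 1 - specificity
        points.add((fpr, tpr))
    return sorted(points)


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

The (0, 0) and (1, 1) end points correspond to cutoffs that predict no exchanges and every exchange, so the area estimate spans the full FPR range even if the file only covers intermediate cutoffs.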

Table 3, showing exchange recommendations for the TEDA EIP, was produced from the data labeled all exchanges.csv in the repository.

Supporting Figures in S.4 were produced from the data labeled NPMI vs Word Vectors cutoff metrics.csv in the repository.

Supporting Figures in S.5, comparing NPMI against the cosine similarity of word vectors for eco-industrial parks, were created using the following three files in the repository: cosine sim site matrix.csv (cosine similarity values), npmi site matrix.csv (NPMI values), and verified site matrix.csv (boolean matrix of verified exchanges).

Regarding the corpus data (the patents and academic articles), we cannot share the full corpus, as much of the material comes from copyrighted (closed or restricted access) academic and patent databases. These corpus data are available from AcclaimIP (patent corpus) and Elsevier's Text Mining API (scientific articles). Restrictions apply to the availability of these data, which were used under license for this study. The data are available at https://www.acclaimip.com/ and https://dev.elsevier.com/, respectively, subject to access being granted by Acclaim and Elsevier.
