<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="icon" type="image/x-icon" href="favicon.ico">
<title>Sustainable use of Brazilian Biodiversity – Using Linked Data for Natural Product Discovery</title>
<!-- Styles -->
<link rel="stylesheet" href="assets/css/simple-grid.css">
<link rel="stylesheet" href="assets/css/dinobbio.css">
<style>.en {display:block}</style>
</head>
<body class="fixed-top-nav ">
<!-- Header -->
<header>
<div class="page">
<!-- Nav Bar -->
<nav id="navbar">
<div id="menu" class="row">
<div class="col-12 header-color">
</div>
<div class="col-6 logo">
<a href="index.html" target="_self">
<img src="images/Logo-Dinobbio.png" alt="DINOBBIO logo"/>
<div class="logo-name">
DINOBBIO
</div>
</a>
<div class="logo-text hidden-sm hidden-md">
Discovering New Natural Inspired<br/>Products from Brazilian Biodiversity
</div>
</div>
<div id="language" class="col-6">
</div>
</div>
<div class="row main-menu-nav">
<!-- Navbar toggle for mobile -->
<div class="col-2 hidden-sm hidden-md"></div>
<input id="navbar-toggler" type="checkbox" style="display:none"/>
<label id="burger-menu" for="navbar-toggler" class="hidden-full hidden-md"><span></span></label>
<div class="col-8 col-12-md col-12-sm right" id="main-menu">
<ul class="navbar-nav main-menu ml-auto flex-nowrap">
<!-- About (home page) -->
<li class="nav-item">
<a class="nav-link" href="index.html">
about
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="projects.html">
projects
</a>
</li>
<li class="nav-item active">
<a class="nav-link" href="publications.html">
publications
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="talks.html">
talks
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="partners.html">
partners
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="blog.html">
blog
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="contact.html">
contact
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="privacy.html">
privacy
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="imprint.html">
imprint
</a>
</li>
</ul>
</div>
<div class="col-2 hidden-sm hidden-md"></div>
</div>
</nav>
</div>
</header>
<content>
<div class="page">
<div class="row">
<div class="col-2 hidden-sm hidden-md"></div>
<div class="col-8 col-12-md title">
<h1>
<div>Sustainable use of Brazilian Biodiversity</div>
<div class="title-line hidden-sm"></div>
<div>Using Linked Data for<br/>Natural Product Discovery</div>
</h1>
</div>
<div class="col-2 hidden-sm hidden-md"></div>
</div>
<!-- colored line before content-->
<div class="row">
<div class="col-1">
<div class="content-line"></div>
</div>
<div class="col-1"></div>
<div class="col-10">
<div class="content-line hidden-sm"></div>
</div>
</div>
<div class="row">
<div class="col-2 content-left">
<h2 class="page-title">
publications
</h2>
</div>
<!--content-->
<div class="col-8 content-center">
<div class="row">
<div class="col-10 publications">
<h4>Development of a novel chemoinformatic tool for natural product databases
</h4>
<p>
Paulo Ricardo Viviurka do Carmo, Ricardo Marcacini, Marilia Valli, João Victor Silva-Silva, Leonardo Luiz Gomes Ferreira, Alan Cesar Pilon, Vanderlan da Silva Bolzani, Adriano D Andricopulo, Edgard Marx<br/>
</p>
<p class="abstract">
Aim: This study aimed to develop a chemoinformatic tool for extracting natural product information from academic literature. Materials &amp; methods: Machine learning graph embeddings were used to extract knowledge from a knowledge graph, connecting properties, molecular data and BERTopic topics. Results: Metapath2Vec performed best in extracting compound names and showed improvement over evaluation stages. Embedding Propagation on Heterogeneous Networks achieved the best performance in extracting bioactivity information. Metapath2Vec excelled in extracting species information, while DeepWalk and Node2Vec performed well in one stage for species location extraction. Embedding Propagation on Heterogeneous Networks consistently improved performance and achieved the best overall scores. Unsupervised embeddings effectively extracted knowledge, with different methods excelling in different scenarios. Conclusion: This research establishes a foundation for frameworks in knowledge extraction, benefiting sustainable resource use.
</p>
<pre class="bibtex">
@article{fdd2023chemoinformatic,
author={Paulo Ricardo Viviurka do Carmo and Ricardo Marcacini and Marilia Valli and João Victor Silva-Silva and Leonardo Luiz Gomes Ferreira and Alan Cesar Pilon and Vanderlan da Silva Bolzani and Adriano D Andricopulo and Edgard Marx},
journal={Future Drug Discovery},
volume={5},
number={2},
title={Development of a novel chemoinformatic tool for natural product databases},
year={2023},
}
</pre>
<h4>Preface of the First International Biochemical Knowledge Extraction Challenge (BiKE)
</h4>
<p>
Edgard Marx, Marilia Valli, Joao da Silva e Silva, Sanju Tiwari, Paulo do Carmo<br/>
</p>
<p class="abstract">
The knowledge from over 50 years of studies on biodiversity available in scientific articles can
become more easily accessible when organized and shared through knowledge graphs. It can assist
in the development of different fields of science and bio-friendly products with high added
value as well as guide public policies to bring benefits both to science and to strengthen the
bio-economy. However, to date, most of the structured biochemical information available on
the Web is manually curated, and it is practically impossible to keep pace with the research
being constantly published in scientific articles.
The First International Biochemical Knowledge Extraction Challenge (BiKE) aims at accelerating and promoting the research on automatic biochemical knowledge extraction mechanisms
by the Semantic Web scientific community to increase the information available on natural
products and contribute to the development of environmentally friendly products while increasing community awareness of the value of biodiversity. The following papers were accepted
for publication and presented at the workshop:
<br/>• BiKE Challenge: Result of ChemiScope by using ChatGPT
<br/>• Improving Natural Product Automatic Extraction with Named Entity Recognition
<br/>• Enhancing Biochemical Extraction with BFS-driven Knowledge Graph Embedding approach
</p>
<pre class="bibtex">
@inproceedings{bike2023preface,
author={Edgard Marx and Marilia Valli and Joao da Silva e Silva and Sanju Tiwari and Paulo do Carmo},
booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)},
title={Preface of the First International Biochemical Knowledge Extraction Challenge (BiKE)},
year={2023},
}
</pre>
<h4>Improving Natural Product Automatic Extraction With Named Entity Recognition</h4>
<p>
Stefan Schmidt-Dichte, István Mócsy<br/>
</p>
<p class="abstract">
Knowledge graphs (KGs) play a vital role in providing structured data for various applications, but their creation is time-consuming and prone to errors. To address these challenges, automatic knowledge extraction methods using machine learning (ML) have gained attention. ML algorithms have shown promise in capturing subtle nuances in language data, offering comprehensive and robust solutions. In the field of biochemistry, knowledge extraction is crucial for advancing scientific research, product development, and policy-making. The First International Biochemical Knowledge Extraction Challenge focuses on extracting biochemical knowledge from scientific articles. This paper presents an updated approach that incorporates named entity recognition (NER) using scispaCy models to improve the accuracy and relevance of extracted entities. The evaluation of the approach utilizes the NatUKE benchmark and demonstrates improved performance in extracting bioactivity and isolation type. However, challenges remain in identifying compound names and species. Future research may explore hybrid approaches combining different techniques to address these specific challenges.
</p>
<pre class="bibtex">
@inproceedings{bike2023ner,
author={Schmidt-Dichte, Stefan and M{\'o}csy, Istv{\'a}n J},
booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)},
title={Improving Natural Product Automatic Extraction With Named Entity Recognition},
year={2023},
}
</pre>
<h4>Leveraging ChatGPT API for Enhanced Data Preprocessing in NatUKE</h4>
<p>
Pit Fröhlich, Jonas Gwozdz, Matthias Jooß<br/>
</p>
<p class="abstract">
This scientific paper presents an approach for enhancing the performance of machine learning models by
utilizing ChatGPT, a state-of-the-art language model developed by OpenAI, for data preprocessing. The
study focuses on the existing Project NatUKE (A Benchmark for Natural Product Knowledge Extraction
from Academic Literature) and investigates the impact of incorporating ChatGPT in the preprocessing
pipeline. By leveraging the natural language processing capabilities of ChatGPT, we aim to improve the
quality and relevance of the data used as input for the knowledge graph embedding algorithms. This
paper provides a detailed description of the methodology employed, the experimental setup, and the
results obtained, highlighting the benefits and limitations of this approach.
</p>
<pre class="bibtex">
@inproceedings{bike2023chatgpt,
author={Fr{\"o}hlich, Pit and Gwozdz, Jonas and Joo{\ss}, Matthias},
booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)},
title={Leveraging ChatGPT API for Enhanced Data Preprocessing in NatUKE},
year={2023},
}
</pre>
<h4>Assessing Bias on Entity Retrieval Models through Conjunctive Fallacies</h4>
<p>
Edgard Marx<br/>
International Conference on Semantic Computing, 2023
</p>
<p class="abstract">
Information retrieval methods, machine learning models, and humans can suffer from a failure in judging information representativeness. We refer to this problem as information bias. In this work, we propose a method to evaluate information bias through conjunctive fallacies. An experimental evaluation of different state-of-the-art entity retrieval models and human-curated benchmarks shows that both methods perform poorly on judging query-entity representativeness while statistically based methods perform considerably better than humans.
</p>
<pre class="bibtex">
@inproceedings{icsc2023informationBias,
author={Marx, Edgard},
booktitle={2023 IEEE 17th International Conference on Semantic Computing (ICSC)},
title={Assessing Bias on Entity Retrieval Models through Conjunctive Fallacies},
year={2023},
pages={260--261},
doi={10.1109/ICSC56153.2023.00050}
}
</pre>
<h4>NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature</h4>
<p>
Paulo Viviurka do Carmo, Edgard Marx, Ricardo Marcacini, Marilia Valli, João Victor Silva e Silva, Alan Pilon <br/>
International Conference on Semantic Computing, 2023
</p>
<p class="abstract">
This work introduces a benchmark for natural product knowledge extraction from academic literature and evaluates different state-of-the-art unsupervised embedding generation methods for this task. We show that it can automatically extract chemical compound characteristics from academic literature with an unsupervised pipeline based on graph embedding methods. We evaluated four methods (DeepWalk, Node2Vec, Metapath2Vec, and EPHEN) in a similarity-based graph completion evaluation scenario. EPHEN achieves reasonable hits@k performance at bioactivity and isolation type extraction with 0.64 when k = 5 and 0.75 when k = 1, respectively. Meanwhile, Metapath2Vec was the best performer, but with underwhelming results, when extracting compound name and species with 0.20 and 0.44 when k = 50, respectively. These results show that using text data and previously extracted knowledge from the knowledge graph provides the most stable performance. They also show us that some characteristics from these papers are more challenging to extract than others, and using the knowledge graph topology as context data helps in these scenarios. </p>
<pre class="bibtex">
@inproceedings{icsc2023natuke,
author={Do Carmo, Paulo Viviurka and Marx, Edgard and Marcacini, Ricardo and Valli, Marilia and Silva e Silva, João Victor and Pilon, Alan},
booktitle={2023 IEEE 17th International Conference on Semantic Computing (ICSC)},
title={NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature},
year={2023},
pages={199--203},
doi={10.1109/ICSC56153.2023.00039}
}
</pre>
</div>
<div class="col-2">
<h3>2023</h3>
</div>
</div>
</div>
</div>
<div class="col-2 content-right"></div>
</div>
<!--content-->
<div class="row content-partners">
<div class="col-12 partners-color">
</div>
<div class="col-12 partners-logo">
<div class="col-4 partners-logo-left">
<div class="logo-htwk">
<img src="images/HTWK_400.png" alt="HTWK Leipzig logo"/>
<div>
<p>
Hochschule für Technik, Wirtschaft und Kultur Leipzig (HTWK)<br/>
Leipzig University of Applied Sciences<br/>
Faculty of Computer Science and Media<br/>
Gustav-Freytag-Str. 42A<br/>04277 Leipzig | Germany
</p>
</div>
</div>
</div>
<div class="col-4 partners-logo-center">
<div class="logo-unesp">
<img src="images/Unesp_400.png" alt="Unesp logo"/>
<p>
Unesp<br/>
Universidade Estadual Paulista<br/>
Rua Quirino de Andrade, 215<br/>Centro - São Paulo | SP, Brazil
</p>
</div>
</div>
<div class="col-4 partners-logo-right">
<div class="logo-usp">
<img src="images/USP_400.png" alt="USP logo"/>
<div class="logo-partners-address">
<p>
USP<br/>
R. da Reitoria, 374<br/>Cidade Universitária<br/>Butantã, São Paulo | SP, Brazil
</p>
</div>
</div>
</div>
</div>
</div>
</div>
</content>
<footer>
<div class="page">
<div class="row footer-funding">
<div class="col-12">
DINOBBIO is a project funded by DFG and FAPESP (2021–2024)
</div>
</div>
</div>
</footer>
<script src="assets/js/dinobbio.js"></script>
</body>
</html>