<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="icon" type="image/x-icon" href="favicon.ico">

    <title>Sustainable use of Brazilian Biodiversity – Using Linked Data for Natural Product Discovery</title>

		<!-- Styles -->
    <link rel="stylesheet" href="assets/css/simple-grid.css">
		<link rel="stylesheet" href="assets/css/dinobbio.css">
    <style>.en {display:block}</style>
  </head>

  <body class="fixed-top-nav ">

    <!-- Header -->

    <header>
      <div class="page">
      <!-- Nav Bar -->
          <nav id="navbar">

          <div id="menu" class="row">
            <div class="col-12 header-color">
            </div>
            <div class="col-6 logo">
              <a href="index.html" target="_self">
                <img src="images/Logo-Dinobbio.png"/>
                <div class="logo-name">
                  DINOBBIO
                </div>
              </a>
              <div class="logo-text hidden-sm hidden-md">
                Discovering New Natural Inspired<br/>Products from Brazilian Biodiversity
              </div>

            </div>
            <div id="language" class="col-6">
            </div>
          </div>


          <div class="row main-menu-nav">
      <!-- Navbar toggle for mobile -->
            <div class="col-2 hidden-sm hidden-md"></div>
            <input id="navbar-toggler" type="checkbox" style="display:none"/>
            <label id="burger-menu" for="navbar-toggler" class="hidden-full hidden-md"><span></span></label>
            <div class="col-8 col-12-md col-12-sm right" id="main-menu">
                <ul class="navbar-nav main-menu ml-auto flex-nowrap">
      <!-- About (home page) -->

                <li class="nav-item">
                  <a class="nav-link" href="index.html">
                    about
                  </a>
                </li>

                <li class="nav-item">
                  <a class="nav-link" href="projects.html">
                    projects
                  </a>
              </li>

              <li class="nav-item active">
                <a class="nav-link" href="publications.html">
                  publications
                </a>
              </li>

                <li class="nav-item">
                    <a class="nav-link" href="talks.html">
                      talks
                    </a>
                </li>


                <li class="nav-item">
                    <a class="nav-link" href="partners.html">
                      partners
                    </a>
                </li>

                <li class="nav-item">
                  <a class="nav-link" href="blog.html">
                    blog
                  </a>
                </li>

                <li class="nav-item">
                  <a class="nav-link" href="contact.html">
                    contact
                  </a>
                </li>

                <li class="nav-item">
                  <a class="nav-link" href="privacy.html">
                    privacy
                  </a>
                </li>

                <li class="nav-item">
                  <a class="nav-link" href="imprint.html">
                    imprint
                  </a>
                </li>

              </ul>
            </div>
            <div class="col-2 hidden-sm hidden-md"></div>
          </div>
        </nav>
      </div>
    </header>
    <content>
      <div class="page">
        <div class="row">
          <div class="col-2 hidden-sm hidden-md"></div>
          <div class="col-8 col-12-md title">
            <h1>
              <div>Sustainable use of Brazilian Biodiversity</div>
              <div class="title-line hidden-sm"></div>
              <div>Using Linked Data for<br/>Natural Product Discovery</div>
            </h1>
          </div>
          <div class="col-2 hidden-sm hidden-md"></div>
        </div>
        <!-- colored line before content-->
        <div class="row">
          <div class="col-1">
            <div class="content-line"></div>
          </div>
          <div class="col-1"></div>
          <div class="col-10">
            <div class="content-line hidden-sm"></div>
          </div>
        </div>
        <div class="row">
          <div class="col-2 content-left">
            <h2 class="page-title">
              publications
            </h2>
          </div>
          <!--content-->
          <div class="col-8 content-center">
            <div class="row">
                <div class="col-10 publications">

                  <h4>Development of a novel chemoinformatic tool for natural product databases
                  </h4>
                <p>
                  Paulo Ricardo Viviurka do Carmo, Ricardo Marcacini, Marilia Valli, João Victor Silva-Silva, Leonardo Luiz Gomes Ferreira, Alan Cesar Pilon, Vanderlan da Silva Bolzani, Adriano D Andricopulo, Edgard Marx<br/> 
                </p>
                <p class="abstract">
                  Aim: This study aimed to develop a chemoinformatic tool for extracting natural product information from academic literature. Materials & methods: Machine learning graph embeddings were used to extract knowledge from a knowledge graph, connecting properties, molecular data and BERTopic topics. Results: Metapath2Vec performed best in extracting compound names and showed improvement over evaluation stages. Embedding Propagation on Heterogeneous Networks achieved the best performance in extracting bioactivity information. Metapath2Vec excelled in extracting species information, while DeepWalk and Node2Vec performed well in one stage for species location extraction. Embedding Propagation on Heterogeneous Networks consistently improved performance and achieved the best overall scores. Unsupervised embeddings effectively extracted knowledge, with different methods excelling in different scenarios. Conclusion: This research establishes a foundation for frameworks in knowledge extraction, benefiting sustainable resource use.
                                                        </p>
                <pre class="bibtex">

@article{fdd2023chemoinformatic,
  author={Paulo Ricardo Viviurka do Carmo and Ricardo Marcacini and Marilia Valli and João Victor Silva-Silva and Leonardo Luiz Gomes Ferreira and Alan Cesar Pilon and Vanderlan da Silva Bolzani and Adriano D Andricopulo and Edgard Marx},
  journal={Future Drug Discovery},
  volume={5},
  number={2},
  title={Development of a novel chemoinformatic tool for natural product databases},
  year={2023},
}
                </pre>

                  <h4>Preface of the First International Biochemical Knowledge Extraction Challenge (BiKE)
                    </h4>
                  <p>
                    Edgard Marx, Marilia Valli, Joao da Silva e Silva, Sanju Tiwari, Paulo do Carmo<br/> 
                  </p>
                  <p class="abstract">
                    The knowledge accumulated over 50 years of biodiversity studies and available in scientific articles can
                    become more easily accessible when organized and shared through knowledge graphs. It can assist
                    in the development of different fields of science and of bio-friendly products with high added
                    value, as well as guide public policies that benefit science and strengthen the
                    bio-economy. However, to date, most of the structured biochemical information available on
                    the Web is manually curated, and it is practically impossible to keep pace with the research
                    constantly being published in scientific articles.
                    The First International Biochemical Knowledge Extraction Challenge (BiKE) aims to accelerate and promote research by the Semantic Web scientific community on automatic biochemical knowledge extraction mechanisms,
                    in order to increase the information available on natural
                    products and contribute to the development of environmentally friendly products while raising the community's awareness of the value of biodiversity. The following papers were accepted
                    for publication and presented at the workshop:<br/>
                    • BiKE Challenge: Result of ChemiScope by using ChatGPT<br/>
                    • Improving Natural Product Automatic Extraction with Named Entity Recognition<br/>
                    • Enhancing Biochemical Extraction with BFS-driven Knowledge Graph Embedding approach
                                                          </p>
                  <pre class="bibtex">

@inproceedings{bike2023preface,
  author={Edgard Marx and Marilia Valli and Joao da Silva e Silva and Sanju Tiwari and Paulo do Carmo},
  booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)}, 
  title={Preface of the First International Biochemical Knowledge Extraction Challenge (BiKE)},
  year={2023},
}
                  </pre>


                  <h4>Improving Natural Product Automatic Extraction With Named Entity Recognition</h4>
                  <p>
                    Stefan Schmidt-Dichte, István Mócsy<br/> 
                  </p>
                  <p class="abstract">
                    Knowledge graphs (KGs) play a vital role in providing structured data for various applications, but their creation is time-consuming and prone to errors. To address these challenges, automatic knowledge extraction methods using machine learning (ML) have gained attention. ML algorithms have shown promise in capturing subtle nuances in language data, offering comprehensive and robust solutions. In the field of biochemistry, knowledge extraction is crucial for advancing scientific research, product development, and policy-making. The First International Biochemical Knowledge Extraction Challenge focuses on extracting biochemical knowledge from scientific articles. This paper presents an updated approach that incorporates named entity recognition (NER) using scispaCy models to improve the accuracy and relevance of extracted entities. The evaluation of the approach utilizes the NatUKE benchmark and demonstrates improved performance in extracting bioactivity and isolation type. However, challenges remain in identifying compound names and species. Future research may explore hybrid approaches combining different techniques to address these specific challenges.
                                      </p>
                  <pre class="bibtex">

@inproceedings{bike2023ner,
  author={Schmidt-Dichte, Stefan and M{\'o}csy, Istv{\'a}n J},
  booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)}, 
  title={Improving Natural Product Automatic Extraction With Named Entity Recognition},
  year={2023},
}
                  </pre>


                  <h4>Leveraging ChatGPT API for Enhanced Data Preprocessing in NatUKE</h4>
                  <p>
                    Pit Fröhlich, Jonas Gwozdz, Matthias Jooß<br/>
                  </p>
                  <p class="abstract">
                    This scientific paper presents an approach for enhancing the performance of machine learning models by
                    utilizing ChatGPT, a state-of-the-art language model developed by OpenAI, for data preprocessing. The
                    study focuses on the existing Project NatUKE (A Benchmark for Natural Product Knowledge Extraction
                    from Academic Literature) and investigates the impact of incorporating ChatGPT in the preprocessing
                    pipeline. By leveraging the natural language processing capabilities of ChatGPT, we aim to improve the
                    quality and relevance of the data used as input for the knowledge graph embedding algorithms. This
                    paper provides a detailed description of the methodology employed, the experimental setup, and the
                    results obtained, highlighting the benefits and limitations of this approach.
                                      </p>
                  <pre class="bibtex">

@inproceedings{bike2023chatgpt,
  author={Fr{\"o}hlich, Pit and Gwozdz, Jonas and Joo{\ss}, Matthias},
  booktitle={Joint Proceedings of the Second International Workshop on Knowledge Graph Generation From Text and the First International BiKE Challenge co-located with 20th Extended Semantic Conference (ESWC 2023)}, 
  title={Leveraging ChatGPT API for Enhanced Data Preprocessing in NatUKE},
  year={2023},
}
                  </pre>


                  <h4>Assessing Bias on Entity Retrieval Models through Conjunctive Fallacies</h4>
                  <p>
                    Edgard Marx<br/>
                    International Conference on Semantic Computing, 2023 
                  </p>
                  <p class="abstract">
                    Information retrieval methods, machine learning models, and humans can fail to judge information representativeness correctly. We refer to this problem as information bias. In this work, we propose a method to evaluate information bias through conjunctive fallacies. An experimental evaluation of different state-of-the-art entity retrieval models and human-curated benchmarks shows that both perform poorly at judging query-entity representativeness, while statistically based methods perform considerably better than humans. 
                  </p>
                  <pre class="bibtex">
@inproceedings{icsc2023informationBias,
  author={Marx, Edgard},
  booktitle={2023 IEEE 17th International Conference on Semantic Computing (ICSC)}, 
  title={Assessing Bias on Entity Retrieval Models through Conjunctive Fallacies}, 
  year={2023},
  volume={},
  number={},
  pages={260-261},
  doi={10.1109/ICSC56153.2023.00050}
}
                  </pre>

                  <h4>NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature</h4>
                  <p>
                    Paulo Viviurka do Carmo, Edgard Marx, Ricardo Marcacini, Marilia Valli, João Victor Silva e Silva, Alan Pilon <br/>
                    International Conference on Semantic Computing, 2023 
                  </p>
                  <p class="abstract">
                    This work introduces a benchmark for natural product knowledge extraction from academic literature and evaluates different state-of-the-art unsupervised embedding generation methods for this task. We show that chemical compound characteristics can be extracted automatically from academic literature with an unsupervised pipeline based on graph embedding methods. We evaluated four methods (DeepWalk, Node2Vec, Metapath2Vec, and EPHEN) in a similarity-based graph completion evaluation scenario. EPHEN achieves reasonable hits@k performance for bioactivity and isolation type extraction, with 0.64 at k = 5 and 0.75 at k = 1, respectively. Meanwhile, Metapath2Vec was the best performer for compound name and species extraction, but with underwhelming results of 0.20 and 0.44 at k = 50, respectively. These results show that using text data together with knowledge previously extracted from the knowledge graph provides the most stable performance. They also show that some characteristics of these papers are more challenging to extract than others, and that using the knowledge graph topology as context data helps in these scenarios.                  </p>
                  <pre class="bibtex">
@inproceedings{icsc2023natuke,
  author={Do Carmo, Paulo Viviurka and Marx, Edgard and Marcacini, Ricardo and Valli, Marilia and Silva e Silva, João Victor and Pilon, Alan},
  booktitle={2023 IEEE 17th International Conference on Semantic Computing (ICSC)}, 
  title={NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature}, 
  year={2023},
  volume={},
  number={},
  pages={199-203},
  doi={10.1109/ICSC56153.2023.00039}
}
                  </pre>
                </div>
                <div class="col-2">
                  <h3>2023</h3>
                </div>
              </div>
            </div>
          </div>
          <div class="col-2 content-right"></div>
        </div>
        <!--content-->

        <div class="row content-partners">
          <div class="col-12 partners-color">
          </div>
          <div class="col-12 partners-logo">
            <div class="col-4 partners-logo-left">
              <div class="logo-htwk">
                <img src="images/HTWK_400.png"/>
                  <div>
                    <p>
                    Hochschule für Technik, Wirtschaft und Kultur Leipzig (HTWK)<br/>
                    Leipzig University of Applied Sciences<br/>
                    Faculty of Computer Science and Media<br/>
                    Gustav-Freytag-Str. 42A<br/>04277 Leipzig | Germany
                    </p>
                  </div>
              </div>
            </div>
            <div class="col-4 partners-logo-center">
              <div class="logo-unesp">
                <img src="images/Unesp_400.png"/>
                <p>
                  Unesp<br/>
                  Universidade Estadual Paulista<br/>
                  Rua Quirino de Andrade, 215<br/>Centro - São Paulo | SP, Brazil
                </p>
              </div>
            </div>
            <div class="col-4 partners-logo-right">
              <div class="logo-usp">
                <img src="images/USP_400.png"/>
                <div class="logo-partners-address">
                  <p>
                    USP<br/>
                    R. da Reitoria, 374<br/>Cidade Universitária<br/>Butantã, São Paulo | SP, Brazil
                  </p>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </content>

    <footer>
      <div class="page">
        <div class="row footer-funding">
          <div class="col-12">
            DINOBBIO is a project funded by DFG and FAPESP (2021–2024)
          </div>
        </div>
      </div>
    </footer>
    <script src="assets/js/dinobbio.js"></script>
  </body>
</html>