
Include a link to the dataset description, potentially rendered from EML if available #954

Open
jhpoelen opened this issue Dec 21, 2023 · 12 comments

@jhpoelen

as suggested by Debora D.

@jhpoelen

e.g.,

image

@jhpoelen

Suggest converting EML (XML) to HTML or Markdown somehow.
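A minimal sketch of such a conversion, using only the Python standard library. The function names (`text_of`, `eml_to_markdown`), the choice of elements, and the Markdown layout are illustrative assumptions, not part of Elton or any existing tool; the sample is an abbreviated excerpt of the drucker.3.11 record quoted later in this thread.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of an EML record, for illustration only.
SAMPLE = """<eml:eml packageId="drucker.3.11" system="knb"
    xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Abundância e Distribuição de Ervas Terrestres em Parcelas Ripárias na Reserva Ducke: Variação Lateral
    </title>
    <creator>
      <individualName><givenName>Debora</givenName><surName>Drucker</surName></individualName>
      <organizationName>Instituto Nacional de Pesquisas da Amazônia – INPA</organizationName>
    </creator>
    <keywordSet><keyword>Ervas</keyword><keyword>Reserva Ducke</keyword></keywordSet>
  </dataset>
</eml:eml>"""


def text_of(element, path):
    """Return whitespace-normalized text found at `path`, or '' if absent."""
    node = element.find(path)
    if node is None:
        return ""
    return " ".join("".join(node.itertext()).split())


def eml_to_markdown(eml_xml):
    """Render a few EML dataset elements as a Markdown summary.

    Assumes the document has an un-namespaced <dataset> child, as in the
    EML 2.1.1 example in this thread.
    """
    dataset = ET.fromstring(eml_xml).find("dataset")
    lines = ["# " + text_of(dataset, "title"), ""]
    for creator in dataset.findall("creator"):
        given = text_of(creator, "individualName/givenName")
        sur = text_of(creator, "individualName/surName")
        name = " ".join(part for part in (given, sur) if part)
        org = text_of(creator, "organizationName")
        lines.append("- Creator: " + ", ".join(p for p in (name, org) if p))
    keywords = [" ".join((k.text or "").split()) for k in dataset.iter("keyword")]
    if keywords:
        lines.append("- Keywords: " + "; ".join(keywords))
    abstract = text_of(dataset, "abstract")
    if abstract:
        lines += ["", "## Abstract", "", abstract]
    return "\n".join(lines)


if __name__ == "__main__":
    print(eml_to_markdown(SAMPLE))
```

Only the title, creators, keywords, and abstract are handled here; coverage, methods, and data tables would need their own mapping.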

@jhpoelen

hi @mbjones -

I tried to reuse the XSL you created to render EML as HTML (e.g., https://github.com/NCEAS/metacat/blob/a8414e69e6402847ca44b1e682a75a9759af8a08/lib/style/skins/metacatui/eml-2/eml-dataset.xsl), and I realize there's more to the nice DataONE dataset pages than meets the eye (interactive features, many CSS libraries, etc.). Did you ever consider building a tool that renders standalone HTML pages from EML files?

@jhpoelen

jhpoelen commented Dec 28, 2023

Applying the DataONE XSL templates to an EML from

Debora Drucker. (2010). Abundância e Distribuição de Ervas Terrestres em Parcelas Ripárias na Reserva Ducke: Variação Lateral. Programa de Pesquisa em Biodiversidade (PPBio). drucker.3.11. https://search.dataone.org/view/drucker.3.11

yielded the screenshot below.

image

Note that the styling and interactive features seen on the DataONE website are not automatically included. However, the metadata elements do appear in the HTML, just not as polished as on the DataONE website. Applying the styling etc. would take some time, unless @mbjones can share some tricks that I am unaware of.

On the DataONE website, the EML looks like this:

image

jhpoelen pushed a commit to globalbioticinteractions/elton that referenced this issue Dec 28, 2023
jhpoelen pushed a commit to globalbioticinteractions/elton that referenced this issue Dec 28, 2023
@jhpoelen

Alternatively, we can opt to include a formatted version of the eml.xml files.

It would look like this:

https://github.com/globalbioticinteractions/elton/blob/8eed48af58714fa64587f03f0ff3ef2228f90ff7/src/test/resources/org/globalbioticinteractions/elton/cmd/eml-drucker.xml

or

<?xml version="1.0" encoding="UTF-8"?>
<eml:eml packageId="drucker.3.11" system="knb" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" ">
    <access authSystem="knb" order="allowFirst">
        <allow>
            <principal>public</principal>
            <permission>read</permission>
        </allow>
        <allow>
            <principal>cn=datamanagers,o=BR-LTER,dc=ecoinformatics,dc=org</principal>
            <permission>all</permission>
        </allow>
    </access>
    <dataset>
        <title>Abundância e Distribuição de Ervas Terrestres em Parcelas Ripárias na Reserva Ducke: Variação Lateral
        </title>
        <creator id="1312386756228">
            <individualName>
                <givenName>Debora</givenName>
                <surName>Drucker</surName>
            </individualName>
            <organizationName>Instituto Nacional de Pesquisas da Amazônia – INPA</organizationName>
            <address>
                <deliveryPoint>Coordenação de Pesquisas em Ecologia – CPEC</deliveryPoint>
                <deliveryPoint>Avenida Efigênio Sales, 2239, Aleixo, CEP 69011-970</deliveryPoint>
                <city>Manaus</city>
                <administrativeArea>Amazonas</administrativeArea>
                <postalCode>478</postalCode>
                <country>Brasil</country>
            </address>
            <phone phonetype="voice">+ 55 92 3643 1834</phone>
            <electronicMailAddress>deboradrucker@gmail.com</electronicMailAddress>
            <onlineUrl>http://lattes.cnpq.br/6782891501006399</onlineUrl>
        </creator>
        <associatedParty id="1312387003910">
            <individualName>
                <givenName>Debora</givenName>
                <surName>Drucker</surName>
            </individualName>
            <organizationName>Universidade Estadual de Campinas - UNICAMP</organizationName>
            <address>
                <deliveryPoint>Nucleo de Estudos e Pesquisas Ambientais - NEPAM</deliveryPoint>
                <deliveryPoint>Rua dos Flamboyants, 155, CEP 13083-867</deliveryPoint>
                <city>Campinas</city>
                <administrativeArea>São Paulo</administrativeArea>
                <country>Brasil</country>
            </address>
            <phone phonetype="voice">+55 19 3211 6200</phone>
            <phone phonetype="fax">+55 19 3211 6222</phone>
            <electronicMailAddress>deboradrucker@gmail.com</electronicMailAddress>
            <onlineUrl>http://lattes.cnpq.br/6782891501006399</onlineUrl>
            <role>Custodian/Steward</role>
        </associatedParty>
        <abstract>
            <para>Os dados aqui disponibilizados são produto do trabalho realizado por Debora Drucker durante seu curso
                de mestrado. O objetivo central foi investigar a abundância e distribuição espacial de ervas terrestres
                (apenas as espécies que germinam e passam todo o seu ciclo de vida no solo, sensu Poulsen (1996)) em 20
                Parcelas ripárias paralelas aos igarapés na Reserva Florestal Adolpho Ducke.
                Referência:
                Poulsen, A. D. 1996. Species richness and density of ground herbs within a plot of lowland rainforest in
                north-west Borneo. Journal of Tropical Ecology 12: 177-190.
            </para>
        </abstract>
        <keywordSet>
            <keyword>Ervas</keyword>
            <keyword>Parcelas Ripárias</keyword>
            <keyword>Reserva Ducke</keyword>
            <keyword>Floresta de Terra Firme</keyword>
            <keyword>PELD-PPBio</keyword>
        </keywordSet>
        <intellectualRights>
            <para>Todos os dados do PPBio serão publicados no máximo 1 ano após sua coleta, desde que creditados os
                responsáveis pela mesma e disponibilização dos dados em qualquer publicação que os utilizem.
                Recomendamos aos interessados em utilizar os dados entrar em contato com os responsáveis e assim
                discutir o interesse de uso e possibilidade de participação no corpo de autores. A política de dados do
                PPBio está disponível no seguinte link: &lt;http://ppbio.inpa.gov.br/Port/docsinternos/politica_dou.pdf&gt;.
                MSc. Debora Pignatari Drucker foi responsável pela coleta destes dados, disponíveis para download a
                partir desta página
            </para>
        </intellectualRights>
        <coverage>
            <geographicCoverage>
                <geographicDescription>Os dados foram coletados na Reserva Florestal Adolpho Ducke, que cobre 10.000 ha
                    de floresta tropical úmida na periferia de Manaus, AM, Brasil
                </geographicDescription>
                <boundingCoordinates>
                    <westBoundingCoordinate>-59.59</westBoundingCoordinate>
                    <eastBoundingCoordinate>-59.53</eastBoundingCoordinate>
                    <northBoundingCoordinate>-2.55</northBoundingCoordinate>
                    <southBoundingCoordinate>-3.01</southBoundingCoordinate>
                </boundingCoordinates>
            </geographicCoverage>
            <temporalCoverage>
                <rangeOfDates>
                    <beginDate>
                        <calendarDate>2003-11-01</calendarDate>
                    </beginDate>
                    <endDate>
                        <calendarDate>2004-08-31</calendarDate>
                    </endDate>
                </rangeOfDates>
            </temporalCoverage>
            <taxonomicCoverage>
                <taxonomicClassification>
                    <taxonRankName>Kingdom</taxonRankName>
                    <taxonRankValue>Plantae</taxonRankValue>
                </taxonomicClassification>
            </taxonomicCoverage>
        </coverage>
        <contact>
            <references>1312386756228</references>
        </contact>
        <methods>
            <methodStep>
                <description>
                    <section>
                        <title>Métodos de Coleta dos Dados</title>
                        <para>As coletas foram realizadas em 20 parcelas ripárias de 100 x 2 m (0,02 ha), totalizando
                            4000 m2 (0,4 ha) amostrados. As parcelas foram instaladas em diferentes faixas de distância
                            lateral dos igarapés ao longo dos baixios. Cada parcela foi instalada com o auxílio de um
                            clinômetro, para que fosse uniformemente mantido em uma mesma curva de nível. As faixas de
                            distância do igarapé foram estabelecidas em termos de porcentagem da largura total do
                            baixio: 0, 25, 50, 75 e 100%. Em cada parcela, todos os indivíduos acima de 5 cm de altura
                            foram contados e identificados. A identificação das espécies foi facilitada pelo uso de um
                            guia florístico produzido para a RFAD baseado quase que essencialmente em caracteres
                            vegetativos (Ribeiro et al., 1999). Amostras botânicas férteis foram coletadas e depositadas
                            no Herbário do INPA. Material botânico vegetativo foi coletado como testemunha quando não
                            foi possível coletar em estado fértil.
                        </para>
                    </section>
                </description>
            </methodStep>
            <methodStep>
                <description>
                    <section>
                        <title>Referência</title>
                        <para>Ribeiro, J. E. L. da S., M. J. G. Hopkins, A. Vicentini, C. A. Sothers, M. A. S. Costa, J.
                            M. Brito, M. A. D. Souza, L. H. P. Martins, L. G. Lohmann, P. A. C. L. Assunção, E. C.
                            Pereira, C. F. Silva, M. R. Mesquita, e L. C. Procópio. 1999. Flora da Reserva Ducke: Guia
                            de identificação das plantas vasculares de uma floresta de terra-firme na Amazônia Central.
                            INPA/DFID, Manaus, Br
                        </para>
                    </section>
                </description>
            </methodStep>
        </methods>
        <project>
            <title>PELD Sítio 1</title>
            <personnel id="1312387111986">
                <individualName>
                    <givenName>Flávio J.</givenName>
                    <surName>Luizão</surName>
                </individualName>
                <organizationName>Instituto Nacional de Pesquisas da Amazônia - INPA</organizationName>
                <address>
                    <deliveryPoint>Coordenação de Pesquisas em Ecologia – CPEC</deliveryPoint>
                    <deliveryPoint>Avenida Efigênio Sales, 2239, Aleixo, CEP 69011-970</deliveryPoint>
                    <city>Manaus</city>
                    <administrativeArea>Amazonas</administrativeArea>
                    <postalCode>478</postalCode>
                    <country>Brasil</country>
                </address>
                <phone phonetype="voice">+55 92 3643 1911</phone>
                <electronicMailAddress>fluizao@inpa.gov.br</electronicMailAddress>
                <onlineUrl>http://lattes.cnpq.br/5212730639831062</onlineUrl>
                <role>Principal Investigator - CNPq/ PELD 520039/98-0</role>
            </personnel>
            <personnel id="1312387135977">
                <individualName>
                    <givenName>William E.</givenName>
                    <surName>Magnusson</surName>
                </individualName>
                <organizationName>Instituto Nacional de Pesquisas da Amazônia – INPA</organizationName>
                <address>
                    <deliveryPoint>Coordenação de Pesquisas em Ecologia – CPEC</deliveryPoint>
                    <deliveryPoint>Avenida Efigênio Sales, 2239, Aleixo, CEP 69011-970</deliveryPoint>
                    <city>Manaus</city>
                    <administrativeArea>Amazonas</administrativeArea>
                    <postalCode>478</postalCode>
                    <country>Brasil</country>
                </address>
                <phone phonetype="voice">+55 92 3643 1834</phone>
                <electronicMailAddress>bill@inpa.gov.br</electronicMailAddress>
                <onlineUrl>http://lattes.cnpq.br/1973878827354750</onlineUrl>
                <role>Principal Investigator - CNPq/ 472799/03-7</role>
            </personnel>
            <funding>
                <para>1. CNPq/ PELD 520039/98-0/ 2. CNPq/ 472799/03-7/ 3. CNPq - Bolsa de mestrado concedida a Debora
                    Drucker
                </para>
            </funding>
        </project>
        <dataTable id="1312391667091">
            <entityName>ervasnomelat.txt</entityName>
            <physical id="1312391525534">
                <objectName>ervasnomelat.txt</objectName>
                <size unit="byte">2973</size>
                <dataFormat>
                    <textFormat>
                        <numHeaderLines>1</numHeaderLines>
                        <recordDelimiter>#x0A</recordDelimiter>
                        <attributeOrientation>column</attributeOrientation>
                        <simpleDelimited>
                            <fieldDelimiter>#x09</fieldDelimiter>
                        </simpleDelimited>
                    </textFormat>
                </dataFormat>
                <distribution>
                    <online>
                        <url>ecogrid://knb/menger.23.1</url>
                    </online>
                </distribution>
            </physical>
            <attributeList>
                <attribute id="1312391667092">
                    <attributeName>codigo</attributeName>
                    <attributeDefinition>Código identificador da espécie utilizado nos arquivos ervasablat.pdf/
                        ervasablat.csv
                    </attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Código identificador da espécie utilizado nos arquivos ervasablat.pdf/
                                        ervasablat.csv
                                    </definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1312391667093">
                    <attributeName>familia</attributeName>
                    <attributeDefinition>Nome da Família</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Nome da Família</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1312391667094">
                    <attributeName>nome</attributeName>
                    <attributeDefinition>Nome da Espécie</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Nome da Espécie</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1312391667095">
                    <attributeName>autor</attributeName>
                    <attributeDefinition>Nome do Autor do Nome da Espécie</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Nome do Autor do Nome da Espécie</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
            </attributeList>
            <numberOfRecords>57</numberOfRecords>
        </dataTable>
        <dataTable id="1340138776296">
            <entityName>idrucker.3_ervaslateral.txt</entityName>
            <physical id="1340138690331">
                <objectName>idrucker.3_ervaslateral - ok.txt</objectName>
                <size unit="byte">9396</size>
                <dataFormat>
                    <textFormat>
                        <numHeaderLines>1</numHeaderLines>
                        <recordDelimiter>#x0A</recordDelimiter>
                        <attributeOrientation>column</attributeOrientation>
                        <simpleDelimited>
                            <fieldDelimiter>#x09</fieldDelimiter>
                        </simpleDelimited>
                    </textFormat>
                </dataFormat>
                <distribution>
                    <online>
                        <url>ecogrid://knb/fecosta.201.1</url>
                    </online>
                </distribution>
            </physical>
            <attributeList>
                <attribute id="1340138776297">
                    <attributeName>sitio</attributeName>
                    <attributeDefinition>Sítio de coleta.</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Sítio de coleta.</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1340138776298">
                    <attributeName>trilha</attributeName>
                    <attributeDefinition>Trilha de acesso (LO = leste - oeste).</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Trilha de acesso (LO = leste - oeste).</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1340138776299">
                    <attributeName>parcela</attributeName>
                    <attributeDefinition>Parcela de coleta.</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Parcela de coleta.</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1340138776300">
                    <attributeName>especie</attributeName>
                    <attributeDefinition>Código da espécie amostrada.</attributeDefinition>
                    <measurementScale>
                        <nominal>
                            <nonNumericDomain>
                                <textDomain>
                                    <definition>Código da espécie amostrada.</definition>
                                </textDomain>
                            </nonNumericDomain>
                        </nominal>
                    </measurementScale>
                </attribute>
                <attribute id="1340138776301">
                    <attributeName>abundancia</attributeName>
                    <attributeDefinition>Número de individuos de cada espécie amostrada.</attributeDefinition>
                    <measurementScale>
                        <ratio>
                            <unit>
                                <standardUnit>dimensionless</standardUnit>
                            </unit>
                            <numericDomain>
                                <numberType>natural</numberType>
                            </numericDomain>
                        </ratio>
                    </measurementScale>
                </attribute>
            </attributeList>
            <numberOfRecords>353</numberOfRecords>
        </dataTable>
    </dataset>
    <additionalMetadata>
        <describes>1340138690331</describes>
        <metadata>
            <consecutiveDelimiters>true</consecutiveDelimiters>
        </metadata>
    </additionalMetadata>
</eml:eml>
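A formatted version like the one above can be produced with the Python standard library. This is a sketch only: the function name `format_eml` is hypothetical, `ET.indent` requires Python 3.9 or later, and ElementTree drops the XML declaration and any comments when re-serializing.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def format_eml(eml_xml, indent="    "):
    """Return `eml_xml` re-indented with one `indent` per nesting level."""
    # Register the prefix so the root is written as eml:eml instead of ns0:eml.
    ET.register_namespace("eml", EML_NS)
    root = ET.fromstring(eml_xml)
    # Only whitespace-only gaps between elements are rewritten;
    # text inside elements such as <para> is left untouched.
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    compact = "<dataset><title>T</title><creator><surName>Drucker</surName></creator></dataset>"
    print(format_eml(compact))
```

Because text content is not reflowed, long `<para>` lines stay as they are in the input.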

@jhpoelen

jhpoelen commented Dec 29, 2023

Another approach would be to transform EML to JATS (https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite). This would open up the world of publishing (JATS is the de facto standard in article publishing/archiving) and the tools and expertise that come along with it.
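As a rough illustration of what an EML-to-JATS mapping might involve, the sketch below copies the title, creators, abstract, and keywords into JATS front matter. The element correspondences (for example `creator` to `contrib`, `para` to `p`) and the function name `eml_to_jats` are assumptions made here for illustration; the output is a fragment and is not expected to validate against a JATS schema.

```python
import xml.etree.ElementTree as ET


def clean(text):
    """Collapse runs of whitespace; treat a missing value as ''."""
    return " ".join((text or "").split())


def eml_to_jats(eml_xml):
    """Map a few EML dataset elements onto a JATS-like <article> fragment."""
    dataset = ET.fromstring(eml_xml).find("dataset")
    article = ET.Element("article")
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = clean(dataset.findtext("title"))

    contribs = ET.SubElement(meta, "contrib-group")
    for creator in dataset.findall("creator"):
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = clean(
            creator.findtext("individualName/surName"))
        ET.SubElement(name, "given-names").text = clean(
            creator.findtext("individualName/givenName"))

    abstract = ET.SubElement(meta, "abstract")
    for para in dataset.findall("abstract/para"):
        ET.SubElement(abstract, "p").text = clean("".join(para.itertext()))

    keywords = ET.SubElement(meta, "kwd-group")
    for keyword in dataset.findall("keywordSet/keyword"):
        ET.SubElement(keywords, "kwd").text = clean(keyword.text)

    return ET.tostring(article, encoding="unicode")
```

Elements with no obvious JATS counterpart, such as `dataTable` and `attributeList`, are left out of this sketch.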

@jhpoelen

Until we have more clarity on how to integrate metadata into our review report, I've added the following text to the "Introduction" section of the review report:

For additional metadata on this dataset, please visit https://github.com/globalbioticinteractions/template-dataset and inspect associated metadata files including, but not limited to, README.md, eml.xml, and/or globi.json.

Also see the attached screenshot from a generated PDF review report.

image

@jhpoelen

jhpoelen commented Jan 12, 2024

@deboradrucker shared a sketch of what it may look like to integrate EML data into the GloBI review report. Featured EML elements:

dataset title, creator info (e.g., name, institution), keywords, license, geographic description, date range, taxonomic information (e.g., classification system, taxonomic range), sampling description, project title/funding, reference publication.

For an example integration of EML metadata into an existing data review report (via https://docs.google.com/document/d/1gMzdobH8oG9zdOrueqTmdC5dljv92YLX/edit), see below.

A Review of Biotic Interactions and Taxon Names Found in globalbioticinteractions/cCarvalheiro, L.G., 2023. Plant-flower visitor network from Avon Gorge, UK
by Nomer and Elton, two naive review bots
review@globalbioticinteractions.org
https://globalbioticinteractions.org/contribute
https://github.com/globalbioticinteractions/carvalheiro2023/issues
2024-01-08

Dataset Title: Plant-flower visitor network from Avon Gorge, UK
Creator name: Luisa Gigante Carvalheiro
Organization name: Universidade Federal de Goiás
Creator email: lgcarvalheiro@gmail.com
Creator city: Goiania
Creator country: Brazil
Keywords: plant-pollinator interactions, flower visitation
Licence Name: Creative Commons Attribution 4.0 International
Licence URL: https://creativecommons.org/licenses/by/4.0/
Abstract: "This dataset gathers information on interactions between plants and their flower visitors collected throughout 2004 (11 surveys covering local flowering season) the Avon Gorge (England), an iconic field site well known for its rare plant populations. The study area (1480 m²) included a broad range of flowering plants, and overall the dataset shows information for 260 species (81 plant species, 179 insect species and morphospecies)."
Intellectual Rights: Creative Commons Attribution 4.0 International
Geographic Description: Avon Gorge, Bristol, England
Begin Date: 2004-05-10
End Date: 2004-09-27
Classification System: all taxa were identified by specialist taxonomists
General Taxonomic Coverage: All flower visitors detected in the study area (Hymenoptera, Diptera, Coleoptera, Heteroptera, Lepidoptera, Thysanoptera)
Sampling Description: "A total of 11 survey visits were carried out from 10 May to 27 September 2004, this covering the main period of insect activity. Flower and insect surveys took place approximately every 14 days under dry conditions. In each flower abundance survey, a stratified random design was used to select 1 m² quadrats in the study area. The area was divided into nine sub-areas based on habitat type and accessibility. Each sub-area was divided into 1 m² quadrats and 2·5% (37) of these were randomly selected per sampling occasion. In each quadrat, the number of floral units of each plant species was recorded, defined as the distance that a small bee (c.1 cm length) would fly, rather than walk (Saville 1993). For example, in the Asteraceae, a flower unit is the entire inflorescence while in the Rosaceae, a flower unit is a single flower. Thus, the floral unit is defined from the bee’s perspective rather than by flower anatomy. Rare flowers which were missed using this method were included in the food web data as rare species with an abundance of two flower units (which was the lowest number of units observed in the plot for any species).
In the insect surveys, an observation point was chosen for each flowering plant species by randomly selecting one of the quadrats where the species was present. All the flowering units that could be surveyed by a single observer (approximately a semi-circle with 1-m radius) were observed for 20 min. On consecutive sampling occasions, plant species were rotated through three time slots, the morning (09.00–12.00 h), early afternoon (12.00–15.00 h) and late afternoon (15.00–18.00 h), to allow each species to be observed equally over time. At least two floral units were observed per plant species per sample. All flower–visitor interactions were recorded, and all visitors observed were collected for identification. To estimate the overall abundance of each plant species, the average number of flower units per 1 m² quadrat was multiplied by the total area of the study site. To estimate the interaction frequency for each visitor–plant species pair, we divided the total number of visits recorded by the number of flower units observed (per 20 min) and then multiplied by the total number of floral units in the study plot. By collecting the insects, we did not allow for repeated visits by the same individual; hence, some visitation frequencies may be underestimated. However, collecting specimens is essential for identification of most visitor species. Hymenoptera, Diptera, and Coleoptera were identified by taxonomists either to species or to morphospecies. Lepidoptera were identified to species by the authors and Heteroptera and parasitoids were morphotyped by the authors."
Citation: Carvalheiro, LG; Barbosa, E.R.M. & Memmott, J. 2008. Pollinator networks, alien species and the conservation of rare plants: Trinia glauca as a case study. Journal of Applied Ecology, 45,1419-1427. DOI: https://doi.org/10.1111/j.1365-2664.2008.01518.x
Project Title: Impact and management of invasive plant species: a food web approach
Project Funding: Fundação para a Ciência e Tecnologia (FCT, Portugal)
Reference Publication: Carvalheiro, LG; Barbosa, E.R.M. & Memmott, J. 2008. Pollinator networks, alien species and the conservation of rare plants: Trinia glauca as a case study. Journal of Applied Ecology, 45,1419-1427. DOI: https://doi.org/10.1111/j.1365-2664.2008.01518.x

Abstract
Life on earth is sustained by complex interactions between organisms and their environment. These biotic interactions can be captured in datasets and published digitally. We describe a review process of such an openly accessible digital interactions dataset of known origin, and discuss its outcome. The dataset under review (aka globalbioticinteractions/carvalheiro2023) has size 344KiB and contains 542 interactions with 1 (e.g., flowersVisitedBy) unique types of associations between 63 primary taxa (e.g., Scabiosa columbaria) and 171 associated taxa (e.g., Bombus pascuorum). The report includes detailed summaries of interactions data as well as a taxonomic review from multiple perspectives.

Screenshot from 2024-01-12 11-19-28

Screenshot from 2024-01-12 11-19-41
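The field/value listing in the example report above could, in principle, be filled in from an eml.xml file. A minimal sketch follows; the labels are taken from the example, while the element paths are assumptions based on the EML 2.1.1 record quoted earlier in this thread, and `review_rows` is a hypothetical helper, not an existing Elton function.

```python
import xml.etree.ElementTree as ET

# (report label, path relative to <dataset>); paths are assumptions.
FIELDS = [
    ("Dataset Title", "title"),
    ("Creator name", "creator/individualName"),
    ("Organization name", "creator/organizationName"),
    ("Creator email", "creator/electronicMailAddress"),
    ("Creator city", "creator/address/city"),
    ("Creator country", "creator/address/country"),
    ("Keywords", "keywordSet/keyword"),
    ("Intellectual Rights", "intellectualRights"),
    ("Geographic Description", "coverage/geographicCoverage/geographicDescription"),
    ("Begin Date", "coverage/temporalCoverage/rangeOfDates/beginDate/calendarDate"),
    ("End Date", "coverage/temporalCoverage/rangeOfDates/endDate/calendarDate"),
    ("Project Title", "project/title"),
    ("Project Funding", "project/funding"),
]


def flatten(node):
    """Join all text below `node`, separated and normalized to single spaces."""
    return " ".join(" ".join(node.itertext()).split())


def review_rows(eml_xml):
    """Return (label, value) pairs for the fields present in the EML document."""
    dataset = ET.fromstring(eml_xml).find("dataset")
    rows = []
    for label, path in FIELDS:
        values = [flatten(node) for node in dataset.findall(path)]
        values = [value for value in values if value]
        if values:
            # Repeated elements (e.g., several keywords) share one row.
            rows.append((label, ", ".join(values)))
    return rows
```

Fields that are absent from a given eml.xml are skipped rather than shown as empty rows.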
