Valerio: I debugged the text extraction and modified the GROBID parameters to include
section names (headers) as sentences.
I also checked all the other issues you reported; they are mostly
GROBID extraction errors on the Cell Press STAR Protocols paper format,
which also produced many errors with the old heuristics-based
extraction method. See my replies below:
Daniela: General comments: for all papers below I spot-checked sentences from
each section (e.g. abstract, introduction, results, figure legends,
discussion).
00065855
Checked random sentences from each section, all good. The only sentences
that were not extracted were section headers, e.g. 3.1. Method
development for quantification of GSH-NEM and GSSG via LC-MS/MS
FIXED - added section names to extracted sentences
00065849
It worked great!
00065841
Sentences in SUMMARY not extracted
NOT FIXED - GROBID extraction error
Section headers not extracted, e.g.: scRNA-seq of aging C. elegans
FIXED - added section names to extracted sentences
Or merged with the following sentence: Cell-type-specific
regulation and TF activity Differential gene expression across cell
types is driven by factors that regulate mRNA production and stability.
NOT FIXED - GROBID extraction error
Some sentences are not extracted, e.g.: Gene expression drift is a common
correlate of aging.
NOT FIXED - GROBID extraction error
00065836
Some headers were not extracted, e.g.: 'Plasmid and cell line
generation'; 'Fluorescence imaging'
FIXED - added section names to extracted sentences
An example of a header that was extracted: Recovery of protein
synthesis after DNA damage depends on transcription-couple
NOTE - This is also a sentence in a figure legend and was extracted
for that reason, but now all section headers are added
Many sentences on page 10 (paragraph 'Prospects and limitations of the
RPS assay') were not extracted.
NOT FIXED - The first sentence in the section is missing the second
half, but all the other sentences look ok to me.
00065832
The 'Before you begin' section (culturing worms) was not extracted,
from 'Here, we present protocols to determine binding
between..' to '..and FLAG-tagged nuclear hormone receptors.'
Extraction worked from the following page onward, within the same section.
NOT FIXED - GROBID extraction error
Other bits were not extracted, e.g.:
CRITICAL: It is imperative that all samples be exposed to each
temperature for the same amount of time for consistent results.
Transfer each sample into ice-chilled 1.5 mL microcentrifuge tubes
and spin at 20,000 × g for 20 min at 4 °C.
Load samples into four NuPage 12% Bis-Tris gels, 17-well, 1.0 mm
in 1× MES SDS running buffer. Run at 100 V constant for about 2 h.
Note: The amount of protein to be loaded for gel electrophoresis will
depend on the efficacy of the primary antibody. For FLAG-tagged
proteins, we have had success with loading 10–15 µL of the
protein/LDS/β-ME mixture.
NOT FIXED - GROBID extraction error
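For reference, here is a minimal sketch of what "section names included as sentences" can look like when reading GROBID output. It assumes the TEI was requested with sentence segmentation (the `segmentSentences=1` option of GROBID's `/api/processFulltextDocument`), so sentences appear as `<s>` elements and section titles as `<head>` children of `<div>`. This is not the actual change made in our pipeline; `extract_sentences` and the sample TEI are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID output (assumption: standard TEI P5 namespace).
TEI_NS = "http://www.tei-c.org/ns/1.0"


def tag(name: str) -> str:
    """Return the namespaced ElementTree tag for a TEI element name."""
    return f"{{{TEI_NS}}}{name}"


def text_of(elem: ET.Element) -> str:
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def extract_sentences(tei_xml: str) -> list[str]:
    """Hypothetical helper: return section titles and sentences in document order.

    Section titles (<head> directly under <div>) are emitted as standalone
    "sentences"; <head> elements under <figure> are skipped.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{tag('body')}")
    if body is None:
        return []
    out: list[str] = []

    def walk(elem: ET.Element, parent_tag) -> None:
        if elem.tag == tag("head") and parent_tag == tag("div"):
            title = text_of(elem)
            if title:
                out.append(title)
            return
        if elem.tag == tag("s"):
            sentence = text_of(elem)
            if sentence:
                out.append(sentence)
            return
        for child in elem:
            walk(child, elem.tag)

    walk(body, None)
    return out


# Hypothetical sample of segmented GROBID TEI output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div><head n="3.1">Method development</head>
<p><s>First sentence.</s><s>Second   sentence.</s></p></div>
</body></text></TEI>"""

if __name__ == "__main__":
    for line in extract_sentences(SAMPLE):
        print(line)
```

Walking the tree recursively (instead of calling `iter()` on each `<div>`) keeps document order and avoids counting sentences twice when a `<div>` appears inside another element, such as a figure description.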
Tested 5 more papers, results below. Let me know if we need more.
General question:
When we click on ‘Sentence level classification’, does it also process the supplementary material, or only the main PDF?
If only the main PDF, we could add a separate button for processing the supplementary files.
Abstract not extracted, but the abstract is laid out unusually, almost as a separate box.
Other than that, it worked great!
00065853
First sentence of the introduction not captured, probably because its first character (T) is a large initial that spans two lines: THE brain is one of the most studied and complex systems of the biological systems, because neurological disorders are closely related to changes in brain structure
Other sentences in the paragraph look fine.
First sentence of figure caption #1 not captured, but it cannot be copied and pasted manually either.
The caption of Table III is also problematic.