Grobid with DL models natively on MacOS ARM #1108

Open
Schroedi opened this issue Apr 29, 2024 · 2 comments
Labels
macOS-specific Issue visible only on macOS environments

Comments

@Schroedi

This is my attempt to run GROBID natively on macOS ARM. The docs state that macOS is not fully supported, so feel free to mark this issue as out of scope.

If anybody has got it working, I would be interested in the package versions used.

Here I document what I tried and how far I got:

System

MacOS 14.4.1 (ARM M3)
java --version

openjdk 17.0.10 2024-01-16 LTS
OpenJDK Runtime Environment Zulu17.48+15-CA (build 17.0.10+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.48+15-CA (build 17.0.10+7-LTS, mixed mode, sharing)

Steps

#clone grobid
#cd grobid

# shared venv
uv venv -p 3.9
source .venv/bin/activate
uv pip install jep==4.2.0
cp .venv/lib/python3.9/site-packages/jep/jep.cpython-39-darwin.so grobid-home/lib/mac_arm-64/libjep.dylib

# prepare delft
# I think 0.3.3 is used in the container if I remember correctly
git clone --branch v0.3.3 https://github.com/kermitt2/delft
cd delft
# change requirements until delft works - very scientific
wget -O delftMacArm.patch http://sprunge.us/iFQCZx
git apply delftMacArm.patch
uv pip install -r requirements.txt
python setup.py build install
# test delft
# python delft/applications/grobidTagger.py date tag --architecture BidLSTM_CRF
# enjoy json output :)
cd ..

# build grobid
./gradlew clean install

# the patch edits grobid-home/config/grobid.yaml
# 1. change delft: install: "../delft" to: delft: install: "delft"
# 2. use delft models
wget -O grobidConf.patch http://sprunge.us/o8IbpR
git apply grobidConf.patch

# I had to include the path to the libpython from the venv here
java -Xmx4G -Djava.library.path=grobid-home/lib/mac_arm-64:/opt/homebrew/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib -jar grobid-core/build/libs/grobid-core-0.8.1-SNAPSHOT-onejar.jar -gH grobid-home -dIn /Users/ascadian/Projects/paperSegmentation/train_data/raw -dOut /Users/ascadian/Projects/paperSegmentation/train_data/anno_raw_test  -exe createTraining
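
The libpython directory in the command above was added by hand. A minimal sketch of how that directory could be looked up from the active interpreter instead, assuming the standard `sysconfig` variable `LIBDIR` is populated for this Python build (it can be empty on some builds). The helper names `libpython_dir` and `build_library_path` are hypothetical and not part of GROBID:

```python
import os
import sysconfig


def libpython_dir():
    """Return the directory that holds libpython for the running interpreter.

    Assumption: the build exposes LIBDIR; the value may be None otherwise.
    Inside a venv this still points at the base interpreter's lib directory.
    """
    return sysconfig.get_config_var("LIBDIR")


def build_library_path(*dirs):
    """Join the non-empty directories with the platform path separator."""
    return os.pathsep.join(d for d in dirs if d)


if __name__ == "__main__":
    # Prints a value that could be passed to -Djava.library.path=...
    print(build_library_path("grobid-home/lib/mac_arm-64", libpython_dir()))
```

Run it with the same interpreter that has jep installed, so the printed directory matches the Python that JEP was built against.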

Output/Error

22:38:05.157 [main] INFO  org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.161 [main] INFO  org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - Grobid config file location was not explicitly set via 'org.grobid.config' system variable, defaulting to: /Users/ascadian/Projects/grobid3/grobid-home/config/grobid.yaml
22:38:05.280 [main] INFO  org.grobid.core.main.LibraryLoader - Loading external native sequence labelling library
22:38:05.286 [main] INFO  org.grobid.core.main.LibraryLoader - Loading Wapiti native library...
22:38:05.489 [main] INFO  org.grobid.core.main.LibraryLoader - Loading JEP native library for DeLFT... /Users/ascadian/Projects/grobid3/grobid-home/lib/mac_arm-64
22:38:05.640 [main] INFO  org.grobid.core.main.LibraryLoader - Native library for sequence labelling loaded
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating dictionary
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - End of Initialization of dictionary
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating names
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - End of initialization of names
22:38:05.885 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating country codes
22:38:05.888 [main] INFO  org.grobid.core.lexicon.Lexicon - End of initialization of country codes
.DS_Store
2004.03577.pdf
NeurIPS-2023-modelling-cellular-perturbations-with-the-sparse-additive-mechanism-shift-variational-autoencoder-Paper-Conference.pdf
s41586-024-07303-5.pdf
fpsyg-07-00789.pdf
4 files to be processed.
/Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti
22:38:09.792 [main] INFO  org.grobid.core.jni.DeLFTModel - Loading DeLFT model for reference-segmenter with architecture BidLSTM_ChainCRF_FEATURES...
22:38:09.794 [pool-1-thread-1] INFO  org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.846 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.878 [pool-1-thread-1] INFO  org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.879 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.884 [main] ERROR org.grobid.core.jni.DeLFTModel - DeLFT model reference_segmenter labelling failed
java.util.concurrent.ExecutionException: java.lang.RuntimeException: JEP initialisation failed
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.grobid.core.jni.JEPThreadPool.call(JEPThreadPool.java:176)
	at org.grobid.core.jni.DeLFTModel.label(DeLFTModel.java:194)
	at org.grobid.core.engines.tagging.DeLFTTagger.label(DeLFTTagger.java:29)
	at org.grobid.core.engines.AbstractParser.label(AbstractParser.java:47)
	at org.grobid.core.engines.ReferenceSegmenterParser.createTrainingData(ReferenceSegmenterParser.java:334)
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1153)
	at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
	at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
	at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
	at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.RuntimeException: JEP initialisation failed
	at org.grobid.core.jni.JEPThreadPool.createJEPInstance(JEPThreadPool.java:135)
	at org.grobid.core.jni.JEPThreadPool.getJEPInstance(JEPThreadPool.java:151)
	at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:119)
	at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:84)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
22:38:09.886 [main] ERROR org.grobid.core.engines.Engine - An error occured while processing the following pdf: /Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
org.grobid.core.exceptions.GrobidException: [GENERAL] An exception occurred while running Grobid training data generation for full text.
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1562)
	at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
	at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
	at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
	at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.commons.lang3.tuple.Pair.getLeft()" because "result" is null
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1154)
	... 9 common frames omitted

Used patches (in case the pastebin is unavailable)

grobidConf.patch
delftMacArm.patch

@lfoppiano
Collaborator

lfoppiano commented Apr 29, 2024

For TensorFlow on Apple ARM, you should install tensorflow-deps using conda (see https://github.com/lfoppiano/material-parsers?tab=readme-ov-file#set-up-on-apple-m1; you can stop before the spaCy model download, same scientific approach 😄).

I usually use Conda and install most packages with pip, unless they are particularly annoying (e.g. they try to compile, fail, etc.).

The JEP library should not need to be copied under grobid-home, because the version in the Python environment should be used directly. To do that, before running GROBID you should export the equivalent of CONDA_PREFIX, pointing to the directory of your venv.
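
To see what the Python environment actually provides, a small sketch like the one below could list the JEP native libraries under the active environment. It is only an illustration under stated assumptions: the prefix is read from `CONDA_PREFIX` or `VIRTUAL_ENV`, jep is assumed to live under `lib/python*/site-packages/jep`, and the file name patterns (`libjep.*`, `jep*.so`) are guesses based on the `jep.cpython-39-darwin.so` file mentioned earlier in this thread. How GROBID itself consumes the prefix is not shown here. The function names are hypothetical:

```python
import os
from pathlib import Path


def active_env_prefix(environ=os.environ):
    """Return the conda prefix if set, else the virtualenv prefix, else None."""
    return environ.get("CONDA_PREFIX") or environ.get("VIRTUAL_ENV")


def find_jep_libs(prefix):
    """List candidate JEP native libraries below an environment prefix.

    Assumption: jep is installed in lib/python*/site-packages/jep and its
    native library is named libjep.* or jep*.so.
    """
    root = Path(prefix)
    patterns = (
        "lib/python*/site-packages/jep/libjep.*",
        "lib/python*/site-packages/jep/jep*.so",
    )
    found = []
    for pattern in patterns:
        found.extend(sorted(root.glob(pattern)))
    return found


if __name__ == "__main__":
    prefix = active_env_prefix()
    if prefix is None:
        print("Neither CONDA_PREFIX nor VIRTUAL_ENV is set")
    else:
        for lib in find_jep_libs(prefix):
            print(lib)
```

An empty result would suggest that jep is not installed in that environment, or that it is installed under a different layout than assumed.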

@lfoppiano lfoppiano added the macOS-specific Issue visible only on macOS environments label Apr 29, 2024
@Schroedi
Author

Thanks for taking the time. I tried using conda but was not successful; the situation is the same as before.
I was not completely sure what you meant about CONDA_PREFIX. It points to the venv's root directory. Adding that to java.library.path does not make any difference for me, and conda already exports it.

I will continue to use my remote Linux machine for now, so feel free to close this issue.

Just for the record, here is what I did:

  • Install conda (via miniforge)
brew install miniforge
conda init zsh
zsh
  • venv
conda create -n grobidEnv python=3.9
conda activate grobidEnv
  • requirements, saved as requirements.macos.txt (transformers 4.15.0 gave a compile error for me)
tensorflow-metal==0.6.0
tensorflow-macos==2.10.0
numpy==1.22.3
transformers==4.29.1
jep==4.2.0
# this is important as v70 is incompatible (https://github.com/pypa/setuptools/issues/4376#issuecomment-2126162839)
setuptools==69.5.1
conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt
conda install scikit-learn=1.0.1
  • delft
#clone into delft
# remove all already installed packages from delft/setup.py (diff below)
pip install -e delft
  • setup.py diff
diff --git a/setup.py b/setup.py
index 456da4c..8145d05 100644
--- a/setup.py
+++ b/setup.py
@@ -15,20 +15,15 @@ setup(
     install_requires=[
         'numpy==1.22.3',
         'regex==2021.11.10',
-        'scikit-learn==1.0.1',
         'tqdm==4.62.3',
-        'tensorflow==2.9.3',
-        'h5py==3.6.0',
         'unidecode==1.3.2',
         'pydot==1.4.0',
         'lmdb==1.2.1',
-        'transformers==4.25.1',
         'torch==1.10.1',
         'truecase',
         'requests>=2.20',
     ],
     classifiers=[
         "Programming Language :: Python :: 3.8",
  • test grobid
git apply grobidConf.patch
./gradlew clean install
java -Xmx4G -Djava.library.path=grobid-home/lib/mac_arm-64:/opt/homebrew/Caskroom/miniforge/base/envs/grobidEnv2/lib/ -jar grobid-core/build/libs/grobid-core-0.8.1-SNAPSHOT-onejar.jar -gH grobid-home -dIn /Users/ascadian/Projects/paperSegmentation/train_data/rawSingle -dOut /Users/ascadian/Projects/paperSegmentation/train_data/anno_raw_test_foo  -exe createTraining

Error:

[...]
14:34:45.795 [main] INFO  org.grobid.core.main.LibraryLoader - Loading JEP native library for DeLFT... /Users/ascadian/Projects/grobid4/grobid-home/lib/mac_arm-64
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
	at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.UnsatisfiedLinkError: no jep in java.library.path: grobid-home/lib/mac_arm-64:/opt/homebrew/Caskroom/miniforge/base/envs/grobidEnv2/lib/
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2434)
	at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:818)
	at java.base/java.lang.System.loadLibrary(System.java:1993)
	at org.grobid.core.main.LibraryLoader.load(LibraryLoader.java:158)
	at org.grobid.core.factory.AbstractEngineFactory.init(AbstractEngineFactory.java:72)
	at org.grobid.core.factory.GrobidFactory.<init>(GrobidFactory.java:19)
	at org.grobid.core.factory.GrobidFactory.newInstance(GrobidFactory.java:73)
	at org.grobid.core.factory.GrobidFactory.getInstance(GrobidFactory.java:30)
	at org.grobid.core.engines.ProcessEngine.getEngine(ProcessEngine.java:46)
	at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
	... 6 more

Adding the jep directory back to java.library.path, or copying the library as before, results in the same situation as in the original comment (JEP initialisation failed).
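
The `UnsatisfiedLinkError` above comes from `System.loadLibrary`, which searches each directory on `java.library.path` for the platform-specific file name of the library. On macOS, `System.mapLibraryName("jep")` yields `libjep.dylib`, so a quick check of which directories contain that file can narrow things down. A minimal sketch, with the hypothetical helper name `dirs_containing`, and assuming `:` as the separator as in the commands in this thread:

```python
from pathlib import Path


def dirs_containing(library_path, file_name="libjep.dylib", sep=":"):
    """Return the entries of a java.library.path string that contain file_name."""
    return [
        entry
        for entry in library_path.split(sep)
        if entry and (Path(entry) / file_name).is_file()
    ]


if __name__ == "__main__":
    # Path taken from the error message above; adjust to your own setup.
    library_path = (
        "grobid-home/lib/mac_arm-64:"
        "/opt/homebrew/Caskroom/miniforge/base/envs/grobidEnv2/lib/"
    )
    print(dirs_containing(library_path))
```

An empty list is consistent with the "no jep in java.library.path" message. A non-empty list only shows that the file exists; it does not confirm that the library matches the Python installation it was built against.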
