mecab-python3 and python-mecab-ko conflict #121

Open
kulogix opened this issue May 1, 2024 · 1 comment


kulogix commented May 1, 2024

Trying to run OpenVoice/demo_part3.ipynb on Apple Silicon.
Even with the workarounds listed below, it attempts to auto-install python-mecab-ko, which then conflicts with mecab-python3.
This appears to happen regardless of whether mecab or mecab-ko is installed.

Two other similar issues were closed without an actual resolution:
#119
#113
Why keep closing them without addressing the underlying problem?

Setup: Apple Silicon, Python 3.10.14 virtual environment.
brew install mecab
Installed MeloTTS after first removing the extra "mecab-python3==1.0.5" requirement and the version pin from the second one.
python -m unidic download

Using:
python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl
mecab_python3-1.0.9-cp310-cp310-macosx_11_0_arm64.whl

%set_env PYTORCH_ENABLE_MPS_FALLBACK=1
Edited openvoice/se_extractor.py, line 22:
model = WhisperModel(model_size, device="cpu", compute_type="float32")
Changed the device selection to MPS:
#device = "cuda:0" if torch.cuda.is_available() else "cpu"
device = "mps"

you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting python-mecab-ko
  Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.4 kB)
Requirement already satisfied: python-mecab-ko-dic in /Users/awannord/Virtualenvs/openvoice/lib/python3.10/site-packages (from python-mecab-ko) (2.1.1.post2)
Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl (348 kB)
Installing collected packages: python-mecab-ko
Successfully installed python-mecab-ko-1.3.5
you have to install python-mecab-ko. "pip install python-mecab-ko"

AttributeError                            Traceback (most recent call last)
Cell In[5], line 28
     25 speaker_key = speaker_key.lower().replace('_', '-')
     27 source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
---> 28 model.tts_to_file(text, speaker_id, src_path, speed=speed)
     29 save_path = f'{output_dir}/output_v2_{speaker_key}.wav'
     31 # Run the tone color converter

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:100, in TTS.tts_to_file(self, text, speaker_id, output_path, sdp_ratio, noise_scale, noise_scale_w, speed, pbar, format, position, quiet)
     98     t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
     99 device = self.device
--> 100 bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
    101 with torch.no_grad():
    102     x_tst = phones.to(device).unsqueeze(0)

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:23, in get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id)
     22 def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id=None):
---> 23     norm_text, phone, tone, word2ph = clean_text(text, language_str)
     24     phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str, symbol_to_id)
     26     if hps.data.add_blank:

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:12, in clean_text(text, language)
     10 language_module = language_module_map[language]
     11 norm_text = language_module.text_normalize(text)
---> 12 phones, tones, word2ph = language_module.g2p(norm_text)
     13 return norm_text, phones, tones, word2ph

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:122, in g2p(norm_text)
    118     continue
    119 # import pdb; pdb.set_trace()
    120 # phonemes = japanese_text_to_phonemes(text)
    121 # text = g2p_kr(text)
--> 122 phonemes = korean_text_to_phonemes(text)
    123 # import pdb; pdb.set_trace()
    124 # # phonemes = [i for i in phonemes if i in symbols]
    125 # for i in phonemes:
    126 #     assert i in symbols, (group, norm_text, tokenized, i)
    127 phone_len = len(phonemes)

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:69, in korean_text_to_phonemes(text, character)
     66     return text
     68 text = normalize(text)
---> 69 text = g2p_kr(text)
     70 text = list(hangul_to_jamo(text))  # '하늘' --> ['ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆯ']
     71 return "".join(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/g2pkk.py:129, in G2p.__call__(self, string, descriptive, verbose, group_vowels, to_syl)
    126 string = convert_eng(string, self.cmu)
    128 # 3. annotate
--> 129 string = annotate(string, self.mecab)
    132 # 4. Spell out arabic numbers
    133 string = convert_num(string)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/utils.py:166, in annotate(string, mecab)
    165 def annotate(string, mecab):
--> 166     tokens = mecab.pos(string)
    167     if string.replace(" ", "") != "".join(token for token, _ in tokens):
    168         return string

AttributeError: 'NoneType' object has no attribute 'pos'
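To check the import state outside the notebook, here is a minimal pre-flight sketch (not part of MeloTTS or g2pkk) that tries each top-level module and reports which file it resolved to and any import-time error. It assumes python-mecab-ko provides the top-level module `mecab` and mecab-python3 provides `MeCab`, as the import paths in these tracebacks suggest. `probe_modules` is a hypothetical helper name.

```python
import importlib
import importlib.util


def probe_modules(names):
    """Try to import each top-level module and report where it resolved.

    Returns {name: (origin, error)}: origin is the file the import system
    picked (or None), error is the exception text (or None on success).
    """
    report = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError) as exc:
            report[name] = (None, f"{type(exc).__name__}: {exc}")
            continue
        if spec is None:
            report[name] = (None, "not found")
            continue
        try:
            importlib.import_module(name)
        except Exception as exc:  # surface any import-time failure, e.g. RuntimeError
            report[name] = (spec.origin, f"{type(exc).__name__}: {exc}")
        else:
            report[name] = (spec.origin, None)
    return report


if __name__ == "__main__":
    # Assumption: "mecab" comes from python-mecab-ko, "MeCab" from mecab-python3.
    for name, (origin, error) in probe_modules(["mecab", "MeCab"]).items():
        print(f"{name:6} origin={origin} error={error}")
```

If both names report the same directory (for example `.../site-packages/MeCab/__init__.py`), the two packages are sharing one location on disk.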

Running it a second time (in a new session) produces a different error. This happens any time python-mecab-ko is installed after mecab-python3. The new error:

ModuleNotFoundError                       Traceback (most recent call last)
File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:12
     11 try:
---> 12     import MeCab
     13 except ImportError as e:

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/__init__.py:1
----> 1 from .mecab import MeCab, MeCabError, mecabrc_path
      2 from .types import Dictionary, Feature, Morpheme, Span

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/mecab.py:9
      8 import _mecab
----> 9 from mecab.types import Dictionary, Morpheme
     10 from mecab.utils import create_lattice, ensure_list, to_csv

ModuleNotFoundError: No module named 'mecab'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:14
     12     import MeCab
     13 except ImportError as e:
---> 14     raise ImportError("Japanese requires mecab-python3 and unidic-lite.") from e
     15 from num2words import num2words
     17 _CONVRULES = [
     18     # Conversion of 2 letters
     19     "アァ/ a a",
   (...)
    318     "・/ ,",
    319 ]

ImportError: Japanese requires mecab-python3 and unidic-lite.
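The chained traceback shows code under `site-packages/MeCab/` (which looks like python-mecab-ko's, judging by `from .mecab import MeCab, MeCabError, mecabrc_path`) running `from mecab.types import ...`. By default, CPython matches module names against the exact directory listing, even on case-insensitive file systems, unless the `PYTHONCASEOK` environment variable is set. So `import mecab` cannot find a directory spelled `MeCab`. A self-contained sketch demonstrating this with a throwaway package (`import_by_other_case` is a hypothetical helper, not from any library):

```python
import importlib
import os
import sys
import tempfile


def import_by_other_case(real_name, requested_name):
    """Create a throwaway package directory called `real_name` and try to
    `import requested_name` from it.

    Returns True if the import succeeds and False on ModuleNotFoundError.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, real_name)
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("VALUE = 1\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # clear finder caches so the new directory is seen
        try:
            importlib.import_module(requested_name)
            return True
        except ModuleNotFoundError:
            return False
        finally:
            # Undo the sys.path and sys.modules changes before the directory is deleted.
            sys.path.remove(root)
            sys.modules.pop(requested_name, None)
            sys.modules.pop(real_name, None)


if __name__ == "__main__":
    print("same case:     ", import_by_other_case("MeCabDemo", "MeCabDemo"))
    print("different case:", import_by_other_case("MeCabDemo2", "mecabdemo2"))
```

With `PYTHONCASEOK` unset, the second call fails on both case-sensitive and case-insensitive file systems. That would likely explain `No module named 'mecab'` once both packages have been written into a single directory that kept the `MeCab` spelling.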

To revert to the first error:

pip uninstall mecab-python3 python-mecab-ko mecab
pip install mecab-python3
## without manually re-installing python-mecab-ko
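To confirm which of these distributions pip's metadata reports as installed between steps, here is a small standard-library sketch (`installed_versions` is a hypothetical helper). On a case-insensitive file system this only reflects the metadata. The files on disk may still have been overwritten by the other package.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The three distributions touched by the pip commands above.
    for name, version in installed_versions(["mecab-python3", "python-mecab-ko", "mecab"]).items():
        print(f"{name}: {version or 'not installed'}")
```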

Tried pip install mecab and got the following:

RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:367
    365 _SYMBOL_TOKENS = set(list("・、。?!"))
    366 _NO_YOMI_TOKENS = set(list("「」『』―()[][]"))
--> 367 _TAGGER = MeCab.Tagger()
    370 def text2kata(text: str) -> str:
    371     parsed = _TAGGER.parse(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab.py:355, in Tagger.__init__(self, *args)
    354 def __init__(self, *args):
--> 355     _MeCab.Tagger_swiginit(self, _MeCab.new_Tagger(*args))

RuntimeError:

Tried installing mecab-ko instead of mecab:

brew uninstall mecab
brew install mecab-ko
pip uninstall mecab-python3 python-mecab-ko mecab
pip install mecab-python3

This produces an error similar to the first one (it attempts to auto-install python-mecab-ko, then fails with 'NoneType' object has no attribute 'pos'):

you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting python-mecab-ko
  Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.4 kB)
Requirement already satisfied: python-mecab-ko-dic in /Users/awannord/Virtualenvs/openvoice/lib/python3.10/site-packages (from python-mecab-ko) (2.1.1.post2)
Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl (348 kB)
Installing collected packages: python-mecab-ko
Successfully installed python-mecab-ko-1.3.5
you have to install python-mecab-ko. "pip install python-mecab-ko"

AttributeError                            Traceback (most recent call last)
Cell In[5], line 28
     25 speaker_key = speaker_key.lower().replace('_', '-')
     27 source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
---> 28 model.tts_to_file(text, speaker_id, src_path, speed=speed)
     29 save_path = f'{output_dir}/output_v2_{speaker_key}.wav'
     31 # Run the tone color converter

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:100, in TTS.tts_to_file(self, text, speaker_id, output_path, sdp_ratio, noise_scale, noise_scale_w, speed, pbar, format, position, quiet)
     98     t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
     99 device = self.device
--> 100 bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
    101 with torch.no_grad():
    102     x_tst = phones.to(device).unsqueeze(0)

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:23, in get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id)
     22 def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id=None):
---> 23     norm_text, phone, tone, word2ph = clean_text(text, language_str)
     24     phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str, symbol_to_id)
     26     if hps.data.add_blank:

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:12, in clean_text(text, language)
     10 language_module = language_module_map[language]
     11 norm_text = language_module.text_normalize(text)
---> 12 phones, tones, word2ph = language_module.g2p(norm_text)
     13 return norm_text, phones, tones, word2ph

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:122, in g2p(norm_text)
    118     continue
    119 # import pdb; pdb.set_trace()
    120 # phonemes = japanese_text_to_phonemes(text)
    121 # text = g2p_kr(text)
--> 122 phonemes = korean_text_to_phonemes(text)
    123 # import pdb; pdb.set_trace()
    124 # # phonemes = [i for i in phonemes if i in symbols]
    125 # for i in phonemes:
    126 #     assert i in symbols, (group, norm_text, tokenized, i)
    127 phone_len = len(phonemes)

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:69, in korean_text_to_phonemes(text, character)
     66     return text
     68 text = normalize(text)
---> 69 text = g2p_kr(text)
     70 text = list(hangul_to_jamo(text))  # '하늘' --> ['ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆯ']
     71 return "".join(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/g2pkk.py:129, in G2p.__call__(self, string, descriptive, verbose, group_vowels, to_syl)
    126 string = convert_eng(string, self.cmu)
    128 # 3. annotate
--> 129 string = annotate(string, self.mecab)
    132 # 4. Spell out arabic numbers
    133 string = convert_num(string)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/utils.py:166, in annotate(string, mecab)
    165 def annotate(string, mecab):
--> 166     tokens = mecab.pos(string)
    167     if string.replace(" ", "") != "".join(token for token, _ in tokens):
    168         return string

AttributeError: 'NoneType' object has no attribute 'pos'

As before, running a new session (with no install changes other than the previously auto-installed python-mecab-ko) produces the following, different error:

ModuleNotFoundError                       Traceback (most recent call last)
File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:12
     11 try:
---> 12     import MeCab
     13 except ImportError as e:

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/__init__.py:1
----> 1 from .mecab import MeCab, MeCabError, mecabrc_path
      2 from .types import Dictionary, Feature, Morpheme, Span

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/mecab.py:9
      8 import _mecab
----> 9 from mecab.types import Dictionary, Morpheme
     10 from mecab.utils import create_lattice, ensure_list, to_csv

ModuleNotFoundError: No module named 'mecab'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:14
     12     import MeCab
     13 except ImportError as e:
---> 14     raise ImportError("Japanese requires mecab-python3 and unidic-lite.") from e
     15 from num2words import num2words
     17 _CONVRULES = [
     18     # Conversion of 2 letters
     19     "アァ/ a a",
   (...)
    318     "・/ ,",
    319 ]

ImportError: Japanese requires mecab-python3 and unidic-lite.

kulogix commented May 1, 2024

Update: On macOS (and Windows), the file system is case-insensitive by default.
python-mecab-ko installs its package as mecab, and mecab-python3 installs its package as MeCab. Unless you are on a case-sensitive file system, both names resolve to the same directory, so one install overwrites or merges with the other.
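To confirm this overlap in a given environment, here is a sketch that reads each distribution's installed file list (its RECORD, via `importlib.metadata.files`) and reports top-level names that more than one distribution claims once letter case is ignored. The distribution names are the ones used in this issue. `top_level_names` and `case_collisions` are hypothetical helpers.

```python
from collections import defaultdict
from importlib import metadata


def top_level_names(dist_name):
    """Return the top-level entries a distribution installed, read from its RECORD.

    Returns an empty set if the distribution is not installed.
    """
    try:
        files = metadata.files(dist_name) or []
    except metadata.PackageNotFoundError:
        return set()
    names = set()
    for path in files:
        first = path.parts[0]
        # Skip the distribution's own metadata, shared bytecode caches,
        # and files placed outside site-packages (e.g. "../../bin/...").
        if first.endswith(".dist-info") or first in ("..", "__pycache__"):
            continue
        names.add(first)
    return names


def case_collisions(names_by_dist):
    """Return {casefolded name: sorted [(dist, name), ...]} for every top-level
    name that more than one distribution installs once letter case is ignored."""
    groups = defaultdict(set)
    for dist, names in names_by_dist.items():
        for name in names:
            groups[name.casefold()].add((dist, name))
    return {
        key: sorted(entries)
        for key, entries in groups.items()
        if len({dist for dist, _ in entries}) > 1
    }


if __name__ == "__main__":
    dists = ["mecab-python3", "python-mecab-ko"]
    print(case_collisions({d: top_level_names(d) for d in dists}))
```

An entry such as `'mecab': [('mecab-python3', 'MeCab'), ('python-mecab-ko', 'mecab')]` would mean the two packages land in the same directory on a case-insensitive volume.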

On a Mac, create a new case-sensitive volume and set up your virtual environment there:
Disk Utility: + Volume, Name: Playground, Format: APFS (Case-sensitive)
Create a symbolic link in your home folder: ln -s /Volumes/Playground ~/playground
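Before creating the virtual environment, it may be worth confirming that the target location really is case-sensitive. Here is a minimal standard-library sketch: it creates a temporary file in the directory and checks whether the same name with its letter case swapped also resolves. `is_case_sensitive` is a hypothetical helper. Pointing it at sysconfig's `purelib` path checks the active environment's site-packages.

```python
import os
import sysconfig
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` lives on a case-sensitive file system.

    Creates a temporary file there, checks whether the same name with its
    letter case swapped also exists, and deletes the file afterwards.
    """
    with tempfile.NamedTemporaryFile(dir=directory, prefix="CaseProbe_") as tmp:
        head, tail = os.path.split(tmp.name)
        swapped = os.path.join(head, tail.swapcase())
        return not os.path.exists(swapped)


if __name__ == "__main__":
    target = sysconfig.get_paths()["purelib"]  # the active environment's site-packages
    try:
        print(target, "case-sensitive:", is_case_sensitive(target))
    except OSError as exc:  # e.g. no write permission in a system-wide site-packages
        print(f"could not probe {target}: {exc}")
```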

Once the Python project is set up on a case-sensitive file system, everything works (European languages, Japanese, Korean, and Chinese). I also had mecab-ko installed:
brew install mecab-ko

It would be nice if these tips (using a case-sensitive volume, how to create one on a Mac, and whether to install mecab or mecab-ko) were added to the README/docs for those who don't want to rely on Docker.
