ninjin edited this page May 16, 2011 · 8 revisions

MeCab

This page describes how to use MeCab and how to get it running.

About

MeCab is a word segmentation tool for Japanese and can be found at:

http://mecab.sourceforge.net/

Like most academic software, MeCab has a few rough edges, but with a little knowledge of software porting we will have it up and running in a jiffy. We will even make sure that it runs from its own directory and does not require root access (MeCab has a strong desire to live in /usr/local, but we will dodge that).

Instructions

These instructions assume that we are installing the 0.98 version of MeCab and the 2.7.0 version of the IPA dictionary.

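Get the following files: the MeCab source (mecab-0.98.tar.gz), the IPA dictionary (mecab-ipadic-2.7.0-20070801.tar.gz) and the Python bindings (mecab-python-0.98.tar.gz).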

Create a directory and extract the source code.

mkdir mecab
cd mecab
mv ${PATH_TO_MECAB_DOWNLOADS}/mecab-*.tar.gz ./
find . -name '*.tar.gz' | xargs -n 1 tar xfz
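If you would rather not use `find | xargs`, a plain glob loop does the same job and copes with unusual file names. A minimal sketch, assuming the tarballs sit in the current directory; the function name `extract_sources` is made up for illustration.

```shell
# Hypothetical helper: unpack every mecab-*.tar.gz in the current directory.
extract_sources() {
    for archive in mecab-*.tar.gz; do
        # With no matches the pattern stays literal, so skip it.
        [ -e "$archive" ] || continue
        tar xzf "$archive" || return 1
    done
}

# Usage, run from inside the mecab directory:
# extract_sources
```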

We will install MeCab in this directory, so we need a local directory to place it in.

mkdir local

Now, we configure, compile and install MeCab.

( cd mecab-0.98 && ./configure --prefix=`pwd`/../local --enable-utf8-only && make install clean )
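The parentheses run the step in a subshell, so the `cd` does not leak into your shell, and `` `pwd`/../local `` is evaluated inside mecab-0.98, which points back at the local directory created above. The throwaway sketch below demonstrates both points in a temporary directory; it only mimics the directory layout and does not touch MeCab itself.

```shell
# Demonstration only: mimic the mecab/ layout in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/mecab-0.98" "$demo/local"
start=$(pwd)

# $( ... ) runs in a subshell, just like ( ... ) in the configure step,
# so this cd cannot change the directory of the calling shell.
prefix=$( cd "$demo/mecab-0.98" && echo "$(pwd)/../local" )

# Normalise the prefix to show that it names the sibling local directory.
resolved=$( cd "$prefix" && pwd )

echo "prefix:   $prefix"
echo "resolves: $resolved"
[ "$(pwd)" = "$start" ] && echo "working directory unchanged"
```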

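Then do the same for the dictionary. Its configure script uses mecab-config to locate our MeCab installation, so we add local/bin to PATH.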

( cd mecab-ipadic-2.7.0-20070801 && env PATH="${PATH}:`pwd`/../local/bin" \
    ./configure --prefix=`pwd`/../local --with-charset=utf8 && make install clean )
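Note that the command above appends local/bin to the end of PATH. If another mecab-config is already installed earlier on PATH (for example in /usr/local/bin), that copy is found first and the dictionary is configured against the wrong MeCab; prepending local/bin avoids this. The sketch below illustrates the lookup order with two stub scripts in a temporary directory; the stubs are stand-ins that only print their name, not the real mecab-config.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/system/bin" "$demo/local/bin"

# Two stand-in mecab-config scripts that just report which copy ran.
for d in system local; do
    printf '#!/bin/sh\necho %s\n' "$d" > "$demo/$d/bin/mecab-config"
    chmod +x "$demo/$d/bin/mecab-config"
done

base="$demo/system/bin:/usr/bin:/bin"

# Appending local/bin (as in the instructions): the system copy wins.
appended=$( env PATH="$base:$demo/local/bin" mecab-config )
# Prepending local/bin: our local copy wins.
prepended=$( env PATH="$demo/local/bin:$base" mecab-config )

echo "appended:  $appended"
echo "prepended: $prepended"
```

In the real command that would be `env PATH="`pwd`/../local/bin:${PATH}"`.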

Do a test run with the MeCab binary.

echo '鴨かも?' | local/bin/mecab
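With its default output format MeCab prints one token per line and ends each sentence with an EOS line, so a scripted check can look for that terminator. A minimal sketch, assuming the default output format; `check_mecab` is a hypothetical helper name.

```shell
# Hypothetical helper: feed a sentence to a MeCab binary and check for the EOS terminator.
check_mecab() {
    local bin=$1 text=$2 out
    # Fail if the binary itself exits with an error.
    out=$( printf '%s\n' "$text" | "$bin" ) || return 1
    # With the default output format the final line is "EOS".
    [ "$(printf '%s\n' "$out" | tail -n 1)" = "EOS" ]
}

# Usage, from the mecab directory:
# check_mecab local/bin/mecab '鴨かも?' && echo "MeCab looks OK"
```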

Now we only have to build the Python SWIG bindings.

( cd mecab-python-0.98 && env PATH="${PATH}:`pwd`/../local/bin" \
    python setup.py build_ext --inplace --rpath `pwd`/../local/lib )
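Passing `--rpath` records local/lib in the built extension, so Python can load libmecab without setting `LD_LIBRARY_PATH`. On Linux you can confirm this with `readelf -d`, which lists the entry as RPATH or RUNPATH depending on the linker. A sketch, assuming an ELF system and that the in-place build produces a module named `_MeCab*.so`; `has_runpath` is a hypothetical helper.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and succeed if
# an RPATH/RUNPATH entry mentions the given directory.
has_runpath() {
    grep -E 'Library (rpath|runpath): \[' | grep -qF "$1"
}

# Usage, from the mecab directory (Linux/ELF only):
# readelf -d mecab-python-0.98/_MeCab*.so | has_runpath "local/lib" && echo "rpath set"
```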

We want to try out the bindings, but first we patch test.py: it contains non-ASCII characters but no encoding declaration, which Python 2 refuses to accept.

sed -i -e '2i# -*- coding: utf-8 -*-' mecab-python-0.98/test.py
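The `2i` command inserts the line before line 2, i.e. right after the `#!` line, and Python only honours an encoding declaration on the first or second line (PEP 263). The sketch below replays the patch on a throwaway stand-in file and adds a guard so that re-running the step does not insert the declaration twice; the guard is an addition, not part of the original instructions. Like the original command, it relies on GNU sed's `-i` and one-line `i` syntax.

```shell
demo=$(mktemp)
# Stand-in for test.py: a shebang followed by non-ASCII source.
printf '%s\n' '#!/usr/bin/env python' "sentence = '鴨かも?'" > "$demo"

# Insert the coding line only if the first two lines do not declare one yet.
head -n 2 "$demo" | grep -q 'coding[:=]' || sed -i -e '2i# -*- coding: utf-8 -*-' "$demo"
# Running the same command again is now a no-op.
head -n 2 "$demo" | grep -q 'coding[:=]' || sed -i -e '2i# -*- coding: utf-8 -*-' "$demo"

cat "$demo"
```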

Then we are ready to go.

( cd mecab-python-0.98 && python test.py )