Releases: mesolitica/malaya

Version 5.1

27 Mar 14:33
  1. Purged TensorFlow; it is no longer needed.
  2. Added Malay dictionary module, https://malaya.readthedocs.io/en/stable/dictionary-malay.html
  3. Syllable tokenizer now uses a PyTorch LSTM, https://malaya.readthedocs.io/en/stable/load-tokenizer-syllable.html
  4. Pretrained Transformer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-transformer.html
  5. Masked LM scorer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-mlm.html
  6. Causal LM scorer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-gpt2-lm.html
  7. Stemmer now uses a PyTorch LSTM, https://malaya.readthedocs.io/en/stable/load-stemmer.html
  8. Jawi now uses T5 HuggingFace and supports Rumi-to-Jawi and Jawi-to-Rumi, https://malaya.readthedocs.io/en/stable/load-jawi.html
  9. Kesalahan Tatabahasa now uses T5 HuggingFace, https://malaya.readthedocs.io/en/stable/load-tatabahasa-tagging.html
  10. Emotion Analysis now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-emotion.html
  11. Sentiment Analysis now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-sentiment.html
  12. Added Embedding module, https://malaya.readthedocs.io/en/stable/load-embedding.html
  13. Added Reranker module, https://malaya.readthedocs.io/en/stable/load-reranker.html
  14. Semantic Similarity now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-similarity-semantic.html
  15. Entities Recognition now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-entities.html
  16. Part-of-Speech now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-pos.html
  17. Dependency Parsing now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-dependency.html
  18. Constituency Parsing now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-constituency.html
  19. Translation module now uses from and to parameters, https://malaya.readthedocs.io/en/stable/load-translation.html
  20. Zero-shot classification now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-zeroshot-classification.html
  21. Text-to-KG now uses T5 HuggingFace, https://malaya.readthedocs.io/en/stable/text-to-kg.html
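
Because this release drops TensorFlow and moves the modules above onto PyTorch and HuggingFace, it may help to confirm which versions are installed before upgrading. The following is a minimal sketch, not part of Malaya: the helper name installed_versions is hypothetical, and the distribution names malaya, torch and transformers are assumed to be the PyPI names of the relevant packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names=("malaya", "torch", "transformers")):
    """Map each distribution name to its installed version, or None if absent.

    The default names are an assumption about the PyPI distribution names.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found
```

Calling installed_versions() returns a dictionary such as {"malaya": ..., "torch": ..., "transformers": ...}, with None for anything that is missing.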

Version 5.0

19 Dec 15:07
  1. Started an initial mixed-language knowledge graph toolkit, https://malaya.readthedocs.io/en/latest/knowledge-graph-toolkit.html
  2. Released Abstractive Augmentation, which converts standard-language text to local / social media style while preserving polarity (standard EN -> local MS, standard MS -> local MS), https://malaya.readthedocs.io/en/latest/load-augmentation-abstractive.html
  3. Encoder-based (WordVector, Encoder models) Augmentation is now under malaya.augmentation.encoder, https://malaya.readthedocs.io/en/latest/load-augmentation-encoder.html
  4. Rules-based Augmentation is now under malaya.augmentation.rules, https://malaya.readthedocs.io/en/latest/load-augmentation-rules.html
  5. Released HuggingFace T5 models for the True Case module, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-augmentation-rules.html
  6. Released HuggingFace T5 models for the Segmentation module, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-segmentation-huggingface.html
  7. Released HuggingFace T5 models for Abstractive Normalizer (end-to-end text normalization), which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-segmentation-huggingface.html
  8. Rules-based Normalizer is now under malaya.normalizer.rules, https://malaya.readthedocs.io/en/latest/load-normalizer.html
  9. Released HuggingFace T5 models for Kesalahan Tatabahasa, https://malaya.readthedocs.io/en/latest/load-tatabahasa-tagging-huggingface.html
  10. Prefix Text Generator is now under malaya.generator.prefix, https://malaya.readthedocs.io/en/latest/load-prefix-generator.html
  11. Isi Penting Text Generator is now under malaya.generator.isi_penting, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator.html
  12. Released HuggingFace T5 models for Isi Penting Generator in Article style; the isi penting can also be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-article-style.html
  13. Released HuggingFace T5 models for Isi Penting Generator in News Headline style; the isi penting can also be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-headline-news-style.html
  14. Released HuggingFace T5 models for Isi Penting Generator in Karangan style; the isi penting can also be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-karangan-style.html
  15. Released HuggingFace T5 models for Isi Penting Generator in News style; the isi penting can also be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-news-style.html
  16. Released HuggingFace T5 models for Isi Penting Generator in Product Description style; the isi penting can also be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-product-description-style.html
  17. Released HuggingFace T5 models for the Paraphrase module, https://malaya.readthedocs.io/en/latest/load-paraphrase-huggingface.html
  18. Doc2Vec-based text similarity is now under malaya.similarity.doc2vec, https://malaya.readthedocs.io/en/latest/load-doc2vec-similarity.html
  19. Semantic text similarity is now under malaya.similarity.semantic, https://malaya.readthedocs.io/en/latest/load-semantic-similarity.html
  20. Released HuggingFace T5 models for Semantic Similarity, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-semantic-similarity-huggingface.html
  21. Released HuggingFace T5 models for Dependency Parsing, https://malaya.readthedocs.io/en/latest/load-dependency-huggingface.html
  22. Released HuggingFace T5 models for Abstractive Summarization, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-abstractive-huggingface.html
  23. Released HuggingFace T5 models for MS-EN Translation, https://malaya.readthedocs.io/en/latest/load-translation-ms-en-huggingface.html
  24. Released HuggingFace T5 models for noisy MS-EN Translation, end-to-end translation of mixed-language text to EN, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en-huggingface.html
  25. Released HuggingFace T5 models for EN-MS Translation, https://malaya.readthedocs.io/en/latest/load-translation-en-ms-huggingface.html
  26. Released HuggingFace T5 models for noisy EN-MS Translation, end-to-end translation of mixed-language text to MS, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms-huggingface.html
  27. Extractive QA is now under malaya.qa.extractive, https://malaya.readthedocs.io/en/latest/load-qa-extractive.html
  28. Released HuggingFace T5 models for Extractive QA, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-qa-extractive-huggingface.html
  29. Released HuggingFace T5 models for ZeroShot Classification, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-zeroshot-classification-huggingface.html
  30. Released HuggingFace T5 models for ZeroShot Entity Recognition, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/zeroshot-ner.html
  31. Decomposition-based Topic Modeling is now under malaya.topic_model.decomposition, https://malaya.readthedocs.io/en/latest/load-topic-model-decomposition.html
  32. LDA2Vec-based Topic Modeling is now under malaya.topic_model.lda2vec, https://malaya.readthedocs.io/en/latest/load-topic-model-lda2vec.html
  33. Transformer-based Topic Modeling is now under malaya.topic_model.transformer, https://malaya.readthedocs.io/en/latest/load-topic-model-transformer.html
  34. Added BERTopic interface for Topic Modeling, https://malaya.readthedocs.io/en/latest/load-topic-model-bertopic.html
  35. Released HuggingFace T5 models for Abstractive Keyword, which can also run inference on mixed-language text, https://malaya.readthedocs.io/en/latest/load-abstractive-keyword-huggingface.html
  36. Extractive Keyword is now under malaya.keyword.extractive, https://malaya.readthedocs.io/en/latest/load-keyword-extractive.html
  37. Released HuggingFace interface for Transformer, https://malaya.readthedocs.io/en/latest/load-transformer-huggingface.html
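
Several entries in this release move modules to new import paths. When migrating older code, one way to see which of those paths resolve in the installed version is to try importing each one. This is only a sketch: the paths are copied from the entries above, while the names RELOCATED_PATHS and check_paths are hypothetical helpers and not part of Malaya.

```python
import importlib

# Module paths copied from the relocation entries in these release notes.
RELOCATED_PATHS = [
    "malaya.augmentation.encoder",
    "malaya.augmentation.rules",
    "malaya.normalizer.rules",
    "malaya.generator.prefix",
    "malaya.generator.isi_penting",
    "malaya.similarity.doc2vec",
    "malaya.similarity.semantic",
    "malaya.qa.extractive",
    "malaya.topic_model.decomposition",
    "malaya.topic_model.lda2vec",
    "malaya.topic_model.transformer",
    "malaya.keyword.extractive",
]


def check_paths(paths=RELOCATED_PATHS):
    """Return {path: bool}, True when the dotted path can be imported."""
    status = {}
    for path in paths:
        try:
            importlib.import_module(path)
            status[path] = True
        except ImportError:
            # Covers both a missing package and a missing submodule.
            status[path] = False
    return status
```

Calling check_paths() with no arguments checks every path listed above; a False value suggests the installed version predates, or has since moved away from, that layout.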

Version 4.9.2

17 Sep 13:30
  1. Released Masked language model text scoring, https://malaya.readthedocs.io/en/latest/load-mlm.html
  2. Released GPT2 language model text scoring, https://malaya.readthedocs.io/en/latest/load-gpt2-lm.html
  3. Added a comparison of spelling correction results using KenLM, Masked LM and GPT2 LM, https://malaya.readthedocs.io/en/latest/load-gpt2-lm.html
  4. Added a deep learning based syllable tokenizer, with a WER of 4.3% versus 9.01% for the rules based tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer-syllable.html
  5. Starting with 4.9.2, pytorch and transformers are required libraries for Malaya.
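
The syllable tokenizer entry above quotes WER (word error rate) figures. These notes do not say how the figures were computed, so the following is only a sketch of the usual definition: the token-level edit distance between a reference and a hypothesis, divided by the number of reference tokens. The function name word_error_rate is hypothetical, and tokens are assumed to be whitespace-separated.

```python
def word_error_rate(reference, hypothesis):
    """Token-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For a syllable tokenizer, each token would be one syllable, so word_error_rate("a b c d", "a x c d") gives 0.25 (one substitution over four reference tokens).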

Version 4.9.1

01 Sep 17:01
  1. Added pretrained KenLM models, trained on https://github.com/huseinzol05/malay-dataset/tree/master/dumping/clean, https://malaya.readthedocs.io/en/latest/load-kenlm.html
  2. Improved spelling correction interface, under malaya.spelling_correction.*.
  3. Improved JamSpell spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-jamspell.html
  4. Improved the speed and accuracy of Probability spelling correction, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability.html
  5. Added Probability LM (probability + KenLM) spelling correction, which scores candidates better by using sentence context, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability-lm.html
  6. Improved Spylls spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability-lm.html
  7. Improved SymSpeller spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-symspell.html
  8. Improved Transformer Encoder spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-encoder-transformer.html
  9. Improved Seq2Seq Transformer spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-transformer.html
  10. Added Syllable tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer-syllable.html
  11. Added a stemmer trained on a noisy dataset for better stemming of local language structure, https://malaya.readthedocs.io/en/latest/load-stemmer.html#Sensitive-towards-local-language-structure
  12. Improved normalizer, which now accepts a stemmer and more parameters, https://malaya.readthedocs.io/en/latest/load-normalizer.html

Version 4.9

02 Aug 08:46
  1. Released EN-MS translation alignment using Eflomal, https://malaya.readthedocs.io/en/latest/alignment-en-ms-eflomal.html
  2. Released EN-MS translation alignment using HuggingFace, https://malaya.readthedocs.io/en/latest/alignment-en-ms-huggingface.html
  3. Released MS-EN translation alignment using Eflomal, https://malaya.readthedocs.io/en/latest/alignment-ms-en-eflomal.html
  4. Released MS-EN translation alignment using HuggingFace, https://malaya.readthedocs.io/en/latest/alignment-ms-en-huggingface.html
  5. Preprocessing can now use NMT, https://malaya.readthedocs.io/en/latest/load-preprocessing.html#Load-translation
  6. Released Demoji module, https://malaya.readthedocs.io/en/latest/load-demoji.html
  7. Added a transformer for the Rumi-Jawi converter, https://malaya.readthedocs.io/en/latest/load-rumi-jawi.html, BASE size model WER 0.043%
  8. Added a transformer for the Jawi-Rumi converter, https://malaya.readthedocs.io/en/latest/load-rumi-jawi.html, BASE size model WER 0.3%
  9. Added substring language detection that combines rules based and deep learning models, https://malaya.readthedocs.io/en/latest/language-detection-words.html
  10. Added EN-MS translation trained on a noisy dataset for better translation of local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms.html
  11. Added EN-MS translation using HuggingFace trained on a noisy dataset for better translation of local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms-huggingface.html
  12. Added MS-EN translation trained on a noisy dataset for better translation of local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en.html
  13. Added MS-EN translation using HuggingFace trained on a noisy dataset for better translation of local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en-huggingface.html
  14. Normalizer can now translate, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Use-translator
  15. Normalizer can now group similar subword languages and translate them together for better local context translation, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Problem-with-single-word-translation
  16. Normalizer can now segment words, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Use-segmenter
  17. Normalizer can now normalize emoji, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Normalize-emoji

Version 4.8

01 Jun 13:44

Version 4.7.5

06 May 13:48
  1. Improved Word Tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer.html
  2. Improved Normalizer for better speech synthesis, https://malaya.readthedocs.io/en/latest/load-normalizer.html
  3. HuggingFace is now the default backend repository.

Version 4.7.4

13 Apr 16:18
  1. Full HuggingFace support for pretrained and finetuned models; see how to use HuggingFace as a model repository, https://malaya.readthedocs.io/en/latest/huggingface-repository.html
  2. Added full unit tests for pretrained and finetuned models at https://github.com/huseinzol05/malaya/tree/master/tests

Version 4.7.3

17 Mar 12:04
  1. Improved regex for URLs.
  2. predict_words can now run in Jupyter Notebook.

Version 4.7.2

10 Mar 03:53
  1. Improved sentiment module: the default labels are now ['negative', 'neutral', 'positive'], and it uses a better dataset iterated with active learning, https://malaya.readthedocs.io/en/latest/load-sentiment.html
  2. The dataset is available at https://github.com/huseinzol05/malay-dataset/tree/master/sentiment/semisupervised-twitter-3class, Label Studio labelling for general tweets is at https://label.malaysiaai.ml/projects/12/data, Label Studio labelling for political tweets is at https://label.malaysiaai.ml/projects/16/data, and access can be requested at https://github.com/malaysia-ai/label-studio#how-to-get-access
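
To illustrate what a three-label default means for downstream code, the sketch below maps a vector of class probabilities to one of the labels listed above. It does not call Malaya: the helper name pick_label is hypothetical, and it assumes the probabilities arrive in the same order as the label list, which these notes do not confirm.

```python
# Default labels as listed in the release note above.
LABELS = ["negative", "neutral", "positive"]


def pick_label(probabilities, labels=LABELS):
    """Return the label whose probability is highest.

    Assumes probabilities[i] corresponds to labels[i].
    """
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best_index = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best_index]
```

For example, pick_label([0.1, 0.7, 0.2]) returns "neutral". Code written for an earlier two-label output would need a branch for this middle class.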