Project 04 · Open Source · NLP

Arabic NLP Toolkit

Production-ready Arabic NLP for real-world text: eight Arabic varieties (MSA plus seven dialects), sentiment, NER, morphology, Franco-Arabic transliteration, keywords, profiling, and a polished browser demo.

8 Arabic varieties · 301+ tests · ~89% test coverage · no GPU required

Overview

Most Arabic NLP libraries focus on Modern Standard Arabic (MSA). arabic-nlp-toolkit is built for how Arabic is actually written: Egyptian, Gulf, Levantine, Maghrebi, Iraqi, Yemeni, and Sudanese dialects on social media, in reviews, and in Franco-Arabic chat.

The core has a single required dependency (Pydantic v2) and ships rule-based models that work offline, typed JSON-serializable results, and a full CLI. An optional FastAPI web demo with a right-to-left (RTL) dark-theme UI provides a live playground in the browser.

Capabilities

End-to-end pipeline

Dialect detection

Confidence-ranked scores across eight Arabic varieties with Arabic display names.
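A common rule-based way to get confidence-ranked dialect scores is to count dialect-specific marker words and normalize the counts into a distribution. The sketch below assumes that approach; `MARKERS` and `rank_dialects` are hypothetical names with toy lexicons, not the toolkit's internals.

```python
from typing import Dict, List, Tuple

# Hypothetical toy marker lexicons; real dialect lexicons are far larger.
MARKERS: Dict[str, set] = {
    "egyptian": {"ازيك", "ايه", "عامل", "دلوقتي"},
    "gulf": {"شلونك", "وايد", "زين"},
    "levantine": {"كيفك", "شو", "هلق"},
    "msa": {"كيف", "حالك", "ماذا"},
}


def rank_dialects(text: str) -> List[Tuple[str, float]]:
    """Return (dialect, confidence) pairs, most likely first."""
    # Treat Arabic and Latin question marks as spaces, then split into tokens.
    tokens = text.replace("؟", " ").replace("?", " ").split()
    # Count how many tokens appear in each dialect's marker set.
    hits = {d: sum(t in words for t in tokens) for d, words in MARKERS.items()}
    total = sum(hits.values())
    if total == 0:
        # No markers found: return a uniform distribution.
        return [(d, 1 / len(MARKERS)) for d in MARKERS]
    scores = {d: n / total for d, n in hits.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Dividing by the total hit count keeps the scores summing to 1; a production detector would likely also weight markers by how diagnostic each one is.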

Sentiment

Negation, intensifiers, and dialect-aware lexicon scoring for social text.
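Negation and intensifier handling can be illustrated with a small window rule: a negator just before a sentiment word flips its sign, and an intensifier just after it scales the score. This is a minimal sketch under that assumption; the lexicons and `score_sentiment` are hypothetical and far smaller than any usable dialect-aware lexicon.

```python
from typing import Tuple

# Hypothetical toy lexicons mixing MSA and dialectal words.
POLARITY = {"رائع": 1.0, "ممتاز": 1.0, "حلو": 1.0, "سيء": -1.0, "وحش": -1.0}
NEGATORS = {"مش", "مو", "ليس", "لا"}  # mixed dialectal and MSA negators
INTENSIFIERS = {"جداً": 1.5, "جدا": 1.5, "اوي": 1.5, "وايد": 1.5}


def score_sentiment(text: str) -> Tuple[str, float]:
    """Return (label, score) from lexicon hits with negation and intensifiers."""
    tokens = text.replace("!", " ").replace("؟", " ").split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "مش حلو" ("not nice") flips polarity
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i + 1]]  # "رائع جدا" ("really great") boosts it
        score += value
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score
```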

Named entities

Gazetteer + pattern NER for persons, locations, and organizations.
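Gazetteer-plus-pattern NER combines exact lookups in curated name lists with regular-expression rules for contexts such as titles. A minimal sketch, assuming a toy gazetteer and a single title rule for persons; `GAZETTEER`, `PERSON_PATTERN`, and `find_entities` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical toy gazetteer mapping surface forms to entity labels.
GAZETTEER = {"القاهرة": "LOC", "الرياض": "LOC", "بيروت": "LOC", "جوجل": "ORG"}
# Pattern rule: a title (Dr., Eng., Mr./Prof.) followed by one token marks a person.
PERSON_PATTERN = re.compile(r"(?:الدكتور|المهندس|الأستاذ)\s+(\S+)")
PUNCTUATION = ".,!?؟،"


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity, label) pairs from pattern rules, then gazetteer lookup."""
    entities = [
        (m.group(1).strip(PUNCTUATION), "PER") for m in PERSON_PATTERN.finditer(text)
    ]
    for token in text.split():
        token = token.strip(PUNCTUATION)
        if token in GAZETTEER:
            entities.append((token, GAZETTEER[token]))
    return entities
```

Clitic-prefixed forms such as بالقاهرة ("in Cairo") would not match this naive lookup, which is one reason such systems often pair gazetteers with affix stripping.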

Transliteration

Franco-Arabic (Arabizi) ↔ Arabic script, plus Buckwalter transliteration, for chat-alphabet workflows.
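Buckwalter transliteration is a one-to-one mapping between Arabic letters/diacritics and ASCII characters, so it can round-trip losslessly (Franco-Arabic, by contrast, has no single standard and needs context rules). The sketch below encodes the standard Buckwalter table for letters and common diacritics; `to_buckwalter` and `from_buckwalter` are hypothetical helper names, not the toolkit's API.

```python
from typing import Dict

# Standard Buckwalter table (letters, tatweel, and common diacritics).
ARABIC_TO_BW: Dict[str, str] = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&", "\u0625": "<",
    "\u0626": "}", "\u0627": "A", "\u0628": "b", "\u0629": "p", "\u062A": "t",
    "\u062B": "v", "\u062C": "j", "\u062D": "H", "\u062E": "x", "\u062F": "d",
    "\u0630": "*", "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z", "\u0639": "E",
    "\u063A": "g", "\u0640": "_", "\u0641": "f", "\u0642": "q", "\u0643": "k",
    "\u0644": "l", "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w",
    "\u0649": "Y", "\u064A": "y", "\u064B": "F", "\u064C": "N", "\u064D": "K",
    "\u064E": "a", "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
}
# The mapping is one-to-one, so inverting it gives the reverse table.
BW_TO_ARABIC: Dict[str, str] = {v: k for k, v in ARABIC_TO_BW.items()}


def to_buckwalter(text: str) -> str:
    # Characters outside the table (spaces, digits, punctuation) pass through.
    return "".join(ARABIC_TO_BW.get(ch, ch) for ch in text)


def from_buckwalter(text: str) -> str:
    # Assumes the input is pure Buckwalter; Latin letters are all remapped.
    return "".join(BW_TO_ARABIC.get(ch, ch) for ch in text)
```

Because every Latin letter in the table is remapped, `from_buckwalter` should only be applied to pure Buckwalter strings, not to mixed English text.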

Morphology & POS

Root and pattern extraction, stemming, and POS tagging with both Universal and Arabic-specific tag sets.
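Of these, stemming is the simplest to illustrate. Light stemming strips common prefixes (such as the definite article ال) and suffixes (plural, feminine, and possessive endings) without extracting the root. The sketch assumes small, hypothetical affix lists and a minimum stem length; `light_stem` is an illustrative helper, not the toolkit's morphology analyzer.

```python
# Hypothetical affix lists; longer prefixes come first so they win over shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 2  # never strip a word below this many characters


def light_stem(word: str) -> str:
    """Strip at most one prefix, then suffixes repeatedly, keeping >= MIN_STEM chars."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word
```

Light stemmers trade precision for speed: a word like وزير ("minister") loses its root-initial و, which is why root-and-pattern analysis is a separate, more involved step.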

Keywords & profiling

Term-frequency (TF) keyword extraction, register detection, a text-quality score, and recommendations.
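Term-frequency (TF) ranking scores each non-stopword token by its count divided by the number of remaining tokens. A minimal sketch, assuming whitespace tokenization and a toy stopword list; `STOPWORDS` and `tf_keywords` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical, tiny stopword list; a real one would be much longer.
STOPWORDS = {"في", "من", "على", "و", "هذا", "هذه", "عن", "إلى"}


def tf_keywords(text: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank non-stopword tokens by relative term frequency (count / kept tokens)."""
    tokens = [t.strip(".,!؟?،") for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    return [(term, count / total) for term, count in counts.most_common(top_k)]
```

Plain TF favors words that are frequent within one document; TF-IDF would additionally down-weight words that are common across a whole corpus.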

Normalization

Diacritic removal, alef-variant unification, and cleanup of mentions, hashtags, and emoji.
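These normalization steps map naturally onto regex passes. A minimal sketch, assuming that hashtags keep their text (with underscores turned into spaces) while mentions and emoji are dropped; those policies, the approximate emoji ranges, and `normalize` itself are hypothetical choices, not the toolkit's documented behavior.

```python
import re

# Tanween, harakat, shadda, sukun (U+064B-U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # آ أ إ
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#(\w+)")
# Approximate emoji ranges; not an exhaustive Unicode emoji definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def normalize(text: str) -> str:
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef ا
    text = MENTION.sub("", text)
    # Keep hashtag words, turning underscores into spaces.
    text = HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)
    text = EMOJI.sub("", text)
    return " ".join(text.split())  # collapse leftover whitespace
```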

Document export

analyze_document() produces JSON-ready output for API and ETL pipelines.
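The toolkit describes its results as typed and JSON-serializable via Pydantic v2. The stdlib sketch below swaps in dataclasses to show the same idea: run named stages over the text, collect typed results, and serialize them. `DocumentExport`, `StageResult`, and `analyze_document_sketch` are hypothetical and do not mirror the real `analyze_document()` schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class StageResult:
    name: str
    output: object


@dataclass
class DocumentExport:
    text: str
    stages: List[StageResult] = field(default_factory=list)


def analyze_document_sketch(
    text: str, stages: Dict[str, Callable[[str], object]]
) -> DocumentExport:
    """Run each named stage in insertion order and collect typed results."""
    doc = DocumentExport(text=text)
    for name, fn in stages.items():
        doc.stages.append(StageResult(name=name, output=fn(text)))
    return doc


def to_json(doc: DocumentExport) -> str:
    # asdict recurses into nested dataclasses; ensure_ascii=False keeps Arabic readable.
    return json.dumps(asdict(doc), ensure_ascii=False)
```

With Pydantic v2 models, `model_dump()` and `model_dump_json()` take the place of `asdict` and `json.dumps`.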

Web demo

Playground and project-profile tabs; launch with python webapp/app.py (serves on port 8765).

Quick start

pip install arabic-nlp-toolkit

from arabic_nlp import ArabicNLP

nlp = ArabicNLP()

nlp.detect_dialect("ازيك عامل ايه؟")   # "How are you doing?" → egyptian
nlp.sentiment("المنتج رائع جداً!")     # "The product is really great!" → positive
doc = nlp.analyze_document("نص كامل")  # "Full text" → JSON-ready export

Dialects

MSA: Modern Standard Arabic (فصحى)
Egyptian (مصري): the default for Egyptian social and media text
Gulf (خليجي): GCC conversational text
Levantine (شامي): Syria, Lebanon, Jordan, Palestine
Maghrebi (مغاربي): Morocco, Algeria, Tunisia
Iraqi, Yemeni, Sudanese: extended coverage for regional chat

Technology

Python 3.9–3.12
Pydantic v2
FastAPI (web extra)
GitHub Actions CI

My Role

Author and maintainer: library architecture, dialect lexicons, test suite (301+ tests), web demo, PyPI packaging, and documentation. MIT-licensed and built in Egypt.
