ACL 2005

University of Michigan, Ann Arbor

COMPUTATIONAL APPROACHES TO SEMITIC LANGUAGES

June 29, 2005

 

Registration for the workshop is still open. Come join us there!

 

WORKSHOP DESCRIPTION

The Semitic family includes many languages and dialects spoken by a large number of native speakers (around 300 million). However, Semitic languages are still understudied. The most prominent members of this family are Arabic and its dialects, Hebrew, Amharic, Aramaic, Maltese and Syriac. Beyond their shared ancestry, which is apparent through pervasive cognate sharing, these languages have in common a rich and productive pattern-based morphology and similar syntactic constructions.

 

An increasing body of computational linguistics work is starting to appear for both Arabic and Hebrew. Arabic alone, as the largest member of the Semitic family, has been receiving a lot of attention lately in terms of dedicated workshops and conferences. These include, but are not limited to, the workshop on Arabic Language Resources and Evaluation (LREC 2002), a special session on Arabic processing in Traitement Automatique du Langage Naturel (TALN 2004), the Workshop on Computational Approaches to Arabic Script-based Languages (COLING 2004), and the NEMLAR Arabic Language Resources and Tools Conference in Cairo, Egypt (2004). This phenomenon has been coupled with a relative surge in resources for Arabic due to concerted efforts by the LDC and ELDA/ELRA. However, there is an apparent lag in the development of resources and tools for other Semitic languages. Unfortunately, work on individual Semitic languages still tends to be done with limited awareness of ongoing research in other Semitic languages. Within the last four years, only three events addressed Semitic languages: the ACL 2002 Workshop on Computational Approaches to Semitic Languages, the MT Summit IX Workshop on Machine Translation for Semitic Languages in 2003, and a special session on Semitic languages at EAMT 2004, held in Malta.

 

This workshop is a sequel to the ACL 2002 workshop and shares its goals of:

 

(i) heightening awareness amongst Semitic-language researchers of shared breakthroughs and challenges,

(ii) highlighting issues common to all Semitic languages as much as possible,

(iii) encouraging the potential for developing coordinated approaches, and

(iv) in addition, leveraging resource and tool creation for less prominent members of the Semitic language family.

 

 

WORKSHOP PROGRAM

Opening

9:00

Kareem Darwish, Mona Diab, Nizar Habash

Welcome and Opening

Session 1: Morphology

9:15

Erwin Marsi, Antal van den Bosch, and Abdelhadi Soudi

Memory-based morphological analysis generation and part-of-speech tagging of Arabic

9:40

Shlomo Yona and Shuly Wintner

A Finite-State Morphological Grammar of Hebrew

10:05

Nizar Habash, Owen Rambow, George Kiraz

Morphological Analysis and Generation for Arabic Dialects

10:30

Coffee Break

11:00

Keynote Speaker

Salim Roukos, IBM

 

Session 2: Applications I

11:40

Kareem Darwish, Hany Hassan, Ossama Emam

Examining the Effect of Improved Context Sensitive Morphology on Arabic Information Retrieval

12:05

Gregory Grefenstette, Nasredine Semmar, Faïza Elkateb-Gara

Modifying a Natural Language Processing system for European languages to treat Arabic in Information Retrieval Applications

12:30

Lunch

Session 3: Part of Speech Tagging

2:15

Roy Bar-Haim, Khalil Sima'an and Yoad Winter

Choosing an Optimal Architecture for Segmentation and POS-Tagging of Modern Hebrew

2:40

Sisay Fissaha Adafre

Part of Speech tagging for Amharic using Conditional Random Fields

3:05

Kevin Duh and Katrin Kirchhoff

POS Tagging of Dialectal Arabic: A Minimally Supervised Approach

3:30

Coffee Break

Session 4: Applications II

4:00

Imed Zitouni, Jeff Sorensen, Xiaoqiang Luo, and Radu Florian

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution

4:25

Samuel Eyassu and Björn Gambäck

Classifying Amharic News Text Using Self-Organizing Maps

4:50

Rani Nelken and Stuart M. Shieber

Arabic Diacritization Using Weighted Finite-State Transducers

5:15

Hany Hassan, Jeffrey Sorensen

An Integrated Approach for Arabic-English Named Entity Translation 

5:40

Parting Words

 

ORGANIZERS

Kareem Darwish (German University in Cairo, Egypt) kareem@darwish.org

Mona Diab (Columbia University, USA) mdiab@cs.columbia.edu

Nizar Habash (Columbia University, USA) habash@cs.columbia.edu

 

PROGRAM COMMITTEE

Ibrahim A. Alkharashi (King Abdulaziz City for Science and Technology, Saudi Arabia)

Tim Buckwalter (Linguistic Data Consortium, USA)

Violetta Cavalli-Sforza (Carnegie Mellon University, USA)

Yaacov Choueka (Bar-Ilan University, Israel)

Joseph Dichy (Lyon University, France)

Martha Evens (Illinois Institute of Technology, USA)

Ali Farghaly (SYSTRAN Software, Inc.)

Alexander Fraser (USC/ISI)

Andrew Freeman (Mitre)

Alon Itai (Technion, Israel)

George Kiraz (Beth Mardutho: The Syriac Institute, USA)

Katrin Kirchhoff (University of Washington, USA)

Alon Lavie (Carnegie Mellon University, USA)

Mohamed Maamouri (Linguistic Data Consortium, USA)

Uzzi Ornan (Technion, Israel)

Anne De Roeck (Open University, UK)

Michael Rosner (University of Malta, Malta)

Salim Roukos (IBM, USA)

Khalil Sima'an (University of Amsterdam, Netherlands)

Abdelhadi Soudi (ENIM, Rabat, Morocco)

Shuly Wintner (University of Haifa, Israel)

Remi Zajac (SYSTRAN Software, USA)