Punjabi University Patiala, India. Websites: http://www.universitypunjabi.org http://www.advancedcentrepunjabi.org http://www.universitypunjabi.org/sangam/ http://www.advancedcentrepunjabi.org/intro1.asp
 


    Hindi and Urdu are mutually comprehensible languages written in mutually incomprehensible scripts and spoken by more than 600 million people in India and Pakistan. Over time, with the influence of Persian on Urdu and of Sanskrit on Hindi, the vocabularies of the two languages have diverged, though they still share more than 70% of their words, and their grammar remains the same. This project is the culmination of twelve years of academic research and literacy development in the UK and of language development work in Pakistan and India. The aim of the partnership is to facilitate electronic and written communication between people living in India and Pakistan through the development of a bi-directional, web-based Hindi-Urdu language transliteration/translation tool. The target groups are media organisations (such as magazines and newspapers), literary and literacy promotion organisations, writers and NGOs involved in dissemination activity among the urban and rural poor, virtual Hindi-Urdu speaking communities, schools and colleges. The intellectual background to this work has already been completed through a grant from the EU Asia-ITC programme. Punjabi University, Patiala has also developed Gurmukhi to Shahmukhi (Urdu) and reverse transliteration software, and is currently developing an Urdu-Hindi transliteration tool through a funded research project. This project will develop the complementary Hindi to Urdu transliteration tool as well as a complete machine translation system between the Hindi and Urdu languages, and will facilitate the use of these technologies on the web, thus enhancing networking between India and Pakistan.

 

    South Asia is one of those unique parts of the world where a single language may be written in different scripts. This is the case, for example, with Urdu and Hindi, which are spoken by hundreds of millions of people but written in India (500 million) in the Devanagari script (a left-to-right script) and in Pakistan (80 million) in the Urdu script (a right-to-left script based on Arabic). In spoken form Hindi and Urdu are mutually comprehensible, but they are written in mutually incomprehensible scripts and spoken by more than 600 million people all over the world. Over time, with the influence of Persian on Urdu and of Sanskrit on Hindi, the vocabularies of the two languages have diverged, though they still share more than 70% of their words, and their grammar remains the same.

The project aims to bring Urdu-speaking and Hindi-speaking people closer together by developing a transliteration/translation tool for the two languages. We aim to provide a tool that will help people in the two countries to link across a hostile geographical divide. In so doing we will provide an ITC solution to a social problem that had seemed insurmountable for centuries.

The problem of communication between the Hindi and Urdu languages has long been a social barrier between the Muslim populations of India and the majority Hindi public. In addition, it is one of the main barriers to people-to-people contact between India and Pakistan. This project will address this problem through the development of a web-based transliteration/translation system.

The transliteration component is intended to make the well-developed poetic verse of the Urdu language available to the Hindi-literate public. The translation tool will enable Hindi-literate and Urdu-literate people to convert Urdu websites to Hindi and the reverse, so that they can read them in their own language. This will facilitate electronic and written communication between people living in India and Pakistan through the development of a bi-directional, web-based Hindi-Urdu language transliteration/translation tool.
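As a rough illustration of the website conversion described here, the sketch below rewrites only the visible text of an HTML page and leaves the markup untouched. It is not the project's module: the names `TextNodeConverter` and `convert_html` are hypothetical, and `convert` is a placeholder for whatever transliteration or translation engine is plugged in.

```python
import html
from html.parser import HTMLParser


class TextNodeConverter(HTMLParser):
    """Rebuilds an HTML page, passing only visible text through `convert`."""

    SKIP_TAGS = {"script", "style"}  # their contents are code, not prose

    def __init__(self, convert):
        super().__init__(convert_charrefs=True)
        self.convert = convert   # hypothetical text-conversion callable
        self.out = []            # pieces of the rebuilt page
        self.skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # tag exactly as written
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # leave scripts and styles alone
        else:
            # Character references were decoded by the parser, so the
            # converted text is escaped again before being written out.
            self.out.append(html.escape(self.convert(data), quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def convert_html(page, convert):
    """Return `page` with every text node replaced by convert(text)."""
    parser = TextNodeConverter(convert)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

For example, `convert_html(page, str.upper)` uses `str.upper` as a stand-in converter. A real Hindi to Urdu conversion would also have to switch the page direction to right-to-left, which this sketch does not attempt.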

 

    Development of a web-based software package for automatic machine transliteration/translation between the Hindi and Urdu languages, with the following features:

  • Conversion of Hindi text to Urdu script and the reverse with the click of a mouse
  • Transliteration accuracy of more than 95% at word level
  • As part of this process the following tools will also need to be developed:
    • Hindi-Urdu and Urdu-Hindi electronic dictionaries
    • Mapping tables and rules for transliteration/translation from Hindi to Urdu and the reverse
    • A sentence-aligned parallel Hindi-Urdu corpus
    • A module for converting any Hindi website to Urdu and the reverse
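To make the roles of the mapping tables, the electronic dictionary and the word-level accuracy target more concrete, here is a minimal sketch. It assumes a dictionary lookup that takes priority over character-by-character mapping. The table `CHAR_MAP`, the dictionary `EXCEPTIONS` and all function names are hypothetical and cover only a handful of characters. The project's actual tables and rules are not given in this document.

```python
# Hypothetical, deliberately tiny mapping table (Devanagari -> Urdu script).
# A real table would cover the full alphabet plus context-dependent rules.
CHAR_MAP = {
    "क": "\u06A9",  # ka -> keheh
    "त": "\u062A",  # ta -> te
    "द": "\u062F",  # da -> dal
    "न": "\u0646",  # na -> noon
    "ब": "\u0628",  # ba -> be
    "म": "\u0645",  # ma -> meem
    "ल": "\u0644",  # la -> lam
    "ा": "\u0627",  # long aa sign -> alef
    "ि": "",        # short i sign: normally left unwritten in Urdu
    "ु": "",        # short u sign: normally left unwritten in Urdu
}

# Hypothetical exception dictionary for words whose Urdu spelling
# cannot be derived character by character.
EXCEPTIONS = {
    "बिलकुल": "\u0628\u0627\u0644\u06A9\u0644",  # bilkul: be, alef, lam, keheh, lam
}


def transliterate_word(word):
    """Dictionary lookup first, then character-by-character mapping."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Characters missing from the table are passed through unchanged.
    return "".join(CHAR_MAP.get(ch, ch) for ch in word)


def transliterate_text(text):
    """Transliterate whitespace-separated words and rejoin with spaces."""
    return " ".join(transliterate_word(w) for w in text.split())


def word_accuracy(predicted, reference):
    """Fraction of reference words reproduced exactly, position by position."""
    pred_words = predicted.split()
    ref_words = reference.split()
    if not ref_words:
        return 0.0
    matches = sum(p == r for p, r in zip(pred_words, ref_words))
    return matches / len(ref_words)
```

Under this reading, the 95% target would mean `word_accuracy(system_output, reference_text) > 0.95` on a test text. How the project itself measures accuracy is not stated, so this definition is an assumption.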

 
Schedule of the Year

Month 1
    Activity: System analysis and design. Study of Urdu and Hindi languages and vocabulary.
    Output: Refresher workshop for Urdu and Hindi languages.
Month 2
    Activity: Development of mapping tables and transliteration rules for transliteration of Hindi to Urdu. Selection of the 25,000 most frequently used Hindi-Urdu terms. Design of the lexical entry interface and lexical entry of these terms.
    Output: Data entry of 10,000 Hindi-Urdu terms.
Month 3
    Activity: Development and implementation of mapping tables and transliteration rules for transliteration of Hindi to Urdu. Development of parallel Hindi-Urdu corpus.
    Output: Data entry of 25,000 Hindi-Urdu terms. Half-million-word parallel Hindi-Urdu corpus.
Month 4
    Activity: Integration of the Hindi-Urdu dictionary with the rule-based transliteration software. Continue work on development of parallel Hindi-Urdu corpus.
    Output: Hindi-Urdu transliteration software. One-million-word parallel Hindi-Urdu corpus.
Month 5
    Activity: Analyse the Hindi and Urdu corpus (developed in the Urdu-Hindi Transliteration Project) to determine the Hindi and Urdu words that need to be translated. Create electronic Hindi-Urdu dictionaries for such words.
    Output: Two-million-word parallel Hindi-Urdu corpus. Electronic Hindi-Urdu dictionary for translation from Hindi to Urdu.
Month 6
    Activity: Create an electronic Urdu-Hindi dictionary of Urdu words not found in Hindi vocabulary. Continue work on parallel corpus development. Develop routines for sentence-level alignment of the parallel corpus.
    Output: Three-million-word parallel Hindi-Urdu corpus. Sentence-aligned parallel corpus. Electronic Urdu-Hindi dictionary for translation from Urdu to Hindi.
Month 7
    Activity: Continue work on parallel corpus development. Develop translation rules.
    Output: Sentence-aligned five-million-word parallel Hindi-Urdu corpus.
Month 8
    Activity: Develop rules and modules for translation of Urdu text to Hindi using the parallel corpus and the Urdu-Hindi dictionary.
    Output: Urdu-Hindi translation system.
Month 9
    Activity: Test the Urdu-Hindi translation system on carefully selected text.
    Output: Validation and removal of bugs in the Urdu-Hindi translation system.
Month 10
    Activity: Develop rules and modules for translation of Hindi text to Urdu using the parallel corpus and the Hindi-Urdu dictionary.
    Output: Hindi-Urdu translation system.
Month 11
    Activity: Test the Hindi-Urdu translation system on carefully selected text.
    Output: Validation and removal of bugs in the Hindi-Urdu translation system.
Month 12
    Activity: Develop the user interface for entry of Urdu and Hindi text for online transliteration and translation. Develop the routines to convert a complete Hindi website to Urdu and the reverse. Installation, user testing and field testing of the final version.
    Output: Final version of the software ready.
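The sentence-level alignment routines scheduled for Month 6 are not described further. As one possible picture, the sketch below aligns two lists of sentences by dynamic programming. It uses character-length difference as the pairing cost and a fixed penalty for leaving a sentence unmatched. This is a simplified length-based approach chosen for illustration, not the project's routine. The function name `align_sentences` and the `skip_penalty` value are assumptions, and raw length is a crude signal for Hindi and Urdu because Urdu usually omits short vowels.

```python
def align_sentences(src, tgt, skip_penalty=10.0):
    """Align two sentence lists; returns (src_index, tgt_index) pairs.

    Allowed steps: 1-1 (pair two sentences), 1-0 (source sentence left
    unmatched) and 0-1 (target sentence left unmatched). An unmatched
    side is reported as None.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # pair src[i-1] with tgt[j-1]
                c = cost[i - 1][j - 1] + abs(len(src[i - 1]) - len(tgt[j - 1]))
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j - 1)
            if i > 0:  # leave src[i-1] unmatched
                c = cost[i - 1][j] + skip_penalty
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:  # leave tgt[j-1] unmatched
                c = cost[i][j - 1] + skip_penalty
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i, j - 1)

    # Walk the back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((i - 1 if pi < i else None, j - 1 if pj < j else None))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Pairs in which both indices are present would form the sentence-aligned corpus. Entries containing `None` mark sentences with no counterpart, which could be set aside for manual review.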
 
 
Progress, Year 2009

March
    Activity: System analysis and design. Study of Urdu and Hindi languages and vocabulary from the translation point of view.
    Output: A one-day workshop was organised in March, which included the staff team as well as language experts from Punjabi University, Patiala:
      • Dr. Anwar Chiragh, Lecturer, Dept. of Punjabi Lexicography, Punjabi University, Patiala
      • Mr. Nadeem Ahmed, Lecturer, NRLC Patiala
    The linguistic problems associated with transliteration and translation were discussed and reviewed. The sources for Urdu and Hindi vocabulary were discussed. Language text and structure analysis of both Urdu and Hindi was performed.
April
    Activity: Development of mapping tables and transliteration rules for transliteration of Hindi to Urdu. Selection of the 25,000 most frequently used Hindi-Urdu terms. Design of the lexical entry interface and lexical entry of these terms.
    Output: Data entry of 5,000 Hindi-Urdu terms completed.
    Conversion of the existing 'C' language Urdu-Hindi engine to a Dot Net service started.
May
    Activity: Development and implementation of mapping tables and transliteration rules for transliteration of Hindi to Urdu. Development of parallel Hindi-Urdu corpus.
    Output: Data entry of 10,000 Urdu-Hindi terms completed.
    Development of parallel Urdu-Hindi corpus started.
    (Reason for slow progress of data entry: no project staff for data entry could be appointed because of the election code.)
    Conversion of the existing 'C' language Urdu-Hindi engine to a Dot Net service completed.
    Website address: http://sggs.learnpunjabi.org
June
    Activity: Integration of the Hindi-Urdu dictionary with the rule-based transliteration software. Continue work on development of parallel Hindi-Urdu corpus.
    Output: Rule-based Hindi-Urdu transliteration engine created.
    0.1 million word parallel Urdu-Hindi corpus completed.
    (Reason for slow progress in data entry: no project staff for data entry has been appointed.)
July
    Activity: Analyse the Hindi and Urdu corpus (developed in the Urdu-Hindi Transliteration Project) to determine the Hindi and Urdu words that need to be translated. Create electronic Hindi-Urdu dictionaries for such words.
    Output: Analysis of the Hindi and Urdu corpus completed.
    0.2 million word parallel Urdu-Hindi corpus completed.
    Electronic Urdu-Hindi dictionary development has been hindered by the lack of full-time project staff.
    All the coding and system development work so far has been done by the Project coordinator, the co-coordinator and some research students. No full-time project staff have been appointed so far.
 
 
Project staff members for development:

Project Leader: Dr. Gurpreet Singh Lehal
Overseas Partner: Dr Virinder S Kalra
Co-coordinator: Mr. Tejinder Singh Saini

© 2009 ACTDPL Punjabi University, Patiala