Punjabi University, Patiala, India
Websites:
http://www.universitypunjabi.org
http://www.advancedcentrepunjabi.org
http://www.universitypunjabi.org/sangam/
http://www.advancedcentrepunjabi.org/intro1.asp
http://www.apdip.net/projects/ictrnd/2005/185/index_html/view
 

Project Progress

Starting Date of the Project: March 2006
First Year - 2006
Month of the Year: Progress
March-06
  • A detailed study of the relevant standards and formats, such as InPage, Unicode, and Nastalik-based fonts, has been carried out.
  • Text and structure analysis of both the Shahmukhi and Gurmukhi scripts performed.
  • System Design completed.
April-06
    InPage to Unicode Converter
  • Building the Shahmukhi corpus requires an InPage to Unicode converter, because the majority of the Shahmukhi source text is available only in InPage format. A utility for converting InPage text to Unicode has been developed.
  • Selection of the 25,000 most frequently used Shahmukhi-Gurmukhi terms completed, based on frequency analysis of the Shahmukhi corpus.
  • Design of the lexical entry interface completed and 5,000 Shahmukhi-Gurmukhi entries digitized.
May-06
  • Phonetic based mapping table for transliteration from Shahmukhi to Gurmukhi text finalised.
  • Knowledge base of Shahmukhi-Gurmukhi transliteration rules created.
  • 5,000 more Shahmukhi-Gurmukhi entries digitized.
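A mapping table combined with context rules, as described for May-06, can be sketched roughly as follows. This is a minimal illustration only: the table is a tiny hypothetical subset, the alef rule is an assumed example of a context rule, and the names `CHAR_MAP`, `transliterate_word`, and `transliterate` are invented here. The project's actual table and rule base are not reproduced on this page.

```python
# Illustrative only: a tiny, hypothetical subset of a Shahmukhi -> Gurmukhi
# character table. It is NOT the project's actual mapping table.
CHAR_MAP = {
    "\u0628": "\u0A2C",  # ARABIC BEH   -> GURMUKHI BA
    "\u067E": "\u0A2A",  # ARABIC PEH   -> GURMUKHI PA
    "\u062A": "\u0A24",  # ARABIC TEH   -> GURMUKHI TA
    "\u0631": "\u0A30",  # ARABIC REH   -> GURMUKHI RA
    "\u0633": "\u0A38",  # ARABIC SEEN  -> GURMUKHI SA
    "\u06A9": "\u0A15",  # ARABIC KEHEH -> GURMUKHI KA
    "\u06AF": "\u0A17",  # ARABIC GAF   -> GURMUKHI GA
    "\u0644": "\u0A32",  # ARABIC LAM   -> GURMUKHI LA
    "\u0645": "\u0A2E",  # ARABIC MEEM  -> GURMUKHI MA
    "\u0646": "\u0A28",  # ARABIC NOON  -> GURMUKHI NA
}

ALEF = "\u0627"
GURMUKHI_A = "\u0A05"        # independent vowel, assumed for word-initial alef
GURMUKHI_AA_SIGN = "\u0A3E"  # vowel sign, assumed for alef after a consonant


def transliterate_word(word: str) -> str:
    """Map one word character by character, with one simple context rule."""
    out = []
    for i, ch in enumerate(word):
        if ch == ALEF:
            # Assumed context rule: alef depends on its position in the word.
            out.append(GURMUKHI_A if i == 0 else GURMUKHI_AA_SIGN)
        else:
            # Characters missing from the table pass through unchanged.
            out.append(CHAR_MAP.get(ch, ch))
    return "".join(out)


def transliterate(text: str) -> str:
    """Transliterate whitespace-separated words and rejoin with single spaces."""
    return " ".join(transliterate_word(w) for w in text.split())
```

Shahmukhi text usually leaves short vowels unwritten, so a purely character-level table like this one cannot recover them. That gap is one reason a rule base and a word-level dictionary are needed on top of the table.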
June-06
    Shahmukhi Corpus
  • A Shahmukhi corpus of 5 lakh (500,000) total words has been created.
    Tools for Corpus Analysis
  • Corpus analysis tools have been developed to perform analyses such as word frequency, bi-gram, and tri-gram counts on the corpus.
  • In total 15,000 Shahmukhi-Gurmukhi entries digitized.
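The word frequency, bi-gram, and tri-gram analyses mentioned for June-06 can all be expressed as n-gram counting. The sketch below assumes whitespace-separated tokens. The function names `tokenize` and `ngram_counts` are illustrative and do not describe the project's own tools.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: words are separated by whitespace. A real corpus tool
    # would also need to handle punctuation and script-specific joiners.
    return text.split()


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every run of n consecutive tokens (n=1 gives word frequency)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    # Placeholder tokens stand in for corpus words.
    tokens = tokenize("w1 w2 w1 w2 w3")
    for n in (1, 2, 3):
        print(n, ngram_counts(tokens, n).most_common(2))
```

The same kind of word-frequency count (`n=1`), sorted with `most_common`, is one plausible way to pick a list of most frequently used terms from a corpus.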
July-06
  • The size of the Shahmukhi corpus has been increased from 5 lakh to 10 lakh total words.
  • About 70% of the work on a primitive rule-based version of the transliteration software has been completed.
  • In total 20,000 Shahmukhi-Gurmukhi entries digitized.
August-06
  • Shahmukhi corpus of 11 lakh total words ready.
  • A primitive rule-based version of the transliteration software has been produced.
  • 5,000 more Shahmukhi-Gurmukhi entries digitized.
September-06
  • The size of Shahmukhi Corpus increased to 12 lakh total words.
  • Testing of the primitive rule-based version completed.
  • Work in progress on integrating the Shahmukhi-Gurmukhi dictionary with the primitive version.
October-06
November-06
  • Integration of Shahmukhi Gurmukhi dictionary with primitive version completed.
  • Testing of this primitive version completed.
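This page does not describe how the dictionary was integrated with the rule-based version. One common arrangement, shown here purely as an assumption, is to look each word up in the dictionary first and fall back to the rules only for words that are missing. The function name `transliterate_with_dictionary` and its parameters are hypothetical.

```python
from typing import Callable, Dict


def transliterate_with_dictionary(
    text: str,
    dictionary: Dict[str, str],
    rule_based: Callable[[str], str],
) -> str:
    """Prefer a word-level dictionary entry; otherwise apply the rule engine."""
    result = []
    for word in text.split():
        if word in dictionary:
            # Digitized entry available: use it as-is.
            result.append(dictionary[word])
        else:
            # No entry: fall back to rule-based transliteration.
            result.append(rule_based(word))
    return " ".join(result)
```

Under this arrangement, each newly digitized Shahmukhi-Gurmukhi entry overrides the rule output for that word without any change to the rules themselves.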
December-06
  • Web based version of InPage to Unicode Converter created.
  • Working on Shahmukhi and Gurmukhi Corpus Analysis.
  • The initial design of the Morphological Analyzer is being developed with linguistic experts.
January-07
  • Web based version of Shahmukhi to Gurmukhi Transliteration started.
  • 70% of Shahmukhi and Gurmukhi Corpus Analysis completed.
February-07
  • Front end and back end design of Web based version completed.
  • Shahmukhi and Gurmukhi Corpus Analysis Completed.
  • 10,000 Shahmukhi words for the Morphological Analyzer digitised with linguistic experts.
March-07
  • 60% of Integration of Web based version completed.
  • In total, 20,000 Shahmukhi words for the Morphological Analyzer digitised with linguistic experts.
April-07
  • Integration of the web-based version completed; the first beta version is being prepared.
  • 10,000 more Shahmukhi words for the Morphological Analyzer digitised.
May-07
June-07
  • Performed the following enhancements in the beta version:
    1. Front-end visualization improved.
    2. Roman Pad added alongside the Gurmukhi and Shahmukhi Pads.
    3. Back-end dictionary support improved.
    4. Online Shahmukhi web page transliteration (beta).
 
© 2006 ACTDPLLC Punjabi University, Patiala