Is Urdu Losing the Race?
Current State of Computational Linguistics in Pakistan
by: Waqas A. Khan
On February 15, 2014, The Economist ran a story titled “The Urdu Rate of Growth”. The story was not about Urdu, the national language of Pakistan; the name was used as a metaphor for the poor state of growth in Pakistan's energy sector at the time. Pakistan's 200 million people speak a total of 72 provincial and regional languages, including Urdu. According to a parliamentary paper, at least 10 of them are either “in trouble” or “near extinction”.
From the founder of Pakistan, Quaid-e-Azam Muhammad Ali Jinnah, to Justice Jawwad S. Khawaja, everyone interested in making Urdu an official language of Pakistan has failed badly. This is partly because those at the top are least interested in it, and mainly because, in the twenty-first century, its progress towards becoming a language of computing and science is wobbly and precarious. The debate about making Urdu a digital language has been suppressed by a parallel debate about its usefulness in comparison to English. Both camps have neglected the digitization of Urdu as a principal subject. At one extreme, the lovers of English have denied Urdu its space in national life; at the other, Urdu patriots have hated English as a stepmother tongue.
Nations take pride in their languages, and their identities are marked by them. Arabs, Chinese and Germans have taken such pride in their languages that their national and linguistic identities are the same today. We, however, have been reluctant. From the start, Urdu has been controversial in Pakistan. It was pushed onto a multilingual society in a way that created more enemies than lovers. The first confrontation came from East Pakistan, now Bangladesh, where Urdu was unknown to most people and the language of the people was Bengali. But Quaid-e-Azam, in his quest to build a nation, decided to tell them that without Urdu they were not Pakistanis. In 1948, while addressing the students of Dacca University in his immaculate English, he said: “The state language of Pakistan is going to be Urdu and no other language. And anyone who tries to mislead you is an enemy of Pakistan.”
The confrontation started and we lost Bangladesh, yet Urdu, our national language, has still not become the official language. We have been wandering between Urdu and English mediums, and 60% of our children have been leaving their studies because of repeated failure in English language papers, yet neither English nor Urdu has become the language of Pakistan. In our offices Urdu is not allowed, and in the streets English is not.
The debate started again in September 2015, when Supreme Court Justice Jawwad S. Khawaja gave Nawaz Sharif’s government three months to implement “Article 251 in line with Article 5 of the Constitution” and make Urdu mandatory for “official and other purposes”. But the ultimatum expired, and Urdu lost the race.
Globally, many languages die. When nations lose pride in their languages, those languages become prey to this fate. Globalization, industrialization, innovation and population pressure are the culprits most often blamed for the crime of “Language Murder”. The economic patterns of the world force communities that have fallen behind to adopt a different culture and language. Their own language does not meet global requirements, so those nations deliberately encourage a different language to prevail in place of their own. This is called “Lingual Assimilation”, and it proceeds in several stages.
At the first stage, the speakers of a susceptible (weak) language face enormous pressure to speak the dominant language. This pressure comes from multiple sources: official communication, the language of schooling, peer pressure and government laws. At the second stage, which can be called “bilingualism”, people adopt two languages as primary, one out of need and the other out of love; in our case, English and Urdu. At the third and last stage, new generations that do not share this love find themselves more familiar with the dominant language and less connected to the national language (in our case, Urdu). The most compelling factors that emerge at this last stage are feelings of shame and inferiority about the language of one's parents and grandparents. This is the stage of “Language Murder”. Urdu appears to be passing through the final phase of stage 2. If necessary and timely measures are not taken, it will proceed to stage 3, the murder.
In this century, a language can be saved only if it stays relevant to knowledge and innovation. In the case of Urdu, we have forgotten this basic principle. Although progress has been made, it is still too early to call Urdu a language of information technology (IT). Progressive nations recognized this need in time. In Germany, for example, a 1985 decision of the Munchener Oberlandesgericht restricted the delivery of computers that are not accompanied by operating instructions in German.
Similarly, Chinese, European, Russian and Arab nations took measures to enforce their local and national languages. We, by contrast, have proudly accepted English for everything. As a result, no major electronics, computer hardware, utilities or software company in the world bothers to include Urdu in its product manuals today. That is the reason more than 80% of our population is unable to benefit from the automation the world offers. From the cheque book to the train ticket, digitization means nothing to the common man.
Urdu software development started in the 1970s and early 1980s. Since then, many applications have been developed for desktop publishing, but none has succeeded in offering seamless data exchange with well-known design applications such as Corel Draw, Adobe Photoshop and Illustrator. Even today Urdu is exported as a picture, and none of the high-end design software understands or accepts Urdu as a font or language. It is impossible to edit Urdu text in Corel Draw and similar applications. Even the available Urdu typing software, including the famous “Inpage”, exists in countless versions that prompt for HASP drivers and registry updates.
All of these have been developed without any underlying computing standard, and each has its own character set and code page. Data cannot even be exchanged between these programs. Unlike English software, every Urdu program has its own keyboard layout, which forces a new user to learn a new layout each time.
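The exchange problem can be sketched in a few lines. This is only an illustration: the two code pages below, `VENDOR_A` and `VENDOR_B`, are invented for the example and do not describe any real product. The point is that the same bytes mean different letters under different private code pages, while a shared standard encoding (UTF-8 is used here as one possibility) round-trips the text unchanged.

```python
# Two hypothetical 8-bit code pages, invented for illustration only.
# Both cover the same three letters but assign them to different byte values.
VENDOR_A = {0x81: "\u0627", 0x82: "\u0628", 0x83: "\u067E"}  # alef, beh, peh
VENDOR_B = {0x81: "\u0628", 0x82: "\u067E", 0x83: "\u0627"}  # beh, peh, alef


def encode(text, code_page):
    """Turn text into bytes using a private code page."""
    reverse = {ch: byte for byte, ch in code_page.items()}
    return bytes(reverse[ch] for ch in text)


def decode(data, code_page):
    """Turn bytes back into text using a private code page."""
    return "".join(code_page[b] for b in data)


word = "\u0628\u0627\u067E"  # beh, alef, peh

saved_by_a = encode(word, VENDOR_A)

# Reading the file with the program that wrote it works.
reopened_in_a = decode(saved_by_a, VENDOR_A)

# Reading the same bytes with the other program yields different letters.
opened_in_b = decode(saved_by_a, VENDOR_B)

# A shared standard encoding survives the round trip.
round_trip_utf8 = word.encode("utf-8").decode("utf-8")

print(reopened_in_a == word)   # same program: text preserved
print(opened_in_b == word)     # other program: text corrupted
print(round_trip_utf8 == word)
```

The sketch covers only three letters; a real code page would map every character, but the mismatch works the same way.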
To restore pride in Urdu, we must rush to bring the IT revolution to it. E-governance and e-commerce solutions must be provided in Urdu so that the common man can benefit from them. Sufficient research in this area has not been done because copyright laws are inadequate and poorly implemented. The only serious attempt, Inpage, is also doomed, because every second newcomer can alter its code, edit the credits and even change the name of the software.
We have, however, been successful in lexical development and corpus-based lexical data acquisition at CRULP. But grammar modeling at CRULP is still absent. Unlike English software, no Urdu software can point out grammatical mistakes or offer spelling correction to its users. Speech recognition and optical character recognition are a far cry. Without them, no one can digitize Urdu text except by creating JPG e-books, which allow their readers nothing but zoom. To your surprise, our so-called experts are still arguing over the existence of Urdu phonemes like lh, mh, nh and rh. Our phonological rules are not developed, and in their absence Urdu speech synthesis and recognition applications will never arrive.
Similarly, work in the areas of morphology, syntax and semantics is limited. This hurts not only the promotion of Urdu but also that of regional languages such as Punjabi, Sindhi, Balochi, Pashto, Saraiki, Brahvi and Hindko, which depend heavily on Urdu software development. There is also no consensus on writing styles or even on the total number of Urdu characters. Siddiqui & Amrohi (1977) count fifty-three and Platts (1911) thirty-eight. Kifayat (1993), Siraj (1999), PTBB (2000), BUQ (1999) and KUQ (1999) give 36, 51, 53, 47 and 37 characters respectively.
Amid so much “suffering”, the development of Urdu as a language of the future can only be a dream. Even our character order is incomplete, and applications that depend on sorting and indexing (including computational lexica) cannot be developed until a collation sequence has been standardized for the language. Even standards for keyboards and fonts are absent.
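Why sorting depends on an agreed collation sequence can be shown with a small sketch. The `ORDER` string below is an assumption: it is a seven-letter excerpt of one possible alphabet order, chosen for illustration, and is not a standardized sequence. Without such a table, software can only fall back on raw character codes, which place peh after teh.

```python
# Hypothetical collation excerpt (an assumption, not a standard):
# alef, beh, peh, teh, reh, noon, yeh
ORDER = "\u0627\u0628\u067E\u062A\u0631\u0646\u06CC"
RANK = {ch: i for i, ch in enumerate(ORDER)}


def collation_key(word):
    """Rank each letter by the assumed alphabet order.

    Letters missing from ORDER sort after all known letters,
    ordered among themselves by character code.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


words = [
    "\u062A\u0627\u0631",        # teh alef reh
    "\u067E\u0627\u0646\u06CC",  # peh alef noon yeh
    "\u0628\u0627\u062A",        # beh alef teh
]

# Default sort compares character codes: beh, teh, peh.
by_code_point = sorted(words)

# Sort with the assumed collation table: beh, peh, teh.
by_alphabet = sorted(words, key=collation_key)

print([w[0] for w in by_code_point])
print([w[0] for w in by_alphabet])
```

Any index, dictionary or lexicon built on the first ordering would list words in a sequence no reader expects, which is why the table has to be agreed before such applications are built.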
Another issue in Urdu is “Aerab” (zabr, zer, pesh, jazm and so on), which, when included in full, take a total of 128 coding slots, while all 2^8 = 256 slots of an 8-bit code are already filled. At the governmental level, no one has sought help from established organizations like EACL, ISCA, EAA, ELRA and ELSNET to make Urdu a language of today.
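The slot arithmetic behind this kind of argument can be checked with a rough sketch. Two assumptions are made here and are not claims of the article: the letter count is the Platts (1911) figure quoted earlier, and only the four marks named above are counted. The sketch compares giving every letter-and-mark pair its own slot with storing each mark once as a combining character, which is how Unicode encodes these four marks.

```python
import unicodedata

SLOTS_7_BIT = 2 ** 7  # 128 slots
SLOTS_8_BIT = 2 ** 8  # 256 slots

# The four marks named in the text, as Unicode combining characters.
AERAB = {
    "zabr": "\u064E",
    "pesh": "\u064F",
    "zer": "\u0650",
    "jazm": "\u0652",
}

LETTER_COUNT = 38  # assumption: the Platts (1911) figure

# One slot for every letter-plus-mark pair.
precomposed_slots = LETTER_COUNT * len(AERAB)

# Letters and marks stored separately and combined at display time.
combining_slots = LETTER_COUNT + len(AERAB)

# A non-zero combining class means the character attaches to the letter before it.
all_combining = all(unicodedata.combining(ch) > 0 for ch in AERAB.values())

print(precomposed_slots, combining_slots, all_combining)
```

Under these assumptions the pair-per-slot approach already overflows a 7-bit code, while the combining approach needs only a few extra slots.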
Urdu is losing the race. Can you come forward?
The article was originally published in More Magazine.