Since the last thread was getting unwieldy in size it has been shut. Please continue the discussion here.
The last few posts from the previous thread are posted below for continuity.
Dear Dharma
Unicode/SLS 1134 is an incomplete standard. They have not registered all the Sinhala characters.
It is just typewriter technology.
Donald Gaminitillake
Colombo
Hi Everyone,
Donald Gaminitillake on Jul 7th, 2006 at 9:24 pm wrote:
——————————————————————–
quote
This union produces a set of encodings containing all the basic elements (letters). Ironically, this union not only contains your 1660 letters, it also includes baendi akuru. You should also note that the basic elements are not encoded by a fixed number of bits.
Unquote
——————————————————————–
You’ve finally quoted my comments from our very first discussion in late 2004! At that stage, Anuradha and I, somewhat naively, thought we could explain Unicode Sinhala to Donald.
I incorrectly assumed that Donald was an academic and hence described Unicode Sinhala in an overly mathematical manner (please read this short email):
http://sinhala.sourceforge.net/archive/akuru.org/0013.html
Donald Gaminitillake on Jul 11th, 2006 at 9:39 am wrote:
——————————————————————–
Please confirm whether you have a hidden “union” character table apart from the few characters registered in Unicode = SLSI 1134.
“yes” or “no”
——————————————————————–
NO. I guess 1.5 years on, you still can’t understand this. There is no *”hidden” union*.
Very simply, code points U+0D80 to U+0DFF represent the Sinhala codepage, and code point U+200D represents a ‘Zero Width Joiner’ (ZWJ), which is shared amongst all the South Asian and some other scripts. That is why ZWJ is not in the codepage of any single South Asian script; it is in a ‘shared’ codepage. All the aforementioned code points are officially registered with Unicode and are a part of the *standard*.
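A minimal Python sketch of the point above: it lists each code point in a conjunct and reports whether it falls inside the Sinhala block (U+0D80 to U+0DFF). The example sequence, KA + AL-LAKUNA + ZWJ + YA for the yansaya form “kya”, follows the SLS 1134 convention as I understand it; the helper name `describe` is purely illustrative.

```python
import unicodedata

# Sinhala block, U+0D80..U+0DFF inclusive
SINHALA_BLOCK = range(0x0D80, 0x0E00)


def describe(text):
    """Return (code point, Unicode name, inside Sinhala block?) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ord(ch) in SINHALA_BLOCK)
        for ch in text
    ]


# Yansaya conjunct "kya": KA + AL-LAKUNA + ZWJ + YA
for row in describe("\u0D9A\u0DCA\u200D\u0DBA"):
    print(row)
```

Only U+200D (ZERO WIDTH JOINER) reports False: it lives in the shared General Punctuation block, while the other three come from the Sinhala codepage.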
Please reread my earlier comments for a detailed explanation:
1) http://www.lirneasia.net/2006/04/questioning-ict-myths/#comment-1571
2) http://www.lirneasia.net/2006/04/questioning-ict-myths/#comment-1703
Regards,
Harshula
Hi,
There really seem to be two main discussions (*) occurring here:
1) Unicode Sinhala (SLS1134:2004) is complete, versus those who disagree.
2) Why isn’t Unicode Sinhala (SLS1134:2004) support available in software released before 2004?
Can we talk about (1) for the moment. Other than Donald, does anyone think Unicode Sinhala (SLS1134:2004) is incomplete?
(*) Of course I’m ignoring all the slander and axe grinding that has been going on in these threads.
Regards,
Harshula
Dear Harshula
Tell me what happens after the ‘Zero Width Joiner’ (ZWJ).
These joiners are listed under General Punctuation, Table 35, row 20.
Once you input a joiner, it goes in search of a code point to represent the character.
It was Harshula who spoke about a union of characters, and now you say “NO”.
I quote from your 2004 mail:
For extensibility purposes, the Unicode encoding ends up involving more than one simple matrix. In our case it is roughly the union of:
* A1
* A2
* C (= Cartesian product of A2 and A3)
* D (= a subset of the Cartesian product of B, A3, A2 and C2 (C2 = Union of C and A2))
This union produces a set of encodings containing all the basic elements (letters). Ironically, this union not only contains your 1660 letters, it also includes baendi akuru. You should also note that the basic elements are not encoded by a fixed number of bits.
For the reasons stated above, it is incorrect to claim that the Unicode Sinhala encoding is incomplete.
Unquote
Please show us the locations of these elements in Unicode or SLSI 1134: “This union produces a set of encodings containing all the basic elements”.
Where are these basic elements? Give one example for “DU” (the result after the joiner).
Just as the joiner has been registered in Unicode, give me the other locations that lie beyond the Sinhala Unicode list or SLSI 1134.
Donald Gaminitillake
Colombo
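To make the quoted “union” idea concrete, here is a toy Python sketch. It assumes, for illustration only, that A2 stands for a few consonants and A3 for a few dependent vowel signs, so that C is their Cartesian product; the other sets in the quoted email (A1, B, D) are not modelled, and the subsets below are hypothetical.

```python
from itertools import product

# Hypothetical small subsets, for illustration only
A2 = {"KA": "\u0D9A", "DA": "\u0DAF", "YA": "\u0DBA"}  # consonants
A3 = {"AA": "\u0DCF", "U": "\u0DD4", "UU": "\u0DD6"}   # dependent vowel signs

# C = A2 x A3: every consonant paired with every vowel sign,
# each pair encoded as a two-code-point sequence
C = {
    (c_name, v_name): c + v
    for (c_name, c), (v_name, v) in product(A2.items(), A3.items())
}

du = C[("DA", "U")]
print(len(C))                             # 9 combinations
print([f"U+{ord(ch):04X}" for ch in du])  # ['U+0DAF', 'U+0DD4']
```

In this model “DU” is a member of C as the sequence DA + KETTI PAA-PILLA, built from registered code points rather than given a code point of its own.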
I am sorry, I cannot write this under my own name. Had I done so, this post would obviously have carried more weight. In fact, I did not want to contribute at first, but then I thought that if I did not, what I am going to tell now would be hidden forever. Hence this post.
Anyway, let me tell you what I know.
None of Donald, Ahangama, Dharma, Harsha, Harshula or Anuradha (or anyone else here) knows much about the history of this topic. Let me educate them.
At a certain committee meeting at CINTEC in early 2002, somebody pointed out that Sinhalese usage in computers is (was) extremely low, in spite of the fact that there is (was) THE Unicode. (There were no Donald Gaminitillakes then, and more or less everyone accepted 100% that Unicode was THE ONLY STANDARD.)
Everybody agreed with this, and as a result, it was decided to assign the task of *POPULARISING* Sinhalese in computers to Dr. Gihan Dias. (With the consent of all those who were present, including *Prof. V. K. Samaranayake.*)
The tasks Dr. Dias was supposed to perform, inter alia, were:
1. Talk to all major OS manufacturers (especially Microsoft) and persuade them to support Sinhalese in the respective OSes.
2. Conduct a contest among Sinhalese font developers and give awards to the best ones. (So that more Unicode-based Sinhalese font sets would be available to users, solving the incompatibility issues.)
3. Discuss with newspaper companies (who are among the top users of Sinhalese fonts) and persuade them to use Unicode-based fonts instead of the proprietary fonts they were using. (So that sharing information would no longer be a problem.)
4. Encourage Sinhalese content developers, both inside and outside government, to produce more and more content in local languages
5. Conduct workshops / lectures / seminars on related topics to educate the developers and the users
Please note this was in early 2002, and there was no e-Sri Lanka, nor ICTA. SMS usage was minimal, and nobody bothered about sending SMS in Sinhalese. Ditto for OCR and WAP. They were not big issues. The only issue was to have a single set of fonts for Apple and IBM environments to ease the difficulties faced by *PUBLISHERS, WEB DEVELOPERS AND WEB USERS*, thereby increasing Sinhalese usage in computers.
The information about this initiative appears at http://www.fonts.lk/intro.html, but under a later dateline. That is probably the date the committee first met, but the project has a much longer history.
Then came the big tamasha e-Sri Lanka in 2003, and suddenly everyone wanted to be a part of that. Not that I blame anyone. When there is a national initiative in ICT, everyone wants to be in it, rather than out of it. In addition, they were given high hopes of good compensation packages.
What followed was the saddest episode.
Dr. Gihan Dias exploited this project so that he could get into ICTA. Apart from this project, he has never been involved in any national-level *development* projects. Technical, yes; but development, no. He was a good techie, no doubt, but that was all. He joined as the Director – HR, though he never had any experience in HR. (In fact, he never did anything related to HR in ICTA.)
So, to make a long story short, overnight Dr. Dias changed his colours and joined ICTA, leaving Prof. Samaranayake and company (who were among the top critics of ICTA then) alone, crying in the road. Naturally, Prof. Samaranayake was badly hurt. He had told many people, “Gihan has hijacked MY project.” (That was largely true.)
(In the Computer Society Annual Sessions of 2003, Prof. Samaranayake, as the Chairman of one of the sessions, brought up this topic and dropped a hint. Dr. Dias immediately got up and responded. The debate that followed between the two amused many in the audience.)
Coming back to the topic, needless to say, Dr. Dias never achieved any of the above initial objectives. He was not the best person to do so. He was only a techie. Further, those objectives were assigned to him by CINTEC and not by ICTA, and back in 2003 CINTEC was a four-letter word that nobody wanted to mention in public. So everyone forgot what CINTEC wanted.
The only achievement of Dr. Gihan Dias was SLS 1134 (which he developed with the help of a few other software companies). I personally do not see that it has brought about any major changes.
Somewhere in the middle (I am not sure exactly when) we also started hearing about a Donald Gaminitillake, who was very critical of Unicode/SLS1134/ICTA etc. One reason Donald became so famous was that he got caught up in the tussle between Dr. Dias and Prof. Samaranayake. Each of them wanted Donald to attack the other, so they tolerated him to a certain extent. If both of them had worked together against Donald, the latter would not have had any chance. Fortunately for Donald, he fought against a divided force. Prof. Samaranayake and Dr. Dias never joined hands to fight the common enemy, Donald.
I am not a techie, and frankly I do not understand what Donald says. So I will let the experts decide whether it is necessary to give him an ear or not.
This brings us to the present. Although four years have passed, anybody can see we are still at square one.
It is unfortunate that most of the concerns raised in 2002 are still valid.
1. Sinhalese usage in Computers is still extremely low.
2. There is no single set of fonts that everyone accepts to be used by publishers and web developers. (So the compatibility problems still exist)
3. There are so many incompatibilities between Apple and IBM. (Any publisher will confirm this.)
4. You cannot chat using Sinhalese.
5. You cannot even write e-mails in Sinhalese (unless it is with somebody you communicate with regularly), because you do not know whether the receiver has the same set of fonts you have.
As far as I can see, all we have to do is address these issues. (Not to mention additional issues like sending SMS in Sinhalese and doing OCR in Sinhalese.)
All I can say is that we should not make the same mistakes we made in 2002.
The answer to all this is a correct Character Allocation Table for Sinhala.
Only I have done it: not CINTEC, not ICTA, not anyone else.
(For newcomers: Character Allocation Table, ISBN 955-98975-0-0. Contents include copyrighted and patent-pending areas, ©2000-2006.)
Take the Unicode Consortium. The whole Unicode Consortium is based on character allocation tables.
Sri Lanka registered SLSI 1134 in Unicode (under protest; only two parties lodged protests, the Sri Lanka Association of Printers and Donald Gaminitillake), which was an incomplete set of Sinhala characters.
No software maker will be able to develop a common font, simply because not all characters are represented in Unicode Sinhala or SLSI 1134.
Now this group is talking about “elements”. Where are these elements for Sinhala in SLSI 1134 or Unicode Sinhala? They also speak about a “union”, and at the same time they say NO to it.
Take China for an example. They have three sets of Character allocation tables.
Once the Character Allocation Table is given, the rest is up to the software developers.
In Sri Lanka, just as with dot LK, which has a monopoly, the same group wanted a monopoly over the Sinhala language. That is why they registered only part of the characters and hid the balance in the form of “elements” or a “union”.
They never expected me to come forward and point this out.
It all started (sometime in 1998) when Niranjan Meegammana wanted his Kandy set to be compatible with the Mac. Then I found that the problem was with the Character Allocation Table. (I was in Japan at this time.)
Japan too had this problem. That is why they have JIS one, JIS two, etc. Anyway, with some delay, they incorporated all the Japanese characters into the Character Allocation Table with proper code points. The same thing happened with the Koreans and the Chinese.
The OS developers, too, adapted their systems to accommodate any form of character table.
In the early days you could not install one-byte English MS-DOS on a computer that had been designed for use in Japan or Korea. Now these are no longer problems; nobody even knows about them.
The Indic group got stuck in the typewriter model and the ASCII limitation of 256, and could not get out of this loop.
Whoever took the initial steps in Sri Lanka had no knowledge of typography or typology and was not from the printing and publishing industry. The characters belong to this industry.
They only knew the Sinhala typewriter and typewriter technology.
We have to give full credit to Mr Wijesekera. He was the person who broke the Sinhala characters into parts and accommodated them in the limited space of a typewriter. This was Mr Wijesekera’s Character Allocation Table for the typewriter.
All the scripts in Unicode other than the Indic group are encoded as full, complete characters.
To solve this problem, all we have to do is publish the full Sinhala Character Allocation Table with correct code points (ISBN 955-98975-0-0).
Donald Gaminitillake
Colombo
Thanks, HeWhoMustNotBeNamed (Voldemort is also called by that name in Harry Potter!!!), for the explanation. This takes the biscuit!!!!!
Two experts, Dino and Gihan D, got into this mess because of their extreme greed for money, reputation and power, and became the butt of the joke in front of the whole country (not to mention those kids in Mahawilachchiya here). Both would have been respected figures in the country if they had used their brains and realised what they can and cannot do. (I still cannot understand why Dino stood against Mahawilachchiya, though, as there is no logic in sabotaging their future.)
Was this (the UNICODE issue) the reason why Prof prevented Donald from talking at CSSL some years back? But why did he sabotage the kids’ presentation????? What is the logic?????
By what logic do people like VK (certified crooks) become the big shots at ICTA under a president like MR? Whose Chinthanaya is this??????
Gihan D had a good track record among Moratuwa undergrads when he was younger, and now such a respected figure is in tatters due to extreme greed.
Why do all these people try to JUMP INTO a place like ICTA? This is a serious project, and only those who are capable of doing something should be there. Manju H seems lucky to be out of this mess…..
When Manju is out and VK is even stronger, will he destroy Mahawilachchiya again????? Will MV become the primary target of the TERMINATOR VK now???
In the end, will Lankans be able to use proper Sinhala on their PCs? How long must Donald waste money on the Internet writing down the same facts over and over? Will there be any little Donalds to take his ideas forward in future?
Harshula,
I will be very grateful if you do not misinterpret what I say.
[quote]
2) Why isn’t Unicode Sinhala (SLS1134:2004) support available in software released before 2004?
[unquote]
This is NOT the issue, as I have explained so many times. This is only a very poor misinterpretation of the real issue.
What I ask, simply, is why, even after more than 15 years of efforts to introduce Sinhala to computers, we are still not successful (as the Nepalis or Bengalis are)? Why do we have so few Sinhala web sites? Why is the Sinhala content on the net negligible? Why can’t the BBC have a Sinhala site, when they can have Tamil, Nepali and Bengali versions? Why can’t I use Sinhala on my computer, though I can use Tamil, Arabic and Nepali characters? Why do we use so many different sets of fonts based on different character mappings? WHY DO WE STILL NOT HAVE A SINGLE STANDARD ACCEPTABLE TO ALL?
Please also remember Unicode Sinhala is NOT the same as SLS 1134. Unicode Sinhala was approved in 1998 and SLS 1134 came in 2004. There was a six-year gap in between.
I am sure the long post above by HWMNBN will open your eyes.
BTW, thanks HWMNBN for the detailed explanation. It shows how the IT experts in this country behave.
As for your first question, the users do not mind the technical details. The users want a SOLUTION. Whether it is Unicode or JCcode or Harshulacode or Donaldcode is not important to them.
I met a person working at PC House. He knew about this CAT problem. I told him to write to this chat.
They back off, saying they cannot fight with the big people: “Loku minnusth ekka happenna ba”.
This backward timidity is what we have to get over if we are to develop Sri Lanka.
Do not worry: there are many small “Donalds” around to take over, but they are shy to come forward.
Quote
I still cannot understand why Dino stood against Mahawilachchiya
unquote
This was simply because a group of students without even O/Levels was talking IT in front of a Professor. This is the digital development and top-down situation the country needs.
I was kept out for the same reason. I am not one of them; I am not from a university of Sri Lanka. I put forward my paper on the Character Allocation Table. Instead of putting my paper into the waste-paper bin, they should have taken it up. You have to hand over the copyright of your writing to the CSSL to publish any article in this seminar. Wow, with that they could have had my rights too.
They missed this chance. If they had permitted my presentation and taken my copyrights into their hands, I would not be as strong as I am now.
Their intention is not to educate the village students in Sri Lanka, even though we have a free education system. Please visit any village: there are no teachers in the schools. But these students will have to sit public examinations when they reach the age.
In some places the computers are still kept in their original boxes.
Donald Gaminitillake
Colombo
Dharma,
I have been visiting the LIRNEasia site for some time, and I share my views in threads that interest me and in those where I can contribute based on our experience in the industry. I am not visiting this site to DEFEND or HARD SELL the Sinhala Unicode standard to anyone, but rather to express my views based on reality. People who really want to work in Sinhala, with or without a standard, work in Sinhala even today, be it word processing, publishing, emailing or whatever, despite the interoperability issues of not having a standard.
I agree with you, and perhaps with some of the comments in post 5), that we have wasted a lot of time on this matter without getting the required deliverables in place at the right time. As I always say, this thread shouldn’t have existed if things had been resolved and standardized and technical products released to the market long ago.
Convincing the major OS players should have been done even before 2002; that was perhaps the failure of the respective teams and committees involved in this process. Now, since we have missed the earlier buses, standardization has become a big issue because things did not progress earlier.
All I have been saying is that the Sinhala Unicode that has now been adopted can be technically implemented. I also agree with you that, for this to become the standard, everyone should start using it, from Apple to IBM to Linux to whatever, with complete interoperability. If someone (say Donald) comes with a technically superior product and pushes it hard to end users, including government, NGOs, printers/publishers, major OS vendors and others, it can become the standard in the marketplace, and SLS1134 will be the standard only in books.
Finally, as far as I know, a lot of companies are now releasing products based on Sinhala Unicode, and eventually, when all end users, including publishers, start using it, it will become the standard. Thereafter no one will be in a position to change it, since it will be widely used and interoperable like English.
Harsha,
Yes, I agree with you.
However, you have not answered my earlier question. Have you seen it said anywhere that Windows VISTA (or whatever the next version is) will support Sinhala?
I have never read anybody from Microsoft saying this and that is why I keep my fingers crossed.
Dear Harsha Purasinghe
We do have a problem in implementing Sinhala.
Unicode Sinhala lacks many characters. Where are they?
You just avoid this question.
If they are in “elements” or inside a “union”, can you give their locations as registered in Unicode?
Tell me what happens after the ‘Zero Width Joiner’ (ZWJ) in a sequence.
These joiners are listed under General Punctuation.
Also explain how OCR works with the present Sinhala Unicode, using the character “DU” as an example.
Unless “DU” is physically there in Unicode Sinhala, or in the hidden union or elements, OCR will not recognise the character “DU”.
Get into the mechanism and write simply.
Donald Gaminitillake
Colombo
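The OCR question can be framed with a toy Python sketch: a recognizer classifies a printed shape, and each class is then emitted as whatever Unicode sequence represents it, which may be several code points. The glyph-class names are hypothetical and no real OCR engine is implied; “KYA” assumes the SLS 1134-style yansaya sequence KA + AL-LAKUNA + ZWJ + YA.

```python
# Hypothetical OCR glyph classes -> the Unicode text each one stands for
GLYPH_CLASS_TO_TEXT = {
    "DA": "\u0DAF",
    "DU": "\u0DAF\u0DD4",              # one printed shape, two code points
    "KYA": "\u0D9A\u0DCA\u200D\u0DBA",  # one printed shape, four code points
}


def to_text(recognized_classes):
    """Join the Unicode sequences for a list of recognized glyph classes."""
    return "".join(GLYPH_CLASS_TO_TEXT[c] for c in recognized_classes)


text = to_text(["DU", "KYA"])
print([f"U+{ord(ch):04X}" for ch in text])
# ['U+0DAF', 'U+0DD4', 'U+0D9A', 'U+0DCA', 'U+200D', 'U+0DBA']
```

Under this assumption the recognizer needs a class for the “DU” shape, but the stored output does not need a dedicated “DU” code point.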
I have done some web searching on the topics under discussion, and what’s out there does not look good for our poor Donald.
1. I personally do not think Donald or anyone else has sole ownership of a character allocation table. It was there in many hodi pothas (Sinhala alphabet primers), probably even before Donald was born. However, even in terms of computers, Donald is NOT the first one to publish a character allocation table. One published in 1994 appears at http://userweb.pdn.ac.lk/~nimalr/sinhala/lreport.pdf
So there go our Donald’s pipe dreams of a patent!
2. The site http://www.bhashaindia.com shows how thirteen Indian languages are represented in computers based on Unicode character tables. If you spend even ten minutes on this site, you will realise how they have successfully used Unicode to implement Indic language applications. Sinhala is not a very different language, and Sinhala letters have a one-to-one relationship with Devanagari. So if you can use Unicode Hindi in computers, there is no reason why you cannot use Unicode Sinhala.
If Donald continues to ask his question (like a broken gramophone!), let me say how to write ‘Du’. First check how they do it in Devanagari. Then substitute the appropriate Sinhala characters. You will get your answer.
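The recipe above can be sketched in Python: write ‘Du’ in Devanagari as DA + VOWEL SIGN U, then substitute the corresponding Sinhala code points. The two-entry mapping is hand-picked for this one example and is an assumption, not a complete Devanagari-to-Sinhala table.

```python
import unicodedata

# Hand-picked correspondences for this example only (not a full table)
DEVANAGARI_TO_SINHALA = {
    "\u0926": "\u0DAF",  # DEVANAGARI LETTER DA -> SINHALA LETTER ALPAPRAANA DAYANNA
    "\u0941": "\u0DD4",  # DEVANAGARI VOWEL SIGN U -> SINHALA VOWEL SIGN KETTI PAA-PILLA
}


def substitute(text):
    """Replace each mapped Devanagari code point with its Sinhala counterpart."""
    return "".join(DEVANAGARI_TO_SINHALA.get(ch, ch) for ch in text)


du_devanagari = "\u0926\u0941"  # Devanagari "du"
du_sinhala = substitute(du_devanagari)
for ch in du_sinhala:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Both scripts encode ‘Du’ the same way: a consonant code point followed by a vowel-sign code point.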
This was done by a person called Yannis Haralambous.
Anyway, my table contains more than this list, with proper code points for Sinhala.
My copyrights do exist in Sri Lanka, as does the pending patent.
My pending patent has more contents than a table.
Also, we don’t have an IME for Sinhala.
Indic IME 1 (v 5.0) is a common setup for Tamil, Kannada, Gujarati, Hindi scripts. It allows you to install the IMEs for the required scripts. It is an enhanced version of the previous Indic IME 1.
For an IME to run, all the code points have to be defined.
We also have to develop several IMEs for use on different systems.
IME: Input Method Editor, with thousands of potential code point combinations.
For “DU” we do not have registered CODE POINTS, DHARMA, and we also do not have an IME.
Before an IME, we have to publish all the code points, which I have done (Character Allocation Table, ISBN 955-98975-0-0), not Yannis Haralambous.
Donald Gaminitillake
Colombo
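The IME discussion above can be illustrated with a toy phonetic input method that maps romanized keystrokes to sequences of existing Sinhala code points using greedy longest-match. The romanization is hypothetical (it is not JC’s Romanized Sinhala or any published scheme), and a real IME would cover far more letters and the ZWJ conjunct forms.

```python
# Hypothetical romanization, for illustration only (not JC's Romanized Sinhala)
CONSONANTS = {"k": "\u0D9A", "d": "\u0DAF", "y": "\u0DBA", "r": "\u0DBB"}
# Vowel signs written after a consonant; "a" is the inherent vowel, so it adds nothing
VOWEL_SIGNS = {"aa": "\u0DCF", "a": "", "uu": "\u0DD6", "u": "\u0DD4"}
# Independent vowel letters, used when no consonant precedes the vowel
INDEPENDENT_VOWELS = {"aa": "\u0D86", "a": "\u0D85", "uu": "\u0D8C", "u": "\u0D8B"}
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA (virama)


def match_longest(text, i, table):
    """Return the longest key of `table` found at position i of `text`, or None."""
    for key in sorted(table, key=len, reverse=True):
        if text.startswith(key, i):
            return key
    return None


def transliterate(roman):
    """Convert romanized keystrokes into a string of Sinhala code points."""
    out, i = [], 0
    while i < len(roman):
        cons = match_longest(roman, i, CONSONANTS)
        if cons is not None:
            out.append(CONSONANTS[cons])
            i += len(cons)
            vowel = match_longest(roman, i, VOWEL_SIGNS)
            if vowel is None:
                out.append(AL_LAKUNA)  # no vowel follows: mark a bare consonant
            else:
                out.append(VOWEL_SIGNS[vowel])
                i += len(vowel)
            continue
        vowel = match_longest(roman, i, INDEPENDENT_VOWELS)
        if vowel is not None:
            out.append(INDEPENDENT_VOWELS[vowel])
            i += len(vowel)
            continue
        out.append(roman[i])  # pass anything unmapped through unchanged
        i += 1
    return "".join(out)


print([f"U+{ord(ch):04X}" for ch in transliterate("du")])  # ['U+0DAF', 'U+0DD4']
```

Typing “du” here yields DA followed by KETTI PAA-PILLA, so this style of IME only needs the individual code points to be defined, not one per combination.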
Donald,
[quote]
This was done by a person called Yannis Haralambous.
Anyway, my table contains more than this list, with proper code points for Sinhala.
[unquote]
The issue is not the name/ethnicity of the person or the number of code points.
The point is, SOMEBODY HAS ALREADY BEATEN YOU!
So how ethical is it for you to talk about intellectual property rights and patents?
More Dharma
Quote from http://www.bhashaindia.com/Developers/MSTech/IndicSupport/indiclife.htm
….
These formatted glyphs, based on the underlying code points, are sent on for display …
text is stored as code points until it needs to be displayed…..
unquote
For Sinhala we have not defined these code points. Only a limited number is registered; the rest are hidden.
This is the problem that I am addressing.
That is the issue over here.
Donald Gaminitillake
Colombo
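The quoted passage separates storage (code points) from display (glyphs). A toy Python sketch of that split: the stored text is a plain code-point string, and a hypothetical font table maps code-point sequences to glyph names by longest match, which is loosely what a shaping engine does with a font’s substitution rules. The glyph names are invented, and real OpenType shaping is far more involved.

```python
# Hypothetical glyph table: code-point sequence -> glyph name in some font
GLYPHS = {
    "\u0DAF": "da",
    "\u0DD4": "u_sign",
    "\u0DAF\u0DD4": "da_u",                    # precomposed glyph for "DU"
    "\u0D9A": "ka",
    "\u0DBA": "ya",
    "\u0D9A\u0DCA\u200D\u0DBA": "ka_yansaya",  # conjunct glyph
}
MAX_LEN = max(len(k) for k in GLYPHS)


def shape(stored):
    """Map stored code points to glyph names, preferring the longest match."""
    glyphs, i = [], 0
    while i < len(stored):
        for n in range(min(MAX_LEN, len(stored) - i), 0, -1):
            chunk = stored[i:i + n]
            if chunk in GLYPHS:
                glyphs.append(GLYPHS[chunk])
                i += n
                break
        else:
            glyphs.append(".notdef")  # the font has no glyph for this code point
            i += 1
    return glyphs


stored = "\u0DAF\u0DD4\u0D9A\u0DCA\u200D\u0DBA"
print(len(stored))    # 6 code points stored
print(shape(stored))  # ['da_u', 'ka_yansaya'] displayed
```

The same six stored code points could be drawn differently by another font table without changing the text itself.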
Okay, I have found the list of new languages that will be supported in Windows VISTA.
Alsatian (France).
Amharic (Ethiopia). Unicode only.
Assamese (India). Unicode only.
Bashkir (Russia).
Corsican (France).
English (India)
English (Malaysia)
English (Singapore)
Greenlandic (Greenland)
Hausa (Nigeria)
Khmer (Cambodia). Unicode only.
K’iche (Guatemala).
Kinyarwanda (Rwanda).
Lao (Lao P.D.R.)
Lower Sorbian (Germany)
Mongolian (PRC)
Sinhala (Sri Lanka). Unicode only.
Spanish (United States)
Tajik (Tajikistan)
Tamazight (Algeria, Latin)
Tibetan (PRC)
Tibetan (Bhutan)
Turkmen (Turkmenistan)
Uighur (PRC)
Upper Sorbian (Germany)
Wolof (Senegal)
Yakut (Russia)
Yi (PRC)
Yoruba (Nigeria)
Can somebody explain what they mean by “Unicode only”? Does it mean we can type in Sinhala but the menus, help, error messages, etc., will not be in Sinhala yet? (Even that is not bad for a start!)
Dharma,
As far as I know, Microsoft wants to support Sinhala (Unicode, i.e. Sinhala Unicode, not any other) in the upcoming Windows Vista. I had an opportunity to meet one of the key regional MS figures, and I asked him about Vista’s support for Sinhala; he too said they want to support it. Further, I have credible information that Bill Gates himself, in one of his emails to regional people, has mentioned including Sinhala support in the upcoming Vista.
So what’s the big deal? Will this happen?
Well, again as far as I know, for the above to happen ICTA has to take some initiatives and update Microsoft accordingly. Basically, MS and ICTA have to finalize the agreement related to the Sinhala Language Interface Pack, which is based on Sinhala Unicode. I am not sure how far this has gone; perhaps they have finalized it by now.
But if you don’t see Sinhala in the next Vista, it will definitely be due to the issue I have mentioned above.
By talking about reality I might get marginalized and sidelined in the industry, but I would rather talk about reality as I see it. Like you, I and hundreds of thousands of MS Windows users want to see this support in the next version of Windows.
I hope the people involved in the respective decision making will do the needful before we miss another BUS; otherwise Linux will be left as the only alternative for using full-blown Sinhala Unicode, and MS Windows users will have to rely on third-party products that support Sinhala Unicode.
Donald,
I am sorry to disappoint you, but I am not going to waste any of my time answering your same repeated questions. You have been given clear explanations by the Linux user group and many others on this subject, and you have failed to understand how Unicode works, as pointed out earlier. You can keep on pushing your standard, which is on paper; by that time either Sinhala Unicode or something else will emerge and will be used around in Sri Lanka. What’s important is the technical implementation of these standards and people practically starting to use them.
Harsha and others,
Donald repeats the same thing as nobody has answered or given a solution for the problem he is talking about. Silencing Donald doesn’t mean the problem is solved. We all have to understand this.
Though everyone talks big here, we still cannot read a Sinhala webpage without downloading fonts, and cannot chat or email in Sinhala. So asking Donald to shut up will not be the solution.
VKS
Dear Harsha
Neither the Linux group nor you have answered the question:
Where are the “elements” or the “union” listed in Unicode?
But the most interesting comment was “or something else will emerge and will be used around in Sri Lanka.”
What is this something else!!! Something better than Unicode by the same group!!!!
Again, hidden concepts and agendas.
As Crossed said, “Donald repeats the same thing as nobody has answered”.
Dharma, “SOMEBODY HAS ALREADY BEATEN YOU!”
No one has beaten me.
His table has no code points. Please visit the patent office and read my pending patent 13120.
If you think Yannis Haralambous’s CAT is better than mine, why not use it?
Yet the present Unicode is an incorrect and incomplete set of Sinhala characters.
I have also written to Yannis Haralambous for his comments.
Donald Gaminitillake
Colombo
Harsha is a businessman and has a right to defend his product. But why is everyone who wasted money on Unicode (which has proved a total farce) silent on this issue? There is a brochure published by ICTA (Donald must see this) saying that they have found the answer for using Sinhala on PCs. But to see even the Sinhala version of ICTA’s website, we need to download the font. Why don’t we have an answer for the problem yet?
Dear Too crossed
I have given the answer so many times, like a broken gramophone! (as Dharma says)
Publish the complete Character Allocation Table for Sinhala, which I have done.
Give me the credentials to make software similar to an IME to run on PC, Apple and Linux.
The key-in method could be Wijesekera or JC’s Romanized Sinhala.
“brochure published by ICTA”: if you can, please post me one.
Donald Gaminitillake
Colombo
[quote]
Further, I have credible information that Bill Gates himself, in one of his emails to regional people, has mentioned including Sinhala support in the upcoming Vista.
[unquote]
Lolz! I think Harsha thinks the others in this forum are konde bedapu cheenas (gullible fools). Bill Gates might not even know what Sinhala is. Moreover, he is no longer officially at Microsoft, so he cannot say anything regarding Sinhala in Windows Vista.
This kind of poor publicity is good for political stages. (Like “Do not worry, we will achieve peace within one year”) Unfortunately for Harsha, the users of this forum are not as stupid as the average voters in the country.
These podiens like Harsha go to great lengths to save their masters, Dino and Gihan D.
Cisco2000, you are correct. I wonder when Bill Gates spoke to the Vista developers!!
By the way, I am copying below a mail I received from Microsoft India on 15 November 2003.
Donald Gaminitillake
Colombo
———–
Dear Mr. Gaminitillake,
Thank you for your confidence in us.
The fact of the matter is that what you are proposing and what Microsoft is following, actually supplement each other!
Your main request of supporting the QWERTY keyboard layout for Sinhala and Tamil is not at all an unreasonable request. Users the world over have graduated from traditional typesetting systems, and there are strong legacy reasons for which it is sometimes prudent to support multiple keyboard layouts for any language.
What Mr. Paul Nelson has been trying to explain, sir, is that this is possible, without compromising on the Unicode support!
As I had explained on the phone, the font, the keylayout and the char storage are all different entities. One can provide support for multiple keyboards, and still carry out character storage in Unicode, which Microsoft strongly embraces for all languages.
For your information, we already provide IME kind implementation for Hindi, Tamil, Gujarati and Kannada through ISVs in India. This kind of an IME is similar to a Japanese one, though of course much simpler.
To take this forward, I suggest that I work with you, to ensure that your QWERTY keyboard “is” supported in our forthcoming Sinhala support.
To do this, I request you sir, to please send me:
· The QWERTY keyboard layout for Sinhala, as you would want it to be implemented
· The QWERTY keyboard layout for Tamil, as you would want it to be implemented
Based upon your layout, I shall engage a partner into this exercise, and work to ensure that when we release Sinhala support for Windows, the layout is supported.
I hope this helps.
Thanks & I look forward to hearing from you.
Raveesh Gupta
Program Manager: Localization
Microsoft Corporation (India) Pvt. Ltd.
The Great Eastern Centre
New Delhi 110019
India
Ph: 91 112 6294600, ext 131
Cell: 9811202123
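Mr. Gupta’s point that the key layout, the font and the character storage are separate entities can be sketched in Python. Two keyboard layouts with made-up key assignments (these are NOT the real Wijesekera layout or any published QWERTY layout) produce the same stored Unicode text, and therefore identical UTF-8 bytes; which font later draws that text is a separate matter.

```python
# Hypothetical key assignments, for illustration only
LAYOUT_A = {"j": "\u0DAF", "f": "\u0DD4"}  # stands in for a typewriter-style layout
LAYOUT_B = {"d": "\u0DAF", "u": "\u0DD4"}  # stands in for a phonetic QWERTY layout


def type_keys(layout, keys):
    """Turn a sequence of key presses into stored Unicode text."""
    return "".join(layout[k] for k in keys)


text_a = type_keys(LAYOUT_A, "jf")
text_b = type_keys(LAYOUT_B, "du")

# Same storage regardless of which layout was used to type it
print(text_a == text_b)        # True
print(text_a.encode("utf-8"))  # b'\xe0\xb6\xaf\xe0\xb7\x94'
```

Supporting another layout means adding another table of this kind; the stored “DU” stays U+0DAF U+0DD4 either way.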
Cisco2900,
I am not particularly keen to answer people who hide their identity and make accusations without knowing the facts. Perhaps you have not followed this long thread, which started some time ago, closely from the beginning; I have repeatedly said that I am talking about reality as I see it, nothing else.
It seems that you people who come to this site covered with a PAPER BAG (as Prof. highlighted earlier) accuse people without knowing who they are and whom they work for. People in the industry, including people like Donald, know who we are, as we join hands with Donald on other community projects. Even on the Sinhala matter we always have healthy discussions and never sideline him, as I believe his criticism has helped to fast-track Sinhala-related ICT work.
As I said, I am not interested in passing any information to a forum without knowing the credibility of its source, here the Bill Gates comment on Sinhala. As you know, there is an MS country office and a country manager you can talk to in order to check the credibility of this information, as the people at the MS-LK office are the ones who broke this news to others, and his note has triggered things to move at a fast pace at MS. Please note MS has a country office in Sri Lanka, and it is a real pity to assume the company’s chairman does not know about one of its country offices and, at least briefly, about some of the activities happening there. SO I SUGGEST YOU CALL THE MS OFFICE IN LK AND CHECK THE CREDIBILITY OF MY COMMENT! I don’t want to provide any false information to this forum. Further, I wrote this comment to answer Dharma and others who are keen to discuss real issues, not people who hide their identities.
I think this forum would have been much more productive if people stuck to the topic and talked about real issues rather than accusing people. It would also be great if this site could be restricted to people who come with their true identities and discuss important policy issues in a productive manner, rather than using it for personal attacks and showcasing their personal attitudes to others.
Crossed,
I always suggest that Donald implement his work technically and come up with working products rather than keep repeating the same “incomplete” story. If he comes up with a superior product and pushes it to the people concerned, it will become the standard, as I mentioned earlier. So I am not talking about silencing Donald. His voice is very important for fast-tracking development work related to Sinhala.
My dear Harsha,
Bill Gates ceased to hold any official responsibility at Microsoft in January 2000, when Steve Ballmer replaced him as the CEO.
In addition, he even resigned from the post of Chairman last month to work full time on the Bill & Melinda Gates Foundation. He is no longer at Microsoft.
I am surprised that an IT guru like yourself is not up to date on these happenings.
Microsoft is a commercial establishment and, like any profit-oriented institution, it usually boasts to impress its customers. Anybody who knows the market takes these claims with a pinch of salt.
But people like you think Bill Gates is so concerned about the development of the Sinhala language and Sinhala content that he waits behind his developers with a whip in his hand, forcing them to offer solutions in Sinhala! Lol!
Microsoft knows the market for Sinhala is negligible. Only 15 million or so people speak Sinhala, and out of these 15 million only one or two in a hundred actually use a computer. Very few of those actually purchase licensed copies. So why should Microsoft worry so much about introducing Sinhala to Vista?
Finally, it does not matter from where the information comes, it only matters whether the information is true or not.
Harsha,
You still don’t answer questions. We do not have a proper solution for using Sinhala on computers!!!!!! There is no point just being with Donald for community projects. We all do community projects with spare time and spare money, but we need to do our JOB properly. In your case, you haven’t done your job properly. So get together with Donald to do the JOB properly.
Do not curse the men with paper bags either. A dirty guy like VK cannot do what he wants now simply because of the men with paper bags. This is the e-democracy we wanted 20 years back. So the men with paper bags here don’t mean any harm to you; this is only to open your eyes. We know you are a good man with a good heart, and you are definitely not in VK’s gang. You are a decent, creative guy. When you put your Helawadana right, we will stop making comments here.
Not all men have the liberty and luck to come out without paper bags like Donald. He says he is the king, but others are just citizens, and if they remove their paper bags, VK will take them to the guillotine. Didn’t he take the all-powerful Manju to the guillotine???
So, Prof and Harsha, take the men in paper bags as angels, not as devils. We all want to see something productive happening in ICT. We have had enough of the VK mafia. Men will remove their paper bags when this mafia is beaten. You and we are friends, not enemies.
Cisco2900, Bill Gates will step down only in 2008. In the meantime he is very much involved with Vista. Why don’t you google Bill Gates and Vista and see the results? To the man with the paper bag and others who have just dropped by: please read the earlier threads before jumping in and making comments.
Harsha and others have explained a few times already that Sinhala language support is currently not available natively on Windows and Mac, although it is available on Linux. When Windows Vista comes out it will support Sinhala Unicode, and most of the current problems people face will be resolved.
Now you can continue to jump up and down asking why there is no support for Sinhala NOW?! I think various versions of history have been provided on this thread as to why the Sinhala Unicode standard wasn’t developed sooner; please feel free to pick whichever of the many explanations suits you.
Even Microsoft understood my proposal but the people in the “ARUMA PUDUMA RATA” just ignore the importance of my project.
Only I have the “Solution” for Sinhala IT.
I have proved this beyond any reasonable doubt. Why can’t the software developers and young engineers look at both sides of the coin and give an impartial comment?
Even with a highly protected application like Quark XPress, Tag data and data saved in EPS text format can be edited using any simple editor. This edited data can be re-exported into Quark to produce high-quality work.
All this is possible because of proper character allocation tables.
I hope the Gypsies group will be able to read this content and create a song, “WHY IT WHY”.
Donald Gaminitillake
Colombo
Well, well, well, Divakar,
Do you think all Sri Lankans can buy Windows Vista as soon as it is available? Most of the country’s PCs still run Windows 98/Me, let alone XP. Are you asking us to buy Vista???? What about the government money Gihan D and Dino devoured??? This is not as small an issue as you think. What Donald is talking about is not just Sinhala. He and the rest of the country have grave concerns over public funds wasted by the two “experts.”
Donald,
[quote]
Even Microsoft understood my proposal but the people in the “ARUMA PUDUMA RATA” just ignore the importance of my project.
[unquote]
I have a simple answer why nobody is interested in your so-called ‘solution’.
While we all understand the issues, you have failed to convince anyone (at least me, a user who has been more sympathetic towards you than most of the others here) that you have a solution, let alone THE solution.
[quote]
I have proved this beyond any reasonable doubt.
[unquote]
Maybe you have proved this to yourself, but things do not work that way. If you want it to be implemented, you have to convince OTHERS too (in addition to yourself). This is where you have failed.
I have explained many times what you should do if you want to convince others. If you do not listen, then even in another ten, twenty or hundred years you will do nothing but rant, rant and rant.
Cisco2900,
I am glad that Divakar answered you; it seems you are not up to date. Bill Gates will step down in mid-2008, and even that he is finding hard, since he loves the role he plays as Chief Software Architect, though Ray Ozzie is going to take over. (I suggest you either google it or read Fortune (July), “Bill Gates Reboot”; the above is based on these magazine interviews and sources.)
It’s a pity the way you are presenting my comment, which is based on a CREDIBLE source: none other than Microsoft itself. To find out whether it is true or not, please talk to the MS-LK office at the WTC and satisfy yourself, as I don’t want to publish FALSE information to other members of this forum who are very focused on the topic. As you can see, I made that post to answer Dharma, as I thought it would be better to share something credible I had come to know, since Dharma was quite keen to know about Vista’s support.
Man with a Paper Bag,
In short, all I can say is that if you all remove your paper bags and speak out confidently on the topic and on real policy/ICT issues with your real identities, as Donald, Dharma, JC and all the others do, NO MAFIAS will prevail in this country. People shouldn’t be afraid to talk about facts and realities.
Quote
I have a simple answer why nobody is interested in your so-called ‘solution’.
Unquote
Then why are they worried about my copyrights and pending patent?
Dharma is the very person who says “why does one have to download fonts every time”
“IT” in Sri Lanka is not only text development but also OCR, voice, GPS, SMS and much more.
All of it depends on a proper “Character Allocation Table” with proper code points.
Try to understand why the Latin script works: all its characters have individual code points.
The Sinhala characters registered in Unicode or SLSI1134 are just a few. If you can understand this difference, let me know.
Best is for you to meet me.
Donald Gaminitillake
Colombo
Donald,
[quote]
Dharma is the very person who says “why does one have to download fonts every time”
[unquote]
At least Unicode people have some solution, though it is cumbersome.
Do you even have that?
You only know how to criticise others and rant. You have not written even a single line of code.
Before trying to find fault with others, a wise man will think about whether he can offer an alternative.
There is a saying in Sinhala that a naked man should not find fault with the dirt on the clothes others are wearing.
Why do you avoid the simple question?
Sinhala Unicode does not have the total number of Sinhala characters.
Only a limited number are registered in this table.
That is what I am addressing.
I have asked for the Unicode-registered locations of the “elements” or “union”; nobody cares to answer this.
This is why Unicode Sinhala is incomplete and incorrect.
Quote
he can offer an alternative.
unquote
Yes I have published the alternate complete Character Allocation Table for Sinhala.
When this is highlighted, everyone finds other answers, avoiding the truth.
Quote
trying to find others faults
Unquote
This also confirms that Unicode Sinhala does have faults, and I am the only person raising a voice to protect the Sinhala language.
Donald Gaminitillake
Colombo
Reposting after removing the angle brackets …
Hi Donald,
Donald Gaminitillake on Jul 14th, 2006 at 1:26 pm wrote:
——————————————————————
Why do you avoid the simple question?
Sinhala Unicode does not have the total number of Sinhala characters.
Only a limited number are registered in this table.
That is what I am addressing.
——————————————————————
Those familiar with Unicode Sinhala (SLS1134:2004) realise what you are saying is untrue. You have been unable to prove your allegation after almost 2 years of your propaganda.
Let’s do a little experiment using your favourite word “Dumriya” [http://sinhala.sourceforge.net/archive/akuru.org/0020.html]. I’ll write the word in Unicode, with one letter on each row and the corresponding Unicode codepoints alongside in parentheses “()”.
Unicode Vs Donaldcode: Round 1
==============================
Unicode:
දු (0daf,0dd4)
ම් (0db8,0dca)
රි (0dbb,0dd2)
ය (0dba)
Donaldcode:
?
Now it’s your turn to do the same using Donaldcode. If you can’t, then Donaldcode is incomplete and a failure.
Regards,
Harshula
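The four rows of the experiment can be derived mechanically from the stored text. This minimal Python sketch groups each base character with the combining marks that follow it (Unicode general categories Mn and Mc). That grouping rule is a simplification assumed for illustration: it does not implement the full grapheme-cluster rules of UAX #29 and ignores ZWJ-joined conjuncts.

```python
import unicodedata


def split_letters(text):
    """Group each base character with the combining marks that follow it.

    Simplified rule (an assumption for illustration): characters whose
    general category is Mn or Mc attach to the previous character.
    Full UAX #29 grapheme rules and ZWJ conjuncts are not handled.
    """
    letters = []
    for ch in text:
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


word = "\u0DAF\u0DD4\u0DB8\u0DCA\u0DBB\u0DD2\u0DBA"  # දුම්රිය ("Dumriya")

for letter in split_letters(word):
    codes = ",".join(f"{ord(c):04x}" for c in letter)
    print(f"{letter} ({codes})")
```

Run as-is, it prints the same four rows as the Unicode table in the post, from දු (0daf,0dd4) down to ය (0dba).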
Okay!
I found out why my posts were not being allowed: I had not entered the email address!
My apologies to Dr. Divakar and professor Samarajiva for complaining. Thank you Divakar for letting me know about the new requirement of the email address. (Hmmm… that means all those pseudonym posts actually come from real addresses? or could the email address be false too?)
If you both as administrators object to my further using this system, please let me know. I will then thank you for letting me use your resources thus far and write my comments at:
http://groups.google.com/group/SinhalaUserGroup
Under the heading:
A Standard Sinhala for Computers and the Internet
—————————————————–
Now to the business of replying:
SOMEONE SAID THAT THAI UNICODE PAGE IS BEHIND SINHALA:
———————————————————————-
That is true. But the Thais are defying Unicode and using Latin code points in their national standard. The reason they give is simple: they want Thai messages to pass through all Internet roadblocks. RFC822 and its successors permit only Latin-1.
I say we can do the same, but in a less confrontational way, because we are a little island. Just use romanized Sinhala and use fonts to show romanized Sinhala in Sinhala script. It works. We need some programming to make it work on older computers too. For some people money matters, and some others cannot afford new computers.
So, those who are ardent supporters of Unicode use Unicode. And those who want full IT capabilities for the script use romanized Sinhala. It can support Wijesekera keyboard too. And there can be transliteration between the two.
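The byte-level side of this argument can be checked in Python. The sketch below only shows that the romanized spelling fits in ISO-8859-1 (one byte per character), while Unicode Sinhala does not fit there and takes three bytes per code point in UTF-8. It does not test any claim about what mail or web standards permit.

```python
romanized = "\u00F0umriya"                              # ðumriya
sinhala = "\u0DAF\u0DD4\u0DB8\u0DCA\u0DBB\u0DD2\u0DBA"  # දුම්රිය

latin1_bytes = romanized.encode("latin-1")  # every character is below 256
utf8_bytes = sinhala.encode("utf-8")        # U+0D80..U+0DFF take 3 bytes each

# Unicode Sinhala cannot be represented in Latin-1 at all.
try:
    sinhala.encode("latin-1")
    sinhala_fits_latin1 = True
except UnicodeEncodeError:
    sinhala_fits_latin1 = False

print(len(latin1_bytes), len(utf8_bytes), sinhala_fits_latin1)  # 7 21 False
```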
Microsoft is researching with us why the Sinhala font is not showing correctly in certain Windows XP machines. This test font is easy to use. It is unfortunate that none from Sri Lanka has asked for a copy. That seems to prove my point that the people who discuss here actually are not interested in learning the facts but belong to die-hard groups (or elites who actually don’t care about Sinhala, Professor Samarajiva, not you, you are the facilitator — I know now).
I invite your kind indulgence for a piece of advice given by an ancient person who died 2550 years back:
He was talking about Ekamsovaaða (one-sided belief):
The Lord Buddha advised his disciples to be flexible and not to be angry if someone gives a new or different kind of understanding to his teachings…
Thank you.
ANOTHER QUESTION WAS IF SINHALA UNICODE BLOCK IS INCOMPLETE:
——————————————————————————-
I think it is complete in the sense it represents Sinhala script.
However, it has a ligature named as a regular character: TAALUJA SANYUGA NAASIKYAYA (I might be spelling this wrong). This unfortunate decision makes Pali and Sanskrit incompatible with Sinhala Unicode. I tested a conversion; it failed, and there is ambiguity. The reason is that the older languages (and Sinhala grammar books) correctly treat this letter as the combination of HAL JAYANNA and TAALUJA NAASIKYAYA. A good test for a ligature (bænði akura) is that it does not take the hal kiriima, and TAALUJA SANYUGA NAASIKYAYA does not take a hal kiriima.
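JC's point about the jña letter can be illustrated in Python. U+0DA5 is a single code point, while the spelled-out form is JAYANNA + AL-LAKUNA + TAALUJA NAASIKYAYA (U+0DA2 U+0DCA U+0DA4). The sketch assumes Python's bundled Unicode database and compares the two spellings under every standard normalization form. None of them maps one spelling onto the other, so a converter or search tool would need its own mapping to treat them as equal.

```python
import unicodedata

ligature = "\u0DA5"              # jña as a single Sinhala letter
sequence = "\u0DA2\u0DCA\u0DA4"  # JAYANNA + AL-LAKUNA + TAALUJA NAASIKYAYA

# Compare the two spellings under every standard normalization form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, ligature) == unicodedata.normalize(form, sequence)
    print(form, same)  # False for each form
```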
Dear JC, I am glad you are able to post. We have never had any problems about your posting to this forum and you may continue to do so. I also hope you are able to appreciate that all of us at LIRNEasia are quite busy and when we have to respond to repeated email about administrative issues it takes time away from other time critical work. Thanks for your cooperation.
Your Unicode
දු (0daf,0dd4) = Sinhala letter alpapraana dayanna,Sinhala vowel sign ketti paa-pilla
This is not “DU”, just the two code points defined above.
If you read my chart, “DU” is represented by 3708 (tentative allocation).
Character allocation table ISBN 955-98975-0-0 (contents have copyright areas and patent-pending areas, ©2000-2006)
If you need my chart, let me have your postal address so I can send one.
Likewise the rest follows.
If I could paste JPG files into this, I would show the public the correct “DU”.
You were the people who said there are “elements” and “unions” after the joiner, but you failed to give the code points after the joiner.
You are just talking about typewriter techniques.
Even Microsoft confirms my system.
Quote
“Users the world over have graduated from traditional typesetting systems, ”
unquote
We are using computers not typewriters
Donald Gaminitillake
Colombo
Hi Donald,
Unicode Vs Donaldcode: Round 1 (cont.)
==============================
Unicode:
දු (0daf,0dd4)
ම් (0db8,0dca)
රි (0dbb,0dd2)
ය (0dba)
Donaldcode:
(3708)
?
?
?
Donald, you still have 3 more letters to fill. Hurry up. Or are those three so “tentative” that they are still unallocated in Donaldcode?
Donald Gaminitillake on Jul 14th, 2006 at 11:05 pm wrote:
————————————————————
දු (0daf,0dd4) = Sinhala letter alpapraana dayanna,Sinhala vowel sign ketti paa-pilla
This is not “DU”, just the two code points defined above.
————————————————————
It is “DU”. Modern technology, i.e. computers, are *smart* enough to understand that if a codepoint for paa-pilla comes after a ‘consonant’, then that paa-pilla is ‘applied’ to the ‘consonant’. In this case the paa-pilla is applied to the dayanna. Therefore it is well and truly “DU”.
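This can be checked with Python's standard unicodedata module. The sketch shows three things: "DU" is stored as exactly two code points, the paa-pilla is classified as a non-spacing combining mark (the property that tells a renderer to attach it to the preceding consonant), and no Zero Width Joiner is involved.

```python
import unicodedata

du = "\u0DAF\u0DD4"  # දු as stored in Unicode Sinhala

print([f"U+{ord(c):04X}" for c in du])  # ['U+0DAF', 'U+0DD4']
print(unicodedata.category(du[1]))      # Mn: non-spacing combining mark
print("\u200D" in du)                   # False: no Zero Width Joiner
```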
I know you might find this difficult to grasp, but this is modern technology, not like your ancient printing press technology. Donaldcode is like going 500 years back in time when the printing press was *new* technology in Europe.
What Donald doesn’t understand is that CJK (Chinese, Japanese, Korean) solutions are *not* suitable for South Asian scripts. CJK are logographic scripts (http://www.omniglot.com/writing/logographic.htm) and Sinhala is syllabic (http://www.omniglot.com/writing/syllabic.htm). Because they are completely different writing systems, the best solution for CJK is *not* the best solution for Sinhala.
The reason why Donald doesn’t understand this is because he is neither a linguist nor a computer scientist, but he tries to be an ‘expert’ in an area where he is a ‘novice’.
Regards,
Harshula
This post probably is hilarious. But I’d like to join in.
This example discussed is about,
ðumriya
Well, those who are human here, that is, those who gradually acquired knowledge as they grew up, might guess what that funny character ð is.
Now I invite you to go back and read the block of text these two gentlemen are arguing about:
දු
Oops! Unicode Sinhala broke my post halfway. Let me try again.
දු
ම්
රි
ය
If the above gets in, I'll write the balance.
Okay.
Now, looking at each Sinhala line, I see the characters decomposed into consonants and vowels, but when I typed them in they looked complete. Isn’t this a problem, good folks?
Now see romanized Sinhala:
ðu
m
ri
ya
That is decomposed too. Now, if you have the font, it is supposed to show the combined Sinhala characters.
The problem is that the underlying font renderer of this web application we use here is not properly written. There are lots of kinks in the system, which will take time to get fixed,
UNLESS, yes, unless WE write the drivers. We can turn the tables on M$ and sell them back to them. Yes, I truly believe in this. This could be Sri Lankan expertise for the world, just as the Indians are the call center for the world.
Donaldcode:
(3708)
?
?
?
Donald, you still have 3 more letters to fill. Hurry up. Or are those three so “tentative” that they are still unallocated in Donaldcode?
I told you either to send your address so I can post the table, or to go to the National Archives and read the table.
Du =3708
Mm=4501
Ri= 4806
Ya= 4702
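To compare the two approaches, a per-syllable table like the one described here can be sketched as a lookup from Unicode letters to table numbers. Only the four tentative values posted above are used; the mapping, the function names and the grouping rule are hypothetical. Any letter missing from the table cannot be converted, which illustrates that a syllable table has to enumerate every combination in advance, whereas Unicode builds them from a letter plus signs.

```python
import unicodedata

# The four tentative numbers exactly as posted; the mapping from Unicode
# letters to them is a hypothetical illustration covering only these four.
DONALD_TENTATIVE = {
    "\u0DAF\u0DD4": 3708,  # දු  "Du"
    "\u0DB8\u0DCA": 4501,  # ම්  "Mm"
    "\u0DBB\u0DD2": 4806,  # රි  "Ri"
    "\u0DBA": 4702,        # ය   "Ya"
}


def letters(text):
    """Attach combining marks (categories Mn/Mc) to the preceding character."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def to_table_codes(text):
    """Look up each Unicode letter in the per-syllable table (KeyError if absent)."""
    return [DONALD_TENTATIVE[letter] for letter in letters(text)]


print(to_table_codes("\u0DAF\u0DD4\u0DB8\u0DCA\u0DBB\u0DD2\u0DBA"))  # [3708, 4501, 4806, 4702]
```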
If you do not give a proper, absolute allocation table, there will be no OCR, no voice-to-text, etc. etc.
Even Microsoft understands my system. Only you guys are thinking of a typewriter.
IT is not typing.
Quote
CJK is *not* the best solution for Sinhala.
Unquote
Microsoft says (see No. 24), quote:
The fact of the matter is that what you are proposing and what Microsoft is following, actually supplement each other!
quote
This kind of an IME is similar to a Japanese one, though of course much simpler.
unquote
Read No 24 —
CJK system had solved all the IT problems
I know my Sinhala and I know my CJK system; I can derive the best for Sinhala.
Donald Gaminitillake
Colombo
Today’s Daily News (15/7/2006), page 25
As per IC- Credit Number 3986CE
Development of a web portal for District Secretariats and Divisional Secretariats (IFB: ICTA/SER/55)
We cannot send an email in Sinhala across every platform, and no website in Sinhala is available across every platform. Yet our Constitution and the Official Languages Department say we have to work in Sinhala and Tamil.
How can anyone develop a website and do content development in Sinhala???
This ad is by the MD/CEO of ICTA.
I think both persons should seek medical assistance at the Angoda mental hospital.
Please do not waste public funds on projects that are good but impossible to do in Sri Lanka in Sinhala, simply because we do not have proper code points for the Sinhala language across all platforms, unless you solve the language issue.
Donald Gaminitillake
Colombo
Two corrections for 47
WEB PARTAL = read as “WEB Portal”
came = “come”
Donald
Interesting site for Unicode:
http://www.decodeunicode.org/w3.php?ucHex=0D80
If you hover the pointer over a character, you will see its code point and definition.
My question is where are the rest of the sinhala characters?
This is why I say Unicode Sinhala is incomplete and incorrect
Donald Gaminitillake
Colombo
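The question about "the rest of the Sinhala characters" can be explored with a short Python sketch that lists the assigned code points in the Sinhala block (U+0D80 to U+0DFF) from the interpreter's own Unicode database. The block contains letters, vowel signs and other signs; a syllable such as දු does not appear as its own entry because it is encoded as a letter followed by a vowel sign. The exact count printed depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Assigned code points in the Sinhala block, according to the Unicode database
# bundled with this Python interpreter (the count varies by Unicode version).
sinhala_block = [
    (cp, unicodedata.name(chr(cp)))
    for cp in range(0x0D80, 0x0E00)
    if unicodedata.name(chr(cp), None) is not None
]

print(len(sinhala_block), "assigned code points in U+0D80..U+0DFF")
for cp, name in sinhala_block:
    if cp in (0x0DAF, 0x0DD4):  # the two code points that make up දු
        print(f"U+{cp:04X} {name}")
```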
Hi JC,
If you want to participate in the experiment, please follow the rules, namely:
——————————————————————–
… with one letter on each row and the corresponding … codepoints alongside in parentheses “()”.
——————————————————————–
“one letter” = equivalent of one Sinhala letter. So, in your case it might be two Latin letters.
Don’t forget to include the *codepoints*.
If you don’t provide a solution in this format, I’ll continue the experiment with only Unicode & Donaldcode. If you do provide a solution in the correct format then I’ll continue the experiment with Unicode, Donaldcode & JCcode.
Regards,
Harshula
Thank you for accepting Donald’s code.
Why don’t we add a few new pages to Unicode with Donald’s code?
Do not worry about the existing set; it will be there too. There are similar languages in Unicode.
Arabic and Hangul both have the parts and the full characters. Hangul still uses the parts for input, but the full characters are used by the IME for display and text.
Donald Gaminitillake
Colombo
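The Hangul comparison can be made concrete in Python. Unicode does define precomposed Hangul syllables that compose from, and decompose back to, conjoining jamo algorithmically under normalization. Sinhala has no precomposed syllables, so the same normalization leaves දු as two code points. A minimal sketch:

```python
import unicodedata

# Hangul: conjoining jamo (the "parts") compose into one precomposed syllable.
jamo = "\u1112\u1161\u11AB"                      # ᄒ + ᅡ + ᆫ
syllable = unicodedata.normalize("NFC", jamo)
print(f"U+{ord(syllable):04X}")                  # U+D55C (한)
print(unicodedata.normalize("NFD", syllable) == jamo)  # True

# Sinhala: no precomposed syllable exists for දු, so NFC keeps two code points.
print(len(unicodedata.normalize("NFC", "\u0DAF\u0DD4")))  # 2
```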
JC, your romanized Sinhala can be used as an input method, along with Wijesekera, in my derived system.
Whether one uses Wijesekera or JC’s system or any system that someone makes in the future, all will access the same code point and the character remains the same.
In my system even voice could be used as an input method. For that, the best person is Savinda, who has done lots of research in the Sinhala voice area.
Somebody will have to start Sinhala elocution classes for correct pronunciation. See, by accepting the existence of Donald’s code, new job opportunities are emerging. In Sri Lanka we do not have any elocution classes for Sinhala, though they are available for English.
Donald Gaminitillake
Colombo
Hi Donald,
Most importantly, thank you for accepting Unicode Sinhala (SLS1134:2004) is complete and realising that දු (0daf,0dd4) is actually “DU”.
Regards,
Harshula
With (0daf,0dd4) you have to specify the joiner to go and locate “DU” from wherever “DU” is.
The elements or union are not specified in SLSI1134 or by the Unicode Consortium.
That is why the public will have to download each and every font one uses to create the text.
Also, text data will not flow across all platforms.
The only Sinhala characters in Unicode/SLSI1134 Sinhala (whatever you call it) are listed below.
http://www.decodeunicode.org/w3.php?ucHex=0D80
I quote this site because anyone can go to the URL and see all the Unicode-registered characters as JPG images rather than in PDF format on the Unicode Consortium site. Compare them with the facts we are discussing on this site.
Therefore I always stand by the fact that Unicode Sinhala, or SLSI1134, is an incorrect and incomplete standard. Once a page is registered in Unicode, the incorrect page cannot be changed, but when the national standard is changed, the new national standard will be included by the Unicode Consortium.
My system has also been accepted as a solution by Microsoft (see 24).
Also, with my system many software developers will have a chance to develop IT-compatible components for Sinhala and later move their knowledge into other Indic languages.
With my system Sri Lanka would be a hub for Indic languages (again, job opportunities).
With every step I take forward new job opportunities will be automatically created. The young dynamic software engineers will be able to strike Silver.
Before the Humpty Dumpty face the great fall —
” All the king’s horses and all the king’s men
Couldn’t put Humpty together again.”
The time is ripe for ICTA to hand me this subject and the project for the betterment of the public in Sri Lanka.
Donald Gaminitillake
Colombo
Harshula!
I am honored, Sir! Yes. truly I am. I wanted to ask for this and you, as if by intuition asked me to get in the ring, though I’d rather not fight. I want everyone to be compassionate and try to understand each other’s point. We are in one camp. The “we want the best for Lanka” camp.
I can’t erase the grin on my face. It’s like I just got admission to Colombo university (which at that time said I was too old) I think this other thing also played a part: I Just returned from the Thai Wat and may I say, namoo buððhaaya?
Ok. You want the code points. They are at:
Basic Latin (Codepoints 0 thru 127):
http://www.unicode.org/charts/PDF/U0000.pdf
Latin-1 Supplement (code points 128 thru 255):
http://www.unicode.org/charts/PDF/U0080.pdf
That is, romanized Sinhala which includes COMPLETE pure Sinhala, COMPLETE Pali and COMPLETE Sanskrit is in the FIRST 256 code points of the Unicode!
This is also known as ISO-8859-1. That is the set of characters honored by RFC822 and its successors (which is why the Thais did not budge from it). It is also the default character set of web pages and MIME messages. Further, it is the set of code points used by the who’s who of the developed nations. (They made it; we just hopped in. Sorry, white man. BTW, the white man looking over my shoulder is laughing.)
I do not need to enumerate the code points of the English alphabet. They are in Basic Latin, which us old dogs call ASCII. Romanized Sinhala uses only lower-case letters (except for sanyakas), i.e. those code points falling between
a = 97 thru z = 122, in alphabetical order: a=97, b=98, … z=122.
f=102 is for upadhmaaniiya, a rare Sanskrit letter also used by the Sinhalese for transcribing from English. As this is already in roman, it is a moot point. But if you insist, go ahead and use it in Sinhala (e.g. fonseekaa mahaþaa). The pronunciation is close, and the Sinhalese pronounce f more like upadhmaaniiya than the English way anyway.
q=113 is used for Sanskrit jihvaamuuliiya, another rare Sanskrit letter. It has no use in Sinhala.
We do not use w=119 like Italian and many other Indo-European languages.
Then we use ten characters from the Latin-1 Supplement:
The items such as AltGr z show how these characters are typed on the US-International keyboard. This is the keyboard I use every day. There is an equivalent for it in Linux (Red Hat): use the U.S. English w/deadkeys layout for your keyboard model. On the Macintosh, use the International keyboard.
æ=230 Ash AltGr z Directly borrowed from Old English / Icelandic
þ=254 Thorn AltGr t Directly borrowed from Old English / Icelandic
ð=240 Edh AltGr d Directly borrowed from Old English / Icelandic
ñ=241 Enye AltGr n adapted from Spanish / same as in Tatar
ç=231 c with cedilla AltGr , Adapted from French c with hook
µ=181 Mu AltGr m Greek — adapted for Muurðhaja na
ø=248 Oe AltGr l Danish — adapted for Muurðhaja la
ü=252 iruyanna AltGr y Sanskrit only
ö=246 iluyanna AltGr o Sanskrit only
ä=228 visargaya AltGr q Sanskrit only
Then for Sinhala sanyaka akuru we use the following shifted letters:
G=71 Sanyaka gayanna
D=68 Sanyaka dayanna
Ð=208 Sanyaka ðayanna
B=66 Amba bayanna
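The decimal code points in the lists above can be checked mechanically. This Python sketch takes the values exactly as listed and compares each against the character's actual code point. Non-ASCII letters are written as escapes so that a look-alike glyph (for example Greek mu instead of the micro sign) cannot slip in. For reference it also prints a and z in decimal (a is 97, which is 0x61 in hexadecimal).

```python
# (character, decimal value as listed, note). Non-ASCII letters are escapes
# so a look-alike glyph cannot replace the intended Latin-1 character.
LISTED = [
    ("f", 102, "upadhmaaniiya"), ("q", 113, "jihvaamuuliiya"), ("w", 119, "unused"),
    ("\u00e6", 230, "ash"), ("\u00fe", 254, "thorn"), ("\u00f0", 240, "edh"),
    ("\u00f1", 241, "enye"), ("\u00e7", 231, "c cedilla"), ("\u00b5", 181, "muurdhaja na"),
    ("\u00f8", 248, "muurdhaja la"), ("\u00fc", 252, "iruyanna"),
    ("\u00f6", 246, "iluyanna"), ("\u00e4", 228, "visargaya"),
    ("G", 71, "sanyaka gayanna"), ("D", 68, "sanyaka dayanna"),
    ("\u00d0", 208, "sanyaka dhayanna"), ("B", 66, "amba bayanna"),
]

mismatches = [(ch, n, ord(ch)) for ch, n, _ in LISTED if ord(ch) != n]
print(mismatches)                                  # []
print(all(ord(ch) < 256 for ch, _, _ in LISTED))   # True: all inside Latin-1
print(ord("a"), ord("z"))                          # 97 122
```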
What is this about an experiment? Is it the train experiment? Like,
Unicode:
දු (0daf,0dd4)
ම් (0db8,0dca)
රි (0dbb,0dd2)
ය (0dba)
romanized Sinhala:
ð (240)
u (117)
m (109)
r (114)
i (105)
y (121)
a (97)
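The two listings above can be reproduced in Python. Both spellings of the word happen to use seven code points; the difference is that the romanized ones all lie below 256. A minimal sketch:

```python
sinhala = "\u0DAF\u0DD4\u0DB8\u0DCA\u0DBB\u0DD2\u0DBA"  # දුම්රිය
romanized = "\u00F0umriya"                              # ðumriya

print([hex(ord(c)) for c in sinhala])
print([ord(c) for c in romanized])   # [240, 117, 109, 114, 105, 121, 97]
print(len(sinhala), len(romanized))  # 7 7
```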
Harshula, this is the last time I am playing this game. Romanized Sinhala is just that. All code points are known. We write just like the Western Europeans do. The complete alphabet is at:
http://www.Sinhalaheritage.com
None of you good gentlemen or ladies have written to me asking for the font. I sent it to the admins and Mr. Gaminitilleka; they are probably too busy. If you want to know what I am doing, why don’t you write to me and ask for the font? Either you are using your boss’s computer and therefore cannot test a font, or you are entrenched on one side of an imaginary battle and do not want to think and discuss peacefully.
Here is my email address, everybody:
osp@LANandWAN.com (you can use jc@… too but ‘osp’ will help).
I know you have good backbones. So write to me. I spent too much time here just to play this silly game. All the techies should know what is meant by Latin-1. If you want to know how Sinhala sounds are mapped to Latin-1, go to this web site and check out the alphabet (by the way, it is the most complete Sinhala alphabet anywhere on the net):
http://www.Sinhalaheritage.com
There’s nothing more to it or less to it. We have a proof-of-concept font that shows Latin-1 in Sinhala. No other language in the world does that. So, in my opinion, we can have the Unicode block, but romanized Sinhala is superior because it will always show in a romanized, readable form, not as a row of question marks. If your system supports OpenType and/or Uniscribe, you’d be able to read and edit in Sinhala script too.
Currently Microsoft is researching why we cannot get the ligatures to form in certain Windows XP machines. I’ll report the results back so the people know. This is about the country, its money and its future.
There’s no experiment here, just facts to digest, if you have the time and inclination. I can’t spend time here like Mr. Gaminitilleka and everyone else who debates here. Nobody pays me a salary for it, and I am a poor man. (And the admins also rightly said I write too much. It’s a waste of their resources too if we just argue for the sake of an empty win rather than investigating each point. LOGIC! LOGIC!!)
ABOUT THE ROMANIZED SINHALA FONT:
I think I said this many moons back. Anyway, the font is a simple idea: instead of the Latin letter ‘a’ we show the ‘ayanna’, and instead of the Latin ‘k’ we show the ‘hal kayanna’. This is just like the Singlish program that was on the Internet. It is probably still there, but I could not track down its developer. One of the people who wrote about it was not too helpful either.
Anyway, for the train competition, you write:
ðumriya
by simply typing like you’d usually type English.
If you have that Sinhala font, you can use it instead of, say Arial. Now comes the most intriguing part that nobody was able to give an answer to:
If you are typing inside Notepad (with the Sinhala font) on a fairly new Windows XP Professional computer, perhaps one that also has the Service Pack 2 update and was an OEM install, the words will form as follows:
[ðu],m,[ri],[ya]
Perfect! Otherwise, the svara and vyaçjana stand by themselves with the word decomposed, as good as typing with Arial.
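The grouping described above, [ðu],m,[ri],[ya], can be imitated outside the font with a simple rule: a consonant absorbs the vowel letters that follow it, and a consonant with no following vowel stands alone as a hal form. This Python sketch assumes a hypothetical, reduced vowel set. The real romanized Sinhala alphabet (long vowels, Sanskrit letters, sanyaka capitals) and the font's actual OpenType ligature logic are richer than this.

```python
# Hypothetical, reduced vowel set for illustration: a e i o u and æ.
VOWELS = set("aeiou\u00e6")


def syllables(word):
    """Split romanized text into the units a Sinhala font would draw.

    Each unit starts at one letter and absorbs any vowel letters after it,
    so "ðu" and "ri" form syllables, while "m" (followed by a consonant)
    stays alone as a hal form. A leading vowel forms an independent vowel.
    """
    units = []
    i = 0
    while i < len(word):
        j = i + 1
        while j < len(word) and word[j] in VOWELS:
            j += 1
        units.append(word[i:j])
        i = j
    return units


print(syllables("\u00f0umriya"))  # ['ðu', 'm', 'ri', 'ya']
```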
We have been in communication with Microsoft regarding this. Their Typography team gave a sort of backhanded answer to my question of why the script is well formed in Notepad but not in Word 2007 or any earlier version. (At that time, I was not aware that some XP PCs do not show properly formed Sinhala in Notepad.)
The answer was that the Simple Script ligatures might not be supported in Word because of a business decision by MS. They think that higher-grade apps would support them (Publisher?).
Now the general support group of MS is researching why some computers show the ligatures and others do not.
I know the partial answer, which is two-pronged:
1. Microsoft Windows is such a giant hairball that nobody really knows it.
2. It depends on whether Uniscribe or OpenType support is called by the application. Obviously, Notepad on those machines gets that support.
I know we do not have low-level programmers (C language) in Sri Lanka who are willing to read Unicode and write a Uniscribe-like font driver. If we can do it, the country saves a ton. (Plus we can sell it to India too.) It might be cheaper to get it done in a country like Thailand or that other South Asian country that someone said has found its own IT solution. If we can find willing programmers, we should try it before the president takes out the checkbook to buy Windows Vista. I can give all the leads, but Dharma does not want me to be paid. (David Trotter here is good for it too, if you prefer. He is my sub.)
Dear JC
You will have to do some planning for my system too.
You have the list already. If I have missed any, you can add them too.
After allocating, if you find the same sound value shared by two, three, six or more characters, the system can still accommodate them without any problem. The only thing you have to do is isolate and identify them for me.
Also, if you find interesting combinations that would help users, indicate them.
All this work is based on paper and pencil.
In my system, every user will be able to expand this area independently according to his or her subject. If they are interested, they can forward this data for future upgrades under an index system.
With my system, the quality of Sinhala literature will improve (another advantage).
== Without changing fonts. ==
In my proposed system you can use the Latin script and the Greek alphabet (another advantage).
You can mix Sinhala and any language that uses the Latin script (another advantage).
and many more advantages!!!!
Donald Gaminitillake
Colombo
Donald,
The advantages you list are impressive indeed.
As for the sounds you are talking about, Sinhala has only one set. This is because our vowels are clear single tones, not diphthongs, and because the sounds are already PHONETICALLY DEFINED by the alphabet, categorized and arranged. There’s nothing for you or me to add about the Sinhala sound set; the grammar books have already taken care of it. We are fixed within one set of sounds. I identified these in modern terms and allocated Latin letters and digraphs to them. This is romanized Sinhala. It supersedes the existing romanized Pali and all Sanskrit transliteration schemes, which are unwieldy to use and unnatural to look at.
Compare English: it is a sister Indo-European language, but it has a shifting sound set, which we understand as regional accents.
The alphabet is listed below, with the sounds indicated. As HTML collapses extra spaces, it may be difficult to read; the pipe characters I included to counter that might even have aggravated it. Copy and paste it somewhere and fix the format to read it better.
ROMANIZED CLASSIC SINHALA ALPHABET
(includes Pali and Sanskrit)
———————————————-
a | aa | æ | ææ | i | ii | u | uu (æ and ææ only in Sinhala)
ü | üü | ö | öö (Sanskrit only)
e | ee | ai | o | oo | au (ai and au only in Sanskrit)
á | í | ú | ó nasal vowels — The acute accent indicates nasalized vowel
CONSONANTS
k | kh | g | gh | ñ | G Velar consonants
c | ch | j | jh | ç Postalveolar consonants
t | th | d | dh | µ | D Alveolar consonants
þ | þh | ð | ðh | n | Ð Dental consonants
p | ph | b | bh | m | B Bilabial consonants
y | r | l | v approximant consonants
z | x | s | h fricatives (þaaluja sayanna, muurdhja sayanna, ðanþaja sayanna, hayanna)
ø palatal lateral approximant (muurdhaja layanna — dark el)
ä voiceless uvular fricative (visarjaniiya)
q voiceless velar fricative (jihvaamuuliya) — allophone of visarjaniiya
f voiceless bilabial fricative (upadhmaaniiya) — allophone of visarjaniiya
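Because every romanized letter lies in Latin-1 (ISO-8859-1), the whole alphabet fits in one byte per character. A minimal Python sketch that checks this, assuming the letter list below is transcribed correctly from the table above (µ is taken to be U+00B5 MICRO SIGN); `fits_latin1` is a hypothetical helper name.

```python
# The romanized letters from the table above, transcribed by hand (an assumption).
LETTERS = ("a aa æ ææ i ii u uu ü üü ö öö e ee ai o oo au á í ú ó "
           "k kh g gh ñ G c ch j jh ç t th d dh µ D þ þh ð ðh n Ð "
           "p ph b bh m B y r l v z x s h ø ä q f").split()


def fits_latin1(text):
    """True if every character has a one-byte ISO-8859-1 (Latin-1) encoding."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(all(fits_latin1(letter) for letter in LETTERS))  # True
print(fits_latin1("\u0daf"))  # False: Sinhala dayanna (U+0DAF) is outside Latin-1
```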
JC, thanks, but I shall come back to you on this later.
I am waiting for a reply from ICTA.
We are having FAGAT this August: a printing exhibition and a conference. The character issue will be a topic.
I will be a bit busy with guests who are coming to Colombo
Donald Gaminitillake
Colombo
Dear Harshula,
I will be very grateful if you stop spreading the myth that Unicode Sinhala is the same as SLS1134:2004.
Unicode Sinhala is dated 1998. (http://www.fonts.lk/history.html)
This refers to a document at http://www.unicode.org/charts/PDF/U0D80.pdf.
In fact, the copyright of this document goes back to 1991.
Further, Unicode is an INTERNATIONAL standard.
SLS1134 refers to a separate document available at (among other places) http://std.dkuug.dk/jtc1/sc2/WG2/docs/n2737.pdf
This is a LOCAL (SRI LANKAN) standard and dated 2004.
These are two separate documents, though there are overlapping areas, and SLS1134 was developed to fill the gaps in Unicode Sinhala (such as the absence of yansaya, repaya and bendi akuru).
There were Sinhala fonts based on Unicode long before SLS1134 was approved.
Hope the difference is clear.
Dear Dharma
SLSI 1134 has the same contents as Unicode Sinhala.
SLSI1134 does not represent any characters such as yansaya, repaya, bendi akuru or “DU”, etc.
The SLSI also does not indicate the locations of any elements after the joiner.
Please send me your email address so that I can send you a copy of SLSI1134 in PDF format.
Unicode only honours the national standard of a country.
To Admin:
I hope the Admin will post both the Unicode chart and the SLSI 1134 chart in JPG form for the public to verify. Please send me the email address so that I can send both in JPG format.
Donald Gaminitillake
Colombo
Donald,
As far as I know, neither Unicode Sinhala:1998 nor SLS 1134:2004 is a ‘chart’ per se. They are something more than charts. They are two documents.
I have given the links for both so any interested party can refer to them. There is no need for the admin to repost.
As for yansaya and repaya, you are wrong, because page 6 of SLS 1134:2004 clearly explains how to represent them. (Unicode Sinhala:1998 does not.)
I have checked with Devanagari. I do not think they have a yansaya, but the repaya and rakaransaya are handled in the same manner.
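Dharma's point can be made concrete. Below is a minimal Python sketch, assuming the sequences commonly documented for Unicode Sinhala and SLS 1134:2004, in which al-lakuna (U+0DCA) is followed by the shared Zero Width Joiner (U+200D); it does not reproduce page 6 of the standard, so check the exact sequences there. The helper names are hypothetical.

```python
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA (the virama)
ZWJ = "\u200d"        # ZERO WIDTH JOINER, shared with other scripts
YAYANNA = "\u0dba"
RAYANNA = "\u0dbb"
KAYANNA = "\u0d9a"    # SINHALA LETTER ALPAPRAANA KAYANNA, used as a sample consonant


def yansaya(consonant):
    # consonant + al-lakuna + ZWJ + yayanna
    return consonant + AL_LAKUNA + ZWJ + YAYANNA


def rakaransaya(consonant):
    # consonant + al-lakuna + ZWJ + rayanna
    return consonant + AL_LAKUNA + ZWJ + RAYANNA


def repaya(consonant):
    # rayanna + al-lakuna + ZWJ + consonant
    return RAYANNA + AL_LAKUNA + ZWJ + consonant


for label, seq in [("yansaya", yansaya(KAYANNA)),
                   ("rakaransaya", rakaransaya(KAYANNA)),
                   ("repaya", repaya(KAYANNA))]:
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

No new code point is needed for any of the three forms; the font's ligature rules turn each sequence into the expected shape.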
Dear Dharma
The Unicode Consortium publishes a list of character allocation tables.
SLSI1134 is also a character allocation table.
I am talking about complete individual characters, but SLSI1134 only mentions these characters; their elements or locations are not specified.
The chart only consists of the same Unicode Sinhala chart or code points.
These code points are clearly defined, but not the rest of the Sinhala characters.
That is why you need to download many different fonts to see the contents,
simply because different font makers keep the “DU” in different locations.
There is no specific standard or location.
Example:
0daf = Sinhala letter alpapraana dayanna,
0dd4 =Sinhala vowel sign ketti paa-pilla
With Unicode Sinhala or with SLSI 1134, no one can develop any OCR, voice or other computer-based applications for Sinhala.
This is the problem I am addressing. Without publishing all the Sinhala characters with proper code points, nothing can be developed.
Why don’t you give me a phone call?
Donald Gaminitillake
Colombo
(0daf,0dd4) = Sinhala letter alpapraana dayanna,Sinhala vowel sign ketti paa-pilla
Quotes from SLSI 1134
Quote
1
“Symbols used in the Sinhala language are coded using 128 cells in the half page plane reserved for Sinhala characters in ISO/IEC 10646. Each cell or position given in Figure 1 of the standard represents one character.”
2
“In addition to storage, retrieval and machine to machine communication in Sinhala, it also includes provisions to co-exist with other languages as specified in ISO/IEC 10646.”
3
“Codes are not provided in the code set for distinct formations in the language for the Repaya, Yansaya, and Rakaransaya.”
4
“However, specific collation algorithms (not specified in this standard) are required to correctly collate text encoded in this code.”
Unquote
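Quote 1's "128 cells" is the half-row U+0D80 to U+0DFF. The Python sketch below (helper names hypothetical) checks that range and shows that the "DU" from the example above is stored as two code points, both inside the block. It deliberately does not count assigned characters, since that number depends on the Unicode version bundled with Python.

```python
import unicodedata

SINHALA_FIRST, SINHALA_LAST = 0x0D80, 0x0DFF  # Sinhala block in ISO/IEC 10646 / Unicode


def in_sinhala_block(ch):
    """True if the character's code point lies in the Sinhala block."""
    return SINHALA_FIRST <= ord(ch) <= SINHALA_LAST


print(SINHALA_LAST - SINHALA_FIRST + 1)  # 128 cells

du = "\u0daf\u0dd4"  # "DU" as stored: dayanna followed by ketti paa-pilla
for ch in du:
    print(f"U+{ord(ch):04X}", in_sinhala_block(ch), unicodedata.name(ch))
```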
———————-
I hope Dharma will now accept that SLSI1134 and Unicode Sinhala are just a few Sinhala characters registered in Unicode so that a computer can be used as a typewriter.
You can see the Sinhala characters in Unicode in image form here:
http://www.decodeunicode.org/w3.php?ucHex=0D80
Quote 2 confirms SLSI1134 = Unicode Sinhala = ISO 10646.
SLSI1134 and Sinhala Unicode are an incomplete set of Sinhala characters. Quote 4 confirms all this.
Donald Gaminitillake
Colombo
Donald,
As a *USER* I will accept Unicode or Donald code or SLS 1134 (or anything else, for that matter) if it enables me to freely use Sinhala in the computer environment. Period.
Not only Dharma: all people who want to use Sinhala must have the right to use it freely.
Today it is not there.
By implementing my system, you all get the right to use Sinhala not only as text on computers but in many other applications.
I am the only person who is fighting for these rights and the rights of the Sinhala Langauge.
Yet the authorities are Deaf, Dumb and Blind.
By the way, Dharma and Admin, do you know of any place that needs help with computers?
Please visit this site
http://www.rotarycolombocentral.org/funactional Literacy.htm
(a) Photograph of the site (not necessary at the first stage),
(b) Address of recipient(e.g. Chief priest’s name) ,
(c) Beneficiaries (e.g. 40 villagers per month),
(d) Number of people in the village served by each centre,
(e) Person responsible in the Rotary Club or project Chairman’s name and email address and telephone number.
(e) would be myself and our club
I need this information before the 25th of July.
Donald Gaminitillake
Colombo
http://www.rotarycolombocentral.org/funactional Literacy.htm
or
http://www.rotarycolombocentral.org
Click –Functional Literacy Programme-2006/2007
Donald
Hi,
I haven’t had the time to read the complete blog. I’m Yannis Haralambous and I would like to make one thing clear: when I designed the Sinhala font there was no Unicode yet, so the encoding I used was (and is) completely arbitrary. All my efforts went into designing a nice font for a writing system I admire a lot. Personally, I think I have succeeded, although native readers may think otherwise. I designed my font in METAFONT, and others afterwards converted it into PostScript. I believe that Unicode is the future, but if you know of any characters missing from Unicode, you should make a proposal to add them. On the other hand, if you know of ligatures not included in my font, please let me know, and maybe one day I can add them. Ligatures are glyph issues and have nothing to do with Unicode, which is a character encoding. In November, the English translation of a book I have written on fonts and encodings will appear from the US publisher O’Reilly.
Cheers
Yannis
Dear Yannis
Thank you for the reply
I very much appreciate it.
I will post you the list of my characters soon
Donald Gaminitillake
Colombo
Hi Dharma Gamage,
Dharma Gamage on Jul 18th, 2006 at 10:27 am wrote:
—————————————————————————-
I will be very grateful if you stop spreading the myth that Unicode Sinhala is as same as SLS1134:2004.
Unicode Sinhala is dated 1998.
—————————————————————————-
Unicode Sinhala = SLS1134:2004. Perhaps you are failing to understand that Unicode and SLS1134 are not static, periodically they are revised. Go look at:
http://unicode.org/versions/enumeratedversions.html
Nor are they independent of each other. When SLS1134 is revised, e.g. in 2004, it is presented to Unicode, for example:
http://www.unicode.org/consortium/utc-minutes/UTC-099-200406.html
——————————————————————————
Toronto, Ontario, Canada — June 15 – 18, 2004
Scripts, New Characters – Sinhala (C.16.9)
[99-A3] Action Item for Rick McGowan: Draft a letter of commendation (regarding the new Sri Lankan standard for Sinhala) for the Sri Lanka national body, for Mark’s signature.[L2/04-131]
…
Sri Lanka standard for Sinhala (C.15.12)
[99-C37] Consensus: The UTC recommends that “right-side” forms of conjuncts in Sinhala be represented by a sequence of . [L2/04-131]
[99-A51] Action Item for Rick McGowan: Write a response to Sri Lanka re the subcommittee recommendation. [L2/04-131]
[99-A52] Action Item for Peter Constable: Write a document on consistency of left and right-side conjunct forms in Indic scripts and request an agenda item for the August meeting. [L2/04-131]
——————————————————————————
For a bit more history on Unicode Sinhala, please read:
http://fonts.lk/pdf/whatisunicodetalk.pdf
Hope you now understand why they are equivalent.
Regards,
Harshula
Harshula,
Please do not try to establish this myth of equating Unicode and SLS1134 anymore to save the skins of certain people.
I can understand the argument you are trying to build. You are trying to say Sinhala Unicode was established only in 2004, so there was no way it could have been incorporated into any of the OSes.
Thus you can easily save the skins of Dino and his gang, who have been working on this issue since the mid-1980s. If Unicode was approved only in 2004, what were Dino and his people doing all these years?
I do not talk about versions. I talk only about the originals. Unicode Sinhala was first approved in 1998. So if it had been complete, there would have been no difficulty incorporating it into the OSes.
The question is why it has not happened.
Quote
The question is why it has not happened.
Unquote
Because it had an incomplete set of Sinhala characters,
and
Quote from SLSI
“However, specific collation algorithms (not specified in this standard)
are required to correctly collate text encoded in this code.”
unquote
This is a hidden area that is still kept under the carpet.
So my claims are fully justified.
Donald Gaminitillake
Colombo
Friends,
I just visited Milinda Moragoda’s revamped web site http://www.milinda.org.
It is very interesting that I cannot read the Sinhala pages. All I see is garbage. I think Milinda too uses Unicode!
If he knew the issues, Milinda would have used the same solution as Lankaenews or Lankadeepa instead of proprietary fonts.
What is the big idea of having a web site that cannot be read by the majority of users?
This is the font set he is using:
font-family: FCLWebSinhala, FCLSinhalaWebS
I told him about this problem in a public forum when they were contesting the elections.
He told me that I was looking at the empty half of the glass. This is the understanding Milinda has.
I wish he could see this garbage on Dharma’s computer or mine.
Donald Gaminitillake
Colombo
I was at that meeting when Donald questioned Moragoda on the Sinhala standardisation issue. I think that was the point where all this started.
I remember how Moragoda responded to queries. Back then he did not know how to talk properly. Not that he was impolite: he was polite and replied in detail, but his tone was very arrogant. (As Donald says, instead of answering the question he retorted that Donald sees the glass half empty, something a politician should never do.)
Anyway, I think Moragoda has changed after losing the election. (Politicians learn lessons only after losing elections. Until then they think they are gods.)
For those who missed the fun that day, I reproduce an extract from an unpublished article. (Author not given.)
It is not about Sinhala standardisation, but any IT enthusiast in Sri Lanka should read it. It gives a clear impression of how the two major political parties see ICT.
[quote]
Immediately before the General Election of 2004, the Computer Society of Sri Lanka invites representatives from both major political fronts to present their views on ICT policy to an audience consisting mainly of ICT professionals. Moragoda was the obvious choice from the UNP. Susil Premjayanth comes representing the UPFA, probably because he is its secretary, and for no other reason. I feel sorry for Premjayanth. In this subject, he is only a Lilliputian compared to Moragoda. I hope the crowd will not be too harsh on him.
There is a sharp difference in the approaches taken by the two politicians. Moragoda, the Messiah of ICT of the developing world, comes in his royal Ambassador, escorted by another Ambassador and a few more vehicles full of his ‘chuk-golayas’.
Moragoda’s officials seem to know nothing about communication in the political world. Perhaps they wanted to show their devotion to the boss. They occupy the entire front row, quite inadvertently creating an invisible barrier between their master and the audience.
Moragoda needs no preparation to answer queries on ICT. He takes questions one by one and answers them at the conceptual level, taking examples from the USA and Malaysia. He explains his future plans in detail. If he is not sure of any point, he passes the question to an official in the front row. One of his golayas immediately comes to rescue the master. Sometimes several officials, one after another, stand to answer the same question. Somebody asks a simple question about some minor thing, and even before Moragoda has a chance, a junior officer rises and replies on his behalf. I see Moragoda nodding. He might be quite happy with the loyalty shown by his yes-men.
Without even realising it, Moragoda is making two mistakes. One: he is using public officials in his political campaign, which is by no means ethical. Two: he gives the impression that he does not know what is going on and is helpless without the assistance of the officials. Both reflect negatively on him.
A senior academic puts a simple question: “Mr. Moragoda, why do you want to spam our mail boxes by sending us unsolicited political propaganda?”
Moragoda, with visible uneasiness, tries to explain in detail and says that if anyone does not want the mails there is always an ‘unsubscribe’ option. A dumb response. He should have been smarter. “I tried that…” retorts the academic, “…but I still get your mails!”
We all have a good laugh at Moragoda’s expense. Moragoda apologises and says he will take corrective measures, but to no avail. The damage has already been done.
I do not say Moragoda did not answer the questions or evaded them. He answered every one of them satisfactorily, but the problem was that his answers were too abstract. He uses no anecdotes, no yarns and absolutely no humour. He never smiles. We might all be ICT professionals, but we are all tired after a hard day’s work. The last thing we want is another conceptual-level lecture on ICT. We look for something simple. Moragoda fails to deliver that.
The approach taken by Susil Premjayanth is exactly the opposite. He comes alone, clad in pure white and with a book in his hand. He is obviously not prepared and probably knows nothing about ICT. Still he looks confident. He smiles at the crowd and requests them not to ask technical questions, as he himself is not an expert. He hints that the real experts are in the audience. That statement would have made Dale Carnegie beam with pride. I see some in the audience already nodding their heads.
Having built the rapport, Premjayanth takes questions one by one and gives short and simple answers. (Obviously, unlike Moragoda, that is the best he can do!) Without a row of obedient officials creating a communication barrier, Premjayanth has no difficulty in building an effective relationship with the audience. He goes one step further by remembering the names of people in the crowd and addressing them by name. Now he is making Dale Carnegie really proud. He smiles and cracks jokes. To the audience, he is no longer a politician; he is their next-door neighbour.
The same academic now puts a similar question to the speaker: “Mr. Premjayanth, before you, Mr. Milinda Moragoda addressed us. But we find your voice is much coarser than Mr. Moragoda’s. I think that is because you go on addressing political rallies in the traditional manner, while Mr. Moragoda uses modern technology like e-mail in his campaign. Don’t you think it is time for you too to adopt such new techniques?”
Pat comes the reply. (It is a gem!) “That has always been the problem with UNP. They do not understand the culture of this country. So they go on sending you e-mails. No, thank you sir. I do not want to engage in any e-mail campaigns. I will do my campaign in the way I am used to. Let’s see the results on April second!”
Hats off to you, Mr. Premjayanth! We all erupt in laughter, and for the first time that day the audience gives a round of applause. Mind you, this is a hundred percent IT-savvy crowd clapping for someone who has just spoken against the use of e-mail! It can’t be helped. Premjayanth has won the day.
What Premjayanth did not tell was that more than half of the e-mail users in this country, being members of the affluent class, will anyway vote for UNP whether they receive propaganda mails or not. So why waste time and money on votes you already have in your pocket? Why not go to paddy fields, market places, factories and temples and attempt to win the hearts of the simple folks by addressing them in a language they understand using a communication medium they are comfortable with?
The verdict: It was a fight between David and Goliath. We all know who won.
[unquote]
Hi Dharma Gamage,
Dharma Gamage on Jul 19th, 2006 at 12:25 pm wrote:
————————————————————————
I just visited Milinda Moragoda’s revamped web site http://www.milinda.org.
It is very interesting that I cannot read the Sinhala pages. All I see is garbage. I think Milinda too uses Unicode!
————————————————————————
No, it’s not Unicode. And the Tamil section is not Unicode either. If it was Unicode I would have been able to see the letters in both cases.
It’s yet another Sinhala font with their own proprietary encoding scheme. Just like Donaldcode.
Regards,
Harshula
I have a vague idea that the earlier version of Milinda’s site was done with Sinhala Unicode fonts. If I remember correctly, it was done by the same company that did the ICTA site.
Maybe even Mr. Milinda Moragoda has lost faith in Unicode.
The day Sri Lanka uses “Donald’s Code”, these things will never happen.
Harshula, you have not answered the following yet:
Quote from SLSI
“However, specific collation algorithms (not specified in this standard)
are required to correctly collate text encoded in this code.”
Unquote
Donald Gaminitillake
Colombo
Hi Donald,
It’s quite clear now why you are so confused about Unicode Sinhala (SLS1134:2004). You do not understand the concept of ‘ligatures’ in *smart* fonts at all.
Donald Gaminitillake on Jul 18th, 2006 at 1:28 pm wrote:
——————————————————————
The Chart only consist of the same unicode sinhala chart or code points.
These code points are clearly defined.Not the rest of Sinhala characterrs.
That is why you need many different fonts to be down loaded to see the contents.
Simply beciase the different font maker keep the “DU” in different locations.
No specific standard or location
example
0daf = Sinhala letter alpapraana dayanna,
0dd4 =Sinhala vowel sign ketti paa-pilla
——————————————————————
Your explanation/suggestion is completely and utterly incorrect.
An example of a ligature is “DU” (0daf,0dd4). Very simply, it’s two ‘inter-connected’ letters.
You need to first realise that fonts are much *smarter* now. There are two notable sets of tables in these fonts:
Table 1) Defined by Unicode (non-ligatures)
– The codepoints of the letters you have typed are defined by Unicode. These are looked up in the font; if it is *not* a ligature, the glyph/image is drawn on your screen. If it is a ligature, there’s an additional step.
Table 2) Defined by Font (ligatures)
– If the codepoints of the letters you have typed form a ligature, e.g. you have typed (0daf,0dd4) dayanna,paa-pilla, then an additional lookup is done *internally* in the font.
– This works by looking in an internal table to see whether there is a ligature for the two codepoints, in our example (0daf,0dd4). If there is a ligature for the two codepoints (0daf,0dd4), then the glyph/image of the ligature is drawn on the screen. e.g. “DU”.
– This means that whenever you write “DU” (0daf,0dd4) the user can expect to see “DU” regardless of which Unicode Sinhala font is used.
– This means that “DU” does *NOT* have to have a separate codepoint of its own like the primitive printing press technology used by Donaldcode. It is simply recognised and drawn on the screen by typing (0daf,0dd4).
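Harshula's two tables can be modelled in a few lines. This is a minimal Python sketch, not how any real font is built: the glyph names in `CMAP` and `LIGATURES` are hypothetical, and a real shaping engine such as Uniscribe first maps code points to glyph IDs through the font's cmap table and then applies ligature substitutions from its GSUB table, two steps the sketch collapses into one pass.

```python
# Hypothetical glyph names; a real OpenType font keeps these in its 'cmap'
# (code point -> glyph) and 'GSUB' (glyph substitution) tables.
CMAP = {0x0DAF: "dayanna", 0x0DD4: "paa_pilla", 0x0DB8: "mayanna",
        0x0DCA: "al_lakuna", 0x0DBB: "rayanna", 0x0DD2: "is_pilla",
        0x0DBA: "yayanna"}
LIGATURES = {(0x0DAF, 0x0DD4): "dayanna_paa_pilla"}  # the "DU" ligature


def shape(text):
    """Map code points to glyphs, replacing known pairs with a ligature glyph."""
    cps = [ord(ch) for ch in text]
    glyphs, i = [], 0
    while i < len(cps):
        pair = tuple(cps[i:i + 2])
        if pair in LIGATURES:      # Table 2: font-defined ligature lookup
            glyphs.append(LIGATURES[pair])
            i += 2
        else:                      # Table 1: plain code point -> glyph lookup
            glyphs.append(CMAP.get(cps[i], ".notdef"))
            i += 1
    return glyphs


print(shape("\u0daf\u0dd4"))  # ['dayanna_paa_pilla']
print(shape("\u0daf"))        # ['dayanna']
```

The stored text is the same two code points in every Unicode Sinhala font; only the font's own ligature table decides which glyph is drawn.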
Donald Gaminitillake on Jul 19th, 2006 at 4:42 pm wrote:
——————————————————————
Harsula you have not answered the following yet
:Quote from SLSI
“However, specific collation algorithms (not specified in this standard)
are required to correctly collate text encoded in this code.”
unquote”
——————————————————————
http://www.ucsc.cmb.ac.lk/ltrl/public/Localization/Sinhala_Collation_&_Encoding_2%5B1%5D.0.pdf
Donald, do you even understand what collation is?
Regards,
Harshula
Dharma and Harshula,
Being a techie, Harshula knows this, but you both missed answering the question properly, perhaps due to a slip of the mind. It happens to everybody.
It is simple to find out which font is used in a plain web page like the one you are discussing here. Simply choose View Source from the menu and search for the keyword ‘font’ in the resulting text page. You will find the font-family declaration(s).
This page specifies FCLWebSinhala, falling back to FCLSinhalaWebS, with no default font if those fonts are absent. So, the browser falls back to its regular Latin font. The question marks indicate that the default Latin font does not have letters for those Unicode positions.
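The View Source inspection can also be scripted. This is a rough Python sketch using a regular expression (not a real CSS parser, so double-check its output on unusual pages); `font_families` is a hypothetical helper, and the sample page is a made-up snippet that only reuses the font names quoted in this thread.

```python
import re

# Rough pattern: grabs a font-family value up to ';', '}' or a tag bracket.
FONT_FAMILY = re.compile(r"font-family\s*:\s*([^;}<>]+)", re.IGNORECASE)


def font_families(html):
    """Return each font-family list found in the page source, in order."""
    result = []
    for match in FONT_FAMILY.finditer(html):
        names = [name.strip().strip("'\"") for name in match.group(1).split(",")]
        result.append([name for name in names if name])
    return result


page = "<style>body { font-family: FCLWebSinhala, FCLSinhalaWebS; }</style>"
print(font_families(page))  # [['FCLWebSinhala', 'FCLSinhalaWebS']]
```

If none of the listed fonts is installed and no generic family (such as `serif`) ends the list, the browser uses its own default font.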
Copy the garbage characters, paste it into Notepad, and change the font to some Sinhala web font you have. You’d be able to see some of the characters in Sinhala but they would not make sense. Now change the font to a Unicode Sinhala font. You’ll see only English characters — same as what you saw on the web page.
This illustrates the problem we are facing. There are many Sinhala fonts. Each maps code points to different alphabetic letters, and they do not agree with one another. There are two classes of Sinhala fonts: web fonts that use Latin code points, and fonts based on the Sinhala Unicode block, such as Iskoola Pota.
To read Sinhala, you need the font specified by the web page. Supposing everyone agreed to use the same standard, i.e. the same character for the same code point, you could expect to be able to read Sinhala everywhere. We would still have two problems:
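The two classes of font can be told apart by the code points stored in the text, not by how it looks on screen. A minimal Python sketch, assuming any character in U+0D80 to U+0DFF marks Unicode Sinhala; `classify` and its labels are hypothetical. It also exposes the limitation: text stored for a legacy web font is indistinguishable from ordinary Latin text.

```python
def classify(text):
    """Guess how Sinhala-looking text is stored, from its code points alone."""
    if any(0x0D80 <= ord(ch) <= 0x0DFF for ch in text):
        return "unicode-sinhala"  # real Sinhala code points; needs a Unicode Sinhala font
    if all(ord(ch) <= 0xFF for ch in text):
        return "latin-1"          # plain Latin, or a legacy font drawing Sinhala on Latin slots
    return "other"


print(classify("\u0daf\u0dd4"))  # unicode-sinhala
print(classify("hello"))         # latin-1
```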
1. The web developers would have to have many font faces to make their sites attractive. Hard work, just for a few thousand rich people in the cities to read their web pages.
2. The abugida-type fonts (Unicode fonts) need complex-script rendering support installed for the web browser. Otherwise, they’d have to do font embedding plus rendering in each browser brand.
The technical ramifications of moving Sinhala to ICT would have vastly different results depending on what the majority finally accepts as their standard (standard in the sense, consensus). The problem with this statement is that ‘majority’ in Sri Lanka means ‘the powerful’.
I say it is best to tag along with the Western Europeans. That way, we only need to say our characters look different from theirs and leave it at that, because German Fraktur and Gaelic also have different character shapes. We have already said this to Microsoft regarding Pali. They do not have objections, and they agreed to find out why our font does not show correctly on all Windows XP machines.
If this is solved, we in the Pali group will be far ahead of Sri Lankans. I know Lankans somehow want the government to say “this is what you should do”. We are not bound by that silly notion.
Unicode Vs Donaldcode Vs JCcode: How to write ‘train’ in Sinhala
Assessment criteria
a) Encoding scheme is internally consistent in the provided example.
– This verifies that the scheme has been given at least some thought. This is essentially to give the participants some easy marks.
b) Sinhala script and Latin script visible at the same time in a text file.
– This ensures that the Sinhala script is given ‘first class’ position alongside Latin. If you can’t write the Sinhala script and the Latin script simultaneously in a text file then you can’t use Sinhala in low-level applications, thus relegating the Sinhala script to a ‘second class’ status.
c) There are existing implementations
– This is important for two reasons. One, is that it is a practical solution, not just a theoretical solution. Two, it minimises the latency before it is available in the market, thus becoming widely adopted.
Unicode
=======
දු (0x0daf,0x0dd4)
ම් (0x0db8,0x0dca)
රි (0x0dbb,0x0dd2)
ය (0x0dba)
(a) 1/1, it is internally consistent in this example.
(b) 1/1, the goal of the Unicode project is to uniquely encode all scripts.
(c) 1/1, GNU/Linux implementation available and MS language pack available.
(total) 3/3
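Each Unicode row in the table above is one letter unit: a base letter plus the signs that attach to it. A minimal Python sketch of that grouping, assuming the dependent signs are U+0D81 to U+0D83 (candrabindu, anusvara, visarga), U+0DCA to U+0DDF (al-lakuna and vowel signs) and U+0DF2 to U+0DF3, and that a ZWJ joins the next letter into the same unit. `clusters` is a hypothetical helper, and it does not handle every conjunct ordering (for example, ZWJ placed before al-lakuna).

```python
ZWJ = 0x200D


def is_dependent(cp):
    """Signs that attach to the preceding letter: candrabindu, anusvara,
    visarga, al-lakuna and the dependent vowel signs."""
    return (0x0D81 <= cp <= 0x0D83 or 0x0DCA <= cp <= 0x0DDF
            or 0x0DF2 <= cp <= 0x0DF3)


def clusters(text):
    """Group code points into the letter units a reader sees."""
    out = []
    joined = False  # set right after a ZWJ, so the next letter joins the unit
    for ch in text:
        cp = ord(ch)
        if out and (joined or cp == ZWJ or is_dependent(cp)):
            out[-1] += ch
        else:
            out.append(ch)
        joined = cp == ZWJ
    return out


train = "\u0daf\u0dd4\u0db8\u0dca\u0dbb\u0dd2\u0dba"  # dumriya
print([" ".join(f"{ord(ch):#06x}" for ch in unit) for unit in clusters(train)])
# ['0x0daf 0x0dd4', '0x0db8 0x0dca', '0x0dbb 0x0dd2', '0x0dba']
```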
JCcode
======
[Requirements: “equivalent of one Sinhala letter. So, in your case it might be two Latin letters.” Hence a slight format change of JC’s submission.]
ðu (240,117)
m (109)
ri (114,105)
ya (121,97)
(a) 1/1, it is internally consistent in this example.
(b) 0/1, JCcode collides with Latin, thus not allowing Sinhala script and Latin script to be displayed simultaneously in a text file. It relegates the Sinhala script to ‘second class’ status.
(c) 1/1, using the Latin script should be possible on all platforms. Using the Sinhala script apparently requires downloading a font.
(total) 2/3
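JC's byte values can be checked directly. They are exactly the Latin-1 encodings of the letters, which is also why criterion (b) fails: nothing in the stored bytes marks the word as Sinhala. A short Python sketch; the variable names are illustrative only.

```python
jc_letters = ["ðu", "m", "ri", "ya"]  # JC's submission, one Sinhala letter per entry

for letter in jc_letters:
    print(letter, tuple(letter.encode("latin-1")))
# ðu (240, 117)
# m (109,)
# ri (114, 105)
# ya (121, 97)

# Decoding the stored bytes simply gives back Latin text.
raw = "".join(jc_letters).encode("latin-1")
print(raw.decode("latin-1"))  # ðumriya
```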
Donaldcode
==========
Du =3708
Mm=4501
Ri= 4806
Ya= 4702
(a) 1/1, it is internally consistent in this example.
(b) 1/1, encoding values will not collide with Latin one, however it should be noted that the encoding values will collide with Unicode.
(c) 0/1, no implementations available, purely theoretical.
(total) 2/3
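Criterion (b) notes that Donaldcode values collide with Unicode. The post does not say whether "3708" and the others are decimal or hexadecimal, so this Python sketch tries both readings and looks each up in a small hand-written table of Unicode block ranges (only the four blocks needed here; `block_of` is a hypothetical helper).

```python
# A partial table of Unicode blocks, just enough for these four values.
BLOCKS = [
    (0x0E00, 0x0E7F, "Thai"),
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x1200, 0x137F, "Ethiopic"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
]


def block_of(cp):
    """Name the block containing the code point, if it is in the table."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "not in this partial table"


DONALDCODE = {"Du": "3708", "Mm": "4501", "Ri": "4806", "Ya": "4702"}
for letter, value in DONALDCODE.items():
    dec, hexa = int(value, 10), int(value, 16)
    print(f"{letter}: decimal -> U+{dec:04X} ({block_of(dec)}), "
          f"hex -> U+{hexa:04X} ({block_of(hexa)})")
```

Under either reading, every value lands inside a block already allocated to another script.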
Conclusion
==========
Unicode wins because it provides the best medium- to long-term solution. The current slow adoption, caused by a vendor holding a monopoly on OSs being slow to market, is causing a lot of pain. JCcode is an interesting solution in that it tries to accommodate the Sinhala language, not the Sinhala script, in old technology. JCcode is worth observing. Donaldcode is technically already a dinosaur even before it is implemented. For a solution that started from scratch, it is quite disappointing.
Regards,
Harshula
Harshula,
What?:
a) Encoding scheme is internally consistent in the provided example.
Please explain, if necessary with an example.
I know it’s all English, but apart from some tech-sounding words it has no discernible meaning.
b) Sinhala script and Latin script visible at the same time in a text file.
– This ensures that the Sinhala script is given ‘first class’ position alongside Latin. If you can’t write the Sinhala script and the Latin script simultaneously in a text file then you can’t use Sinhala in low-level applications, thus relegating the Sinhala script to a ’second class’ status.
Here’s the sample:
[Sinhala]
mee vacana síhala bhaaxaaveni. harzula magee miþurekya. jayaveevaa!
[English]
This is English. English rules whether you like it or not. So, stay near English if you care about your future.
Re:
‘first class’ position alongside Latin, above,
Here is the First Class language list:
Standard Scripts:
languages with Basic Latin & Latin-1:
If you call these ‘standard’, it implies that all others are non-standard.
Source:
http://www.microsoft.com/typography/otfntdev/standot/appen.aspx
Danish
Dutch
English
Faroese
Finnish
Flemish
German
Icelandic
Irish
Italian
Norwegian
Portuguese
Spanish
Swedish
And here is the ‘second class’, if you please:
Also Standard but have problems:
languages with Unicode Extended Latin:
Afrikaans
Basque
Breton
Catalan
Croatian
Czech
Esperanto
Estonian
French
Frisian
Greenlandic
Hungarian
Latin
Latvian
Lithuanian
Maltese
Polish
Provencal
Rhaeto-Romanic
Romanian
Romany
Slovak
Slovenian
Sorbian (Lower)
Sorbian (Upper)
Turkish
Welsh
Vietnamese
The following classifications are my own:
Third Class:
Chinese
Japanese
Korean
Fourth Class:
Arabic and Hebrew
Fifth Class:
Abugidas (all Indic scripts and the Sinhala Unicode block)
==============================
It seems like you are suggesting that German is second class to itself. Romanized Sinhala is an adaptation of Icelandic, and Icelandic is a ‘first class’ script.
c) There are existing implementations
– This is important for two reasons. One is that it is a practical solution, not just a theoretical one. Two, it minimises the delay before it is available in the market, and thus before it can become widely adopted.
Romanized Sinhala:
garu siyaluðenaa veþayi,
oba mata liyuvoþ mama obata evannam font eka.
[English: To all respected readers, if you write to me, I will send you the font.]
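þaraha ganna epaa. namuþ meeka þaraGayak novee. meeka lákaavee anaagaþa parapura koyiþaram hoðin lookee anikuþ aya samaGa karata kara þaraGa kaariiva iðiriyata yanna uðavvak veyiða baaðhaavak veeða kiyaa soyaa balana avasþaavak. obee hoo magee þaavakaalika ðinuma pæraðuma væni ðee apa siþin ivaþ kara yuþuya kiyaa mama siþanavaa.
aþana meþana karalaa þibuna paliyata meeka þamayi karanna oone kiyana eka bhayaanaka ðeyak. ehenam, api aaµdu venas nokara eka pakxayak kala ðeema iðiriyata genayanavaaða næþinam væraði maarga aþa hæra ðamaa aøuþ maarga haa krama anuva yali iðiriyata yanavaaða?
[English: Don't get angry. But this is not a contest. This is a chance to examine whether it will help or hinder how well Sri Lanka's future generations can move forward, competing shoulder to shoulder with others in the world. I think we should put aside things like your or my temporary win or loss. Saying that this is what must be done just because it has been done here and there is a dangerous thing. So, do we carry forward whatever one party did without ever changing governments, or do we abandon the wrong paths and move forward again along new paths and methods?]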
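rooman akuru síhalen ðæn paali suuþra liyaagena yanavaa. noyek jaaþikayan eya ihalin anumaþa karaa. kisikenek eya pahala ðæmuvee nææ.
mee siyalla mama liyuvee vindoos nootpææd ekeyi.
sþuþiyi.
[English: Pali sutras are now being written in Roman-letter Sinhala. People of various nationalities have warmly approved of it; no one has disparaged it. I wrote all of this in Windows Notepad. Thanks.]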
Let me repeat: I am not defending anyone or any masters, as we don't work for anyone. We followed Sinhala Unicode and implemented it for various uses of Sinhala on Microsoft and mobile platforms. I believe the Linux community, Microsoft, IBM and Oracle will be doing the same.
Technical implementation of the finalized Unicode standard started about one and a half years ago, if I am not mistaken, so it will take some time before all of this is in place in the marketplace. Windows will support it in Vista, Linux already supports it, Oracle supports Sinhala Unicode, and so do many other implementations. Windows XP supports Sinhala Unicode through an enabling pack; however, Microsoft's best implementation will arrive with Vista.
I think ICTA's big mistake is sleeping rather than coming out and showcasing the available solutions to the general public. If they organized a forum and mini-exhibition showcasing all Sinhala Unicode compliant products and services, from e-mailing across platforms to cut and paste and whatnot, we could invite all the members of this forum to showcase theirs too. Perhaps the same event could host a debate or Q&A session about Unicode!
Newspapers
————-
The best people to answer would be from ANCL, Wijeya and Upali; if I am not mistaken, two people from these organizations were on the Unicode Task Team. So they should come forward and explain why the papers are not yet Unicode compliant. I don't see an issue; I believe these are internal matters that they need to settle among themselves.
Dharma, to view Sinhala Unicode websites on your PC without downloading anything, you may have to wait until you upgrade to Vista. Even if another standard gets established, there is no way it will magically appear on your PC. That applies to JC, Donald, or whoever builds another standard, font or way of working in Sinhala: it too has to be technically implemented.
Finally, if Donald's standard had been accepted by ALL (from Microsoft, Linux, Oracle and Google to local academia, private software companies and the government) some time ago, we would have implemented it. But everyone agreed to work on Sinhala Unicode (SLSI1134), so we adopted it too, to ensure interoperability and because it is technically feasible to implement.
We do have a problem in implementing Sinhala.
Please confirm whether you have a hidden “union” character table apart from the few characters registered in Unicode = SLSI1134.
“yes” or “no”
Even Harshula avoided this question. It was posted last week (160 and 163).
The Linux group has proved that there is a “union”.
Donald Gaminitillake
Colombo
[quote]
The best people to answer would be people from ANCL, Wijeya, and Upali where there are 2 people from these organizations who were in the Unicode Task Team if am not mistaken. So they should come and highlight why the papers are not Unicode complaint yet.
[unquote]
Yes, the best person to answer this question is Mr. Naveendra Gunaratne from Wijeya Newspapers, who was in the original Sinhala fonts task team and left thoroughly disappointed, because his concerns were never taken seriously by Gihan, Dino and the rest of the team, who had their own agendas. (The ANCL man was only a puppet.)
However, I do not see any logical reason why any newspaper company should shift to a Unicode-compatible platform.
As I said before, if the Unicode supporters want to make Unicode Sinhala a standard, they should first have enough applications to attract users. Until they do, the newspapers will use whatever brings them better results. Business leaders make decisions based on the market.
Newspapers do not use Oracle or Linux. All they want is good font sets for the publishing environment and perhaps relevant applications.
You cannot force anybody to use Unicode-compatible Sinhala font sets if they do not give any advantage over the rest. You cannot hold a gun to the head of a press baron and threaten him into using Unicode.
Finally, have you seen anywhere that Vista will support Sinhala? I have not, and given what happened in the past, I have strong doubts about that.
Since no full set of Sinhala characters is registered either with SLSI or in Unicode, software developers are prevented from making any software for Sinhala.
I am the only one who has produced and published this document with code points. Since I did this in a private capacity, I hold the copyright, and a patent is pending.
The code points outside the Unicode registered area are kept under a blanket called a “UNION”, and this list was never published. The contents of this “union” differ from one font maker to another.
As I previously mentioned in 178:
Quote
Only a part is registered; the balance is kept inside an unpublished “union”. Whoever hid these codepoints may have had a commercial venture (a monopoly) in mind, or wanted to deprive the people of Lanka of Sinhala IT education, leaving IT open only to the English-speaking group.
Unquote
Donald Gaminitillake
Colombo
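On the “union” question: as explained earlier in the thread, Unicode Sinhala text is built from the Sinhala codepage (U+0D80 to U+0DFF) plus the shared Zero Width Joiner (U+200D), and a conjunct form is written as a sequence of those registered codepoints rather than as an extra codepoint. A minimal sketch that classifies every codepoint in a string, so anyone can check for themselves whether anything outside those ranges is used. The function name `classify` is hypothetical, and the example sequence (KA + AL-LAKUNA + ZWJ + YAYANNA) is assumed here to be the SLS 1134 yansaya sequence; the check concerns codepoint ranges only, not how a font renders the result.

```python
import unicodedata

SINHALA_BLOCK = range(0x0D80, 0x0E00)  # U+0D80..U+0DFF
ZWJ = 0x200D  # ZERO WIDTH JOINER, shared with other scripts


def classify(text: str) -> list[tuple[str, str, str]]:
    """Return (codepoint, category, Unicode name) for each character."""
    rows = []
    for ch in text:
        cp = ord(ch)
        if cp in SINHALA_BLOCK:
            kind = "sinhala-block"
        elif cp == ZWJ:
            kind = "zwj"
        else:
            kind = "other"
        rows.append((f"U+{cp:04X}", kind, unicodedata.name(ch, "<unnamed>")))
    return rows


# Assumed yansaya sequence: KA + AL-LAKUNA + ZWJ + YAYANNA.
yansaya = "\u0d9a\u0dca\u200d\u0dba"
for row in classify(yansaya):
    print(*row)
```

If every row comes back as `sinhala-block` or `zwj`, the sequence uses only officially registered codepoints.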