Please continue discussion from Standardizing Sinhala for IT Part 4, on this thread. The change in the title reflects the diverse software issues discussed in the context of Sri Lanka that have gone beyond the initial Unicode vs Donalcode debate. Please keep the discussion civil.
Previous discussion is archived in the following threads:
214 Comments
Noam Chomsky
Shouldn’t the title be “IT Issues in Sri Lanka”? People here always discuss things beyond the scope of software, like organizational issues and the politics of ICTA, the .lk issue, and VK Sam bashing.
JC Ahangama
LET’S WRITE IN SINHALA
Helaya, Donald, and the honorable audience, may I humbly propose that we try to use Sinhala when writing here? The arguments aside, I was just elated to see so much Sinhala. (Actually, I think we should keep to civil public discourse, but I agree that it is difficult).
For those who would like to test it, we have the US (Sinhala) keyboard. It is only slightly different from the US (English) keyboard, but it is optimized for fast typing in Sinhala.
Helaya, please try it, just for the fun of it. Of course, if you use only Linux then you’d have to use the one with dead keys. In most Windows machines you switch between the keyboards with CTRL SHIFT combination. It’s easy, I am not lying, kiddos.
Here are the rows showing the difference (remember, only-lower-case):
LINE ONE
———–
English:
q w e r t y u i o p
Sinhala:
ä w e r þ y u i o p
(Notice that the first position is visargaya and fifth is OE Thorn)
LINE TWO
———–
English
a s d f g h j k l
Sinhala
a s ð f g h j k l
(The third is OE Eth)
LINE THREE
————-
English
z x c v b n m
Sinhala
æ x c v b n m
(The first is OE Ash)
The original letters assigned to those altered keys could be had by typing those keys while holding down the right-hand side Alt key or CTRL ALT on the left.
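The layout differences listed above amount to a four-key substitution table. Here is a minimal Python sketch, assuming only the lower-case changes shown in the three rows. The function name `to_us_sinhala` is made up for illustration, and the actual keyboard driver may well handle more keys than this.

```python
# Keys that differ between US (English) and US (Sinhala), lower case only,
# taken from the three rows listed above.
US_TO_SINHALA = {"q": "ä", "t": "þ", "d": "ð", "z": "æ"}

def to_us_sinhala(keys):
    """Return what the US (Sinhala) layout would produce for these key presses."""
    return "".join(US_TO_SINHALA.get(k, k) for k in keys)

print(to_us_sinhala("qwertyuiop"))  # first row
print(to_us_sinhala("asdfghjkl"))   # second row
print(to_us_sinhala("zxcvbnm"))     # third row
```

Every key not in the table passes through unchanged, which matches the description of the layout as only slightly different from the English one.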
síhalayaa nægitapiyav! uBalage molee munta vadaa varðhanaya velaa þiyenavaa. avasþaava viþarayi apata oone. ee kiyanne hæmaþænama Internet eka (jájaalaya) nomilayee þiyenna oone. eeka karanna oya hiþana þaram salli oona nææ.
URL:
http://www.sinhalaheritage.com/ussinhala/
Blogwatch
Noam,
The real topic should be VK Curse in Sri Lankan ICT. It is not the topic that is important here, it is the content. Your revelations about VK have gone to the right place (to the topmost) and his life in ICTA is hanging in the balance now. So, it is high time you start a “well organized attack” on him as you once mentioned. VK’s real skin is being exposed now…. Wolf is out. His time is being reduced, thanks to this blog. So, keep attacking. One day you will see the light at the end of the tunnel. Hope VK also sees his light as well since he is about to kick the bucket. You could do in one post what Donald couldn’t do in the last few months, as you attack with substance. VK won’t be able to fool the big man for long.
Donald Gaminitillake
“Standardizing Sinhala” to “Software issues in Sri Lanka”: a much broader topic.
Quote from 199 of Standardizing Sinhala part 4
“If you have any problems regarding dictionaries and encyclopedias ask from an expert on that field. There is no point to bring such questions here.”
unquote
We do not have a complete Sinhala–English dictionary or encyclopedia, neither in electronic nor in printed form.
Only I have printed and published a near-complete set of full Sinhala characters.
All this is due to the fact that Unicode Sinhala, SLSI 1134 and Sinhala ISO are incomplete and incorrect.
Dino and his group have spent government funds for the past 25 years, yet failed to give the public a proper product.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Re OCR
My version of OCR is as follows
“The OCR hardware scans the material and compares the image with the text matrix in the software. The character is analyzed in a Cartesian coordinate system on a pre-defined grid to identify the character. Once the shape of the character is determined, the program could conduct a search for similar character(s) in the character allocation matrix and thereby determining the character”
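The matching step described in this quote can be sketched roughly as below. This is a toy illustration only, assuming 3×3 black-and-white grids and two made-up template shapes labelled "A" and "B". Real OCR engines use far larger grids and more robust comparisons.

```python
# Toy template matching: each character is a grid of 0/1 pixels.
# The templates and their labels are hypothetical, for illustration only.
TEMPLATES = {
    "A": ((0, 1, 0),
          (1, 1, 1),
          (1, 0, 1)),
    "B": ((1, 1, 0),
          (1, 1, 1),
          (1, 1, 0)),
}

def distance(grid_a, grid_b):
    """Count the cells where two equally sized grids differ."""
    return sum(
        a != b
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    )

def recognize(scanned):
    """Return the label of the stored template closest to the scanned grid."""
    return min(TEMPLATES, key=lambda label: distance(scanned, TEMPLATES[label]))

# A scan of "A" with one damaged pixel is still closest to the "A" template.
noisy_a = ((0, 1, 0),
           (1, 1, 1),
           (1, 0, 0))
print(recognize(noisy_a))
```

In this sketch the recognizer returns only a label from its own template table. How that label is then written out as text is a separate step.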
For OCR to identify the Sinhala character “DU” or “KU”, the image of “DU” or “KU” has to be in the software as a full character “DU” or “KU”. In Sinhala Unicode I cannot find a “DU” or a “KU”.
Professional IT guys, can you explain how OCR works for Sinhala Unicode when Sinhala Unicode has no location for the character “DU” or “KU”?
Donald Gaminitillake
Colombo
JC Ahangama
Software Issues in Sri Lanka. I like that topic because that is what everything boils down to.
Now, I have been watching his Dumriya debate for long. On one side, it gives rise to such passion that it becomes personal. Let me take the OCR question.
I am a person who is old enough to remember the time OCR was introduced. It was very inefficient. Now it is better, yet hardly anyone uses it. I am talking about English.
Let’s look at the English alphabet:
Small letters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
We can agree that the following sets of letters occupy about the same shape and size of space and about the same density of darkness to confuse the program.
c, e, o
b, h, k
i, l
i, j
So, depending on the clarity of the scan, there are ten out of 26 lower-case letters that someone may have to correct in an OCRed document. This presumes that the program is advanced enough to guess the font family.
Capital letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
For caps I’d take the following sets:
B, D, E, H
C, G, O, Q
F, T
U, V
P, R
M,N
Too many!! However, a spell checker might help because the choice is less than four letters.
Even so, OCR is just not a big deal here. You scan and PDF the files, perhaps because scanning is much improved now. The US sends most doctors’ records to India to scan and PDF because the Federal Government requires record keeping (and compromises security, sigh). They also do medical transcription (MT). They do MT because OCR is inefficient.
How about Sinhala? There are 24 vowels, long and short without counting iluyanna. And about 39 consonants counting updhmaaniiyaya (f) and leaving out visargaya and bænði akuru. That is, 24×39 letters. As many of this set do not belong to mixed Sinhala sound set, let’s say we have 900 letters. (The font we made has 1600 plus characters altogether).
First, Sinhala letters have a much higher dark part density than Latin. Second, Sinhala fonts cannot be categorized into font design families. An OCR program has to take these into consideration.
Consider one similar set:
Mahapraana kayanna, kantaja naasikyaya, cayanna, tayanna, dayanna, sanyaka dayanna, mahapraana ðayanna, bayanna, mayanna, amba bayanna and vayanna = 11 letters
We cannot say which one of these would be selected by the OCR program for a marred mayanna. So someone needs to manually edit the OCRed text or help along the OCR process.
If I were the person to convert the scan to text, I’d employ a typist to do it because it is faster, more accurate and cheaper.
Donald Gaminitillake
JCA!!!
quote
As many of this set do not belong to mixed Sinhala sound set, let’s say we have 900 letters…
unquote
The problem here is that UNICODE SINHALA or SLSI 1134 or ISO, whatever you name these charts, do not have the full set of Sinhala characters. They have fewer than 100.
This is where the problem is.
Quote
(The font we made has 1600 plus characters altogether).
unquote
Unicode Sinhala does not have 1600-plus characters in the registered format.
Latin script does have more than 128 and falls into several pages in Unicode!!!
With the few registered Unicode Sinhala characters, how can OCR identify the “DU” and “KU”?
I place this question to HELAYA, who claims to be an IT professional.
Donald Gaminitillake
Colombo
Donald Gaminitillake
The best answer was given in the Maubima a Sinhala news paper in Sri Lanka
Page 36 written by Mr Pushpananda Ekanayake
Will someone translate it to english and post it.
A simple summary is (but he had written more than this)
“The people who worked a few decades ago had knowledge of the subject that they were working on. A typesetter knew how to use Sinhala characters and compose looking at the manuscript, and additionally had a very good knowledge of the language, even to correct the errors.
Today a typesetter knows how to use a computer but has no good knowledge of the language.”
Donald Gaminitillake
Colombo
JC Ahangama
Donald,
I understand the OCR dilemma. Unicode does not have all the possible combinations. I am just trying to approach the problem honestly.
Unicode indeed presents a problem for OCR. However, that may be solved.
I think what I might do is make a font that has all the possible characters (I mean Sinhala letters; Unicode even mangled the English language). The way to do that is to place those characters that do not belong to the Sinhala Unicode block in the Private Use Area (PUA) of the Basic Multilingual Plane (BMP) of Unicode, starting at U+E000 and running for 6400 positions.
I’d map them as composites and decompositions of the Unicode character set. The decompositions are needed for the two ligatures it has that shouldn’t be there. Now, the OCR engine could associate the characters it finds with the characters in the font.
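The BMP Private Use Area runs from U+E000 to U+F8FF, which is where the figure of 6400 positions comes from. Below is a minimal sketch of the idea, assuming a hypothetical font that gives each precomposed letter one PUA code point and keeps a table for decomposing it back into standard Sinhala code points. The two PUA assignments and the function names are invented for illustration. The sequences use U+0DD4, which the Unicode character database names as the dependent vowel sign for short u.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF
print(PUA_END - PUA_START + 1)  # number of code points in the BMP Private Use Area

# Hypothetical assignments: one PUA slot per precomposed letter.
DECOMPOSE = {
    "\uE000": "\u0DAF\u0DD4",  # "du": consonant letter + dependent vowel sign
    "\uE001": "\u0D9A\u0DD4",  # "ku": consonant letter + dependent vowel sign
}
COMPOSE = {seq: pua for pua, seq in DECOMPOSE.items()}

def to_standard(text):
    """Replace font-private code points with standard Unicode sequences."""
    return "".join(DECOMPOSE.get(ch, ch) for ch in text)

def to_private(text):
    """Replace known standard sequences with their private code points."""
    for seq, pua in COMPOSE.items():
        text = text.replace(seq, pua)
    return text

ocr_output = "\uE000\uE001"  # what an OCR engine keyed to such a font might emit
standard = to_standard(ocr_output)
print([f"U+{ord(c):04X}" for c in standard])
```

Text kept in the private code points would be readable only with that particular font, so the decomposition table is what would let it be exchanged as ordinary Unicode text.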
Donald Gaminitillake
Thanks JC
Quote
Unicode does not have all the possible combinations.
Unquote
again the rest have bolted
When we can register the whole thing in unicode or SLSI or ISO why should we use the private area?
First we have to get the people who did it wrong to correct it
Officially Donald Gaminitillake placed the objections to the SLSI 1134.
This was officially discussed and documented
While accepting a need for a character allocation table, DINO’s group overruled my objections.
That is why Dino is not coming forward to debate with me.
He knows that a character allocation table is required to move Sinhala forward in the IT field.
Since he missed it and now he has to give the credit to none other than Donald
Now get back to the SLSI 1134 correct it with my Sinhala character set.
Then the whole problem of Sinhala software will get solved overnight
Donald Gaminitillake
Colombo
Google
Interesting http://groups.google.com/group/sinhala-unicode
On sinhala stuff…
Helaya
Dear Administrator,
It is ideal that you have changed the topics of this thread to Software Issues in Sri Lanka Part 5.
The so called standardization issue (or shall I say non-issue) had been discussed for so long and it has brought no results. Even if we discuss this non-issue for another ten years the results will be same.
Donald and JC are just repeating what they have said earlier for the umpteenth time. Their individual questions have been answered successfully so many times, but they ask the same questions repeatedly. For instance, Donald asks what the code point for ‘Du’ is. I have pointed it out. (In fact, so many, like Harsha and Harsula, have done so.) It is clearly there in Unicode in black and white. So that shows only Donald’s inability to understand the Unicode concepts. That is his problem. Not ours.
There is no use of wasting web space for non issues.
There are so many important issues regarding the development of software in Sri Lanka.
Let us all forget this so called Sinhala standardization non-issue and start talking about something useful and productive.
Donald Gaminitillake
Answer the questions posted rather than running away from the problems caused by your group.
Also note that I speak on SINHALA UNICODE , SLSI 1134 and SINHALA ISO
Not the unicode consortium.
You mix up Unicode consortium and Sinhala Unicode
What Sri Lanka registered in the UNICODE is incorrect and incomplete Sinhala
This is the issue. Helaya, Dino and his group knew only a Sinhala typewriter, not ICT.
Read Maubima, Sunday 15 Oct, page 36, written by Mr Pushpananda Ekanayake. This text suits Helaya. Helaya claims to be an IT professional expert trying to have a first class upper division, but knows nothing of how to use this IT as a tool, unable even to understand the basics of OCR!!!!! I post the question once again for this IT expert. How does OCR identify “KU” & “DU” using only the Sinhala Unicode chart or SLSI 1134 or Sinhala ISO?
Donald Gaminitillake
Colombo
Noam Chomsky
It is obvious that we should ignore Donald’s usual question about ‘DU’ now, but I don’t think we are in such a hurry to say that there is no issue in Standardization of Singhala for IT (current Singhala Unicode standard).
1. One point JC Ahangama raised the question that current Singhala Unicode standard cannot support Pali and Sanskrit successfully. (I still think JC seems to be honestly trying to push some technical issue and practically trying to solve it, may be in correctly ;-) )
2. The other point: it is similarly important to talk about and criticize the process of achieving that standard and to question the real productivity of that process. In fact, I assume the major portion of these discussions should be about this aspect of IT. (If you guys follow it, discourse on IT in the West (mostly the US ;-) ) is mostly about such topics. They don’t do technical analysis of, for example, the Open Document Format on public discussion groups, but they discuss the issues relating to not having such a standard, how Microsoft is trying to undercut such a standard, and things like that. And there are so many forums that have very influential and independent views on many aspects of IT.) If I understand correctly, Lirneasia is also trying to initiate something similar, so I think we should use it to the fullest.
One more thing I think we should consider JC Ahangama’s efforts little more seriously and should not categorize him also with Donald. I downloaded his stuff and trying to dig in. (He mentions open source in some places, but I am yet to find what exactly in there open source. Don’t worry JC I am not trying to fork.)
It is also important to identify the real objectives of the anti-Donald voices that show up in this group. One such category is just the opposite of Donald: they don’t add anything, just repeat the same answer and hatred (unproductive) again and again. I think what these people really want is to add a lot of unnecessary noise to these forums and suppress the more important debate, the really, really important debate: what are the real bottlenecks in utilizing IT in SL, how much are VK Samaranayake, Gihan Dias, et al. contributing to these frictions, is what they have done so far cost-effective, should the authorities allow these people to continue, and ultimately, why is IT not really flourishing in Sri Lanka?
These noise makers finally make these discussions ineffective, by distracting the audience.
At last I think people dont come here to talk about some thing productive in IT (For that you need to go to your work place meeting room. After all this is not kernel.org is it.) – dont get me wrong. People come here to talk about why they can’t do certain things productively in IT (in Sri Lanka). To discuss about those road blocks (and may be to kick those shrivel old asses too ;-) ).
Helaya
Noam,
I agree with you that there are issues with the introduction of the Sinhala language to computers, but since we have come a long way, those issues should be settled within the Unicode structure and SLS 1134, not outside of it. Also it is time to put this ‘Du-ku-gu’ rubbish to rest.
I also agree with you JC should not be lowered to the level of an O/L dropout who has not done even an iota of work but gives “pora talks”. JC has actually done some research in this area so we should take him seriously.
JC Ahangama
OK, Noam and Helaya,
Noam, you say you downloaded the stuff that I put out for y’all to test. That looks like a first for this place. Now, then, get serious and take it piece by piece and say what you have to say about them.
Let’s begin with the fundamentals: the alphabet. Tell us merits and demerits. It represents the entire Sinhala sound set. Challenge me. Let’s see if Unicode Sinhala or Dual-script is practical.
Let’s honestly bring out its weaknesses and try me for answers or admissions. If you do not want to install WorldPad, add files for complex scripts and uninstall them. The residue would allow the orthographic font to work inside Notepad (because Microsoft does not take back the system services that ought to have been there from the beginning). Who knows whether they were there originally and were removed when shipping started? Also, how come Adobe can show the font in their programs?
Go and read the Install.txt file that has some tech links that highlight the problems Unicode has in general (not just Sinhala). These docs are by Microsoft and IETF, not street persons.
Helaya, you say since a lot of time was spent on Unicode we should stick with it. And, go for the $53 million debt too? How about the time we spent in the US on the alternative? The time I spent on it was really money lost because my time was taken away from my work. There’s no boss paying me.
You say you have answered my questions repeatedly. I did not ask questions. I answered questions you guys put out, and you become silent or change names and hide. I don’t know which is which.
I was asked to change the link to the Dual-script Sinhala files. They are improved and stationed at:
http://www.sinhalaheritage.com/ahangama/
This is only for a limited time and only for genuine testing and feedback.
Donald Gaminitillake
Quote
Also it is time to put this ‘Du-ku-gu’ rubbish to rest.
Unquote
No way can you put the Sinhala language aside
Answer the question posted for OCR
Donald Gaminitillake
Colombo
Helaya
Dear Readers,
I am not here to answer an O/L dropout’s questions. If they do not know anything about ICT, it is their problem. Not mine.
However, if any of you think that I run away from arguments, here are my responses.
1. First of all, I do not think that for an application like OCR we need any standards. We need standards only for information exchange. We do not need standards for each and every thing.
Somebody can specify every car should have four wheels, a steering wheel in the right front seat etc., but it is up to the manufacturers to decide the models and the features of a car. If govt. forces the same standard on every car manufacturer the world will be a very boring place to live. So OCR application developers can always have their own systems as long as they can convert the Sinhala to same Unicode compatible font set once they are scanned and recognized.
2. Even if someone does not like that and imposes a standard, one can always use the Unicode code point (0DAF0D8B) to recognize the character.
So, not just one, but two solutions.
This is the LAST TIME I respond on this matter. This is just to show that we have ALL the answers to the questions this stupid O/L dropout has been raising for so many days. This mottaya does not even know the difference between Dhammapada and Loveda Sangarava and gives “pora talks” about saving the Sinhala language.
Sinhala language should be saved only from those mudalalis who try to make it proprietary so that they can earn a fortune from that.
I think this man is somebody who escaped from the Angoda mental hospital. We should let the hospital authorities know about this, so all these unnecessary issues are solved.
Donald Gaminitillake
What you scan from the OCR can you cut and paste with note pad into word and then into helawadane and/or Thibas? further into linux Unix Apple using the sinhala unicode or SLSI 1134
Where is “DU” “KU” and “GU” in Sinhala unicode to use in OCR?
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote from Helaya posting 18
“Sinhala language should be saved only from those mudalalis who tries to make it proprietary so that they can earn a fortune from that”
unquote
Now you can understand who was paid govt funds for 25 years!!and still in the govt pay roll — yet no proper Sinhala in Computer
Unicode Sinhala is incomplete and incorrect.
Donald Gaminitillake
Colombo
Helaya
Only the computer illiterate mottayas can say stupid things like no Sinhala in Computers.
1. Sinhala Unicode standard had been established. It is correct and complete. Not a single sane person had ever challenged its completeness or correctness. (Escapees from Angoda don’t count)
2. Sinhala standard key board is there.
3. There are so many Sinhala sites using Sinhala Unicode compatible fonts.
4. There are even chat rooms and forums using Sinhala Unicode.
5. Sinhala Unicode compatible fonts support all major Operating Systems.
6. There are so many Sinhala Unicode based font developers. Companies and individuals.
7. The printing industry heavily uses Sinhala fonts with no difficulties whatsoever. They publish not only Sinhala text, but also Pali and Sanskrit texts.
8. All major Sinhala Newspapers are now page made by Computers.
9. You can send e-mail and SMS in Sinhala.
10. OCR and handwriting recognition applications have been developed.
If computers cannot handle Sinhala how can one explain all these developments?
If anyone still claims there is no Sinhala in computers after all these advances, he should first touch his head to see whether there are two horns growing.
Donald Gaminitillake
You cannot perform the following task
What you scan from the OCR can you cut and paste with note pad into word and then into helawadane and/or Thibas? further into linux Unix Apple using the sinhala unicode or SLSI 1134
Pls refer to the unicode sinhala chart and give locations for “GU” “DU” and “KU”
What Helaya has written is just a gimmick
Donald Gaminitillake
Dino's son
Earlier they have removed my father’s (Dino) pants.
Now Donald’s pants are being removed. This is good for a change.
Donald, are you blind? I see Helaya has given the code point for ‘Du’ so many times. Why you repeatedly ask this question when he answered it? Don’t you realize that you are now wearing Emperor’s new clothes?
Hoo…hoo…Donald, there are girls here. Please cover yourself asap…hoo…hoo…
oka thamai kiyanne Donald, anunta kala de – thamanta pala de…
Guys, this Donald uncle is a bigger crook than my father.
Never ever let Donald uncle go anywhere near ICTA. He will create annexes in ICTA and will sell them through his web site to gullible foreigners.
JC Ahangama
Helaya,
You listed ten things. If all that is true, Sinhala does not need any more work. I am removed from the scene and that could be why I have this perception that Unicode Sinhala is inadequate. I feel that romanized Sinhala has gone further ahead. Who is right?
First, let’s agree that if Unicode Sinhala needs fixing, it should be done. We all can get together and agree on a fix. Yes, Helaya, only if it is broken. Same is true about romanized Sinhala.
This list seems comprehensive. I humbly request that we take POST 21 as the basis for the discussion. Let us please keep the discussion civil and respect each other.
So, as a rule, if one says I am not going to answer that question, then let’s have someone declare that that point is in suspense and go to the next. (How about Gamage acting as the moderator or Samarajiva? Sam, I hope you do not have any special interest than the betterment of the language) Don’t personally attack for not answering a question. That itself is admission of failure. So why hurt someone already injured?
May I ask Helaya to support each point and let others respond so that we could come to a logical conclusion? We must have proof to support each point. Sweeping statements like “everyone does it” or “only fools do not agree” are not proof. For instance, if one says the newspaper industry uses Unicode Sinhala, let someone from that industry confirm that with some verifiable proof. No one can attest to anything by saying ‘I do it’ while concealing their personal identity.
I won’t dare say 1. until Helaya agrees to this scheme (and of course, the audience). But let me just suggest going down the list in order is orderly. (duh).
samarajiva
Sincere apologies, but LIRNEasia cannot provide the requested moderation. As we have repeatedly explained these issues are not central to our work. If we don’t keep our focus and get our work done, no one will bail us out.
Please manage the discussion among yourselves. The suggestions for ground rules seem eminently sensible. Perhaps JCA can moderate, while also expressing his own views.
Donald Gaminitillake
2. Sinhala standard key board is there.
Nobody challenges the Sinhala keyboard. Input methods are not a problem.
There are several input methods, but the same character should appear. The text has to be compatible among any OS, any system, any application.
3. There are so many Sinhala sites using Sinhala Unicode compatible fonts.
4. There are even chat rooms and forums using Sinhala Unicode.
6. There are so many Sinhala Unicode based font developers. Companies and individuals.
seen on Limited to operating system and a limited font set — data not compatible across any OS
5. Sinhala Unicode compatible fonts support all major Operating Systems.
Data not compatible. What is created with font “A” cannot be read using font “B”.
7. Printing industry heavily uses Sinhala fonts with no difficulties what so ever. They publish not only Sinhala text, but also Pali and Sanskrit texts.
Limited to specific users. the sinhala news papers cannot be read in the web without downloading the font that created the text.
9. You can send e-mail and SMS in Sinhala.
Nokia has come out with a good issue; the bad part is that it will never be compatible with another brand or mobile system. It will work only within the framework of Nokia, an act of monopoly. These acts of monopoly have to be stopped. There would not be a monopoly if the character allocation table were published. I do have the copyrights over the issue of using full Sinhala characters.
With Unicode Sinhala a limited number of Sinhala characters may be used. But Nokia gives a location to “Yansaya”. Nokia has its own character allocation table.
Nokia SMS any provider SMS readable only on Nokia Hand sets
Dialog SMS only on limited handsets only on Dialog System
Celltell SMS only on limited handsets only on Celltell System
Mobitel SMS only on limited handsets only on Mobitel system
Example
If someone send a Sinhala SMS from Nokia handset OS to Sony on Dialog — the SMS is not readable!!!
Is this what we wanted.
OCR issue has been answered before
Unicode Sinhala is incorrect and incomplete
Donald Gaminitillake
Colombo
Donald Gaminitillake
Re Sinhala e-mail
You need the same font that the E mail was created to read the content.
All above violates the concept of unicode consortium
Quote from unicode consortium
What is Unicode?
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
unquote
“Sinhala unicode” will not perform any of the above tasks.
This is because Sri Lanka registered a limited number of Sinhala characters in the unicode consortium.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Helaya, who claims to be an IT professional, writes the following in posting 18
Quote
1. First of all I do not think for an application like OCR, we need any standards. We need standards only for information exchange. We do not need standards for each an every thing.
Unquote
He contradicts the first phrase with the second, and the second with the third
Do I have to give him a lecture on “information exchange”?
These are the type of guys we have in our universities. He knows nothing else writing on behalf of someone else.
Donald Gaminitillake
Colombo
Helaya
JC,
Your post looks very sensible. I agree to the scheme, if everyone else agrees to the following conditions.
1. Okay, we can take post No. 1 as the base.
2. Donald had been asking the same stupid question about “Du”, “Ku” and “Gu” since time immemorial. This question has been successfully answered. For your information the code points for the same are given below.
DU = 0DAF0D8B
KU = 0D9A0D8B
GU = 0D9C0D8B
Since the successful answers are given Donald should stop asking this question repeatedly. So that issue is settled.
3. Donald makes so many false statements, either to mislead the public or because of his ignorance.
eg. [q] seen on Limited to operating system and a limited font set — data not compatible across any OS [uq]
This is wrong. Fonts based on the Unicode standard are compatible across applications and across OSs. This has already been demonstrated to Donald at ICTA. So this issue too is settled. We need not discuss it.
4. Anybody who challenges saying Unicode is incompatible and incomplete should have a solution of his own. You have your own solution. (though some may not agree that is the way) So we can argue with you. No problem. But Donald does not have ANY solution as an alternative. So there is no need to argue with such a person.
5. Donald does not understand how market works. For instance he thinks there should be one key board. This is a big joke. Even for English there are so many key boards. (eg. Dvorak, which is not very popular) So there is no need to have standards for each and everything we need standards only for information exchange.
Under these conditions, I am ready for a fruitful conversation. But definitely NOT with O/L dropouts who repeat the same stupid arguments because of their ignorance. Their stupidity / ignorance is their problem. Just because one person is stupid, we do not have to make the whole nation stupid.
Now I am sure that the O/L dropout will jump in with the same age-old rubbish.
We have successfully answered EVERY question he has asked. So my question is: what is the use of asking questions that have already been answered?
Do you think there is any use of answering such a person?
Do you think anybody sane can maintain such a stupid attitude?
Helaya
JC,
No, don’t get the facts wrong.
We can agree about opinions, but facts should be maintained.
Donald said:
[q] Now you can understand who was paid govt funds for 25 years!!and still in the govt pay roll — yet no proper Sinhala in Computer[uq]
I replied:
[q] Only the computer illiterate mottayas can say stupid things like no Sinhala in Computers. [uq]
I gave ten examples and ALL major Sinhala newspapers using computers in their layout designs is one. There was no issue of Unicode.
What proof you need that newspaper companies use computers for layout purposes? You are free to visit any newspaper company (ANCL, Wijeya, Upali, Rivira, Maubima, Leader etc) and see yourself. There is no dire need to use Unicode compatible fonts here, because there is no dire need for information exchange.
Do you agree with me that any ordinary person can say there is no Sinhala in Computers when so many newspaper companies use them?
M. Rajapakse
With the powers that have been vested to me by the people of this country, I hereby declare this debate on Sinhala Standardisation OVER.
CHAPTER CLOSED.
Also please do not say anything against Prof. V. K. Samaranayake because he has been appointed as ICTA Chairman with a good reason.
Just like this country is the feudal property of families like Senanayake, Bandaranaike and Jayawardena families, ICT in Sri Lanka (including organizations like UCSC, CINTEC, Infotel and ICTA) is the feudal property of Prof. Samaranayake. So he is the best person to head ICTA. After he retires (i.e. after the doctor confirms the death) his son will take over the post of Chairman from him. If he does not have any sons post of ICTA Chairman will be given to the brother of his wife.
I hope this is clear.
Let us everybody now join hands and discuss how to develop this country instead of uselessly arguing.
Therunai?
Donald Gaminitillake
Quote
For instance he thinks there should be one key board.
Unquote
Who said so? I never said this.
Nobody challenges the Sinhala keyboard. Input methods are not a problem.
There are several input methods, but the same character should appear. The text has to be compatible among any OS, any system, any application.
quote
DU = 0DAF0D8B
KU = 0D9A0D8B
GU = 0D9C0D8B
Unquote
These are just sequences, NOT a code point for “DU”, “GU” or “KU”. All three characters are not visible in SINHALA UNICODE or SLSI 1134.
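Readers who want to check what the quoted values contain can split them into individual code points and look up their names. Here is a small sketch using Python's standard `unicodedata` module. The helper name `describe` is made up here. It also prints U+0DD4 for comparison, since the names in the Unicode character database distinguish independent vowel letters from dependent vowel signs.

```python
import unicodedata

def describe(hex_string):
    """Split a run of 4-digit hex values into code points and name each one."""
    points = [hex_string[i:i + 4] for i in range(0, len(hex_string), 4)]
    return [f"U+{p} {unicodedata.name(chr(int(p, 16)))}" for p in points]

for label, value in [("DU", "0DAF0D8B"), ("KU", "0D9A0D8B"), ("GU", "0D9C0D8B")]:
    print(label, describe(value))

# A dependent vowel sign from the same block, for comparison.
print(describe("0DD4"))
```

Each quoted eight-digit value splits into two code points, so what is being exchanged in the discussion is a pair of encoded characters and not a single number.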
Quote
This has already been demonstrated to Donald at ICTA. So this issue too is settled. We need not discuss about it.
Unquote
I have answered this http://www.lirneasia.net/2006/05/standardizing-sinhala-for-it/
quote from 191
Oh, that story is quite different.
I went with Keerthi, president of the Sri Lanka Association of Printers (posting 19), and Delan.
Manju too was present. This was the second meeting. He got angry and went out.
The “Yakshaya” part came at the first meeting.
Dr Gihan wanted a word from us to show on his computer with his system.
I told him to type “Yakshaya” into Notepad and copy and paste it into another application.
The “Yakshaya” he typed into Notepad was not the “Yakshaya” that appeared in the second application.
In this meeting, even though he failed with the word “Yakshaya”, he never got angry. He said it was a bug. (wow)
In the second meeting he still had no solution for “Yakshaya” (after several months). He got angry and walked away, embarrassing Manju. Not only because of “Yakshaya” but due to other facts. Unfortunately I have not taken any notes.
These meetings were not by invitation from ICTA but by appointments requested with Manju.
Technically, this was during the time we made objections to SLSI 1134.
Instead of “Yakshaya”, why not try “Rajapaksha” and see the results yourself, Jehan? Hopefully someone will have to do a demonstration to the President. This was one of the visuals I planned for the Sirasa TV debate.
Donald Gaminitillake
Colombo
This was confirmed by 194 of the same blog
194 Keerthi on Jul 10th, 2006 at 9:32 pm
191. What Donald Gaminitillake says there is correct. The intention was not to criticize anyone but to look for a more lucrative solution. When Gamini was getting into an argument with Dr. Gihan, I only suggested: why not give a word and let Dr. Gihan type it and copy it to a different package? That was not successful. It was not a story like the one Jehan was trying to tell, and let me clear Gamini of that, since he went with me and Dilan.
I write just to clarify this matter only.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote
Anybody who challenges saying Unicode is incompatible and incomplete should have a solution of his own
Unquote
I SAY ONLY “SINHALA UNICODE” is incorrect and incomplete
WHY? Only a limited number of Sinhala characters were registered with the Unicode Consortium.
Sri Lanka registered as a national standard with Unicode a table SLSI 1134 which is incomplete and incorrect. They have not given proper code points for all Sinhala characters. Therefore the font developers are unable to develop correct proper sets of fonts to use in computers.
I have the solution
My character allocation table for Sinhala Language is published ISBN 955-98975-0-0 (Contents do have Copyright areas & Patent pending areas©2000-2006)
Donald Gaminitillake
Colombo
Donald Gaminitillake
Re: newspaper issue
It is limited to specific users. The Sinhala newspapers cannot be read on the web without downloading the font that created the text.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote
What proof do you need that newspaper companies use computers for layout purposes? You are free to visit any newspaper company (ANCL, Wijeya, Upali, Rivira, Maubima, Leader etc.) and see for yourself. There is no dire need to use Unicode compatible fonts here, because there is no dire need for information exchange.
Unquote
Newspapers are a part of information exchange.
A free media movement, and free and fair publication so that the public can read and obtain facts, depend on the newspapers.
By saying “no dire need to use Unicode compatible fonts here” you are censoring the free flow of information.
I think journalists should comment on this issue.
This is a form of imposed censorship.
Donald Gaminitillake
Colombo
Helaya
JC,
Saw the same stupid response?
Do you think there is any use talking to a complete idiot like this?
I ignore this modaya (like so many wise men have done before me). When somebody replies to him out of sheer courtesy (thinking this is somebody who does not know the facts and is willing to learn) he takes advantage of it. Then he goes on repeating the same stupid arguments (which have already been answered for the umpteenth time).
I don’t see any need to discuss this issue further, because there is nothing to discuss.
It is all settled. Unicode can support any Sinhala character and the Unicode standard is inter-exchangeable over different applications and operating systems. This is all we need. So the issue can be put to rest.
If you have any other issues we can discuss.
Donald's son
Macho Dino’s son, I agree with you.
My dad is a bigger crook than yours.
Your dada flicked from the $ 15 million Samsung deal so you can enjoy the life in USA. My dada is trying to get a patent for the Sinhala alphabet thinking that he would earn enough money to send me to USA.
However, your dada is smarter than my dada so the former always wins.
I hope soon he will be appointed as CEO of ICTA, so he would buy me a bicycle. You know, the CEO of ICTA is being paid millions of dollars for doing nothing. So it is an ideal job for my dada.
JC Ahangama
In one word, Exasperation!
I expected to discuss one point at a time. Instead, there came a barrage of responses. Then, Helaya, you say it is settled, which I cannot blame you for. Isn’t it a bit unfair, though? Here, a balanced discussion is impossible and you cleverly take advantage of it. I just laugh. How can anybody hear when everyone is screaming different things to drown out the others? I hope I am not doing the same, sigh.
This is not the forum for discussing anything substantial. But please let me say the following. Note: This is in the interest of the big picture — the future! And we are together.
Unicode Sinhala is here to stay. Therefore, let’s get it to a condition that is best for us. I can apply for a revision. You can too. The Koreans did it too. They call it the ‘Korean Mess’. But this is not for one man to do but for you people back at home.
I would like to have a good backup plan for Sinhala and Tamil. Let’s back up a bit. This is for you to see what I mean. Back up plan for what? Backup plan for Unicode Sinhala. The Thais have it. What they did was, they have the characters at Latin-1 space and the same set so many codepoints higher (just behind Sinhala) in exactly the same order. This way, they add or subtract one number to go from one code page to the other.
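The add-or-subtract conversion described above can be sketched in Python. This is only a minimal illustration, assuming the 8-bit Thai set meant here is TIS-620 (bytes from 0xA1 upward) and the higher set is the Unicode Thai block starting at U+0E01. The constant and function names are made up for the example.

```python
# Assumption: the 8-bit Thai code page is TIS-620 and the "higher" set is the
# Unicode Thai block. Both list the letters in the same order, so one constant
# offset converts between them. All names here are illustrative only.
TIS620_TO_UNICODE_OFFSET = 0x0E01 - 0xA1  # 0x0D60


def tis620_byte_to_char(byte_value: int) -> str:
    """Map one TIS-620 byte (0xA1 and up) to its Unicode character."""
    return chr(byte_value + TIS620_TO_UNICODE_OFFSET)


def char_to_tis620_byte(ch: str) -> int:
    """Map one Unicode Thai character back to its TIS-620 byte."""
    return ord(ch) - TIS620_TO_UNICODE_OFFSET


first_letter = tis620_byte_to_char(0xA1)
print(f"0xA1 -> U+{ord(first_letter):04X}")

# Cross-check the arithmetic against Python's built-in TIS-620 codec.
codec_agrees = all(
    bytes([b]).decode("tis_620") == tis620_byte_to_char(b)
    for b in range(0xA1, 0xDB)
)
print("matches built-in codec:", codec_agrees)
```

Sinhala has no such 8-bit twin in the same order, so a fixed offset like this is not available to us today; the sketch only shows what the Thai arrangement buys.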
They were able to do this because they are always united behind the king and, though their language is similar to the Indics, they did not get caught up in the Abugida thing. Abugida essentially says that these languages are strange. So strange that they cannot be supported the way they do the European languages. (I think that this is the result of Indians harping too much on the grandiosity of Sanskrit, godly letters, cerebral sounds etc. I would too like to do something mischievous to such people).
Using the little linguistics I know, I see plainly that this idea of our being too strange is untenable. ISO too did not think we are strange until like 1999. Something happened after Unicode took control of character tables (soon after they talked to ISO guys bringing them down to America). They quickly disbanded the ISO-8859 committee. We were thrown into the abyss of Abugida by our own ineptitude.
First Sinhala and Tamil are different from the Indics of India. Tamil is Dravidian and all other Indics listed on Unicode are mixtures of Indo-European and Dravidian. Sinhala has no Dravidian at all. That is why I say that the ancestors of Sinhalese must have spoken an Indo-European sound set and not an ancient Dravidian. Therefore we should not have allowed Unicode to model Unicode according to Devanagari.
Now, if Sinhala is so Indo-European, we should be able to write like the other Indo-European languages and have cognates with them too. Both are correct. Romanized Sinhala is proof of the former. The idea of cognate is important. That allows attesting some spelling. We ignored this fact and stipulations of old grammar books and included taaluja sanyuga naasikyaya in the Unicode set. Look at jçaana and know. They are cognates and start with a consonant followed by a nasal. Most Europeans stick to this pattern of spelling though pronunciation has simplified. Why do we add confusion to something already quite clean? For those who do not know romanized Sinhala, jç means hal jayanna saha þaaluja naasikyaya.
Read this article by venerable Microsoft. It tells what a daunting task lies ahead for internationalization, the code word for writing programs to accommodate those who are given codepoints above Latin-1:
http://www.microsoft.com/typography/unicode/cs.htm
Some excerpts from it:
DBCS
No leadbytes fall within the lower 127 (ASCII) range, but some trailbytes do.
ASCII
the only means of interchanging data across all major languages (without risk of character mapping loss) is to use ASCII (or have all sides understand Unicode)
Unicode
Unicode is not a technology in itself. Sometimes people misunderstand Unicode and expect it to ‘solve’ international engineering, which it doesn’t. Unicode is an agreed upon way to store characters, a standard supported by members of the Unicode Consortium.
JC Ahangama
continuation…
Character
…no single character is assumed to identify a language in itself. character “a” can be a French, German or English “a” (or Sinhala “a” (JC))
appearance should reside in the font as an artistic issue, not the code point as an engineering issue. (Hence, Sinhala ‘a’ could look like Sinhala ayanna (JC)).
Although it’s technically possible to ship one font which covers all Unicode characters, it would have very limited commercial use, since end-users in Asia will expect fonts dedicated and designed to look correct in their language. (What happened to the wonderful idea that Unicode is plain text?)
I say we should reconsider Unicode Sinhala with practical use of it in mind (not a set of a trillion codepoints please, Donald; that will never happen). Until such time, romanized Sinhala is eminently capable of supporting Sinhala in *current* technology, and going forward if necessary.
Donald Gaminitillake
As JC says, a revision of Sinhala Unicode is a must.
The problem here is that they do not accept the fact that the present Sinhala Unicode needs any revision or correction.
They also say Sinhala need not be standardized.
When the Unicode Consortium says we need a standard, and all the major languages are working on an ISO or Unicode standard, we Sinhalese say we need not have a standard.
Quote from Unicode consortium
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
Unquote
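The conflict described in the quote is easy to reproduce. The sketch below is only an illustration: it decodes one and the same byte under three legacy 8-bit encodings that ship with Python (the choice of byte and of encodings is arbitrary) and gets three different letters, which is also roughly what happens when a page is read without the font it was typed in.

```python
# One stored number, three legacy code pages, three different characters.
# The byte value and the list of encodings are arbitrary picks for illustration.
raw = bytes([0xE4])

decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1251", "iso8859_7")}

for codec, ch in decoded.items():
    print(f"{codec:10} 0xE4 -> U+{ord(ch):04X} {ch}")

# With Unicode the number itself identifies the character, so the same text
# survives a round trip no matter which machine reads it back.
text = "\u0D85"  # Sinhala ayanna, as cited elsewhere in this thread
assert text.encode("utf-8").decode("utf-8") == text
```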
People with brains will understand and know who is winning. We need a proper Sinhala character allocation table, like the one I have published, to revise the incorrect Sinhala Unicode.
Donald Gaminitillake
Colombo
Noam Chomsky
Even though I do not want to get into answering or supporting detailed technical stuff here, I can’t help it. Donald and Helaya are both attacking each other using half-truths, because if they reveal the complete facts it will expose what they are trying to hide. So they make up theories (acceptable or profitable for them) using half-truths and then try to make them work somehow. When a theory does not work on its technical credibility (because there is no full credibility to it), they reach out to the politicians to help it: use political power to make it work. This is the sad truth about our country. From J.L. Peris, Epasinghe, V.K. Samaranayeka and Gihan Dias to Donald, that is the truth. OK, now we get down to the facts.
1. I hate to even mention this question, but what can we do? Can Unicode support the letter “Du”? Donald says it can’t, but Helaya (and Gihan and V.K. and Prof. J. B. Disanayeke) say it can. So what is the truth? Yes, there is no single code point for “Du”, but using two code points Unicode can represent “Du”. So what is the real question here?
a. Do we really need a single code point to represent each possible character in the Singhala alphabet? Is it possible? Is it technically desirable? If it is possible and desirable, why did the Singhala Unicode front not go for it? Was it so hard for those PhD guys to come up with this obvious solution? (One-to-one mapping is the simplest and most obvious solution; any fourth grader can come up with it in a few minutes. Yeah, it’s so easy, Donald.) Or is there something else these people don’t tell us? What’s up with the legal battle stuff mentioned on fonts.lk as the reason why they were so late to come up with the Unicode standard?
b. Can the current Unicode standard support all possible characters using one or more code points? Are some characters left out? If so, what is the reason? Was any acceptable academic or community process or debate followed to reach these decisions? Was it a pragmatic decision not to give a direct code point to “Yansaya” and to give a direct code point to “Fayanna” (or is it just because Prof. J. B. Disanayeke invented it)? Did the people who worked on this have the spine to come up with genuine questions like that? Or are they just afraid to talk about such things? Ultimately, can V.K. Samaranayeke et al. create such a free-reined intellectual environment?
If anyone wants to make this debate intellectual, they have to talk about the above questions (a) and (b). Ideally, if Helaya (or whoever wants to support Singhala Unicode) and Donald are real intellectuals, they should go back and spend some time writing some kind of white paper with a similar analysis and publish it on their respective sites. Then we can talk about it with proper references.
2. Yes, there are now many newspapers on the internet that use Singhala. There are even mobile phones supporting Singhala. (A very long time back, in 1997, I was personally involved in deploying a Singhala pager.) By saying that, Helaya is trying to say there is no problem in using Singhala in IT. But is it true? No, not at all. It is just some fonts.
3. Don’t bring supplementary technologies like OCR in here. Having them or not having them does not prove anything.
Sorry I’ll continue later
Noam Chomsky
What the fuck? The times and the order in which posts appear in this list got messed up, I think.
Noam Chomsky
Maybe it is caching and refreshing, I hope.
Noam Chomsky
I think JC’s suggestion of applying for a revision of the current Singhala Unicode is good. But do we need one, and if yes, why?
Donald needs it, maybe, just to push forward his copyrighted solution (I am not being over-judgmental, Donald. You really spoiled your credibility here, but I think you can work on it; we are humans after all). If not, say clearly that whatever gets into this revision should be free for everyone, personal and business. Helaya may say “no, we don’t need it”. But get real here, man. Was that “Yakshaya” problem resolved? (I really don’t know.)
JC may have some good suggestions. (Maybe to expand the scope to include Pali and Sanskrit.)
I just want to give prominence to “Yansaya” instead of “Fayanna” ;-) what the fuck.
And any answers to my two questions in the above post would also give some suggestions.
Also, I would like to suggest that, at least for the most part, this revision has to be backward compatible with the existing Singhala Unicode. (Otherwise that Meegamman guy will get pissed. No, not really, if he is a real businessman, because he can sign a new set of contracts to upgrade existing clients. Money, money, maony, mo..ney.. So get in here, brother..)
So people who are working on these matters for real get back to a proper discussion. Shall we?
Helaya
JC / Noam,
The very concept behind Unicode is to have UNIVERSALLY ACCEPTED pages for each language currently used in the world, instead of proprietary pages for each language.
For instance, when I say Unicode 0D85, I do not have to mention anything about Sinhala, but everybody knows I refer to ‘A’ in Sinhala. No other letter (in any language, say English, Spanish, Hindi or Urdu etc.) or symbol is represented by the same code. And vice versa (this is very important): Sinhala ‘A’ (ayanna) is not represented by any other code.
So there is always a one to one UNIVERSALLY ACCEPTED relationship between the code 0D85 and Sinhala ‘ayanna’
The benefit of this is not seen by Sinhalese, because Sinhala is only used in Sri Lanka. However, if you take a language like Tamil which is used in so many different countries there is a clear benefit, because there is no need to have national standards. Unicode is the international standard so it is only a question of adapting the same. (Imagine what will happen if India has one national standard for Tamil and Malaysia has another national standard incompatible with the Indian one.)
Conclusion: Whatever issues we have, we cannot move away from Unicode. That is the way forward. (JC, we do not have any objection to Romanised Sinhala. You can use it for specific purposes. But still it is a proprietary system.)
Now, let me come to my point.
Since Unicode has to make allocations for ALL languages in the world it gives only 128 codes for one language.
This is not a problem for languages like English, which has only limited number of letters. However, all Indian languages have thousands of characters, so they cannot be obviously represented in a matrix of 8×16.
That is the simple reason why we need a combination of “two” codes for “Du”. However, there is no rule that a character should always be represented by an 8 bit code. If it is represented by a 16 bit code or a 32 bit code there is no issue at all. That would have been a problem in the 1960s, when computer memory was a luxury, but definitely not today.
What our Donald mottaya says is that we should have a proprietary character allocation table and map each character in Unicode with a proprietary 8 bit code.
That is not necessary. I can give many reasons why.
(a) Adding another proprietary component will add no value to Unicode. In fact, it will reduce the value of Unicode.
(b) What he suggests is already there. There is no need to add anything extra. For example, Nokia phones or Helawadana use their own proprietary character allocation tables. There is no need for these proprietary character allocation tables to be standard. It is like building buses. You can have the same standard chassis, but you can build the body the way you want.
(c) If anyone still wants a standard character allocation table, one can always expand the same Unicode chart to have the sacred matrix Donald is talking about. In fact, one does not need to have it on paper. It can be done mentally. In the matrix these are the relevant code points. DU = 0DAF0D8B, KU = 0D9A0D8B, GU = 0D9C0D8B. These are NOT sequences, but the code points in the Unicode character allocation table.
Finally, the reason Donald uncle is so cross is that he planned to make some money by taking a patent for something he thought an ingenious invention, but now he realizes that it is not possible. It is not the fault of Prof. VKS or Dr. GD. Donald mottaya is simply too stupid to think of something original. Nobody gives patents for duplicate work.
CONCLUSION:
Unicode Sinhala is now complete and accurate. It is inter-exchangeable among operating systems and applications. There is no need to “correct” it. Why try to mend something when there is nothing wrong in it? So now the issue is how best to use Unicode Sinhala in applications.
Hope my long explanation settles the issue forever.
We can discuss things but please do not ask the same questions again and again. (It is a pain explaining these basics again and again. How many times we have done this now? I feel like teaching a class of some really dumb students.)
Donald Gaminitillake
Quote
Since Unicode has to make allocations for ALL languages in the world it gives only 128 codes for one language.
Unquote
Unicode cannot restrict the number of characters per language.
The Unicode Consortium unconditionally accepts any national standard with any number of characters.
This is a very wrong concept of Helaya’s.
Quote
I say Unicode 0D85, (…..=”A” ayanna)
Unquote
Likewise, give the code points for “DU”, “KU” and “GU”.
These are not registered in Unicode Sinhala.
I never said anything against the UNICODE CONSORTIUM but ONLY about SINHALA UNICODE, which is an incorrect and incomplete set.
As per unicode registration
0D9A = SINHALA LETTER ALPAPRAANA KAYANNA
0D8B = SINHALA LETTER UYANNA
Where is “KU”?
Please do not fool the public man
Donald Gaminitillake
Colombo
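What the registered names say can be looked up directly from the Unicode character database that ships with Python. The sketch below is an illustration, not anyone's official answer: it lists the pair quoted in this thread for “KU” (0D9A 0D8B), and next to it the pair with the dependent vowel sign U+0DD4, which is my assumption about how a rendering engine would expect “KU” to be stored. Either way the syllable is two code points, which is exactly the point being argued.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, official Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


ku_as_quoted = "\u0D9A\u0D8B"  # the pair quoted in the thread for "KU"
ku_with_sign = "\u0D9A\u0DD4"  # assumption: consonant + dependent vowel sign

for label, text in (("as quoted", ku_as_quoted), ("with vowel sign", ku_with_sign)):
    print(f"{label}: {len(text)} code points")
    for code_point, name in describe(text):
        print("   ", code_point, name)
```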
Donald Gaminitillake
To Noam
I have the copyrights by the law of the country.
Any author of a publication has copyright.
A Character allocation table for Sinhala is only done by Donald Gaminitillake
ISBN 955-98975-0-0 (Contents do have Copyright areas & Patent pending areas©2000-2006)
This is the standard for Sinhala Unicode. I offered these rights to SLSI 1134 through objections, but they declined. In 2003 I offered them to CSSL; they too declined. Now everyone is worried only about my copyrights.
SLSI will have to give full credit to Donald to revise the Sinhala Unicode and SLSI 1134.
Dino and his group have no place in this issue.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Again Helaya is misguiding the public
see
Latin script
Unicode Blocks : Latin-1 Supplement
Unicode Blocks : Basic Latin
Unicode Blocks : Latin Extended-A
Unicode Blocks : Latin Extended-B
Unicode Blocks : Latin Extended Additional
Five pages are allocated. Every Latin character is represented by a code point.
There is a proper standard.
Likewise, Sinhala too needs a proper standard to move IT forward in Sri Lanka.
Donald Gaminitillake
Colombo
Helaya
JC / Noam,
I have explained things in detail. There is nothing more to say.
So I do not try to respond when a typical mottaya asks the same stupid questions I have addressed for the umpteenth time.
Donald Gaminitillake
The public TV debate is still an open event.
Why don’t you come for a public debate with me, with computer visuals?
Only on the topic “Sinhala unicode is incorrect and Incomplete”
Donald Gaminitillake
Colombo
Donald Gaminitillake
Dear Noam
With any Unicode Sinhala font, type our President’s name “Rajapaksha” using kayanna badhi shayanna (our President writes it with kayanna badhi shayanna)
into Notepad, then copy and paste it into Word, then into Linux, then into Helawadana, then into Thibas, then into Apple.
Publish the results in this blog.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Our President is also the Minister for IT.
Our President’s name “Rajapaksha” uses kayanna badhi shayanna.
Unicode Sinhala has 0D9A = SINHALA LETTER ALPAPRAANA KAYANNA,
but our President’s kayanna badhi shayanna is not represented in the Sinhala Unicode chart.
Poor President, he is unable to write his own name using the Sinhala Unicode chart.
Donald Gaminitillake
Colombo
Noam Chomsky
Hi Helaya,
I like it when people try to explain things clearly. Thanks for your post 45. I hope everyone understands the universal-ness of Unicode, and that all languages have to coexist in it without any overlap or ambiguity. There should not be any question about it. JC understands it and that’s why he says his solution is kind of intermediate, I think. But what I am not certain about is whether there is a set limit on the number of code points for any given language. Helaya says it is 128. Donald says no. Please, can someone enlighten us? (At least it can’t be unlimited, I hope.) Basically this will resolve part of my question (a) in post 41. Lots more to go. So Helaya, don’t come to conclusions so fast.
Noam Chomsky
Hi Donald,
I think if cut and paste does not work from one application to another, then it’s a problem with those applications. Once it is defined that 0D85 is “Ayanna”, it should not matter who (which application) reads it; it’s the application’s responsibility to take 0D85 as “Ayanna” and show “Ayanna”. But my doubt is whether there are any characters in the Singhala language (including Pali and Sanskrit) which cannot be represented using the current Singhala Unicode standard. (Please understand, I am OK with having combinations of codes to represent one character.) Then we can say it is incomplete.
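The claim that the stored codes themselves should survive a trip between applications can be checked at the data level. The sketch below is an illustration under one assumption: that the bandi form of kayanna and shayanna is stored as kayanna, al-lakuna, zero width joiner, shayanna. It pushes that sequence through the three common Unicode encodings and confirms the same code points come back. What a given application then draws on screen is a separate matter of fonts and rendering, which this check cannot see.

```python
# Assumed storage for the kayanna + shayanna bandi form:
# KAYANNA, AL-LAKUNA, ZERO WIDTH JOINER, MUURDHAJA SAYANNA.
ksha = "\u0D9A\u0DCA\u200D\u0DC2"


def survives_round_trip(text: str, codec: str) -> bool:
    """Encode to bytes (what the clipboard or a file holds) and decode again."""
    return text.encode(codec).decode(codec) == text


results = {codec: survives_round_trip(ksha, codec)
           for codec in ("utf-8", "utf-16-le", "utf-32-le")}

for codec, ok in results.items():
    size = len(ksha.encode(codec))
    print(f"{codec:10} {size:2} bytes  round trip ok: {ok}")
```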
Helaya
It is 128 fixed. For any language it is the same.
You can check Unicode charts any other language.
http://www.unicode.org/charts
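Anyone can check the block sizes against the ranges printed on those charts. The sketch below types in a handful of block ranges by hand (the dictionary and function names are only for illustration) and counts the positions in each. It counts positions in a block, not assigned characters, and it says nothing about whether a script may be given a second block.

```python
# Block ranges as printed on the Unicode code charts (first, last), inclusive.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
    "Arabic": (0x0600, 0x06FF),
    "Devanagari": (0x0900, 0x097F),
    "Tamil": (0x0B80, 0x0BFF),
    "Sinhala": (0x0D80, 0x0DFF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
}


def block_size(name: str) -> int:
    """Number of code positions in the named block."""
    first, last = BLOCKS[name]
    return last - first + 1


for name in BLOCKS:
    print(f"{name:26} {block_size(name):6} positions")
```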
Helaya
Noam,
[q] So Helaya, don’t come to conclusions so fast. [uq]
What I told you so far were NOT conclusions, but basic fundamentals of Unicode that everyone who talks about the subject should know.
I cannot help people who do not know even these fundamentals.
In our country, it is the fashion for people who know nothing about a subject to give “pora talks” about it.
Wimal Weerawansa too is an O/L dropout and he tries teaching us economics. Our gon thadiya here too is an O/L dropout and he does not know the difference between Dhammapada and Loveda Sangarawa.
Ane me mee harakunta tokkak annindawath kenek nethi heti! (Alas, there is not even anyone to give these cattle a knock on the head!)
Donald Gaminitillake
Quote
It is 128 fixed. For any language it is the same.
Unquote
Then why does Latin script fall into 5 pages?!
For example, Arabic falls into several pages. So do Korean, Japanese and Chinese.
No one can dictate terms for a national language in any country.
Noam
“A” ayanna is registered in Sinhala Unicode, but “DU”, “KU”, “GU”, “Yansaya”, “Repaya” and many more are not registered in Unicode.
Give me the authority I will register the sinhala characters & show you the results
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote
Ane me mee harakunta tokkak annindawath kenek nethi heti!
Unquote
Show this blog to H.E. the President and Mr Wimal Weerawansa; you will get it from them.
Donald Gaminitillake
Colombo
Helaya
My dear Donald uncle,
Educating you is a very challenging task. Your teachers might have had a hard time with a mottaya like you.
You are an O/L dropout but you behave more like a grade 5 dropout.
Not all languages are the same. There are things called language families. (People with low IQ levels sometimes might find it difficult to realize that)
Chinese, Japanese and Korean languages use one way of writing.
Latin languages use another way of writing.
Arabic uses another way of writing.
Sinhala is an Indic language and it uses another way.
When I say other languages I mean “similar languages”. All Indic scripts have 128 code points and that is more than enough. I never thought there were people so stupid as to compare Sinhalese with Chinese or Japanese. But apparently there are such gon thadiyaas in the world.
Donald, you are stupid. But do not demonstrate it. Try to hide that. Don’t behave like a donkey.
All Sinhala characters are represented in Sinhala Unicode. You are just too stupid to realize that. That is your problem. Not ours.
Sinhala Unicode is complete and correct. There is no need to mend something which is not broken.
Donald Gaminitilake
Writing direction is no problem for Unicode.
Japanese can be written left to right,
top to bottom or right to left.
Give me the authority I will register the sinhala characters & show you the results
Donald Gaminitillake
Colombo
Helaya
Noam / JC,
I think now it is clear if anything needs to be amended it is Donald’s brain. That is what is incomplete, not Unicode.
Unfortunately, even the modern science cannot make a completely stupid person intelligent.
We the Unicode users have never found anything wrong with it. So I am surprised, how just one non-user can find errors when the users themselves do not complain.
You will realize it when you see that this is only an attempt to make a business. This is an attempt to monopolize the Sinhala language for commercial purposes. What Prof. Samaranayake did was to stop it. (The nation is ever thankful to him for doing it.)
Helaya
Donald,
[q] Give me the authority I will register the sinhala characters & show you the results [uq]
I hereby give you the authority. Now show your results.
Noam Chomsky
Hi Helaya,
Looking at the charts at http://www.unicode.org/charts it looks like there are many languages which have more than 128 code points. Please refer me to a rule which says it allows only 128 code points for a language, or something similar.
I said “So Helaya, don’t come to conclusions so fast. ” because in your post 45 you had a whole section as conclusions.
[Q]
CONCLUSION:
Unicode Sinhala is now complete and accurate. It is inter-exchangeable among operating systems and applications. There is no need to “correct” it. Why try to mend something when there is nothing wrong in it? So now the issue is how best to use Unicode Sinhala in applications.
[UQ]
What I wanted to say was your CONCLUSION was kind of premature. Because the facts you have mentioned are still under debate (At least in this blog ;-) ). So we will try to get them clear one by one.
Does it really matter even if Donald is an O/L dropout? Does it matter that VK Sam has a PhD? I don’t think so. We will get into those some other time.
Thanks anyway.
Noam Chomsky
Hi Donald,
[Q]“A” ayanna is registered in Sinhala Unicode, but “DU”, “KU”, “GU”, “Yansaya”, “Repaya” and many more are not registered in Unicode.
[UQ]
I told you clearly that I am Ok with having combinations of codes to represent one character. Why do you say this fucking “DU”, “KU”, “GU” thing again and again? Don’t try to mix it with “Yansaya”, “Repaya” and “Rakaransaya”. Missing “Yansaya”, “Repaya” and “Rakaransaya” is not the same as not having a direct code point for “DU”. I assume what Prof. J.B. Disanayeke thinks is that we can write Singhala without “Yansaya” and don’t need it at all.
Helaya
Noam,
I do not know why you people find it so difficult to understand this.
Sinhala is an Indic language. All Indic languages are handled the same way. There is nothing special for Sinhala. ALL Indic languages have 128 codes.
My conclusions are NOT premature. I am talking about accepted standards, nationally and internationally. Not about things hanging from the air.
Nobody needs any special authority to change Unicode. It is a system that works on consensus. If one can prove one’s own solution is superior to the rest, they accept it. So if you don’t like it, even you can make suggestions and change it. However, it is a different question whether others agree with you or not.
Helaya
Noam,
As you say, this ‘DU, KU, GU, HU’ rubbish and not having yansaya, repaya and raakaransaya are two different issues.
All three yansaya, repaya and raakaransaya are there in SLS 1134 now, handled indirectly.
However, if you think they should appear in the Unicode Sinhala chart itself, you can suggest the same to Unicode consortium. Still there are enough empty squares left for these characters in the Sinhala chart.
Unicode chart is not something static. It is a dynamic and developing chart. so you can always add new characters WITHIN Unicode chart. Several Tamil and Bengali characters were added to respective charts like that.
All you have to do is write to Unicode and suggest the additions.
That is the way to do it, not ranting here in this blog.
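For readers wondering what “handled indirectly” looks like in practice, here is a small sketch. It assumes, as far as I understand the published Sinhala encoding conventions, that the three forms are stored as sequences built from al-lakuna and the zero width joiner rather than as squares of their own in the chart. The dictionary name and the exact sequences are my reading and should be checked against SLS 1134 itself.

```python
import unicodedata

AL_LAKUNA = "\u0DCA"
ZWJ = "\u200D"  # zero width joiner
YAYANNA = "\u0DBA"
RAYANNA = "\u0DBB"

# Assumed sequences (to be verified against the standard):
INDIRECT_FORMS = {
    "yansaya": AL_LAKUNA + ZWJ + YAYANNA,      # follows a consonant
    "rakaransaya": AL_LAKUNA + ZWJ + RAYANNA,  # follows a consonant
    "repaya": RAYANNA + AL_LAKUNA + ZWJ,       # precedes a consonant
}

for form, sequence in INDIRECT_FORMS.items():
    names = [unicodedata.name(ch) for ch in sequence]
    codes = " ".join(f"{ord(ch):04X}" for ch in sequence)
    print(f"{form:12} {codes}  {names}")
```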
Donald Gaminitillake
Helaya “Unicode chart is not something static.”
You have admitted that Dino’s Sinhala Unicode is incomplete and incorrect.
Noam, we need these “DU”, “KU”, “GU” and many more to write Sinhala and to scan Sinhala in OCR and other applications.
We can add thousands of Sinhala characters to the Unicode Consortium, not restricted to 128.
The Sinhala language has to be protected from people like Helaya & Dino, who bend their knees to US organizations and destroy Sinhala.
Hey, what about the debate? Why are you running away?!
You have not answered how to write the name of our President.
He cannot write his name using the Sinhala Unicode chart.
Sinhala Unicode is incorrect and incomplete.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote
“All you have to do is write to Unicode and suggest the additions.”
Unquote
I won’t write to Unicode.
The ICTA chairman will have to write and say that the present Sinhala Unicode that he made is incorrect and incomplete, and accept the new set as per Donald’s chart.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Quote
suggest the additions
Unquote
This clearly proves that the present Sinhala Unicode is incorrect and incomplete.
Donald Gaminitillake
Colombo
Donald Gaminitillake
Noam
Quote
I told you clearly that I am Ok with having combinations of codes to represent one character.
Unquote
We are not typing on a typewriter. On a typewriter, “yes”; not in a computer.
The input method could be a combination, but the “DU” has to be in the Unicode chart to represent it.
The best example is
the German umlaut character Ä, which is listed in Unicode as one character. It is defined in ISO 10646, Table 2, row 00, Latin-1 Supplement,
as decimal 196 (U+00C4) = LATIN CAPITAL LETTER A WITH DIAERESIS,
but to key in this umlaut character Ä we have to combine several keys. Irrespective of the input method, the umlaut character Ä will appear on any platform. Why? Because it has a proper code point in Unicode.
Combined input methods are acceptable, but the character has to be registered with the Unicode Consortium. That is why I ask you, “Where is DU in the Sinhala Unicode chart?”
Donald Gaminitillake
Colombo
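For anyone who wants to check the Ä example, here is a minimal sketch using only Python’s standard `unicodedata` module. It shows that Unicode encodes Ä both as the single code point U+00C4 (decimal 196) and as the two-code-point sequence “A” followed by U+0308 COMBINING DIAERESIS, and that normalization treats the two as equivalent. The variable names are just illustrative.

```python
import unicodedata

# Ä as a single precomposed code point (decimal 196, U+00C4)
precomposed = "\u00C4"
# Ä as a base letter followed by a combining mark (two code points)
decomposed = "A\u0308"

print(ord(precomposed), unicodedata.name(precomposed))
print([unicodedata.name(ch) for ch in decomposed])

# The raw strings differ code point by code point...
print(precomposed == decomposed)
# ...but NFC composes the sequence into U+00C4,
print(unicodedata.normalize("NFC", decomposed) == precomposed)
# and NFD decomposes U+00C4 back into the sequence.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

So the same visible character can be stored either way; which form a keyboard or application produces is an implementation choice.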
Helaya
Noam / JC,
See, this gon thadiya is repeating the same rubbish.
Do we have to waste our time because of this mee haraka’s stupidity?
This fellow is someone who has escaped from the lunatic asylum.
Donald Gaminitillake
Noam
Hope you still remember BASIC on DOS.
In those days, if we keyed in
?1 2
the computer gave
3
“?”, “1”, “ ”, “2” and “3” all reside inside the computer.
Likewise, the “DU” has to be registered inside the Unicode Sinhala chart to come out when you key in the combinations.
Donald Gaminitillake
Colombo
Donald Gaminitillake
The “plus” character went missing from the example above.
It should read: ?1 plus 2. OK?
Donald
Donald Gaminitillake
Helaya, why don’t you come out with the proper name your parents gave you?
Donald Gaminitillake
Colombo
Helaya
Noam,
[q] Likewise the “DU” has to be registered inside the unicode sinhala chart to comeout when you key in the combinations [uq]
This mee haraka does not know it, but that already happens in Unicode.
His problem is that, just because he does not know about things, he assumes they are not there.
As I said, this fellow is crazy. A madman wandering who knows where.
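To make the “DU” point concrete, here is a small sketch in Python (standard library only). It assumes that the “DU” under discussion is the letter දු, stored as the consonant U+0DAF followed by the vowel sign U+0DD4 (the papilla); the printed names come from Python’s bundled Unicode character database, and the variable names are illustrative.

```python
import unicodedata

# "DU" written as two code points from the Sinhala block:
# the consonant "da" followed by the papilla vowel sign.
DA = "\u0DAF"
PAPILLA = "\u0DD4"
du = DA + PAPILLA

for ch in du:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The stored text is two code points long; drawing it as one
# joined shape is left to the font and the rendering engine.
print(len(du))
```

Whether the result displays as a single joined glyph depends on the installed font and shaping support, not on the stored sequence.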
samarajiva
GSM Association Press Release 2006
Microimage & Open-Plug Take Top Prizes at Inaugural Asian Mobile Innovation Awards
17th October 2006 – Singapore: The GSM Association (GSMA) last night announced the winners of the inaugural Asia Mobile Innovation Awards at the world-famous Raffles Hotel. Microimage won the award for Most Innovative Mobile Application or Content and Open-Plug scooped the prize for Most Innovative Technology Development. The competition – sponsored by Ericsson – is exclusively for young, small and start-up companies across Asia developing exciting, innovative technologies, applications and content for the fast moving mobile space.
Microimage has developed the world’s first patented local language messaging and content browsing application to provide customised, local language support for entry-level applications on mobile devices in emerging markets.
“We are absolutely delighted that the judges selected our localised messaging and content browsing service as Asia’s Most Innovative Mobile Application or Content offering,” said Harsha Purasinghe, CEO, Microimage of Sri Lanka. “The award is fitting testimony to the hard work of the team at Microimage and will act as a catalyst to our company’s international expansion. We would like to thank the operators, partners and customers who made this happen.”
Open-Plug’s ELIPS is the first open software framework designed for mobile phones, which enables ELIPS-based handsets to be tailored and configured far more quickly according to the requirements of operators.
“This award will catapult Open-Plug onto the global mobile stage and will be a major milestone in the development of our small, innovative company,” said Nicolas Sauvage, CEO, Open-Plug Taiwan. “For our component-based ELIPS framework and MMI to be named Most Innovative Technology Development in Asia is an incredible achievement. We couldn’t be more proud of, and grateful to, the people that made this happen – our employees and partners.”
Microimage and Open-Plug were judged winners by a prestigious judging panel comprising senior representatives from Bharti-Airtel, KTF, Smart Communications, SpinVox – winners of this year’s global innovation award – and Ericsson after a round of ‘elevator’ pitches at the 3GSM World Congress Asia in Singapore.
In addition to their awards, the two winners will receive an automatic place on the shortlist for the innovation category of the GSMA’s Global Mobile Awards at the 3GSM World Congress in Barcelona next February. For full details on the Global Mobile Awards, visit: http://www.gsmawards.com
About the GSM Association:
The GSM Association (GSMA) is the global trade association representing 700 GSM mobile phone operators across 215 countries of the world. In addition, more than 180 manufacturers and suppliers support the Association’s initiatives as key partners.
The primary goals of the GSMA are to ensure mobile phones and wireless services work globally and are easily accessible, enhancing their value to individual customers and national economies, while creating new business opportunities for operators and their suppliers. The Association’s members serve more than two billion customers – 82% of the world’s mobile phone users.
For further information contact:
Mark Smith/David Pringle
GSM Association
Tel: 44 7850 229 724 / 44 795 755 6069
Email: press@gsm.org
Richard Fogg / Alex Sowden
Companycare
Tel: 44 118 9395900
Email: richard.fogg@companycare.com / alexs@companycare.com
Donald Gaminitillake
Quote
[q] Likewise the “DU” has to be registered inside the unicode sinhala chart to comeout when you key in the combinations [uq]
Unquote
This again clearly confirms that a hidden set of characters exists.
If it is in the UNICODE SINHALA CHART, show the location, like the “ayanna”.
It is not in the Sinhala Unicode chart.
Therefore Unicode Sinhala is an incomplete set.
Donald Gaminitillake
Colombo
Noam Chomsky
It looks like Donald has some incurable mental disease. Otherwise he would not be making this kind of ignorant point. I have to agree with Helaya on that.
But there is no use attacking each other like this, like two mad dogs (“Pissu ballao dennek”, පිස්සු බල්ලෝ දෙන්නෙක් වගේ).
We don’t need one-to-one mapping for all characters just for the heck of it, even if it is possible. But Helaya is also trying to make up some inaccurate rules by saying Unicode allows only 128 code points for a given language. You don’t have to say that. If 128 is enough, then that’s OK. You don’t have to lie; it lets Donald use it to his advantage.
My point is that in the Unicode chart “Papilla” has a code point, but “Yansaya” doesn’t have one. I saw the old blog of some guy called Anuradha, https://beta.blogger.com/comment.g?blogID=15283331&postID=114286904904617600, trying to explain this fact.
[Q]
Input from several Sinhala scholars and experts has been taken into account to decide that repaya, rakaaransaya and yansaya should not be basic code points, but should be produced by using sequences of code points, as they are linguistically alternative forms. In other words, they are there as sequences of code points, not as single code points. Nevertheless, they are there, so the claim is wrong.
If Mr Donald’s claim is “yansaya, rakaaransaya and reepaya should be individual code points”, that would be more valid. However, somebody has to eventually decide what’s basic and what’s not, and it has already been done. Technically, this is not an issue at all.
[UQ]
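As a rough illustration of what “sequences of code points” means here, the sketch below builds the three forms in Python. It assumes the sequences as I understand them from the Unicode Standard’s description of Sinhala and SLS 1134: al-lakuna (U+0DCA) plus ZERO WIDTH JOINER (U+200D) plus ya for yansaya, the same with ra for rakaransaya, and ra plus al-lakuna plus ZWJ before the consonant for repaya. The base consonant “ka” and the dictionary labels are only examples.

```python
import unicodedata

AL_LAKUNA = "\u0DCA"   # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"         # ZERO WIDTH JOINER
KA, YA, RA = "\u0D9A", "\u0DBA", "\u0DBB"

# Assumed encodings, using "ka" as the example consonant.
sequences = {
    "ka with yansaya": KA + AL_LAKUNA + ZWJ + YA,
    "ka with rakaransaya": KA + AL_LAKUNA + ZWJ + RA,
    "ka with repaya": RA + AL_LAKUNA + ZWJ + KA,
}

for label, text in sequences.items():
    # List the code points that make up each form
    print(label, [f"U+{ord(ch):04X}" for ch in text])
    for ch in text:
        print("   ", unicodedata.name(ch))
```

Each form is four code points in storage; whether it is drawn as the expected conjoined shape depends on the font and rendering engine in use.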
So one thing is clear: Donald already knows the ins and outs of this, but he is trying to fool new guys like us.
The other thing is that even the above blogger’s debate does not clearly and definitely say what kind of intellectual reasoning led them to be so strict about not allowing any alternative. It only makes things more confusing and lets implementers do all this monkey business. That is why there is a standard. But I understand that technically anything is possible. What I think is that those “several Sinhala scholars and experts”, whoever they were, were either misled by some false rule like the 128-character one, or they were egomaniacs or a bunch of linguistic dictators.
Just forget about Donald and please explain your statement
[Q]
All three yansaya, repaya and raakaransaya are there in SLS 1134 now, handled indirectly.
[UQ]
a little bit more here.
P.S.
Hi JC
Are you vacationing these last few days of fall? I am also planning to go south for a few days. Where are you? You have been silent for some time now.
Helaya
Noam,
I am not lying. Unicode allows 128 for every Indic language, and Sinhala is an Indic language. If you know any other language like Malayalam, Hindi (Devanagari), Tamil, Bengali, Oriya, Panjabi, Gujarati, Kannada, Telugu, Nepali etc. (more than 15 in the list) you will appreciate my point. All these languages are similar, with minor differences and differences in letters. So Unicode handles all of them in the same manner.
128 codes are allocated to every one of these languages. Sinhala is no exception.
On the other hand, Unicode handles English-like languages in another manner, the Chinese, Japanese and Korean languages in a completely different manner, and Arabic in yet another manner.
You can compare Sinhala with Hindi/Tamil. You cannot compare Sinhala with Chinese.
I think Dr. Ruvan Weerasinghe touched the point you raise here and has given a clear explanation. (I will search for it and report it for you.)
Noam, the decisions regarding Unicode have not been taken in an ad hoc manner. Actually, Sinhala Unicode has been designed by clearly following the way other SIMILAR languages are handled in Unicode. For example, for repaya, Sinhala Unicode follows the same way it was handled in Devanagari.
In fact there are so many things happening at the University of Colombo Language Technology Research Lab right now on this topic. If you need more info please visit http://www.ucsc.cmb.ac.lk/research/ltrl/index.html
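The size of the Sinhala chart can be checked with a short Python sketch. It assumes the Sinhala block range U+0D80 to U+0DFF from the published Unicode code charts; the count of assigned characters comes from whatever Unicode version ships with your Python, so it is not hard-coded here.

```python
import unicodedata

# Assumed block boundaries for the Sinhala chart (U+0D80..U+0DFF)
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF

block_size = SINHALA_END - SINHALA_START + 1

# A code point counts as assigned if the character database has a name for it
assigned = [
    cp for cp in range(SINHALA_START, SINHALA_END + 1)
    if unicodedata.name(chr(cp), None) is not None
]

print("positions in block:", block_size)
print("assigned characters:", len(assigned))
print("empty squares:", block_size - len(assigned))
print("Unicode database version:", unicodedata.unidata_version)
```

This only measures the width of one block and how much of it is in use; it says nothing by itself about what Unicode policy would or would not allow elsewhere.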
Helaya
[q] This again clearly confirm a hidden set of characters exsist [uq]
Gon thadiya has finally made a discovery.
Of course, all these applications can have, and may actually have, “hidden” (i.e. hard-coded) character allocation tables. So what? What is wrong with that?
Any proprietary software developer hides his source. That is his IP.
There is absolutely no need for that to be there in the Unicode chart.
Donald keeps his private parts “hidden” by clothes. But that does not mean he cannot and does not use them. Nor can anybody force Donald to bring his pants down and expose his private parts. Same here.
Helaya
Congratulations Microimage and Harsha.
You are on the right track. You have a long way to go. Surely there are many people who will stand in your way because they are green with jealousy, and they try to block others just because they themselves cannot come up with anything original.
Please ignore such stupid Gon thadiyas. Critics have never made this world. Builders did.
I wish you all the best for the future. You have brought glory to the motherland. Our heartiest congratulations to you!
Helaya
Noam,
This is Dr. Ruvan Weerasinghe’s long explanation on this issue. Hope it will be useful. Especially please read the second part where he talks about Donald G.
hi,
tried to read as much of this thread as humanly possible in one sitting. being someone who was either involved or aware of the history of sinhala support on computers let me try to summarize very briefly the way i see it:
1. several parties including the university of colombo developed sinhala font support (which amounted to both keyboard and display support in the good old days!).
2. a cintec committee comprising sinhala scholars of various persuasions sat down to set some standards for terminology and keyboard layout and a standard ‘code’ in the 80’s.
3. this work progressed fairly slowly especially since most work was done on microsoft operating systems which kept drastically changing their input and rendering methods (from dos to win 3.x to win 95…).
4. since none of these parties (including university of colombo) was a microsoft developer no access to internals was available so that sinhala support lagged the OS by several years!
5. almost when the people involved were giving up with keeping up with microsoft’s constant changes, unicode came on the scene. to sri lanka’s surprise, a foreigner had submitted a proposal for a standard for sinhala in unicode.
6. a team from the cintec committee defended the proposed sri lankan standard against (5) above in 1997 and got the initial sinhala unicode proposed by sri lanka accepted in 1998.
7. several discussions followed with microsoft (with the help of some sri lankans working within) to no avail. sinhala just hadn’t shown up on the world scripts map as yet as far as electronic support was concerned – pure economics. mind you, even indic scripts hadn’t made it yet with all the potential of a huge market.
8. indians kept taking up the issue with microsoft well into the new millennium before they got a hearing.
9. microsoft’s own language support has only really matured over the past 2-3 years. and obviously they started with support for the economically sensible ones first. their unicode support still has to rely on successive versions of the rendering engine being written/updated – this is one of the reasons you need to download software to see sinhala unicode (you are not just downloading a font, you are in fact downloading a display driver too – not to mention a keyboard driver).
10. incidentally, why one does not need to download thai fonts is because thai support was added to windows with win 2000 or XP already (unlike sinhala). conversely why you don’t need to download sinhala fonts in some sites is because of a font embedding technology which still ‘infects’ your computer with (yet another) proprietary font!
11. the lk-lug meanwhile has also developed increasingly mature sinhala unicode support on linux.
12. incidentally, all references to unicode sinhala refer to the SLS 1134 standard. the confusion about dates is due to the fact that, while the unicode standard was accepted in 1998, the SLS 1134 re-adopted it with some minor changes proposed during the unicode consultations only in 2001 (i.e. SLS 1134:2001). this was further updated with more detail (since some details were not spelt out in the original) in 2004 which is the current standard SLS 1134:2004. this is a fairly comprehensive document which spells out in fair detail how the system works. unfortunately the unicode system itself is non-trivial to the layman. it would have been much easier to understand if there was just a contiguous space containing all 642, or 1440, or over 2000, or indeed 64,000 (depending on how inefficiently you want to represent sinhala) composite glyph shapes of sinhala.
13. over a year ago microsoft finally showed signs of being serious about sinhala (apparently owing to a BBC tender for indic language support for its site). since the reason for their interest at that time died away, we were still unable to get anything but a homegrown workaround to ensure sinhala unicode support on win 2000 and XP.
14. more recently, as someone in this thread had pointed out, they seem to be finally serious about getting sinhala into vista.
15. unfortunately, all is not yet ironed out – be realistic, sinhala is still not as important for microsoft in an economic sense. it is only important for them in a political sense (because many countries in the region are embracing open source).
16. support on linux doesn’t suffer all the drawbacks of having to deal with economic interests of a single company. it is an excellent opportunity to race ahead by empowering ourselves. unfortunately, sri lanka has not put in enough effort to promote FOSS – instead taking a ‘neutral stand’ on it. it is in the interest of countries like ours to push the FOSS agenda so that we no longer have to rely on big business in other countries to determine our fate.
though i don’t wish to ‘throw mud’ at anyone, i think i also need to put this thread in context with respect to donald G:
1. donald G was someone who i first met online over 2 years ago.
2. i listened to his ‘solution’ just as did some others like gihan D
3. like some of you on this thread, i thought he needed to be enlightened – after all he knew and cared a lot for how the printed word looked (being from that industry), but had no real grasp of the complexities of unicode.
4. like several of you i and gihan did try to show him how the system works (and how it is adopted by the whole world – not only for representing language as text, but also by programming languages, XML, databases, the works).
5. while initially he didn’t realize the pervasiveness of unicode (and consequently the futility of fighting it), later he came to realize it and in this thread he distances himself from that stand. apparently he is now against sinhala unicode (SLS 1134) rather than unicode per se.
6. this is why i now believe that he actually understands quite a bit (though not fully the intricacies involved).
7. my take on his reason for ‘pretending to not understand’ is (a) he had spent some 2 years in japan figuring out his so called sinhala representation – making the fatal mistake of aligning it with the japanese (or CJK) – there’s much more granularity in indic languages such as sinhala and (b) based on this he had a mistaken notion that he would be able to take out a patent for his ‘scheme’.
8. two years ago, he mentioned his patent and his impending ‘proof’ by implementing his ‘scheme’. in this thread i find he claims he needs 12-18 months of work (if he gets the funds) to implement it.
9. i was present at the SLSI meeting where donald G’s objections were heard. it was almost frustrating that the chairman (mr. rohan wijeratne) gave him such a long audience. he was very tolerant. at the end of the meeting all the printing association supporters who he brought were satisfied that sinhala unicode was the way to go – but not donald. i know some of them from the industry, and they are no more with him on this.
10. mr. ahangama is the only other person to bring a serious objection to sinhala unicode, but i find one can discuss openly with him. he may not yet be convinced (by the look of his posts on this thread) but i hope the technical explanations given by people will convince him. like him, a casual user like me would prefer to type latin characters and get a simple transliteration scheme to output the sinhala…
11. with donald it is different. after gihan and i argued sincerely trying to convince him, various others tried too. while i told them my experience, i never discouraged them from trying. anuradha, several others at lk-lug, several harsha’s and now harshula. the end result is the same. misquoting, misrepresenting and misleading others.
12. for this reason, i wouldn’t be surprised if no one takes him on in a TV debate – unless we can find someone who has not been on the job of convincing donald already! this is not because they’d be afraid of his argument, but because all of them without exception have come to the conclusion that he is no longer sincere in his motive.
finally with respect to the practicality of sinhala unicode:
1. at the SLSI meeting referred to above, the lakehouse folk (arguably one of the biggest consumers of this technology) confirmed that they have been working with sinhala unicode and had no problems of the sort brought up by donald.
2. wasantha deshapriya has outlined some of the practical uses ICTA has put sinhala unicode to
3. we at the university of colombo (ucsc) are developing the collation algorithm, text corpus, OCR, TTS etc using sinhala unicode (see http://www.ucsc.cmb.ac.lk/ltrl/)
4. we will be the first to admit that microsoft support for sinhala unicode is not perfect yet. nor is linux support quite there. the point is that unicode support IS there on both these platforms and all other technologies.
5. rest assured, the unicode consortium’s policy will *never* allow for donald’s proposed scheme of representing all individual composite glyph patterns (ligatures) for sinhala since all indic languages (as all european ones) are allocated just a single code page (with only CJK languages allocated more owing to their pictogram nature).
it is time for donald to honestly admit that his interest is no longer in language but in a patent or compensating for what he must perceive as a waste of 2 years of work designing his scheme in japan.
if you truly love the sinhala language and want to help its progress in the e-world, you need to redirect your immense energies and that of others in this thread to get on with the job…
if not, at least let us know what more ‘proof’ you need to do so: there are websites, wordprocessed documents, spreadsheets, databases, OCR, TTS, mobile apps…
please do not consider this a flame. i have tried to be as impartial as i can be given the immense amount of harm caused by donald’s campaign of misinformation.
regards,
ruvan.
ps: i must admit in a quirky kind of way, i admire your ability to still oppose this against all the evidence! it is a rare skill – unfortunately put to destructive use.