Softpanorama
May the source be with you, but remember the KISS principle ;-)
By Dr. Nikolai Bezroukov
Perevesti deloproizvodstvo na latinskij alfavit
Ilf and Petrov
I believe that there are too many Russian code pages: KOI, DOS-alternative (866), Windows (1251), and many others, such as a couple of Ukrainian code pages ;-). That sad fact puts webmasters of Russian-language sites at a definite disadvantage. Either each page needs to be converted between code pages on the fly via a CGI script, or one needs to store two or three copies of the same page. Neither solution looks very appealing. Unicode seems to be a perfect solution, but its adoption is slow...
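To make the "convert on the fly" option concrete, here is a minimal sketch in Python of recoding one stored copy of a page into the other code pages. The function name `recode` and the sample text are illustrative assumptions, not part of the original article; the codec names `koi8_r`, `cp866`, and `cp1251` are the standard Python names for the KOI8-R, DOS-alternative, and Windows code pages.

```python
def recode(data: bytes, src: str, dst: str) -> bytes:
    """Recode bytes from one single-byte Cyrillic code page to another.

    The text is decoded to Unicode first and then encoded into the
    target code page, so Unicode acts as the pivot representation.
    """
    return data.decode(src).encode(dst)


# One stored copy of the page (here in KOI8-R) ...
sample = "Привет"
page_koi8 = sample.encode("koi8_r")

# ... served in whichever code page the visitor asked for.
page_win = recode(page_koi8, "koi8_r", "cp1251")
page_dos = recode(page_koi8, "koi8_r", "cp866")
```

The same six letters end up as three different byte sequences, which is exactly why a page stored in one code page is unreadable when viewed in another.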
The use of the Cyrillic alphabet leads to another drag on resources: the need to install and use a special code page and localized versions of a word processor and other software. Using a non-localized word processor for Cyrillic texts is possible, but some functions do not work properly. The spellchecker is one of the main problems: even if you know its dictionary format and can add a user dictionary, you will have problems with word selection and the like, because the codes of Russian letters are not recognized as legitimate alphabetic characters. Also, working with HTML in many non-localized editors can in some circumstances lead to the conversion of Cyrillic letters into their hex equivalents (when saving in Netscape, for example). Of course, Unicode can save us from a lot of trouble, but it has a long way to go before universal adoption.
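As an illustration of the HTML problem, the sketch below assumes that the "hex equivalents" are HTML numeric character references (such as `&#x41F;` or `&#1055;`); the article does not say exactly what the editors produced. The helper names `to_char_refs` and `from_char_refs` are hypothetical. Both rely only on the Python standard library.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference.

    This mimics what an editor might do when it cannot store Cyrillic
    letters directly in the file.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(markup: str) -> str:
    """Turn character references back into letters.

    Note: html.unescape also expands named entities such as &amp;,
    so it should be applied to text content, not to whole HTML files.
    """
    return html.unescape(markup)


mangled = to_char_refs("Привет")
restored = from_char_refs(mangled)
```

The mangled form is still valid HTML and renders correctly in a browser, but it is unreadable in a plain-text editor and several times longer than the original.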
It’s probably time for another attempt "Perevesti deloproizvodstvo na latinskij alfavit" (a Russian catchphrase that can be very approximately translated as "to convert documentation to the Latin alphabet" -- sorry, I cannot imitate the Bolshevik bureaucratic jargon of the early twenties in English), as Ilf and Petrov recommended in their famous novel.
There are several, often incompatible, requirements for Cyrillic-Latin transliteration schemes. Among them:
I will try to solve this problem by proposing yet another Cyrillic-Latin transliteration scheme, which I call the Softpanorama scheme because I will use it to convert old Russian texts from Softpanorama into HTML. The scheme is symmetric in the sense that pure Russian text can be converted correctly back to a Cyrillic encoding. Mixed texts need an additional tag to switch the language (<en> ... </en> and <ru> ... </ru> can be used for HTML).
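The symmetry requirement and the language-switching tags can be sketched as follows. The mapping table here is a deliberately small, hypothetical subset (lowercase only, loosely GOST-like) chosen so that decoding is unambiguous; it is not the Softpanorama table itself. The function names are likewise assumptions made for illustration.

```python
import re

# Hypothetical, incomplete table for illustration only. The letter "h"
# appears only as the second half of a digraph, so greedy longest-match
# decoding is unambiguous for this subset.
RU_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "x", "ц": "c", "ч": "ch", "ш": "sh",
}
LAT_TO_RU = {lat: ru for ru, lat in RU_TO_LAT.items()}

# Try two-letter codes before one-letter codes.
_TOKEN = re.compile(
    "|".join(sorted(map(re.escape, LAT_TO_RU), key=len, reverse=True))
)
# Spans marked as English are passed through untouched.
_EN_SPAN = re.compile(r"(<en>.*?</en>)", re.DOTALL)


def to_latin(text: str) -> str:
    """Cyrillic -> Latin; characters outside the table are left as-is."""
    return "".join(RU_TO_LAT.get(ch, ch) for ch in text)


def to_cyrillic(text: str) -> str:
    """Latin -> Cyrillic for text known to be pure transliterated Russian."""
    return _TOKEN.sub(lambda m: LAT_TO_RU[m.group(0)], text)


def mixed_to_cyrillic(text: str) -> str:
    """Convert mixed text, skipping anything inside <en>...</en>."""
    parts = _EN_SPAN.split(text)  # odd indices hold the <en> spans
    return "".join(
        part if i % 2 else to_cyrillic(part) for i, part in enumerate(parts)
    )
```

Without the tags, an English word such as "DOS" would be decoded letter by letter into Cyrillic, which is why mixed texts need an explicit language switch.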
The proposed transliteration scheme was tested with MS Word, MultiEdit, and Kedit and proved to be compatible with existing spellcheckers. Its main advantage is that it does not use any special symbols other than ` and ‘. So Russian words can be added to the dictionary and the existing spellchecker can be used.
The proposed encoding uses three ideas:
The proposed transliteration scheme is close to GOST 16876-71.
It's clear that this transliteration scheme does not preserve the sorting order of the Russian alphabet. An implementation of sorting can use conversion to Unicode or a lookahead buffer.
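A minimal sketch of the "conversion to Unicode" approach to sorting is shown below. It uses a small hypothetical mapping (lowercase only, not the Softpanorama table) and sorts transliterated words by the Cyrillic text they decode to. The names `CYR`, `LAT`, and `cyrillic_key` are assumptions introduced for this example.

```python
import re

# Hypothetical subset of a transliteration table, for illustration only.
CYR = "абвгдежзиклмнопрстуфхцчш"
LAT = ["a", "b", "v", "g", "d", "e", "zh", "z", "i", "k", "l", "m",
       "n", "o", "p", "r", "s", "t", "u", "f", "x", "c", "ch", "sh"]
LAT_TO_RU = dict(zip(LAT, CYR))

# Longest codes first, so "zh" is read as one letter rather than "z" + "h".
_TOKEN = re.compile("|".join(sorted(LAT, key=len, reverse=True)))


def cyrillic_key(word: str) -> str:
    """Sort key: the Cyrillic spelling that the Latin word decodes to."""
    return _TOKEN.sub(lambda m: LAT_TO_RU[m.group(0)], word)


words = ["zima", "dom", "volk"]
latin_order = sorted(words)                      # plain Latin order
russian_order = sorted(words, key=cyrillic_key)  # Russian alphabet order
```

In Latin order "dom" comes first, while in Russian order "volk" does, because the letter transliterated as "v" is the third letter of the Russian alphabet. Comparing code points works here because the basic Cyrillic letters are laid out alphabetically in Unicode; a letter such as "ё" sits outside that range, so a fuller implementation would need a proper collation table.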
The author is deeply grateful to Stanislav V. Fjodorov <faber@tomcat.ru> for his article on the transliteration of the Cyrillic alphabet, published in the newsgroup fido7.ru.english. The webliography below is based mainly on Stanislav V. Fjodorov's findings.
Copyright 1996-2000, Nikolai Bezroukov. This article is distributed under GNU license or artistic license. Standard disclaimer applies.
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright in original materials belongs to the respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.
Standard disclaimer: The statements, views, and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP, or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Created: May 16, 1997; Last modified: May 25, 2008