Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


A New Cyrillic to Latin Symmetrical Transliteration Algorithm
That Facilitates Usage of Non-Localized Software
Version 2.2

By Dr. Nikolai Bezroukov

Perevesti deloproizvodstvo na latinskij alfavit

Ilf and Petrov

I believe that there are too many Russian code pages: KOI, DOS-alternative (866), Windows (1251), and many others, including a couple of Ukrainian code pages ;-). That sad fact puts webmasters of Russian-language sites at a definite disadvantage. Either each page needs to be converted on the fly by a CGI script, or one needs to store two or three copies of the same page. Neither solution looks very appealing. Unicode seems to be a perfect solution, but its adoption is slow...
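To make the problem concrete, here is a minimal sketch of what "the same page in several code pages" means at the byte level. It assumes Python and its standard codec names koi8_r, cp866 and cp1251; the helper name recode is hypothetical and only illustrates what an on-the-fly conversion script would have to do.

```python
# Codec names are the ones Python's standard library uses for these code pages.
CODE_PAGES = ("koi8_r", "cp866", "cp1251")


def recode(data, source, target):
    """Re-encode raw page bytes from one Cyrillic code page to another."""
    return data.decode(source).encode(target)


word = "ёлка"
# The same four letters give three different byte strings.
variants = {name: word.encode(name) for name in CODE_PAGES}
for name, raw in variants.items():
    print(name, raw.hex())

# Converting the KOI8-R bytes yields exactly the Windows-1251 bytes.
converted = recode(variants["koi8_r"], "koi8_r", "cp1251")
print(converted == variants["cp1251"])
```

A transliterated page avoids this step entirely, since plain Latin letters have the same codes in all three code pages.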

The use of the Cyrillic alphabet leads to another drag on resources: the necessity to install and use a special code page and localized versions of a word processor and other software. Using a non-localized word processor for Cyrillic texts is possible, but some functions do not work properly. The spellchecker is one of the main problems: even if you know its dictionary format and can add a user dictionary, you will have problems with word selection and the like, because the codes of Russian letters are not recognized as legitimate alphabetic characters. Also, working with HTML in many non-localized editors can in some circumstances convert Cyrillic letters into their hex equivalents (saving in Netscape, etc.). Of course Unicode can save us from a lot of trouble, but it has a long way to go before universal adoption.

It’s probably time for another attempt to "Perevesti deloproizvodstvo na latinskij alfavit" (a Russian catch phrase that can be very approximately translated as "to convert documentation to the Latin alphabet" -- sorry, I cannot imitate the Bolshevik bureaucratic jargon of the early twenties in English), as Ilf and Petrov recommended in their famous novel.

There are several, often incompatible requirements for Cyrillic-Russian transliteration schemes. Among them:

I will try to solve this problem by proposing yet another Cyrillic-Latin transliteration scheme, which I call the Softpanorama scheme, as I will use it in converting old Russian texts from Softpanorama into HTML. The scheme is symmetric in the sense that pure Russian text can be converted correctly back to a Cyrillic encoding. Mixed texts need an additional tag to switch the language (<en> and </en>, or <ru> and </ru>, can be used for HTML).
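The language-switching tags could be processed roughly as follows. This is only a sketch under stated assumptions: the function name split_segments is hypothetical, untagged text is assumed to be Russian by default, and nested tags are not supported.

```python
import re

# Matches <en>...</en> or <ru>...</ru>; the closing tag must repeat the opening one.
TAGGED = re.compile(r"<(en|ru)>(.*?)</\1>", re.DOTALL)


def split_segments(text, default="ru"):
    """Split mixed text into (language, chunk) pairs.

    Text outside any tag is labelled with `default` (an assumption of this
    sketch, not something the scheme prescribes).
    """
    segments = []
    pos = 0
    for match in TAGGED.finditer(text):
        if match.start() > pos:
            segments.append((default, text[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments


print(split_segments("yasno <en>Unicode</en> yavno"))
```

Only the chunks labelled "ru" would then be passed to the conversion back to Cyrillic; the "en" chunks are left untouched.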

The proposed transliteration scheme was tested in MS Word, MultiEdit and Kedit and proved to be compatible with existing spellcheckers. Its main advantage is that it does not use any special symbols other than ` and '. So Russian words can be added to the dictionary and the existing spellchecker can be used.

The proposed encoding uses three ideas:

  1. the use of the letter "y" as an escape character for representing the Russian vowels "yo, yi, ye, yu, ya", while still representing the letter y itself when no ambiguity arises, as in staryj;
  2. the use of the letter h as a trailing letter for representing the complex sounds ch, sh, zh; a compromise "tch" was adopted as an alternative to shh for phonetic reasons, with th as a substitute in the (extremely rare) cases where ambiguity arises;
  3. the use of kh instead of the letter h in all cases where (2) would create ambiguity ("ishod" should be written as "iskhod"). So "skhodka" starts with the letter "s", not "sh", but "shopot" starts with the letter "sh", not "s".
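The effect of these three ideas can be seen by splitting a transliterated word into its letter units with a greedy, longest-match scan. The sketch below is illustrative only: the name split_units is hypothetical, and the tch and th alternatives are deliberately left out because they collide with t followed by ch or h.

```python
# Multi-letter units, longest first so that "shh" wins over "sh".
# "tch" and "th" are omitted in this sketch (see the note above).
UNITS = ("shh", "zh", "ch", "sh", "kh", "yo", "yi", "ye", "yu", "ya")


def split_units(word):
    """Split a transliterated word into letter units, longest match first."""
    units = []
    i = 0
    while i < len(word):
        # Size of the first multi-letter unit that matches here, else 1.
        size = next(
            (len(u) for u in UNITS if word[i:i + len(u)].lower() == u), 1
        )
        units.append(word[i:i + size])
        i += size
    return units


for sample in ("skhodka", "shopot", "iskhod", "staryj", "milyie"):
    print(sample, split_units(sample))
```

Because kh is written after s, the scan never mistakes s followed by h for the single unit sh, which is exactly what idea (3) is for.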

The proposed transliteration scheme, which is close to GOST 16876-71, is as follows:

  1. b
  2. v
  3. g
  4. d
  5. e
  6. yo -- for example yolka (this letter is generally considered obsolete, and one can use e instead)
  7. zh -- for example zhaba, zhivot, ozhog
  8. z
  9. i
  10. j -- for example jog, jod, postoj, oj, domoj
  11. k
  12. l
  13. m
  14. n
  15. o
  16. p
  17. r
  18. s
  19. t
  20. u -- for example tupolev
  21. f
  22. h or kh (always kh after c, s, z) -- for example hudozhnik, but skhodka
  23. c -- for example cvetok, cvet, car'
  24. ch -- for example chislo, chechnya, cherep, but Ckhaltubo
  25. sh -- for example shkola, but skhodka
  26. shh or tch, whichever is unambiguous -- for example shhelkunchik or tchelkunchik
  27. ' -- for example kon', ogon'
  28. ` -- for example ot`ehat
  29. y or yi (yi before a, e, o, u) -- for example my, but milyie
  30. ye -- for example yeho, yeto
  31. yu -- for example yula, yug
  32. ya -- for example ya, yasno, yavno, yabloko
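The table above can be turned into a small encoder. The sketch below is an illustration, not a reference implementation: the name to_latin is hypothetical, the pair "а" to "a" is assumed because the list does not show it, and the letter in item 26 is always written as shh (the tch and th alternatives are left out).

```python
# Base table: Cyrillic letter -> Latin spelling in the scheme listed above.
# ASSUMPTION: "а" -> "a" does not appear in the list and is added here.
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shh",
    "ь": "'", "ъ": "`", "ы": "y", "э": "ye", "ю": "yu", "я": "ya",
}


def to_latin(text):
    """Transliterate Cyrillic text; other characters pass through unchanged."""
    out = []
    for i, ch in enumerate(text):
        low = ch.lower()
        if low not in BASE:
            out.append(ch)
            continue
        lat = BASE[low]
        # Last Latin character written so far, and the Latin spelling of
        # the next Cyrillic letter (empty string when there is none).
        prev = out[-1][-1:].lower() if out else ""
        nxt = BASE.get(text[i + 1].lower(), "") if i + 1 < len(text) else ""
        if low == "х" and prev in ("c", "s", "z"):
            lat = "kh"   # item 22: always kh after c, s, z
        elif low == "ы" and nxt[:1] in ("a", "e", "o", "u"):
            lat = "yi"   # item 29: yi before a, e, o, u
        if ch != low:    # upper-case input: capitalise the first Latin letter
            lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


for word in ("сходка", "художник", "Цхалтубо", "милые", "старый"):
    print(word, to_latin(word))
```

The two context checks are the only places where the encoder needs to look at a neighbouring letter; everything else is a plain table lookup.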

Final Comments

It's clear that this transliteration scheme does not preserve the sorting order of the Russian alphabet. An implementation of sorting can use conversion to Unicode or a lookahead buffer.
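One way to realise the "conversion to Unicode" route is to decode each word back to Cyrillic with a three-character lookahead and then sort by position in the Russian alphabet. The sketch below is an assumption-laden illustration: the names to_cyrillic and sort_key are hypothetical, "a" is assumed to map to "а", case is folded to lower case, and the tch and th alternatives are not handled.

```python
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Latin unit -> Cyrillic letter. ASSUMPTION: "a" -> "а" is not in the list.
UNITS = {
    "shh": "щ", "zh": "ж", "ch": "ч", "sh": "ш", "kh": "х",
    "yo": "ё", "yi": "ы", "ye": "э", "yu": "ю", "ya": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "y": "ы", "'": "ь", "`": "ъ",
}


def to_cyrillic(word):
    """Decode a transliterated word, trying 3-, 2-, then 1-letter units."""
    low = word.lower()
    out = []
    i = 0
    while i < len(low):
        for size in (3, 2, 1):
            unit = low[i:i + size]
            if unit in UNITS:
                out.append(UNITS[unit])
                break
        else:
            out.append(unit)  # unknown single character, kept as is
        i += len(unit)
    return "".join(out)


def sort_key(word):
    """Rank each decoded letter by its place in the Russian alphabet."""
    # Using positions rather than code points keeps "ё" between "е" and "ж".
    return [ALPHABET.find(ch) for ch in to_cyrillic(word)]


words = ["yabloko", "zhaba", "shkola", "cvet", "yolka", "hudozhnik"]
print(sorted(words))                # plain Latin order
print(sorted(words, key=sort_key))  # Russian alphabetical order
```

Characters that are not part of the scheme get rank -1 and therefore sort before all Russian letters; a real implementation would need to decide how to treat them.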

Acknowledgment

The author is deeply grateful to Stanislav V. Fjodorov <faber@tomcat.ru> for his article on transliteration of the Cyrillic alphabet, published in the newsgroup fido7.ru.english. The Webliography below is based mainly on Stanislav V. Fjodorov's findings.

Webliography


Copyright 1996-2000, Nikolai Bezroukov. This article is distributed under GNU license or artistic license. Standard disclaimer applies.



Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Created: May 16, 1997; Last modified: May 25, 2008