Cyrillic, LaTeX, Postscript and Unicode
Let's see how to deal with the Cyrillic alphabet in
LaTeX, Postscript, and Unicode.
You might also be interested in this
free on-line journal on Postscript and PDF.
And just in case you need a PDF to Word converter, use
OpenOffice
with its
PDF Import Extension.
You can import PDF and export as Word, all with free software!
Here are some other great sources of detailed information on
how to deal with LaTeX fonts:
Fonts and TeX
The LaTeX Font Catalogue
Cyrillic in LaTeX
The following produces Cyrillic Postscript output for me. There are other ways of doing this; see this tug.org page for a starting point.
1 — Use the cyrillic package
Include this line in your latex preamble:
\usepackage{cyrillic}
If that fails with an error about being unable to find
the cyrillic package, and you cannot find the
right software package to add, you could try
my cyrillic.sty file.
Put it somewhere, and now you will use it like this:
\usepackage{/home/cromwell/.latex/cyrillic}
Note that you do not include the ".sty"
part of the file name.
2 — Define some Cyrillic fonts
Include these lines in your latex preamble,
right after the above \usepackage line:
\newcommand{\cyrrm}{\fontencoding{OT2}\selectfont\textcyrup}
\newcommand{\cyrit}{\fontencoding{OT2}\selectfont\textcyrit}
\newcommand{\cyrsl}{\fontencoding{OT2}\selectfont\textcyrsl}
\newcommand{\cyrsf}{\fontencoding{OT2}\selectfont\textcyrsf}
\newcommand{\cyrbf}{\fontencoding{OT2}\selectfont\textcyrbf}
\newcommand{\cyrsc}{\fontencoding{OT2}\selectfont\textcyrsc}
%%%% cyrrm = "Roman", or really upright, normal font
%%%% cyrit = Italic (cursive forms of letters)
%%%% cyrsl = Slanted (non-cursive forms of letters)
%%%% cyrsf = Sans-serif
%%%% cyrbf = Bold-face
%%%% cyrsc = Small caps
3 — Use transliteration
For the most part, latex will "do the right thing"
turning your ASCII typing into Russian, if you are careful.
Examine your output carefully, and adjust as needed.
I have no idea about transliteration of other Slavic
languages that use Cyrillic. Ukrainian and
Belarusian are probably close enough, but for Serbian,
Macedonian, Bulgarian, and other South Slavic languages,
look at some of the web sites listed above.
Some special characters:
\cprime | ь | "soft sign" |
\cdprime | ъ | "hard sign" |
\u{i} | й | "i-kratkaya" |
\"{e} | ё | "yoh" |
\`{e} | э | "e-oborotnoye" |
\`{E} | Э | "E-oborotnoye" |
For the last two, э/Э, notice that the quote character is the one that slopes down, like an opening quote mark within text, and not the more commonly used single-quote. That is, ASCII 0x60 and not 0x27.
You may need to specify that a letter or a letter pair
should stand on its own.
For example, this:
{\cyrrm{Tsiolkovski\u{i}} y Krushchev}
will generate this:
Циолковский ы Крущев
But if you really want this instead:
Тсиолковский ы Крушчев
you need to specify that the T and sh should
not be combined with what follows.
Put them inside curly braces:
{\cyrrm{{T}siolkovski\u{i}} y Kru{sh}chev}
This page is very helpful for Cyrillic and many other character sets: http://www.bitjungle.com/isoent/index_files/isoent-ref.pdf
Here is an example, the same silly text in several fonts:
%% Start the document
\documentclass[letterpaper,12pt]{letter}
\usepackage[dvips]{color}
\makeatother
%% Cyrillic font definitions
\usepackage{/home/cromwell/.latex/cyrillic}
\newcommand{\cyrrm}{\fontencoding{OT2}\selectfont\textcyrup}
\newcommand{\cyrit}{\fontencoding{OT2}\selectfont\textcyrit}
\newcommand{\cyrsl}{\fontencoding{OT2}\selectfont\textcyrsl}
\newcommand{\cyrsf}{\fontencoding{OT2}\selectfont\textcyrsf}
\newcommand{\cyrbf}{\fontencoding{OT2}\selectfont\textcyrbf}
\newcommand{\cyrsc}{\fontencoding{OT2}\selectfont\textcyrsc}
\newcommand{\lat}{\fontencoding{OT1}\selectfont}
%%% Support for "\begin{alltt}...\end{alltt}"
\usepackage{alltt}
%%% Support \euro for Euro symbol
\usepackage{textcomp}
\makeatother
\newcommand{\euro}{\textsf{\texteuro}}
\begin{document}
{\cyrrm{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
{\cyrsl{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
{\cyrit{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
{\cyrsf{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
{\cyrbf{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
{\cyrsc{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}
\end{document}
And, here is the result,
after generating Postscript with latex,
and converting that to PNG and cropping it with
convert from the ImageMagick suite:
Alternative method, using the Babel package
As an alternative, you can use the Babel package. This allows you to type Cyrillic characters and get them nicely typeset. The problem is that you are required to type Cyrillic! Use the method shown above if you want LaTeX to transliterate your ASCII-based writing.
Greek, on the other hand, works easily with the Babel package:
\documentclass[letterpaper,12pt]{article}
\usepackage[russian,greek,english]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1,T2A]{fontenc}
\begin{document}
The last language listed will be the active
(or default) one. The others can be chosen
for large blocks:
\selectlanguage{russian}
Горбачёв
\selectlanguage{greek}
Ellhnik'o ke'imeno.
\selectlanguage{english}
You can also insert short pieces of text in
arbitrary languages, even within paragraphs
of a different language:
The capital of Russia is
\foreignlanguage{russian}{Москва.}
The capital of Greece is
\foreignlanguage{greek}{Ajhna.}
\end{document}
Here is the mapping from ASCII input to Greek output:
| a | b | g | d | e | z | h | j | i | k | l | m | n | x | o | p | r | s | c | t | u | f | q | y | w |
| α | β | γ | δ | ε | ζ | η | θ | ι | κ | λ | μ | ν | ξ | ο | π | ρ | σ | ς | τ | υ | φ | χ | ψ | ω |
| 'a | 'e | 'h | 'i | "i | "u | 'o | 'u | 'w |
| ά | έ | ή | ί | ϊ | ϋ | ό | ύ | ώ |
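To try the mapping, here is a minimal sketch of a test file. It assumes that Babel's Greek support and the Greek (LGR-encoded) fonts are installed on your system. The two sample words are my own choice, picked so that each uses one accented vowel from the table.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
%% Assumption: the greek option of babel and its LGR fonts are installed
\usepackage[greek,english]{babel}
\begin{document}
%% A j 'h n a  should come out as Alpha, theta, accented eta, nu, alpha
Athens: \foreignlanguage{greek}{Aj'hna}

%% K a l h m 'e r a  should come out as Kappa, alpha, lambda, eta, mu,
%% accented epsilon, rho, alpha
Good day: \foreignlanguage{greek}{Kalhm'era}
\end{document}
```

Going by the table, these should typeset as Αθήνα and Καλημέρα. Check your own output, as with the Cyrillic transliteration.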
Also see:
LaTeX/Internationalization (Wikibooks)
Multilingual LaTeX with the Babel Package (Reed College)
How to make LaTeX2e understand Russian
Greek in LaTeX
Cyrillic in Postscript
The theory is that you can do something like the following and get Postscript that renders Cyrillic:
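%!
%%Creator: Your Name Here
%%BoundingBox: 0 0 792 611
%%
%% Postscript Cyrillic demo
%%
%% Define measurements in millimeters, 1 mm = 2.834645 Postscript points
/mm { 2.834645 mul } def
%% midshow is not a built-in operator. This definition, which centers
%% the string on the current point, is an assumption about what was meant:
/midshow { dup stringwidth pop 2 div neg 0 rmoveto show } def
%% Use the Cyrillic-Italic font. Could be just Cyrillic, etc:
/Cyrillic-Italic findfont 12 scalefont setfont
%% Move to the location (50mm, 50mm) and Russify my name:
50 mm 50 mm moveto (Robert Vilhelmoviq Kromvell) midshow
showpage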
You have to figure out the quirky character-to-character mapping. Some letters are obvious, just the ASCII letter that is pronounced in a Roman-alphabet language much like the corresponding Cyrillic one is in a Slavic language. Others are not, like these in the following list.
The one that I cannot figure out is the Cyrillic character "ya" or я — if you know how to do this with the ASCII encoding, without remapping your keyboard to a Cyrillic character set, please let me know!
-/_ for "eh/EH"
j/J for "zh/ZH"
y/Y for "e-kratkaya/E-KRATKAYA"
[/{ for "yuri/YURI"
]/} for "yu/YU"
h/H for "kh/KH"
q/Q for "ch/CH"
w/W for "sh/SH"
x/X for "shch/SHCH"
c/C for "ts/TS"
+/# for "YAT/yat"
Cyrillic in Unicode
The real answer is what you find at the Unicode organization's site. I have this HTML table for my own use — I have a copy on my laptop, and I don't have to bother with rendering the Unicode PDF file. Plus, you can see how well your browser renders Unicode... Both Firefox and Chrome (and even Konqueror last I checked) do a fine job on Linux and OpenBSD.
Unicode describes the codes as:
0400-040f — Cyrillic extensions
0410-044f — Basic Russian alphabet
0450-045f — Cyrillic extensions
0460-0481 — Historic letters
0482-0489 — Historic miscellaneous
048a-04f9 — Cyrillic extensions
04fa-04ff — Additions for Nivkh
0500-050f — Komi letters
0510-0513 — Cyrillic extensions
Codes 048a-04ff are mostly for Cyrillic representation of
non-Slavic languages like Sami, Azerbaijani, Yakut, Tatar,
and so on.
0500-0513 are entirely for Cyrillic representation of
Komi, Enets, Khanty, Chukchi, etc.
Read the Unicode pages
to see how arcane some of these are, and to get explanations
or at least names and language attributions for all the
characters.
To use this table:
Place the code between &#x and ;.
So, the Russian word да is created with:
&amp;#x0434;&amp;#x0430;
Code table (in the first block, the four middle columns, 0410-044f, are the basic Russian alphabet):
Ѐ 0400 | А 0410 | Р 0420 | а 0430 | р 0440 | ѐ 0450 |
Ё 0401 | Б 0411 | С 0421 | б 0431 | с 0441 | ё 0451 |
Ђ 0402 | В 0412 | Т 0422 | в 0432 | т 0442 | ђ 0452 |
Ѓ 0403 | Г 0413 | У 0423 | г 0433 | у 0443 | ѓ 0453 |
Є 0404 | Д 0414 | Ф 0424 | д 0434 | ф 0444 | є 0454 |
Ѕ 0405 | Е 0415 | Х 0425 | е 0435 | х 0445 | ѕ 0455 |
І 0406 | Ж 0416 | Ц 0426 | ж 0436 | ц 0446 | і 0456 |
Ї 0407 | З 0417 | Ч 0427 | з 0437 | ч 0447 | ї 0457 |
Ј 0408 | И 0418 | Ш 0428 | и 0438 | ш 0448 | ј 0458 |
Љ 0409 | Й 0419 | Щ 0429 | й 0439 | щ 0449 | љ 0459 |
Њ 040a | К 041a | Ъ 042a | к 043a | ъ 044a | њ 045a |
Ћ 040b | Л 041b | Ы 042b | л 043b | ы 044b | ћ 045b |
Ќ 040c | М 041c | Ь 042c | м 043c | ь 044c | ќ 045c |
Ѝ 040d | Н 041d | Э 042d | н 043d | э 044d | ѝ 045d |
Ў 040e | О 041e | Ю 042e | о 043e | ю 044e | ў 045e |
Џ 040f | П 041f | Я 042f | п 043f | я 044f | џ 045f |
Ѡ 0460 | Ѱ 0470 | Ҁ 0480 | Ґ 0490 | Ҡ 04a0 | Ұ 04b0 |
ѡ 0461 | ѱ 0471 | ҁ 0481 | ґ 0491 | ҡ 04a1 | ұ 04b1 |
Ѣ 0462 | Ѳ 0472 | ҂ 0482 | Ғ 0492 | Ң 04a2 | Ҳ 04b2 |
ѣ 0463 | ѳ 0473 | ҃ 0483 | ғ 0493 | ң 04a3 | ҳ 04b3 |
Ѥ 0464 | Ѵ 0474 | ҄ 0484 | Ҕ 0494 | Ҥ 04a4 | Ҵ 04b4 |
ѥ 0465 | ѵ 0475 | ҅ 0485 | ҕ 0495 | ҥ 04a5 | ҵ 04b5 |
Ѧ 0466 | Ѷ 0476 | ҆ 0486 | Җ 0496 | Ҧ 04a6 | Ҷ 04b6 |
ѧ 0467 | ѷ 0477 | ҇ 0487 | җ 0497 | ҧ 04a7 | ҷ 04b7 |
Ѩ 0468 | Ѹ 0478 | ҈ 0488 | Ҙ 0498 | Ҩ 04a8 | Ҹ 04b8 |
ѩ 0469 | ѹ 0479 | ҉ 0489 | ҙ 0499 | ҩ 04a9 | ҹ 04b9 |
Ѫ 046a | Ѻ 047a | Ҋ 048a | Қ 049a | Ҫ 04aa | Һ 04ba |
ѫ 046b | ѻ 047b | ҋ 048b | қ 049b | ҫ 04ab | һ 04bb |
Ѭ 046c | Ѽ 047c | Ҍ 048c | Ҝ 049c | Ҭ 04ac | Ҽ 04bc |
ѭ 046d | ѽ 047d | ҍ 048d | ҝ 049d | ҭ 04ad | ҽ 04bd |
Ѯ 046e | Ѿ 047e | Ҏ 048e | Ҟ 049e | Ү 04ae | Ҿ 04be |
ѯ 046f | ѿ 047f | ҏ 048f | ҟ 049f | ү 04af | ҿ 04bf |
Ӏ 04c0 | Ӑ 04d0 | Ӡ 04e0 | Ӱ 04f0 | Ԁ 0500 | Ԑ 0510 |
Ӂ 04c1 | ӑ 04d1 | ӡ 04e1 | ӱ 04f1 | ԁ 0501 | ԑ 0511 |
ӂ 04c2 | Ӓ 04d2 | Ӣ 04e2 | Ӳ 04f2 | Ԃ 0502 | Ԓ 0512 |
Ӄ 04c3 | ӓ 04d3 | ӣ 04e3 | ӳ 04f3 | ԃ 0503 | ԓ 0513 |
ӄ 04c4 | Ӕ 04d4 | Ӥ 04e4 | Ӵ 04f4 | Ԅ 0504 | |
Ӆ 04c5 | ӕ 04d5 | ӥ 04e5 | ӵ 04f5 | ԅ 0505 | |
ӆ 04c6 | Ӗ 04d6 | Ӧ 04e6 | Ӷ 04f6 | Ԇ 0506 | |
Ӈ 04c7 | ӗ 04d7 | ӧ 04e7 | ӷ 04f7 | ԇ 0507 | |
ӈ 04c8 | Ә 04d8 | Ө 04e8 | Ӹ 04f8 | Ԉ 0508 | |
Ӊ 04c9 | ә 04d9 | ө 04e9 | ӹ 04f9 | ԉ 0509 | |
ӊ 04ca | Ӛ 04da | Ӫ 04ea | Ӻ 04fa | Ԋ 050a | |
Ӌ 04cb | ӛ 04db | ӫ 04eb | ӻ 04fb | ԋ 050b | |
ӌ 04cc | Ӝ 04dc | Ӭ 04ec | Ӽ 04fc | Ԍ 050c | |
Ӎ 04cd | ӝ 04dd | ӭ 04ed | ӽ 04fd | ԍ 050d | |
ӎ 04ce | Ӟ 04de | Ӯ 04ee | Ӿ 04fe | Ԏ 050e | |
ӏ 04cf | ӟ 04df | ӯ 04ef | ӿ 04ff | ԏ 050f | |
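The same code points can also be used directly in LaTeX if you compile with XeLaTeX or LuaLaTeX instead of plain latex. What follows is only a sketch under some assumptions: the fontspec package is installed, and the font named here (DejaVu Serif, just an example) is on your system and contains Cyrillic glyphs. Substitute any Unicode font you have.

```latex
%% Compile with xelatex or lualatex, not latex or pdflatex
\documentclass{article}
\usepackage{fontspec}
%% Assumption: DejaVu Serif is installed; any font with Cyrillic glyphs will do
\setmainfont{DejaVu Serif}
\begin{document}
%% 0434 then 0430 from the table: the Russian word for "yes"
\symbol{"0434}\symbol{"0430}

%% Hex digits above 9 must be written in upper case: 044F is "ya"
\symbol{"044F}
\end{document}
```

This avoids both transliteration and remapping your keyboard, at the cost of looking up every letter in the table.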