eoconv 1.0


NAME

eoconv - Convert text files between various Esperanto encodings


SYNOPSIS

eoconv [-q] --from=encoding --to=encoding [file ...]

 Options:
   --from       specify input encoding (see below)
   --to         specify output encoding (see below)
   -q, --quiet  suppress warnings
   --help       detailed help message
   --man        full documentation
   --version    display version information

 Valid encodings:
   post-h post-x post-caret pre-caret html-hex html-dec
   iso-8859-3 utf7 utf8 utf16 utf32


DESCRIPTION

eoconv will read the given input files (or stdin if no files are specified) containing Esperanto text in the encoding specified by --from, and then output it in the encoding specified by --to.


OPTIONS

--from=encoding
Specify the character encoding of the input.

--to=encoding
Specify the character encoding of the output.

-q, --quiet
Suppress non-essential warning messages.

-?, --help
Print a brief help message and exit.

--man
Print the manual page and exit.

--version
Print version information and exit.

CHARACTER ENCODINGS

post-h
ASCII postfix h notation

post-x
ASCII postfix x notation

post-caret
ASCII postfix caret (^) notation

pre-caret
ASCII prefix caret (^) notation

html-hex
ASCII HTML hexadecimal entities

html-dec
ASCII HTML decimal entities

iso-8859-3
ISO-8859-3 (Latin-3) 8-bit encoding

utf7
Unicode UTF-7

utf8
Unicode UTF-8

utf16
Unicode UTF-16

utf32
Unicode UTF-32


ESPERANTO ORTHOGRAPHY

Esperanto is written in an alphabet of 28 letters. However, only 22 of these letters can be found in the standard ASCII character set. The remaining six -- `c', `g', `h', `j', and `s' with circumflex, and `u' with breve -- are not available in ASCII, nor are they among the characters available in the common 8-bit ISO-8859-1 character encoding. Therefore, while the six special Esperanto letters pose no problem in handwritten texts, they were impossible to produce on standard typewriters, and remain somewhat problematic even on modern-day computers. Various notations and encodings have been developed to represent these letters in typed and electronic text.

POSTFIX-h NOTATION

This was the solution proposed by the creator of Esperanto, L. L. Zamenhof. He recommended writing plain `u' for `u-breve' and appending an `h' to a letter to indicate that it should have a circumflex. However, the letters `u' and `h' are already part of the Esperanto alphabet, so using them for another purpose invites ambiguity and mispronunciation. It also makes conversion of Esperanto text to postfix-h notation `lossy' or one-way: it is generally not possible to convert postfix-h text back reliably by automated means. This notation suffers from the additional drawback that sorting the text by the standard rules for ASCII text does not yield Esperanto alphabetical order.
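
The ambiguity can be illustrated with a minimal Python sketch. The helper names `to_post_h` and `from_post_h_naive` are hypothetical and handle lowercase letters only; this is not eoconv's actual implementation, merely an illustration of why a mechanical reverse conversion goes wrong.

```python
# Hypothetical illustration only; not how eoconv itself is implemented.
# Lowercase letters only, to keep the sketch short.
TO_H = {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"}
FROM_H = {"ch": "ĉ", "gh": "ĝ", "hh": "ĥ", "jh": "ĵ", "sh": "ŝ"}


def to_post_h(text):
    """Encode Unicode Esperanto text in postfix-h notation (lossy)."""
    return "".join(TO_H.get(ch, ch) for ch in text)


def from_post_h_naive(text):
    """Decode postfix-h text by blindly replacing every letter+h pair."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FROM_H:
            out.append(FROM_H[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# A word whose digraphs really are accented letters survives the round trip.
print(from_post_h_naive(to_post_h("ŝanĝo")))
# "flughaveno" (flug + haveno) contains a genuine g followed by h,
# so the naive decoder wrongly produces an accented letter.
print(from_post_h_naive("flughaveno"))
# The breve on "aŭto" is dropped on encoding and cannot be recovered.
print(from_post_h_naive(to_post_h("aŭto")))
```

Resolving such cases would need a dictionary or human proofreading, which is why the reverse conversion cannot be fully automated.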

POSTFIX-x NOTATION

This is the most common ASCII notation encountered today. It involves appending an `x' to a letter to indicate that it should have an accent (be it circumflex or breve). Since `x' is not a letter in the Esperanto alphabet, no ambiguity results. However, sorting postfix-x text by ASCII value still does not yield correct Esperanto alphabetical order.
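
Because the notation is unambiguous, conversion in both directions is mechanical. The following Python sketch shows one way to do it; the function names `from_post_x` and `to_post_x` are hypothetical, and the choice to accept both `x' and `X' as the marker is an assumption rather than documented eoconv behaviour.

```python
import re

# Base letter -> accented letter, for both cases.
X_TO_UNICODE = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
# Accented letter -> base letter followed by a lowercase x.
UNICODE_TO_X = {accented: base + "x" for base, accented in X_TO_UNICODE.items()}


def from_post_x(text):
    """Replace each base letter followed by x (or X) with its accented form."""
    return re.sub(
        r"([cghjsuCGHJSU])[xX]",
        lambda match: X_TO_UNICODE[match.group(1)],
        text,
    )


def to_post_x(text):
    """Replace each accented Esperanto letter with base letter plus x."""
    return "".join(UNICODE_TO_X.get(ch, ch) for ch in text)


sample = "ehxosxangxo cxiujxauxde"
decoded = from_post_x(sample)
print(decoded)
print(to_post_x(decoded) == sample)
```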

PREFIX- AND POSTFIX-CARET NOTATION

Two somewhat less popular ASCII notations prepend or append a caret (`^') to a letter to indicate that it should have an accent.
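
As a rough Python sketch, the two caret notations differ only in which side of the letter the caret appears on. The function names are hypothetical, and the sketch assumes the caret also marks the breve on `u', as the description above implies.

```python
import re

# Base letter -> accented letter, for both cases.
BASE_TO_ACCENTED = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
ACCENTED_TO_BASE = {accented: base for base, accented in BASE_TO_ACCENTED.items()}
LETTERS = "[cghjsuCGHJSU]"


def from_post_caret(text):
    """Decode postfix-caret notation, e.g. 's^' becomes 'ŝ'."""
    return re.sub(
        "(" + LETTERS + r")\^", lambda m: BASE_TO_ACCENTED[m.group(1)], text
    )


def from_pre_caret(text):
    """Decode prefix-caret notation, e.g. '^s' becomes 'ŝ'."""
    return re.sub(
        r"\^(" + LETTERS + ")", lambda m: BASE_TO_ACCENTED[m.group(1)], text
    )


def to_post_caret(text):
    """Encode accented letters as base letter followed by a caret."""
    return "".join(
        ACCENTED_TO_BASE[ch] + "^" if ch in ACCENTED_TO_BASE else ch for ch in text
    )


def to_pre_caret(text):
    """Encode accented letters as a caret followed by the base letter."""
    return "".join(
        "^" + ACCENTED_TO_BASE[ch] if ch in ACCENTED_TO_BASE else ch for ch in text
    )


print(to_post_caret("ŝanĝo"))
print(to_pre_caret("ŝanĝo"))
```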

ISO-8859-3 (LATIN-3)

ISO 8859-3, also known as Latin-3 or South European, is an 8-bit character encoding for Turkish, Maltese, and Esperanto. High-bit characters are used to encode the accented Esperanto letters.
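
The following Python sketch, which relies only on the standard library's `iso-8859-3` and `iso-8859-1` codecs, shows that each accented Esperanto letter occupies a single high-bit byte in Latin-3 and cannot be encoded in ISO-8859-1 at all. The helper names are hypothetical.

```python
SPECIAL_LETTERS = "ĉĝĥĵŝŭĈĜĤĴŜŬ"


def to_latin3(text):
    """Encode text as ISO-8859-3 bytes."""
    return text.encode("iso-8859-3")


def from_latin3(data):
    """Decode ISO-8859-3 bytes back to text."""
    return data.decode("iso-8859-3")


def fits_latin1(text):
    """Return True if the text can be encoded in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


encoded = to_latin3(SPECIAL_LETTERS)
# One byte per letter, each with the high bit set.
print(len(encoded), all(byte >= 0x80 for byte in encoded))
# None of the letters exists in ISO-8859-1.
print(any(fits_latin1(letter) for letter in SPECIAL_LETTERS))
```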

UNICODE (ISO/IEC 10646)

Unicode is a standard that assigns a unique numeric code to each character of the world's writing systems. The methods for mapping these codes to sequences of bits are known as Unicode Transformation Formats (UTF). Among them are UTF-32, UTF-16, UTF-8 and UTF-7, where the number indicates the number of bits in one code unit.
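
To make the notion of a unit concrete, the Python sketch below encodes the same letter with each format using the standard library codecs and reports how many bytes result. The function name `encoded_sizes` is hypothetical; the little-endian codec variants are chosen only so that no byte order mark is added to the count.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text under each UTF."""
    sizes = {}
    # The "-le" variants fix the byte order so that Python does not
    # prepend a byte order mark, which would inflate the counts.
    for codec in ("utf-7", "utf-8", "utf-16-le", "utf-32-le"):
        sizes[codec] = len(text.encode(codec))
    return sizes


letter = "ĉ"  # U+0109, c with circumflex
sizes = encoded_sizes(letter)
for codec, size in sizes.items():
    print(codec, size)
```

For this letter, UTF-8 needs two 8-bit units, while UTF-16 and UTF-32 each need a single unit of 16 and 32 bits respectively. UTF-7 produces a longer sequence made up entirely of ASCII characters.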

HTML ENTITIES

Unicode codes for Esperanto characters can be escaped in HTML documents by using numeric HTML entities. The codes can be represented in either decimal (base-10) or hexadecimal (base-16) notation; the two are functionally equivalent. For example, `c' with circumflex (U+0109) can be written as either &#265; or &#x109;.
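
A minimal Python sketch of both forms follows. The function names are hypothetical, and escaping every non-ASCII character (rather than only the Esperanto letters) is an assumption made for brevity. The standard library's `html.unescape` is used to confirm that both forms decode to the same text.

```python
import html


def to_html_dec(text):
    """Escape every non-ASCII character as a decimal entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def to_html_hex(text):
    """Escape every non-ASCII character as a hexadecimal entity."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


word = "ŝanĝo"
decimal_form = to_html_dec(word)
hex_form = to_html_hex(word)
print(decimal_form)
print(hex_form)
# Both forms decode back to the same Unicode text.
print(html.unescape(decimal_form) == html.unescape(hex_form) == word)
```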


BUGS

Because postfix-h notation is inherently ambiguous, conversion from postfix-h text is unlikely to result in coherent text. Use at your own risk, and carefully proofread the results.

Report bugs to <psychonaut@nothingisreal.com>.


COPYRIGHT

Copyright (C) 2004 Tristan Miller.

Permission is granted to make and distribute verbatim or modified copies of this manual provided the copyright notice and this permission notice are preserved on all copies.