Using Babylon-based dictionaries on your Kindle

Friday, 29 June 2012

UPDATE! A Follow-Up Post on this Project
Since this post got wide attention, I've decided to follow up on this project.
See my new Babylon-based dictionaries on Kindle - Round 2 post.
The project is now open source, and pre-built dictionaries are organized and shared.

Lost in translation

The problem

This all started when I tried to purchase an Italian-English dictionary for my 2nd generation Kindle, running Kindle software v2.5.3.

One dictionary was offered for sale (as an ebook) on Amazon's website. The problem was that it would not actually become available for the device for another whole year.

Good translations

Babylon, on the other hand, offers high-quality dictionaries covering pretty much every language. Babylon Translator is paid software for Windows, but its dictionary files (.BGL) are offered as free downloads.

In a perfect universe
If only I had a way to import Babylon's free dictionary content into my Kindle and use it as the built-in dictionary, it would be perfect.

The solution presented here was tested on my Kindle 2. I'm pretty sure it should work on newer versions of Kindle as well.

The same Babylon dictionary, used on my PC (Left) and on my Kindle (Right)

Article Level:
Reasonably moderate

Cracking the Unicode codepage code

Spoilt Kindle 2
There are a few things to know about multilingual support and Kindle (if you wish to view non-Latin international texts):
Kindle 2 does not natively support non-Latin Unicode characters. This means that if you try to view an ebook which contains non-Latin text (e.g. Cyrillic), you will see blank squares instead of letters.
This is a huge miss on Amazon's part, for 2 reasons:

Unicode is already supported on all platforms: computers, tablets, phones, websites, etc. All modern devices can natively display any character set. All except the Kindle 2, that is.
Kindle is not a laptop, nor a tablet, nor a smartphone.
Its one and only purpose is to be an electronic book reader. The only thing it should do well is display text. Why not have it natively support any text in any language? Especially since the resources for that are already so common and so obvious. It isn't 1994 anymore...

There is a workaround (a hack) which enables Kindle 2 to display all Unicode characters. It's described in detail in this great blog post, which includes links to all the necessary files, elaborate instructions, and links to alternative fonts which may be installed for improved readability.
I am not sure how right-to-left books (e.g. ebooks in Hebrew) are displayed, in terms of text alignment and order of characters, because I have not tested such books yet. For left-to-right languages (e.g. Bulgarian) everything seems to be OK.

And there's more...
Three more points to take into account:

Kindle models of generation 3 and above do support Unicode natively.
This means that they properly display ebooks in any non-Latin language.
Even after hacking my Kindle 2 to display non-Latin characters, I didn't manage to use the integrated dictionary to look up words in non-Latin languages.
For example, if I'm reading a Bulgarian book and I wish to use a Bulgarian-to-English dictionary as the default integrated dictionary (i.e. point the cursor at a word to look it up), the solution described in this post doesn't seem to work (the lookup functionality does not look up).
It seems that the integrated dictionary look-up functionality supports Latin characters only. Perhaps newer generations of Kindle don't suffer from this problem.
I'd love to hear from anyone who has managed to achieve this with a Kindle of any generation.
Setting a new default dictionary worked nicely on my Kindle device itself.
However, I found it difficult to use my custom dictionary on my computer running Kindle for PC or on my phone running Kindle for Android app.

My Kindle 2

Ingredients

A quick download list of the tools you will need:

A Babylon .BGL dictionary.
Get a dictionary you wish to convert and use from Babylon's free content section.
BabylonToHtml
This is a tool I've made which converts Babylon dictionaries (.BGL files) to HTML files.
You may download just the executable file, or the full C# solution source code, as you prefer.
Mobipocket Creator.
This tool converts files of many common formats into Kindle compatible eBooks. This includes HTML dictionary files.

Step 1: Get the dictionary file

In order to create your custom Babylon dictionary file for Kindle you will need a Babylon dictionary file.
Go to Babylon's free dictionaries page, choose one (or more) and download it. All done, right? Not quite.

The dictionary file you've downloaded from Babylon's site is actually an .EXE installer, which contains the dictionary file archived in it.
There are some suggestions that it may be possible to extract the .BGL file from the installer with 7-Zip, but I did not manage to do so. The easiest way to get the dictionary file out is to run the installer, which will install Babylon (at least in trial mode).
Once Babylon is installed the .BGL file resides in %LOCALAPPDATA%\Babylon (Windows Vista/7). You may repeat the process for as many dictionaries as you require. Copy out the precious .BGL file(s) and keep or uninstall Babylon as you wish.
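If you install several dictionaries, copying the files out by hand gets tedious. Here is a minimal sketch of that copy step in Python. The function name collect_bgl_files and the destination folder are made up for illustration; the only detail taken from the text above is that the .BGL files end up under %LOCALAPPDATA%\Babylon.

```python
import os
import shutil
from pathlib import Path


def collect_bgl_files(babylon_dir, dest_dir):
    """Copy every .BGL file found under babylon_dir into dest_dir.

    Returns the names of the copied files, sorted by their source path.
    """
    babylon_dir, dest_dir = Path(babylon_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Walk all sub-folders; the suffix check is case-insensitive so that
    # both "x.bgl" and "x.BGL" are picked up.
    for path in sorted(babylon_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".bgl":
            shutil.copy2(path, dest_dir / path.name)
            copied.append(path.name)
    return copied


# Example call (Windows). "bgl_backup" is a hypothetical destination folder:
#   source = Path(os.environ["LOCALAPPDATA"]) / "Babylon"
#   print(collect_bgl_files(source, "bgl_backup"))
```

Files with the same name in different sub-folders would overwrite each other in the destination, which is fine for a handful of dictionaries but worth knowing about.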

Step 2: Use my magic tool: BabylonToHtml

The next step is to convert the binary .BGL dictionary to a textual HTML file (of a very specific structure, of course) which will be used as the source of the eBook.

About my magic tool
The binary structure of .BGL files has already been cracked (not by me). This knowledge is commonly out in the open and shared across various open-source projects. I have combined a few of those resources into one easy-to-use command-line utility.

One source was dictconv, a dictionary conversion tool for Linux which comes with its full C++ source. I used parts of this code (ported by me into C#) in order to analyse the metadata of the dictionary file (text encoding, author etc).
Another resource is an open-source project named ThaiLanguageTools. It's written in C#, but its code looks suspiciously similar to the code of dictconv mentioned above (similar variable names, comments etc), which suggests it's a port as well.
The content of Babylon's .BGL files is compressed in gzip format. In order to decompress the data, I have incorporated the free open-source SharpZipLib into the project as well (as source code, so in the end only one executable is needed to run my app, with no additional DLLs).
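To give a feel for the decompression step, here is a rough Python sketch. It is not the code used in BabylonToHtml (which is C# with SharpZipLib). It assumes that the file starts with a short uncompressed header and that the compressed part begins at the first gzip signature in the file; a real parser would read the exact offset from the header instead of searching for it. The function name extract_gzip_payload is made up.

```python
import zlib

# gzip signature (0x1f 0x8b) followed by the "deflate" compression method byte.
GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_gzip_payload(raw):
    """Return the decompressed bytes of the first gzip stream found in raw.

    Assumption: the first occurrence of the gzip signature marks the start
    of the compressed data. Anything before it is treated as header.
    """
    start = raw.find(GZIP_MAGIC)
    if start < 0:
        raise ValueError("no gzip stream found")
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(raw[start:])


# Example call with a hypothetical file name:
#   with open("English_Bulgarian.BGL", "rb") as f:
#       data = extract_gzip_payload(f.read())
```

The decompressed bytes are still a binary record structure, not readable text. Parsing those records is what the dictconv-derived code takes care of.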

To all the above I added my very own simple HTML generator. It structures the entries from the dictionary file in a markup compatible with the next step (converting it into an eBook).
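For the curious, this is roughly what such a generator can look like, sketched in Python rather than C#. The idx:entry and idx:orth tags are the ones I understand Mobipocket dictionaries to use for marking an entry and its headword; the exact markup BabylonToHtml emits is not shown in this post, so treat the tag layout, the function names and the HTML escaping as assumptions.

```python
from html import escape


def entry_to_html(headword, definition):
    """Render one dictionary entry with Mobipocket-style index tags.

    Assumption: definitions are plain text, so they are HTML-escaped here.
    """
    return (
        "<idx:entry>"
        f"<idx:orth>{escape(headword)}</idx:orth>"
        f"<p>{escape(definition)}</p>"
        "</idx:entry>"
    )


def build_dictionary_html(entries):
    """Wrap all entries in a minimal UTF-8 HTML page."""
    body = "\n".join(entry_to_html(head, text) for head, text in entries)
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body>\n" + body + "\n</body></html>"
    )


# Example call:
#   page = build_dictionary_html([("casa", "house"), ("gatto", "cat")])
#   with open("Italian_English.html", "w", encoding="utf-8") as f:
#       f.write(page)
```

The headword inside idx:orth is what the reader's lookup matches against, which is why it is kept separate from the definition text.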

Get the tool (with or without the source code)
If you wish to browse through the sources (and improve them!), you can download the full Visual Studio solution from this link.
If you just want the executable itself, you can get it from this link.

Use it
You'll need to run my BabylonToHtml tool in a command prompt window.
If you run it without any additional parameters, you'll receive some basic help:

A handy message for the perplexed user..

Command line parameters:

In most cases all you have to provide is the name (and potentially the path) of your .BGL file.
The output .HTML is encoded in UTF-8 (Unicode).
However, the entries read from the .BGL dictionary are encoded with specific character sets (and sometimes with more than one). For example: in a Chinese-Bulgarian dictionary the source language entries are encoded in a Chinese character set and the target language entries in a Cyrillic one.
BabylonToHtml will try, by default, to get the right encoding (this info is available in the meta-data of the .BGL file in most cases), but it may make mistakes.
These encodings can be enforced:
It is possible to set the codepage of the source language by specifying the -se command line argument.
It is possible to set the codepage of the target language by specifying the -te command line argument.

So something like the following should be sufficient in most cases:

BabylonToHtml.exe English_Bulgarian.BGL

If your .BGL file does not reside in the same folder as the .EXE, a full path should be specified (it may be wrapped in double-quotes if needed).
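Why do the two sides of an entry need separate encodings? The following Python sketch illustrates the idea: the headword bytes and the definition bytes are each decoded with their own code page, and the result is written out as UTF-8. The function name is made up, and the code page names ("cp1252", "cp1251") are Python codec names chosen for an English-Bulgarian example; they are not necessarily the values that -se and -te accept.

```python
def to_utf8_entry(source_bytes, target_bytes, source_codepage, target_codepage):
    """Decode both halves of an entry with their own code pages.

    Returns the entry as UTF-8 bytes, headword and definition separated
    by a tab (the separator is only for this illustration).
    """
    headword = source_bytes.decode(source_codepage)
    definition = target_bytes.decode(target_codepage)
    return f"{headword}\t{definition}".encode("utf-8")


# Example: an English headword stored as Windows-1252 and a Bulgarian
# definition stored as Windows-1251.
raw_headword = "house".encode("cp1252")
raw_definition = "къща".encode("cp1251")
entry = to_utf8_entry(raw_headword, raw_definition, "cp1252", "cp1251")
print(entry.decode("utf-8"))
```

Decoding the Bulgarian bytes with the wrong code page does not raise an error; it silently produces the wrong letters. That is why a wrong guess shows up as gibberish in the output, and why being able to force the encoding is useful.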

The encoding (and other information about your dictionary) is parsed and the progress of the process is presented...

Running...

Once the process is done, a new HTML file resides next to the original .BGL file.
The new file's name matches the original .BGL file (just with an .HTML extension):

All done. A new HTML file is generated. Magic!!

Step 3: Convert the dictionary to a Kindle compatible eBook

For this you will need to download, install and run the free Mobipocket Creator. The process itself is fairly simple. Here is the illustrated version:

On the main window, under "Import From Existing File" click the "HTML document" link.

Import from: HTML (duh!)

On the next screen:
Click "Browse..." on the "Choose a file" field and select the HTML file generated by BabylonToHtml.
In the "Encoding" drop-down select "International (UTF8)".
Click the "Import" button.

Import the HTML file

Click "Book settings" on the left-hand-side list and set the fields:

Set the "Encoding" drop-down to "International (UTF8)".

Check the "This eBook is a dictionary" box.

Set the Input language and the output language of your dictionary appropriately.

Click the "Update" button.

Dictionary settings..

Click "Metadata" on the left-hand-side list and set the mandatory fields:

Give a title for your eBook, set the author, language and main subject.

Now scroll all the way down...

Metadata(1/2): Fill a title, author, language and main subject

At the bottom of the "Metadata" screen, fill the "Suggested Retail Price" field (it cannot be left empty, "0" is also fine).
Click the "Update" button.

Metadata(2/2): Set the retail price :-)

On the top bar click the "Build" icon...

Build(1/4): Click Build

In the "Build Publication" screen click the "Build" button...

Build(2/4): Click Build

Wait for the build.
Depending on the size of your dictionary (and the size of the generated HTML file) this may take some time.

Build(3/4): Wait...

Once the process is finished, select the "Open folder containing eBook" radio button and click "OK" to get your dictionary eBook.

Build(4/4): All is done!

Your dictionary-eBook is a file with .prc extension:

Your eBook is produced with a .prc extension

Step 4: Transfer the dictionary to your Kindle and start using it

Transfer
Plug the Kindle into the computer (duh!). Transfer the new eBook to the usual Documents folder, alongside your other books, and unplug.

Note: In some newer versions of Kindle, the dictionaries have been moved from the Documents folder to the Documents/Dictionaries subfolder. If the dictionary is not recognized by your Kindle device, move it there.
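The transfer can also be scripted. Below is a minimal Python sketch that copies the .prc file either to the Documents folder or to its Dictionaries subfolder. The function name, the flag name and the lowercase folder spelling are assumptions for illustration; check how the folders are actually named on your device.

```python
import shutil
from pathlib import Path


def install_dictionary(prc_file, kindle_root, use_dictionaries_subfolder=False):
    """Copy a .prc dictionary onto the Kindle's storage.

    kindle_root is the drive or mount point the Kindle shows up as.
    Returns the path of the copied file.
    """
    documents = Path(kindle_root) / "documents"
    if use_dictionaries_subfolder:
        target_dir = documents / "dictionaries"
    else:
        target_dir = documents
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(prc_file, target_dir))


# Example call with a hypothetical drive letter:
#   install_dictionary("English_Bulgarian.prc", "E:/")
```

If the dictionary does not show up on the device after the plain copy, calling the function again with use_dictionaries_subfolder=True puts a second copy in the subfolder.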

Set as default
Click the "Home" button, then click "Menu" and go to "Settings" and Enter:

Home screen > Menu > Settings

In the Settings screen click "Menu" again and go to "Change Primary Dictionary":

Settings screen > Menu > Change Primary Dictionary

Your newly created dictionary should appear next to the default Oxford one.
Select it and Enter:

Choose your custom dictionary

Then click Home to leave the Settings page.
Your dictionary is now the default translator whenever you select a word in a book:

Babylon dictionary on Kindle!

You may also manually look up words in your custom dictionary as you do with the default English one.

Bonus tip: Take screenshots from the Kindle

To take a screenshot from the Kindle device:
Press Shift + ALT + G simultaneously. The screen will flicker.

Plug the Kindle into the computer. Your screenshot files are in the Documents folder, named screen_shot*.gif.
Note: This process sometimes needs to be repeated. You may not find your screenshots every time. Not sure why.
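Since the screenshots do not always appear, a quick check from the computer can save some hunting around. This small Python sketch lists the files matching the screen_shot*.gif pattern mentioned above; the function name and the lowercase folder spelling are assumptions.

```python
from pathlib import Path


def find_screenshots(kindle_root):
    """Return the names of the screenshot files in the documents folder."""
    documents = Path(kindle_root) / "documents"
    return sorted(p.name for p in documents.glob("screen_shot*.gif"))


# Example call with a hypothetical drive letter:
#   print(find_screenshots("E:/"))
```

An empty list means the key combination did not register, so it has to be repeated on the device.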

Kindle screenshots!

49 comments:

krasin, 5 July 2012 at 16:46
Very useful, thanks for sharing! Could you give a link to the English_Bulgarian.prc file?

SPiRiTCORE, 20 September 2012 at 00:57
Hi Alon!

Thank you for your post!

Any idea how to port it to the android app?
I'm trying to put an Hebrew dict to the android app.

Thanks!

Anonymous, 29 October 2012 at 23:39
I was trying to convert NEW_Babylon_German_English_dictionary.BGL
to html with the help of your program. The result was somewhat garbled.
e.g. The definition of "Abdichtung" came out this way:

proofing, sealing, act of closing off against entry or leakagebdichtung (die)

The last words should be "leakage Abdichtung (die)"

but it came out "leakagebdichtung". The character "A" of "Abdichtung" was swallowed and the word was appended to "leakage"

This is the same for all the definitions.

Can you suggest what can I do?

Thank you for your reply

Anonymous, 11 December 2012 at 03:56
Hello Alon,
I kinda have difficulties going through the second step - I keep getting an error. If it's not too much to ask, could you please create a Bulgarian-English dictionary (prc or html) and post it here?
If you have some free time to do that, it would be a huge favour. Thanks in advance

Anonymous, 5 January 2013 at 21:50
:D That's true. Mine is Kindle keyboard and probably that is the reason. Anyway thanks a lot for your time. i really appreciate it. good luck with you blog and other things

Anonymous, 12 January 2013 at 00:04
Hi and thank you very much! I managed a nice Spanish english dic!
However I'm trying to make an english english and an english - french one, but the html comes out weirdly like this :
"cos$531761$
. Due to the fact that -os"
All the defined words are like this with weird dollar symbols etc., am I doing it wrong?
Thanks a lot!

Btw, do you know how to add cp932 for japanese please? If it's not complicated, because I don't know anything about programming! :)

Anonymous, 13 January 2013 at 22:48
Hi,
I have Kindle Keyboard (i.e. 3rd generation). The firmware version is 3.4.
I did all the instructions as you wrote on English-Hebrew Babylon dictionary. There are few issues:
1. Hebrew appears left to right. But I do see Hebrew letters (and not Gibberish)
2. There are strange stuff like $531761$ (like other people reported)
3. The dictionary is not popping-up in the Kindle when the cursor is hovering a word. Sometime it does with "a delay" and it displays previous hovered word.

I was able to upload free Hebrew eBooks (which are displayed perfectly) according to the instruction in http://kneidlach.info/

Unknown, 19 January 2013 at 09:44
great job but it doesn't decrypt images.i really need a tool that extract images too but babylontohtml didn't do so.

in this page a reverse engineer has explained the process and he mentioned named resources too but i am not good at c++ and i can't convert it. if you had sometime please take a look at it:
http://www.woodmann.com/forum/showthread.php?7028-BGL-(babylon-glossary)-to-GLS-(babylon-glossary-source)&p=44981&viewfull=1#post44981

Anonymous, 5 February 2013 at 12:14
HI the main problem with this system is that it mainly works only from ENGLISH to other languages...
if you try italian russian for instance it doesnt lookup plural and feminine and conjiugated verbs...so as a practical results only very few works are translated...
i would need a source dictionary with conjugated verbs...

Anonymous, 5 February 2013 at 12:38
i meant few WORDS are translated..

Anonymous, 14 February 2013 at 09:18
Pleas exactly tell how many software is need?.i cant run your software on pc.

Anonymous, 18 February 2013 at 19:34
Very interesting idea and process. One not so small problem. My anti-virus software identifies every Babylon download as malware. Have you found any way to get the dictionaries without the intrusive add-ons that are really hard to get rid of?

Anonymous, 19 February 2013 at 06:19
I just processed the Babylon English-Hebrew.bgl. The resulting book contains a lot of those $012345$ strings. I thought about removing all of them, but a brief examination shows that some of them are needed, or are at least not displayed as $012345$ strings.

Do you have any further thoughts on how to remove only the ones that mess up the book?

Simon Brenncke, 4 July 2013 at 19:31
Thank you so much for this information!! I was desperate for getting a Russian-English dictionary, and finally made one thanks to your explanation.

However, even though downloading the glossary for the Russian-English dictionary from Babylon, after installation there was no Russian-English among the .BGL files. Instead, I searched for the file on google and found one. (that is, babylon_russian_english.bgl). For what it's worth, I'll mail you the dictionary, should you want to add it to your site.

Anonymous, 25 July 2013 at 20:05
Hi Alon,
I just bought kindle paperwhite and was looking for english-hebrew dictionary.
After long search I found yours. I must admit that you did a terrific job. this dictionary is very helpful. however, there is still a small problem:
when using Babylon English-Hebrew Dictionary.sdr the words appear backwards.
I tried using Babylon English-Hebrew Dictionary - MG Reversed Words.prc and the word appear OK. but the order of the words is wrong.
for example, the word everywhere is: מקום בכל
instead of
בכל מקום

is there something you could do about it?
if there is anything I could help with. I'd be glad!


Wimmer, 8 August 2013 at 13:29
What about inflection in Babylon dictionaries, like "make, makes, made".

Wimmer, 9 August 2013 at 12:17
Reply doesn't work, strange...

My Babylon dictionaries, for example Interlingua-Polonese, have inflections prepared clasically:

parlar|parla|parlate|parlara|parlava|parlante
mówić

So, I must try ;-)
Thank you, Alon.

BTW - a few years ago a Chinese wrote his Lingoes and promised to prepare a tool for creating dictionaries and converting the Babylon dictionaries (GLS files) but it has never appeared.
http://lingoes.net/

Wimmer, 9 August 2013 at 12:38
I have just tried the Russian-English dictionary (on a sample of "Idiot") and it recognizes Russian inflections.
So, I have to try to convert my own dictionaries.

Walter White, 14 July 2014 at 05:18
Hi, thank you for the great tool, but one thing got my attention. I tried to convert Babylon's Spanish-English dictionary, and everything works OK except the part-of-speech sections, like noun, adj etc. I used the pyglossary converter and I saw they are there, but somehow they get lost during conversion. Can you please fix that part, or at least show us how we can do it?

or, 2 December 2019 at 20:02
great job alon.
the year is almost 2020 and your project still helping people,
i am an israeli that spend few months right now on US,
i just now bought a new kindle and thought maybe to develop somthing for english-hebrew to kindle.
then i saw your project and you save me alot of time and thinking, thanks!