DISCLAIMER: Please follow any links below while reading this from your desktop computer, not your Kindle! The discussion below assumes you do your E-book development work on a desktop computer and target the Kindle with the resulting E-Books.
Hint: When I wrote up this documentation I was using .prc file extensions. I am now switching to .mobi file extensions, since Kindle only seems to download files directly and correctly when they are named .mobi. As far as Kindle is concerned, these seem to be the same thing -- it's just the name of the file extension that has changed. If you have a non-Kindle e-book reader that doesn't know the .mobi extension, try changing the extension back to .prc and see if that works.
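If you have a pile of files to try this on, the rename can be scripted. The following is only a minimal sketch in Python (not one of the tools in the Dev directory); the function name copy_as_prc is made up for illustration. It copies rather than renames, so the original .mobi file stays in place.

```python
from pathlib import Path
import shutil

def copy_as_prc(mobi_path):
    """Copy book.mobi to book.prc in the same folder and return the new path."""
    src = Path(mobi_path)
    dst = src.with_suffix(".prc")
    # The bytes are copied unchanged; only the file extension differs.
    shutil.copyfile(src, dst)
    return dst
```

Usage would be something like copy_as_prc("UList4.mobi"), which leaves UList4.prc next to the original.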
-----
Before I forget and go any farther, please check out the following MOBI file I created to find out what Unicode Glyphs Kindle actually supports: UList4.mobi
You can view this file in a desktop MOBI Reader, for example, to see that the file actually has the Unicode characters implemented correctly. Now transfer it to your Kindle in order to "see" what kind of Unicode (aka UTF-8) Kindle's Glyphs actually support. The good news is that "Unicode" seems to be more-or-less correctly implemented in Kindle. The bad news is that only a small subset of Languages is supported. In so many words:
U+0000 < Kindle-Supports <= U+00FF (Latin)
U+00FF < Kindle-Approximates < U+0250 (Also Latin)
From U+0250 on up, Kindle just blows it off, [correctly] displaying the [?] ["Huh?"] Glyph instead.
Basically this means ONLY Latin-alphabet languages on Kindle. Somewhat disappointingly, this means no Greek and no Hebrew, which one could imagine might have been nice to have for Bible support, etc.
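The three ranges above are easy to turn into a quick check of whether a given text will survive on Kindle. This is a minimal sketch in Python; the function names are made up for illustration, and the boundaries are just the ones observed above with UList4.mobi, not anything published by Amazon. The code treats U+0250 itself as unsupported, which is an assumption.

```python
def kindle_support(ch):
    """Classify one character using the ranges observed with UList4.mobi."""
    cp = ord(ch)
    if cp <= 0x00FF:
        return "supported"      # Latin, i.e. the ISO 8859-1 range
    if cp < 0x0250:
        return "approximated"   # also Latin, shown with a look-alike glyph
    return "unsupported"        # shown as the [?] glyph

def summarize(text):
    """Count how many characters of text fall into each category."""
    counts = {"supported": 0, "approximated": 0, "unsupported": 0}
    for ch in text:
        counts[kindle_support(ch)] += 1
    return counts
```

Running summarize() over a book's text before converting it gives a rough idea of how many [?] glyphs to expect.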
Another way of saying this is that Kindle kind-of supports ISO 8859-1; see for example http://htmlhelp.com/reference/charset/
Project Gutenberg supports two versions of "Plain Vanilla ASCII", the less plain version of which is ISO 8859-1, so Kindle support of this flavor of Gutenberg files is pretty good.
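Since I mention further down that I get the best results with everything set to "UTF-8", an ISO 8859-1 Gutenberg file may need transcoding first. Here is a minimal sketch in Python, assuming the input really is ISO 8859-1 (the script cannot tell; it just trusts you). The function name is made up for illustration.

```python
def latin1_to_utf8(src_path, dst_path):
    """Rewrite an ISO 8859-1 text file as UTF-8; return the character count."""
    with open(src_path, "rb") as f:
        raw = f.read()
    # Every byte value is a valid ISO 8859-1 character, so this cannot fail,
    # but it also cannot detect a file that was really in some other encoding.
    text = raw.decode("iso-8859-1")
    # newline="" keeps the original line endings exactly as they were.
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return len(text)
```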
-----
Note that I talk below about "File Format Conversion" and not book editing or authorship because those are not my interests. Obviously, much of what I talk about is as applicable to E-Book editing and authorship as to file format conversion, for people interested in those topics as well.
Also note that Amazon offers some file conversion options where you email them your files and see what happens. I am just not interested in that stuff, and I have never even tried it. Maybe some day I will at least check it out.
Hopefully, someday soon there will be a discussion here of what incomplete information I know about doing development of Kindle-compatible PRC files, PRC file format, etc. In the meantime, here is a bare list of Internet resources that I use to do my file-format conversions.
I am not interested in discussing or helping with any acts of piracy.
I do my best to respect copyright, trademark, and patent laws -- it is how I have earned the bread I feed my children! For those who know essentially nothing about patent and copyright laws in the US let me just say that they were originally designed to represent a TRADE-OFF between the rights of authors and inventors to profit from their good works without having them ripped off immediately before they could hope to profit from their efforts, vs. a long-term right of society to eventually inherit those works for the general public good. I personally believe that this tradeoff is a good thing: Short-term ownership of intellectual property by an author or inventor followed by long-term eventual ownership of that intellectual property by society as a whole. But: as always -- the devil is in the details!
Also in this directory I have placed my motley collection of self-generated tools without comment, in the hope that an intelligent developer can glean how I use them. I have included one example each of file-format converting a txt file and an HTML file. The txt example assumes a Project Gutenberg txt file for input.
Incidentally, I THINK, although I have not confirmed this directly, that the MOBI file format and the PRC file format are essentially the same thing. I can rename a MOBI file as a PRC file and it still works fine on Kindle. Also I think the new AZW format is "just" a PRC formatted file plus DRM (Digital Rights Management). I THINK a PRC file is similar to a Palm PDB file. I do not have any Palm development background so I am sure other people must know more about this...
PRC files can be but need not be compressed. It is your choice.
PRC files can be but need not be encrypted. It is your choice.
PRC files may or may not allow "cut and paste". It is your choice.
PRC files may or may not contain images. It is your choice.
A PRC file that is not compressed and not encrypted should be pretty easily reversible: HTML -> PRC and PRC -> HTML, with maybe some link modifications or something. I have not actually bothered to try this, though.
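The guess above that a PRC file is similar to a Palm PDB file can be checked by peeking at the first bytes of a file. The sketch below (Python) assumes the 78-byte Palm database header layout as described in publicly available format write-ups: a 32-byte name, a 4-byte type at offset 60, a 4-byte creator at offset 64, and a record count at offset 76. The "BOOK"/"MOBI" pair is likewise taken from that community documentation, not from anything official, and the function names are made up for illustration.

```python
import struct

PDB_HEADER_SIZE = 78  # fixed-size header at the start of a Palm database

def read_pdb_header(data):
    """Pull the name, type, creator and record count out of raw file bytes."""
    if len(data) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a Palm database")
    # The name field is 32 bytes, NUL-terminated and NUL-padded.
    name = data[0:32].split(b"\x00", 1)[0].decode("latin-1")
    file_type = data[60:64].decode("ascii", "replace")
    creator = data[64:68].decode("ascii", "replace")
    (num_records,) = struct.unpack(">H", data[76:78])  # big-endian 16-bit
    return {"name": name, "type": file_type,
            "creator": creator, "records": num_records}

def looks_like_mobipocket(data):
    """True if the header carries the (assumed) BOOK/MOBI signature."""
    header = read_pdb_header(data)
    return header["type"] == "BOOK" and header["creator"] == "MOBI"
```

To try it, read a file with open(path, "rb").read() and pass the bytes in. If a .mobi file and a .prc file both come back True, that at least supports the idea that only the extension differs.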
-----
E-Books NOT in PRC file format (with a few "trial" exceptions). If you are satisfied reading a "Plain Vanilla ASCII" .txt file without correctly displayed author and title information on your Kindle you can use these files directly:
WARNING: I have had frequent and strongly negative experiences, aka "software hangs", trying to access Project Gutenberg directly from my Kindle using Amazon's "experimental" HTML reader included on Kindle. So far I have always eventually been able to recover from these hangs by turning off and on one or both switches on the back of Kindle. But suffice it to say I have learned this is a BAD IDEA and I am not doing it any more! Download from Project Gutenberg to your desktop and from there to your Kindle.
Also I strongly suggest you get an SD card for your Kindle right away. I spent $15 for a 2 GB generic SD card from Fry's and have not been disappointed. It will hold many more books than you can comfortably manage using your Kindle's [pathetic] "Home" user interface.
I am personally NOT satisfied reading a "Plain Vanilla ASCII" .txt file on Kindle which is the whole point of my file re-formatting work. Project Gutenberg also has excellent "HTML" formatted books which are very close to being something Kindle is happy to use. In fact PRC files are essentially encapsulated HTML files. However, trying to use HTML directly on Kindle I have found very trying, with Kindle hanging frequently. Which is why I LOVE to file-format convert HTML to PRC file format! Frequently the result is very beautiful on Kindle!
Note that many of the HTML formatted books I have file reformatted for Kindle exclude graphics. This is in part because I have not been impressed by the practicality of images on Kindle vs. the increase in file size to include those images. I.e., files are often 10X larger in order to put a half-dozen grey smudges on Kindle's screen. However, in some cases the results are worth it and some of my Kindle file reformatting efforts include images.
-----
A Handy PRC Reader for the desktop that allows me to preview my work prior to transferring it to the Kindle:
http://www.mobipocket.com/en/DownloadSoft/ProductDetailsReader.asp
Sometimes books that are very unattractive on the Desktop are still quite attractive on the Kindle. Or Vice Versa.
I also think this viewer is worth having if you want to read PRC books without even owning an E-book Reader such as Kindle. I'd much rather read in this PRC Reader than try to read from a given Word Processor.
-----
A GUI file-format conversion program which is helpful for someone just getting started, or for someone who does not have many E-Books to convert for Kindle:
http://www.mobipocket.com/en/DownloadSoft/ProductDetailsCreator.asp?edition=Publisher
-----
A command-line oriented program which is harder to get started with but which proves to be more powerful in the long run. You need to learn how to create OPF specification files and use them as the input to this program to get anywhere interesting with this:
See "Downloads Mobigen" at:
http://www.mobipocket.com/dev/
Run Mobigen from a command line with absolutely no arguments and it will spit out a listing of the command line options it supports.
-----
A tool that understands [more-or-less] Project Gutenberg common formatting and markup conventions within "Plain Vanilla ASCII" files in order to "Pretty-Print" them [ouch!] into HTML file formats. [When is a plain vanilla ASCII file NOT just a plain vanilla ASCII file?] I have not researched this tool to date -- I just use it. Sometimes I am very happy with the results. Sometimes not. When it does bad things [i.e., makes bad "Pretty-Print"ing decisions] the results typically look less bad on the Kindle than they do on the desktop -- so take heart!
http://www.sandroid.org/GutenMark/
Another way of saying this is that I primarily use this tool as a magic black box to turn Gutenberg txt files into HTML files, which I find to be a better input file format for the other tools I use. If Gutenberg already has an HTML version of a book in "flat file" format (a single HTML file, not counting images) I try to use the HTML version, and I tend to leave out the images. Some Gutenberg HTML versions are chunked into multiple HTML files, and I find it difficult to figure out how to get them going in the finite couple of minutes per Gutenberg text I am willing to invest.
-----
In general, you can see from my examples that the approach I am currently using is:
* I'm using Vista as my development system [ouch!]
* If using Vista, make sure everything I am talking about here, including your cmd window, is "Run as Administrator". Otherwise you will get at best many warnings, and at worst some tools won't run correctly at all! Again, make sure EVERYTHING is "Run as Administrator"!
* I download a Gutenberg txt or HTML E-book file to a dev folder on my desktop computer. So far I have not been willing to do major scarfs from Gutenberg as being potentially too disruptive to both their system and to my desktop. So to date it's been "one file at a time". Please read the Gutenberg requests about friendlier ways to Scarf if you are considering doing some major Scarfing!
* If I am starting from a Gutenberg txt file I first convert that to an HTML file using GutenMark.
* I use a single HTML file as input [unless using the Mobipocket GUI tool]
* I hack-generate an OPF file based on author_lastname, author_firstname, gutenbergfilenameminussuffix, "Book Title in Quotes"
* I think I get the best results when I do this while setting both the input and output formats in the OPF file to "UTF-8"
* I input that OPF file to Mobigen, which outputs a "MOBI" formatted file
* I change that MOBI extension to a PRC extension, on the assumption [or hope] that they are in fact the same thing [this works on Kindle at least!]
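The "hack-generate an OPF file" step in the list above could look roughly like the following Python sketch. Treat it with suspicion: it is not the script from the Dev directory, and the element names (dc-metadata, x-metadata, the output encoding attribute, the text/x-oeb1-document media type) are assumptions based on the OEB 1.x package layout that Mobipocket tools are commonly reported to use. Compare the result against an OPF file saved by the Mobipocket GUI tool before trusting it. The four arguments mirror the four items named in the list; the function name make_opf is made up.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed OEB 1.x style package layout; verify against a known-good OPF.
OPF_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core"
                 xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
      <dc:Title>{title}</dc:Title>
      <dc:Creator>{creator}</dc:Creator>
      <dc:Language>en</dc:Language>
      <dc:Identifier id="uid">{uid}</dc:Identifier>
    </dc-metadata>
    <x-metadata>
      <output encoding="utf-8"/>
    </x-metadata>
  </metadata>
  <manifest>
    <item id="text" media-type="text/x-oeb1-document" href={href}/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""

def make_opf(author_lastname, author_firstname, basename, title):
    """Return OPF text for a single-HTML-file book named basename + '.html'."""
    return OPF_TEMPLATE.format(
        title=escape(title),  # escape &, < and > so the XML stays well-formed
        creator=escape(f"{author_lastname}, {author_firstname}"),
        uid=escape(basename),
        href=quoteattr(basename + ".html"),  # quoteattr supplies the quotes
    )

def write_opf(path, opf_text):
    """Save the OPF as UTF-8, matching the encoding declared in its first line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(opf_text)
```

For example, write_opf("2701.opf", make_opf("Melville", "Herman", "2701", "Moby Dick")) would produce a file to hand to Mobigen, assuming 2701.html sits in the same folder. The language is hard-coded to "en" here purely to keep the sketch short.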
-----
To build the Examples in the Dev directory, copy the entire Dev directory tree to your machine at the location c:\Gindle\Dev. If you don't have a recent Windows machine you will need to compile the trivial program "echostrip.cpp" locally or find an equivalent. Also install GutenMark and Mobigen in the indicated directories under \Dev. Then run the "makeit.bat" file in each example directory. Everything will magically work as intended. Trust me, how could this possibly not work? Remember everything including your cmd window must be "Run as Administrator" under Vista!
-----
Incidentally, the files on this site are typically neither encrypted nor DRM'ed. This means that you can "reverse engineer" the files if you like. See for example the tools at https://dev.mobileread.com/trac/mobiperl/wiki
Some people are using these tools to remove the "Gutenberg" header statements. This is as legitimate as removing the header statements in the original Gutenberg file formats, no more and no less. Read the detailed Gutenberg "copyright" information at their sites if you are contemplating doing something "creative" with the files. Gutenberg claims you can do almost anything you want with the files, but I am not a lawyer; get your own copyright lawyer if you have any questions. Based on the timing of e-book arrivals on some other sites, I am guessing that some people are taking these files, taking out the headers, and putting them on other sites. Again, I personally have no problems with this one way or the other, although one could just as easily do a "forward engineering" of the files using my tools directly from the Gutenberg files. What would be really great would be if sufficiently motivated people would do additional formatting work to "clean up" files for Kindle that the automated file-format-conversion tools I use do not do a good job on. Poetry, for example, is often poorly handled by my choice of tools.
What is disappointing to me, however, is that some people are choosing to impersonate the name of this site, "Free Kindle Books." These "poaching" efforts make it harder for users of Kindle to find Kindle-ized versions of PG files. If you can help spread the word about the location of this "legitimate" site it would be helpful to my efforts. It is not that I didn't anticipate these developments, it's just that I made the deliberate decision, following the philosophy of PG itself I think, to try to ignore this "poaching" as much as possible and get on with doing what I enjoy doing.
-----
Things I need to do:
* Generate a tool that will extract title and author name back out of a PRC file. I think I know how to do this, hopefully just a day's work.
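For what it's worth, here is one way such a tool might start out. This is a rough sketch in Python, not a finished or tested-on-real-books tool. It assumes an unencrypted file and relies entirely on the community-documented layout (MobileRead wiki and similar), which I have not checked against anything official: the record list begins at byte 78, record 0 holds a "MOBI" header at offset 16, the full title's offset and length sit at offsets 84 and 88 of record 0, and the author is EXTH record type 100. The function name is made up.

```python
import struct

def read_title_and_author(data):
    """Return (title, author) from the raw bytes of a MOBI/PRC file.

    author is None when the file carries no EXTH author record.
    All offsets below are assumptions taken from community documentation.
    """
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("not a Mobipocket file")
    # First record-list entry: 4-byte big-endian offset of record 0.
    (rec0,) = struct.unpack_from(">I", data, 78)
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    (mobi_len,) = struct.unpack_from(">I", data, rec0 + 20)
    (text_encoding,) = struct.unpack_from(">I", data, rec0 + 28)
    # 65001 means UTF-8; otherwise assume Windows-1252.
    codec = "utf-8" if text_encoding == 65001 else "cp1252"
    # Title offset is relative to the start of record 0, not the file.
    name_off, name_len = struct.unpack_from(">II", data, rec0 + 84)
    start = rec0 + name_off
    title = data[start:start + name_len].decode(codec, "replace")
    author = None
    (exth_flags,) = struct.unpack_from(">I", data, rec0 + 128)
    exth = rec0 + 16 + mobi_len  # EXTH block follows the MOBI header
    if exth_flags & 0x40 and data[exth:exth + 4] == b"EXTH":
        (count,) = struct.unpack_from(">I", data, exth + 8)
        pos = exth + 12
        for _ in range(count):
            # Each record: 4-byte type, 4-byte length (including these 8 bytes).
            rec_type, rec_len = struct.unpack_from(">II", data, pos)
            if rec_type == 100:  # 100 = author
                author = data[pos + 8:pos + rec_len].decode(codec, "replace")
                break
            pos += rec_len
    return title, author
```

Usage would be read_title_and_author(open("book.prc", "rb").read()). A truncated or oddly built file will raise struct.error or ValueError rather than return garbage, which seems like the right behavior for a first pass.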