Tuesday, February 27, 2007

Ebooks in XML - The IDPF/OEB standards

I received a notice today about a conference being held by O'Reilly on digital publishing. The conference has some tutorial sessions on using XML to create digital books, but I fear that these will not include the work being done to create an XML standard for ebooks. Once again, this is proof that the East Coast (where "traditional" publishing takes place) and the West Coast (where technology happens) are very far apart.

The International Digital Publishing Forum (IDPF - once known as the Open eBook Forum) has recently announced a beta version of its e-book coding standard. I've been watching, and sometimes participating in, this group for a while, and I really think they deserve our attention and support.

To begin with, the IDPF publication structure standard (still termed OEBPS - Open Ebook Publication Structure) is designed to be used by publishers in the preparation of files that will be sent to the technology companies that transform the raw files into actual ebooks. As you know, there are dozens of e-book formats (PDF, Microsoft Reader, Mobipocket, Palm Reader, etc.). The publishers need to create a single file that can be transformed into all of those formats, and the OEBPS standard is designed to meet that need. It is also designed to be an ebook format in its own right, and the upcoming Adobe ebook reader, "Adobe Digital Editions," based on Adobe's Flash technology, will be able to display books in the OEBPS format.

The standard will seem overly simple to many people. It is that way on purpose. The original OEBPS standard used HTML, based on the assumption that even the publishers, who are notoriously lacking in technology chops, would have someone on board who knows HTML. The second version of the standard, the one out for comment, uses XHTML and CSS. I think this is brilliant. It means that 1) anyone can create a book and 2) anyone can display it, even in a simple browser. The KISS principle is essential for industry acceptance of the standard.

Another key thing to mention is that the OEBPS has been greatly influenced by members of the accessibility community who participate in the IDPF. The Digital Talking Book standard, which was first developed by the DAISY consortium and is now NISO standard Z39.86, uses an earlier version of the OEBPS as its book structure. This is the format that allows synchronization between a text and a reading of the text by a human reader, making it ideal for sighted and non-sighted readers alike (read it in bed, then continue listening in the car).

There is a DTD for the publication structure, although I am currently unable to get it to validate and behave. I have a question out to the authors of the DTD and will post here when I get an answer. Meanwhile, you can comment out the offending entity definition and play with the DTD.

A companion to the ebook standard is the Open Packaging Format (OPF). The easiest way to understand this is to take a look at it. Download Thoughts.epub. Now open it in WinZip -- yes, it is a simple zipped file. In it you can find the raw XHTML of the publication; an OPF file that is the manifest for the package and contains Dublin Core metadata for the item; a file that contains the mimetype; any images or other files that are required by the document; and an XML document that defines the overall container. Note that this is a very simple publication. The examples in the documentation show how you would create a document with multiple chapters, cover art, and illustrations. The documentation also covers the areas of encryption and keys, for files that will be transmitted in protected formats. There is a nifty tutorial that steps you through the creation of an OCF file using WinZip.
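If you would rather poke at the package from a script than from WinZip, here is a rough Python sketch of the same inspection. It is only an illustration, not anything from the IDPF documentation: the function name describe_epub is made up, and it assumes that the mimetype file is an entry literally named "mimetype" and that the manifest file ends in ".opf" and lists its pieces as "item" elements with an "href" attribute.

```python
import zipfile
import xml.etree.ElementTree as ET


def describe_epub(path):
    """Summarize a zipped ebook package (hypothetical helper, not an IDPF API).

    Returns the entry names, the text of the mimetype entry (if any), and
    the href of every manifest item found in any .opf file in the archive.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

        # Assumption: the mimetype is stored in an entry named "mimetype".
        mimetype = None
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("utf-8").strip()

        manifest = []
        for name in names:
            # Assumption: the package manifest uses the .opf extension.
            if name.lower().endswith(".opf"):
                root = ET.fromstring(archive.read(name))
                for element in root.iter():
                    # Compare local names only, so the sketch does not
                    # depend on a particular namespace URI.
                    if element.tag.rsplit("}", 1)[-1] == "item":
                        manifest.append(element.get("href"))

    return {"names": names, "mimetype": mimetype, "manifest": manifest}
```

Calling describe_epub("Thoughts.epub") on the downloaded file should give you the same inventory you see in WinZip, plus the list of files that the OPF manifest points to.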

If you have comments, suggestions, questions, or whatever, go to the discussion area of the IDPF web site and say your piece. And let me know if you have any thoughts on these standards, especially as to how they might be applicable to digital libraries.

3 comments:

Anonymous said...

Karen,

Thanks for this excellent posting on OCF and OPS; we appreciate your support.

Some additional information for your readers:

1. The next version of OEBPS will be renamed OPS (Open Publication Structure) and the version will be 2.0 (the current version of OEBPS is 1.2).

2. OPS is the file format spec and includes the Open Packaging Format (OPF). OPF is the root of the publication and points to all of the other component pieces of the publication, which is called OPS. So, simply, OPS is the markup and OPF is the structure of the publication. Draft versions of these specs can be found at http://www.idpf.org/forums

3. The Open Container Format (OCF) is the zip-based package for holding your OPS files, metadata, rights info, and, if you want, alternate renditions of the publication (like a PDF version), and for sending them through distribution. These files have the .epub extension.

4. The two vocabularies you can use to create an OPS publication are XHTML and DTBook (DAISY). We expect that DTBook scripted books will be NIMAS compliant.

I encourage anyone with questions to post them at www.idpf.org/forums, where you'll get an answer pretty quickly.

Thanks,
--
Nick Bogaty
Executive Director
International Digital Publishing Forum (IDPF)
www.idpf.org

Peter Brantley said...

Hi Karen -

I also wanted to touch on your concern about O'Reilly's Tools for Change Conference and IDPF standards work. I am both an IDPF board member and one of the organizers for the ToC conference; while the focus of the conference is somewhat different from production of "traditional" ebook packages, the issues and individuals overlap significantly. I would anticipate that these worlds will see more convergence than we might have anticipated even a year or two ago.

Peter Brantley
Executive Director
Digital Library Federation
http://diglib.org

Karen Coyle said...

Thanks, Nick, for the update. And Peter, convergence would be great. Let me know if there's anything I can do.