Understanding Celtx RDF/XML structure

I wanted to try improving the open source screenplay/pre-production software Celtx. It’s based on other open source components such as Firefox and built using XUL, i.e. a lot of XML, JavaScript, HTML and CSS. It’s quite a simple and brilliant idea, making it interoperable and customizable for the handy hacker. But before I start making hacks I need to examine how Celtx works.

In a previous post I took a look at the translation of Celtx to other languages (in Swedish). Here are my findings on how Celtx v2.9.1 handles the content itself rather than the display language. I’m focusing on the screenplay perspective, which is the most interesting to me.

I’m making the tedious effort of studying output files instead of reading the source code. When you save a script in Celtx it ends up in a project file with the extension .celtx, e.g. MyMovie.celtx. It’s just a zip file that contains all the data for a project in clear text, so it’s easy to examine. A project library can contain several types of media and data. At the moment Celtx defaults to seven kinds of templates:

  1. Film (Screenplay)
  2. Audio-Visual (Directors schedule?)
  3. Theatre (Stageplay)
  4. Audio Play (BBC standard for radioplays/podcasts)
  5. Storyboard (based on index cards and images)
  6. Comic Book (comic stageplay)
  7. Novel (basic book writing).

The most important things, like the text of the script, are stored in HTML text files, while the intelligence and metadata are stored in RDF/XML files.

Unzipping a freshly created MyMovie.celtx file gives:

  • local.rdf
    This file contains the ”master” resource definitions for the whole project.
  • project.rdf
    This file contains the meta data, the order of scenes, character info etc.
  • scratch-xeb.html
    This file contains a ”scratchpad” placeholder for ideas and information, which is related to the script but put in a separate structure as to not interfere with the original script.
  • script-xeb.html
    This file contains the actual/original screenplay script content, tagged with html.

If we add images/photos to the project they will appear next to the files above, as seen further down in this post.
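Since a .celtx file is just a zip archive, a few lines of Python are enough to peek inside one. This is only a minimal sketch: the helper names list_celtx_members and read_celtx_text are my own, the UTF-8 encoding is an assumption, and the archive is a stand-in built in memory with placeholder content so the example runs without a real project file.

```python
import io
import zipfile


def list_celtx_members(source):
    """Return the sorted file names stored inside a .celtx archive.

    `source` may be a path or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(archive.namelist())


def read_celtx_text(source, member, encoding="utf-8"):
    """Read one member (e.g. project.rdf) as text. UTF-8 is an assumption."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(member).decode(encoding)


# Stand-in archive with placeholder content, mirroring the file list above.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for name in ("local.rdf", "project.rdf", "scratch-xeb.html", "script-xeb.html"):
        archive.writestr(name, "<!-- placeholder -->")

print(list_celtx_members(demo))
```

With a real project you would pass the path instead, e.g. list_celtx_members("MyMovie.celtx").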

Let’s have a look inside each file!

The local.rdf file
This file declares a Master Project ID (my wording), which all the RDF/XML files it contains connect to, as well as telling Celtx which view was last open. A project can contain a Novel, a Screenplay etc. This is all kept together logically in the local.rdf file.

It may look like this:

<?xml version="1.0"?>
<RDF:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cx="http://celtx.com/NS/v1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Seq RDF:about="rdf:#$3DD1u">
    <RDF:li RDF:resource="http://celtx.com/res/1kEqvojZWTHA"/>
    <RDF:li RDF:resource="http://celtx.com/res/vbHM8my1Pyk0"/>
  </RDF:Seq>
  <RDF:Description RDF:about="http://celtx.com/project/fly1RztQ01di">
    <cx:opentabs RDF:resource="rdf:#$3DD1u"/>
  </RDF:Description>
</RDF:RDF>

First we have the RDF/XML declarations, the usual. The interesting parts are:
The IDs of the contained content types/templates, in this case a Screenplay and its Master Catalog:

  <RDF:Seq RDF:about="rdf:#$3DD1u">
    <RDF:li RDF:resource="http://celtx.com/res/1kEqvojZWTHA"/>
    <RDF:li RDF:resource="http://celtx.com/res/vbHM8my1Pyk0"/>
  </RDF:Seq>

Where:
A Master Catalog is a view on tagged elements of a script, e.g. characters, sound effects, animals etc. and some meta data notes for each element (where applicable).
I’m guessing the Celtx Installation ID (my wording) is ”#$3DD1u”
The Screenplay ID is ”1kEqvojZWTHA”
The Master Catalog ID is ”vbHM8my1Pyk0”

The 12-character strings are generated by Celtx and appear to be unique for each project file, while the Celtx Installation ID seems to stay fixed.

Then we have the declaration of which elements are open in the project. The cx:opentabs property points at the sequence rdf:#$3DD1u, so in this case that is the Screenplay and the Master Catalog:

  <RDF:Description RDF:about="http://celtx.com/project/fly1RztQ01di">
    <cx:opentabs RDF:resource="rdf:#$3DD1u"/>
  </RDF:Description>

Where the Master Project ID, again for lack of a better term, is ”fly1RztQ01di”.
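The chain from project to open tabs can be followed programmatically. The sketch below uses Python's xml.etree.ElementTree on the local.rdf sample above (with the unused namespace declarations trimmed). The function name open_tabs is my own, and the way I read the structure (cx:opentabs pointing at an RDF:Seq) is based only on this one sample.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CX_NS = "http://celtx.com/NS/v1/"

# ElementTree spells namespaced names as "{namespace-uri}localname".
ABOUT = "{%s}about" % RDF_NS
RESOURCE = "{%s}resource" % RDF_NS

LOCAL_RDF = """<?xml version="1.0"?>
<RDF:RDF xmlns:cx="http://celtx.com/NS/v1/"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Seq RDF:about="rdf:#$3DD1u">
    <RDF:li RDF:resource="http://celtx.com/res/1kEqvojZWTHA"/>
    <RDF:li RDF:resource="http://celtx.com/res/vbHM8my1Pyk0"/>
  </RDF:Seq>
  <RDF:Description RDF:about="http://celtx.com/project/fly1RztQ01di">
    <cx:opentabs RDF:resource="rdf:#$3DD1u"/>
  </RDF:Description>
</RDF:RDF>
"""


def open_tabs(rdf_text):
    """Map each project URI to the resources listed in its cx:opentabs sequence."""
    root = ET.fromstring(rdf_text)
    # Index every RDF:Seq by its RDF:about value.
    sequences = {
        seq.get(ABOUT): [li.get(RESOURCE) for li in seq.findall("{%s}li" % RDF_NS)]
        for seq in root.findall("{%s}Seq" % RDF_NS)
    }
    tabs_by_project = {}
    for description in root.findall("{%s}Description" % RDF_NS):
        pointer = description.find("{%s}opentabs" % CX_NS)
        if pointer is not None:
            # Follow the pointer to the sequence it names.
            tabs_by_project[description.get(ABOUT)] = sequences.get(pointer.get(RESOURCE), [])
    return tabs_by_project


for project, tabs in open_tabs(LOCAL_RDF).items():
    print(project, tabs)
```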

project.rdf
This file contains the brains and structure of, for example, a Screenplay script: the characters, the order of scenes, the metadata about characters etc. It does not contain the actual script contents, as they are stored in a separate HTML file.

The file always begins with an XML declaration:
<?xml version="1.0"?>

This is followed by the root RDF element with its namespace declarations:
<RDF:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cx="http://celtx.com/NS/v1/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:NC="http://home.netscape.com/NC-rdf#"
xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

At the bottom of the file we find this closing section:
<cx:Project RDF:about="http://celtx.com/project/fly1RztQ01di"
cx:fileVersion="1.4"
dc:title="testing"
dc:modified="2011-07-23T14:31:35Z">
<cx:components RDF:resource="http://celtx.com/res/U0xHoW9oCmfD"/>
</cx:Project>

Again we see a reference to the ”Master Project ID”: ”fly1RztQ01di”
EXPLORE: What is the cx:components ”U0xHoW9oCmfD”?

Between the opening and closing sections we find the rest of the content in what looks like random order, or possibly reverse order. I guess it doesn’t really matter. Most personal computers these days have a fair amount of RAM and CPU horsepower, and a screenplay is at most a couple of hundred pages (around 90-120 for Hollywood standard specs), so the ordering shouldn’t matter from a performance perspective.

OK, moving on.

Scenes are stated like this:
<RDF:Description RDF:about="http://celtx.com/res/5r38m100"
cx:sceneid="7k38m100"
cx:location=" "
dc:title="INT. SCENE 2 - NIGHT"
cx:intext="INT"
cx:setting="SCENE 2"
cx:daynight="NIGHT"
cx:ordinal="2"
cx:sortord="0002">
<cx:members RDF:resource="rdf:#$E+uWw1"/>
<cx:markup RDF:resource="rdf:#$F+uWw1"/>
</RDF:Description>

Where the non-obvious parts are:

cx:sceneid	A key linked to the content in the script-xeb.html file.
cx:sortord	The scene number, zero-padded (presumably so it can be used as a sort key).
cx:members	(We'll figure this out later.)
cx:markup	(We'll figure this out later.)
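To get a feel for the scene entries, here is a hedged sketch that pulls them out of a project.rdf fragment and orders them by cx:sortord. The first scene is the sample from above. The second scene, with its "hypothetical" IDs, is made up purely so there is something to sort. Treating every RDF:Description that carries a cx:sceneid as a scene is my assumption, and list_scenes is my own name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CX_NS = "http://celtx.com/NS/v1/"
DC_NS = "http://purl.org/dc/elements/1.1/"

PROJECT_RDF = """<?xml version="1.0"?>
<RDF:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cx="http://celtx.com/NS/v1/"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="http://celtx.com/res/5r38m100"
                   cx:sceneid="7k38m100"
                   dc:title="INT. SCENE 2 - NIGHT"
                   cx:intext="INT" cx:setting="SCENE 2" cx:daynight="NIGHT"
                   cx:ordinal="2" cx:sortord="0002">
    <cx:members RDF:resource="rdf:#$E+uWw1"/>
    <cx:markup RDF:resource="rdf:#$F+uWw1"/>
  </RDF:Description>
  <!-- Hypothetical second scene, invented for this example only. -->
  <RDF:Description RDF:about="http://celtx.com/res/hypothetical1"
                   cx:sceneid="hypothetical2"
                   dc:title="EXT. SCENE 1 - DAY"
                   cx:intext="EXT" cx:setting="SCENE 1" cx:daynight="DAY"
                   cx:ordinal="1" cx:sortord="0001"/>
</RDF:RDF>
"""


def list_scenes(rdf_text):
    """Return scene dicts ordered by cx:sortord."""
    root = ET.fromstring(rdf_text)
    scenes = []
    for description in root.findall("{%s}Description" % RDF_NS):
        scene_id = description.get("{%s}sceneid" % CX_NS)
        if scene_id is None:
            continue  # Not a scene under the assumption stated above.
        scenes.append({
            "sceneid": scene_id,
            "title": description.get("{%s}title" % DC_NS),
            "sortord": description.get("{%s}sortord" % CX_NS),
        })
    # Zero-padded strings sort correctly as plain text.
    return sorted(scenes, key=lambda scene: scene["sortord"])


for scene in list_scenes(PROJECT_RDF):
    print(scene["sortord"], scene["title"])
```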

Cast is marked up like this, here with a character called ”Daniel”:
<cx:Cast RDF:about="http://celtx.com/res/dF4dFMsdbEdt"
dc:title="DANIEL" />

The cast characters are detailed in the Master Catalog view. After opening it in Celtx and adding info, the section can look like this:
<cx:Cast RDF:about="http://celtx.com/res/dF4dFMsdbEdt"
dc:title="DANIEL"
cx:tags="DANIELTAG"
NS1:character-full-name="Daniel The Barbarian"
dc:description="This is Daniels description: He's a mean bastard."
NS1:character-age="36"
NS1:character-hair="Brown hair"
NS1:character-eyes="Green/brown"
NS1:character-height="184"
NS1:character-princ_func="protagonist"
NS1:character-goal="Daniel wants to understand Celtx and improve it. He'd like to see a screenplay of his turn into a full feature film some day, but enjoys hacking Celtx for the fun of it."
NS1:character-ach_goal="The character's way to achieve his goal is to study the source code, understand rdf/xml better and then attempt to write and publish some extended features."
NS1:character-fam_back="Daniel is the only son in an intercontinental family."
NS1:character-habits="Dives into stuff 110% for a period of time. Then dives into something else. Zooms in and out of stuff quickly."
NS1:character-education="While a BSc in Mechanical Engineering, he's mostly self made when it comes to IT."
NS1:character-person="A nice guy."
NS1:character-likes="Watching good flicks, hunting and learning how stuff works."
NS1:character-dislikes="He dislikes the limited number of hours per day."
NS1:character-traits="My character traits are treating me well."
NS1:character-dist_feat="I'm so normal and have no distinguished features!"
NS1:character-weight="about 90 kg's, in a showroom mirror.">
<cx:media RDF:resource="http://celtx.com/res/WDOkyiQ3kwhM"/>
</cx:Cast>

In the Master Catalog we can add images/photos of the character.
The cx:media resource is thus ”WDOkyiQ3kwhM”

Further down in the project.rdf we find:

  <cx:Image RDF:about="http://celtx.com/res/gZc7yg59RFIg"
                   cx:localFile="protagonist.gif"
                   dc:title="protagonist.gif" />

Other files
The image file protagonist.gif is stored in the root of the zipped .celtx file, next to the rdf files.

How does Celtx know that the image is related to the character, or for that matter which scenes this character is in (as shown in the Master Catalog)?

Ah! It adds a separate element for this in the project.rdf file:
<RDF:Seq RDF:about="http://celtx.com/res/WDOkyiQ3kwhM">
<RDF:li RDF:resource="http://celtx.com/res/gZc7yg59RFIg"/>
</RDF:Seq>

See! WDOkyiQ3kwhM is the cx:media resource ID in the cx:Cast element for the character ”Daniel”, and the sequence with that ID contains a reference to an element gZc7yg59RFIg, which in turn states it’s an image with the file name protagonist.gif!
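The same lookup (cast, then cx:media, then RDF:Seq, then cx:Image) can be sketched in Python. The fragment below is assembled from the snippets above, with the character's other attributes left out. The function name cast_images is my own, and the traversal rule is inferred from this single example rather than from the Celtx source.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CX_NS = "http://celtx.com/NS/v1/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ABOUT = "{%s}about" % RDF_NS
RESOURCE = "{%s}resource" % RDF_NS

PROJECT_RDF = """<?xml version="1.0"?>
<RDF:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cx="http://celtx.com/NS/v1/"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <cx:Cast RDF:about="http://celtx.com/res/dF4dFMsdbEdt" dc:title="DANIEL">
    <cx:media RDF:resource="http://celtx.com/res/WDOkyiQ3kwhM"/>
  </cx:Cast>
  <RDF:Seq RDF:about="http://celtx.com/res/WDOkyiQ3kwhM">
    <RDF:li RDF:resource="http://celtx.com/res/gZc7yg59RFIg"/>
  </RDF:Seq>
  <cx:Image RDF:about="http://celtx.com/res/gZc7yg59RFIg"
            cx:localFile="protagonist.gif"
            dc:title="protagonist.gif"/>
</RDF:RDF>
"""


def cast_images(rdf_text):
    """Map each cast title to the local image files attached to it."""
    root = ET.fromstring(rdf_text)
    # Index sequences and images by their RDF:about value.
    sequences = {
        seq.get(ABOUT): [li.get(RESOURCE) for li in seq.findall("{%s}li" % RDF_NS)]
        for seq in root.findall("{%s}Seq" % RDF_NS)
    }
    images = {
        image.get(ABOUT): image.get("{%s}localFile" % CX_NS)
        for image in root.findall("{%s}Image" % CX_NS)
    }
    result = {}
    for cast in root.findall("{%s}Cast" % CX_NS):
        media = cast.find("{%s}media" % CX_NS)
        refs = sequences.get(media.get(RESOURCE), []) if media is not None else []
        # Keep only sequence members that resolve to a cx:Image.
        result[cast.get("{%s}title" % DC_NS)] = [images[r] for r in refs if r in images]
    return result


print(cast_images(PROJECT_RDF))
```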

Now we are getting somewhere. The project.rdf file is a sort of database, containing connections between different resources, be it scenes, characters or their metadata. We can start to work out a structured schema of how things are put together based on these connecting IDs.

Once we have a complete view of the schema, it’s easy to build things that need the metadata and structure, for instance exporting a custom report to Excel/LibreOffice/OpenOffice/Google Sheets. If we knew the XML structure of Final Draft, we could build an ”Export/Import Final Draft screenplay” feature. While this may be useful, my intention is to build something basic first. And I don’t have Final Draft to explore its structure. I’ll give this some more thought and try to come up with a useful ”Hello World” case eventually.
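As a taste of what such a report could look like, here is a rough sketch that turns scene entries into CSV text, which any of the spreadsheet programs above can open. The column choice and the name scenes_to_csv are my own, and the input is a trimmed copy of the scene sample from earlier in this post, not a complete project.rdf.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CX_NS = "http://celtx.com/NS/v1/"
DC_NS = "http://purl.org/dc/elements/1.1/"

SCENE_RDF = """<?xml version="1.0"?>
<RDF:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cx="http://celtx.com/NS/v1/"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="http://celtx.com/res/5r38m100"
                   cx:sceneid="7k38m100"
                   dc:title="INT. SCENE 2 - NIGHT"
                   cx:intext="INT" cx:setting="SCENE 2" cx:daynight="NIGHT"
                   cx:ordinal="2" cx:sortord="0002"/>
</RDF:RDF>
"""

# (CSV column name, namespace of the attribute it is read from)
COLUMNS = [
    ("ordinal", CX_NS),
    ("title", DC_NS),
    ("intext", CX_NS),
    ("setting", CX_NS),
    ("daynight", CX_NS),
]


def scenes_to_csv(rdf_text):
    """Return CSV text with one row per scene, ordered by cx:sortord."""
    root = ET.fromstring(rdf_text)
    scenes = [
        d for d in root.findall("{%s}Description" % RDF_NS)
        if d.get("{%s}sceneid" % CX_NS) is not None
    ]
    scenes.sort(key=lambda d: d.get("{%s}sortord" % CX_NS, ""))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([name for name, _ in COLUMNS])
    for scene in scenes:
        # Missing attributes become empty cells.
        writer.writerow([scene.get("{%s}%s" % (ns, name), "") for name, ns in COLUMNS])
    return buffer.getvalue()


print(scenes_to_csv(SCENE_RDF))
```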