16 January, 2008

Simple RTF to XML converter

RTFEditorKit (javax.swing.text.rtf.RTFEditorKit) is the class in the standard Sun Java API for working with RTF (Rich Text Format) documents.
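To show the basic call pattern on its own, here is a minimal sketch (not from the original post) that parses an RTF string held in memory and returns its plain text. The class name RtfPlainText and the sample RTF string are hypothetical. The only library calls used are RTFEditorKit.read(InputStream, Document, int) and Document.getText.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, for illustration only.
public class RtfPlainText {

    // Parses RTF markup with RTFEditorKit and returns the document's plain text.
    public static String extractText(String rtf) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        RTFEditorKit kit = new RTFEditorKit();
        // RTF markup is 7-bit ASCII, so US-ASCII bytes are enough for this sample
        kit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): one paragraph with a bold word in it
        String rtf = "{\\rtf1\\ansi Hello, {\\b RTF}!\\par}";
        System.out.println(extractText(rtf));
    }
}
```

The formatting (the bold group) is stored as attributes on the document rather than in the text itself. getText therefore returns only the characters.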

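I've written a Java sample that converts an RTF document to XML. Each run of text becomes a <p> element inside a <data> root element, with the paragraph's logical style name in its style attribute.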

Here is the source of the converter:


Rtf2XML.java


import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Style;
import javax.swing.text.rtf.RTFEditorKit;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;

public class Rtf2XML {
    private DefaultStyledDocument rtfSource;
    private org.w3c.dom.Document xmlTarget;
    private org.w3c.dom.Element xmlRoot;

    // Walks the Swing element tree depth-first. Every leaf (a run of text
    // with uniform character attributes) becomes one XML element.
    private void expandElement(javax.swing.text.Element rtfElement) {
        for (int i = 0; i < rtfElement.getElementCount(); i++) {
            javax.swing.text.Element rtfNextElement = rtfElement.getElement(i);
            if (rtfNextElement.isLeaf()) {
                try {
                    addElement(rtfNextElement);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            } else {
                expandElement(rtfNextElement);
            }
        }
    }

    // Appends <p style="...">text</p> to the XML root for one leaf element.
    private void addElement(javax.swing.text.Element rtfElement)
            throws UnsupportedEncodingException, BadLocationException {
        int start = rtfElement.getStartOffset();
        int length = rtfElement.getEndOffset() - start;

        // The logical style may be missing, so fall back to an empty name
        // instead of throwing a NullPointerException.
        Style logicalStyle = rtfSource.getLogicalStyle(start);
        String styleName = (logicalStyle != null && logicalStyle.getName() != null)
                ? logicalStyle.getName() : "";

        // Encoding workaround kept from the original: the strings are re-encoded
        // as ISO-8859-1 and decoded again with the platform default charset,
        // presumably to recover non-Latin (e.g. Cyrillic) text. The result
        // therefore depends on the default charset of the machine.
        String style = new String(styleName.getBytes("ISO-8859-1"));
        String text = new String(rtfSource.getText(start, length).getBytes("ISO-8859-1"));

        org.w3c.dom.Element node = xmlTarget.createElement("p");
        node.appendChild(xmlTarget.createTextNode(text));
        node.setAttribute("style", style);
        xmlRoot.appendChild(node);
    }

    public void convert(String sourceFileName) throws Exception {
        rtfSource = new DefaultStyledDocument();
        RTFEditorKit kit = new RTFEditorKit();
        // try-with-resources closes the input file even if parsing fails
        try (InputStream in = new FileInputStream(sourceFileName)) {
            kit.read(in, rtfSource, 0);
        }

        xmlTarget = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        xmlRoot = xmlTarget.createElement("data");
        xmlTarget.appendChild(xmlRoot);
        expandElement(rtfSource.getDefaultRootElement());

        // Writes the DOM to <sourceFileName>.xml and closes the file afterwards
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = new FileOutputStream(sourceFileName + ".xml")) {
            t.transform(new DOMSource(xmlTarget), new StreamResult(out));
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java Rtf2XML <file.rtf>");
            return;
        }
        try {
            new Rtf2XML().convert(args[0]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
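The converter turns every leaf of the Swing element tree into one <p>. In a DefaultStyledDocument the tree is normally a section that contains paragraph elements, and their leaves are content runs with uniform character attributes. As a result, one RTF paragraph with a bold word in the middle may produce several <p> elements. The helper below prints that tree for an in-memory RTF string, so you can check what expandElement will visit for a given input. RtfTreeDump and the sample string are hypothetical and are not part of the original post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical debugging helper, for illustration only.
public class RtfTreeDump {

    // Writes one line per element, indented by depth; leaves also show their text.
    public static void dump(Element e, int depth, StringBuilder out) throws BadLocationException {
        out.append("  ".repeat(depth)).append(e.getName());
        if (e.isLeaf()) {
            int start = e.getStartOffset();
            String text = e.getDocument().getText(start, e.getEndOffset() - start);
            // Make newlines visible so each leaf stays on one line
            out.append(" \"").append(text.replace("\n", "\\n")).append('"');
        }
        out.append('\n');
        for (int i = 0; i < e.getElementCount(); i++) {
            dump(e.getElement(i), depth + 1, out);
        }
    }

    // Parses RTF with RTFEditorKit and returns the dumped element tree.
    public static String dumpRtf(String rtf) throws Exception {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(
                new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)), doc, 0);
        StringBuilder out = new StringBuilder();
        dump(doc.getDefaultRootElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dumpRtf("{\\rtf1\\ansi Hello, {\\b RTF}!\\par}"));
    }
}
```

The output lists the element names (section, paragraph, content) and the text of each leaf. The exact split into runs depends on how RTFEditorKit parses the input.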



However, RTFEditorKit is not as powerful or as easy to use as I would like.
I think iText would be a better choice for working with RTF (and other document formats).

It is a good, free solution.
