16 January, 2008

Simple RTF to XML converter

RTFEditorKit (javax.swing.text.rtf.RTFEditorKit) is a class in the Sun Java API for working with RTF (Rich Text Format) documents.
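As a quick illustration of what the kit does, here is a minimal sketch that reads RTF markup into a DefaultStyledDocument and returns its plain text. The class name RtfPlainText, the method toPlainText and the sample RTF string are made up for this example; only RTFEditorKit.read and the document's getText come from the Swing API.

```java
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class RtfPlainText {

    // Hypothetical helper: parses RTF markup and returns the text without formatting.
    public static String toPlainText(String rtf) {
        try {
            DefaultStyledDocument doc = new DefaultStyledDocument();
            // RTF markup is 7-bit text, so ASCII bytes are enough for this sample.
            byte[] bytes = rtf.getBytes(StandardCharsets.US_ASCII);
            new RTFEditorKit().read(new ByteArrayInputStream(bytes), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not read RTF", e);
        }
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch: one paragraph with a bold word.
        String rtf = "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}"
                + "\\f0\\pard Hello, {\\b RTF}!\\par}";
        // trim() drops any paragraph break left at the end of the text.
        System.out.println(toPlainText(rtf).trim());
    }
}
```

The formatting is not lost at this point: it is stored in the document's element tree, which is what the converter walks.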

I've written a Java sample that converts an RTF document to XML.

Here is the source of the converter:



Rtf2XML.java

import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;

public class Rtf2XML {

    private DefaultStyledDocument rtfSource;
    private org.w3c.dom.Document xmlTarget;
    private org.w3c.dom.Element xmlRoot;

    // Walks the Swing element tree depth-first and converts every leaf element.
    private void expandElement(javax.swing.text.Element rtfElement) {
        for (int i = 0; i < rtfElement.getElementCount(); i++) {
            javax.swing.text.Element rtfNextElement = rtfElement.getElement(i);
            if (rtfNextElement.isLeaf()) {
                try {
                    addElement(rtfNextElement);
                } catch (Exception e) {
                    // A leaf that cannot be converted is reported and skipped.
                    e.printStackTrace();
                }
            } else {
                expandElement(rtfNextElement);
            }
        }
    }

    // Appends one <p style="..."> node holding the text of the given leaf.
    private void addElement(javax.swing.text.Element rtfElement)
            throws UnsupportedEncodingException, BadLocationException {
        int start = rtfElement.getStartOffset();
        int length = rtfElement.getEndOffset() - start;

        // Both strings are converted to ISO-8859-1 bytes and decoded again
        // with the platform default charset, exactly as in the original code.
        String style = new String(rtfSource.getLogicalStyle(start)
                .getName().getBytes("ISO-8859-1"));
        String text = new String(rtfSource.getText(start, length)
                .getBytes("ISO-8859-1"));

        org.w3c.dom.Element node = xmlTarget.createElement("p");
        node.appendChild(xmlTarget.createTextNode(text));
        node.setAttribute("style", style);
        xmlRoot.appendChild(node);
    }

    // Reads the RTF file and writes the result to <sourceFileName>.xml.
    public void convert(String sourceFileName) throws Exception {
        rtfSource = new DefaultStyledDocument();
        RTFEditorKit kit = new RTFEditorKit();
        try (InputStream in = new FileInputStream(sourceFileName)) {
            kit.read(in, rtfSource, 0);
        }

        xmlTarget = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        xmlRoot = xmlTarget.createElement("data");
        expandElement(rtfSource.getDefaultRootElement());
        xmlTarget.appendChild(xmlRoot);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = new FileOutputStream(sourceFileName + ".xml")) {
            t.transform(new DOMSource(xmlTarget), new StreamResult(out));
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java Rtf2XML <file.rtf>");
            return;
        }
        try {
            new Rtf2XML().convert(args[0]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
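The XML half of the converter can be tried on its own, without any RTF input. The sketch below builds the same data/p structure from hard-coded pairs of style name and text and serializes it to a string instead of a file. The class name ParagraphXmlSketch, the method toXml and the sample values are assumptions made for this example; the DOM and Transformer calls are the standard JAXP ones used in the converter.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

public class ParagraphXmlSketch {

    // Hypothetical helper: each entry of paragraphs is {styleName, text}.
    public static String toXml(String[][] paragraphs) {
        try {
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = xml.createElement("data");
            xml.appendChild(root);

            for (String[] paragraph : paragraphs) {
                Element node = xml.createElement("p");
                node.setAttribute("style", paragraph[0]);
                // Text nodes are escaped by the serializer, so "<" and "&" are safe here.
                node.appendChild(xml.createTextNode(paragraph[1]));
                root.appendChild(node);
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header to keep the sample output short.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(xml), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample values invented for this sketch.
        System.out.println(toXml(new String[][] {
                {"default", "Hello"},
                {"heading", "a < b"}
        }));
    }
}
```

Writing to a StringWriter instead of a FileOutputStream makes it easy to inspect the result before touching the file system.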


However, RTFEditorKit isn't as powerful or as friendly as I would like.
I think iText would be a better choice for working with RTF (and other document formats).

It's a good, free solution.
