How to update an XML file without reformatting it

Recently, I needed to modify the content of a few thousand XML files. The modification itself was nontrivial: it consisted of reading two attributes from an element and then updating the value of one of them. To make it even more interesting, I wanted to keep the formatting of all files intact. My goal was to create a reusable utility that could be used again in the future, should such a need arise.

Because of the need to preserve the formatting, I rejected all the tools I usually use (Dom4j, SAX, JAXB) and started searching for a suitable one. Finally, I found the answer in the form of a recommendation on http://stackoverflow.com: the recommended tool was VTD-XML (also available on GitHub).

The home page looked promising, and its list of advantages reads as if it were written by Horst Fuchs. I highly recommend reading it. But I cared more about the actual usage of the tool, so the most important part of the page for me was "Code samples".

You can download the library directly or, if you are using Maven, just add the dependency.
<dependency>
  <groupId>com.ximpleware</groupId>
  <artifactId>vtd-xml</artifactId>
  <version>2.11</version>
</dependency>
The first part of the code is quite simple: you need to create instances of several key classes and link them together.
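import com.ximpleware.*;
import java.io.FileOutputStream; // needed later for saving the result

VTDGen vg = new VTDGen(); // base instance
// load and parse the file (true = namespace aware); returns false on failure
if (!vg.parseFile("path_to_file", true)) {
   throw new IllegalStateException("Cannot parse path_to_file");
}
VTDNav vn = vg.getNav(); // navigator
XMLModifier xm = new XMLModifier(vn); // object handling modifications
AutoPilot ap = new AutoPilot(vn); // object handling search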
Once you acquire those instances, you can process the file.
ap.selectElement("element"); // name of the elements we are searching for
while (ap.iterate()) { // loop over each "element" found
   int idLoc = vn.getAttrVal("id"); // token index of the id value, -1 if absent
   int controlLoc = vn.getAttrVal("control"); // token index of the control value, -1 if absent
   String id = idLoc != -1 ? vn.toString(idLoc) : null; // read id
   String control = controlLoc != -1 ? vn.toString(controlLoc) : null; // read control

   if (id == null || control == null) {
      // not the element we are looking for
      continue;
   }
   // for demonstration purpose, extend control by id
   xm.updateToken(controlLoc, control + id);
}
// finally save the modified file; try-with-resources closes the stream
try (FileOutputStream out = new FileOutputStream("path_to_new_file")) {
   xm.output(out);
}
As you can see, there are no objects representing elements or attributes. The current element is completely hidden inside the navigator. The attributes are represented only by their locations, and if you want to read the actual value, you need to make another call to the navigator.

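The VTD-XML code above can be used on a file like the one below. Only the first element carries both attributes, so its control value becomes 1238514903, while the second element is skipped and left untouched.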
<?xml version='1.0' encoding='UTF-8'?>
<dataset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../../dataset.xsd">
   <element id="8514903" version="1" control="123"/>
   <element id="8514905" version="1" />
</dataset>
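
The idea that attributes are "represented only by their locations" can be illustrated without the library. The following is a minimal sketch in plain Java, not VTD-XML's actual implementation: the class OffsetSplice, the type Span and the methods findAttrValue, read and replace are hypothetical names made up for this illustration, and the naive string search only handles double-quoted attributes inside a single tag. It shows why an update that touches only a recorded (offset, length) span leaves every other character, including whitespace, exactly as it was.

```java
/** Illustration only: models an attribute value as an (offset, length) span in the raw text. */
final class OffsetSplice {

    /** Hypothetical stand-in for a token location; this is not a VTD-XML type. */
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Locates the double-quoted value of attrName in one tag; returns null if absent. */
    static Span findAttrValue(String tag, String attrName) {
        String marker = " " + attrName + "=\"";
        int start = tag.indexOf(marker);
        if (start < 0) {
            return null;
        }
        int valueStart = start + marker.length();
        int valueEnd = tag.indexOf('"', valueStart);
        if (valueEnd < 0) {
            return null;
        }
        return new Span(valueStart, valueEnd - valueStart);
    }

    /** Reads the text covered by the span; a separate call, as with the navigator. */
    static String read(String text, Span span) {
        return text.substring(span.offset, span.offset + span.length);
    }

    /** Replaces only the characters inside the span; everything else is copied verbatim. */
    static String replace(String text, Span span, String newValue) {
        return text.substring(0, span.offset)
                + newValue
                + text.substring(span.offset + span.length);
    }

    public static void main(String[] args) {
        String tag = "   <element id=\"8514903\" version=\"1\" control=\"123\"/>";
        Span id = findAttrValue(tag, "id");
        Span control = findAttrValue(tag, "control");
        if (id != null && control != null) {
            // same demonstration as in the article: extend control by id
            System.out.println(replace(tag, control, read(tag, control) + read(tag, id)));
        }
    }
}
```

Run on the first element of the sample file, the sketch changes only the characters between the quotes of the control attribute; the indentation and the spacing between attributes are carried over unchanged.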

Conclusion

While code using this library is not as easy to read as code using Dom4j, the library has other strengths. If you are looking for a library that offers speed and a low memory footprint, or you just need to make small updates to XML files without reformatting them, this library might be exactly what you are looking for.
