XML File Corpus

 

This page provides hyperlinks to the XML binary formats, tools, and compressors described in the paper:

 

Chris Augeri*, Barry Mullins, Dursun Bulutoglu, Rusty Baldwin, and Leemon Baird. “An analysis of XML compression efficiency”, in the Proceedings of the 1st Workshop on Experimental Computer Science (ExpCS), San Diego, CA, ACM, 13–14 June 2007.

files / links: supporting material, paper, slides, proceedings, conference, citations (1, 2, 3, 4)

 

DISCLAIMER: All links and sample usages are given as used in the paper; please contact me if you have any questions.

 

Support Tools

Tool                     Output
Tidy                     Clean up XML files
Meagher                  Calculate 0-order entropy
CoLinux                  Linux emulator, also 1, 2, 3
XML Stats                Reports various statistics, e.g., # lines, characters, etc.
The Wayback Machine      The internet archive, used to find a version of XML-ZIP
SAXON                    XML Schema extractor
JMP                      Statistical analysis
Gateway E-series 6300    The computer series used to run our tests
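
The 0-order entropy listed above treats a file as a stream of independent bytes and measures the average information per byte. The following is a minimal Python sketch of that calculation, not the Meagher tool itself; the function name `zero_order_entropy` and the sample fragment are hypothetical.

```python
import math
from collections import Counter


def zero_order_entropy(data: bytes) -> float:
    """Return the zero-order (memoryless) entropy of data, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)   # frequency of each byte value
    total = len(data)
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = b"<a><b>1</b><b>2</b></a>"  # hypothetical XML fragment
    print(f"{zero_order_entropy(sample):.3f} bits/byte")
```

To apply it to a file, read the file in binary mode and pass the bytes to the function. Multiplying the result by the file length in bytes and dividing by 8 gives the size, in bytes, that an ideal memoryless coder would approach.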

 

Compressors and/or Binary Formats

Compressor                         Sample Usage(s)
BZIP2                              BZIP2 -k -f -v foo.xml
CACM3                              arith -e -t word -m 255 -c 20 foo.xml 1 > foo.cac
FIS (Sun, ASN.1)                   java -cp FastInfoset.jar com.sun.xml.fastinfoset.tools.XML_SAX_FI foo.xml foo.fis
Gzip (2)                           GZIP -9 -c -f -v foo.xml 1>foo.gzp
PAQ                                pasqda -7 foo.paq foo.xml
PPMd (Russian creator)             wzzip -ep foo.ppd foo.xml
PPMZ2                              ppmz2 -e foo.xml foo.ppm
WBXML (libwbxml, Trantor, kXML)    xml2wbxml -k -o foo.wbx foo.tdy
WinZip (PPMd)                      wzzip -en -ee foo.wzp foo.xml
XBIS                               java -Dorg.xml.sax.driver=com.bluecast.xml.Piccolo -cp Piccolo.jar;saxxbis.jar;.; test.RunTest XBIS foo.xml
XGrind                             ./compress foo.tdy H V A N
XMill (1 2)                        xmill -f -v -w foo.xml
                                   xmill -f -v -w -p "//(*)" foo.xml
                                   xmill -m 470 -9 -f -v -w foo.xml
XMLPPM                             xmlppm foo.xml foo.xpm
XML-ZIP: no longer posted          java -classpath xml4j.jar;. XMLZip foo.xml 2
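
For readers who want a quick feel for the general-purpose compressors before installing the tools above, the following Python sketch compares compressed sizes using the standard-library `gzip` and `bz2` modules. It is only a stand-in under stated assumptions: it does not invoke the command-line tools listed here, it will not reproduce the paper's measurements, and the helper name `compressed_sizes` and the sample data are hypothetical.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the original size and the gzip / bz2 compressed sizes, in bytes."""
    return {
        "original": len(data),
        # compresslevel=9 asks for maximum compression, in the spirit of GZIP -9
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    # Hypothetical, highly repetitive XML-like input; real files will differ.
    sample = b"<item>value</item>\n" * 1000
    sizes = compressed_sizes(sample)
    for name, size in sizes.items():
        print(f"{name:>8}: {size:6d} bytes  ({size / sizes['original']:.3f} of original)")
```

To try it on a real document, read the file in binary mode (for example, `open("foo.xml", "rb").read()`) and pass the bytes to `compressed_sizes`.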

 

 

NOT TESTED (links added if available or provided, and as time permits)

Efficient XML (basis of EXI), MPEG-7 (BiM), XML-Xpress, XCQ, XPRESS, OpenGIS BXML (CWXML), Oracle, IBM DB2, MS SQL Server, Millau, AXECHOP, XCOMP, XCpaqs