1. Perl
  2. Module
  3. here

XML::Simple - Easy to parse XML

The XML::Simple module allows you to easily parse XML and convert it into Perl data structures. "SAX that analyzes sequentially from the beginning" and "DOM that builds an XML tree structure" are famous as XML analysis methods, but XML::Simple is the best in terms of easy handling.

XML is a common format for writing data in text files. It can be used when writing a program configuration file.

# Load module and create object
use XML::Simple;
my $xml = XML::Simple->new;

Use the XMLin method to read the XML file. XML is transformed into Perl data structures (multidimensional hashes and arrays).

# XML file parsing
my $data = $xml->XMLin('conf.xml');

This is an example XML of the configuration file. Imagine an application that edits multiple logs on multiple systems. Save the file in UTF-8.

<? xml version = "1.0" encoding = "UTF-8"?>
<conf>
  <!-Version 1.01->
  <!-Application home directory->  <app-home-dir> c:\apphome </app-home-dir>

  <!-Library directory->  <lib-dir> lib </lib-dir>

  <!-Directory to store logs->  <log-dir> log </log-dir>

  <!-System type->  <system> xxx yyy zzz </system>

  <!-Log type->  <log-types>
    <cpu script = "parse_cpu.pl" from-dir = "raw/cpu" to-dir = "edit/cpu" />
    <mem script = "parse_mem.pl" from-dir = "raw/mem" to-dir = "edit/mem" />
    <gclog script = "parse_gclog.pl" from-dir = "raw/gclog" to-dir = "edit/gclog" />
  </log-types>
</conf>

This file is converted to the following data structure.

$conf = {
  'log-types' => {
    'cpu' => {
      'to-dir' =>'edit/cpu',
      'script' =>'parse_cpu.pl',
      'from-dir' =>' raw/cpu'
    },
    'gclog' => {
      'to-dir' =>'edit/gclog',
      'script' =>'parse_gclog.pl',
      'from-dir' =>' raw/gclog'
    },
    'mem' => {
      'to-dir' =>'edit/mem',
      'script' =>'parse_mem.pl',
      'from-dir' =>' raw/mem'
    }
  },
  'system' =>'xxx yyy zzz',
  'log-dir' =>'log',
  'app-home-dir' =>' c: \\apphome',
  'lib-dir' =>'lib'
};

Handling Japanese in XML::Simple

When a string containing Japanese is read by XMLin, the character code specified by the encoding attribute of the XML declaration is converted to the decoded string. Note that it is not a byte string.

<? xml version = "1.0" encoding = "UTF-8"?>
<conf>
  <name> Masuda </name>
</conf>

Since it is converted to an decoded string, if you want to output it, you need to convert it to a byte string using the encode function of the Encode module.

# When outputting, convert from decoded string to byte string
use Encode 'encode';
my $conf = $xml->XMLin('japanese.xml');
print encode('UTF-8', $conf->{name});

XML::Simple options

XML::Simple allows you to specify options in the constructor arguments.

# Option
my $xml = XML::Simple->new(ForceArray => 1);

The following two options may be used frequently.

Option name Explanation
ForceArray Make sure nested elements are array reference
KeyAttr Treat the specified attribute as a hash key

XML::LibXML - Standard and fast XML parsing

XML::Simple is not suitable for parsing complex XML. If you want to parse XML in a standard and fast way, you can choose the XML::LibXML module. XML::LibXML can read XML as a DOM tree. You can also access each element of XML using a method called XPath.

This method requires more coding than XML::Simple and is technically difficult, but it is the best method for some requirements.

What to do if the XML::LibXML module cannot be installed on CentOS

If the XML::LibXML module installation fails on CentOS, the libxml2-devel package is missing and you should install it.

yum -y install libxml2-devel

Related Informatrion