2024 twenty-four merry days of Perl

A Time-Tested Powerhouse for Processing XML

XML::Twig - 2024-12-09

XML was the preferred communication language for services in the early 2000s. At that time, governments were building their own e-government systems while companies were developing their SOAP services. Then, BOOM! Services began to adopt JSON because it was a lightweight and efficient alternative to XML. However, even though JSON became the new standard, the old services stayed in use and kept being maintained. Rewriting a system from scratch is not easy, and it might not even be necessary. That might be why XML is still in use, or perhaps governments and companies simply adopt the "as far as it goes" mindset. Who knows! Let's parse some XML and beat those services up!

Here is where the sweetest and most beloved Perl library, XML::Twig, comes into play.

Parsing XML File

Suppose we provide a service that manages internet access for a hotel’s guests. The hotel's property management system stores its guests in an XML file. We need to read that file to check whether a user is a guest of the hotel. To get internet access, a user must enter their ID and Wi-Fi password correctly.

The following is the XML format:

<?xml version="1.0" encoding="UTF-8"?>
<guestList>
  <guest room="201" wifiKey="ryan1234" name="Michael" surname="Scott" idType="P" id="CB5634431" gender="M" country="USA" checkIn="2024-11-29 14:03:23" checkOut=""></guest>
  <guest room="202" wifiKey="X0K7F2ie!" name="Dwight" surname="Schrute" idType="P" id="AB3056430" gender="M" country="USA" checkIn="2024-11-29 14:03:23" checkOut=""></guest>
  <guest room="305" wifiKey="SNbsnz" name="Selim Serhat" surname="Celik" idType="TC" id="00011122233" gender="M" country="TR" checkIn="2024-11-29 14:03:23" checkOut=""></guest>
  <!-- Other guest entries can follow here -->
</guestList>

This file is located at /hotelpms/guests.xml. To parse it, we pass the file path to the parsefile method. The library also has a method named parse. According to the documentation, use parsefile when you are working with a file, and parse when you have a string that contains the whole XML document. Both methods die if the document cannot be parsed; the safe_parsefile and safe_parse variants wrap the call in an eval and return false instead.

use strict;
use warnings;
use diagnostics;
use XML::Twig;

sub check_guest {
  my $id = shift;
  my $password = shift;

  my $twig = XML::Twig->new();
  # parsefile dies on error; safe_parsefile returns false and sets $@ instead
  $twig->safe_parsefile("/hotelpms/guests.xml")
    or return 0; # handle error

}
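For comparison, here is a minimal sketch of parse working on a string held in memory rather than on a file. The inline document is a shortened copy of the guest list above, used only for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# A shortened guest list held in a string (illustrative data only)
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<guestList>
  <guest room="201" name="Michael" surname="Scott"></guest>
</guestList>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);    # parse takes the document itself, not a path

my $root_name = $twig->root->tag;    # name of the root element
print "$root_name\n";
```

Apart from the method name and its argument, everything after the parse step works the same way as with parsefile.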

To access the guest tags, we first need to obtain guestList, which is the root of this XML document. The $twig->root method returns that root element, the ancestor of every other element in the document. The children method, in turn, returns the list of an element's direct children. It takes an optional argument: if a string is passed, only the children that match the string are fetched; otherwise, all children of the current element are fetched in document order. The returned list contains instances of the XML::Twig::Elt class, so you can perform any operation on them that XML::Twig::Elt allows.
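The difference between calling children with and without an argument can be sketched as follows. The staff element and its values are made up for this illustration and are not part of the hotel's format.

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document; the <staff> element is a hypothetical addition
my $xml = <<'XML';
<guestList>
  <guest room="201" name="Michael"></guest>
  <guest room="202" name="Dwight"></guest>
  <staff name="Pam"></staff>
</guestList>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);

my $root   = $twig->root;                # the guestList element
my @all    = $root->children;            # every direct child, in document order
my @guests = $root->children('guest');   # only the children named guest

printf "%d children, %d guests\n", scalar @all, scalar @guests;
```

This relies on XML::Twig's default behaviour of discarding the whitespace-only text between elements, so the lists contain elements only.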

As you can see, the information we are looking for is stored in the attributes of each element. To get a person's name, we must read the name attribute. XML::Twig::Elt has various methods for working with attributes, and the att method will do the job. Note that because our XML is generated automatically by another system, we don't check whether an element actually has the attribute. If you need to check, call att_exists: it returns true if the attribute exists for the element and false otherwise.

sub check_guest {
# previous lines above

  my @guests = $twig->root->children('guest');
# $guests[0]->att('name') will return Michael
}
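If the input cannot be trusted to carry every attribute, a guard with att_exists might look like the sketch below. The second guest, which has no name attribute, is invented to exercise the check, and the "(no name)" placeholder is an arbitrary choice.

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative data: the second guest deliberately lacks a name attribute
my $xml = <<'XML';
<guestList>
  <guest room="201" name="Michael"></guest>
  <guest room="202"></guest>
</guestList>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);

my @names;
foreach my $guest ($twig->root->children('guest')) {
  # att returns undef for a missing attribute, so test for it first
  if ($guest->att_exists('name')) {
    push @names, $guest->att('name');
  } else {
    push @names, '(no name)';
  }
}

print join(', ', @names), "\n";
```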

If we put all the information we’ve discussed together, the function would look like this.

sub check_guest {
  my $id = shift;
  my $password = shift;

  my $twig = XML::Twig->new();
  # safe_parsefile returns false on failure, whereas parsefile would die
  $twig->safe_parsefile("/hotelpms/guests.xml") or return 0; # handle error
  my @guests = $twig->root->children('guest');

  foreach my $guest (@guests) {
    return 1 if $guest->att('id') eq $id && $guest->att('wifiKey') eq $password;
  }

  return 0;
}

Parsing XML Response

Suppose there is a server somewhere in this universe that answers questions about the Earth's past based on what it observes, and anyone may send it a request. We don't know exactly when an answer will come back, because that depends on the data in our request, but we do know the exact XML response format. Asking about a much earlier date takes much longer, because the information has to travel farther at the speed of light. A researcher wants to find out why the stones at Gobeklitepe were buried, knowing that they will no longer be alive when the answer arrives. Some world organization has designed a system that waits for the answers to all requests and maintains the conditions needed for the messages to return. The language selected was Perl.

Here is the expected response format:

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <observeResult xmlns="http://universe.com/">
      <query>string</query>
      <observation>
        <year>int</year>
        <nation>string</nation>
        <purpose>string</purpose>
        <details>string</details>
      </observation>
    </observeResult>
  </soap:Body>
</soap:Envelope>

The first_child method of the XML::Twig::Elt class retrieves the first child of the current element; when given a name, it returns the first child with that name. The class also has a text method that extracts the content inside a tag.
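As a small offline sketch of these two methods, the snippet below parses only the observation part of the format above, keeping its placeholder values, and looks up children with and without a name.

```perl
use strict;
use warnings;
use XML::Twig;

# The observation part of the response format, with its placeholder values
my $xml = '<observation><year>int</year><nation>string</nation></observation>';

my $twig = XML::Twig->new();
$twig->parse($xml);
my $observation = $twig->root;

my $first   = $observation->first_child;             # no argument: first child of any name
my $nation  = $observation->first_child('nation');   # first child named nation
my $missing = $observation->first_child('details');  # no such child here: undef

print $first->tag, "\n";     # element name of the first child
print $nation->text, "\n";   # text inside <nation>
print defined $missing ? "found\n" : "not found\n";
```

Because first_child returns undef when nothing matches, chaining several calls is only safe when the structure of the document is guaranteed.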

use strict;
use warnings;
use diagnostics;
use XML::Twig;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(POST => "http://universe.com?query=gobeklitepe");
$req->content_type('text/xml; charset=utf-8');
$req->header(SOAPAction => 'http://universe.com/observations');

my $res = $ua->request($req);

# Stop early if the request failed; there would be nothing to parse
die "Request failed: " . $res->status_line unless $res->is_success;
my $xml_response = $res->content;

my $twig = XML::Twig->new();
# parse dies on malformed XML; safe_parse returns false and sets $@ instead
$twig->safe_parse($xml_response) || die "Unable to parse XML document: $@";

my $observation = $twig->root->first_child('soap:Body')->first_child('observeResult')->first_child('observation');

# Reaching observation data one by one
my $year = $observation->first_child('year')->text;
my $nation = $observation->first_child('nation')->text;
my $purpose = $observation->first_child('purpose')->text;
my $details = $observation->first_child('details')->text;
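If the response might be missing some elements, a more defensive variant could look like the sketch below. It runs offline against a trimmed copy of the response format (details is left out on purpose), and descend is a hypothetical helper written for this example, not part of XML::Twig. The first_child_text method is a real XML::Twig::Elt method that returns an empty string when the child is absent.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: follows a list of child names from an element and
# returns undef as soon as one step is missing
sub descend {
  my ($elt, @path) = @_;
  for my $name (@path) {
    $elt = $elt->first_child($name) or return;
  }
  return $elt;
}

# Trimmed copy of the response format; <details> is omitted deliberately
my $xml = <<'XML';
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <observeResult xmlns="http://universe.com/">
      <query>string</query>
      <observation>
        <year>int</year>
        <nation>string</nation>
      </observation>
    </observeResult>
  </soap:Body>
</soap:Envelope>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);

my $observation = descend($twig->root, 'soap:Body', 'observeResult', 'observation')
  or die "Response has no observation element";

# first_child_text returns '' when the child is absent, so no undef check is needed
my $year    = $observation->first_child_text('year');
my $details = $observation->first_child_text('details');

print "year=$year details=[$details]\n";
```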

Happy holidays!

This article contributed by: Emine Sule Celik <esulecelik@gmail.com>