xml parsing - In Perl, extract text from related nodes, using XML::Twig


I have the following XML file, which I want to parse:

    <?xml version="1.0" encoding="utf-8"?>
    <topic id="yerus5" xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/">
      <title/>
      <shortdesc/>
      <body>
        <p><b>ccu_cnt_addr: (address=0x004 reset=32'h1)</b><table id="table_r5b_1xj_ts">
            <tgroup cols="4">
              <colspec colnum="1" colname="col1"/>
              <colspec colnum="2" colname="col2"/>
              <colspec colnum="3" colname="col3"/>
              <colspec colnum="4" colname="col4"/>
              <tbody>
                <row>
                  <entry>field</entry>
                  <entry>offset</entry>
                  <entry>r/w access</entry>
                  <entry>description</entry>
                </row>
                <row>
                  <entry>reg2sm_cnt</entry>
                  <entry>15:0</entry>
                  <entry>r/w</entry>
                  <entry>count value increment in extenral memory @ specified location.
                    default value of 1. count value of 0 clear counter value</entry>
                </row>
                <row>
                  <entry>ccu2bus_endianess</entry>
                  <entry>24</entry>
                  <entry>r/w</entry>
                  <entry>endianess of data structure bit</entry>
                </row></tbody>
            </tgroup>
          </table><b>ccu_stat_addr: (address=0x008 reset=32'h0)</b><table id="table_mcc_1xj_ts">
            <tgroup cols="4">
              <colspec colnum="1" colname="col1"/>
              <colspec colnum="2" colname="col2"/>
              <colspec colnum="3" colname="col3"/>
              <colspec colnum="4" colname="col4"/>
              <tbody>
                <row>
                  <entry>field</entry>
                  <entry>offset</entry>
                  <entry>r/w access</entry>
                  <entry>description</entry>
                </row>
                <row>
                  <entry>fifo_cnt</entry>
                  <entry>1:0</entry>
                  <entry>r</entry>
                  <entry>status. 0x0 indicates engine free. 0x1 on write
                    address</entry>
                </row>
                <row>
                  <entry>rfifo_cnt</entry>
                  <entry>3:2</entry>
                  <entry>r</entry>
                  <entry>status. 0x0 indicates there no pending read values ccu engine.</entry>
                </row> </tbody>
            </tgroup>
          </table></p>
      </body>
    </topic>

After running the following code (available at "In Perl, XML::Simple is not able to dereference a multi-dimensional associative array parsed with Data::Dumper"):

    use strict;
    use warnings;
    use XML::Twig;
    use Data::Dumper;

    my @headers;
    my $column_to_show = 'field';

    # Called once per <row>; the first row seen supplies the column names.
    sub process_row {
        my %entries;
        my ( $twig, $row ) = @_;
        my @row_entries = map { $_->text } $row->children;
        if (@headers) {
            @entries{@headers} = @row_entries;
            print $column_to_show, " => ", $entries{$column_to_show}, "\n";
        }
        else {
            @headers = @row_entries;
        }
    }

    my $twig = XML::Twig->new(
        'pretty_print' => 'indented_a',
        twig_handlers  => { 'row' => \&process_row }
    )->parsefile('your_file.xml');

I am able to access the data of each <entry></entry>.

What I am not able to do is extract the details that belong to each <b></b> text. I can extract the <b></b> text itself, but I cannot extract the <row></row> elements for each <b></b> separately. The following is a sample of the output I want:

    name: ccu_cnt_addr: (address=0x004 reset=32'h1)
    field: reg2sm_cnt
    offset: 15:0
    access: r/w
    description: count value increment in extenral memory @ specified location. default value of 1. count value of 0 clear counter value

    field: ccu2bus_endianess
    offset: 24
    access: r/w
    description: endianess of data structure bit
    .
    .
    .
    name: ccu_stat_addr: (address=0x008 reset=32'h0)
    field: fifo_cnt
    .
    .
    .

I tried the following, but it is not working:

    # extract the text of each <b></b>
    foreach my $b ( $twig->get_xpath("//b") ) {
        print $b->text, "\n";
        foreach my $row ( $twig->get_xpath("//row") ) {
            print $row->text, "\n";
        }
    }

OK, given your example, it's irritating, because the XML doesn't explicitly associate the 'heading' with the 'table' (e.g. by encapsulating them within an XML node).

However, you can use the prev_sibling method to get the previous element at the same level.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new->parsefile('your_file.xml');

    foreach my $table ( $twig->get_xpath('//table') ) {
        # The element directly before the table is taken to be its heading.
        my $header = $table->prev_sibling->text;
        print "name: $header\n";
        my @headers;
        foreach my $row ( $table->get_xpath("tgroup/tbody/row") ) {
            my %entries;
            my @row_entries = map { $_->text =~ s/\n\s+//rg; } $row->children;
            if (@headers) {
                @entries{@headers} = @row_entries;
                foreach my $field (@headers) {
                    print "$field: $entries{$field}\n";
                }
            }
            else {
                # The first row of each table holds the column names.
                @headers = @row_entries;
            }
        }
        print "----\n";
    }

Note: this assumes that the 'element before the table' is your header. That works in your specific case, but it will only work in general if there is always an element directly preceding each <table> that you want to display.
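One way to loosen that assumption is to pass a condition to prev_sibling, so that the nearest preceding <b> sibling is used rather than whatever element happens to sit directly before the table. The following is only a sketch under stated assumptions: table_headings is a hypothetical helper name, the fallback text is made up for illustration, and the inline sample is a cut-down stand-in for the real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of XML::Twig): returns one
# [ table id, heading text ] pair for every <table> in the document.
sub table_headings {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);

    my @pairs;
    foreach my $table ( $twig->get_xpath('//table') ) {
        # With a condition, prev_sibling walks backwards over the siblings
        # until it finds a <b>, and returns undef if there is none.
        my $b_elt   = $table->prev_sibling('b');
        my $heading = defined $b_elt ? $b_elt->text : '(no heading found)';
        push @pairs, [ $table->att('id'), $heading ];
    }
    return @pairs;
}

# Cut-down sample: the second table has an <i> between its heading and itself.
my $sample = <<'XML';
<topic><body><p><b>ccu_cnt_addr</b><table id="t1"/><b>ccu_stat_addr</b><i>note</i><table id="t2"/></p></body></topic>
XML

print "$_->[0]: $_->[1]\n" for table_headings($sample);
```

In this sample the second table still resolves to ccu_stat_addr even though an <i> element separates them. The trade-off is that a table with no heading of its own would silently pick up the heading of an earlier table, so this is only safer when every table is known to have a <b> before it.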

  • We run a 'foreach' loop, picking out the elements called table (of which there are 2 in your sample).
  • For each table, we assume the previous sibling element is the header. In your case, that's the <b> elements. I'd be wary of that though, as <b> denotes bold in HTML and is a formatting tag.
  • Otherwise we do the same thing as before: for each table, decompose the rows so that you have a header and a bunch of columns, and print them one per line.
  • As part of doing this, we use a regex to remove 'linefeed and whitespace' (s/\n\s+//gr) because the formatting on your description looked a bit 'off'. You can remove that if it's undesired. (Note: the /r flag only works on newer Perl versions, 5.14 and later.)

This produces:

    name: ccu_cnt_addr: (address=0x004 reset=32'h1)
    field: reg2sm_cnt
    offset: 15:0
    r/w access: r/w
    description: count value increment in extenral memory @ specified location.default value of 1. count value of 0 clear counter value
    field: ccu2bus_endianess
    offset: 24
    r/w access: r/w
    description: endianess of data structure bit
    ----
    name: ccu_stat_addr: (address=0x008 reset=32'h0)
    field: fifo_cnt
    offset: 1:0
    r/w access: r
    description: status. 0x0 indicates engine free. 0x1 on write toaddress
    field: rfifo_cnt
    offset: 3:2
    r/w access: r
    description: status. 0x0 indicates there no pending read values ccu engine.
    ----
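The joined words in this output (such as "location.default") come from the substitution deleting each line break together with its indentation. If a space is preferred there, the replacement can be a single space instead of nothing. The sketch below assumes that is the desired result; collapse_wrapped is a hypothetical helper name, and it substitutes on a copy rather than using /r, so it does not depend on Perl 5.14.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn each line break, plus any whitespace around it,
# into a single space so wrapped sentences keep their word boundaries.
sub collapse_wrapped {
    my ($text) = @_;
    # Substituting on a copy leaves the caller's string untouched
    # without needing the /r flag.
    ( my $copy = $text ) =~ s/\s*\n\s*/ /g;
    return $copy;
}

my $wrapped = "specified location.\n            default value of 1.";

# Same effect as s/\n\s+//gr in the answer, written without /r:
# the line break and indentation are removed outright.
( my $joined = $wrapped ) =~ s/\n\s+//g;
print "$joined\n";                       # specified location.default value of 1.

# The variant leaves one space where the line break was.
print collapse_wrapped($wrapped), "\n";  # specified location. default value of 1.
```

In the answer's script this would mean replacing the body of the map with a call such as collapse_wrapped( $_->text ), assuming the helper is defined in the same file.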
