Description
We want to extract every IMG element from an HTML file, where
- There may be more than one IMG element on a line.
- An IMG element may span more than one line.
- We assume that no literal > appears in attribute values. To handle a literal > enclosed in double quotes, as in <img src=url alt="a > test...">, please see p.20090921a.html.
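To see why the first two conditions matter, here is a minimal sketch of a naive per-line approach. It uses grep -o rather than sed and is not the method developed on this page; the two-line sample and the variable names are made up for illustration.

```shell
# Hypothetical sample: two IMG elements on the first line,
# and a third one split across the first and second lines.
sample='<IMG src=a>, <IMG src=b> and <IMG src=
c width=10>'

# grep -o prints every match on its own line, so several IMGs per line
# are handled, but a match can never cross a line boundary.
found=$(printf '%s\n' "$sample" | grep -o '<IMG[^>]*>')
printf '%s\n' "$found"
# prints:
# <IMG src=a>
# <IMG src=b>
```

The third element is missed because its closing > sits on the next line, which is why the script has to collect lines until a > shows up.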
Raw Input
This is the first image <IMG src=url_1 width=200 height=400>, followed
by the second <IMG src=url_2> and third <IMG src=
url_3 width=220 height=220>. There may be several IMGs in one line, like
<IMG src=url_4>, <IMG src=url_5> and <IMG src=
url_6 width=250 height=300
alt="test">.
Desired Output
<IMG src=url_1 width=200 height=400>
<IMG src=url_2>
<IMG src= url_3 width=220 height=220>
<IMG src=url_4>
<IMG src=url_5>
<IMG src= url_6 width=250 height=300 alt="test">
Script and Comments
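Script1
# Requires GNU sed with extended regular expressions (sed -E, or the older -r).
# Delete lines that contain no IMG start tag.
/<IMG( |$)/!d
# Mark the first IMG start tag with a newline, then drop everything before it.
s/<IMG( |$)/\n&/
s/^[^\n]*\n//
:loop
# Unless this is the last line, keep appending input lines until a > appears.
$!{
/>/!N
/>/!b loop
}
# Join the collected lines with single spaces.
s/\n/ /g
# Split after the first >, print the IMG element, and rerun the script on the rest.
s/>/&\n/
P
D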
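The script can be tried against the raw input above as follows. This is a sketch that assumes GNU sed; the file names extract_img.sed and sample.html are hypothetical and chosen only for this example.

```shell
# Save the script (hypothetical file name). The quoted 'EOF' stops the
# shell from expanding $ and \ inside the here-document.
cat > extract_img.sed <<'EOF'
/<IMG( |$)/!d
s/<IMG( |$)/\n&/
s/^[^\n]*\n//
:loop
$!{
/>/!N
/>/!b loop
}
s/\n/ /g
s/>/&\n/
P
D
EOF

# Save the raw input (hypothetical file name).
cat > sample.html <<'EOF'
This is the first image <IMG src=url_1 width=200 height=400>, followed
by the second <IMG src=url_2> and third <IMG src=
url_3 width=220 height=220>. There may be several IMGs in one line, like
<IMG src=url_4>, <IMG src=url_5> and <IMG src=
url_6 width=250 height=300
alt="test">.
EOF

# -E selects extended regular expressions, which the unescaped ( | ) group needs.
out=$(sed -E -f extract_img.sed sample.html)
printf '%s\n' "$out"
```

With GNU sed this should print the six lines listed under Desired Output. Note that the pattern matches uppercase <IMG only, so lowercase <img tags would need the pattern adjusted.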