Description
Two lines are said to be `almost identical' if they are the same after removing all blanks. For example, `T h i s   i s   a   test' is `almost identical' to `This is a test'.
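The definition can be checked directly from the shell. This is only a minimal sketch: `strip_blanks` is a hypothetical helper name, and "blank" is assumed to mean the space character, as in the script's own s/ //g.

```shell
# Hypothetical helper: remove every space from the given line.
strip_blanks() {
    printf '%s\n' "$1" | sed 's/ //g'
}

a='T h i s   i s   a   test'
b='This is a test'

# Two lines are `almost identical' if they match once blanks are removed.
if [ "$(strip_blanks "$a")" = "$(strip_blanks "$b")" ]; then
    echo 'almost identical'
else
    echo 'different'
fi
```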
Raw Input:
first record
first r e c o r d
f i r s t  r e c o r d
s e c o n d record
third r e c o r d
t h i r d  record
final record

Desired Output:
first record
s e c o n d record
third r e c o r d
final record
Script and Comments
Script1
[ 1] G
[ 2] h
[ 3] s/ //g
[ 4] /^(.*)\n\1$/{
[ 5] x
[ 6] s/^.*\n//
[ 7] }
[ 8] /\n/{
[ 9] g
[10] 1!s/^.*\n//p
[11] g
[12] s/\n.*//
[13] }
[14] $q
[15] h
[16] d
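To try the script, the step numbers have to be removed first. The following is a sketch of one way to run it, assuming GNU sed is installed as `sed`; the wrapper name `almost_uniq` is hypothetical and is not part of the original script.

```shell
# almost_uniq is a hypothetical wrapper name; GNU sed is assumed (for -r).
# The sed commands are Steps [1] thru [16], one per line, without numbers.
almost_uniq() {
    sed -r '
G
h
s/ //g
/^(.*)\n\1$/{
x
s/^.*\n//
}
/\n/{
g
1!s/^.*\n//p
g
s/\n.*//
}
$q
h
d
' "$@"
}

# Feed the raw input from the Description through the script.
printf '%s\n' \
    'first record' \
    'first r e c o r d' \
    'f i r s t  r e c o r d' \
    's e c o n d record' \
    'third r e c o r d' \
    't h i r d  record' \
    'final record' | almost_uniq
```

Run on the raw input, this should print the four lines listed under the desired output. Note that the `-n' option is not given: the script relies on sed printing PS when Step [14] quits.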
Comments
  1. The `-r' option of GNU sed must be used to make sed interpret REs as EREs.
  2. The Pattern Space and the Hold Space are abbreviated to PS and HS, respectively.
  3. We refer to a group of consecutive `almost identical' lines as an `identical' group.
  4. Each time the first line of an identical group is found, we keep it in HS for later comparison.
  5. After a line is read, it is compared with the line kept in HS:
    • If they are `almost identical', skip the current line and start a new cycle.
    • Otherwise, the current line is the first line of another identical group. Print the kept line and then keep the current one.
  6. Step [1] appends the kept line from HS to PS. Now PS consists of the current line followed by the kept line, separated by a newline character.
  7. Step [2] overwrites the contents of HS with PS.
  8. PS now contains the current and the kept line. To determine whether they are `almost identical', Step [3] is used to remove every blank.
  9. If both lines are `almost identical', only the kept line has to be kept. Remember that Step [2] saved both lines to HS. Since we cannot modify HS directly, we use
    • Step [5] to exchange the contents of PS and HS, then
    • Step [6] to delete the current line. After this step, PS contains no newline character, so Steps [8] thru [13] are skipped,
    • Step [15] to overwrite HS with PS, and
    • Step [16] to start a new cycle and read the next line.
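  10. Otherwise, the two lines are NOT `almost identical', and `print the kept line and keep the current line' is done via
    • Step [9] to copy the current and the kept line back from HS,
    • Step [10] to delete the current line and print what remains, i.e. the kept line. (The `p' flag of the s/// command prints the contents of PS if the substitution succeeded. The `1!' address skips this step on the first input line, when there is no kept line yet.)
    • Step [11] to copy the saved lines from HS again,
    • Step [12] to delete the kept line, and
    • Step [15] to overwrite HS with PS. After this, HS contains the current line.
  11. On the last input line, Step [14] quits. Since sed prints PS before exiting (the `-n' option is not used), the line still held in PS, which is the first line of the final identical group, is output.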
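The comparison set up by Steps [1] and [3] can be watched with sed's `l' command, which prints PS unambiguously (a newline appears as \n and the end of PS as $). This is only a sketch, assuming GNU sed: `show_ps` is a hypothetical helper name, and the `1{h;d}` block is a simplified stand-in for the bookkeeping of the full script, so it is only meaningful for two-line input.

```shell
# show_ps is a hypothetical helper; GNU sed is assumed (for -r).
show_ps() {
    sed -n -r '
# Simplified stand-in: keep line 1 in HS and read the next line.
1{
h
d
}
# Step [1]: append the kept line to the current line.
G
# Step [3]: remove every blank.
s/ //g
# Show PS as the test in Step [4] sees it.
l
' "$@"
}

printf '%s\n' 'first record' 'first r e c o r d'  | show_ps
printf '%s\n' 'first record' 's e c o n d record' | show_ps
```

The first call should print `firstrecord\nfirstrecord$`, which the RE in Step [4] matches because both halves are equal. The second should print `secondrecord\nfirstrecord$`, which it does not match.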