Description
  • Each line of the datafile consists of two fields separated by a colon.
  • A block consists of consecutive lines whose second fields are the same.
  • For each block, print the first line of the block and discard the others.
Raw Input
0003~4330:Abraham
0003~ 293:Abraham
0003~2464:Abraham
0003~2284:Abt
0003~2286:Abt
0003~3446:Abt-Bloch
0057~2288:van Rhyn
0057~2289:van Rhyn
0310~4417:Von Newman
Desired Output
0003~4330:Abraham
0003~2284:Abt
0003~3446:Abt-Bloch
0057~2288:van Rhyn
0310~4417:Von Newman
Script and Comments
Script1
[ 1] :loop
[ 2] N
[ 3] s/^([^:]*:([^\n]*))\n[^:]*:\2$/\1/
[ 4] t loop
[ 5] P
[ 6] D
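A minimal sketch of how the script might be run, assuming GNU sed is installed as `sed'. The file names `dedup.sed' and `data.txt' and the temporary directory are hypothetical and chosen only for illustration; the step numbers are not part of the script itself.

```shell
# Work in a throwaway directory (hypothetical layout, for illustration only).
workdir=$(mktemp -d)

# The script, without the bracketed step numbers.
cat > "$workdir/dedup.sed" <<'EOF'
:loop
N
s/^([^:]*:([^\n]*))\n[^:]*:\2$/\1/
t loop
P
D
EOF

# The raw input from the example.
cat > "$workdir/data.txt" <<'EOF'
0003~4330:Abraham
0003~ 293:Abraham
0003~2464:Abraham
0003~2284:Abt
0003~2286:Abt
0003~3446:Abt-Bloch
0057~2288:van Rhyn
0057~2289:van Rhyn
0310~4417:Von Newman
EOF

# -r: treat REs as EREs; -f: read the sed commands from a file.
result=$(sed -r -f "$workdir/dedup.sed" "$workdir/data.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With GNU sed this should print the five lines listed under the desired output.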
Comments
  1. The `-r' option (or its synonym `-E') must be used to make GNU sed interpret REs as EREs (extended regular expressions).
  2. The Pattern Space is abbreviated to `PS'.
  3. The approach is to keep the first line of a block in PS until the first line of the next block is reached.
  4. Assume PS contains the first line of some block. Then
    • Step [2] appends the next input line to PS.
    • If that line belongs to the same block,
      the substitution of Step [3] succeeds and drops the appended line,
      and command `t' of Step [4] makes sed branch back to Step [1];
    • otherwise, that line is the first line of another block.
      Step [5] prints the first line of the current block, and
      command `D' of Step [6] removes it from PS,
      then restarts the script at Step [1] without reading a new line,
      so PS now holds the first line of the new block.
  5. When Step [2] finds no further input, GNU sed prints PS and exits, so the first line of the last block is printed as well.
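
The walk-through above can be observed directly by adding the sed command `l' after Step [2], which prints PS in an unambiguous form (an embedded newline appears as `\n' and the end of PS as `$'). This is only a sketch: the three-line input is made up for illustration, the variable name `trace' is hypothetical, and GNU sed is assumed.

```shell
# Illustrative input: two lines in block "a", then one line in block "b".
# The extra `l' after `N' dumps PS each time a line has been appended.
trace=$(printf '%s\n' '1:a' '2:a' '3:b' | sed -r ':loop
N
l
s/^([^:]*:([^\n]*))\n[^:]*:\2$/\1/
t loop
P
D')
printf '%s\n' "$trace"

# Output with GNU sed:
#   1:a\n2:a$   <- PS after the first N; same block, so the s command drops "2:a"
#   1:a\n3:b$   <- PS after the second N; new block, so the s command fails
#   1:a         <- printed by P; D then leaves "3:b" in PS
#   3:b         <- N hits end of input, so GNU sed prints PS and exits
```

The lines ending in `$' come from `l'; the other two are the script's real output.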