Description
  • A line may contain one or more lists, each enclosed in a pair of parentheses.
  • Each list consists of one or more members separated by commas.
  • For simplicity, we assume there are no blanks inside a list.
  • We want to remove members NOT beginning with `XX_' from every list.
Raw Input
item1 (XX_item2,item3,XX_item4) item5 (item6,XX_item7) item8
Desired Output
item1 (XX_item2,XX_item4) item5 (,XX_item7) item8
Script and Comments
Script1
[ 1] s/^[^()]*\(/&\n/
[ 2] :loop
[ 3] s/(\n[^()]*)\)/\1\n/
[ 4] h
[ 5] s/^.*\n(.*)\n.*/\1/
[ 6] s/(^|,)(XX_)/\1\n\2/g
[ 7] s/(^|,)[^\n][^ ,]*//g
[ 8] s/\n//g
[ 9] G
[10] s/^(.*)\n(.*)\n.*\n(.*)/\2\1)\n\3/
[11] s/\n([^()]*\()/\1\n/
[12] /\(\n/b loop
[13] s/\n//g
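To try Script1, its commands can be passed to GNU sed as a single script. The following is a minimal sketch, assuming GNU sed is the `sed' on the PATH. The step numbers survive only as sed comments, and the shell function name `remove_non_xx' is hypothetical, not part of the original.

```shell
# Script1 wrapped in a shell function (hypothetical name remove_non_xx).
# Assumes GNU sed: -r selects EREs, and \n in a replacement or inside
# a bracket expression such as [^\n] is a GNU extension.
remove_non_xx() {
  sed -r '
# [ 1] mark the start of the first list
s/^[^()]*\(/&\n/
# [ 2]
:loop
# [ 3] replace the closing parenthesis with a second mark
s/(\n[^()]*)\)/\1\n/
# [ 4] save the marked line in the HS
h
# [ 5] keep only the list between the two marks
s/^.*\n(.*)\n.*/\1/
# [ 6]-[ 8] drop members not beginning with XX_
s/(^|,)(XX_)/\1\n\2/g
s/(^|,)[^\n][^ ,]*//g
s/\n//g
# [ 9]-[10] splice the filtered list back into the saved line
G
s/^(.*)\n(.*)\n.*\n(.*)/\2\1)\n\3/
# [11] move the mark to the next list, if any
s/\n([^()]*\()/\1\n/
# [12] loop while a marked list remains
/\(\n/b loop
# [13] remove the leftover mark
s/\n//g
' "$@"
}

# Usage on the raw input; prints the desired output line.
printf '%s\n' 'item1 (XX_item2,item3,XX_item4) item5 (item6,XX_item7) item8' | remove_non_xx
```

Wrapping the script in a function keeps the quoting in one place; any file names given to `remove_non_xx' are passed on to sed, and with none it reads standard input.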
Comments
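  1. The `-r' option of GNU sed must be used to make sed interpret REs as EREs. The script also relies on GNU extensions such as `\n' in a replacement and inside a bracket expression.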
  2. `Pattern Space' and `Hold Space' are abbreviated to `PS' and `HS', respectively.
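  3. It takes N iterations if a line contains N lists. A line containing no list is not passed through unchanged (Steps [7] and [9] still modify it), so such lines should be skipped, e.g. by adding `/\(/!b' before Step [1].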
  4. Step [1] inserts a newline right after the first opening parenthesis of the line. This newline is a mark designating the beginning of the list to be processed.
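  5. Step [3] replaces the closing parenthesis of the list to be processed with a newline, so the list now lies between two newlines. The opening parenthesis stays in place, and Step [10] puts the closing one back.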
  6. Step [4] copies the line to the HS for later use.
  7. After Step [5], the PS contains ONLY the list to be processed:
    • Step [6] inserts a newline before every member beginning with `XX_'.
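    • Step [7] removes every member NOT beginning with `XX_' (i.e. every member not marked by Step [6]) together with the comma before it. The first member has no preceding comma, so removing it leaves a leading comma, as in `(,XX_item7)'.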
    • Step [8] removes the newlines inserted by Step [6].
  8. Step [9] appends the line saved by Step [4] to the PS, and Step [10] replaces the original list with the processed one and restores its closing parenthesis.
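  9. Step [11] moves the mark, i.e. the newline that Step [10] leaves after the processed list, to just after the opening parenthesis of the next list, if there is one.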
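  10. If an unprocessed list remains (i.e. the PS contains an opening parenthesis followed by the mark), Step [12] jumps back to Step [2] to start another iteration. Otherwise, Step [13] removes the remaining mark.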
  11. The PS after the first few steps on the sample input is shown below:
    After Step   Pattern Space
    1            item1 (\nXX_item2,item3,XX_item4) item5 (item6,XX_item7) item8
    3            item1 (\nXX_item2,item3,XX_item4\n item5 (item6,XX_item7) item8
    5            XX_item2,item3,XX_item4
    6            \nXX_item2,item3,\nXX_item4
    7            \nXX_item2,\nXX_item4
    8            XX_item2,XX_item4
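The PS values in comment 11 can be checked, and extended to Steps [10] and [11], by replaying one pass of the loop with `-n' and sed's `l' command after each step of interest. `l' prints the PS unambiguously, showing each embedded newline as `\n' and marking the end with `$'. This is a sketch under the same GNU sed assumption as Script1; `trace_first_iteration' is a hypothetical name, and Steps [2], [12] and [13] are left out so that only the first list is processed.

```shell
# Replays one pass of Script1 (no loop) and prints the PS with `l'
# after selected steps. -n suppresses the normal end-of-cycle output.
# Assumes GNU sed, as Script1 does.
trace_first_iteration() {
  sed -n -r '
s/^[^()]*\(/&\n/
# after Step [1]
l
s/(\n[^()]*)\)/\1\n/
# after Step [3]
l
h
s/^.*\n(.*)\n.*/\1/
# after Step [5]
l
s/(^|,)(XX_)/\1\n\2/g
# after Step [6]
l
s/(^|,)[^\n][^ ,]*//g
# after Step [7]
l
s/\n//g
# after Step [8]
l
G
s/^(.*)\n(.*)\n.*\n(.*)/\2\1)\n\3/
# after Step [10]
l
s/\n([^()]*\()/\1\n/
# after Step [11]
l
' "$@"
}

printf '%s\n' 'item1 (XX_item2,item3,XX_item4) item5 (item6,XX_item7) item8' | trace_first_iteration
```

On the sample line, the first six lines printed give the PS after Steps [1], [3], [5], [6], [7] and [8]; the last two show the filtered list spliced back by Step [10] and the mark moved into the second list by Step [11].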