Description
  • Each graph consists of nodes and edges where an edge is used to connect two nodes.
  • For example, the graph in the right is a tree with 10 nodes.
  • The arrow of an edge is used to specify the direction of that edge. For example, is an edge starting from node A to node B. That information is stored as A,B, which is called a node pair. Note that A,B is not the same as B,A.
  • In this way, we can store the graph in the right as
    A,B A,C A,D C,D B,E D,F D,G D,H F,I G,I G,J, where two adjacent node pairs are separated by one or more spaces.
  • Since the order of node pairs is irrelevant, it can be stored as C,D A,D A,B A,C B,E D,F D,G G,I G,J D,H F,I,...
  • The name of a node consists of one or more non-space, and non-comma characters.
We want to get the nodes without outgoing edges, that is, nodes without edges starting from them.
Raw Input Desired Output
A,B A,C A,D C,D B,E D,F D,G D,H F,I G,I G,J
E H I J
Script and Comments
Script1
[ 1] s/^([^ ,]+ +)*([^ ,]+),/\2\n&/
[ 2] /\n/!b
[ 3] :loop
[ 4] s/^([^\n]+)\n(.* |)\1(,?( +|$)|,)/\1\n\2/
[ 5] s/^([^\n]+)\n(.*,)\1( +|$)/\1\n\2\3/
[ 6] t loop
[ 7] D
Comments -r
  1. This script use the following approach:
    • Find a node that has an edge starting from it, then remove it from the data.
    • Repeat the above step till there are no nodes having edges starting from them.
    • A,B not only states that its an edge from A to B, but also implies that A has at least one edge starting from it. Therefore, A must be removed.
    • In this way, if any instance of a node is followed by a comma, we have to remove from the data every instance of it.
    • Step [1] tries to find such a node. If found:
      • Step [1] inserts its name to the beginning of PS, and separates it from the data with a newline character.
      • Then the loop consisting of Steps [3] thru [6] removes from the data every instance of that node.
      • For example, if we want to remove node A from the data, we have to take into account the context where A resides:
        Case before removal after removal what was removed taken care by
        1 A,B B A and the comma Step [4]
        2 A,(+|$) empty A, the comma, and the following spaces Step [4]
        3 A(+|$) empty A, and the following spaces Step [4]
        4 C,A(+|$) C,+ only A, the comma and the spaces are kept Step [5]
      • In the first three cases, we can use A(,?(+|$)|,) to match the parts to be removed.
      Since there may exist several instances of the same node in the output, you may replace Step [2] with the following block to remove the redundant instances:
      /\n/!{ s/^ *[^ ]+/&\n/ :0 s/([^ ]+)\n(.* )\1( *|$)/\1\n\2/ t 0 s/\n( +[^ ]+)/\1\n/ /\n$/!b 0 s/ +/ /g s/^ | $|\n// q }
      You may visit Remove redundant instances from a list for this topic.