Description
In a postfix log file, each line consists of several fields, where
  • the first 3 fields constitute a timestamp,
  • field 4 is the hostname of the mail server,
  • field 5 is the name of the program/subprogram,
  • field 6 is the SMTP id of a transaction, and
  • fields 7 and onward carry information about the transaction.
For example,
Jan 22 18:10:00 mx postfix/smtpd[16290]: 4D6FB267DA: client=wsgw01.wasabi.net
Jan 22 18:10:02 mx postfix/cleanup[15567]: 4D6FB267DA: message-id=<C387C1298A4AB44F8486C0C239F6EF723B3DDA@HKEXCHVS1.res.ghi.net>
Jan 22 18:10:03 mx postfix/qmgr[4135]: 4D6FB267DA: from=<bong@brand.com>, size=28277, nrcpt=3 (queue active)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<amber@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<eric@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<rosa@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/qmgr[4135]: 4D6FB267DA: removed

describes a mail transaction from <bong@brand.com> to three recipients. From the log messages above, we want a summary consisting of the SMTP id, followed by the email addresses of the sender and all the recipients:
4D6FB267DA: from=<bong@brand.com> to=<amber@regex.biz> to=<eric@regex.biz> to=<rosa@regex.biz>

A real log file is not so simple, since the log messages of several transactions may interleave, as in INPUT; the desired output is then OUTPUT.
Script and Comments
Script1
[ 1] /^([^ ]+ +){6}(from|to)=/{
[ 2] s/^([^ ]+ +){5}([^ :]+: +[^,]+).*/\2/
[ 3] H
[ 4] }
[ 5] /^([^ ]+ +){6}removed/{
[ 6] s/^([^ ]+ +){5}([^ :]+:).*/\2/
[ 7] G
[ 8] :loop
[ 9] s/^([^:]+:)([^\n]*)\n((\n[^\n]*)*)\n\1([^\n]*)/\1\5\2\n\3/
[10] t loop
[11] P
[12] s/^[^\n]*\n//
[13] h
[14] }
[15] d
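To try Script1, the step labels [ 1] to [15] have to be stripped first, since they only serve as references for the comments. The following is a minimal sketch, assuming GNU sed (for option -r and for \n inside bracket expressions) and a POSIX shell; the file names script1.sed and mail.log are made up for illustration. It feeds the seven example log lines from the Description to the script.

```shell
#!/bin/sh
# Assumption: GNU sed. File names below are hypothetical.
workdir=$(mktemp -d)

# Script1 without the [n] step labels.
cat > "$workdir/script1.sed" <<'EOF'
/^([^ ]+ +){6}(from|to)=/{
s/^([^ ]+ +){5}([^ :]+: +[^,]+).*/\2/
H
}
/^([^ ]+ +){6}removed/{
s/^([^ ]+ +){5}([^ :]+:).*/\2/
G
:loop
s/^([^:]+:)([^\n]*)\n((\n[^\n]*)*)\n\1([^\n]*)/\1\5\2\n\3/
t loop
P
s/^[^\n]*\n//
h
}
d
EOF

# The example transaction from the Description, one log message per line.
cat > "$workdir/mail.log" <<'EOF'
Jan 22 18:10:00 mx postfix/smtpd[16290]: 4D6FB267DA: client=wsgw01.wasabi.net
Jan 22 18:10:02 mx postfix/cleanup[15567]: 4D6FB267DA: message-id=<C387C1298A4AB44F8486C0C239F6EF723B3DDA@HKEXCHVS1.res.ghi.net>
Jan 22 18:10:03 mx postfix/qmgr[4135]: 4D6FB267DA: from=<bong@brand.com>, size=28277, nrcpt=3 (queue active)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<amber@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<eric@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/smtp[16111]: 4D6FB267DA: to=<rosa@regex.biz>, relay=127.0.0.1[127.0.0.1]:10024, delay=5.8, delays=4.1/0/0/1.7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=15634-14, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4F58927F92)
Jan 22 18:10:05 mx postfix/qmgr[4135]: 4D6FB267DA: removed
EOF

# -r enables the extended regular expressions the script is written in.
summary=$(sed -r -f "$workdir/script1.sed" "$workdir/mail.log")
printf '%s\n' "$summary"

rm -rf "$workdir"
```

With GNU sed this should print the single summary line shown in the Description. No -n is needed, because Step [15] deletes every line and only Step [11] prints anything.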
Comments (the script needs sed option -r)
  1. The following approach is used:
    • If a line contains from= or to=, save the SMTP id along with that field to the temporary storage.
    • If a line contains removed, combine all fields of the same SMTP id together, print and then delete them from the temporary storage.
    • The hold space (HS) is used as the temporary storage. To edit the contents of HS, we have to
      • exchange the contents of the pattern space (PS) and HS via command `x',
      • do the modifications, and then
      • exchange the contents of PS and HS again.
  2. If a line contains from= or to=, we have to save the SMTP id and the to/from field to HS:
    • Step [2] removes everything in PS except the SMTP id and the field.
    • Step [3] appends the result of Step [2] to HS via command `H'.
    • Command `H' appends to HS a newline character followed by the contents of PS, and this script saves every SMTP id and to/from field to HS via `H'. Therefore, the data in the temporary storage are of the form \n ID1: data \n ID2: data \n ID3: data..., which is matched by ^(\n[^\n]*)*$
  3. If a line contains removed, first we have to take everything with the same SMTP id out of the temporary storage:
    • Step [6] removes everything in PS except the SMTP id.
    • Step [7] appends information kept in HS via command `G',
    • Steps [8] thru [10] constitute a loop which moves fields with that SMTP id to the first line of PS.
    Then,
    • Step [11] prints the SMTP id and all fields with it,
    • Step [12] deletes them, and
    • Step [13] overwrites HS with the remaining data.
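The steps above can be followed on an interleaved log. The sketch below is an illustration only: the two transactions AAA and BBB, their addresses and the shortened log messages are invented, the file names are hypothetical, and GNU sed is assumed.

```shell
#!/bin/sh
# Assumption: GNU sed. All log data below is made up for illustration.
workdir=$(mktemp -d)

# Script1 without the [n] step labels.
cat > "$workdir/script1.sed" <<'EOF'
/^([^ ]+ +){6}(from|to)=/{
s/^([^ ]+ +){5}([^ :]+: +[^,]+).*/\2/
H
}
/^([^ ]+ +){6}removed/{
s/^([^ ]+ +){5}([^ :]+:).*/\2/
G
:loop
s/^([^:]+:)([^\n]*)\n((\n[^\n]*)*)\n\1([^\n]*)/\1\5\2\n\3/
t loop
P
s/^[^\n]*\n//
h
}
d
EOF

# Two hypothetical transactions whose messages interleave.
cat > "$workdir/interleaved.log" <<'EOF'
Jan 22 18:10:03 mx postfix/qmgr[4135]: AAA: from=<a@example.com>, size=1
Jan 22 18:10:04 mx postfix/qmgr[4135]: BBB: from=<b@example.com>, size=2
Jan 22 18:10:05 mx postfix/smtp[16111]: AAA: to=<c@example.com>, status=sent
Jan 22 18:10:06 mx postfix/smtp[16111]: BBB: to=<d@example.com>, status=sent
Jan 22 18:10:07 mx postfix/qmgr[4135]: AAA: removed
Jan 22 18:10:08 mx postfix/qmgr[4135]: BBB: removed
EOF

summary=$(sed -r -f "$workdir/script1.sed" "$workdir/interleaved.log")
printf '%s\n' "$summary"

rm -rf "$workdir"
```

With this input the script should print one line for AAA and then one for BBB, each with its own from= and to= fields. When the first removed line arrives, the loop pulls only the AAA entries out of the copy of HS; the BBB entries are written back by Step [13] and wait for the second removed line.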