Raw Input
We are given a log file in which each line consists of several fields separated by spaces. The second field contains a timestamp in either HHMM or MM format (H: hour, M: minute). For example:
httpd: 1125 access / from 192.168.0.10
httpd: 30 access /dir1 from 192.168.0.25
httpd: 32 access /dir2 denied from 192.168.0.10
httpd: 41 access /dir1/file2 from 192.168.0.4
httpd: 1144 access / from 192.168.0.20
httpd: 52 access /dir2 denied from 192.168.0.14
httpd: 1205 access /dir1/file2 from 192.168.0.7
httpd: 12 access /dir2/file1 denied from 192.168.0.13
Desired Output
Some timestamps lack the hour part. We want to add the missing hour back to each of them, taking it from the most recent line that carries a full HHMM timestamp:
httpd: 1125 access / from 192.168.0.10
httpd: 1130 access /dir1 from 192.168.0.25
httpd: 1132 access /dir2 denied from 192.168.0.10
httpd: 1141 access /dir1/file2 from 192.168.0.4
httpd: 1144 access / from 192.168.0.20
httpd: 1152 access /dir2 denied from 192.168.0.14
httpd: 1205 access /dir1/file2 from 192.168.0.7
httpd: 1212 access /dir2/file1 denied from 192.168.0.13
Script and Comments
Script1 [sed]
[ 1] /^[^ ]* [0-9]\{4\}/!{
[ 2] G
[ 3] s/^\([^ ]* \)\(.*\)\n\(.*\)/\1\3\2/
[ 4] b
[ 5] }
[ 6] h
[ 7] s/^[^ ]* \(..\).*/\1/
[ 8] x
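To try Script1, the bracketed step labels have to be removed, since they are not part of the sed syntax. The following is a minimal sketch of one way to run it from a POSIX shell on the sample log above. The variable names log, script1 and out1 are only illustrative, and the comments inside the script are added here to describe each command.

```shell
# Sample input: the log from the "Raw Input" section.
log='httpd: 1125 access / from 192.168.0.10
httpd: 30 access /dir1 from 192.168.0.25
httpd: 32 access /dir2 denied from 192.168.0.10
httpd: 41 access /dir1/file2 from 192.168.0.4
httpd: 1144 access / from 192.168.0.20
httpd: 52 access /dir2 denied from 192.168.0.14
httpd: 1205 access /dir1/file2 from 192.168.0.7
httpd: 12 access /dir2/file1 denied from 192.168.0.13'

# Script1 without the step labels. Comments sit on their own lines
# so that they cannot be mistaken for part of a command.
script1='
# Steps 1-5: the second field is NOT a 4-digit timestamp.
/^[^ ]* [0-9]\{4\}/!{
# Append a newline and the saved hour (Hold Space) to the line.
G
# Move the hour from after the newline to just before the minutes.
s/^\([^ ]* \)\(.*\)\n\(.*\)/\1\3\2/
# Branch to the end of the script; the repaired line is printed.
b
}
# Steps 6-8: the second field IS a 4-digit timestamp.
# Copy the whole line into the Hold Space.
h
# Reduce the Pattern Space to the two hour digits.
s/^[^ ]* \(..\).*/\1/
# Exchange: hour goes to the Hold Space, the line comes back.
x
'

out1=$(printf '%s\n' "$log" | sed "$script1")
printf '%s\n' "$out1"
```

The printed result should match the listing under Desired Output. The script can equally be saved to a file and run with sed -f.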
Comments
  1. Each time a line with a 4-digit timestamp is met, Steps [6] thru [8] save its hour part in the Hold Space:
    • Step [6] copies the current line into the Hold Space.
    • Step [7] keeps the hour part in the Pattern Space and discards the rest.
    • Step [8] exchanges the two spaces, so the hour part ends up in the Hold Space and the current line returns to the Pattern Space. Since the end of the script is reached, sed prints the contents of the Pattern Space, which now holds the current line.
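  2. For any other line, Steps [1] thru [5] append the saved hour to the line with G, move it in front of the minutes, and branch to the end of the script, where the repaired line is printed.
  3. The following script can do the same job.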
Script2 [sed]
[ 1] /^[^ ]* [0-9]\{4\}/!G
[ 2] s/^\([^ ]* \)\(.*\)\n\(.*\)/\1\3\2/
[ 3] t
[ 4] h
[ 5] s/^[^ ]* \(..\).*/\1/
[ 6] x
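Script2 replaces the braces and the b command with t, which branches to the end of the script only if a substitution has succeeded since the current line was read. A line with a 4-digit timestamp gets no newline from G, so the substitution in Step [2] fails and control falls through to h, s and x. The sketch below runs Script2 from a POSIX shell, again with the step labels removed. The variable names log, script2, out2 and early are only illustrative.

```shell
# Sample input: the log from the "Raw Input" section.
log='httpd: 1125 access / from 192.168.0.10
httpd: 30 access /dir1 from 192.168.0.25
httpd: 32 access /dir2 denied from 192.168.0.10
httpd: 41 access /dir1/file2 from 192.168.0.4
httpd: 1144 access / from 192.168.0.20
httpd: 52 access /dir2 denied from 192.168.0.14
httpd: 1205 access /dir1/file2 from 192.168.0.7
httpd: 12 access /dir2/file1 denied from 192.168.0.13'

# Script2 without the step labels; the comments are added here.
script2='
# Lines without a 4-digit timestamp get a newline plus the saved hour.
/^[^ ]* [0-9]\{4\}/!G
# Succeeds only when a newline is present, i.e. only after G has run.
s/^\([^ ]* \)\(.*\)\n\(.*\)/\1\3\2/
# If the substitution succeeded, branch to the end and print the line.
t
# Otherwise this is a 4-digit line: remember its hour for later lines.
h
s/^[^ ]* \(..\).*/\1/
x
'

out2=$(printf '%s\n' "$log" | sed "$script2")
printf '%s\n' "$out2"

# Edge case: a short timestamp seen before any 4-digit one.
# The Hold Space is still empty, so the line passes through unchanged.
early=$(printf '%s\n' 'httpd: 30 access /dir1 from 192.168.0.25' | sed "$script2")
printf '%s\n' "$early"
```

The first output should match the listing under Desired Output. The second shows that a short timestamp with no preceding 4-digit line is left as it is, because there is no hour to copy yet.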