Most efficient way to cut an output between regex matches?

- This content was updated on: 2015-12-20
Question:

I'm trying to parse lspci -k output one device at a time. In other words, with this sample output:

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device d000
    Kernel driver in use: i915
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
    Kernel driver in use: snd_hda_intel
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
    Subsystem: Gigabyte Technology Co., Ltd Device 5001
    Kernel driver in use: mei_me

I want to be able to traverse each device and its associated information individually. The regular expression I use elsewhere to detect the ??:??.? format is:

grep -E '^[0-9]\w:[0-9]\w\.[0-9]' <<< "$s" | awk -F ' ' '{print $1}'

where $s holds however many devices are listed in this format. I used this because I had non-PCI devices listed in a different format.
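The grep | awk pipeline above can be collapsed into a single awk call. A minimal sketch, assuming the lspci -k text is in $s (built here from the sample output with printf so the detail lines are real tabs) and that lspci is run without -D, so header lines carry no domain prefix; pci_slots is a hypothetical helper name. PCI bus, device and function numbers are hexadecimal, so the sketch matches [0-9a-f] rather than [0-9]\w (\w also matches any letter or underscore, and is a GNU extension).

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: s=$(lspci -k)).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

# pci_slots (hypothetical helper): print the bus:device.function ID of each
# device header. Bus and device are two hex digits, the function is 0-7.
pci_slots() {
  awk '/^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]/ { print $1 }'
}

pci_slots <<< "$s"
```

Tab-indented detail lines never match the ^ anchor, so only the four header IDs are printed.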

In this case, I was thinking I could get the line number of each match by piping the above command into grep -n, and then use sed to cut from one region to the next, but I feel this wouldn't be an efficient approach. Any suggestions?
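For comparison, here is a sketch of the grep -n plus sed idea, assuming bash 4+ (for mapfile) and the sample text in $s; device_section is a hypothetical helper. It works, but each call runs sed over the whole input again, so the text is re-read once per device, which is the inefficiency the question suspects.

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: s=$(lspci -k)).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

# Line number of every device header: grep -n prints "N:text", cut keeps N.
mapfile -t starts < <(grep -nE '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' <<< "$s" | cut -d: -f1)

# device_section (hypothetical helper): print section $1 (0-based) by cutting from
# its header line to the line before the next header, or to the last line ($).
device_section() {
  local i=$1 end='$'
  if (( i + 1 < ${#starts[@]} )); then
    end=$(( starts[i + 1] - 1 ))
  fi
  sed -n "${starts[i]},${end}p" <<< "$s"
}

for i in "${!starts[@]}"; do
  echo "--- section $((i + 1))"
  device_section "$i"
done
```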

Another solution I'm considering is reading the input line by line and converting the whitespace into some symbol, e.g. tr ' ' '%', so that if a line starts with that symbol, it is included in the current device's section. This could get tricky, however, because I would need a variable that lives outside the loop. Of course, I could also add a \n after each match of the regex and then just set IFS=$'\n'.

Given that the detail lines are tab-indented, tr $'\t' 'x' works well. However, I feel the most efficient way is still to somehow cut out an entire section and then grep it for the information I need, rather than going line by line with ad hoc variables.
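A pure-bash variant of the line-by-line idea, sketched here rather than taken from the question: instead of translating tabs with tr, test for a leading tab directly and collect each device into one array element, which can then be searched as a whole section. Feeding the loop with a here-string (<<<) instead of a pipe keeps it in the current shell, so the sections array is still set after the loop. Assumes bash and that the input starts with a header line; the sample text in $s is for illustration.

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: s=$(lspci -k)).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

tab=$'\t'
sections=()   # one element per device: header line plus its detail lines
while IFS= read -r line; do
  if [[ $line == "$tab"* ]]; then
    # Detail line: append it, minus the leading tab, to the current section.
    last=$(( ${#sections[@]} - 1 ))
    sections[last]+=$'\n'"${line#"$tab"}"
  else
    # Unindented line: a new device section starts here.
    sections+=("$line")
  fi
done <<< "$s"

# Each section is now a single string that can be grepped or sed-ed as a whole.
for sec in "${sections[@]}"; do
  slot=${sec%% *}                                  # text before the first space
  subsys=$(sed -n 's/^Subsystem: //p' <<< "$sec")  # empty if there is no such line
  printf '%s %s\n' "$slot" "$subsys"
done
```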

Original poster: I figured there'd be a simple way to approach this, but I'm not entirely sure what exactly to do with awk. I'd have to set IFS to a regex (if possible) or, conversely, set it to the absence of a \t.
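On the awk side, there is no IFS; the corresponding variables are FS (field separator) and RS (record separator). One portable way to get a whole device section as a single record, sketched here as an alternative and not as the method used in the solution below, is to print an empty line before every unindented line and then read the result in paragraph mode (RS = ""), where blank lines separate records; with FS = "\n" each line of a section becomes one field. by_device is a hypothetical helper, and $s holds the sample text.

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: s=$(lspci -k)).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

# by_device (hypothetical helper): one awk record per device.
by_device() {
  # Step 1: blank line before every header except the first.
  awk '!/^\t/ && NR > 1 { print "" } { print }' |
    # Step 2: paragraph mode; $1 is the header, $2..$NF are the detail lines.
    awk 'BEGIN { RS = ""; FS = "\n" }
         { split($1, head, " "); print head[1], NF - 1 }'
}

by_device <<< "$s"   # prints each slot ID with its number of detail lines
```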

Solution:

The following code splits each entry from lspci -k into sections:

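$ /sbin/lspci -k | awk -F'\t' '
    NF == 1 { ++n; f = 0 }      # unindented line: start a new section
    { a[n, ++f] = $NF }         # store the text after the tab as field f of section n
    END {
        for (i = 1; i <= n; ++i) {
            print "section", i
            for (f = 1; (i, f) in a; ++f) print a[i, f]   # stop at the first missing field
            print ""
        }
    }'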

By setting the input field separator to a tab character, we can identify which lines start a new section by how many fields they have: the first line of each section is not tab-indented, so it has only one field.
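To see the field counts this relies on, print NF in front of each line's last field. A small sketch on the sample text in $s; nf_per_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: s=$(lspci -k)).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

# nf_per_line (hypothetical helper): with a tab as FS, a header line has 1 field,
# while a tab-indented line has 2 (an empty field before the tab, then the text).
nf_per_line() {
  awk -F'\t' '{ print NF ": " $NF }'
}

nf_per_line <<< "$s"
```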

The code in the END block demonstrates the fact that each field can be reached in the array a using the two indices section number and field number. It just loops through each one but you could customise the logic to print a given field if it matched a pattern, for example.
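As one example of such customised logic, the sketch below keeps the a[section, field] layout from the solution but, in the END block, prints each device's slot ID together with its kernel driver, or none when the section has no driver line. The helper name drivers and the none placeholder are assumptions for illustration; $s holds the sample text.

```shell
#!/usr/bin/env bash
# Sample input: the lspci -k output from the question (in real use: /sbin/lspci -k | drivers).
s=$(printf '%b\n' \
  '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device d000' \
  '\tKernel driver in use: i915' \
  '00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5000' \
  '\tKernel driver in use: snd_hda_intel' \
  '00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)' \
  '\tSubsystem: Gigabyte Technology Co., Ltd Device 5001' \
  '\tKernel driver in use: mei_me')

# drivers (hypothetical helper): same array layout as the solution above.
drivers() {
  awk -F'\t' '
    NF == 1 { ++n; f = 0 }
    { a[n, ++f] = $NF }
    END {
      for (i = 1; i <= n; ++i) {
        split(a[i, 1], head, " ")        # head[1] is the slot ID, e.g. 00:02.0
        driver = "none"                  # placeholder when no driver line exists
        for (f = 2; (i, f) in a; ++f)
          if (a[i, f] ~ /^Kernel driver in use: /) {
            driver = a[i, f]
            sub(/^Kernel driver in use: /, "", driver)
          }
        print head[1], driver
      }
    }'
}

drivers <<< "$s"
```

Because each section is addressed by its index, matching a different line (for example the Subsystem line) only means changing the regex in the inner loop.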