Why does a regex find only one match when there are actually more?

- Updated: 2015-12-20
Question:

If the test text is "aaa" and the regex pattern is "(a)+", why do I get only one match: "aaa"?

Why don't I get:

  • "a", "aa", "aaa" (all starting from position 0)
  • "a" (starting from position 1)
  • "aa" (starting from position 1)
  • "a" (starting from position 2)?

Should I keep in mind some imaginary cursor that, having found one match, moves to the end of it, so that the next search continues from that new position?

It has nothing to do with the so-called lazy quantifiers, if I'm not mistaken.

If I use "(a)+?" instead of the first pattern, I get 3 matches, which is actually what you expect. But it has nothing to do with what I described before.

Is it possible to get all the straightforward occurrences?

Comment: Which language or regex tool are you using? Or is this simply a regex theory question?

Comment: I was playing with regexr.com and was wondering whether it's possible at all to make it count the way I expected. Although it's a bit counterintuitive, I guess.

Comment: Yes, it is possible to get it to do what you want. However, the answer is language dependent. If you wish to see an answer for a specific language, please update your question to name that language.

Answer:

This is called a "greedy" match. It's how your regex works.

(a)+ says, in effect: if you find an a (+ means "one or more"), keep appending any adjacent a's until there is no adjacent a left, and only then present the match. Hence the "greediness".

To get what you want/expect will require a more complex regex, possibly inside a loop inside a perl/awk/python/etc. script.

That is, you do the initial match and then, using a loop, output all the things you want from the matched text.
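As a rough illustration of that "match first, then loop" idea, here is a minimal Python sketch. The helper name `slices_of_matches` is made up for this example, and it relies on an assumption that holds for "(a)+" but not for patterns in general: every non-empty slice of a matched run is itself a valid match.

```python
import re

def slices_of_matches(pattern, text):
    """Run the ordinary search, then loop over each match's text.

    Assumption: every non-empty slice of a match is itself a match.
    This is true for "(a)+", but not for arbitrary patterns.
    """
    results = []
    for m in re.finditer(pattern, text):
        base = m.start()        # where this match begins in the full text
        matched = m.group(0)    # the whole matched run, e.g. "aaa"
        for i in range(len(matched)):
            for j in range(i + 1, len(matched) + 1):
                results.append((base + i, matched[i:j]))
    return results

print(slices_of_matches(r"(a)+", "aaa"))
# [(0, 'a'), (0, 'aa'), (0, 'aaa'), (1, 'a'), (1, 'aa'), (2, 'a')]
```

Each tuple is (start position, matched text), which lines up with the list in the question.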

Answer:

Finding matches in a string usually works this way in most (if not all) languages. It starts matching at the beginning of the string. It tries to match against the regular expression and moves the "cursor" forward as matching characters are found. Once a match is found, the search resumes from the character after that match; the "cursor" never moves back.
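The "cursor" behaviour can be seen with Python's `re.finditer`, which reports non-overlapping matches. The text "aaaa" and the pattern "aa" are chosen here only for illustration; they are not from the question.

```python
import re

text = "aaaa"

# Each span is (start, end) of one match. After the match at 0..2 the
# search resumes at position 2, so the overlapping candidate at 1..3
# is never reported.
spans = [m.span() for m in re.finditer(r"aa", text)]
print(spans)  # [(0, 2), (2, 4)]
```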

Also note that quantifiers (+ is one of them) can match in two ways. A greedy quantifier matches as many characters as possible ("aaa" in your example) and a lazy one as few as possible ("a" in your example). But a match of "aa" is not possible: it's neither greedy nor lazy.
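A small Python check of the greedy and lazy cases from the question, using `m.group(0)` to get the whole match rather than the captured group:

```python
import re

text = "aaa"

# Greedy: "+" takes as many a's as it can, so there is a single match.
greedy = [m.group(0) for m in re.finditer(r"(a)+", text)]

# Lazy: "+?" stops after one a, and the search resumes after each match.
lazy = [m.group(0) for m in re.finditer(r"(a)+?", text)]

print(greedy)  # ['aaa']
print(lazy)    # ['a', 'a', 'a']
```

Neither list contains "aa", which is the point made above.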

Getting all the matches you'd like to see may be possible using programming-language constructs other than regular expressions. You would also need to limit the match's boundaries to achieve your neither-greedy-nor-lazy match.
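One way to limit the boundaries, sketched in Python: try every (start, end) pair and ask whether the pattern matches exactly that slice. The helper name `all_matches` is hypothetical; `Pattern.fullmatch` with its `pos` and `endpos` arguments is part of the standard `re` module. This is only a sketch: it makes one regex call per slice, so it is slow on long texts, and it ignores empty matches.

```python
import re

def all_matches(pattern, text):
    """Return (start, substring) for every non-empty slice that the
    pattern matches in full, including overlapping ones."""
    regex = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch(text, pos, endpos) succeeds only if the pattern
            # matches the whole region text[start:end].
            if regex.fullmatch(text, start, end):
                found.append((start, text[start:end]))
    return found

print(all_matches(r"(a)+", "aaa"))
# [(0, 'a'), (0, 'aa'), (0, 'aaa'), (1, 'a'), (1, 'aa'), (2, 'a')]
```

Unlike slicing an existing match, this re-checks each candidate against the pattern, so it does not depend on any special property of "(a)+".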