VB.NET help: splitting the string except for symbols

- Content updated: 2015-12-20
Question:

I tried using these two snippets:

Dim splitQuery() As String = Regex.Split(TextBoxQuery.Text, "\s+")

and

Dim splitQuery() As String = TextBoxQuery.Text.Split(New Char() {" "c})

My example query is "a dog ." Notice there is a single space between "dog" and ".". When I check the length of splitQuery, it gives me 3, and the split words are "a", "dog", and ".".

How can I stop it from counting "." and other symbols as words? I want only alphanumeric words/terms to be stored in my splitQuery array. Thanks.

Comment: Do you intend to trim the non-word characters from the end of the string? Just use Regex.Replace(str, "\W*$", "", RegexOptions.RightToLeft) and then split with \s+.

Comment: Does this code work for you?

Comment: This is the most efficient answer. Thank you so much!

Comment: I have posted an answer, please consider accepting.

Solution:

I suggest doing that in 2 steps:

  • Use txt = Regex.Replace(TextBoxQuery.Text, "\W*$", "", RegexOptions.RightToLeft) to remove the non-word characters from the end of the string

  • Then, split with \s+: splits = Regex.Split(txt, "\s+")
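
The two steps above can be put together as a small console program. This is only a sketch: the module name TrimThenSplitDemo and the helper SplitQuery are made up for illustration, and a string literal stands in for TextBoxQuery.Text.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TrimThenSplitDemo
    ' Hypothetical helper (not from the thread): trims trailing
    ' non-word characters, then splits the rest on whitespace.
    Function SplitQuery(ByVal input As String) As String()
        ' Step 1: remove the run of non-word characters at the end of the string.
        Dim txt As String = Regex.Replace(input, "\W*$", "", RegexOptions.RightToLeft)
        ' Step 2: split what is left on runs of whitespace.
        Return Regex.Split(txt, "\s+")
    End Function

    Sub Main()
        Dim words() As String = SplitQuery("a dog .")
        Console.WriteLine(words.Length)            ' prints 2
        Console.WriteLine(String.Join("|", words)) ' prints a|dog
    End Sub
End Module
```

Note that this only removes symbols at the end of the string. A query such as "a . dog" would still produce "." as one of the split items.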

Solution:

You should also be able to create a string of unwanted characters, trim them from the ends of the string, and then split with StringSplitOptions.RemoveEmptyEntries.

Dim unwanted As String = "./?!#"
Dim splitQuery() As String = yourString.Trim(unwanted.ToCharArray()).Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

Comment: I guess it would take time to list all the unwanted characters, especially all the symbols, and I may forget some. Still, thank you for your answer. I may apply it in other cases. Thanks.

Solution:

I would tackle this problem in two parts.

  1. I would split up the text by spaces like you're doing

  2. I would then run through that list of words and remove any query terms that are non-alphanumeric.

The following is an example of that:

Imports System.Collections
Imports System.Text.RegularExpressions ' needed for the unqualified Regex.Split call below

' ... Your Other Code ...

    ' A function to determine if a string is AlphaNumeric
    Private Function IsAlphaNum(ByVal strInputText As String) As Boolean
        ' True only when every character is an ASCII letter or digit
        Return System.Text.RegularExpressions.Regex.IsMatch(strInputText, "^[a-zA-Z0-9]+$")
    End Function

    ' A function to get the words from the textbox
    Private Function GetWords() As String()
        ' Get a raw list of all words separated by spaces
        Dim splitQuery() As String = Regex.Split(TextBoxQuery.Text, "\s+")

        ' ArrayList to place all words into:
        Dim alWords As New ArrayList()

        ' Loop all words and check them:
        For Each word As String In splitQuery
            If IsAlphaNum(word) Then
                ' Word is alphanumeric
                ' Add it to the list of alphanumeric words
                alWords.Add(word)
            End If
        Next

        ' Convert the ArrayList of words to a String array
        Dim words As String() = CType(alWords.ToArray(GetType(String)), String())

        ' Return the list of filtered words
        Return words
    End Function

This code does the following:

  1. Splits up the textbox's text on whitespace.
  2. Declares an ArrayList for the filtered query terms/words.
  3. Loops through all the words in the split-up array of terms/words.
  4. Checks whether each term is alphanumeric.
  5. If the term is alphanumeric, it is added to the ArrayList. If it is not alphanumeric, the term is disregarded.
  6. Finally, it converts the terms/words in the ArrayList back to a normal String array and returns it.

Because this solution uses an ArrayList, it requires System.Collections as an import.
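
For comparison, the same split-then-filter idea can be written more compactly with LINQ instead of an ArrayList. This is a sketch rather than part of the original answer: the module name FilterWordsDemo and the helper GetAlphaNumWords are hypothetical, and the input is passed as a parameter instead of being read from TextBoxQuery.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module FilterWordsDemo
    ' Hypothetical helper (not from the thread): splits on whitespace
    ' and keeps only the tokens made up entirely of ASCII letters and digits.
    Function GetAlphaNumWords(ByVal input As String) As String()
        Dim tokens() As String = Regex.Split(input, "\s+")
        Return tokens.Where(Function(t) Regex.IsMatch(t, "^[a-zA-Z0-9]+$")).ToArray()
    End Function

    Sub Main()
        Dim words() As String = GetAlphaNumWords("a dog .")
        Console.WriteLine(String.Join(", ", words)) ' prints a, dog
    End Sub
End Module
```

As with the ArrayList version, a token that mixes letters and symbols, such as "dog." typed without a space, is dropped as a whole rather than cleaned up, so this fits best when symbols are separated from words by whitespace.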

Comment: @stribizhev's answer is shorter, I guess. Still, thank you for giving me another idea!