VB.NET help: splitting a string while excluding symbols

- Last updated: 2015-12-20



I tried these two snippets:

Dim splitQuery() As String = Regex.Split(TextBoxQuery.Text, "\s+")


Dim splitQuery() As String = TextBoxQuery.Text.Split(New Char() {" "c})

My example query is "a dog ." Notice that there is a single space between "dog" and ".". When I check the length of splitQuery, it gives me 3, and the split words are "a", "dog", and ".".

How can I stop it from counting "." and other symbols as words? I want only alphanumeric words/terms to be stored in my splitQuery array. Thanks.


(Comment: Do you intend to trim the non-word characters from the end of the string? Just use Regex.Replace(str, "\W*$", "", RegexOptions.RightToLeft) and then split with \s+.)


(Comment: Does this code work for you?)


(Comment: This is the most efficient answer. Thank you so much!)


(Comment: I have posted an answer; please consider accepting it.)


I suggest doing that in two steps:

  • Use txt = Regex.Replace(TextBoxQuery.Text, "\W*$", "", RegexOptions.RightToLeft) to remove the non-word characters from the end of the string.

  • Then split on \s+: splits = Regex.Split(txt, "\s+")
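
The two steps above can be put together in a small console program. This is only a minimal sketch: the module name SplitDemo and the hard-coded query string (standing in for TextBoxQuery.Text) are assumptions made for illustration, not part of the original answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    Sub Main()
        ' Hypothetical stand-in for TextBoxQuery.Text
        Dim query As String = "a dog ."

        ' Step 1: remove trailing non-word characters.
        ' \W also matches whitespace, so the space before "." is removed too.
        Dim txt As String = Regex.Replace(query, "\W*$", "", RegexOptions.RightToLeft)

        ' Step 2: split the remaining text on runs of whitespace.
        Dim splits() As String = Regex.Split(txt, "\s+")

        Console.WriteLine(splits.Length)   ' 2 for the sample query
        For Each word As String In splits
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

Note that this only cleans the end of the string. A symbol standing alone in the middle of the query (for example "a . dog") would still come back as its own element, and leading whitespace would produce an empty first element.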


You should also be able to create a string of unwanted characters, trim them from the ends of the input, and then split with StringSplitOptions.RemoveEmptyEntries so that empty entries are dropped.

Dim unwanted As String = "./?!#"
Dim splitQuery() As String = yourString.Trim(unwanted.ToCharArray()).Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
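
As a rough, self-contained sketch of this approach, the snippet below runs it on the sample query from the question. The module name TrimDemo and the hard-coded query variable are assumptions made for illustration.

```vbnet
Imports System

Module TrimDemo
    Sub Main()
        ' Characters to strip from both ends of the input (extend as needed)
        Dim unwanted As String = "./?!#"

        ' Hypothetical stand-in for the text box contents
        Dim query As String = "a dog ."

        ' Trim removes the listed characters from the ends only;
        ' RemoveEmptyEntries then drops the empty entry left by the trailing space.
        Dim splitQuery() As String = query.Trim(unwanted.ToCharArray()).Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        Console.WriteLine(splitQuery.Length)            ' 2 for the sample query
        Console.WriteLine(String.Join(",", splitQuery)) ' a,dog
    End Sub
End Module
```

Because Trim only touches the start and end of the whole string, an unwanted character sitting between two spaces in the middle of the query would not be removed by this sketch.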

(Comment: I guess it would take time to list all the unwanted characters, especially all the symbols, and I might forget some. Still, thank you for your answer; I may apply it in other cases. Thanks.)


I would tackle this problem in two parts.

  1. I would split up the text by spaces like you're doing

  2. I would then run through that list of words and remove any query terms that are non-alphanumeric.

The following is an example of that:

Imports System.Collections
Imports System.Text.RegularExpressions

' ... Your Other Code ...

    ' A function to determine if a string is AlphaNumeric
    Private Function IsAlphaNum(ByVal strInputText As String) As Boolean
        Dim IsAlpha As Boolean = False
        If System.Text.RegularExpressions.Regex.IsMatch(strInputText, "^[a-zA-Z0-9]+$") Then
            IsAlpha = True
        Else
            IsAlpha = False
        End If

        Return IsAlpha
    End Function

    ' A function to get the words from the textbox
    Private Function GetWords() As String()
        ' Get a raw list of all words separated by spaces
        Dim splitQuery() As String = Regex.Split(TextBoxQuery.Text, "\s+")

        ' ArrayList to place all words into:
        Dim alWords As New ArrayList()

        ' Loop all words and check them:
        For Each word As String In splitQuery
            If IsAlphaNum(word) Then
                ' Word is alphanumeric
                ' Add it to the list of alphanumeric words
                alWords.Add(word)
            End If
        Next

        ' Convert the ArrayList of words to a primitive array of strings
        Dim words As String() = CType(alWords.ToArray(GetType(String)), String())

        ' Return the list of filtered words
        Return words
    End Function

This code does the following:

  1. Splits up the textbox's text.
  2. Declares an ArrayList for the filtered query terms/words.
  3. Loops through all the words in the split-up array of terms/words.
  4. Checks whether each term is alphanumeric.
  5. If the term is alphanumeric, it is added to the ArrayList. If it is not alphanumeric, the term is disregarded.
  6. Finally, converts the terms/words in the ArrayList back to a normal String array and returns it.

Because this solution uses an ArrayList, it requires System.Collections as an import; the unqualified Regex.Split call likewise needs System.Text.RegularExpressions.


(Comment: @stribizhev's answer is shorter, I guess. Still, thank you for giving me another idea!)