Regular Expressions - Starts with, Contains, and Ends with

- Content updated: 2015-12-20
Question:

I have a string that contains several "\n". I would like to look at each line and remove every line that contains the word "banana".

Sample DF:

farm_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'), stringsAsFactors=FALSE)

What I've tried:

farm_data$sentence <- gsub(".* bananas .* \n", "\n", farm_data$sentence)

What I want:

clean_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  why not just boil the fruits'), stringsAsFactors=FALSE)

Lines that contain banana have been removed.

Thanks.

Solution:

I address the question in perhaps a roundabout way. I first split the string on the line break character \n.

sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))

After that I remove those elements of the resulting split that contain the word "banana".

cleanSentence <- sentence[!grepl('banana', sentence)]  # keep only the lines without a match

Then I hammer it back together using the paste function.

clean_data <- data.frame(shop=c('fruit'),
                        sentence= paste(cleanSentence, collapse='\n'), stringsAsFactors=FALSE)

Hopefully this isn't too ham-fisted. :)

To address your concern about the usability to other "fruits" or strings:

cleanFruit <- function(fruit = 'banana'){
    # Split the sentence into separate lines on the line break character
    sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))
    # Keep only the lines that do not match `fruit` (interpreted as a regular expression)
    cleanSentence <- sentence[!grepl(fruit, sentence)]
    # Join the remaining lines back into a single string
    clean_data <- data.frame(shop=c('fruit'),
                            sentence= paste(cleanSentence, collapse='\n'), stringsAsFactors=FALSE)
    return(clean_data)
}

Write it up into a function, and hand it a given fruit (or word). @rawr's answer seems a bit cleaner.

OP: Thank you. What if I don't want to set the "fruit" column name manually? Imagine I have two rows of data. How would I do that dynamically?
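
One possible way to handle several rows without hard-coding the shop value, as a minimal sketch rather than a definitive answer. The helper name removeLinesWith and the two-row sample data are made up for illustration and do not appear in the answer above. The sketch applies the same split, filter, and paste steps to each row's sentence and leaves the other columns untouched.

```r
# Hypothetical helper (not from the original answer): drops every line that
# contains `word` from each row of `data$sentence`, keeping all other columns.
removeLinesWith <- function(data, word = 'banana') {
    # strsplit() returns one character vector of lines per row
    lineList <- strsplit(as.character(data$sentence), '\n', fixed = TRUE)
    data$sentence <- vapply(lineList, function(lines) {
        # fixed = TRUE treats `word` as a literal string, not as a regex
        paste(lines[!grepl(word, lines, fixed = TRUE)], collapse = '\n')
    }, character(1))
    data
}

# Made-up two-row sample, for illustration only
two_rows <- data.frame(shop = c('fruit', 'veg'),
                       sentence = c('apples are fine\nbananas are the best\nboil the fruits',
                                    'carrots are orange\nno banana here'),
                       stringsAsFactors = FALSE)

cleaned <- removeLinesWith(two_rows)
cat(cleaned$sentence, sep = '\n---\n')
```

Passing the data frame in as an argument, instead of reading farm_data from the global environment, is what lets the shop column carry through unchanged for any number of rows.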

Solution:

x <- 'the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'
cat(x)
# the basket contains apples
#                                   bananas are the best
#                                   are we going to eat bananas
#                                   why not just boil the fruits
#                                   let us make some banana smoothie

cat(gsub('.*banana.*\\n?', '', x, perl = TRUE))
# the basket contains apples
#                                   why not just boil the fruits
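
Since gsub is vectorized over its input, the same pattern can be applied to the whole sentence column at once, which should also cover data frames with more than one row. The following is a sketch under a few assumptions: the sample sentence is written without the leading indentation, the column is a character vector, and the final cleanup of the leftover line break is an addition that is not part of the answer above.

```r
farm_data <- data.frame(shop = c('fruit'),
                        sentence = c('the basket contains apples
bananas are the best
are we going to eat bananas
why not just boil the fruits
let us make some banana smoothie'), stringsAsFactors = FALSE)

# With perl = TRUE, `.` does not match a line break, so `.*banana.*` stays
# within one line; `\\n?` also consumes that line's trailing line break.
farm_data$sentence <- gsub('.*banana.*\\n?', '', farm_data$sentence, perl = TRUE)

# Removing the last line leaves a dangling line break behind; strip it.
# (This cleanup step is an assumption about the desired output.)
farm_data$sentence <- sub('\n$', '', farm_data$sentence)

cat(farm_data$sentence)
# the basket contains apples
# why not just boil the fruits
```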