Extract the last word between | |

- Updated: 2015-12-20
Question:

I have the following dataset

> head(names$SAMPLE_ID)
[1] "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|"
[2] "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|"                            
[3] "Bacteria|Proteobacteria|Gammaproteobacteria|Pasteurellales|Pasteurellaceae|Haemophilus|" 
[4] "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|"             
[5] "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|"             
[6] "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|" 

I want to extract the last word between || as a new variable i.e.

Acinetobacter
Bacillus
Haemophilus

I have tried using

library(stringr)
names$sample2 <-   str_match(names$SAMPLE_ID, "|.*?|")

Comment: The easy route: vapply(strsplit(names$SAMPLE_ID, "|", fixed = TRUE), tail, "", 1)

Comment: Or if you don't like typing (or efficiency), then sapply(strsplit(x, "\\|"), tail, 1)
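
A minimal sketch of the two strsplit() variants from the comments above, run on a small sample vector. The vector x is illustrative data copied from the question, not the full dataset. With fixed = TRUE the "|" is treated as a literal character, while "\\|" escapes it as a regex. strsplit() does not return a trailing empty piece after the final "|", which is why tail(..., 1) lands on the last name.

```r
# Sample data copied from the question (illustrative subset)
x <- c(
  "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|",
  "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|"
)

# Literal split; vapply() checks that each result is a single string
last_fixed <- vapply(strsplit(x, "|", fixed = TRUE), tail, "", 1)

# Regex split with the pipe escaped; sapply() does no type checking
last_regex <- sapply(strsplit(x, "\\|"), tail, 1)

print(last_fixed)
print(identical(last_fixed, last_regex))
```

Both forms should give the same result here; the vapply() version simply fails loudly if an element does not yield exactly one string.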

Answer:

We can use

library(stringi)
stri_extract_last_regex(v1, '\\w+')
#[1] "Acinetobacter"

data

v1 <- "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|"

Answer:

Using just base R:

myvar <- gsub("^..*\\|(\\w+)\\|$", "\\1", names$SAMPLE_ID)
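
For reference, a sketch of the same idea writing straight into a new column, as the question asks. The data frame names is rebuilt here from two rows of the question, and the column name sample2 is taken from the attempted code. As an assumption of mine, the capture group uses [^|]+ instead of \\w+ so that a last field containing spaces or hyphens would still be captured; sub() is enough because only one replacement per string is needed.

```r
# Rebuild a small version of the question's data frame (illustrative rows only)
names <- data.frame(
  SAMPLE_ID = c(
    "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|",
    "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|"
  ),
  stringsAsFactors = FALSE
)

# Greedy ^.*\\| consumes everything up to the pipe before the last field;
# ([^|]+) captures that field; \\|$ requires the trailing pipe at the end.
names$sample2 <- sub("^.*\\|([^|]+)\\|$", "\\1", names$SAMPLE_ID)

print(names$sample2)
```

Note that sub() returns the input unchanged when the pattern does not match, so rows without a trailing "|" would keep their full string rather than become NA.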

OP: Working well, thanks @zelazny7

Answer:

^.*\\|\\K.*?(?=\\|)

Use \K to drop everything matched before it from the final match. See the demo. Also use perl = TRUE.

https://regex101.com/r/fM9lY3/45

x <- c("Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|",
       "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|" )

unlist(regmatches(x, gregexpr('^.*\\|\\K.*?(?=\\|)', x, perl = TRUE)))
# [1] "Streptococcus" "Streptococcus"

Comment: @rawr Thanks a lot :)

Comment: Why downvoted???????????

Answer:

The ending is all you need: [^|]+(?=\|$)

Per @RichardScriven, in R that would be regmatches(x, regexpr("[^|]+(?=\\|$)", x, perl = TRUE))
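
One thing to watch when assigning this to a new column: regmatches() with regexpr() returns only the elements that matched, so the result can be shorter than the input. A rough sketch of keeping the lengths aligned, assuming NA is wanted for non-matching rows; the vector x and the second, pipe-free string are made up for illustration.

```r
# First element is from the question; the second is a made-up non-matching case
x <- c(
  "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|",
  "no pipes here"
)

# Match a run of non-pipe characters followed by a pipe at the very end
m <- regexpr("[^|]+(?=\\|$)", x, perl = TRUE)

# regexpr() reports -1 where there is no match
hit <- as.vector(m) != -1L

# Start from all NA and fill in only the rows that matched
last <- rep(NA_character_, length(x))
last[hit] <- regmatches(x, m)

print(last)
```

This way last always has the same length as x and can be assigned to a data frame column directly.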