Replace entire line with string and replace part of a line with a string

- Last updated: 2015-12-20
Question:

I'm trying to clean up the following dataset to maintain some consistency within the Changes field.

Input:

test_data <- data.frame(ID=c('john@xxx.com', 'sally@xxx.com'),
                        Changes=c('3 max cost changes
  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83
  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91
  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55', 
                                  '2 max cost changes
  productabc > Everything else in "auto & truck maintenance" : Max CPC increased from $0.81 to $0.97
  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22
  productabc > pb10000  > Everything else in pb10000 : Max CPC increased from $0.63 to $0.76'), stringsAsFactors=FALSE)

  1. I want to delete every line within a given field where the first ">" is followed by "Everything". That entire line should be removed.

  2. For cases where "Everything" occurs after the second ">", I'd like to replace the text from "Everything" up to the ":" with "q".

Output:

out_data <- data.frame(ID=c('john@xxx.com', 'sally@xxx.com'),
                        Changes=c('3 max cost changes
  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83
  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91
  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55', 
                                  '2 max cost changes
  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22
  productabc > pb10000  > q : Max CPC increased from $0.63 to $0.76'), stringsAsFactors=FALSE)

Thanks.

Commenter: And what were your attempts at solving this yourself?

Commenter: Each line is not an individual row in the data set, correct? You only have two rows here, and the lines within each row are separated by newline characters?

OP: Yes, that is correct!

Solution:

Maybe not the best solution, but it gets what you want in the test_data:

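clean_text <- function(x){
  # "Everything else in ..." after the second ">": keep the prefix, replace up to ":" with "q"
  x <- gsub("(> .* > )Everything else in .* :", "\\1 q :", x)
  # Remaining "Everything else in ..." line (after the first ">"): drop the whole line.
  # Note: the match includes the newline on both sides, so both are removed.
  x <- gsub("\n .* Everything else in .*?\n", "", x)
  x
}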
out_data <- test_data
out_data[,2] <- clean_text(test_data[,2])
out_data
             ID
1  john@xxx.com
2 sally@xxx.com
                                                                                                                                                                                                                                                                                                                     Changes
1 3 max cost changes\n                                  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83\n                                  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91\n                                  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55
2                                                                                                2 max cost changes                                  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22\n                                  productabc > pb10000  >  q : Max CPC increased from $0.63 to $0.76

OP: Thanks. I don't think the line breaks are maintained, but I would like to maintain them. Keep them just like the original.
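
One possible way to meet the request to keep the original line breaks is to split each field on "\n", apply the two rules line by line, and join the remaining lines again. The sketch below is not from the thread: `clean_changes` is a hypothetical helper name, and it assumes that the "Everything ..." text contains no ":" before the separator and that no line contains a ">" ahead of the product separators.

```r
# Hypothetical helper (not part of the original answer): cleans each
# Changes field line by line so the remaining line breaks are preserved.
clean_changes <- function(x) {
  vapply(strsplit(x, "\n", fixed = TRUE), function(lines) {
    # Rule 1: drop lines where the first ">" is directly followed by "Everything".
    drop <- grepl("^[^>]*> *Everything", lines)
    lines <- lines[!drop]
    # Rule 2: after the second ">", replace "Everything ..." up to the ":" with "q".
    lines <- sub("^([^>]*>[^>]*> *)Everything[^:]*:", "\\1q :", lines)
    # Rejoin with the same separator the field was split on.
    paste(lines, collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
}

# Small example built from the second row of the question's data.
x <- paste(
  "2 max cost changes",
  "  productabc > Everything else in \"auto & truck maintenance\" : Max CPC increased from $0.81 to $0.97",
  "  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22",
  "  productabc > pb10000  > Everything else in pb10000 : Max CPC increased from $0.63 to $0.76",
  sep = "\n"
)
cat(clean_changes(x), "\n")
```

With the question's data frame this would be applied as `out_data$Changes <- clean_changes(test_data$Changes)`. Working per line avoids patterns that span newlines, which is where the line break after "2 max cost changes" was lost.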