Indexing strings that contain a word of a specific length

by Wencheng Lau-Medrano   Last Updated May 15, 2019 16:26 PM

I have a list of names which looks like this:

c("xxxxxx xx",             "xxx yyy xxxxx",       "xxx yy xxxxxx", 
  "xxxxxxx yyyyyyy xxxxx", "xxxx xxxx",           "xxx yyyyyy xxx", 
  "xxxxx yyyyy xxxxxxxx",  "xxx yyyyyyyy xxxx",   "xx xxx", 
  "xxxxx yyyyy xxxxx",     "xxxx yy xxxxxx",      "xxxxx yyyy xxx", 
  "xxxxxxx yy xxxxx",      "xxxxx yyyyyyy xxxxx", "xxxx yyyy xxxxxx", 
  "xxxxx yyyy xxxxx",      "xxxxxxxx  xxxxx",     "xxxxxx yyyyyyyy xxxxx", 
  "xxxxxx yy xxxxx",       "xxx yyyy xxxxxx")

I need to extract (index) all the names that contain a word of 4 to 6 letters.

I know that I could split each string into words, count the characters of each word with nchar, and then index the strings that have a word with a length between 2 and 4. But is there a way to do that in a single line using regular expressions?
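For reference, the split-and-count approach described above might look roughly like this. It is only a sketch: `has_word_len` is a made-up helper name, `nms` is a shortened sample of the names, and the length bounds are passed as arguments rather than fixed.

```r
# Hypothetical helper (not from any package): TRUE for each string that has
# at least one word whose length lies in [min_len, max_len].
has_word_len <- function(x, min_len, max_len) {
  # Split on runs of whitespace so double spaces do not produce empty words
  words <- strsplit(x, "\\s+")
  vapply(words, function(w) {
    n <- nchar(w)
    any(n >= min_len & n <= max_len)
  }, logical(1))
}

# Shortened sample, assumed for illustration
nms <- c("xxxxxx xx", "xxxxxxx yyyyyyy", "xx xxx")

has_word_len(nms, 4, 6)         # logical vector
# [1]  TRUE FALSE FALSE
which(has_word_len(nms, 4, 6))  # numeric positions
# [1] 1
```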

The expected output must be a vector, either numeric:

[1]  1  2  3  5  6  8  9 11 12 13 15 16 20

Or logical:

 [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
[11]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
Tags : r regex


Answers 1


Base R
You can use grepl. The \b anchors restrict the match to whole words; without them, any run of four or more word characters would also match.

grepl("\\b\\w{4,6}\\b", my.text)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

stringr
You can use stringr's str_detect with the same pattern:

library(stringr)
str_detect(my.text, "\\b\\w{4,6}\\b")
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

In both versions, the key point is the regular expression, which matches whole words of 4 to 6 characters.
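Since the question also asks for numeric positions, here is a small sketch of how a logical result could be converted. The three-element `my.text` is a shortened sample, and the `\\b` word-boundary anchors in `pattern` are an assumption made so that only whole words are counted.

```r
# Shortened sample data, assumed for illustration
my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xx xxx")

# Assumed pattern: a whole word of 4 to 6 word characters
pattern <- "\\b\\w{4,6}\\b"

hits <- grepl(pattern, my.text)  # logical vector
hits
# [1]  TRUE  TRUE FALSE

which(hits)                      # numeric positions from the logical vector
# [1] 1 2

grep(pattern, my.text)           # grep returns the positions directly
# [1] 1 2
```

Either form can be used to subset the original vector, for example my.text[hits].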

Data

my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx", 
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx","xxx yyyyyyyy xxxx", "xx xxx")
kath
May 15, 2019 16:21 PM
