Split column by multiple delimiters

  • Split column by multiple delimiters

     David updated 1 week, 5 days ago 2Members · 4 Posts
  • jordan.trachtenberg

    Member
    January 6, 2021 at 8:10 am

    I have some open-ended comment data that has been coded with a delimiter “, “. Each comment could have one or several tags. I’d like to be able to split this column by the delimiter and then pivot the data longer so that I can count the tag frequency associated with the comments (see sample data attached)

    I’ve looked through this method (https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns), but my function is not working correctly, and it seems like a clunky solution.

    Function:

    # Function that splits a column by a particular delimiter

    split_by_delimiter <- function(column, pattern, into_prefix) {

    cols <- str_split_fixed(column, pattern, n = Inf)

    cols[which(cols == "")] <- NA

    cols <- as_tibble(cols)

    # name the 'cols' tibble as 'into_prefix_1', 'into_prefix_2', ..., 'into_prefix_m'

    # where m = # columns of 'cols'

    m <- dim(cols)[2]

    names(cols) <- paste(into_prefix, 1:m, sep = "_")

    }


    # Applying function to data

    delimited_data <- clean_data %>%

    select(respondent_id, tags) %>%

    bind_cols(split_by_delimiter(.$tags, pattern = ", ", into_prefix = "tag"))%>%

    select(respondent_id, starts_with("tag_"))

    Error message:

    Error: Can't recycle ..1 (size 1998) to match ..2 (size 5).

    Run rlang::last_error() to see where the error occurred.

    > rlang::last_error()

    <error/vctrs_error_incompatible_size>

    Can't recycle ..1 (size 1998) to match ..2 (size 5).

    Backtrace:

    1. %>%(...)

    4. vctrs::stop_incompatible_size(...)

    5. vctrs:::stop_incompatible(...)

    6. vctrs:::stop_vctrs(...)

  • David

    Organizer
    January 7, 2021 at 2:19 pm

    Great question, Jordan! One I had myself not too long ago. I recorded a video to show you how I’d solve this. Let me know if this makes sense!

  • jordan.trachtenberg

    Member
    January 8, 2021 at 5:41 am

    Thank you, David. Your solution is about 1000x easier. I’m used to working in Power Query to clean data, and I always have to split into columns before pivoting longer, so that’s what I thought I’d have to do in R.

    This was my second attempt at a solution, but it means I would manually have to update the column names if I had more than 5 tags, which is horrible.

    separate(col = tags,

    into = c("tag1", "tag2", "tag3", "tag4", "tag5"), # Added 5 columns for now, may need more

    sep = ", ",

    extra = "merge",

    fill = "right") %>%

    pivot_longer(cols = c(contains("tag") & !contains("delimited")),

    names_to = "tag_number",

    values_to = "tag_value") %>%

    drop_na(tag_value)


    # Checking to see if data table has any more delimiters ", "

    # If so, need to add more column names to the separate() function above

    clean_data %>%

    select(tag_value) %>%

    filter(grepl(", ", tag_value))

    I will definitely use this instead!

    separate_rows(tags, sep = ", ")
  • David

    Organizer
    January 8, 2021 at 6:53 am

    Glad to help!

Viewing 1 - 4 of 4 posts

Log in to reply.

Original Post
0 of 0 posts June 2018
Now