Grouping similar times together

     david.solet updated 7 months, 1 week ago 3 Members · 12 Posts
  • david.solet

    February 2, 2021 at 11:42 am

    Hello David and all. I asked this question on stackoverflow 2 weeks ago. I received 2 answers which I felt did not have valid solutions. I have a dataset of systolic blood pressure and date/time taken. I have created a fictional example dataset for an individual. The readings were sometimes taken within minutes of each other, sometimes not. An example is below:

    library(dplyr)   # needed for %>% and mutate(); lubridate is called with ::

    date_time <- c("Jan 29 2020 13:46:08",
                   "Jan 29 2020 13:42:53",
                   "Jan 29 2020 12:13:27",
                   "Jan 29 2020 12:11:19",
                   "Jan 29 2020 12:09:21",
                   "Jan 28 2020 12:22:26",
                   "Jan 27 2020 8:22:20")
    systol <- c(132, 132, 118, 115, 110, 148, 120)
    df <- data.frame(date_time, systol) %>%
      mutate(dtetime = lubridate::mdy_hms(date_time))

    I would like to group systolic readings together that took place in a span of 10 minutes. This is the result I would want:
    group 1: rows 1 and 2
    group 2: rows 3, 4, 5
    group 3: row 6
    group 4: row 7
    The ultimate objective is then to average readings that were taken close to each other in time. The criterion for "close in time" is that all readings in a group are taken within 10 minutes, i.e., the last reading in any group has to be within 10 minutes of the first reading in that group. Groups can have any number of readings, as long as the group meets the 10-minutes-from-start-to-finish criterion.

    I've tried to solve this with the lag function, but it is awkward and possibly inaccurate. I suspect this might be addressed by detecting time "clusters" but I'm not familiar with those methods. I apologize for asking something that has been asked on stackoverflow, but since I haven't gotten any helpful answers I'm not sure where to go next. (If you provide a solution, I would answer the question there with attribution. I suspect it is not an unusual problem since blood pressure readings taken close together in time are often averaged.)

    Many thanks for any help or referral.


  • David

    February 2, 2021 at 12:52 pm

    That’s a good question and not one I know the answer to! I’m going to post on Twitter to see if I get anyone with a good idea for you. I think one solution might be the tsibble package but I haven’t used it so I’m not 100% sure.

  • David

    February 3, 2021 at 6:40 am

    Actually, I’ve rethought this. I think the question is a bit too complicated for Twitter, but would be perfect for the RStudio Community. Would you consider posting there and seeing if you get any good responses?

  • david.solet

    February 3, 2021 at 12:44 pm

    I agree, that was my next step. I’m still working on it and want to try a couple of things before going there. Thanks for taking a look and the suggestion.


  • David

    February 3, 2021 at 1:57 pm

    Sounds good! Let me know what you find out.

  • jordan.trachtenberg

    February 10, 2021 at 11:35 am

    When I don’t know the answer in R, I often look to the Power BI community to see how they solve it and whether I can apply it to R. Here’s a similar post on the Power BI forum:

    Could you solve it using interval instead of lag?

    I’m thinking of having a column with start of interval, stop of interval, and then an indicator column of whether or not the time is in the interval, and then you’d group those times accordingly. And then you would tidy that up to do your average readings.

  • david.solet

    February 10, 2021 at 2:04 pm

    Hello Jordan, thanks for your suggestion. I’ve gone down that road and as far as I can see, while interval or something like it would help, I don’t think it solves the main difficulty.

    Assuming that the dataset is sorted ascending by dtetime, the program needs to
    (1) establish the start dtetime of the first group in row 1,
    (2) advance to the next row,
    (3) evaluate whether dtetime in that row is more than 10 minutes after the start dtetime from step 1.
    If “no”, go to the next row and repeat step 3, recalling that the starting dtetime may be more than one row back. If “yes”, establish that row’s dtetime as the start of a new group and repeat steps 1-3.
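    For reference, the steps described above can be sketched in base R by carrying each group's start time forward with Reduce(). This is a sketch under the assumptions in this thread: the example dataset (sorted ascending), a 600-second window, and an illustrative helper name group_within() that is not from any package.

```r
# Group readings so that every reading in a group falls within 10 minutes
# of the first reading in that group, by carrying the start time forward.
group_within <- function(times, window_secs = 600) {
  secs <- as.numeric(times)   # POSIXct -> seconds since the epoch
  # anchors[i] is the start time of the group that reading i belongs to:
  # keep the current anchor until a reading is > window_secs past it.
  anchors <- unlist(Reduce(
    function(anchor, cur) if (cur - anchor > window_secs) cur else anchor,
    secs, accumulate = TRUE
  ))
  cumsum(c(1L, diff(anchors) != 0))   # group id bumps at each new anchor
}

Sys.setlocale("LC_TIME", "C")   # so "%b" parses "Jan" in any locale
date_time <- c("Jan 27 2020 8:22:20",  "Jan 28 2020 12:22:26",
               "Jan 29 2020 12:09:21", "Jan 29 2020 12:11:19",
               "Jan 29 2020 12:13:27", "Jan 29 2020 13:42:53",
               "Jan 29 2020 13:46:08")
systol  <- c(120, 148, 110, 115, 118, 132, 132)
dtetime <- as.POSIXct(date_time, format = "%b %d %Y %H:%M:%S", tz = "UTC")

grp <- group_within(dtetime)   # -> 1 2 3 3 3 4 4
tapply(systol, grp, mean)      # group means: 120, 148, 114.33, 132
```

    On the example data this reproduces the four groups listed in the original post, just with the rows sorted ascending rather than descending.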

    I posted in the RStudio Community with a fuller example dataset. There was some helpful discussion but no solution. I ended up thinking the solution involves a loop, but I’m pretty green and haven’t been able to make anything work.

  • david.solet

    February 10, 2021 at 2:06 pm

    PS. If there are any other R communities anyone would recommend, I’ll try posting there. Thanks for any suggestions.

  • David

    February 10, 2021 at 2:10 pm

    Have you tried the R for Data Science community?

  • david.solet

    February 10, 2021 at 2:52 pm

    Thanks, I’m trying that.

  • David

    February 10, 2021 at 8:49 pm

    I still don’t know how to do this, but I think the slider package does what you need. Maybe the slide_period() function?

  • david.solet

    February 11, 2021 at 5:48 pm

    Interesting package, but there are limitations I don’t know how to get around. slide() does create 10-minute windows, but they overlap because the window is sliding. (I didn’t see a way to prevent the overlap or filter out the windows I don’t want.) From what I can see, two other functions, slide_period() and block(), were promising for non-overlapping periods, but both depend on a fixed origin date/time rather than a dynamically assigned start date/time for the group being formed. So the 10-minute increments are all calculated from the origin date/time, which inappropriately splits one of the groups. I’m going to work on a loop.
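    A loop version along those lines is fairly short. The sketch below assumes the readings are already sorted ascending by dtetime; the function name assign_groups() and the 600-second default are illustrative, not from any package.

```r
# Walk sorted POSIXct times, carrying the start time of the open group;
# start a new group when a reading is more than `window` seconds past it.
assign_groups <- function(times, window = 600) {
  n <- length(times)
  group <- integer(n)
  group[1] <- 1L
  start <- times[1]
  for (i in seq_len(n)[-1]) {
    if (as.numeric(difftime(times[i], start, units = "secs")) > window) {
      start <- times[i]           # this reading anchors a new group
      group[i] <- group[i - 1] + 1L
    } else {
      group[i] <- group[i - 1]    # still within 10 min of the group start
    }
  }
  group
}

dtetime <- as.POSIXct(c("2020-01-29 12:09:21", "2020-01-29 12:11:19",
                        "2020-01-29 12:13:27", "2020-01-29 13:42:53",
                        "2020-01-29 13:46:08"), tz = "UTC")
assign_groups(dtetime)   # -> 1 1 1 2 2
```

    Per-group averages then follow with, e.g., tapply(systol, assign_groups(dtetime), mean).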
