regex
loop
stringr
stringi
peeked
Published

December 1, 2023

Setup

The original challenge

library(aochelpers) # for data loading
library(stringr); library(stringi)
input <- aoc_input_vector(1, 2023)

TL;DR Solutions

Part 1 ⭐

❓ What is my calibration number?

On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

nums <- gsub("[a-z]", "", input)
first.num <- str_extract(nums, "^.")
last.num <- str_extract(nums, ".$")
as.numeric(paste0(first.num, last.num)) |> sum()
[1] 55386

Part 2 ⭐⭐

❓ What is my actual calibration number?

Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.

valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9,1:9) 

N <- length(input)
new.cal.vals <- rep(0, N)

for(i in 1:N){
    x <- stri_locate_all_regex(input[i], valid)
    x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
    x.last  <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
    first <- which.min(x.first[,1])
    last  <- which.max(x.last[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] 54824

Walkthrough / Explainer

Part 1

Example Data

exa <- c("1abc2","pqr3stu8vwx","a1b2c3d4e5f","treb7uchet")

Remove all the letters from each string, then grab the first and last remaining digit.

nums <- gsub("[a-z]", "", input)
first.num <- str_extract(nums, "^.")
last.num <- str_extract(nums, ".$")
cbind(nums, first.num, last.num)[c(1,4,300,304),] # verify
     nums    first.num last.num
[1,] "9"     "9"       "9"     
[2,] "57"    "5"       "7"     
[3,] "934"   "9"       "4"     
[4,] "83858" "8"       "8"     
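If you'd rather not strip characters at all, a regular expression with a capture group can pull out the first and last digit directly. Here is a minimal base-R sketch on the example data; the patterns and perl = TRUE are my own choices, not the code used above, and it assumes every string contains at least one digit.

```r
exa <- c("1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet")

# Skip any leading non-digits, capture the first digit, discard the rest
first.num <- sub("^[^0-9]*([0-9]).*$", "\\1", exa, perl = TRUE)
# A greedy .* grabs as much as it can, so the captured digit is the last one
last.num  <- sub("^.*([0-9])[^0-9]*$", "\\1", exa, perl = TRUE)

cal <- as.numeric(paste0(first.num, last.num))
cal
# [1] 12 38 15 77
```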

Paste them together to create the two-digit calibration value.

calibration.value <- as.numeric(paste0(first.num, last.num))
cbind(nums, first.num, last.num, calibration.value)[c(1,4,300,304),] 
     nums    first.num last.num calibration.value
[1,] "9"     "9"       "9"      "99"             
[2,] "57"    "5"       "7"      "57"             
[3,] "934"   "9"       "4"      "94"             
[4,] "83858" "8"       "8"      "88"             
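The quotes around calibration.value in that printout come from cbind(): binding a character vector and a numeric vector produces a character matrix, since a matrix holds a single type. The sum is unaffected because calibration.value itself is still numeric. A small sketch with two of the values from above:

```r
nums <- c("9", "57")
calibration.value <- c(99, 57)

m <- cbind(nums, calibration.value)
typeof(m)                    # "character": a matrix holds a single type
m[, "calibration.value"]     # "99" "57"

df <- data.frame(nums, calibration.value)
class(df$calibration.value)  # "numeric": each data frame column keeps its type
```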

Submission value:

sum(calibration.value) 
[1] 55386

Part 2

Example Data

exa2 <- c("two1nine",
  "eightwothree",
  "abcone2threexyz",
  "xtwone3four",
  "4nineeightseven2",
  "zoneight234",
  "7pqrstsixteen", 
  "7eight7",     # 2 new test cases
  "3stuffthree")

Okay, so let’s define a vector of valid ‘digits’ and their corresponding numeric values.

valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9,1:9) 
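Because each spelled-out number sits exactly nine positions before its digit in valid, a position in valid indexes straight into valid.num. For example, looking up a few tokens with base R's match():

```r
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9, 1:9)

tokens <- c("two", "7", "nine")
idx <- match(tokens, valid)   # positions in valid: 2, 16, 9
valid.num[idx]
# [1] 2 7 9
```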

Next, find where in each string these valid values are located. stri_locate_first_regex returns the start and end positions of the first match to each pattern (NA if there is no match). I learned about this function from Gus Lipkin’s solution.

x <- stri_locate_first_regex(exa2[1], valid)
first <- which.min(x[,1])
last  <- which.max(x[,2])
cal.val <- as.numeric(paste0(valid.num[first], valid.num[last]))
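If you don't have stringi handy, here is a rough base-R stand-in for what stri_locate_first_regex returns in this case: one start/end row per pattern, NA where a pattern is absent. locate_first() is a hypothetical helper of my own, and it uses fixed = TRUE since every pattern here is a literal string.

```r
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9, 1:9)

# Hypothetical helper: base-R stand-in for stringi::stri_locate_first_regex()
# with one string and many literal patterns. One row per pattern, NA if absent.
locate_first <- function(s, patterns) {
  out <- t(vapply(patterns, function(p) {
    m <- regexpr(p, s, fixed = TRUE)          # first match only, -1 if none
    if (m == -1) return(c(NA_integer_, NA_integer_))
    start <- as.integer(m)
    c(start, start + attr(m, "match.length") - 1L)
  }, integer(2)))
  colnames(out) <- c("start", "end")
  out
}

x <- locate_first("two1nine", valid)
first <- which.min(x[, "start"])   # pattern whose match starts earliest (NAs ignored)
last  <- which.max(x[, "end"])     # pattern whose match ends latest
cal.val <- as.numeric(paste0(valid.num[first], valid.num[last]))
cal.val
# [1] 29
```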

So that gives me the correct value for the first line. Now I need to apply this to each row. It’s late and I’m tired, so I’m gonna loop it.

n <- length(exa2)
new.cal.vals <- rep(0, n)
for(i in 1:n){
    x <- stri_locate_first_regex(exa2[i], valid)
    first <- which.min(x[,1])
    last  <- which.max(x[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
cbind(exa2, new.cal.vals)
      exa2               new.cal.vals
 [1,] "two1nine"         "29"        
 [2,] "eightwothree"     "83"        
 [3,] "abcone2threexyz"  "13"        
 [4,] "xtwone3four"      "24"        
 [5,] "4nineeightseven2" "42"        
 [6,] "zoneight234"      "14"        
 [7,] "7pqrstsixteen"    "76"        
 [8,] "7eight7"          "78"        
 [9,] "3stuffthree"      "33"        
sum(new.cal.vals)
[1] 392

Hooray! The first seven rows match the example solution (they sum to 281). Now to do this for my actual data.

rm(new.cal.vals, x, first, last, i) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
    x <- stri_locate_first_regex(input[i], valid)
    first <- which.min(x[,1])
    last  <- which.max(x[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] 54759

❌ Nope. I realized I was only using stri_locate_first_regex, and the key word there is "first": it only reports the first match of each pattern, so I was missing repeated digits. I added a couple of test cases, 7eight7 and 3stuffthree, and sure enough the last 7 wasn’t getting caught.
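Base R draws the same first-versus-all distinction: regexpr() reports only the first match of a pattern, while gregexpr() reports all of them. On the new 7eight7 test case:

```r
s <- "7eight7"

# regexpr(): first match only, like the stringi *_first_* functions
first.only <- as.integer(regexpr("7", s, fixed = TRUE))
first.only
# [1] 1

# gregexpr(): every match, like the stringi *_all_* functions
every <- as.integer(gregexpr("7", s, fixed = TRUE)[[1]])
every
# [1] 1 7
```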

n <- length(exa2)
new.cal.vals <- rep(0, n)
for(i in 1:n){
    x <- stri_locate_all_regex(exa2[i], valid)
    y <- unlist(x) |> matrix(ncol=2, byrow=TRUE) # _all_ returns a list of matrices; flatten it into one
    first <- which.min(y[,1])
    last  <- which.max(y[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
cbind(exa2, new.cal.vals)
      exa2               new.cal.vals
 [1,] "two1nine"         "29"        
 [2,] "eightwothree"     "83"        
 [3,] "abcone2threexyz"  "13"        
 [4,] "xtwone3four"      "24"        
 [5,] "4nineeightseven2" "42"        
 [6,] "zoneight234"      "14"        
 [7,] "7pqrstsixteen"    "76"        
 [8,] "7eight7"          "77"        
 [9,] "3stuffthree"      "33"        
sum(new.cal.vals)
[1] 391

Looks promising.

rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
    x <- stri_locate_all_regex(input[i], valid)
    y <- unlist(x) |> matrix(ncol=2, byrow=TRUE) 
    first <- which.min(y[,1])
    last  <- which.max(y[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] NA

Uhm… if some rows had no digits, this should have failed earlier…

is.miss <- which(is.na(new.cal.vals))
input[is.miss][1:5]
[1] "9sixqnine9jk9six"             "58qtpqqz58888cmhs"           
[3] "88trnvjtqsmseight8"           "962sixoneonectfgpknl8nine"   
[5] "twotwosevenvkzzhrpgninecqvf9"

Yeah… no, those definitely have digits. Well, heck. What is my function doing?

(x <- stri_locate_all_regex(input[is.miss[1]], valid))
[[1]]
     start end
[1,]    NA  NA

[[2]]
     start end
[1,]    NA  NA

[[3]]
     start end
[1,]    NA  NA

[[4]]
     start end
[1,]    NA  NA

[[5]]
     start end
[1,]    NA  NA

[[6]]
     start end
[1,]     2   4
[2,]    14  16

[[7]]
     start end
[1,]    NA  NA

[[8]]
     start end
[1,]    NA  NA

[[9]]
     start end
[1,]     6   9

[[10]]
     start end
[1,]    NA  NA

[[11]]
     start end
[1,]    NA  NA

[[12]]
     start end
[1,]    NA  NA

[[13]]
     start end
[1,]    NA  NA

[[14]]
     start end
[1,]    NA  NA

[[15]]
     start end
[1,]    NA  NA

[[16]]
     start end
[1,]    NA  NA

[[17]]
     start end
[1,]    NA  NA

[[18]]
     start end
[1,]     1   1
[2,]    10  10
[3,]    13  13
(y <- unlist(x) |> matrix(ncol=2, byrow=TRUE))
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]   NA   NA
 [4,]   NA   NA
 [5,]   NA   NA
 [6,]    2   14
 [7,]    4   16
 [8,]   NA   NA
 [9,]   NA   NA
[10,]    6    9
[11,]   NA   NA
[12,]   NA   NA
[13,]   NA   NA
[14,]   NA   NA
[15,]   NA   NA
[16,]   NA   NA
[17,]   NA   NA
[18,]   NA   NA
[19,]    1   10
[20,]   13    1
[21,]   10   13

Yeah… repeated matches of the same pattern make for additional rows in the matrix, so the rows no longer line up with valid. But then why didn’t it mess up with my examples? 🤔
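Here is a small sketch of what went wrong, reusing the two "six" matches from the output above. unlist() reads a matrix column by column, so refilling it with byrow = TRUE pairs the wrong numbers, and every extra match adds a row, so row i no longer corresponds to valid[i]. For this string which.min() lands on row 19, but valid.num only has 18 entries. As far as I can tell, the examples got lucky: in 7eight7 the only repeated pattern is the single-character 7, whose start and end coincide, so both scrambled rows still read (1, 7); and since it is the 16th of 18 patterns, the extra row only displaces the empty rows for 8 and 9.

```r
# The two matches of "six" in "9sixqnine9jk9six", as a start/end matrix
six <- matrix(c(2L, 14L, 4L, 16L), ncol = 2,
              dimnames = list(NULL, c("start", "end")))

# R stores a matrix column by column, so unlist() gives starts, then ends
flat <- as.vector(unlist(list(six)))
flat
# [1]  2 14  4 16

# Refilling by row pairs the wrong numbers and adds a second row for "six"
scrambled <- matrix(flat, ncol = 2, byrow = TRUE)
scrambled[1, ]
# [1]  2 14

# With 21 rows for 18 patterns, a row index can point past the end of valid.num
valid.num <- c(1:9, 1:9)
valid.num[19]
# [1] NA
suppressWarnings(as.numeric(paste0(valid.num[19], valid.num[7])))
# [1] NA
```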

Okay, well, let’s use head and tail via lapply to pull the first and last rows out of each list element.

Code
x <- stri_locate_all_regex(input[is.miss[1]], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last  <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last  <- which.max(x.last[,2])
as.numeric(paste0(valid.num[first], valid.num[last]))
[1] 96
input[is.miss[1]]
[1] "9sixqnine9jk9six"
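A toy version of that fix, with start/end values copied from the output above plus one pattern that has no match, shows why it works: head(m, 1) is the earliest match of a pattern and tail(m, 1) the latest, so each pattern contributes exactly one row and row i still lines up with valid[i].

```r
# A toy list shaped like stri_locate_all_regex() output: one matrix per pattern
x <- list(
  matrix(c(NA_integer_, NA_integer_), ncol = 2),    # pattern with no match
  matrix(c(2L, 14L, 4L, 16L), ncol = 2),            # matched twice ("six")
  matrix(c(1L, 10L, 13L, 1L, 10L, 13L), ncol = 2)   # matched three times ("9")
)

# head()/tail() keep exactly one row per pattern, so rows stay aligned
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol = 2, byrow = TRUE)
x.last  <- lapply(x, tail, 1) |> unlist() |> matrix(ncol = 2, byrow = TRUE)
x.first   # earliest match per pattern: (NA, NA), (2, 4), (1, 1)
x.last    # latest match per pattern:   (NA, NA), (14, 16), (13, 13)
```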

Third time is the charm? 🤞

rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
    x <- stri_locate_all_regex(input[i], valid)
    x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
    x.last  <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
    first <- which.min(x.first[,1])
    last  <- which.max(x.last[,2])
    new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] 54824
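For comparison, here is a base-R sketch of the same earliest-start / latest-end idea using gregexpr(), which returns every match of a pattern. It is my own rewrite rather than the code that produced the answer above, cal_value() is a hypothetical name, and it assumes every line contains at least one valid digit.

```r
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9, 1:9)

# Calibration value for one string: take the pattern whose earliest match starts
# first and the pattern whose latest match ends last (patterns matched literally).
# Assumes the string contains at least one valid digit.
cal_value <- function(s) {
  starts <- ends <- rep(NA_integer_, length(valid))
  for (j in seq_along(valid)) {
    m <- gregexpr(valid[j], s, fixed = TRUE)[[1]]   # all match positions, or -1
    if (m[1] != -1) {
      starts[j] <- min(m)
      ends[j]   <- max(m + attr(m, "match.length") - 1L)
    }
  }
  as.numeric(paste0(valid.num[which.min(starts)], valid.num[which.max(ends)]))
}

exa2 <- c("two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
          "4nineeightseven2", "zoneight234", "7pqrstsixteen", "7eight7", "3stuffthree")
vals <- vapply(exa2, cal_value, numeric(1), USE.NAMES = FALSE)
vals
# [1] 29 83 13 24 42 14 76 77 33
sum(vals)
# [1] 391
```

Swapping exa2 for input runs it on the puzzle data.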

⭐⭐

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 11 x64 (build 22000)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/Los_Angeles
 date     2023-12-02
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package    * version    date (UTC) lib source
 aochelpers * 0.1.0.9000 2023-11-30 [1] Github (EllaKaye/aochelpers@c2afc01)
 stringi    * 1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr    * 1.5.0      2022-12-02 [1] CRAN (R 4.3.1)

 [1] C:/Users/renta/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.1/library

──────────────────────────────────────────────────────────────────────────────