library(aochelpers) # for data loading
library(stringr); library(stringi)
input <- aoc_input_vector(1, 2023)
December 1, 2023
On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9,1:9)
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_all_regex(input[i], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last <- which.max(x.last[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] 54824
Remove all letters from the string, then grab the first and last digit.
nums first.num last.num
[1,] "9" "9" "9"
[2,] "57" "5" "7"
[3,] "934" "9" "4"
[4,] "83858" "8" "8"
Paste together to create the two-digit calibration number.
nums first.num last.num calibration.value
[1,] "9" "9" "9" "99"
[2,] "57" "5" "7" "57"
[3,] "934" "9" "4" "94"
[4,] "83858" "8" "8" "88"
Submission value:
⭐
Example Data
Okay, so let’s define a list of valid ‘digits’, and their corresponding numeric values.
And find where in the strings these valid values are located. stri_locate_first_regex finds the indexes (positions) of the first match to a pattern. I learned about this function from Gus Lipkin’s solution
So that gives me the correct first value. Now I need to apply this to each row. It’s late, I’m tired so I’m gonna loop it.
exa2 new.cal.vals
[1,] "two1nine" "29"
[2,] "eightwothree" "83"
[3,] "abcone2threexyz" "13"
[4,] "xtwone3four" "24"
[5,] "4nineeightseven2" "42"
[6,] "zoneight234" "14"
[7,] "7pqrstsixteen" "76"
[8,] "7eight7" "78"
[9,] "3stuffthree" "33"
[1] 392
Hooray! This matches the example solution. Now to do this for my actual data.
[1] 54759
❌ Nope. I realized I was only using stri_locate_first_regex; the key word is first, so I was missing repeated digits. I added a couple of test cases, 7eight7 and 3stuffthree, and sure enough the last 7 wasn’t getting caught.
n <- length(exa2)
new.cal.vals <- rep(0, n)
for(i in 1:n){
x <- stri_locate_all_regex(exa2[i], valid)
y <- unlist(x) |> matrix(ncol=2, byrow=TRUE) #the output of _all_ was different
first <- which.min(y[,1])
last <- which.max(y[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
cbind(exa2, new.cal.vals)
exa2 new.cal.vals
[1,] "two1nine" "29"
[2,] "eightwothree" "83"
[3,] "abcone2threexyz" "13"
[4,] "xtwone3four" "24"
[5,] "4nineeightseven2" "42"
[6,] "zoneight234" "14"
[7,] "7pqrstsixteen" "76"
[8,] "7eight7" "77"
[9,] "3stuffthree" "33"
[1] 391
Looks promising.
rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_all_regex(input[i], valid)
y <- unlist(x) |> matrix(ncol=2, byrow=TRUE)
first <- which.min(y[,1])
last <- which.max(y[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] NA
Uhm…if there were some rows without numbers, this should have failed earlier…
[1] "9sixqnine9jk9six" "58qtpqqz58888cmhs"
[3] "88trnvjtqsmseight8" "962sixoneonectfgpknl8nine"
[5] "twotwosevenvkzzhrpgninecqvf9"
Yea… no those for sure have numbers. Well heck. What is my function doing?
[[1]]
start end
[1,] NA NA
[[2]]
start end
[1,] NA NA
[[3]]
start end
[1,] NA NA
[[4]]
start end
[1,] NA NA
[[5]]
start end
[1,] NA NA
[[6]]
start end
[1,] 2 4
[2,] 14 16
[[7]]
start end
[1,] NA NA
[[8]]
start end
[1,] NA NA
[[9]]
start end
[1,] 6 9
[[10]]
start end
[1,] NA NA
[[11]]
start end
[1,] NA NA
[[12]]
start end
[1,] NA NA
[[13]]
start end
[1,] NA NA
[[14]]
start end
[1,] NA NA
[[15]]
start end
[1,] NA NA
[[16]]
start end
[1,] NA NA
[[17]]
start end
[1,] NA NA
[[18]]
start end
[1,] 1 1
[2,] 10 10
[3,] 13 13
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] 2 14
[7,] 4 16
[8,] NA NA
[9,] NA NA
[10,] 6 9
[11,] NA NA
[12,] NA NA
[13,] NA NA
[14,] NA NA
[15,] NA NA
[16,] NA NA
[17,] NA NA
[18,] NA NA
[19,] 1 10
[20,] 13 1
[21,] 10 13
Yea.. duplicate values of the same number make for additional rows in the matrix. But then why didn’t it mess up with my examples? 🤔
Okay well let’s use head and tail via lapply to pull the first and last rows out of each list element.
x <- stri_locate_all_regex(input[is.miss[1]], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last <- which.max(x.last[,2])
as.numeric(paste0(valid.num[first], valid.num[last]))
[1] 96
[1] "9sixqnine9jk9six"
Third time is the charm? 🤞
rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_all_regex(input[i], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last <- which.max(x.last[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
[1] 54824
⭐⭐
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16 ucrt)
os Windows 11 x64 (build 22000)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/Los_Angeles
date 2023-12-02
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
aochelpers * 0.1.0.9000 2023-11-30 [1] Github (EllaKaye/aochelpers@c2afc01)
stringi * 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)
[1] C:/Users/renta/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.1/library
──────────────────────────────────────────────────────────────────────────────
---
title: "2023: Day 1"
date: 2023-12-1
categories:
- regex
- loop
- stringr
- stringi
- peeked
draft: false
---
# Setup
[The original challenge](https://adventofcode.com/2023/day/1)
```{r}
library(aochelpers) # for data loading
library(stringr); library(stringi)
input <- aoc_input_vector(1, 2023)
```
# TL;DR: Solutions
## Part 1 ⭐
::: {.callout-danger}
### ❓ What is my calibration number?
On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
:::
```{r}
nums <- gsub("[a-z]", "", input)
first.num <- str_extract(nums, "^.")
last.num <- str_extract(nums, ".$")
as.numeric(paste0(first.num, last.num)) |> sum()
```
## Part 2 ⭐⭐
::: {.callout-danger}
### ❓ What is my _actual_ calibration number?
> Your calculation isn't quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".
:::
```{r}
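# Patterns to search for: spelled-out digits first, then the literal digits
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9,1:9) # numeric value of each pattern, in the same order as `valid`
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
  # one start/end matrix per pattern: one row per match, a single NA row if no match
  x <- stri_locate_all_regex(input[i], valid)
  # keep only the first and the last match of each pattern (18 rows each)
  x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
  x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
  first <- which.min(x.first[,1]) # pattern whose first match starts earliest
  last <- which.max(x.last[,2]) # pattern whose last match ends latest
  new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)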
```
# Walkthrough / Explainer
## Part 1
:::{.callout-exa icon=true}
Example Data
```{r}
exa <- c("1abc2","pqr3stu8vwx","a1b2c3d4e5f","treb7uchet")
```
:::
Remove all letters from the string, then grab the first and last digit.
```{r}
nums <- gsub("[a-z]", "", input)
first.num <- str_extract(nums, "^.")
last.num <- str_extract(nums, ".$")
cbind(nums, first.num, last.num)[c(1,4,300,304),] # verify
```
Paste together to create the two-digit calibration number.
```{r}
calibration.value <- as.numeric(paste0(first.num, last.num))
cbind(nums, first.num, last.num, calibration.value)[c(1,4,300,304),]
```
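As a quick sanity check, the same steps can be run on the `exa` vector from the example data. This is only a minimal base-R sketch: it swaps `str_extract` for `substr` (my substitution, not what the post uses) and the `exa.*` names are made up for illustration.

```r
# Example strings from the puzzle description (same as `exa` above)
exa <- c("1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet")

# Drop the lowercase letters so only digits remain
exa.nums <- gsub("[a-z]", "", exa)

# First and last character of each digit-only string
exa.first <- substr(exa.nums, 1, 1)
exa.last <- substr(exa.nums, nchar(exa.nums), nchar(exa.nums))

# Paste into a two-digit number per line
exa.cal <- as.numeric(paste0(exa.first, exa.last))
exa.cal      # 12 38 15 77
sum(exa.cal) # 142
```

Note that a line with a single digit, like `treb7uchet`, uses that digit twice.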
Submission value:
```{r}
sum(calibration.value)
```
⭐
## Part 2
:::{.callout-exa icon=true}
Example Data
```{r}
exa2 <- c("two1nine",
"eightwothree",
"abcone2threexyz",
"xtwone3four",
"4nineeightseven2",
"zoneight234",
"7pqrstsixteen",
"7eight7", # 2 new test cases
"3stuffthree")
```
:::
Okay, so let's define a list of valid 'digits', and their corresponding numeric values.
```{r}
valid <- c("one","two","three","four","five","six","seven","eight","nine", as.character(1:9))
valid.num <- c(1:9,1:9)
```
And find where in the strings these valid values are located.
`stri_locate_first_regex` finds the indexes (positions) of the first match to a pattern.
I learned about this function from [Gus Lipkin's solution](https://adventofcode.guslipkin.me/2023/01/2023-01)
```{r}
x <- stri_locate_first_regex(exa2[1], valid)
first <- which.min(x[,1])
last <- which.max(x[,2])
cal.val <- as.numeric(paste0(valid.num[first], valid.num[last]))
```
So that gives me the correct first value. Now I need to apply this to each row. It's late, I'm tired so I'm gonna loop it.
```{r}
n <- length(exa2)
new.cal.vals <- rep(0, n)
for(i in 1:n){
x <- stri_locate_first_regex(exa2[i], valid)
first <- which.min(x[,1])
last <- which.max(x[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
cbind(exa2, new.cal.vals)
sum(new.cal.vals)
```
Hooray! This matches the example solution. Now to do this for my actual data.
```{r}
rm(new.cal.vals, x, first, last, i) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_first_regex(input[i], valid)
first <- which.min(x[,1])
last <- which.max(x[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
```
❌ Nope. I realized I was only using `stri_locate_first_regex`; the key word is **first**, so I was missing repeated digits. I added a couple of test cases, `7eight7` and `3stuffthree`, and sure enough the last 7 wasn't getting caught.
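The difference between "first match" and "all matches" can be seen in isolation with a base-R analogue. This sketch uses `regexpr` and `gregexpr` rather than the stringi functions from this post, purely to illustrate the idea on one test string.

```r
s <- "7eight7"

# regexpr reports only the first position of the pattern
first_only <- as.integer(regexpr("7", s, fixed = TRUE))
first_only # 1

# gregexpr reports every position, including the trailing 7
all_hits <- as.integer(gregexpr("7", s, fixed = TRUE)[[1]])
all_hits   # 1 7
```

With only the first position available, the 7 at position 7 can never be picked as the last digit, so `eight` (ending at position 6) wins instead.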
```{r}
n <- length(exa2)
new.cal.vals <- rep(0, n)
for(i in 1:n){
x <- stri_locate_all_regex(exa2[i], valid)
y <- unlist(x) |> matrix(ncol=2, byrow=TRUE) #the output of _all_ was different
first <- which.min(y[,1])
last <- which.max(y[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
cbind(exa2, new.cal.vals)
sum(new.cal.vals)
```
Looks promising.
```{r}
rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_all_regex(input[i], valid)
y <- unlist(x) |> matrix(ncol=2, byrow=TRUE)
first <- which.min(y[,1])
last <- which.max(y[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
```
Uhm...if there were some rows without numbers, this should have failed earlier...
```{r}
is.miss <- which(is.na(new.cal.vals))
input[is.miss][1:5]
```
Yea... no those for sure have numbers. Well heck. What is my function doing?
```{r}
(x <- stri_locate_all_regex(input[is.miss[1]], valid))
(y <- unlist(x) |> matrix(ncol=2, byrow=TRUE))
```
Yea.. duplicate values of the same number make for additional rows in the matrix. But then why didn't it mess up with my examples? 🤔
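One possible explanation, sketched with hand-built data rather than a real `stri_locate_all_regex` call: `unlist()` flattens each matrix column by column, so a two-row element comes out as start, start, end, end. Refilling with `byrow=TRUE` then scrambles the pairs and adds extra rows, which pushes later patterns past the end of `valid.num`. The `six` matrix below copies the positions printed above for `9sixqnine9jk9six`.

```r
# Locations of "six" in "9sixqnine9jk9six": rows (2,4) and (14,16)
six <- matrix(c(2, 14, 4, 16), ncol = 2,
              dimnames = list(NULL, c("start", "end")))

# unlist() walks the matrix column-major: both starts, then both ends
flat <- unlist(list(six))
flat # 2 14 4 16

# Refilling by row pairs start with start and end with end
bad <- matrix(flat, ncol = 2, byrow = TRUE)
bad  # rows (2,14) and (4,16), matching rows 6 and 7 of the matrix above

# Extra rows mean row indexes can exceed the 18 entries of valid.num
valid.num <- c(1:9, 1:9)
valid.num[19] # NA
```

In `7eight7` the only repeated pattern is `"7"` (row 16), and the scrambled row (1, 7) still has the smallest start and the largest end, so both lookups land on row 16 and return 7. The examples appear to have passed by luck.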
Okay well let's use `head` and `tail` via `lapply` to pull the first and last rows out of each list element.
```{r}
x <- stri_locate_all_regex(input[is.miss[1]], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last <- which.max(x.last[,2])
as.numeric(paste0(valid.num[first], valid.num[last]))
input[is.miss[1]]
```
Third time is the charm? 🤞
```{r}
rm(new.cal.vals, x, first, last, i, y, N) # just cos
N <- length(input)
new.cal.vals <- rep(0, N)
for(i in 1:N){
x <- stri_locate_all_regex(input[i], valid)
x.first <- lapply(x, head, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
x.last <- lapply(x, tail, 1) |> unlist() |> matrix(ncol=2, byrow=TRUE)
first <- which.min(x.first[,1])
last <- which.max(x.last[,2])
new.cal.vals[i] <- as.numeric(paste0(valid.num[first], valid.num[last]))
}
sum(new.cal.vals)
```
⭐⭐
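For comparison, here is a loop-free sketch in base R. It is not the approach used in this post: it finds the first token with `regexpr`, then finds the last token by reversing both the string and the patterns. The names `words`, `lookup`, `rev_str`, and `calibration` are made up for illustration, and the function assumes every line contains at least one valid digit.

```r
words <- c("one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine")

# Map each token (word or literal digit) to its numeric value
lookup <- setNames(c(1:9, 1:9), c(words, as.character(1:9)))

# Reverse each string in a character vector
rev_str <- function(s) {
  vapply(strsplit(s, ""), function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Forward pattern, and the same alternatives spelled backwards
pat <- paste(c(words, "[1-9]"), collapse = "|")
pat_rev <- paste(c(rev_str(words), "[1-9]"), collapse = "|")

calibration <- function(x) {
  # leftmost token in the original string
  first <- regmatches(x, regexpr(pat, x))
  # leftmost token in the reversed string is the last token of the original
  x_rev <- rev_str(x)
  last <- rev_str(regmatches(x_rev, regexpr(pat_rev, x_rev)))
  10 * unname(lookup[first]) + unname(lookup[last])
}

exa2 <- c("two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
          "4nineeightseven2", "zoneight234", "7pqrstsixteen",
          "7eight7", "3stuffthree")
calibration(exa2)      # 29 83 13 24 42 14 76 77 33
sum(calibration(exa2)) # 391
```

Searching the reversed string sidesteps overlapping words such as `twone`, since each search only needs its own leftmost match.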
##### Session info {.appendix}
<details><summary>Toggle</summary>
```{r}
#| echo: false
sessioninfo::session_info(pkgs = "attached")
```
</details>