10.5 Scrape PDFs

You can extract text from a PDF with the pdftools package, as shown below. Note that I set eval = FALSE on this chunk because of issues with GitHub Actions, so the code is not run when the book is built.

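library(pdftools)
# Read the word-level layout of every page in the PDF
d <- pdf_data("https://www.omegatiming.com/File/0001170100020304FFFFFFFFFFFFFF02.pdf")
is(d)         # class of the result
length(d)     # number of pages
head(d[[1]])  # first few words on page 1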

pdf_data() returns a list of data frames, one per page. Each row of a data frame is one word on that page, together with its (x, y) location.
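
Because each word comes with its coordinates, the words on a page can be grouped back into lines of text. The following is a minimal sketch in base R, not part of pdftools: words_to_lines() and its tol argument are hypothetical names, and it assumes that words on the same line have y values within tol units of each other. The small data frame is made-up sample data with the x, y, and text columns that pdf_data() provides, so the example runs without downloading a PDF.

```r
# Hypothetical helper (not part of pdftools): rebuild lines of text
# from one page of pdf_data() output.
# Assumption: words on the same line have y values within `tol` of each other.
words_to_lines <- function(page, tol = 2) {
  if (nrow(page) == 0) return(character(0))
  # Sort words from top to bottom
  page <- page[order(page$y), ]
  # Start a new line whenever y jumps by more than `tol`
  page$line <- cumsum(c(TRUE, diff(page$y) > tol))
  # Within each line, sort words from left to right
  page <- page[order(page$line, page$x), ]
  # Paste the words of each line together
  unname(vapply(split(page$text, page$line), paste, character(1), collapse = " "))
}

# Made-up sample data shaped like one element of the pdf_data() result
page <- data.frame(
  x = c(50, 10, 10, 60),
  y = c(100, 101, 120, 120),
  text = c("world", "Hello", "second", "line"),
  stringsAsFactors = FALSE
)

words_to_lines(page)
# [1] "Hello world" "second line"
```

With a real PDF, you could apply it to every page with lapply(d, words_to_lines). The right value of tol depends on the document's font size and layout, so expect to adjust it.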