|Q: I hate that ScrollScraper was down so long! Please explain the bible.ort.org image naming convention, in case you get hit by a bus or something. :-)
In the original version of ScrollScraper, the software traversed the bible.ort.org website and figured out on the fly which verses it needed to include. It wasn't very important to understand precisely how ORT named the underlying GIF images that contain the Torah readings. Each of these GIF files contains three lines of Hebrew text.
In light of the demise of bible.ort.org in late 2022, it became much more important to understand the ORT convention. What follows is a reverse-engineered interpretation of the filenames, which I believe to be correct: ccvvqxyz.gif, where:
|Q: Explain the Torah image map
From its inception c. 2005 through 2022, ScrollScraper worked by retrieving successive pages from the bible.ort.org website and then assembling a Torah reading from the associated GIFs. It could also examine those GIF images to figure out which sections were dark blue and which were light blue, and thereby estimate which sections to hide by shading at the beginning and end of a reading.
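If you're curious how that kind of color test might work, here is a minimal Python sketch of one possible approach. It assumes the GIF has already been decoded into rows of RGB tuples. The two brightness thresholds and all of the helper names are hypothetical, chosen for illustration. They are not taken from ScrollScraper or from ORT's actual palette.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical thresholds for illustration; ORT's real palette values may differ.
BACKGROUND_MIN_LUMA = 240.0  # pixels at least this bright count as background
DARK_MAX_LUMA = 120.0        # ink darker than this counts as dark blue


def luma(rgb: RGB) -> float:
    """Perceived brightness using the ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def classify_columns(rows: Sequence[Sequence[RGB]]) -> List[Optional[str]]:
    """Label each pixel column 'dark', 'light', or None (background only).

    `rows` is the decoded image as a list of pixel rows in left-to-right order.
    """
    labels: List[Optional[str]] = []
    for x in range(len(rows[0])):
        # Keep only the non-background (ink) pixels in this column.
        ink = [v for v in (luma(row[x]) for row in rows) if v < BACKGROUND_MIN_LUMA]
        if not ink:
            labels.append(None)
        elif min(ink) < DARK_MAX_LUMA:
            labels.append("dark")
        else:
            labels.append("light")
    return labels


def dark_span(labels: Sequence[Optional[str]]) -> Optional[Tuple[int, int]]:
    """First and last column index holding dark ink, or None if there is none."""
    dark = [x for x, label in enumerate(labels) if label == "dark"]
    return (dark[0], dark[-1]) if dark else None
```

The column indices are in image (left-to-right) order. For Hebrew, the start of the highlighted reading is therefore at the higher index of the returned span.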
Following the demise of bible.ort.org, we've computed a global map for the entire Torah. It knows the start and end coordinates of each Torah verse, and it also knows about the white space between verses, and even within a verse.
Given that information, the lengths of those segments, and which segments belong to which verse, it's not difficult to interpolate how to partition the (TrueType) Hebrew text of each verse among those segments. Then, if you place each Hebrew fragment in the same position as its corresponding GIF fragment, you've solved ScrollScraper's TrueType problem: rendering each reading in crisp TrueType text, in the same layout as the original GIFs. Now the output is as clear as the hardcopy Tikkun sitting on your bookshelf.
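The interpolation step can be sketched as follows. This is a minimal Python illustration, not ScrollScraper's actual code, and it makes three assumptions. First, character count is a reasonable proxy for rendered width. Second, each segment should receive a share of the verse proportional to its pixel width. Third, each word goes to whichever segment its midpoint falls in. The function name and the segment widths in the example are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def partition_verse(words: Sequence[str], widths: Sequence[float]) -> List[List[str]]:
    """Distribute a verse's words among its image segments, in reading order.

    Assumes rendered text length is roughly proportional to character count,
    and that each segment gets a share of the verse proportional to its width.
    """
    if not widths or sum(widths) <= 0:
        raise ValueError("a verse needs at least one segment of positive width")
    total_width = sum(widths)
    # Cumulative trailing boundary of each segment, as a fraction of the verse.
    bounds = [w / total_width for w in accumulate(widths)]
    # Count each word's letters plus one for the space that follows it.
    sizes = [len(word) + 1 for word in words]
    total_chars = sum(sizes)
    parts: List[List[str]] = [[] for _ in widths]
    pos = 0
    for word, size in zip(words, sizes):
        centre = (pos + size / 2) / total_chars  # fractional position of the word's middle
        seg = min(bisect_right(bounds, centre), len(widths) - 1)
        parts[seg].append(word)
        pos += size
    return parts


if __name__ == "__main__":
    # Genesis 1:1 with two illustrative segment widths (300 and 144 pixels).
    verse = "בראשית ברא אלהים את השמים ואת הארץ".split()
    print(partition_verse(verse, [300, 144]))
```

With these illustrative widths, the first five words land in the 300-pixel segment, and ואת הארץ land in the 144-pixel one. Assigning whole words by their midpoints keeps every word intact and in reading order. A real implementation might refine the split using actual font metrics.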
For example, consider one image's worth of data from Exodus (the complete Torah comprises 6938 such images):
That's not very human-readable, so let's examine it in tabular form. Note that the coordinate system runs from right (0) to left (444), because we're dealing with Hebrew:
We can also view it as a graphic, next to the original ORT GIF:
Here's how that verse looks with TrueType fonts. Try zooming in with your web browser, or printing a copy, and compare the grainy left side with the clear right side.
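To make the right-to-left coordinates concrete, here is a small Python sketch of a single image-map segment record. The field names, the record layout, and the use of CSS percentage offsets are all assumptions for illustration. The sketch also treats 444 as the full image width, based only on the coordinate range quoted above.

```python
from dataclasses import dataclass

IMAGE_WIDTH = 444  # coordinates run from 0 (right edge) to 444 (left edge)


@dataclass
class Segment:
    """One stretch of one verse on one line of an ORT GIF (hypothetical record)."""
    verse: str   # illustrative label such as "Exodus 1:1"
    line: int    # 0, 1 or 2: each GIF holds three lines of Hebrew text
    start: int   # right-hand edge, measured leftward from the image's right side
    end: int     # left-hand edge, in the same right-to-left coordinate system

    @property
    def width(self) -> int:
        return self.end - self.start

    def left_x(self) -> int:
        """Left edge converted to a conventional left-origin x coordinate."""
        return IMAGE_WIDTH - self.end

    def css_box(self) -> dict:
        """Percentage offsets for absolutely positioning text within its line."""
        return {
            "right": f"{100 * self.start / IMAGE_WIDTH:.2f}%",
            "width": f"{100 * self.width / IMAGE_WIDTH:.2f}%",
        }
```

Anchoring each fragment with a CSS right offset keeps the right-to-left origin intact, so laying out Hebrew requires no coordinate flipping.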
|Q: What's the most amazing technical factoid about ScrollScraper?
IMHO the most amazing thing is that the "Torah image map" and all the other resources pre-computed before ScrollScraper runs are derived solely from the ORT GIF images, their filenames, and the reverse-engineered file naming convention described above.
There's a special case in the code for the smaller TrueType fonts required for Shirat Hayam (Song of the Sea) and Deuteronomy 32. There are also a handful of hand-curated tweaks for a few verses in Shirat Hayam, which adjust the aforementioned Torah image map. But that's it.
|Q: How can I fiddle with ScrollScraper on my own computer, and make code changes and technical suggestions?
ScrollScraper is now a Dockerized application, so all you need is a Docker environment, such as Docker Desktop, installed on your computer. Once you've installed Docker and downloaded or git-cloned the ScrollScraper source code repo, you can run
docker build -t scrollscraper .
to build the ScrollScraper Docker image. (The first build takes about half an hour; subsequent rebuilds are much faster.) Then you can docker run that image and docker exec into the resulting container to start experimenting.
Once you've exec'd into the container, you can run: