Reverse Engineering using Chrome
As you may have guessed already, I am a curious person: I always want to know how things work. That’s why today we are going to debug “The New York Public Library” gallery. Using the Chrome browser, we will find out how the detailed preview is implemented. After that, we will write a program that fetches the chosen image in better quality. Deal?
This example came about only because I was searching for a header image for my About page. I was thinking: Hm… A photo of one of New York’s bridges being built would fit perfectly here, because I am an engineer and, figuratively speaking, I build bridges every day, right? Where can I get one, though?
I googled and found a website called “The New York Public Library: Digital Collections”. After a few minutes of staring at the photos, I found out that it doesn’t let you download this particular image in a good resolution. I was devastated. But wait a minute… Why can I preview it in good quality, then? I thought about Google, who didn’t let my Ruby gem live for long: their developers invented an algorithm that creates a special token to validate the request later. Then I thought about their backend and asked myself whether they had implemented this algorithm there as well, and if so, how they reuse it.
Anyway, I quickly looked at the Network tab of my browser to understand what it does every time I click on the detailed preview. The result wasn’t unexpected; I knew it had to fetch the image from somewhere. And yes, it was downloading a particular tile of the image every time I scrolled the page.
I looked it over and thought: Ah, okay. It makes sense… Although I don’t have Adobe Photoshop to stitch all of them together… And by the way, how do you guys split your image? There must be some rule behind it. Happily, at least for this photo, that turned out to be true. My assumption was correct: it seemed that by path I could request the particular part of the image that interested me most, or the tile of it, to be more precise.
So in the URL path /0/A/B.png, changing the value A changed the zoom of the picture, and changing B selected the distinct tile I wanted to download. After quickly scrolling through them, I understood that at zoom level 11 there are only 5 tiles horizontally (0 to 4) and 3 vertically (0 to 2). I decided to write a script.
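To make the scheme concrete, here is a small sketch that lists every tile path for one zoom level. The post only shows the /0/A/B.png shape, so the way B encodes a tile is my assumption: I pretend it is "column_row". The method names are also made up for illustration.

```ruby
# Builds the path of one tile.
# ASSUMPTION: the tile part of the path ("B") is written as "column_row";
# the real site may encode it differently.
def tile_path(zoom, column, row)
  "/0/#{zoom}/#{column}_#{row}.png"
end

# Lists the paths of a whole grid, column by column.
def grid_paths(zoom, columns, rows)
  (0...columns).flat_map do |column|
    (0...rows).map { |row| tile_path(zoom, column, row) }
  end
end
```

For the photo above, grid_paths(11, 5, 3) would give 15 paths, one per tile of the 5 by 3 grid.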
A quick note, though: of course I could have done all of this with just cURL or any other tool, but my idea was to cover the whole process, from debugging to automating the steps, and of course to share it with you. If you appreciate the time I have invested in this article, leave your likes and comments below. I would be very grateful.
We want our script, or program if you will, to work just like the browser, right? What can help us achieve this goal? Yes, it’s the net/http library. However, I don’t want to waste my time building an abstraction around Net::HTTP, so I’m going to use open-uri, a simple wrapper that saves me from writing many lines of code and lets me get back to watching Netflix with my girlfriend.
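Here is a minimal sketch of what fetching a single tile with open-uri could look like. The fetch_tile name and the opener parameter are mine, not part of the original script; the opener exists only so the method can be tried without touching the network. Returning nil on an HTTP error is also my assumption about how a missing tile would show up.

```ruby
require "open-uri"

# Downloads one tile and returns its bytes as a String.
# `opener` is a hypothetical hook that defaults to URI.open.
# ASSUMPTION: a missing tile answers with an HTTP error (for example 404),
# which open-uri raises as OpenURI::HTTPError; we turn that into nil.
def fetch_tile(url, opener: URI.method(:open))
  opener.call(url, &:read)
rescue OpenURI::HTTPError
  nil
end
```

Compared to Net::HTTP there is no connection or response object to manage here: the block form of URI.open hands us an IO-like object and we simply read it.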
So let’s implement the first parts of our program. First of all, the initialize method should take the id of the image and its dimensions as arguments. We could automate this part as well, to avoid entering the dimensions by hand every time, but I think I will go with the easiest way for now.
I define a private method that forms the right URL, so that our algorithm fetches the correct tile.
The rest of the file is the public save method of our class, which contains the algorithm that saves the images on our computer.
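The three pieces described above (the constructor, the private URL helper and the public save method) could be sketched roughly like this. It is not the original Step 1 code: the class name, BASE_URL, the "column_row" tile naming, the file names and the optional fetcher argument are all my assumptions, added so the example is self-contained.

```ruby
require "open-uri"
require "fileutils"

class TileDownloader
  # HYPOTHETICAL address; the real host and path prefix are not in the post.
  BASE_URL = "https://example.org/tiles"

  # id      - identifier of the image
  # columns - number of tiles horizontally
  # rows    - number of tiles vertically
  # fetcher - optional callable (url -> String); defaults to open-uri
  def initialize(id, columns:, rows:, zoom: 11, fetcher: nil)
    @id = id
    @columns = columns
    @rows = rows
    @zoom = zoom
    @fetcher = fetcher || ->(url) { URI.open(url, &:read) }
  end

  # Saves every tile as a separate file and returns the list of file paths.
  def save(directory = ".")
    FileUtils.mkdir_p(directory)
    0.upto(@columns - 1).flat_map do |column|
      0.upto(@rows - 1).map do |row|
        path = File.join(directory, "#{@id}_#{column}_#{row}.png")
        File.binwrite(path, @fetcher.call(url(column, row)))
        path
      end
    end
  end

  private

  # ASSUMPTION: tiles are addressed as "column_row" under /0/<zoom>/.
  def url(column, row)
    "#{BASE_URL}/#{@id}/0/#{@zoom}/#{column}_#{row}.png"
  end
end
```

With the numbers from this photo, TileDownloader.new("some-id", columns: 5, rows: 3).save("tiles") would try to write 15 files into the tiles directory.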
Easy! The first part of our program is complete. Let’s go ahead, execute it and see what happens. (Source code: Step 1)
Hold on! You said we were going to download the full picture and not some useless pieces of it! You may complain, and you would be right. But in fact, all that’s left is to merge the images together.
To do that, we are going to use the rmagick gem. I need Magick::ImageList.new for this, and depending on whether I want to combine the images vertically or horizontally, I pass the corresponding flag to the append method (true stacks the images top to bottom, false places them left to right).
In short, we combine the saved images vertically and then do the same horizontally. We also remove the leftovers from the previous steps, so at the end of this operation we get a single image named after the id we passed at the beginning.
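A rough sketch of that file-based merge, under my own assumptions: tiles are stored as id_column_row.png (a naming I made up), and the helper names are hypothetical. rmagick is required inside the method only so that the file-name helper can be used on a machine without the gem.

```ruby
# Groups the expected tile file names by column.
# ASSUMPTION: tiles were saved as "<id>_<column>_<row>.png".
def column_files(id, columns, rows, directory = ".")
  (0...columns).map do |column|
    (0...rows).map { |row| File.join(directory, "#{id}_#{column}_#{row}.png") }
  end
end

# Merges saved tiles into "<id>.png" and deletes the intermediate files.
def merge_saved_tiles(id, columns, rows, directory = ".")
  require "rmagick" # loaded lazily; needs the rmagick gem and ImageMagick

  strips = column_files(id, columns, rows, directory).each_with_index.map do |files, column|
    strip_path = File.join(directory, "#{id}_column_#{column}.png")
    # append(true) stacks the tiles of one column top to bottom
    Magick::ImageList.new(*files).append(true).write(strip_path)
    File.delete(*files)
    strip_path
  end

  result_path = File.join(directory, "#{id}.png")
  # append(false) places the column strips side by side, left to right
  Magick::ImageList.new(*strips).append(false).write(result_path)
  File.delete(*strips)
  result_path
end
```

The order matters: each column is stacked first, and only then are the resulting strips joined horizontally.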
This implementation is ugly, first of all because it doesn’t make any sense to store all the files and then remove them, although it did help me debug the process.
So let’s remove the download method call from the line where we used to save the file. Instead of loading a file, we can pass the downloaded image as a String to the from_blob method. We do exactly the same for the images that were combined vertically: it’s enough to collect the Array of image_list.append(true) results and then combine those vertical results horizontally to get the whole picture.
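The in-memory variant could look something like the sketch below. It is my reading of the description, not the original Step 2 code; the stitch name and the shape of its argument (an Array of columns, each an Array of downloaded Strings) are assumptions.

```ruby
# Stitches tiles without touching the disk.
# blobs_by_column - Array of columns; each column is an Array of image
#                   Strings (blobs) ordered top to bottom. (Hypothetical shape.)
# Returns a Magick::Image with the whole picture.
def stitch(blobs_by_column)
  require "rmagick" # loaded lazily; needs the rmagick gem and ImageMagick

  strips = blobs_by_column.map do |blobs|
    # from_blob turns each String into an image inside the list;
    # append(true) then stacks them top to bottom
    Magick::ImageList.new.from_blob(*blobs).append(true)
  end

  picture = Magick::ImageList.new
  strips.each { |strip| picture << strip }
  picture.append(false) # join the column strips left to right
end
```

The result can then be written once, for example with stitch(columns).write("picture.png"), so no temporary files are created or cleaned up.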
Let’s take a look at what we’ve got now. (Source code: Step 2)
And voilà! Thanks to computer science, we were actually able to work through all the steps backwards to get the result. Cool…
Nevertheless, what could be better? I think it is the main algorithm that downloads the tiles. It would be a lot nicer to run it in a loop and save the images without knowing the number of tiles. All I want is to pass the zoom level N and patiently wait…
That’s why we get rid of the upto iterator here and go with an infinite loop that I break only when I get an empty array. In other words, I stop iterating because an empty collection, or data set if you will, can only mean that nothing has been downloaded. If nothing appeared in the list this time, why go any further and try to get something else?
You can’t get a raster picture like this with a missing spot. In the worst case, the spot would be white, black or even transparent. In the example below, the X’s mark a missing piece. But are they still part of the digital image? Of course they are!
########
####XXXX
########
Therefore, if the data set contains tiles, we start the loop over to download the next vertical set; otherwise, we stop the program.
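The loop described above might be sketched like this. It is only an illustration of the idea: the method names are mine, and fetch stands for any callable that takes a column and a row and returns the tile bytes, or nil when the tile does not exist (how a missing tile is detected on the real site is an assumption).

```ruby
# Downloads one vertical set: keeps asking for the next row
# until the fetcher reports a missing tile (nil).
def download_column(column, fetch)
  tiles = []
  loop do
    blob = fetch.call(column, tiles.size)
    break if blob.nil?
    tiles << blob
  end
  tiles
end

# Downloads the whole grid without knowing its size in advance:
# an empty vertical set means we have walked past the last column.
def download_grid(fetch)
  columns = []
  loop do
    tiles = download_column(columns.size, fetch)
    break if tiles.empty?
    columns << tiles
  end
  columns
end
```

Passing a fetcher that wraps the real download gives an Array of columns, each holding its tiles from top to bottom, ready to be merged.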
The final source code can be found here. You’re welcome to fork and modify the program. It would also be a pleasure if you helped me share this article and make it more popular. It would definitely help me stay motivated and keep you supplied with more interesting material.
Important: the idea behind this article is to show, for educational purposes, how to debug a website using the Chrome browser, and then how to build an algorithm that downloads and merges the tiles. On the resource mentioned here you can download pictures in a good resolution, such as the header image of this article.
Thank you for your time! Happy coding, and leave your comments below. Cheers!
P.S. You are also welcome to point out my grammar and code mistakes.