phantomjs, about:blank, and --ssl-protocol

This was an odd one. On our current web data aggregation project we had a class of sites that were causing an attribute error in our Python code. After some troubleshooting it turned out that the problem was in the response to our call out to phantomjs to render a page. We were expecting to get either an error or a valid response with a valid url, and instead we were getting a blank response with the url “about:blank.” Ok, knowing this made it easy to avoid the attribute error, but it didn’t get the data back. The real question was why we were getting about:blank.
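For what it's worth, the guard that avoids the attribute error can be as small as the sketch below. The shape of the render result (a dict with a "url" key) and the name usable_render are assumptions for illustration, not our actual code.

```python
from typing import Optional


def usable_render(result: Optional[dict]) -> bool:
    """Return True only if the render result carries a real page URL.

    `result` is a hypothetical dict parsed from whatever the phantomjs
    script prints; the "url" key is an assumed field name.
    """
    if not result:
        return False
    url = (result.get("url") or "").strip().lower()
    # A page that never loaded reports about:blank (or nothing at all).
    return url not in ("", "about:blank")
```

Checking this before touching any attributes of the response turns the crash into an ordinary "skip and log" case, though of course it does nothing to get the data back.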

Various posts on Stack Overflow and other places discuss this error in the context of the --ignore-ssl-errors phantomjs command line option. Apparently if you don’t tell phantom to ignore ssl errors, and you get some on a site, you can end up on about:blank. Fair enough, but we were already passing that option to phantom, so that wasn’t our issue.

I decided to fire up Fiddler on Windows and tell phantom to use it as a proxy. This proved a little disconcerting, because when I did this the sites magically started working again. Clearly using Fiddler as a proxy was masking the issue somehow. I disabled the proxy to confirm that the problem returned, and it did. I then ran the process through Fiddler again and checked the resulting output.
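Pointing phantom at Fiddler only takes the --proxy command line option. Here is a rough sketch of building such a command line from Python. The script name render.js and the helper phantomjs_command are made up for the example, and 127.0.0.1:8888 is Fiddler's default listening address, so adjust it if yours is configured differently.

```python
# Fiddler's default listening address; an assumption about the local setup.
FIDDLER_PROXY = "127.0.0.1:8888"


def phantomjs_command(script, url, proxy=None):
    """Build a phantomjs argument list, optionally routed through a proxy.

    `script` is a hypothetical render script; options must come before it
    on the command line, and anything after it is passed to the script.
    """
    cmd = ["phantomjs", "--ignore-ssl-errors=true"]
    if proxy:
        cmd.append("--proxy=" + proxy)
    cmd += [script, url]
    return cmd


# With the proxy (traffic shows up in Fiddler) and without it.
with_proxy = phantomjs_command("render.js", "https://example.com/", FIDDLER_PROXY)
without_proxy = phantomjs_command("render.js", "https://example.com/")
```

Flipping between the two lists is an easy way to confirm whether the proxy is what makes the difference.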

At first glance the data looked normal, but then I noticed that, while the offending sites were using ‘https:’ in their urls, and would redirect the browser to the ‘https:’ address if you tried ‘http:’, they were not returning an encoded response. Was this somehow the culprit? I confess that the idea of simply running a local proxy to make the issue go away did occur to me. Instead I decided to scan the phantomjs command line options, and there I saw the “--ssl-protocol” option. This setting defaults to SSLv3, but one of the acceptable alternatives is “any.” I added “--ssl-protocol=any” to our startup options for phantomjs, and the sites started working again. Well, three of them did. The fourth is still causing some JavaScript error, but I’ll count it a win anyway.
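In case it helps anyone, this is roughly what the startup options look like when launching phantom from Python with subprocess. It is only a sketch: render.js, build_command, and render are hypothetical names, and it assumes a phantomjs binary on the PATH and a script that prints its result to stdout.

```python
import subprocess

# The two SSL-related options discussed above.
PHANTOMJS_OPTIONS = [
    "--ignore-ssl-errors=true",
    "--ssl-protocol=any",
]


def build_command(script, url, binary="phantomjs"):
    """Return the argument list: binary, options, script, then script args."""
    return [binary] + PHANTOMJS_OPTIONS + [script, url]


def render(script, url, timeout=60):
    """Run phantomjs and return whatever the (hypothetical) script printed."""
    completed = subprocess.run(
        build_command(script, url),
        capture_output=True,  # requires Python 3.7+
        text=True,
        timeout=timeout,
    )
    return completed.stdout
```

Calling render("render.js", "https://example.com/") would then run phantom with both options in place. Note that the options have to come before the script name, since everything after it is handed to the script itself.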

As for exactly why this worked, I haven’t had time to puzzle that out yet. If anyone has some ideas, post them below!