This was an odd one. On our current web data aggregation project, a class of sites was causing an AttributeError in our Python code. After some troubleshooting, it turned out that the problem was in the response to our call out to PhantomJS to render a page. We were expecting either an error or a valid response with a valid URL; instead we were getting a blank response with the URL “about:blank”. OK, knowing this made it easy to avoid the attribute error, but it didn’t get the data back. The real question was why we were getting about:blank.
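For what it’s worth, the guard against the attribute error can be sketched roughly like this. It is a minimal illustration, not our actual project code: the names BlankPageError and check_render_result are made up here, and it assumes the renderer hands back the final URL and the page body as plain strings (or None).

```python
from typing import Optional


class BlankPageError(Exception):
    """Raised when the renderer ends up on about:blank or returns no content."""


def check_render_result(requested_url: str,
                        final_url: Optional[str],
                        body: Optional[str]) -> str:
    """Return the page body, or raise if the render silently failed.

    Hypothetical helper: assumes the caller has already extracted the
    final URL and the body from whatever the render script printed.
    """
    # A missing URL or about:blank means the page never actually loaded.
    if final_url is None or final_url.strip().lower() == "about:blank":
        raise BlankPageError(
            "render of %s ended on %r" % (requested_url, final_url))
    # A real URL with an empty body is treated as a failure as well.
    if body is None or not body.strip():
        raise BlankPageError("render of %s returned no content" % requested_url)
    return body
```

Raising a dedicated exception keeps the failure visible to the caller, so the affected sites can be logged and retried instead of failing later with an unrelated attribute error.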
Various posts on Stack Overflow and elsewhere discuss this error in the context of the --ignore-ssl-errors PhantomJS command line option. Apparently, if you don’t tell PhantomJS to ignore SSL errors and a site produces some, you can end up on about:blank. Fair enough, but we were already passing that option to PhantomJS, so that wasn’t our issue.
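As a rough sketch of what passing that option looks like when PhantomJS is launched from Python: the script name render.js and both function names below are placeholders for illustration, not our real code. The one detail that matters is that PhantomJS options go before the script path.

```python
import subprocess
from typing import List


def build_phantom_command(script_path: str, url: str,
                          ignore_ssl_errors: bool = True) -> List[str]:
    """Build the argument list for a PhantomJS render call."""
    cmd = ["phantomjs"]
    if ignore_ssl_errors:
        # Options must come before the script path on the command line.
        cmd.append("--ignore-ssl-errors=true")
    # Everything after the script path is passed to the script itself.
    cmd.extend([script_path, url])
    return cmd


def render(script_path: str, url: str, timeout: int = 60) -> str:
    """Run PhantomJS and return whatever the script wrote to stdout.

    Assumes phantomjs is on the PATH and that the (hypothetical) script
    prints its result to stdout.
    """
    completed = subprocess.run(
        build_phantom_command(script_path, url),
        capture_output=True, text=True, timeout=timeout, check=True)
    return completed.stdout
```

Something like render("render.js", "https://example.com") would then return the script’s output as a string.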
I decided to fire up Fiddler on Windows and tell PhantomJS to use it as a proxy. This proved a little disconcerting, because when I did this the sites magically started working again. Clearly, using Fiddler as a proxy was masking the issue somehow. I disabled the proxy to confirm that the problem returned, and it did. I then ran the process through Fiddler again and checked the resulting output.
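Toggling the proxy on and off for that comparison can be sketched as below. This assumes Fiddler is listening on its default address of 127.0.0.1:8888 (adjust it if yours is configured differently), and render.js is again a placeholder script name.

```python
from typing import List, Optional

# Fiddler's default listening address; an assumption about the local setup.
FIDDLER_PROXY = "127.0.0.1:8888"


def phantom_args(script_path: str, url: str,
                 proxy: Optional[str] = None) -> List[str]:
    """Build PhantomJS arguments, optionally routing traffic through a proxy."""
    args = ["phantomjs", "--ignore-ssl-errors=true"]
    if proxy:
        # --proxy takes address:port and must precede the script path.
        args.append("--proxy=" + proxy)
    args.extend([script_path, url])
    return args


# The same request built both ways, for an A/B comparison of the results.
direct = phantom_args("render.js", "https://example.com")
proxied = phantom_args("render.js", "https://example.com", proxy=FIDDLER_PROXY)
```

Running the two variants back to back against the same URL makes it easy to confirm that the proxy is the only difference between the failing and the working case.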
As for exactly why this worked, I haven’t had time to puzzle that out yet. If anyone has some ideas, post them below!