Greasemonkey and the Definition of Content

In web design (or more generally: user-interface design) you will often hear the terms “content” and “presentation.” These terms are useful devices for classifying the stuff that shows up on a web page when we browse to a URL, or on an application’s user interface. Content is the stuff you want to see, whether it is a number representing an account balance, or an image capturing a funny moment in time. Presentation is the way in which that stuff is… er… presented on the web page. If the account balance is shown in boldface Verdana 10pt with a 1px red border around it, or if it is displayed in Courier 12pt bereft of adornment, it is still the same number. Only the way in which the user views it has changed.

This factoring into data, function, and views has roots that stretch well back into the history of human development across many disciplines, including my own: software engineering. In fact, the evolution of “document” structure on the Internet parallels in many ways the evolution of structured, and later object-oriented and component-based, techniques for architecting software systems. Just as early unstructured programming languages tended to produce “spaghetti code” that carelessly interleaved data and the functions that operated on it, so did early HTML documents tightly weave together content and presentation. Many still do, including the one you are reading now.

Efforts to separate information from function and view stem from a basic and fascinating human trait: if we are forced to repeat a tiresome process many times, we soon look for ways to make it easier and more repeatable. The history of printing offers a striking example of this tendency. From wood-block engravings, to movable type, to computer-aided layout and publishing, the driving force has been to make the process of getting new information into standardized forms as easy as possible. I was fortunate to work for my local newspaper while in high school, at a time when we composed our stories on CRT terminals, then saw them printed on camera-ready paper and physically glued onto a layout board, along with all the ads, borders, separators, headlines, and everything else that made up a page of the paper. The content was text that changed every day, and the presentation was the paper’s style guide and layout, which became well known and fairly easy to reproduce for the long-term employees responsible for executing it. Still, the separation of content and structure was completely lost once the paper was glued to the board and taken to the camera room.

Current practice has done away with the layout board and the little cans of spray-on adhesive. Now pages are composed entirely in software through electronic publishing technologies, and standard layout and style templates are applied to new content trivially. Most of these advances predated, and to some extent informed, the design of HTML and the HTTP-based web. When the web first became popular, most web pages were still using HTML to represent both content and structure: text was enclosed in HTML “tags” that specified how it was to be rendered, and tables were used to gain some control over where on the page it was placed. A few years back a technology called “style sheets,” originally envisioned by Berners-Lee and the other early web designers, began to change the way pages were composed. Now HTML tags are often used to specify only what kind of element a piece of content is – this bit of text is a headline, that image is a home page link, etc. – and cascading style sheet syntax is used to say how elements are to be arranged and presented. The two categories of syntax are usually, but not necessarily, maintained in separate files. XML presents a similar, if starker, example: XML documents are simply collections of named data values, and style sheets – XSLT in this case – are used to tell a client how to render and present the data.
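As a minimal illustration of that split (the class name and the styles here are invented for the example), the markup says only what an element is, while the style sheet – typically kept in a separate file – says how it looks:

```html
<!-- Content: the markup names the kind of thing this is -->
<p class="balance">1,203.45</p>

<!-- Presentation: the style sheet says how a "balance" is rendered.
     Change these rules and the number itself is untouched. -->
<style>
  .balance {
    font-family: Verdana;
    font-size: 10pt;
    font-weight: bold;
    border: 1px solid red;
  }
</style>
```

Swap the Verdana rules for Courier 12pt with no border and the account balance is still the same number; only its presentation has changed.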

Where this gets interesting for me is in the idea of control. Suppose a business generates data that the IT department maintains on servers and makes available in the form of HTML or XML documents. Information designers working for the specific departments that consume the data can create views of it that are useful in their own context. They may also attach functions that manipulate the data in ways specific to that context. Information processing at this level of modularization creates many valuable opportunities, but it also requires the organization to apply rules and policies in a more decentralized manner. Gone are the days when reports were run nightly on mainframes attended by devotees of arcane legacy systems, when it was easy to control what data was available, to whom, and in what form. The pipeline of pure information has spread its network out to the end users at the leaf nodes of the corporate hierarchy. In response to these new capabilities, a number of companies offer “document management,” “content management,” and “workflow” software that attempts to regain control over the information pipeline.

But what happens when the consumers are outside the business? An interesting technology for Firefox called Greasemonkey may cause that question to be asked more often. Greasemonkey is essentially a JavaScript engine that allows users to create scripts on the client side that manipulate the contents of a web page when it is displayed. Most web pages already contain JavaScript, to accomplish tasks as diverse as running menu systems and displaying images. But the scripts in current pages were put there by the creators of the page, who had control over what was delivered to the end user. With Greasemonkey the end user can take some control over not just the presentation of the data, but also the functions that operate on it before it is viewed. The idea of syndicating content to users in raw form so that they can view it however they wish is not new; it’s happening right now with RSS news feeds across the blogosphere. But what will the consequences of this technology be for heavily branded sites, or for sites that present sensitive interfaces to bank accounts, for example? Branding is almost by definition presentation, and it’s something the average retail organization wants control over. Sites that present interfaces to important data have a vested interest in making sure that the overall interface structure, not to mention the data itself, is displayed to the user as the server sent it.
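To make the mechanism concrete, here is a minimal sketch of what such a user script might look like. The bank URL, the “balance” class name, and the styling choices are all invented for this example; only the ==UserScript== metadata block is Greasemonkey’s actual convention.

```javascript
// ==UserScript==
// @name        Balance Highlighter (illustrative sketch)
// @include     http://bank.example.com/*
// ==/UserScript==
// NOTE: the site and the "balance" class name are hypothetical.

// Pure helper: parse a displayed balance like "$-1,203.45" into a number.
function parseBalance(text) {
  return parseFloat(text.replace(/[^0-9.\-]/g, ""));
}

// Apply presentation changes only when running inside an actual page.
if (typeof document !== "undefined") {
  var spans = document.getElementsByTagName("span");
  for (var i = 0; i < spans.length; i++) {
    if (spans[i].className === "balance" && parseBalance(spans[i].innerHTML) < 0) {
      spans[i].style.color = "red";       // presentation the *user* chose,
      spans[i].style.fontWeight = "bold"; // not the server
    }
  }
}
```

The point is that this code runs after the server has finished sending the page: the bank has no say in what the user’s copy ends up looking like.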

There is a lot that user-side scripting (DHTML, or Dynamic HTML, is another term for it) cannot do. For instance, it cannot change what functions the server supports, or make the server take some action it otherwise wouldn’t (except by manipulating POST data, and then only when the server-side programming is sloppy). But what it can do is cause enough for some serious thinking. Babak Nivi presents a few examples on his blog, and there is already a great deal of commentary about how revolutionary user-side scripting of websites could be. On his blog Phil Ringnalda describes two battling scripts written for Amazon’s site: one that rewrites all the links, and another that tries to write them back. Nivi generally comes across in his writing as cheering on the new world of user control, despite presciently recognizing its potential impact. I understand this, and there are a lot of areas in which I would love to have this kind of control. Much of the information on the web is just that: information, and I wouldn’t mind having more control over how it is presented. But as someone who has worked a long time on the transactional business side of the Internet, I can’t help but point out that some “websites” are really interfaces to important business functions. We wouldn’t want users to have scripting control over ATMs. Some distinction is going to have to be made between sites that contain scriptable information and sites that represent cohesive interfaces that should not be manipulated. The distinction will need to be made on the server, and enforced on the client. Otherwise what Nivi and others champion as the birth of “hypertext for the deep web” may become instead the death of eBanking and a host of other important online businesses.
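The Amazon skirmish Ringnalda describes boils down to rewriting the query string on every link in the page. A rough sketch of the idea – the “tag” parameter name and the values here are stand-ins for this example, not Amazon’s actual scheme:

```javascript
// Replace an existing tag=... query parameter, or append one if absent.
// Parameter name and values are hypothetical, for illustration only.
function rewriteLink(href, newTag) {
  if (/([?&])tag=[^&]*/.test(href)) {
    return href.replace(/([?&])tag=[^&]*/, "$1tag=" + newTag);
  }
  return href + (href.indexOf("?") >= 0 ? "&" : "?") + "tag=" + newTag;
}

// Inside a page, sweep every anchor; outside one, do nothing.
if (typeof document !== "undefined") {
  var links = document.getElementsByTagName("a");
  for (var i = 0; i < links.length; i++) {
    links[i].href = rewriteLink(links[i].href, "my-tag");
  }
}
```

A counter-script is just the same loop with the original value restored, which is exactly why the two scripts can fight each other indefinitely.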

Now This Gets Deep

A researcher named Colin Percival of Simon Fraser University in British Columbia has published a 12-page paper (PDF here) that shows how simultaneous execution of threads in the Intel Pentium 4 Hyper-Threading model can lead to compromised security. If you can stomach wading through the details, it makes for a fascinating journey through processor internals; it helps if you can read assembly code and understand encryption. What he demonstrates is basically this: in the Pentium 4 model, simultaneously executing threads share access to the level 1 and level 2 memory caches. In the simplest exploit Percival shows that two threads can use the timing of reads and writes from these caches to communicate bits between themselves at up to 400 kilobytes per second. That’s a fairly high-bandwidth channel, but the threads have to cooperate. In the pièce de résistance he shows that a spy thread, working without the knowledge of the thread it is watching, can use the timing effects of level 2 cache misses to infer certain characteristics of the data being operated on, including important parts of the modular arithmetic used in OpenSSL encryption key processing.
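The cooperative covert channel works roughly like this (a pseudocode sketch of the idea only; the precise cache-set selection, eviction patterns, and timing thresholds are in the paper):

```
sender, for each bit b of the message, in an agreed time slot:
    if b == 1: read one address in every line of a chosen cache set,
               evicting whatever the receiver had cached there
    else:      touch nothing

receiver, in the same time slot:
    t = time taken to re-read its own data mapped to that cache set
    record 1 if t is slow (its reads missed the cache), else 0
```

The spy-thread attack is the uncooperative version of the same trick: the victim never signals anything on purpose, but its memory accesses still evict the spy’s data, and the resulting miss timings leak information about what the victim is computing.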