Hochschule Luzern Design & Kunst


Paglen Meets Agre

Imaging systems and grammars of action.

Position by Max Bruinsma

Max Bruinsma was core teacher of Transmedia Storytelling at Camera Arts from 2015 to 2021. He studied Art History, Architecture and Design History in Groningen and Amsterdam, the Netherlands. An art and design critic since 1984, he has published in major international professional magazines. He was Editor in Chief of the graphic design magazine Eye (London) and the design magazine Items (Netherlands), and Supervising Editor of Iridescent, the online peer-reviewed journal of design research. He has worked internationally as an editor, curator and teacher, for the ExperimentaDesign biennale in Lisbon and the Utrecht Manifest biennale on social design in the Netherlands, among others. Bruinsma’s many publications on design, art, new media and visual culture include the books Deep Sites, on early web design (2003), and Design for the Good Society, on social design (2015). In 2005 he received the prestigious Dutch Pierre Bayle Prize for design criticism.


There’s a lot that doesn’t meet the eye. We know it’s there, but we can’t see it. Things around the corner, for instance, or behind closed doors, or below the surface. Interestingly, all of these expressions, beyond just describing limitations of our perceptual faculties, have become ways of expressing more existential views about ‹what’s in the dark›. We project the near future as being ‹around the corner›; we are frustrated by not being privy to what goes on ‹behind closed doors›; we wonder whether we can actually understand a phenomenon without knowing what transpires ‹below its surface›.

So we imagine it. In our mind’s eye, or literally, by drawing a picture, we visualize what we cannot actually see. Since the dawn of humanity, next to depicting what we actually witness, we have made images of what cannot be seen, or of what we have lost sight of. Among the earliest surviving records of human(oid)s asserting their presence by leaving pictorial marks are stencils made by Neanderthals spitting paint over a hand pressed against the wall of a cave (see images above; Dirk L. Hoffmann et al., «U-Th Dating of Carbonate Crusts Reveals Neandertal Origin of Iberian Cave Art», in: Science 359 (2018), no. 6378, pp. 912–915, DOI: 10.1126/science.aap7778). The paint delineates an individual’s hand – the hand itself is absent. Imagine this individual’s wonder, more than sixty-six thousand years ago, after he took a step back and pondered the negative shadow of his own hand. It had been there, it had touched the rock – the image was not so much a picture of a hand as visual evidence of the fleeting act of touching. It was a record. Call it a very early instance of activity-data capture.

For all but a select few researchers, this ancient record of a Neanderthal individual’s presence in a remote corner of a Spanish cave becomes visible only through photographs – enhanced photographs, at that. To see that there is an image at all, and that it actually represents the act of making an image sixty-six thousand years ago, requires the trained eye of an archaeologist. For the rest of us, it becomes visible only after some serious image processing. So we see an image of digitally processed photographic data, which can be experienced as a visual record of somebody having touched a cave wall, eons ago. The camera is merely one of several technical and craft media involved in ‹making› this image, alongside the Neanderthal’s mouth, the pigments, and computer hardware and software.

Although this does not in itself mean that we are dealing with a ‹post-photographic› or even a ‹multimedia› image here, my description of it does suggest that we have meanwhile developed a rather expanded idea of what a ‹photograph› is. Once a medium that could assert an exclusive claim to objectivity in depicting reality, photography has now become one of many technical media providing data for visualizing anything imaginable, factual or fictional. The factuality of the enhanced photograph of an image sprayed by mouth tens of thousands of years ago depends not so much on the mere fact that it was photographed as on the story that accompanies the photograph, supported by verifiable data about the chemical conditions of the cave and its mineral surface, sound archaeological argumentation and transparent processing of the photographic data. In a sense, this flips an old adage about the veracity of photographs, ‹seeing is believing›, into ‹believing is seeing›. Unless we believe the scientists’ narrative, all we’ll see is a photo of a cave wall with weird excrescences of mineral deposits.

In a series of short posts published in 2014 on the website of the Fotomuseum Winterthur, artist and photographer Trevor Paglen elaborates his notion of «seeing machines». And he ponders: «What happens if we think about photography in terms of imaging systems instead of images?» The phrase ‹imaging systems› is interesting because it shifts the view – and our understanding – of what (photographic) images are from the end result (pictures) to the processes that produce this result and render it meaningful: from the image as a product to ‹imaging› as a socio-techno-cultural process. In his posts, Paglen delineates what he calls an «expansive definition of photography». In broad strokes he maps the terrain: «Seeing machines includes familiar photographic devices and categories like viewfinder cameras and photosensitive films and papers, but quickly moves far beyond that. It embraces everything from iPhones to airport security backscatter-imaging devices, from electro-optical reconnaissance satellites in low-earth orbit, to QR code readers at supermarket checkouts, from border checkpoint facial-recognition surveillance cameras to privatized networks of Automated License Plate Recognition systems, and from military wide-area-airborne-surveillance systems, to the roving cameras on board legions of Google’s ‹Street View› cars». Moreover, Paglen’s new definition encompasses not only the apparatus for ‹making› images, but also the resulting images themselves and the way they are interpreted by humans, other machines or algorithms.

Thus, one might say, the definition of photography explicitly integrates narrative and rhetorical notions, which have always been part of photography’s discourse, but which were usually seen as contingent effects of the cultural use of photographs, rather than as constitutive of what photography is. In this expanded – or indeed ‹post-photographic› – concept, a photograph does not merely produce meaning; the image is also produced by the meanings we project onto it and by our understanding and use of the technologies and practices through which it becomes meaningful.

Paglen introduces the notion of ‹scripts› for this entanglement of technology and its cultural, economic, political or administrative use: «I think about a ‹script› as the basic and obvious function of an imaging system, its ‹style› of seeing, and the immediate relationships (between seer and seen, for example) it produces, and the obvious ways in which a seeing machine sculpts the world». He describes, for instance, the way an automatic number plate recognition (ANPR) system «wants to see» the world: not just by photographing car number plates, but by connecting the data extracted from these images to information about the vehicle’s location, its owner, and the public or private records that make this data meaningful in specific ways. Thus, states Paglen, «seeing machines create cultural, economic, and political footprints on society at large». The same goes for the common digital camera, I’d say, with its gridded viewfinder rectangle, which is part of a ‹script› that also includes Instagram memes and YouTube tutorials instructing camera users on how the medium, and not just the technical tool, «wants to see the world».

Grammars of action

Paglen’s notion of ‹scripts› closely resembles what computing and artificial intelligence scholar Philip Agre described as «grammars of action» within computer «capturing systems» (Philip E. Agre, «Surveillance and Capture. Two Models of Privacy», in: The Information Society 10 (1994), no. 2, pp. 101–127). Agre developed these concepts in the context of theoretical reflections on how computing systems deal with ‹information›, mostly within the framework of surveillance for administrative and business uses. But I think it is worthwhile to project these theoretical constructs onto how we see photographs, and onto how we make and use them within narrative scripts that direct our interpretation of the pictures which, at the same time, serve as visible substantiation of the narrative.

In Agre’s terms, information is commonly assumed to be true: it is believed «that it corresponds in some transparent way to certain people, places, and things in the world». It is, as the etymology of the term ‹data› suggests, a ‹given›. To be processed by computers, information needs a certain structure, with rules that govern how each bit of information – each ‹given› – becomes meaningful in relation to others, so that together they align into something we can make sense of and accept as ‹true›. These structures are called grammars, by analogy with the way grammar structures the alignment and variation of words in a correct sentence. Agre develops the concept of «grammars of action» to describe how computers (i.e. engineers and coders) structure data representing actions in the ‹real world›. In order to make sense of data ‹captured› from connected sources (cameras, sensors, other computers) or inputs (records of time, movement, location etc.), the computer program needs to be able to recognize specific objects, variables and relations between data. So the program projects a ‹grammar› onto what it ‹sees›. For our purposes, one could say that in photography, resolution, contrast and focus, among other things, largely determine what the camera sees. Things smaller than the grains of the film or the pixels of the digital sensor, and contrasts finer than the differences in density that these grains or pixels can render, are not seen, just as things outside the picture frame are not seen. Thus the camera ‹wants to see› the world in a framed, specifically conditioned way that the photographer needs to comply with – a grammar of action that fundamentally conditions the way we make and see photographs.
Capture, therefore, in this case means much more than mechanically registering light on film: it anticipates all the actions a photographer must perform in order to make a viable photo, and projects back onto the photographer the criteria by which this photo will be judged as a ‹good› or ‹bad› photo.

Agre stresses the impact of such grammars by pointing to «a kind of mythology» that is often constructed around them, «according to which the newly constructed grammar of action has not been ‹invented› but ‹discovered›. The activity in question, in other words, is said to have already been organized according to the grammar». Thus we might say that photography, for instance, was merely ‹discovered› as a technology for rendering the world as we already see it. Agre, on the other hand, would insist that it rather constitutes «a reorganization of the existing activity, as opposed to simply a representation of it». This reorganization is what Paglen calls a ‹script›. The scripts embedded in the very core of seeing machines reorganize the way we see the world, and impose this reorganized way of seeing on us, their users. In Agre’s context, the grammars of action that structure the way computers make sense of data are projected back onto the computers’ users: if the user’s input does not match the program’s expectations, the computer says ‹no›. Seeing machines do something similar: they structure the way they ‹see› the world in a specific manner, which in turn reorganizes the way we experience it. A very funny example of this reorganization is Erik Kessels’ 2010 collection of attempts by amateurs to photograph their black dogs (Erik Kessels, In Almost Every Picture #9, Amsterdam 2010). The camera said ‹no› and produced vaguely dog-shaped black holes in pictures that, through this reorganization of the visible world, become quite uncanny.

Back to the cave painting. The digital photograph of the cave wall becomes a meaningful image for most of us only after we apply to its data a specific grammar that prioritizes certain aspects of the data and discards or reduces others. This reorganizes the way we see the image to the extent that we can now see the outlines of a hand. The story that accompanies the image convinces us that these outlines constitute a hand stencil that was left on the wall of the cave at least sixty-six thousand years ago. Could this image – both the hand stencil itself and the enhanced photograph of it – have been produced by any other means? Yes. Theoretically, the whole configuration of mineral residues could be of a purely chemical nature, without the interference of any conscious acts by hominids. The scientists’ chemical analysis and archaeological argumentation make this theory very unlikely, though. More interestingly, the photo of the hand stencil could theoretically have been produced by other technologies. All kinds of sensors and scanning devices that ‹look for› specific wavelengths of reflected light or emitted radiation, for instance, could produce a similar or perhaps even better image than the enhanced photograph. We have, in short, developed quite an impressive array of seeing machines beyond the traditional camera, machines that we can use to translate any available data into images that pass for representations or ‹likenesses› of reality. Take the cave itself: we can combine ‹Lidar› or ‹Terrestrial Laser Scanning›, digital cameras, GPS data and animation software to map an interactive 3D model of the cave’s interior relative to its underground location, and get the experience of walking through it – similar to photos or films of it, but also quite different.
An elaborate example of this ‹imaging› of subterranean space was produced by a team from the National School of Surveying of the University of Otago, New Zealand, which scanned and modelled the tunnels and quarries below the French town of Arras that New Zealand military engineers dug and extended during the First World War (LiDARRAS project, 2017, a collaboration between the National School of Surveying, Otago, New Zealand; the École Supérieure des Géomètres et Topographes (ESGT Le Mans, France); the city of Arras; the Musée Carrière Wellington; and alumni from Otago’s School of Mines).

The laser scanner produces a huge array of points, each with specific location data relative to the scanner’s position. With the implementation of the correct grammars, these individual points can be aggregated into ‹point clouds› that can be rendered at high resolution as visual representations of 3D spaces. The imaging system can be used to model buildings or entire cities or, in this case, the Arras tunnels. Combined with carefully calibrated configurations of images shot by visible-light cameras, these renderings can acquire a level of detail that makes them hard to distinguish from traditional photography. Or film: the location and distance data contained in the point cloud behind the renderings facilitate the visual experience of seamless movement through the space. Thus the visible world has become transparent to our seeing machines. We can virtually walk through walls and mountains and oceans, knowing that what we see closely corresponds to the actual material facts that make up the visual experience. This amounts to a renewed claim to veracity on the part of digital imaging media: that their captured data have a one-to-one relationship with material reality. But there is an interesting – and, I’d say, fundamental – difference from the ‹objectivity› of the traditional camera, whose recordings we once trusted as immediate reflections of the visible reality before the lens. In the expanded field of photography, we have to assess and understand a seeing machine’s grammar of action before we accept its veracity. How else could we decide whether what we see is augmented or enhanced reality (i.e. basically ‹true›), or merely a ‹virtual reality› that exists only as artistic fantasy (i.e. basically ‹fake›)?

Disembodied entities

Photography has become an expanded network of imaging systems in which each system specializes in different ways of providing data for visually representing the measurable world or, in Paglen’s words, in the way it «sculpts the world». This has significant consequences, not only for the way photography functions as a technology for generating images and as a library of cultural ‹scripts› for using them, but also for transmedia visual storytelling. The Arras story is a case in point. Next to a ‹conventional› visual story about the tunnels, told via a combination of texts and some ten thousand high-resolution photos, there is a ‹making-of› video that clarifies how the tunnels were mapped and modelled. There is also the hundred-gigapoint point cloud (i.e. a few dozen terabytes of data), which is used for the animated fly-through videos and for an interactive web viewer in which visitors can virtually walk through the tunnel maze themselves. The ‹making-of› video is not just peripheral to the narrative. I think it’s essential: it allows us to understand the grammars of action that the seeing machines use to make us experience the tunnels’ interiors. It makes us understand, and therefore accept, some weird glitches in the visual experience. At some points walls become partly or totally transparent, for instance, and we are not forced to follow the physical trajectory of the tunnels. At the same time, this ‹grammar› or ‹script› enables us to keep an overview of the entire network and its relation to the visible world above it. Obviously, the data can be used for a VR experience as well, which is now being developed for use in the Musée Carrière Wellington in Arras.
From a transmedia storytelling viewpoint it is easy to imagine how such a variety of media and renderings of the huge data set could be used to create vivid audiovisual stories, with added archival material from the First World War and records of soldiers and locals who lived and fought in the region and used the tunnels for shelter. In all this we become disembodied entities, dematerialized beings floating through a semblance of the material world – which constitutes a rather dramatic reorganization of how we usually perceive the world.

Understanding the grammars of action or scripts embedded in each of the media involved is not only important for viewers or ‹users› or ‹experiencers› of visual narratives; it is also crucial for makers. They should realise that the seeing machines they employ «sculpt the world» by enhancing certain aspects but also by reducing or discarding others. Information tends to get simplified into manageable categories by the grammars of action that computers (and seeing machines) use to make sense of the chaotic complexity of the raw data they capture. The fundamental insight that whichever medium we use always leaves out much more than it shows has never been more relevant. For despite their claim to data objectivity, or what Agre would call their mythology of truthfulness, today’s networked seeing machines have internalized within their grammars of action biases that were once mainly associated with human agency. The camera didn’t lie, whereas the photographer or editor could. Now that seeing machines, including ‹smart cameras›, do more than simply capture visual data in the old sense, and actually interpret these data for us or for other machines before ‹rendering› them as something the imaging system judges to be an accurate image, we can no longer be so sure. Autocorrection software, for example, has been built into smartphone cameras for quite some time, to the point that it has started to annoy many users. The terminology of these apps (‹beautification› and ‹slimming›, for instance) testifies to the kind of sociocultural biases built into the camera’s scripts. In October 2020 Google announced that it would try to be less judgmental (see Sarah Perez, «Google takes aim at ‹beauty filters› with design changes coming to Pixel phones», in: TechCrunch, 20 Oct. 2020, or search for «selfie face correction» for more details).

All of this prompts a redefinition, not only of photography, but of what we mean by terms like ‹likeness› and ‹depiction›. How is a point cloud ‹like› the reality it depicts? How does it sculpt the world? Such questions are triggered by other imaging systems as well, as Paglen has extensively shown in his work on surveillance systems and machine vision. A very instructive example from his recent work is a series of portraits he generated using a machine learning system in which he stored facial recognition models of people he had collaborated with. This meant that the program would look for all kinds of features that it ‹knew› as characteristic of the particular person’s face. Paglen then had another program bombard the facial recognition system with random polygons, which were either recognized as potential matches for the stored features or discarded. Through this back-and-forth, the image gradually evolved into something that the facial recognition program identified as a representation of the given person (see image above). Paglen observes that we end up with «a kind of latent portrait from ‹inside› the facial recognition software». The resulting image is quite instructive: comparing the machine’s portraits with actual photos of the subjects suggests that it operates with a different grammar of action than we do. I, for one, do not recognise A. C. Thompson in the generated portrait above when I compare it to actual photos of Thompson. Which raises the question: who formulated the ‹grammar›? In fact, this is the central question of Paglen’s oeuvre: «What kind of judgments are built into technical systems? Why are they made that way? Who are they benefiting and at whose expense do they come?» (Trevor Paglen, «Bloom», video statement for Pace Gallery, London, September 2020). Paglen lets the code ‹speak for itself›. Another artist, David Birkin, sabotages the code by inserting elements that do not fit its grammar of action.
In a series of works he ominously titled Embedded, he disrupted the computer code of digitized images taken in times of conflict.

In the example shown here, he inserted the name of the photographer, Yosuke Yamahata, into the code of a photo that Yamahata took directly after the atomic bombing of Nagasaki. Yamahata died years later as a result of the radiation he was exposed to while taking this photo. The ‹ungrammatical› text within the photo’s code produces a glitch in the image that serves as a compelling aesthetic emphasis of the invisible forces at work within the reality the picture depicts.

Grammar as ideology

Already in the first-ever photograph, Joseph Nicéphore Niépce’s famous 1826 photo of the view from his window, the story around the image is an essential ingredient of the image itself – in essence, it makes the image visible. Like the Neanderthal’s hand in the Maltravieso cave photo discussed above, the image that Niépce recorded on the original metal plate over a period of some eight hours is practically invisible. Most of us know Niépce’s image in the enhanced version made by photography historian Helmut Gernsheim, who rediscovered the plate in 1952. So the ‹first› photo as we know it is an enhanced reproduction of a print of a photo, taken at a specific angle under specific lighting conditions, of a shimmering pewter plate with some nondescript shadows on it. Gernsheim, in other words, was an expert who understood the grammar of action embedded within Niépce’s seeing machine: a camera obscura and a pewter plate with a specific mix of chemicals which, under the right lighting conditions, produced a specific optical effect. The story that Gernsheim’s enhanced version is a reliable representation of an image made over a century before is based on more than direct technical translation – again, if we have no means of verifying this story, or if we are unwilling to believe its argument, the image does not exist.

Agre argues that «capture is never purely technical but always sociotechnical in nature». Returning to his assessment of the rhetoric or mythology that often accompanies the construction of a grammar of action – namely that the grammar is merely a newly ‹discovered›, and therefore reliable, translation of how humans act in the real world anyway – he warns us that «if the capture process is guided by some notion of the ‹discovery› of a preexisting grammar, then this notion and its functioning should be understood in political terms as an ideology». This brings me back to my opening reflections on how we express our existential views about what’s in the dark. We imagine, we speculate, we test, we argue, we falsify… we conjure up stories that are meant to ‹capture› reality, including its invisible and abstract aspects, which we interpret based on ideologies that connect all of these aspects. We have developed machines that allow us a view of the world as it actually is, or so we hope, by constructing a layer of abstract data on top of the visible world and enmeshing it with everything we can see. It is as if we have finally succeeded in reliably rendering what is outside Plato’s cave. (In the Greek philosopher’s allegory, we dwell in a confined cave with no means of ever getting out. All we see of the world as it essentially is are shadows, which we take for ‹reality› because we ourselves have no way of standing in the light of truth.)

With Plato’s metaphor in mind, we can appreciate Paglen’s and Agre’s critical views on seeing machines and grammars of action as critiques of the ‹escape from the cave› ideology that surrounds much of our new, technologically enhanced vision of the world. At the same time, we can use this insight to hack the wealth of available imaging systems and put them to unforeseen uses in narratives for which we tweak or rewrite their grammars of action. This, I think, is a major task for (trans)media makers today: employ the grammars of action implicit in the media you use not only for enhancing your recipients’ experience but also, and perhaps more importantly, for empowering them to critically assess the scripted reality we all live in. In short: don’t be satisfied with the mere making of images. Endeavour to make images possible.