ESSAY: SHANNON MATTERN
The new wave of urban data science
In downtown Brooklyn, not far from where I live, New York University recently launched a public-private research center dedicated to advancing “the science of cities.” That word, science, has a way of creeping into public discourse these days. When the inaugural class in Applied Urban Science and Informatics arrived at the Center for Urban Science + Progress this fall, the students were personally welcomed by Mayor Michael Bloomberg, whose Applied Sciences NYC initiative funded the center’s creation in 2012.
On the center’s website, director Steven Koonin acknowledges that today’s cities have to be more “efficient, resilient, sustainable.” And to get there, of course, they need data: “The digital age has produced an incredible ability to collect, store, and analyze data. Bringing this ‘big data’ to bear on societal problems — from clean air to transportation to healthcare — is at the heart of CUSP.” With its various academic, corporate and government partners — including, among many others, Carnegie Mellon University and the Indian Institute of Technology, Bombay; IBM, Cisco and power company Consolidated Edison; and a bevy of NYC municipal departments, including the police and fire departments — the center, and its all-white, all-male leadership team, perched high above Brooklyn’s MetroTech, “observes, analyzes, and models cities to optimize outcomes, prototype new solutions, formalize new tools and processes, and develop new expertise/experts.”
NYU thus joins many corporately branded pursuits of algorithmic urban efficiency — IBM’s Smarter Cities program, Cisco’s Smart+Connected Communities Institute, Samsung’s u-City initiative, Intel’s Sustainable Connected Cities project, Living PlanIT’s Urban Operating System, and various automobile companies’ efforts to envision new “urban futures.” These Big Data approaches have provoked popular concern about surveillance and privacy, and raised questions among urbanists about how we’ll account for all the informal urban movements and transactions that take place off the sensor grid and outside the formal economy.
Luckily, most cities now have a growing corps of smartphone-equipped agents on bikes out in the streets, tracking and mapping all things informal — either to fill the gaps in the “official” dataset, or to construct an alternative dataset for unofficial use. We have citizen scientists and public labs, urban explorers and infrastructural tourists generating and collecting their own data, in the form of quantitative readings of air quality, soil samples from brownfields, or noise readings from industrial sites. Elsewhere, we have civic hackers hacking away on open data, developing a cornucopia of apps and data visualizations to disseminate social and environmental justice, one iPhone at a time.
Public Laboratory for Open Technology and Science, Balloon Mapping Kit and Mobile Spectrometer, 2013. [Photos by Jeffrey Warren]
Despite their apparent differences in scale and ideology, these two camps — the institutional and the individual, the corporate and collective, the big and little — are aligned at a foundational level. What links them is a way of conceptualizing and operationalizing the city: theirs is a city with an underlying code or logic, one that can be hacked and made more efficient — or just, or sustainable, or livable — with a tweak to its algorithms or an expansion of its dataset. They also seem to share a faith in instrumental rationality, or what Evgeny Morozov calls “solutionism” — as well as a tendency toward data fetishism and “methodolatry,” the aestheticization and idolization of method.
Cities Are Made of Data
The contemporary “smart cities” model of urbanism is predicated on the rise of private-sector development, the emergence of new construction techniques that allow for the rapid erection of “cities in a box,” and of course the availability of new technology — including, in particular, tools for generating, capturing and analyzing data, and feeding processed data back into the urban system. But this computational vision isn’t entirely new. We can trace today’s smart cities back to the durable metaphor of the city-as-machine. While the metaphor is commonly regarded as a modernist invention, cities have long relied on machinic modules; the grid plan, for instance, has for millennia served as a “machine” for efficient circulation. Urban historians have also conceived the city as a machine for information management. As Lewis Mumford writes in The City in History:
Through its concentration of physical and cultural power, the city heightened the tempo of human intercourse and translated its products into forms that could be stored and reproduced. ... By means of its storage facilities (buildings, vaults, archives, monuments, tablets, books), the city became capable of transmitting a complex culture from generation to generation, for it marshalled together not only the physical means but the human agents needed to pass on and enlarge this heritage. That remains the greatest of the city’s gifts. As compared with the complex human order of the city, our present ingenious electronic mechanisms for storing and transmitting information are crude and limited.
Media theorist Friedrich Kittler observes that cities have historically been not only sites of data storage and transmission, but also of data-processing and formatting. “It is almost as if the historian of cities [Mumford] had forgotten his insight that part of the greatness of ancient Florence consisted in having erected, with the Uffizi, the first office building — a central bureau for data processing.”
We’ve also developed, especially over the past few centuries, new modes of representation and tools of administration that have reinforced the conception of the city as a machine for rational management. Ola Söderström has explained how the development of new mapping and visualization techniques — the ichnographic plan’s stabilizing, “precise and totalizing” representation of urban space; the master plan’s use of Victorian social statistics to divide the city into various demographic “zones”; and urban social cartography’s depersonalized, aggregate representations of social districts, which allowed for their administration through “curative urban planning interventions” — has contributed to the increasing “rationalization” of the urban landscape. And media scholar James Donald acknowledges the role that “population surveys, police records, sanitary reports, statistics, muck-raking journalism, and photography” played in rendering the city “an object of knowledge, and so an object of government.”
Today’s cities are morphing into a new kind of machine, for a more networked form of information management. Now we have cities “developing new artificial nervous systems, to supersede those articulated metabolic systems of the 19th century,” as Dan Hill puts it. The former urban informatics leader at Arup (now CEO of the creative think tank Fabrica) writes:
These newer nervous systems, not centralised but distributed, and predicated on digital networks of networks in which every object is informational and every movement or behaviour is trackable, could combine to form a new kind of lattice-like informational membrane, hovering magically over the physical fabric of the city.
Data Fetishism
While the notion of the city as a data-generating, storing, processing and formatting machine might not be new, the reduction of the city to those functions — which are increasingly automated — and the reification of that data, is distinct to our time. Imagine the CUSP researchers merging massive streams of data to map all real-time traffic, weather, energy use, mobile device use, financial transactions, criminal activity, etc. — producing on their multiple flat-screens a map so rich that it becomes a territory itself. Then imagine that processed data, filtered through algorithms, feeding back into and transforming urban space or affecting human behavior in real-time (or pretty darn fast): re-syncing the streetlights or rerouting cabs to areas of high cell-phone activity. Morozov has identified the impulse behind such approaches as “solutionism,” which recasts “complex social situations either as neatly defined problems with definite, computable solutions or as transparent and self-evident processes that can be easily optimized — if only the right algorithms are in place!” 
The default recourse to data-fication, the presumption that all meaningful flows and activity can be sensed and measured, is taking us toward a future in which the people shaping our cities and their policies rarely have the opportunity to consider the nature of our stickiest urban problems and the kind of questions they raise. Often they do not even stop to wonder if the blips — which “the system” flags as “snafus” or “clogs” — are really problems at all. Are all “inefficiencies” — having parent-teacher conferences, for example, rather than standardized electronic evaluations posted to a government website — necessarily obstacles to be overcome? What’s more, Morozov says, “In promising almost immediate and much cheaper results,” solutionist techniques “undermine support for more ambitious, more intellectually stimulating, but also more demanding reform projects.”  If we can simply automate the depersonalized dispensation of social welfare, there may not be sufficient motivation to get our hands dirty digging for root problems like poverty, unequal access to healthcare and information services, and socioeconomic disparity in school performance.
Is there an ethos, a value system, driving these data-generated processes, or is it all just algorithms? Of course, we wouldn’t say that there’s no ideology inherent in the algorithms themselves, but the computers powering these Big Data projects run billions of operations that cumulatively produce substantive transformations in the urban landscape, with little regard for underlying values. As Orit Halpern and her colleagues argue, in their study of Songdo, South Korea, as a site for “testbed urbanism,” the rise of such programs “marks a turn against the faith in liberal subjectivity, denigrates the place of older political processes in decision making … and operates at a level far beneath consciousness.” 
Mark Foster Gage has identified a similarly “solutionist” and ethically compromised approach in what he calls “research architecture,” which often relies heavily on data collection and visualization and assumes “a legitimate cause-and-effect relationship between cursorily observed problems and their subsequent architectural solutions.” Research architecture too often cultivates the “mistaken assumption that we are always more powerful in dealing with social injustice or inequality in our role as architects than in our roles as citizens or activists.”  How many of today’s urban designers and policymakers and public servants see themselves as more powerful, and efficient, in dealing with urban problems as data scientists than as activists or critics or citizens or humanists?
Recall Mumford’s claim, in 1961, that “our present ingenious electronic mechanisms for storing and transmitting information” are crude in comparison with “the complex human order of the city.” More than 50 years later, even our exponentially more ingenious electronics are incapable of running algorithms that can fully describe and predict the urban sociocultural ecology. As John Thackara notes, so many of our urban resources, like health care, depend on interactions that are “relational, embodied, and context-dependent.” Trust is an important part of those exchanges, and “trust is not an algorithm.” 
Even quantitative metrics like energy use are not as simple as they seem. Sarah Bell points out that we can’t simply monitor energy use with infrared cameras to track buildings’ heat loss; we also have to consider cultural norms, including dress codes that require men to wear suits in the hottest months of summer and thereby necessitate excessive air conditioning.  “The intelligence of cities lies in the individual and collective minds of people who live there, not merely in the technologies they deploy,” Bell states. “Smart city technologies can provide useful knowledge about urban services and systems, but intelligent implementation requires critical understanding of what they amplify and what they reduce.”
Kate Crawford, of Microsoft Research, agrees that data scientists would benefit from better qualitative analysis of their quantitative data.  Those making and using urban data should follow the example of social scientists and humanists, pausing regularly to consider where the data come from, and how they’re derived and analyzed. “We know that data insights can be found at multiple levels of granularity,” Crawford writes, “and by combining methods such as ethnography with analytics, or conducting semi-structured interviews paired with information retrieval techniques, we can add depth to the data we collect. We get a much richer sense of the world when we ask people the why and the how not just the ‘how many.’”
Earlier this year, NYU launched the Marron Institute on Cities and the Urban Environment, funded by a $40 million gift from Lightyear Capital Chairman Donald Marron, who, not coincidentally, founded Data Resources Inc., the world’s largest source of economic data, in 1969. The new institute ties together three NYU programs: the interdisciplinary Institute for Public Knowledge, the business school’s Urbanization Project and CUSP. Ostensibly it will provide an opportunity for CUSP researchers to critically frame their data through the lenses of the social sciences and humanities, and for those fields to explore more deeply the potential of data-driven methodologies. Announcing the new venture — again, beside Mayor Bloomberg — university president John Sexton acknowledged that “Cities are more than just infrastructure and technology; they are also social interactions, culture, and neighborhoods.”
Yet it’s not enough simply to get social scientists and humanists in on the action of extracting data from urban residents. Those residents themselves should play a greater role in determining how, and if, they’re exploited in the production of urban data; and they might want to generate data of their own, to feed back into the city, as agents of what Adam Greenfield calls “read/write urbanism.”  Dan Hill envisions a future of grassroots engagement in urban governance:
Urban information design emerges in a call-and-response relationship with informatics, filtering and describing these patterns for the benefit of citizens and machines. The invisible becomes visible, as the impact of people on their urban environment can be understood in real-time. Citizens turn off taps earlier, watching their water use patterns improve immediately. ... Road systems can funnel traffic via speed limits and traffic signals in order to route around congestion. ... Citizens can not only explore proposed designs for their environment, but now have a shared platform for proposing their own. They can plug in their own data sources, effectively hacking the model by augmenting or processing the feeds they’re concerned with.
We, the urban publics, aren’t limited merely to responding to the system — e.g., turning off the tap when a bathroom console tells us we’ve used more water this month than last. We have the potential to “hack” the official urban network with user-processed and -formatted data.
Still, it seems to me, even if we “citizens” are generating the data — via D.I.Y. science projects, field surveys, hackathons, etc. — we’re still facing the city as a computational problem. And the master’s tools will never dismantle the master’s house.  If we gather lots of (mostly well-educated male) programmers, armed with expensive machinery, and put them in a room with a tank of coffee, their version of “social change” will almost always involve finding the right open data set and hacking the crap out of it. Not only does the hackathon reify the dataset, but the whole form of such events — which emphasize efficiency and presume that the end result, regardless of the challenge at hand, will be an app or another software product — upholds the algorithmic ethos.
We’d do well to think more about the motivations and ideologies behind, and methodologies implied by, these quick-attack “-thons” and “sprints” and “slams.” “Most companies think that if you can just get hackers, pizza, and data together in a room, magic will happen,” contends DataKind’s Jake Porway. Hackathons often lead with the data, from which participants then retroactively construct a question or problem. Instead, Porway argues, it’s imperative to begin with a clearly defined problem — one articulated through consultation with specialists who understand not only how the data were derived, but also how those data reflect, or fail to reflect, on-the-ground urban realities — and then involve those same specialists in evaluating the end results. We might also consider the possibility that there are no “results” to evaluate: perhaps no app is the right approach. Perhaps a hackathon could end with the admission that the data offer no solutions — that particular urban challenges simply can’t be “conditioned” or “normalized” into algorithmically-tuned efficiency.
Part of data’s appeal, we must acknowledge, is aesthetic. Data lends itself to presentation in sexy visualizations and packaging in sleek apps. Philosopher of science Paul Feyerabend, in Against Method, reminds us that science — and this applies particularly well to data science — proceeds not only by rational, quantifiable means, but also through “irrational means such as propaganda, emotion, ad hoc hypotheses, and appeal to prejudices of all kinds.” Data are, as intellectual historian Daniel Rosenberg notes, “rhetorical.” “Data” is the plural of the Latin datum, which is the neuter past participle of the verb dare, to give. Thus, “a ‘datum’ in English … is something given in an argument, something taken for granted.” There’s no such thing as “raw” data; all data are formed through the means by which they are derived and presented. And there’s an aesthetic dimension to their derivation and presentation.
The exhausting ubiquity of data visualizations — thousands more of which are sure to emerge from CUSP — has made it clear that data, in their “givenness,” are inherently aesthetic. Yet in many recent citizen science, public lab and design research projects, even the methods for generating data are stylized; we see the rise of an aesthetics of measurement. Researchers seem to be fascinated by the sensory and affective dimensions of measuring things — the fact that measurement isn’t a purely objective task — and, to feed their passion, they’re designing a host of measurement tools as objets d’art: lovely little bento boxes of tools, fanciful surveying equipment, deliciously weird Tom Sachs-ish visioning machines. Speaking of Sachs: we can certainly see the influence of the artist’s own modus operandi, knolling, or the ordered arrangement of objects, in many of these projects.
Consider Venue, “a portable media rig, interview studio, multi-format event platform, and forward-operating landscape research base” that recently toured the continent documenting “overlooked yet fascinating sites through the eyes of the innovators, trendsetters, entrepreneurs, and designers at the forefront of ideas today.” By “record[ing] and survey[ing] each site through an array of both analog and high-tech instruments,” the Venue team aimed to “assemble a cumulative, participatory, and media-rich core sample of the greater North American landscape.” In an interview with MAS Context, Venue’s Nicola Twilley confessed that their use of various instruments was in large part poetic, aesthetic: “a self-conscious gesture to the fact that the devices that you choose to bring along with you already are embedding assumptions onto the landscape.”  Venue partner Geoff Manaugh added that “once you have the instruments to measure the landscape, you start paying attention to that thing that you maybe would have not otherwise thought about or noticed.” The instrument embodies a mode of observation that conditions how one engages with the landscape and what data one collects.
Venue participants also kept a logbook in which they recorded site variables such as ground wind direction and speed, solar wind direction and speed, sun spots and barometric pressure, and they tagged the posts on their research blog with this metadata. Twilley admitted to “amassing big amounts of data about the landscape and not even knowing how to make sense of it” — generating data not for sense-making but perhaps for all those irrational means Feyerabend talks about. Venue’s pastiche of tools and methods from journalism and geology makes measurement an aesthetic endeavor.
Consider also the Los Angeles Urban Rangers’ 2006 project, Interstate: The American Road Trip, which was “intended to facilitate sharpened observational skills for reading 21st century roadside geographies.” A key component of the project was their Interstate Road Trip Specialist Field Kit, an objet d’art itself, which contained tools whose methodological utility was explained in an accompanying Field Guide, which I discussed earlier in this journal. Then there is David Garcia, of Lund University, whose Svalbard Architectural Expedition in the Arctic utilized a range of beautiful, custom-designed surveying instruments, which tested the sound-absorption properties of snow and the insulation properties of ice and snow tiles, as well as the translucency of those tiles; investigated how the extreme landscape and climate impact the visible appearance of light; and warned when polar bears were approaching. This data will implicitly feed into the creation of more contextually responsive design.
We might trace these recent “toolkit” projects back to precursors like Fluxus game kits, and to the use of cultural probes in design research. Eric Paulos and Tom Jenkins have designed “urban probes” that are meant to “bypass many classical design approaches — opting instead for rapid, nimble, often intentional encroachments on urban places rather than following a series of typical design iteration cycles.” Probes, they write, are a “fail-fast approach,” a means of “conducting rapid urban application discovery and evaluation metrics.”  Or, as Kirsten Boehner, William Gaver and Andy Boucher explain, probes “open up possibilities, rather than converging toward singular truths”; they favor “playfulness, exploration and enjoyment.”  The new wave of designerly toolkits are similarly speculative, generative, meant to stimulate new ideas rather than deduce facts. Venue and company have adopted what sociologists Cecilia Lury and Nina Wakeford would consider “inventive methods,” methods whose thoughtful application has the potential to “address a problem and change that problem as it performs itself,” but whose impact can’t be given in advance. 
The specific aesthetic qualities of these projects seem to follow from the aesthetics of administration, archival aesthetics and the aesthetics of the lab. Undoubtedly they are also inspired by the circuit bending and “make your own tools” movements. But still, I wonder what has made measurement and data collection — often with analog tools — so cool, so worth aestheticizing, in this age of sentient technologies and Big Data. Perhaps it’s partly because, in contrast with the machines automatically harvesting mountains of data, these toolkits allow for a slower, more intentional, reflective, site-specific, embodied means of engaging with research sites and subjects. In some cases, they also allow researchers to design their methods. Kirsten Boehner and her colleagues note that the design of a probe or methodological toolkit reflects the character of both the research site or subject and the researchers; and at the same time, it serves rhetorically to elicit a particular kind of user engagement.
In the appendix to his 2010 book, Political Aesthetics, Crispin Sartwell proposes 52 potential research projects that would “encourage the greatest possible variety of methodologies.” Proposal #2 is a study of the “political aesthetics of measurement.”  That’s precisely what we need here, since both camps of urban research I’ve discussed — the Big Data initiatives of CUSP and IBM, and the citizen science projects of D.I.Y. data-collectors and manipulators — are simultaneously aesthetic and political. Similarly, we need to consider the relationships between (1) data collection, which is foregrounded in much of this work; (2) method; and (3) methodology.  In many cases, concern with the aesthetics of measurement and data overpowers considerations of how that measurement functions as a method. By method I mean “the techniques or procedures used to gather and analyze data related to some research question or hypotheses.” Methodology refers to the “strategy, plan of action, process or design lying behind the choice and use of particular methods; and the connection of the choice and use of methods to the desired outcomes.” 
Of course we are free to use tools for tools’ sake and gather data in an exploratory fashion, as part of inventive, speculative research. But to combat fetishism of the tools and the data we also have to think harder about what it all adds up to — or what we want it all to add up to — and select our tools in support of larger epistemological and theoretical goals. We would do well to pause and question the nature of our urban problems, and consider our strategies for gaining better understanding of those problems, before jumping to the conclusion that data have the answers.
I am not a data scientist, but I do work in fields in which the methods and ideals of “scientific” data collection have a growing appeal. And sometimes the most readily apparent or accessible way — for students in particular — to gain entry to those complex practices is to take on the aesthetics of measurement: to devise a clever data collection system, to accumulate a reassuringly big pile of data, and to massage that data into a persuasive visualization. That’s a worrisome trend. This isn’t to say that engagement with the affective or stylistic dimensions of measurement precludes engagement with its larger methodological functions; Feyerabend has shown us otherwise. Rather, I hope these concerns are brought into alignment: that the methodological packaging suits the purpose, the form serves the function, the knolling serves the knowledge.
To isolate these concerns, and to focus only on measurement for measurement’s sake — or its scientific “look” — feels a bit like methodolatry, a neologism composed, as you might expect, by mixing “method” and “idolatry.” Valeria Janesick defines methodolatry as “a preoccupation with selecting and defending methods to the exclusion of the actual substance of the story being told.” One manifestation of methodolatry is the fetishization of method, or a preoccupation with method to the extent that it directs one’s research, perhaps even driving the questions one asks. Medical scientists speak of the “worship of the clinical trial,” and we of course see plenty of examples in urban and design research in which the data lead the way. Another manifestation of methodolatry is the idolization of method — the adoration of measurement’s image or representation: the knolled toolbox, the hacked perceptual machines, the scientific flowchart, the seductive data visualization.
Or perhaps these methodolatrous projects, in their aestheticization of measurement, are calling our attention to presumptions about scientific rigor, parodying our algorithmic impulses, tacitly asking questions about the ideology of a pervasive culture of measurement and assessment. Perhaps, despite their implicit alliance with CUSP and Cisco and the like, our citizen data gatherers want to highlight the “givenness,” the rhetorical nature of that data, to show its inherent irrationality, to demonstrate that the “science of cities” is also, necessarily, an art.