In a burst of cherry blossoms, the blog is resurrected!
(It's a Sekiro joke. That's what the post is about.)
There are certain models that have become ubiquitous in AAA game design over the last five years, inspired by MMOs: randomized gear (both from enemy drops and loot boxes), player and enemy level progression, and crafting. In "live" games like Destiny, these gameplay loops are meant to keep the game sticky, making sure there's always a goal just out of reach. But they've become common in single-player games as well, a way to extend the length of campaigns that are increasingly expensive in an era of HD assets.
I generally find this off-putting, and I suspect many of the designers do as well. In Control, for example, grafting a loot system onto one of Remedy's quirky, cinematic stories feels extraneous, and the game seems well aware of that fact — the ubiquitous shelters containing exactly one (1) loot box could not possibly be more desultory and half-hearted compared to the exuberant weirdness of Dr. Casper Darling's musical stylings.
All of this provides context for my extremely mixed feelings about Sekiro: Shadows Die Twice. It's the first From Software "souls-like" title that I played to completion, and it'll probably be the last. I don't know that I would recommend it to anyone. But in its purity and its rejection of the "live game" grind, there is also something admirable that sticks with me.
The core mechanic in Sekiro is the relationship between two meters, vitality and posture, possessed by every combat entity in the game, including the player character (the Wolf). Vitality is basically a standard health bar. Posture is similar to a fighting game's guard meter: it increases as a character defends against attacks or when an attack is parried with perfect timing, and when it maxes out, the guard is broken. In the player's case, this leaves you open to attacks, but for enemies it means you can land an instant-kill "deathblow." Even still, vitality isn't useless since posture recovery speed is directly related to health.
Different enemies have different thresholds for their posture and vitality, and the interplay between these two meters is what makes Sekiro's combat interesting and aggressive. Since timed parries inflict extra posture damage (and reduce the posture you'd normally take from defending), many battles revolve around punishing enemy attacks to bring down vitality and slow posture recovery, then baiting counter-attack strings in order to break posture and land a deathblow. At its best, it feels like a great samurai battle: fast, dynamic, and deadly.
The problem, honestly, is that it's too deadly. To win a fight in Sekiro, you need to learn an enemy's attack patterns well enough that you can parry or dodge them, a task that's made more complicated in fights where a mid-string parry may change the pattern. But mistakes are incredibly punishing: it's not unusual for many bosses to kill the Wolf in two or three hits, and a surprising percentage of them are one-shots. Realistically, you're only going to learn one or two patterns per run before you get to stare at the loading screen and start over.
And that deadliness only really goes one way. Yes, the deathblow is an instant kill — but only after you slowly tick down the target's vitality and then engage in risky exchanges of posture damage. Compare this to Bushido Blade, an obvious inspiration but one where everyone (player and opponent alike) could die in an instant. In Sekiro, you might as well be fighting with a butter knife compared to everyone else. I wouldn't mind the perfectionism so much if it felt like I got more impact out of it.
From Software is known for games where growth comes from player skill, and not from a mechanic or in-game reward. They're also known for masochism and cheap shot tactics. Sekiro feels like the purest expression of both. The result is an experience that I respected more than I enjoyed. For all that it's thrilling when everything clicks, those moments are punctuation in long stretches of frustration — running the same route over and over, getting just a little bit farther before a lucky shot or a botched parry sends you back to the checkpoint.
This is, of course, part of the appeal for long-term Souls aficionados: your brain remembers the highs, and tends to ignore the long valleys of frustration between them. It's not for me. But I appreciate what it's trying to do, and the pressure it hopefully exerts on modern design. There is a middle ground between loot boxes and "get good," and maybe with HD development becoming increasingly unsustainable, the AAA industry will finally find it.
With Super Tuesday wrapped up, I feel pretty confident in writing about Betty, the new ArchieML parser that powered NPR's new election liveblogs. Language parsers are a pretty fundamental computer science discipline, which of course means that I never formally learned about them. Betty isn't a very advanced parser, compared to something that can handle a real programming language, but it's still pretty neat — and you can't say it's not battle-tested, given the tens of thousands of concurrent readers who unknowingly consumed its output last week.
ArchieML is a markup language created at the New York Times a few years back. It's designed to be easy to learn, error-tolerant, and well-suited to simultaneous editing in Google Docs. I've used it for several bigger story projects, and like it well enough. There are some genuinely smart features in there, and on a slower development cycle, it's easy enough to hand-fix any document bugs that come up.
Unfortunately, in the context of the NPR liveblog system, which deploys updated content on a constant loop, the original ArchieML had some weaknesses that weren't immediately obvious. For example, its system for marking up multi-line strings — signalling them with an :end token — proved fragile in the face of reporters and editors who were typing as fast as they could into a shared document. ArchieML's key-value syntax is identical to common journalistic structures like Sanders: 1,000, which would accidentally turn what the reporter thought was an itemized list into unexpected new data fields and an empty post body. I was spending a lot of time writing document pre-processors using regular expressions to try to catch errors at the input level, instead of processing them at the data level, where it would make sense.
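To make the key-value collision concrete, here is a minimal sketch in Python. The pattern is a simplified approximation of an ArchieML-style "key: value" line, written for illustration only; it is not the expression used by the Times' module or by Betty, and the sample lines are invented.

```python
import re

# Simplified approximation of an ArchieML-style "key: value" line.
# This pattern is an assumption for illustration, not taken from a real parser.
KEY_VALUE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*:\s*(.*)$")

def classify(line):
    """Return ("key", name, value) for key-value lines, else ("text", line)."""
    match = KEY_VALUE.match(line)
    if match:
        return ("key", match.group(1), match.group(2))
    return ("text", line)

# A reporter types a quick tally into the body of a post (invented sample).
post_body = [
    "Results so far",
    "Sanders: 1,000",
    "Biden: 900",
]

for line in post_body:
    print(classify(line))
```

Only the first line is classified as text. The two tally lines become fields named Sanders and Biden, which is how an itemized list can quietly turn into new data fields.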
To fix these errors, I wanted to introduce a more explicit multi-line string syntax, as well as offer hooks for input validation and transformation (for example, a way to convert the default string values into native types during parsing). My original impulse was to patch the module offered by the Times to add these features, but it turned out to be more difficult than I'd thought:
Okay, I thought, how hard can it be to write my own parser? I was a fool. Four days later, I emerged from a trance state with Betty, which manages to pass all the original tests in the repo as well as some of my own for my new syntax. I'm also much more confident in our ability to maintain and patch Betty over time (the ArchieML module on NPM hasn't been updated since 2016).
Betty (who Wikipedia tells me was the mechanic in the comics, appropriately enough) is about twice as large as the original parser was. That size comes from the additional structure in its design: instead of a single pass through the text, Betty builds the final output from three escalating passes.
Essentially, Betty trades concision for clarity: during debugging, it was handy to be able to look at the intermediate outputs of each stage to see where something went wrong. Each pipeline section is also much more readable, since it only needs to be concerned with one stage of the process, so it uses less global state and does less bookkeeping. The parser, for example, doesn't need to worry about the current object scope or array types, but can simply defer those to the assembler.
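As a rough illustration of that staged design, here is a toy three-pass pipeline in Python. The stage name tokenize, the token types, and the instruction shapes are all my own assumptions for the sketch; none of this is Betty's actual code or API, and it handles only a tiny subset of the syntax.

```python
import re

# Toy three-stage pipeline. All names and data shapes here are hypothetical.

def tokenize(text):
    """Pass 1: split raw text into (type, value) tokens, line by line."""
    tokens = []
    for raw in text.splitlines():
        line = raw.strip()
        scope = re.match(r"^\{([\w.]*)\}$", line)
        pair = re.match(r"^([\w.\-]+)\s*:\s*(.*)$", line)
        if scope:
            tokens.append(("SCOPE", scope.group(1)))
        elif pair:
            tokens.append(("KEY", pair.group(1)))
            tokens.append(("TEXT", pair.group(2)))
        elif line:
            tokens.append(("TEXT", line))
    return tokens

def parse(tokens):
    """Pass 2: turn tokens into instructions, with no knowledge of nesting."""
    instructions = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "SCOPE":
            instructions.append({"type": "object", "key": value})
        elif kind == "KEY":
            # tokenize() always emits a TEXT token right after a KEY token.
            instructions.append(
                {"type": "value", "key": value, "value": tokens[i + 1][1]}
            )
            i += 1
        # Stray TEXT tokens are ignored in this sketch.
        i += 1
    return instructions

def assemble(instructions):
    """Pass 3: build the output, tracking the current object scope."""
    root = {}
    target = root
    for step in instructions:
        if step["type"] == "object":
            # An empty {} returns to the top level; {name} opens an object.
            if step["key"] == "":
                target = root
            else:
                target = root.setdefault(step["key"], {})
        else:
            target[step["key"]] = step["value"]
    return root

source = """headline: Super Tuesday live
{author}
name: Betty
{}
published: true"""

tokens = tokenize(source)          # inspectable intermediate output
instructions = parse(tokens)       # inspectable intermediate output
result = assemble(instructions)
print(result)
```

Only assemble() knows which object is currently open. parse() just emits a flat list of instructions, so a bug in scoping can be traced by printing that list before it is assembled.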
If you'd told me a few years ago that I'd be writing something this complicated, I would have been extremely surprised. I don't have a formal CS background, nor did I ever want one. Parsing is often seen as black magic by self-taught developers. But as I have argued in the past, being able to write even simple parsers is an incredibly valuable skill for data journalism, where odd or proprietary data formats are not uncommon. I hope Betty will not just be a useful library for my work at NPR, but also a valuable teaching tool in the community.
Starting last January, I tracked every book I read in a spreadsheet, including author information, length, genre, and whether or not I'd read it before. Several of these are inexact measures: page count is of course largely meaningless when most of my books were on a Kindle, and assigning a single genre to a book is often reductionist. But I was curious how it would go.
All told, I read 138 books this year, totalling 51,432 pages. That seems like a lot, but you have to remember: I read a lot of crap — disposable science fiction and mystery novels make up a lot of my media diet. In fact, 54.4% of the titles on my list are some kind of speculative fiction, followed in frequency by non-fiction (19.6%), literary fiction (10.1%), and mystery (4.4%).
I could have done better when it comes to reading authors from different backgrounds. Although 72.5% of the books I read were by women, only 31.9% were by people of color. Combining the two is more dire: women of color made up 24.6% of the authors I read, slightly more than white men (20.3%) but behind white women (47.8%). Men of color were not well-represented in my reading, at 7.3%. And looking at the specific backgrounds, there are relatively few black authors of either gender in the sheet.
One hundred books is enough for them to blur together a bit, especially when — as I said — most of them are pretty pulpy. Many of my favorites have been well-lauded: Celeste Ng's Little Fires Everywhere or Jia Tolentino's Trick Mirror, but there are a few that I haven't seen recognized elsewhere.
Laurie J. Marks' Elemental Logic series has been in progress for most of two decades (starting with Fire Logic in 2002, proceeding through Earth Logic and Water Logic, and wrapping up this year with Air Logic). The long gestation might explain why these phenomenal books have flown under the radar. Where most fantasy yarns peak in a big fight that solves everything, the Logic books are preoccupied with the fallout of wartime and occupation, the trauma it leaves, and the slow and difficult process of recovery. They argue that there's no easy solution, just a lot of painstaking work. Even so, they're full of life, and not as grim as I make them sound. I can't believe these aren't better known.
W.E.B. Du Bois's Data Portraits was my greatest source of professional inspiration this year: composed of visualizations that he assembled for the 1900 Paris Exposition, it's a rich collection of data storytelling from a time before the field had much in the way of guidance or conventions. Du Bois hoped to show the fullness of black lives in America, as well as the oppression they faced. I love the unorthodox choices he made when displaying outliers or broad data ranges, which you don't see very much in a time when we've largely automated visualization.
Esme Wang's The Collected Schizophrenias made me more uncomfortable than almost anything I read in 2019 — it's not as simple as humanizing or excusing mental illness, or even explaining it. Wang writes frankly about the horrors of being institutionalized, but also the horrors of needing to be for her own safety, and the safety of the people around her. This is not a book with neat answers for anyone.
How to be an Antiracist is the other book that I think about regularly: part memoir, part history, and part theory, Ibram X. Kendi wrote a book that thinks deeply about racism and what it means to actively fight it — to be "antiracist," not just "not racist." Given the failures of American newsrooms to deal responsibly with coverage of race and class, in large part because they don't understand the "antiracist" vs. "not racist" distinction and err toward the latter, I'd recommend it to any reporter or editor.
Finally, Arkady Martine's A Memory Called Empire was one of the last titles I read this year, but I don't think my high opinion is just recency bias at work: it turns out that "diplomatic murder mysteries set in byzantine empires" is exactly my jam. Memory reminded me a lot of Ann Leckie's Ancillary books and the way they re-examined space opera from the point of view of the bureaucracy. In this case, it's the story of a new ambassador whose predecessor died from intrigue-related complications. There's a sequel on the way, apparently, which I'm very much anticipating.
Now that I've measured a year of books, I think next year I'll put the spreadsheet away — or choose a different subject, like cinema. I don't think this changed my habits substantially, but I could feel the temptation to read more (or read differently) in order to add more rows to the list. In 2020, I'm going to be spending a lot of time in spreadsheets for work, so I'd rather keep my virtual bookshelf separate.
Earlier this week, a member of the Google developer relations team ported Caret to the web. He's actually the second person from Chrome to do this — a member of the browser team created a separate port last month. The reasons for this are simple: Caret is a complete application with a relatively small API surface, most of which revolves around file I/O. Chrome has recently rolled out trial support for the Native Filesystem API, which lets web apps open and edit local files. So it's an ideal test case.
I want to be clear, Google's not doing anything wrong here. Caret is licensed under the GPL, which means pretty much anyone can take it and do whatever they want, as long as they give me credit for the code I wrote and distribute the source, both of which are happening here. They haven't been rude about it (Ben, the earlier developer, very kindly reached out to me first), and even if they were, I couldn't stop it. I intentionally made that decision early on with Caret, because I believe giving the code away for something as fundamental as a text editor is the right thing to do.
That said, my feelings about these ports are extremely mixed.
On the one hand, after a half-decade of semi-active development, Caret has found a nice audience among students and amateur hackers. If it's possible to expand that audience — to use Google's market power to give more students, and more amateurs, the tools to realize their own goals — that's an exciting possibility.
But let's be clear: a port is necessary because Google has been slowly removing support for Chrome Apps like Caret from their browser, in favor of active development on progressive web apps. After building on their platform and watching them strip support away from my users on Windows and OS X, with the clear intention of eventually removing it from Chrome OS after its Android support is advanced enough, I'm not particularly thrilled about the idea of using it to push PR for new APIs in Chrome (no other browsers have announced support for Native Filesystem).
People have ported Caret before. But it feels very different when it's a random person who wants to add a particular feature, versus a giant tech corporation with a tremendous amount of power and influence. If Google wants to become the new "owner" of Caret, they're perfectly capable of it. And there's nothing I can do to stop them. Whether they're going to do this or not (I'm pretty sure they won't) doesn't stop my heart from skipping a beat when I think about it. The power gradient here is unsettling.
Lately, a group of journalism students at Northwestern University here in Illinois came under fire for an apology for and retraction of their coverage of protests against former Attorney General Jeff Sessions. The critics include the usual suspects, like Bari Weiss, the NYT columnist who regularly publishes columns in the biggest paper in the world about how she's being silenced by critics, but also a number of legitimate journalists concerned about self-censorship. But the editorial itself is quite clear on why they took this step, including one telling paragraph:
We also wanted to explain our choice to remove the name of a protester initially quoted in our article on the protest. Any information The Daily provides about the protest can be used against the participating students — while some universities grant amnesty to student protesters, Northwestern does not. We did not want to play a role in any disciplinary action that could be taken by the University. Some students have also faced threats for being sources in articles published by other outlets. When the source in our article requested their name be removed, we chose to respect the student’s concerns for their privacy and safety. As a campus newspaper covering a student body that can be very easily and directly hurt by the University, we must operate differently than a professional publication in these circumstances.
You may disagree with the idea that journalists should take down or adjust coverage of public events and persons, but it is legitimately more complicated than just "liberal snowflakes bowing to public pressure." No-one is debating that the reporters can take pictures of public protests, or publish the names of those involved. But should they? Likewise, when a newsroom's community is upset about coverage, editors can ignore the outcry, or respond with scorn. It shouldn't be surprising that certain audiences turn away or become distrustful of a paper that does so.
The relationships in this situation, as with various ports of Caret, are complicated by power. In both cases, what would be permissible or normal in one context is changed by the power differential of the parties involved, whether that's students to the paper to the university, or me to Google, or data journalists to the people in their FOIA requests, or tech workers to their employers' government contracts.
Most newsrooms don't think very much about power, in my experience, or they think of it as something they're supposed to check, not something they possess. But we need to take responsibility for our own power. It's possible that the students at the Daily Northwestern overreacted — if you protest in public, you should probably expect that pictures are going to be taken — but they're at least engaging with the question of what to do with the power they wield (directly and, in the case of the university's discipline system, indirectly). Using power in ways that have a real chance of harming your readers, just on principle and the idea that "that's what journalists do," is tautocracy at work.
As much as anything, I think this is one of the key generational shifts taking place in both software and journalism. My own sympathies tend toward a vision of both that prioritizes harm reduction over abstractions like "free speech" or "intellectual property," but I don't have any pat answers. Similarly, I've become acclimated to the idea of a web-based Caret port that's out of my hands, because I think the benefits to users outweigh the frustration I feel personally. I can't do anything about it now. But I will definitely learn from this experience, and it will change how I plan future projects.
When I was a kid in Lexington, Kentucky, I remember that grocery stores would have a little video rental section at the front of the store, just a few shelves stocked with VHS tapes. I used to be fascinated by the horror movies: when my parents were checking out, I would often walk over and look at the box art, which had its own special, lurid appeal. It was the golden age of plasticky, rubbery practical effects. I could have stared at the cover for Ghoulies for hours, wondering what the movie inside was like.
This year, for the first time, I decided to celebrate Shocktober: watching a horror movie for every day in the month before Halloween. In particular, I tried to watch a lot of the movies my 7-year-old self would have wanted to see. It turns out that these were not generally very good! My full list is below, with the standouts in bold.
One thing that becomes obvious very quickly is how inconsistent the horror genre is: not only is it extremely prone to fashion, but also to drought. The mid-to-late 80s had a lot of real stinkers — either "comedy" horror like House, nonsense slashers like My Bloody Valentine, or just mistakes (Children of the Corn, which is amateurish on almost every level). I suspect this parallels a lot of the CG goofball period of the late 2000s (Darkness Falls, Hollow Man, They).
On the other hand, there are some real classics in there. Black Christmas predates Halloween by four years, and not only probably inspired it but is also a much better movie: more interesting characters, better sense of place, and a wild Pelham 123-style investigation. Candyman and Hellraiser are both fascinating, complicated movies packed with indelible imagery. And Halloween 3 manages to feel like a companion piece to They Live, trading all connection to the mainline series for a bizarre riff on media paranoia.
Somewhere in the middle is Chopping Mall, a movie that's somehow so terrible, so perfectly 1986, that it becomes compulsively watchable. Its effects are bad, the characters are thinly drawn and largely there for gratuitous nudity, and its marketing materials wildly overpromise what it will deliver. It's perfect, I love it, and I name it the official movie of Shocktober 2019.
The past few months, I've mostly been writing in public for NPR's News Apps team blog, with posts on the new Dailygraphics rig (and setting it up on Windows), the Mueller report redactions, and building a scrolling audio story. However, in my personal time, I decided to listen to some podcasts. So naturally, I built a web-based listener app, just for me.
I had a few goals for Radio as I was building it. The first was my own personal use case, in which I wanted to track and listen to a few podcasts, but not actually install a dedicated player on my phone. A web app makes perfect sense for this kind of ephemeral use case, especially since I'm not ever really offline anymore. I also wanted to try building something entirely using Web Components instead of a UI framework, and to use modern features like import — in part because I wanted to see if I could recommend it as a standard workflow for younger developers, and for internal newsroom tools.
Not everything was as smooth. I'm on record for years as a huge fan of Web Components, particularly custom elements. And for an application of this size, it was a reasonably good experience. I wrote a base class that automated some of the rough edges, like templating and synchronizing element attributes and properties. But for anything bigger or more complex, there are some cases where the platform feels lacking — or even sometimes actively hostile.
For example: in the modern, V1 spec for custom elements, they're not allowed to manipulate their own contents until they've been placed in the page. If you want to skip the extra bookkeeping that would require, you are allowed to create a shadow root in the constructor and put whatever HTML you want inside. It feels very much like this is the workflow you're supposed to use. But shadow DOM is harder to inspect at the moment (browser tools tend to leave it collapsed when inspecting the page), and it causes problems with events (which do not cross the shadow DOM boundary unless you alter your dispatch code).
There's also no equivalent in Web Components for the state management that's core to most modern frameworks. If you want to pass information down to child components from the parent, it either needs to be set through attributes (meaning you only get strings) or properties (more bookkeeping during the render step). I suspect if I were building something larger than this simple list-of-lists, I'd want to add something like Redux to manage data. This would tie my components to that framework, but it would substantially clean up the code.
Ironically, the biggest hassle in the development process was not from a new browser feature, but from a very old one: while it's very easy to create an audio tag and set its source to any sound clip on the web, actually getting the list of audio files is often impossible, thanks to CORS. Most podcasts do not publish their episode feeds with the cross-origin header set, so the browser's security settings shut down the AJAX requests completely. It's wild that in 2019, there's still no good way to make a secure request (say, one that transmits no cookies or custom headers) to another domain. I ended up running the final app on Glitch, which provides basic Node hosting, so that I could deploy a simple proxy for feed data.
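For readers who haven't had to build one, the proxy idea looks roughly like this. The app described above ran on Node; this sketch uses Python's standard library purely for illustration, and the allow-listed host, the url query parameter, and the port are all hypothetical choices of mine.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Hypothetical allow-list: only proxy feeds you actually subscribe to,
# so the server can't be used as an open relay.
ALLOWED_HOSTS = {"feeds.example.com"}

def is_allowed(feed_url):
    """Accept only http(s) URLs whose host is on the allow-list."""
    parts = urlparse(feed_url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS

class FeedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests like /?url=https://feeds.example.com/show.xml
        query = parse_qs(urlparse(self.path).query)
        feed_url = query.get("url", [""])[0]
        if not is_allowed(feed_url):
            self.send_error(403, "Feed host not allowed")
            return
        # The request is made server-side, where CORS does not apply.
        with urlopen(feed_url, timeout=10) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        # This is the header most podcast feeds leave out.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Run the proxy until interrupted."""
    HTTPServer(("", port), FeedProxy).serve_forever()
```

Calling serve() starts the proxy. The browser app then requests feeds through it instead of hitting the podcast host directly, and the added header lets the response through.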
For me, the neat thing about this project was how it brought back the feeling of hackability on the web, something I haven't really felt since I first built Caret years ago. It's so easy to get something spun up this way, and that's a huge incentive for creating little personal apps. I love being able to make an ugly little app for myself in only a few hours, instead of needing to evaluate between a bunch of native apps run by people I don't entirely trust. And I really appreciated the ways that Glitch made that easy to do, and emphasized that in its design. It helps that podcasting, so far, is still a platform built on open web tech: XML and MP3. More of this, please!
A proposal for responsible and ethical publication of personally-identifiable information in data journalism
Thanks to Helga Salinas, Kazi Awal, and Audrey Carlsen for their feedback.
Over the last decade, one of the goals of data journalism has been to increase accountability and transparency through the release of raw data. Admonitions of "show your work" have become common enough that academics judge our work by the datasets we link to. These goals were admirable, and (in the context of legitimizing data teams within legacy organizations) even necessary at the time. But in an age of 8chan, Gamergate, and the rise of violent white nationalism, it may be time to add nuance to our approach.
This document is concerned primarily with the publication of personal data (also known as personally-identifiable information, or PII). In other words, we're talking about names, addresses or contact info, lat/long coordinates and other geodata, ID numbers (including license plates or other government ID), and other data points that can be traced back to a single individual. Much of this is available already under the public record, but that's no excuse: as the NYT Editorial Board wrote in 2018, "just because information is public doesn't mean it has to be so easy for so many people to get." It is irresponsible to amplify information without thinking about what we're amplifying and why.
Moreover, this is not a theoretical discussion: many newsroom projects start with large-scale FOIA dumps or public databases, which may include exactly this personal data. There have been movements in recent years to monetize these databases: creating a queryable database of government salaries, for example, and offering it via a subscription. Even random public records requests may disclose personal data. Intentionally or not, we're swimming in this stuff, and have become jaded as to its prevalence. I simply ask: is it right for us to push it out without re-examining the implications of doing so?
I would stress that I'm not the only person who has thought about these things, and there are a few signs that we as an industry are beginning to formalize our thought process in the same way that we have standards around traditional reporting:
In her landmark 2015 book The Internet of Garbage, Sarah Jeong sets aside an entire chapter just for harassment. And with good reason: the Internet has enabled new innovations for old prejudices, including SWATting, doxing, and targeted threats at a new kind of scale. Writing about Gamergate, she notes that the action of its instigator, Eron Gjoni, "was both complicated and simple, old and new. He had managed to crowdsource domestic abuse."
I choose to talk about harassment here because I think it provides an easy touchstone for the potential dangers of publishing personal information. Since Latanya Sweeney's initial work on de-anonymizing data, an entire industry has grown up around taking disparate pieces of information, both public and private, and matching them against each other to create alarmingly-detailed profiles of individual people. It's the foundation of the business model for Facebook, as well as a broad swathe of other technology companies. This information includes your location over time. And it's available for purchase, relatively cheaply, by anyone who wants to target you or your family. Should we contribute, even in a minor way, to that ecosystem?
These may seem like distant or abstract risks, but that may be because for many of us, this harassment is more distant or abstract than it is for others. A survey of "news nerds" in 2017 found that more than half are male, and three-quarters are white (a demographic that includes myself). As a result of this background, many newsrooms have a serious blind spot when it comes to understanding how their work may be seen by (or used against) underrepresented populations.
As numerous examples have shown, we are very bad as an industry at thinking about how our power to amplify and focus attention is used. Even if harassment is not the ultimate result, publishing personal data may be seen by our audience as creepy or intrusive. At a time when we are concerned with trust in media, and when that trust is under attack from the top levels of government, perhaps we should be more careful in what data we publish, and how.
Finally, I think it is useful to consider our twin relationship to power and shame. Although we don't often think of it this way, the latter is often a powerful tool in our investigative reporting. After all, as the fourth estate, we do not have the power to prosecute or create legislation. What we can do is highlight the contrast between the world as we want it to be and as it actually is, and that gulf is expressed through shame.
The difference between tabloid reporting and "legitimate" journalism is the direction in which shame is aimed. The latter targets its shame toward the powerful, while the former is as likely to shame the powerless. In terms of accountability, it orients our power against the system, not toward individual people. It's the difference between reporting on welfare recipients buying marijuana, as opposed to looking at how marijuana licensing perpetuates historical inequalities from the drug war.
Our audiences may not consciously understand the role that shame plays in our journalism, but they know it's a part of the work. They know we don't do investigations in order to hand out compliments and community service awards. When we choose to put the names of individuals next to our reporting, we may be doing it for a variety of good reasons (perhaps we worked hard for that data, or sued to get it) but we should be aware that it is often seen as an implication of guilt on the part of the people within.
I want to be very clear that I am only talking about the public release of data in this document. I am not arguing that we should not submit FOIA or public records requests for personal data, or that it can't be useful for reporting. I'm also not arguing that we should not distribute this data at all, in aggregated form, on request, or through inter-organizational channels. It is important for us to show our work, and to provide transparency. I'm simply arguing that we don't always need to release raw data containing personal information directly to the public.
In the spirit of Maciej Ceglowski's Haunted by Data, I'd like to propose we think of personal data in three escalating levels of caution:
When creating our own datasets, it may be best to avoid personal data in the first place. Remember, you don't have to think about the implications of the GDPR or data leaks if you never have that information. When designing forms for story call-outs, try to find ways to automatically aggregate or avoid collecting information that you're not going to use during reporting anyway.
If you have the raw data, don't just throw it out into the public eye because you can. In general, we don't work with raw data for reporting anyway: we work with aggregates or subsets, because that's where the best stories live. What's the difference in policy effects between population groups? What department has the widest salary range in a city government? Where did a disaster cause the most damage? Releasing data in an aggregate form still allows end-users to check your work or perform follow-ups. And you can make the full dataset available if people reach out to you specifically over e-mail or secure channels (but you'll be surprised how few actually do).
In cases where distributing individual rows of data is something you're committed to doing, consider ways to protect the people inside the data by anonymizing it, without removing its potential usefulness. For example, one approach that I love from ProPublica Illinois' parking ticket data is the use of one-way hash functions to create consistent (but anonymous) identifiers from license plates: the input always creates the same output, so you can still aggregate by a particular car, but you can't turn that random-looking string of numbers and letters back into an actual license plate. As opposed to "cooking" the data, we can think of this as "seasoning" it, much as we would "salt" a hash function. A similar approach was used in the infosec community in 2016 to identify and confirm sexual abusers in public without actually posting their names (and thus opening the victims up to retaliation).
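To make the hashing idea concrete, here is a minimal Python sketch. It is not ProPublica's actual code: the function name `make_anonymizer`, the sample plates, and the fine amounts are all invented for illustration, and it assumes a keyed hash (HMAC-SHA256) with a secret key standing in for the "salt." The key matters because license plates are short and guessable; with a plain, unkeyed hash, someone could hash every possible plate and reverse the identifiers by lookup.

```python
import hashlib
import hmac
import secrets


def make_anonymizer(secret_key: bytes, length: int = 16):
    """Return a function that maps a plate to a stable, anonymous ID.

    Hypothetical helper for illustration only. The same key and plate
    always give the same ID, so rows can still be grouped by vehicle.
    """

    def anonymize(plate: str) -> str:
        # Normalize first so "abc 1234" and "ABC-1234" hash identically.
        normalized = plate.strip().upper().replace(" ", "").replace("-", "")
        digest = hmac.new(
            secret_key, normalized.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Truncating keeps IDs readable; 16 hex characters is 64 bits.
        return digest[:length]

    return anonymize


# Assumption: the key is generated once and stored privately, never published.
key = secrets.token_bytes(32)
anonymize = make_anonymizer(key)

# Made-up sample rows: (license plate, fine in dollars).
tickets = [("ABC 1234", 60), ("xyz-9876", 25), ("abc1234", 100)]

# Aggregate fines per anonymous vehicle ID.
totals = {}
for plate, fine in tickets:
    vehicle_id = anonymize(plate)
    totals[vehicle_id] = totals.get(vehicle_id, 0) + fine
```

The same key has to be used for an entire release, or the identifiers will not line up between rows. If the key is published, the identifiers can be reversed by brute force; if it is thrown away, they can never be regenerated, which may be exactly what you want.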
Once upon a time, this industry thought of computer-assisted reporting as a new kind of neutral standard: "precision" or "scientific" journalism. Yet as Catherine D'Ignazio and Lauren Klein point out in Data Feminism, CAR is not neutral, and neither is the way that the underlying data is collected, visualized, and distributed. Instead, like all journalism, it is affected by concerns of race, gender, sexual identity, class, and justice.
It's my hope that this proposal can be a small step to raise the profile of these questions, particularly in legacy newsrooms and journalism schools. In working on several projects at The Seattle Times and NPR, I was surprised to find that although there are guidelines on how to ethically source and process data, it was difficult to find formal advice on ethical publishing of that same data. Other journalists have certainly dealt with this, and yet there are relatively few documents that lay out concrete guidelines on the matter. We can, and should, change that.
At the end of this month, in keeping with the horrifying march of time, The Matrix turns 20 years old. It's hard to overstate how mind-blowing it was for me, a high-schooler at the time, when the Wachowski sisters' now-classic marched into theaters: combining entirely new effects techniques with Hong Kong wire-work martial arts, it's still a stylish and mesmerizing tour de force.
The sequels... are not. Indeed, little of the Wachowskis' post-Matrix output has been great, although there's certainly a die-hard contingent that argues for Speed Racer and Sense8. But in rewatching them this month, I've been struck by the ways that Reloaded and Revolutions almost feel like the work of entirely different filmmakers, ones who have thrown away one of their most powerful storytelling tools. By that, I mean the fight scenes.
The Matrix has a few set-piece fight scenes, and they're not all golden. The lobby gunfight, for example, doesn't hold up nearly as well on rewatch. But at their best, the movie's action segments deftly thread a needle between "cool to watch" and "actively communicating plot." Take, for example, the opening chase between Trinity, some hapless cops, and a pair of agents:
In a few minutes, we learn that A) Trinity is unbelievably dangerous, and B) however competent she is, she's utterly terrified by the agents. We also start to see hints of their character: one side engaged in agile, skilled hit-and-run tactics, while the authorities bully through on raw power. And we get the sense that while there are powers at work here, it's not the domain of magic spells. Instead, Trinity's escape bends the laws of time and space — in a real way, to be able to manipulate the Matrix is to be able to control the camera itself.
But speaking of rules that can be bent or broken, we soon get to the famous dojo training sequence:
I love the over-the-top kung fu poses that start each exchange, since they're such a neat little way of expressing Neo's distinct emotional progress through the scene: nervousness, overconfidence, determination, fear, self-doubt, and finally awareness. Fishburne absolutely sells his lines ("You think that's air you're breathing now...?"), but the dialog itself is almost superfluous.
The trash-as-tumbleweed is a nice touch to start the last big brawl of the movie, as is the Terminator-esque destruction of Smith's sunglasses. But pay close attention to the specific choreography here: Smith's movements are, again, all power and no technique. During the fight, he hardly even blocks, and there aren't any fancy flips or kicks. But halfway through, after the first big knock-down, Neo starts to use the agent's own attack routines against him, while adding his own improvisations and style at the end of each sequence. One of these characters is dynamic and flexible, and one of them is... well, a machine. We're starting to see the way that the ending will unfold, right here.
What do all these fight scenes have in common? Why are they so good? Well, in part, they're about creating a readable narrative for each character in the shot, driving their action based on the emotional needs of a few distinct participants. Yuen Woo Ping is a master at this: it's practically the defining feature of Crouching Tiger, Hidden Dragon, on which he did fight direction a year later and in which nearly every scene combines character and action seamlessly. Tom Breihan compares it to the role that song-and-dance numbers play in a musical in his History of Violence series, and he's absolutely right. Even without subtitles or knowledge of Mandarin, this scene is beautifully eloquent:
By contrast, three years later, The Matrix Reloaded made its centerpiece the so-called "burly brawl," in which a hundred Agent Smiths swarm Neo in an empty lot:
The tech wasn't there for the fight the Wachowskis wanted to show (digital Keanu is plasticky and weirdly out-of-proportion, while Hugo Weaving's doppelgangers only get a couple of expressions), but even if they had modern, Marvel-era rendering, this still wouldn't be a satisfying scene. With so many ambiguous opponents, we're unable to learn anything about Neo or Smith here. There's no mental growth or relationship between two people, just more disposable mooks to get punched. "More" is not a character beat. But for this movie and for Revolutions, the Wachowskis seemed to be convinced that it was.
At the end of the day, none of that makes the first movie any less impressive. It's just a shame that for all the work that went into imitating bullet time or tinting things green, almost nobody ripped off the low-tech narrative choices that The Matrix made. Yuen Woo Ping went back to Hong Kong, and Hollywood pivoted to The Fast and the Furious a few years later.
But not to end on a completely down note, there is one person who I think actually got it, and that's Keanu Reeves himself. The John Wick movies certainly have glimmers of it, even if the fashion has swung from wuxia to MMA. And Reeves' directorial debut, Man of Tai Chi, is practically an homage to the physicality of the movie that made him an action star. If there is, in fact, a plan to reboot The Matrix as a new franchise, I legitimately think they should put Neo himself in the director's chair. It might be the best way to capture that magic one more time.
Sunshine wasn't particularly loved when it was released in 2007, despite a packed cast and direction by Danny Boyle. In the years since, it has somehow stubbornly avoided cult status: before its time, maybe, or just too odd, as it swings wildly between hard sci-fi, psychological drama, survival horror, and eventually straight-up slasher flick by way of Apocalypse Now. But it's intensely watchable and, I would argue, underappreciated, especially in comparison to writer Alex Garland's follow-up attempts on the same themes.
"Our sun is dying," Cillian Murphy mutters at the start of the film, and the tone remains pretty grim from there. The spaceship Icarus II is sent on a desperate trip to restart the sun by tossing a giant cubic nuclear bomb into it: a desperate quest, made all the more desperate by the fact that nobody on the mission seems particularly stable or well-suited to the job. Boyle sketches out each crew member quickly but adeptly, giving each one a well-defined (if sometimes precious) persona, like the neurotic psychologist, the hot-tempered engineer, or the botanist who cares more for her oxygen-producing plants than the people onboard (or, viewers suspect, the mission itself). NASA would never put these people in a small space for more than a day, but they're a marvel of small-scale human conflict almost from the very start.
That approach to character is emblematic of Sunshine's construction, which is really less of a plot and more of a set of simple machines rigged in opposition to each other. An early miscalculation in the position of the ship's sun shield leads to a series of cascading crises, each of which provides both a physical challenge and ratcheting tension among the crew from dwindling resources. Yet there's only one real plot twist in the whole thing: the murderous captain Pinbacker of Icarus I, driven mad by his own journey toward the sun. Everything else is established clearly and methodically, with ample recall and signposting; it's the rare science fiction movie that doesn't cheat. Even Pinbacker's slasher-esque rampage shows up in clues for savvy viewers, who can clock a missing scalpel and scattered bloody handprints on rewatch.
Similar to an obvious inspiration (and personal favorite), Alien, one of the film's greatest special effects is the cast. Boyle gets a lot of mileage out of Cillian Murphy's After Effects-blue eyes, but you can't go wrong with Chris Evans, Michelle Yeoh, Benedict Wong, and Rose Byrne. Still, for my money, Cliff Curtis is the film's MVP: as the doctor/psychologist Searle, he's both bomb-thrower and mediator in equal measure. His obsession with the sun leaves him visibly burned, like a Dorian Gray painting of the crew's mental health. And yet, unlike Pinbacker (whom he clearly parallels), Curtis manages to keep his perspective straight and a wry sense of humor: he may love the light, but he's not blinded by it.
So why isn't Sunshine canonized, especially in a climate-change world where "our sun is dying" passes for optimism? Why is it considered a misfire, when Garland's flawed Annihilation was seen as a cult hit in the making? It's still not clear to me. Maybe it just got lost in the shuffle: 2007 was a good year for movies, including There Will Be Blood for the serious film aficionados and The Bourne Ultimatum or Death Proof for fans of surprisingly well-crafted genre fare. Or maybe it's also just too close to its nearest relatives: too easy to write off as "Event Horizon without the schlocky fun" or "Solaris, but for stupid people." Either way, it feels overdue for reconsideration.
This post was originally written as a lightning talk for SRCCON:Power. And then I looked at the schedule, and realized they weren't hosting lightning talks, but I'd already written it and I like it. So here it is.
I want to talk to you today about election results and power.
In the last ten years, I've helped cover the results for three newsrooms at very different scales: CQ (high-profile subscribers), Seattle Times (local), and NPR (shout out to Miles and Aly). I say this not because I'm trying to show off or claim some kind of authority. I'm saying it because it means I'm culpable. I have sinned, and I will sin again, may God have mercy on my soul.
I used to enjoy elections a lot more. These days, I don't really look forward to them as a journalist. This is partly because the novelty has worn off. It's partly because I am now old, and 3am is way past my bedtime. But it is also in no small part because I'm really uncomfortable with the work itself.
Just before the midterms this year, Tom Scocca wrote a piece about the rise of tautocracy — meaning, rule by mulish adherence to the rules. Government for its own sake, not for a higher purpose. When a judge in Nebraska rules that disenfranchising Native American voters is clearly illegal, but will be permitted under regulations forbidding last-minute election changes — even though the purpose of that regulation is literally to prevent voter disenfranchisement — that's tautocracy. Having an easy election is more important than a fair one.
For those of you who have worked in diversity and inclusion, this may feel a little like the "civility" debate. That's not a coincidence.
I am concerned that when we cover elections with results pages and breaking alerts, we're more interested in the rules than we are in the intended purpose. It reduces the election to the barest essence — the score, like a football game — divorced from context or meaning. And we spend a tremendous amount of redundant resources across the industry trying to get those scores faster or flashier. We've actually optimized for tautocracy, because that's what we can measure, and you always optimize for your metrics.
But as the old saying goes, elections have consequences. Post-2016, even the most privileged and most jaded of us have to look around at a rising tide of white nationalism and ask, did we do anything to stop this? Worse, did we help? That's an uncomfortable question, particularly for those of us who have long believed (incorrectly, in my opinion) that "we just report the news."
Take another topic, one that you will be able to sell more easily to your mostly white, mostly male senior editors when you get back: Every story you run these days is a climate change story. Immigration, finance, business, politics both international and domestic, health, weather: climate isn't just going to kill us all, it also affects nearly everything we report on. It's not just for the science stories in the B section anymore. Every beat is now the climate beat.
Where was climate in our election dashboard? Did anyone do a "balance of climate?"
Isn't that an election result?
What would it look like if we took the tremendous amount of duplicated effort spent on individual results pages, distributed across data teams and lonely coders around the country, and spent it on those kinds of questions instead?
The nice thing about a lightning talk is that I don't have time to give you any answers. Which is good, because I'm not smart enough to have any. All I know is that the way we're doing it isn't good enough. Let's do better.
[SPARSE, SKEPTICAL APPLAUSE]