Posted by: structureofnews | January 5, 2017

Seeing Patterns

What are machines good at, what are people good at, and how can we get the most out of pairing the best of both worlds?

It’s not like I haven’t written about this topic before – not least about not trying to make machines be poor copies of humans – but it’s just that the list of things machines are good at keeps getting longer.  (And so maybe we should be thinking of how to make them better copies of humans, and worry about what jobs will go away – but that’s the subject for another post.)

Exhibit A in how-machines-are-getting-better-at-more-things is an excellent NYT Magazine piece from a couple of weeks ago by Gideon Lewis-Kraus, entitled The Great A.I. Awakening. If you haven’t read it yet, you should stop here and read it.  It’s very good.

At a basic level, it’s the story of how Google used neural networks to push the quality of Google Translate to an astonishingly good level, and in a very short period of time. But the broader story is about how neural networks – essentially, systems for recognizing patterns – have come into their own, and are powering machines to do a host of things never before thought possible: High-quality translations, image recognition, and so on.

And if they can do all those things well – imagine what they could do for journalism.

Just look at how good Google Translate has become with the help of the neural network built by a team called Google Brain.  First, a passage from Ernest Hemingway’s “The Snows of Kilimanjaro,” translated to Japanese and then back to English via Google Translate, pre-neural networks:

Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.

And now after neural networks:

Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of leopard. No one has ever explained what leopard wanted at that altitude.

That’s pretty damn good.

And as Gideon notes, it’s not simply about translation:

Once you’ve built a robust pattern-matching apparatus for one purpose, it can be tweaked in the service of others. One Translate engineer took a network he put together to judge artwork and used it to drive an autonomous radio-controlled car. A network built to recognize a cat can be turned around and trained on CT scans — and on infinitely more examples than even the best doctor could ever review. A neural network built to translate could work through millions of pages of documents of legal discovery in the tiniest fraction of the time it would take the most expensively credentialed lawyer. The kinds of jobs taken by automatons will no longer be just repetitive tasks that were once — unfairly, it ought to be emphasized — associated with the supposed lower intelligence of the uneducated classes. We’re not only talking about three and a half million truck drivers who may soon lack careers. We’re talking about inventory managers, economists, financial advisers, real estate agents. What Brain did over nine months is just one example of how quickly a small group at a large company can automate a task nobody ever would have associated with machines.

And worrying about those jobs, and those people who will be automated out of them, is an important, pressing issue.  But so too is thinking about how best to harness all this new-found capability in the service of journalism (assuming, of course, your newsroom has access to some smart computer scientists and a ton of data….)

Much of journalism – or at least, good, unobvious journalism – is about being able to recognize patterns.  Why is that obscure lawyer in the Cayman Islands on the board of all these offshore entities?  Why do company CEOs seem to get stock options at the stock’s lowest point in the year?  How do campaign contributions relate to politicians’ positions?

To be sure, these aren’t the sorts of things a computer can automatically trawl for – at least not at the moment.  But there is lots of data that it can hunt through, regularly and tirelessly, and present to a reporter as tips and hints, even without the reams of data and teams of computer science PhDs.

Financial data is probably the easiest place to start – has a company’s stock hit any kind of milestone, have a certain percentage of insiders sold stock in the last two weeks, how many analysts have upgraded or downgraded the stock in the last month, and so on.  Basic information that’s already available, but a pain to check.

Well, we can automate that.
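Here's a rough sketch in Python of what that kind of check might look like. Everything in it is made up for illustration: the `CompanySnapshot` fields, the `tips_for` function and the thresholds (a quarter of insiders selling, a net swing of three analyst ratings) are assumptions, not anything a real data feed or newsroom system defines. A real version would pull these numbers from whatever market data source the newsroom already has.

```python
from dataclasses import dataclass


@dataclass
class CompanySnapshot:
    """Hypothetical per-company figures a newsroom data feed might supply."""
    ticker: str
    insiders_total: int      # number of insiders tracked
    insiders_sold_14d: int   # how many of them sold stock in the last two weeks
    upgrades_30d: int        # analyst upgrades in the last month
    downgrades_30d: int      # analyst downgrades in the last month


def tips_for(snapshot, insider_threshold=0.25, rating_threshold=3):
    """Return a list of plain-English tips for a reporter to follow up or dismiss.

    Both thresholds are arbitrary illustrative defaults.
    """
    tips = []

    # Flag companies where a large share of insiders sold recently.
    if snapshot.insiders_total > 0:
        share = snapshot.insiders_sold_14d / snapshot.insiders_total
        if share >= insider_threshold:
            tips.append(
                f"{snapshot.ticker}: {share:.0%} of insiders sold stock "
                "in the last two weeks"
            )

    # Flag a lopsided run of analyst rating changes in either direction.
    net = snapshot.upgrades_30d - snapshot.downgrades_30d
    if abs(net) >= rating_threshold:
        direction = "upgrades" if net > 0 else "downgrades"
        tips.append(
            f"{snapshot.ticker}: net {abs(net)} analyst {direction} "
            "in the last month"
        )

    return tips


if __name__ == "__main__":
    example = CompanySnapshot("ACME", insiders_total=10, insiders_sold_14d=4,
                              upgrades_30d=1, downgrades_30d=5)
    for tip in tips_for(example):
        print(tip)
```

The output is deliberately a list of short sentences rather than a story: the point is a nudge to a reporter, who then decides whether it's worth a call.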

As I noted a little while back,

Shouldn’t we be building systems that do that automatically in newsrooms?  Before a reporter writes a market report, shouldn’t an algorithm be checking that day’s close against a database of market data, alerting him or her to new records, 52-week highs, and so on?  Similarly, shouldn’t smart algorithms be trawling through databases and regularly throwing up insights for beat reporters to follow up on – or dismiss?
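The market-report check described above is simple enough to sketch in a few lines of Python. This is only an illustration under some loud assumptions: the price history is a plain dictionary of past closing prices keyed by date (a stand-in for a real market database), "52 weeks" is approximated as the previous 365 calendar days, and the function name `close_alerts` is invented for the example.

```python
from datetime import date, timedelta


def close_alerts(history, today, close):
    """Compare today's close against earlier closes and return alert labels.

    history: dict mapping datetime.date -> closing price for earlier sessions
             (a hypothetical stand-in for a market data store).
    today:   the date of the close being checked.
    close:   today's closing price.
    """
    alerts = []
    if not history:
        return alerts

    # All-time record: higher than every close we have on file.
    if close > max(history.values()):
        alerts.append("record close")

    # 52-week window, approximated here as the prior 365 calendar days.
    window_start = today - timedelta(days=365)
    last_year = [price for day, price in history.items()
                 if window_start <= day < today]

    if last_year and close > max(last_year):
        alerts.append("52-week high")
    if last_year and close < min(last_year):
        alerts.append("52-week low")

    return alerts


if __name__ == "__main__":
    past = {
        date(2015, 6, 1): 120.0,
        date(2016, 6, 1): 100.0,
        date(2016, 12, 1): 110.0,
    }
    print(close_alerts(past, date(2017, 1, 5), 115.0))
```

Run before a reporter sits down to write, a check like this would surface the milestone without anyone having to look it up.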

Meanwhile, a lot of the focus on automation in newsrooms is about trying to generate stories.  There’s real value there, of course, not least in being able to create news-on-demand and personalized news.  But trying to get a machine to write a coherent, readable – and complete – narrative can be a pretty daunting task.  (Which is all the more reason to admire the work of Narrative Science, Automated Insights and others for the automated stories they turn out.  Reuters doesn’t do so badly on this front, either.  Just saying.)

But isn’t the greater value – at least today, and perhaps as an intermediate step – in using machines to surface nuggets of information, harnessing computers’ ability to churn through reams of data at speed, rather than in turning out reasonable facsimiles of human writing?

Without access to neural networks, of course, newsrooms can only do so much in terms of harnessing machines to do pattern recognition.  But there’s already value that can be gained in relatively simple searches – the kind of thing a simple spreadsheet or program can do.  And if neural networks get more democratized in the future, that will simply add to the complexity of the searches available to reporters.

There are some kinds of searches that machines aren’t good at, of course.  What they do best is look for regular, somewhat predictable patterns in large data sets – identifying pictures of cats, or tumors.  They do less well in small data sets or in asking brand new questions about the data.

But that’s what we have humans for.

And hopefully that keeps reporters employed and journalism vibrant. Until the machines take over.



