MIT Technology Review
Reuters is scooping its rivals using intelligent machines that mine Twitter for news stories.
“The advent of the internet and the subsequent information explosion has made it increasingly challenging for journalists to produce news accurately and swiftly.” So begins a paper posted on the arXiv this week by the research and development team at the global news agency Reuters.
For Reuters, the problem has been made more acute by the emergence of fake news as an important factor in distorting the perception of events.
The agency’s new system, Reuters Tracer, can largely automate the identification of breaking global news. It samples the Twitter data stream both randomly and from human-curated sources, then applies a clustering algorithm to find groups of conversations that could signify news events.
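The paper does not publish Tracer's clustering code, but the idea of grouping similar tweets into candidate "conversations" can be sketched in a few lines. Everything below (the Jaccard-overlap measure, the 0.3 threshold, the greedy single-pass strategy) is an illustrative assumption, not the system's actual algorithm:

```python
# Hypothetical sketch of the clustering step: group tweets whose token
# overlap (Jaccard similarity) exceeds a threshold into "conversations".
# Thresholds and the greedy strategy are assumptions for illustration.

def tokens(text):
    # crude tokenizer: lowercase words, punctuation stripped, short words dropped
    return {w.lower().strip(".,!?#@:") for w in text.split() if len(w) > 2}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_tweets(tweets, threshold=0.3):
    """Greedy single-pass clustering: each tweet joins the first cluster
    it is similar enough to, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"tokens": set, "tweets": [...]}
    for t in tweets:
        tk = tokens(t)
        for c in clusters:
            if jaccard(tk, c["tokens"]) >= threshold:
                c["tweets"].append(t)
                c["tokens"] |= tk
                break
        else:
            clusters.append({"tokens": set(tk), "tweets": [t]})
    return clusters

tweets = [
    "Explosion reported near downtown station",
    "BREAKING: explosion near the downtown station, police responding",
    "My cat refuses to eat breakfast again",
]
result = cluster_tweets(tweets)
# the two explosion tweets land in one cluster; the unrelated tweet starts another
```

A production system would of course use far richer features than word overlap, but the shape of the step (stream in, similarity test, cluster out) is the same.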
Next, Tracer uses several algorithms to classify and prioritize events, including by location, using a database of cities and keywords. Veracity is assessed by tracing each conversation back to its source: the earliest tweet mentioning the topic and any sites it points to, which Tracer checks against a database of known producers of fake news. Finally, the system composes a headline and summary and circulates the news across Reuters.
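The veracity step described above can be sketched as: find the earliest tweet in a cluster, pull out the domains it links to, and compare them against a blocklist. The blocklist name, the tuple layout, and the example domain below are all hypothetical; this is a sketch of the described check, not Reuters' implementation:

```python
# Illustrative sketch of the veracity check: trace a cluster to its
# earliest tweet, extract linked domains, and flag matches against a
# (hypothetical) database of known fake-news producers.
import re
from urllib.parse import urlparse

KNOWN_FAKE_SOURCES = {"totally-real-news.example"}  # assumed blocklist

URL_RE = re.compile(r"https?://\S+")

def check_veracity(cluster):
    """cluster: list of (timestamp, text) tuples for one conversation."""
    earliest = min(cluster, key=lambda t: t[0])  # trace to the first mention
    domains = {urlparse(u).netloc for u in URL_RE.findall(earliest[1])}
    suspect = bool(domains & KNOWN_FAKE_SOURCES)
    return {"origin": earliest[1], "domains": domains, "suspect": suspect}

cluster = [
    (1700000100, "confirmed by officials https://reuters.com/article"),
    (1700000050, "huge story!! https://totally-real-news.example/shock"),
]
verdict = check_veracity(cluster)
# the earlier tweet links to a blocklisted domain, so the cluster is flagged
```

The key design point, per the article, is that the check anchors on the *earliest* tweet rather than the loudest one, since the origin of a story says more about its veracity than its amplification.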
Tracer processes 12 million tweets daily, feeding approximately 20 percent of them into about 6,000 clusters, which are then categorized as different types of news events. Tracer’s developers say it can cover about 70 percent of authentic news stories from just 2 percent of Twitter’s data.
DCL: These automated news-mining techniques applied to social media were pioneered by the Canadian government 20 years ago. In that instance they were applied to local news articles, pharmacy sales, cell-phone activity records, and other data sources, and the Canadians were able to detect a bird-flu outbreak in China that local governments were attempting to hide. In every case, what we have here is real-time complex event processing!