While we would all like our journalistic evidence to be delivered to our doorsteps in nicely formatted spreadsheets, that is rarely the case. Instead, information often arrives as a large stash of unstructured documents. As these collections grow, reading through every document stops being an option.
This workshop will discuss the alternatives: What tools and technologies are available for the automated analysis of large document sets? How can you automatically discover the recurring topics in a document stash? How can important concepts, people and companies be traced across the documents of a leak? Level: intermediate.