Take your computers out! Scraping is fun!

“Take your computers out! Come on, trust me! You are going to like it!” This is how Marco Tulio Pires, co-founder of Jornalismo++ São Paulo, opened his session on data journalism on 7 April at the tenth edition of the International Journalism Festival in Perugia, Italy.

Pires gave a crash course on easy point-and-click scraping for journalists who need help with HTML and scraping and are eager to learn some of the tools they can use in their everyday work.

“Scraping gives power back to journalists to collect data from sources that are otherwise unreachable”, said the director of the School of Data.

According to Pires, scraping provides the structure that journalists want and need but do not always get. Instead of clean, structured data, media professionals usually receive messy data, which makes it harder for them to extract information and find the valuable stories hidden in the chaos.

“Scraping is transforming data that is unstructured, made for humans, to data that is machine-readable, so that we can analyze, process and visualize”, explained Pires, who went on to introduce four main tools for scraping information: Webinspector, Google Sheets, IFTTT and Web Scraper.
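For readers who want to see what that transformation looks like in practice, here is a minimal sketch in Python. It is not one of the tools shown in the session; it only illustrates the idea of turning a page made for humans into rows a machine can read. The HTML fragment, its values and the names `SAMPLE_HTML`, `TableScraper`, `scrape_table` and `rows_to_csv` are all invented for illustration, and a real page would usually need more careful handling.

```python
import csv
import io
from html.parser import HTMLParser

# Invented page fragment with placeholder values; a real scrape would
# start from HTML downloaded from an actual website.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td>Alpha</td><td>10</td></tr>
  <tr><td>Beta</td><td>25</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table cells in `html` as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, ready for a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)))
```

The resulting CSV text can be opened in a spreadsheet, which is the kind of structured output that makes analysis and visualization possible.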

Pires concluded that “there is no best way to scrape a page” and that journalists need to approach the task from multiple angles. He noted that more powerful tools are available for scraping more complicated sources, such as PDF files, and advised journalists to consult online tutorials on the topic.

By Stanislava Gaydazhieva