Category: Internships.
Veracity assessment framework for discovering social activities in urban big datasets
Philipp Brandt, SciencesPo
Soror Sahri, Université Paris Cité, LIPADE, diNo
Motivation
Digital technologies provide access to datasets that have been unfamiliar to social scientists, including behavioral traces (e.g., point-of-sale records, geolocation data, social media scrapes, CCTV recordings), machine-readable texts, and code and data repositories. These secondary data sources, produced without research goals in mind, require new technical skills and computing capacities to manage their scale and content. A notable recent trend among social scientists is the effort to understand how big data can complement traditional research methods and what value it offers for decision-making. Several major issues in the social sciences have to be closely investigated in connection with big data, including political polarization, viral information diffusion, and economic performance. The veracity and value characteristics of big data are the main concerns for social scientists [1].
This master's internship will focus on urban data, particularly the NYC taxi dataset, to develop technical procedures that help social scientists work with this and similar urban datasets. Social scientists have used the NYC taxi dataset in the past, yet have left many dimensions unexplored. Most problematically, they have not yet provided a technology that allows fast, flexible data access, nor a strategy for ensuring the quality of the data. Once such an infrastructure is in place, the NYC taxi dataset can lead to a better understanding of core questions in the social sciences, such as economic decision-making and labor mobility, as well as to a strategy for how social scientists can work with novel datasets.