Their paper reveals that the researchers initially designed a bot to scrape profile data, but that this first method was dropped because it was “a decidedly non-random approach to find users to scrape because it selected users that were suggested to the profile the bot was using.” This implies that the researchers created an OkCupid profile from which to access the data and run the scraping bot. Since OkCupid users have the option to restrict the visibility of their profiles to logged-in users only, it is likely that the researchers collected (and subsequently released) profiles that were never intended to be publicly viewable.
“The data is already public” is an all-too-familiar refrain used to gloss over thorny ethical concerns. The most important, and often least understood, concern is that even if someone knowingly shares a single piece of information, big data analysis can publicize and amplify it in ways the person never intended or agreed to.
The application of big data and analytics means big bucks for those in the business of love. From an academic perspective, the Smithsonian describes a whole new level of customer engagement in a blog post, “How big data has changed dating.” In this article, Dan Slater, author of “Love in the Time of Algorithms: What Technology Does to Meeting and Mating,” describes how a person’s preferences can be inferred by analyzing their online behavior. Amy Webb attributes her success in online dating, and her eventual meeting with her husband, to her mathematical savvy.
Findings such as that shorter women and taller men received more attention, and that curvy women had a higher sex drive than slender women, were backed by both data and methodology.
The “already public” excuse was used in 2008, when Harvard researchers released the first wave of their “Tastes, Ties and Time” dataset, comprising four years’ worth of complete Facebook profile data harvested from the accounts of a cohort of 1,700 college students.
And it appeared again in 2010, when Pete Warden, a former Apple engineer, exploited a flaw in Facebook’s architecture to amass a database of names, fan pages, and lists of friends for 215 million public Facebook accounts, and announced plans to make his database of over 100 GB of user data publicly available for further academic research.
The “publicness” of social media activity is also used to explain why we should not be overly concerned that the Library of Congress intends to archive and make available all public Twitter activity.
In each of these cases, researchers hoped to advance our understanding of a phenomenon by making publicly available large datasets of user information they considered already in the public domain.