Academic studies about Wikipedia

Production A minority of editors produce the majority of persistent content Studies from 2005 to 2007 found that a small minority of editors produce most of the edits on Wikipedia, and that the distribution of edits follows a power law with about half of the total edits produced by 1% of the editors. Another 2007 study found that 'elite' editors with many edits produced 30% of the content changes, measured in number of words. These editors were also more likely to add, rather than delete, content. A 2007 study from the University of Minnesota used reader-based measures that weighted content based on the number of times it was viewed (a persistent word view (PWV)). This study analyzed trillions of word views between September 2002 and October 2006 and concluded that 0.1% of the Wikipedia community (4,200 editors) produced 44% of the word views during this time. The editors concluded that, Analyzing the entire edit history of English Wikipedia up to July 2006, the same study determined that the influence of administrator edits on contents has steadily diminished since 2003, when administrators performed roughly 50% of total edits, to 2006 when only 10% of the edits were performed by administrators. This happened despite the fact that the average number of edits per administrator had increased more than fivefold during the same period. This phenomenon was labeled the "rise of the crowd" by the authors of the paper. An analysis that used as metric the number of words edited instead of the number of edit actions showed a similar pattern. Because the admin class is somewhat arbitrary with respect to the number of edits, the study also considered a breakdown of users in categories based on the number of edits performed. The results for "elite users", i.e. users with more than 10,000 edits, were somewhat in line with those obtained for administrators, except that "the number of words changed by elite users has kept up with the changes made by novice users, even though the number of edits made by novice users has grown proportionally faster". The study concludes: Reliability An Argumentation conference paper (2010) assessed whether trust in Wikipedia is based on epistemic or pragmatic merits. While readers may not assess the actual knowledge and expertise of the authors of a given article, they may assess the contributors' passion for the project, and communicative design through which that passion is made manifest, and provide a reason for trust. In details, the author argued that Wikipedia cannot be trusted based on individual expertise, collective knowledge, or past experience of reliability. This is because anonymity and pseudonymity prevent knowledge assessment, and "anti-expert culture" makes it unlikely that this will change. Editing Wikipedia may largely be confined to an elite group of editors, without aggregating "wisdom of the crowd" which in some cases lowers the quality of an article anyway. Personal experiences and empirical studies, confirmed by incidents including Seigenthaler biography controversy, point to the conclusion that Wikipedia is not generally reliable. Hence, these epistemic factors do not justify consulting with Wikipedia. The author then proposed rationale to trust Wikipedia based on pragmatic values, which roughly can be summarized into two factors. First, the size and activity around Wikipedia indicates that editors are deeply committed to provide the world with knowledge. Second, transparent developments of policies, practices, institutions, and technologies in addition to conspicuous massive efforts, address the possible concerns that one might have in trusting Wikipedia. The concerns raised include the definition of provided knowledge, preventing distorted contributions from people not sharing the same commitment, correcting editing damages, and article quality control and improvement. Health information Health information on English Wikipedia is popularly accessed as results from search engines and search engine result page, which frequently deliver links to Wikipedia articles. Independent assessments of the quality of health information provided on Wikipedia and of who is accessing the information have been undertaken. The number and demographics of people who seek health information on Wikipedia, the scope of health information on Wikipedia, and the quality of the information on Wikipedia have been studied. There are drawbacks to using Wikipedia as a source of health information. Bias A 2010 study has shown that the average contributor to the English Wikipedia is an educated, technically inclined white male, aged 15–49, from a developed, predominantly Christian country. This systemic bias in editor demographic results in cultural bias, gender bias, and lack of information about the Global South. There are two broad types of bias, which are implicit (when a topic is omitted) and explicit (when a certain POV is supported in an article or by references). Geographical bias Research conducted in 2009 by the Oxford Internet Institute showed that geotagged articles in all language editions of Wikipedia covered about half a million places on Earth. However, the geographic distribution of articles was highly uneven: most articles are written about North America, Europe, and East Asia, with very little coverage of large parts of the developing world, including most of Africa. Another 2009 study of 15 language editions determined that each edition was highly "self-focused", with emphasis on the geographic "home region" of that language. Gender bias The gender bias on Wikipedia has been widely discussed. The study found that biographies that do exist are considerably more likely to be nominated for deletion than existing articles of men. A 2021 paper recommended addressing a "sweet spot" within the encyclopedia's bias where existing scholarship includes reliable, peer-reviewed sources that offer a more complete POV than existing Wikipedia articles. The study suggested that incorporation of these sources would offer better representation for excluded or marginalized POVs, and that the possibilities for potential improvement are "massive." ==Natural language processing==