Jimmy Wales is reminiscing over childhood encyclopedias of the physical kind.
The Wikipedia founder recalls that his first set of books was purchased by his mum from a door-to-door salesperson in 1960s Alabama. “It was a World Book Encyclopedia for Children. And they would send out an annual update every year – so the article on the moon, for instance, was rewritten after the moon landing. And they would have these stickers for each update and you would go into the main body of the encyclopedia and stick the stickers in. And I used to help my mum do that. Which was me editing an encyclopedia when I was two.”
What happened to these books is the cause of some family debate now, he adds.
“My father thinks they’re in the storage shed while my mother thinks they were sold at a garage sale. I hope they find them; they might be worth something now…”
Wales’s heavily used online version, meanwhile – the world’s fifth most visited website – just keeps on expanding, achieving 274 billion page views a year (up 12% from the previous year), with 10 billion views from the UK alone.
Not bad for a site that, in its early days, relied on financial support from one of Wales’s other online content ventures – Bomis – which generated revenue by offering free and paid-for adult content, including erotic images of ‘Bomis Babes’.
Today, Wikipedia offers up 60.5 million articles and is active in over 318 languages, with over 100 being tested to see if they have viable communities from which to launch their own chapter on the site.
While its charitable status (via The Wikimedia Foundation) means that Wales is not exactly a tech billionaire, the scale of the site’s popularity has garnered his foundation a level of influence that few can dream of.
The foundation’s advocacy page details how it protects users’ privacy and speech, educating government and policy makers around the world about the importance of the free and open internet.
Last year, following the Russian invasion of Ukraine, the foundation worked with its partners when sanctions were imposed against the Russian government to ensure that Russian people could continue to access the site.
It often goes to court to fight against censorship and promote open licenses to ensure its materials can be used and revised globally.
In 2012 the site blacked out its English-language pages in protest at a proposed change in US copyright laws which, it argued, would inhibit people’s access to online information.
Wales recalls: “The US House of Representatives phone system crashed that day and they immediately backed down.”
Wales is explaining the wikimodel to a crowd of software developers at OpenUK’s first State of Open Conference 2023 at the QEII Centre in Westminster earlier this month (where we meet) – a community to which he has lent much support over the years.
Wales attributes the site’s enduring popularity over 22 years to its singular vision of offering the world ‘the sum of human knowledge’ through an open community of volunteers, rather than jumping on the latest tech bandwagon.
Besides the occasional but now familiar vision of Jimmy’s face popping up to ask for donations, there are no pop-up ads, there’s no exploitation of user data. There’s no tracking of who reads the site’s pages.
The non-profit’s revenues, mainly annual charitable donations, were in the region of $150m in 2021-22. According to Wales, any profits are invested back into software development and site operations, as well as accounting, finance, legal and comms.
Given the site’s traffic, the Wikimedia Foundation has a relatively low staff count of 700 – none of whom curate, write, edit or moderate the content.
That part is left to an international community of so-called Wikipedians – an anonymous open community of volunteer users and editors who make almost all the site’s publicly accessible content moderation decisions.
According to Wales, the foundation only steps in when there’s a site-wide problem. “For instance, sometimes there are trolls that go around and start attacking hundreds of languages on Wikipedia.
“Each language would ban them but small languages can’t deal with it so we will step in and implement a total ban and issue the technical measures to support that,” he explains.
The only other circumstance in which the foundation will intervene is health and safety – “if someone is in immediate danger and something requires immediate attention”.
Open source under threat
Wales took care to detail exactly how the site works to open source developers attending the State of Open Conference, because it lays the foundation for the issue he really wants to talk about: that the open internet is “under threat”.
It’s not under threat from hostile states such as Russia, China, or, most recently, Pakistan – a nation whose censor blocked the website for about two days before the prime minister called for its reinstatement – but from countries “who ought to be leaders in openness – like the UK”, he argues.
State of Open’s inaugural 2023 conference at the QEII in Westminster
“Misguided regulators working with little or no genuine understanding of the Internet or content moderation in the real world,” he adds.
Wales is making reference here to the UK’s often-bashed Online Safety Bill which is currently going through Parliament.
He acknowledges that the UK government’s troubled-but-well-meaning bill – which has passed through the hands of four prime ministers in as many years – is trying to hold big tech platforms like Facebook and Twitter to account.
The entrepreneur himself has been the subject of vile slander and abuse on Twitter, but he argues that the Online Safety Bill in its current form is harmful to the open internet and that the government’s “simplistic, top down approach” ignores the way that the wider web works.
While “Big Tech” firms like Facebook and Twitter rely on paid staff and flawed algorithms to moderate content, often behind closed doors, much of the internet operates under the same model as Wikipedia.
“For us, the community monitors and polices the content,” he claims. “The administrators close the debate or decide it. All their actions are transparent, and they are held accountable by the community. You can lose your admin rights if you are not doing it the right way. There are checks and balances, it’s not perfect, but it works well.”
Wikipedia’s community-based moderation could come under pressure from top-down content moderation policies that require a centralised actor, he fears.
Wales also notes that any legislation which only considers the models of the top four social media players also neglects smaller wiki players, those managed by communities and even other would-be social media competitors.
“Due to our size and power we’re likely to find a way to endure, which is true of a lot of other players in this, but we have to fight for the ecosystem and the things that we believe in. But I worry about the principles of the open internet for the future.
“For all the smaller organisations. If social media is regulated on the assumption that it is all like Facebook or Twitter, then you will impose on upstart competitors the rules that force them into that same model.”
And yet, it’s hard to argue against the idea of a safer internet. Also, like many who have criticised the bill, Wales does not attempt to offer up alternative proposals – instead arguing that many of the problems highlighted by the bill’s supporters are wider societal issues.
“The fashion industry and women’s magazines have promoted quite unhealthy ideas about the way young girls look, but now that’s being blamed on Instagram.
“But you can’t solve issues like this online until you think about our culture, which is not an easy answer for a legislator that’s trying to do something about this.”
For Wikipedia, one glaring ele-bot in the room is ChatGPT. The trendy generative AI can theoretically already scrape all of Wikipedia and several more sources to provide the same answers people are looking for.
Right now, Wales describes OpenAI’s ubiquitous technology as “a lot of fun”.
He adds: “I was showing it to my daughter, and I typed ‘what’s the difference between Jeff Bezos and Jimmy Wales?’. First it gave me a really boring 200-word summary about two entrepreneurs.
“So, I typed ‘Do that again make it shorter and funnier’. And it said: ‘Jimmy Wales and Jeff Bezos are like a Wikipedia page and an Amazon page – one of them is free and the other is going to cost you.’”
He adds that Wikipedians are currently “very much in debate about what these tools mean for the community and whether it poses an opportunity or a problem for us.”
“A simple scenario is that if you can write something that looks a lot like a Wikipedia entry but that makes up facts, or citations, which it can do, then that’s a problem. We are still very far from an AI being able to just write Wikipedia from scratch. Because first of all they can’t even get it to tell the truth.”
But AI could also have its uses for the Wiki man. “Suppose we have a short ‘stub’ article. For example, someone came to me and said that an article has been deleted because it’s just a stub. And I look into it and it’s a Black History topic of historical value on a conference that happened in the 1920s that led to the foundation of Black colleges and universities across America.
Wikipedians are debating whether ChatGPT is “a problem or an opportunity”
“I didn’t have time to write the article and the editor didn’t have access to the database I subscribe to that contained the information – but suppose I could say ‘here are 20 sources and here’s the text – please AI can you pull out some of the facts and write it in the Wikipedia format and then I’ll check your work’. That would offer a big productivity boost for us.”
For now, Wikipedia will stick to relying on people, not algorithms, to generate content. But for a tech company, there’s a surprisingly sparse amount of granular data on who this voluntary community comprises. There’s also the question of whether all influence is equal – or whether some editors are more powerful than others.
In its own pages on the subject, English-language Wikipedia counts its number of registered users at about 45m, although only a fraction – 130.5K – are considered ‘active’: those who have edited pages within the last 30 days.
Online magazine Vice reports on an independent study by the Purdue University Data Storytelling Network which found that 77% of articles were written by a hardcore one per cent of Wikipedia editors.
There’s also an unknown but relatively large number of unregistered users about whom the site has no information at all. Wales frames this lack of internal data on the community responsible for generating content on the internet’s fifth most visited site as a positive thing – it is not trying to harvest user data.
“One of the broader approaches to the role they play in the community is that we gather very little data; Facebook knows exactly how old everyone is because they target ads at people, for example, but we have no idea,” he says.
The foundation does run surveys “from time to time” which provide some info on demographics – putting the proportion of male editors at about 80%. In a recent US Wikipedia survey, 89% of Wikipedia’s contributors described themselves as white.
These biases seem heavy for such an influential site, one that lobbies governments on the right to seek and share knowledge. If this knowledge is written predominantly by white men then how accurate a world view does it represent?
“Yeah, it’s not good. We do think it’s a problem and it’s been a focus for a few years but we’ve not managed to move the needle on this much,” he admits.
Women generate only 20% of content on Wikipedia, according to its own survey
The founder acknowledges that this gender bias is also reflected in the way content is curated – and part of the reason for the male bias is that it grew out of the open-source community, which is still predominantly male.
This systemic bias is especially worrying in light of ChatGPT and other AIs that might use the largest encyclopedia ever created as training data.
Last year Helen Pankhurst became the latest activist to call out Wikipedia on its gender bias, in a Guardian article written after she analysed the lack of Wikipedia biographies of women football players in the light of the England football team’s success at the Euros.
She found there were more entries on Wikipedia about men’s football and male footballers than there are about women in their entirety. Currently only 18% of content in all Wikimedia projects, including blogs on Wikipedia, is about women.
Internal projects trying to ‘move the needle’ include Women in Red, which aims to increase the number of female biographies and editors through edit-a-thons and other initiatives.
Another initiative, Project Rewrite, takes a broader view, calling for the wider community of journalists, academics, thought leaders and individuals to increase their coverage of women, particularly women who are Black, Indigenous, and People of Colour.
The latter project’s webpage links to an inspiring interview with four female Wikipedians, but also points to areas of the site where harassment and abuse against women still thrive – despite community-based moderation.
Will these initiatives be enough to broaden the diversity of Wikipedia’s editors? The lead author of the Purdue University study told Vice that the key wasn’t broadening the number of editors but focussing on the people at the top, who, they claim, are more influential.
But would tampering with this section of highly productive Wikipedians lead to – in the short term at least – a decrease in the level of content generation? It highlights one of the challenges in becoming more inclusive when all your contributors are volunteers, working for free, in their own time.
But the foundation could do more to monitor diversity and measure impact: data gathering that seeks to address equality by monitoring its progress, rather than commercialising it, isn’t an abuse of privacy if it’s collected in the right way.
Wikipedia has proven an invaluable resource in my own work over the years, but the foundation needs to start treating its site-wide diversity issue with the same urgency as it does when it lobbies governments on the thorny issue of regulation.
Because no matter how beloved, if a site starts to reflect a world view that’s possibly 20 or so years out of date, it will start to feel irrelevant, antiquated.
And no one wants to see Wikipedia banished to some dusty, far-flung corner of the internet – like those World Book Encyclopedia for Children tomes that may, or may not, reside in the Wales family’s storage shed.