How do voice assistants like Alexa and Siri know so much? How can search engines tell you the height of Mount Kilimanjaro (5,895 meters) so quickly and so accurately? Now more than ever, it is because they have access to more than 60 million open data records via Wikidata.
Wikidata is a project of Wikimedia, the non-profit organization that also runs the online encyclopedia Wikipedia. For six years, volunteer contributors to Wikidata have been structuring data so that it can be read and edited by both humans and machines.
This ensures that information can fly freely between the Web and other technology platforms. As more people interact with the internet, not just through the Web and websites like Wikipedia, but through speaking and listening to devices, this is becoming increasingly important.
Machines can understand Wikidata because it parses information you would normally read in a Wikipedia article into separate blocks. For example: “Paris is the capital city of France.” “Paris has a population of 2,206,488.” “Paris’ coordinates are 48°51’23.68″N, 2°21’6.58″E.”
By structuring this information and giving every entry a unique ID, Wikidata gives more than 5,000 websites, archives, libraries and databases a shared backbone: if you update one entry, other entries where the information is referenced will automatically be updated too, in every language.
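To make the structure concrete, here is a small illustrative sketch (not Wikidata's actual software) modeling statements as subject–property–value triples. The identifiers follow Wikidata's real numbering scheme (Q90 is Paris, Q142 is France, P1082 is "population", P625 is "coordinate location"), but the code itself is a toy model; the decimal coordinates are an approximate conversion of the figures quoted above.

```python
# Illustrative sketch of Wikidata-style statements: each fact is a
# (subject, property, value) triple keyed by the entity's unique ID.
statements = [
    ("Q90", "P1376", "Q142"),                  # Paris is the capital of France
    ("Q90", "P1082", 2_206_488),               # Paris has a population of 2,206,488
    ("Q90", "P625", (48.856578, 2.351828)),    # Paris' coordinates (approx. decimal)
]

# Human-readable labels live separately from the language-independent IDs,
# which is what lets the same statement serve every language at once.
labels = {
    "Q90": "Paris", "Q142": "France",
    "P1376": "capital of", "P1082": "population", "P625": "coordinate location",
}

def values_for(subject: str, prop: str):
    """Return every value asserted for the given subject and property."""
    return [v for s, p, v in statements if s == subject and p == prop]

print(values_for("Q90", "P1082"))  # -> [2206488]
```

Because every entity has exactly one ID, any site that references `Q90` picks up a corrected population figure the moment the single underlying statement is updated.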
Wikidata isn’t the only initiative to organize, or try to organize, the Web’s data. Similar projects have struggled due to its vastness. So what makes Wikidata successful? “Community is the biggest asset for Wikimedia,” says Lydia Pintscher, Wikidata’s Product Manager. “Without our partners and contributors, and the people who use the data, it wouldn’t be there.”
Indeed, Wikidata’s community of tens of thousands of volunteer contributors has made more than 850 million edits over the years.
Wikidata is also unique in that it is a completely open, public-domain resource: the Creative Commons CC0 Public Domain Dedication applied to all of Wikidata’s data lets people and companies use it freely, without copyright restrictions, for whatever they like, from voice assistants to search engines.
For Wikidata, this open-access, no-attribution approach means people benefiting from its information usually won’t know where it comes from, or that Wikidata depends on volunteers and donations to conduct its work, including updates and quality control.
For big tech companies offering services on top of Wikidata and other Wikimedia properties, this raises their responsibility to help sustain the resource for everyone. “As companies draw on Wikipedia for knowledge – and as a bulwark against bad information – we believe they too have an opportunity to be generous,” wrote Wikimedia’s executive director, Katherine Maher, in a 2018 op-ed in WIRED calling for companies to give back to the community.
Companies including Google and Amazon have answered this call to varying degrees (Amazon has named Wikipedia as part of the reason for Alexa’s success), but the vast majority of Wikimedia’s resources come from donations by more than six million individuals, who give an average of $10 each. In 2018, only 4% of funding came from corporations.
For the health of the internet, open access to knowledge and information is essential. For institutions, companies, organizations and individuals with small or large data sets to share with the world, Wikidata is where they can really grow wings.
What data do you wish the world had more access to?
Plenty of data is indeed inaccessible, I agree.
But what seems to be missing perhaps even more is the technology to get the best view of the *readily accessible* data.
In light of the damage caused every day by fake news and misinformation, what people probably need most is a 'no-barriers-to-use' solution that would help them reach the *best knowledge* on important public matters and break out of the closed narrative sources they have grown dependent on.
The way I see it, web browsers, being "user agents" after all, should always strive to keep their users best informed. So if a piece of information the user is reading (in the simplest case, a sentence making a claim) has been convincingly criticized in another source, the user agent should make them aware of that.
Obviously, the full complexity of criticism raises plenty of questions: how to deal with long chains of counter-arguments, how to find 'the real truth' and bring it to the top of the rankings, and many more. But I think this would be a nice problem to have.
For now, I see nothing wrong with providing the building blocks that get us there:
Devise a web standard for cross-referencing parts of unstructured web content, especially parts of historical publications, and especially for the purpose of highlighting contentious information and connecting it to the best information available today.
Sadly, I am not aware of any public effort in that direction, while the social media giants disappoint (especially since they are the most capable software houses in the world) by moving away from openness and resorting to the most primitive tool: censorship.
It could be that the task is simply too big even for them, but I suspect the apathy has something to do with the fact that a data model in which publications and critique sit next to each other could be suicidal, as it would largely obsolete their main revenue stream: advertising.
Genuine, authentic data on markets, business, technological advances and, last but not least, world politics, free of political interference.