DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows users to ask expressive queries against Wikipedia and to interlink other datasets on the Web with DBpedia data.

The DBpedia Dataset

Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates and links to external Web pages. This structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.

The DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages, 180,000 external links into other RDF datasets, 207,000 Wikipedia categories and 75,000 YAGO categories.

The DBpedia project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. As of September 2007, the DBpedia dataset consists of around 103 million RDF triples, which have been extracted from the English, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian versions of Wikipedia.
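
To illustrate the triple model, the following minimal Python sketch holds a few subject-predicate-object statements as plain tuples and filters them by pattern. The `http://example.org/` vocabulary, the sample statements and the `match` helper are assumptions made for illustration; they are not taken from the DBpedia dataset or from any DBpedia tooling.

```python
from typing import List, Optional, Tuple

# An RDF statement is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical vocabulary, for illustration only

BERLIN = "http://dbpedia.org/resource/Berlin"
PARIS = "http://dbpedia.org/resource/Paris"
GERMANY = "http://dbpedia.org/resource/Germany"

# Illustrative statements; not actual extracted data.
TRIPLES: List[Triple] = [
    (BERLIN, RDF_TYPE, EX + "Place"),
    (BERLIN, EX + "country", GERMANY),
    (PARIS, RDF_TYPE, EX + "Place"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every non-None component."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All subjects typed as a (hypothetical) Place.
places = [t[0] for t in match(TRIPLES, p=RDF_TYPE, o=EX + "Place")]
```

Leaving a component as `None` makes it a wildcard, which is roughly how a single triple pattern behaves in an RDF query language.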

The DBpedia dataset is available under the terms of the GNU Free Documentation License.

The DBpedia dataset is interlinked at the RDF level with various other Open Data datasets on the Web. This enables applications to enrich DBpedia data with data from these datasets. As of June 2007, DBpedia is interlinked with the following datasets: GeoNames, MusicBrainz, CIA World Factbook, DBLP, Project Gutenberg, DBtune Jamendo and Eurostat, as well as US Census data. See the DBpedia website and the W3C SWEO Linking Open Data Community Project for details about the interlinked datasets.
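
As a rough sketch of how an application might use such links, the Python example below follows identity links from a DBpedia resource into a second dataset and merges the properties found there. It assumes the links are expressed as `owl:sameAs` statements; the external URI, the `http://example.org/` property names, the values and the `enrich` helper are hypothetical placeholders rather than real data.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # hypothetical vocabulary and external dataset

BERLIN = "http://dbpedia.org/resource/Berlin"
BERLIN_EXT = EX + "geo/berlin"  # placeholder for a resource in another dataset

# Illustrative local (DBpedia-style) statements, including one identity link.
local: List[Triple] = [
    (BERLIN, EX + "label", "Berlin"),
    (BERLIN, OWL_SAME_AS, BERLIN_EXT),
]

# Illustrative statements from the linked dataset; the value is a placeholder.
remote: List[Triple] = [
    (BERLIN_EXT, EX + "population", "3400000"),
]


def enrich(
    resource: str, local: List[Triple], remote: List[Triple]
) -> List[Tuple[str, str]]:
    """Collect (property, value) pairs for a resource, including those of
    any resource in the remote dataset that it is linked to via sameAs."""
    linked = {o for s, p, o in local if s == resource and p == OWL_SAME_AS}
    merged = [(p, o) for s, p, o in local if s == resource and p != OWL_SAME_AS]
    merged += [(p, o) for s, p, o in remote if s in linked]
    return merged


description = enrich(BERLIN, local, remote)
```

The merged description combines the locally known label with the population statement that only the linked dataset provides.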

Accessing the DBpedia Dataset

The DBpedia dataset can be accessed using three different access mechanisms:

