Help

Where is the data sourced from?

The aim of OrgRef is to aggregate relevant data which is free and open from a variety of online resources, but initially almost all of the OrgRef data has been sourced from either Wikipedia or ISNI, via a combination of automated crawling and manual spot-checking.

The dataset has not been researched and compiled 'by hand', nor will that happen in future. The aim of the project is to collect and share existing open data which is already "out there", not to compile a new resource from scratch.

How many records are there?

There are currently over 31,000 records in the OrgRef dataset, and we expect this to grow gradually over time.

The aim of the project is not to be completely comprehensive, but to share information about the most significant organizations which are involved with academic content.

It's also worth bearing in mind that the academic journals market might not be as big as you think!

What if I find a mistake?

The project remains 'in beta', so you might discover errors in the data, important organizations which are missing, or irrelevant organizations which are included.

Please let us know if you do find any issues, to help us improve the quality of the dataset over time. The more feedback we receive, the better we can make OrgRef, for everybody's benefit.

Remember that the majority of the dataset has been derived from Wikipedia, so if a mistake in OrgRef reflects an inaccuracy in Wikipedia, please consider correcting it within Wikipedia. The OrgRef crawler is re-checking records regularly, so any corrections will be picked up automatically within a matter of weeks.

Why isn't more information included?

We're launching OrgRef with an initial set of core fields (listed below), but we are planning to add more fields of information in future.

For example, we are already working on harvesting city information and state/county names for non-US records, and on classifying each organization (as 'University', 'Hospital', etc.). We'll release those fields as soon as we think they are good enough to share.

If you think there is a particular type of information which it would be useful to add - and which is available online from a free and open resource - then please let us know.

Will the dataset be kept up-to-date?

Yes - DataSalon are committed to managing and improving the OrgRef dataset long-term. The latest version will always be made available via the Download page, and updated versions will be published at least once per month.

Which fields are in the dataset, and what do they mean?

The following table documents each field in the OrgRef dataset:

Name: The name of the organization. The precise name form used is in most cases derived from the relevant Wikipedia article title.
Country: ISO country code.
State: United States state code. Note that this field is currently only populated for US organizations. We are working on adding state information for other countries.
Level: This states 'Org' for primary organizations, 'Grp' for groupings of organizations (e.g. systems and consortia), or 'Sub' for subsidiaries (e.g. departments and faculties). Note that linking information for related organizations in the dataset is not yet available.
Wikipedia: English Wikipedia page link, where known.
Wikidata: Wikidata page link, where known.
VIAF: VIAF page link. Note that not every organization in the dataset has a VIAF link, but we are working on improving coverage.
ISNI: ISNI page link. Note that not every organization in the dataset has an ISNI link, but we are working on improving coverage.
GRID: GRID page link. Note that not every organization in the dataset has a GRID link, but we are working on improving coverage.
Website: URL of the organization's own website, where known.
ID: Unique ID for each record in the OrgRef dataset. In many cases we have re-used the numeric ID from Wikipedia as the OrgRef ID, although that is not always the case, and it is not safe to assume that the OrgRef ID and the Wikipedia ID always match.
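
As an illustration of how the fields above might be consumed, here is a minimal Python sketch. It assumes the download is a CSV file whose column headers match the field names listed above; the file format and header spelling are assumptions, not something confirmed on this page. The OrgRecord class, the read_records helper and the sample rows are all hypothetical placeholders, not real OrgRef records.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# The three Level values documented in the field list.
LEVELS = {"Org", "Grp", "Sub"}


@dataclass(frozen=True)
class OrgRecord:
    """Hypothetical container for a subset of the documented fields."""
    id: str                 # kept as text: it may not match the Wikipedia ID
    name: str
    country: str
    state: Optional[str]    # only populated for US organizations
    level: str
    website: Optional[str]  # only present "where known"


def _blank_to_none(value: Optional[str]) -> Optional[str]:
    """Treat empty or whitespace-only cells as missing values."""
    value = (value or "").strip()
    return value or None


def read_records(lines: Iterable[str]) -> Iterator[OrgRecord]:
    """Parse CSV text (assumed header names) into OrgRecord objects."""
    for row in csv.DictReader(lines):
        level = (row.get("Level") or "").strip()
        if level not in LEVELS:
            raise ValueError(f"Unexpected Level value: {level!r}")
        yield OrgRecord(
            id=row["ID"].strip(),
            name=row["Name"].strip(),
            country=row["Country"].strip(),
            state=_blank_to_none(row.get("State")),
            level=level,
            website=_blank_to_none(row.get("Website")),
        )


# Invented sample rows, for illustration only.
SAMPLE = """\
Name,Country,State,Level,Wikipedia,Wikidata,VIAF,ISNI,GRID,Website,ID
Example University,US,CA,Org,,,,,,https://university.example,1001
Example University Department of Physics,US,CA,Sub,,,,,,,1002
Example Research Consortium,GB,,Grp,,,,,,,1003
"""

records = list(read_records(io.StringIO(SAMPLE)))

# Select primary organizations (Level 'Org') located in the US.
primary_us = [r for r in records if r.country == "US" and r.level == "Org"]
```

The ID is deliberately held as a string and never compared against a Wikipedia ID, since the two are not guaranteed to match. Optional fields such as State and Website are mapped to None when blank, so that missing values are not confused with empty text.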