Using Open Data for Library Research Support: A Glance

The Linking Open Data cloud diagram (updated 2017-01-26)

Open data used by researchers often falls into two categories: open government data and scientific data. Open government data is a tremendous resource shared by the U.S. government, by some foreign governments committed to the open government movement, and by international organizations. Science data is the other major category of open data. Driven by the requirements of funding agencies, the science community started to share research data in the 1950s and has since accumulated a rich collection of data for sharing and reuse. Of course, the two categories overlap in areas such as climate data and GIS data.

Open Data Opportunities

As more open data becomes available to the world, it plays an increasingly important role in academic research. Being able to find and use this data opens up great opportunities for researchers.

First, open data offers rich information for research purposes.

  • The large scale of government data opens up a wide range of information covering maps, land, statistics, budgets, spending, business, legislation, transport, trade, health, education, crime, environment, elections, contracts, etc. The U.S. and other open government participants, OECD member countries, and some developing countries in Asia and South America are, to some extent, opening (or at least trying to open) part of their government data to facilitate economic development and a better understanding of societal needs.
  • The rich science data collected over the past 60 years and more provides a valuable resource for research collaboration across disciplines, countries, and races. Good examples are the sharing of biological and medical data for cancer treatment research, and the sharing of climate data for research on global climate change.

Second, centralized information about data access makes it easier for people to find data. For government data, there are online data access portals where users can browse and search data by industry (for the U.S., the U.K., France, Germany, etc.). For science data, there are plenty of data repositories hosting open data on a variety of subjects. For instance, the Nature website has a list of recommended data repositories for science data.

Third, data manipulation tools are getting easier to learn, which greatly empowers people to work with data. This is especially true of free, open source tools such as R and D3.

Open Data Challenges

Even with the abundance of open data resources available, there are still challenges in finding and using open data.

  1. A considerable amount of open government data is spread out across the Internet but not systematically indexed. Some of it is not even posted anywhere publicly visible for viewing and use. This is particularly true of state and local government data;
  2. Many government data sources are incomplete or not updated in a timely manner;
  3. Some government data portals (again, at the state or local level) are not reliably accessible due to irregular maintenance. Users now and then encounter broken URLs, pages that do not load properly, etc.;
  4. Open data sources, both government data and science data, often have data quality issues: the data may not be machine readable, available in bulk, or openly licensed. These issues can leave users unable to use the data because of legal or technical difficulties;
  5. Science data is spread out across many repositories. It is hard to search effectively if the user is not familiar with them;
  6. Most data is not linked data, which sometimes causes ambiguity and incompleteness issues for reuse.

Tips for Library Support of Open Data Use

Despite these challenges, open data still offers great opportunities to the research community. As research facilitation units, academic libraries could seize this opportunity to help researchers effectively search for and use open data.

Tips for searching data:

  1. If you are a subject librarian and familiar with the open data sources in your field, great: just go and find the data;
  2. If you are a subject librarian but not entirely sure where to find data for a specific project, try the websites of major national and international professional organizations. They usually offer some leads on data sources;
  3. If you are a data librarian but not a subject librarian, an easy trick (success not guaranteed) is to try “xxx open data” in a search engine, for example “GIS open data” or “bioengineer open data”. There are usually some useful resources in the search results.

Open data is deposited online in a variety of data formats. After locating the data, the next challenge for many researchers, especially those without much experience working with online data, is opening it. There are two ways to find out the data format:

  1. Some files have the data format in the file name. For example, in sampledata.rdf, “rdf” is the file format;
  2. If the data format is not shown in the file name, right-click the file and choose one of the following, where the file format is listed:
     • Properties on Windows
     • Get Info on Mac

Once you know the format, go online and search for a file reader, for example “rdf reader”. You should find a variety of applications that can open, read, and even convert the file if needed. Download and install one to open the file.
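For readers comfortable with a little scripting, the format check described above can also be done in code. The Python sketch below is only an illustration under my own assumptions: the function name guess_format and the small table of file signatures are hypothetical, not part of any tool mentioned in this post. It first reads the extension from the file name, and when there is none it peeks at the first bytes of the file, which is a different technique from the right-click approach but answers the same question.

```python
from pathlib import Path

# Leading bytes ("signatures") of a few common formats.
# This table is illustrative only and far from exhaustive.
SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip (also xlsx, docx, ods)",
    b"\x1f\x8b": "gzip",
}


def guess_format(path):
    """Return a best-guess format name for the file at `path`."""
    path = Path(path)
    # Way 1: the extension in the file name, e.g. sampledata.rdf -> "rdf".
    if path.suffix:
        return path.suffix.lstrip(".").lower()
    # No extension: look at the first bytes of the file instead.
    with path.open("rb") as handle:
        head = handle.read(512)
    for signature, name in SIGNATURES.items():
        if head.startswith(signature):
            return name
    stripped = head.lstrip()
    if stripped.startswith(b"<?xml") or stripped.startswith(b"<rdf:RDF"):
        return "xml (possibly rdf)"
    if stripped.startswith((b"{", b"[")):
        return "json (probably)"
    return "unknown"
```

Calling guess_format("sampledata.rdf") returns the extension without opening the file. The byte check is only a rough guess, so treat its answer as a hint for what kind of reader to search for.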

The next step is to clean and format the data for use. Some open source tools are available for this step of data work, including OpenRefine and Data Wrangler for formatting.
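To give a rough idea of what "cleaning and formatting" can mean in practice, here is a minimal Python sketch using only the standard library. It is not how OpenRefine or Data Wrangler work internally; the function name clean_rows and the sample text are made up for illustration. It assumes the data is CSV text with a header row, trims stray whitespace, drops blank rows, and makes the column names consistent.

```python
import csv
import io


def clean_rows(raw_text):
    """Tidy CSV text: normalise headers, trim cells, drop blank rows."""
    reader = csv.reader(io.StringIO(raw_text))
    # Trim leading and trailing whitespace from every cell.
    rows = [[cell.strip() for cell in row] for row in reader]
    # Drop rows in which every cell is empty.
    rows = [row for row in rows if any(row)]
    if not rows:
        return []
    # Lower-case the header names and replace spaces with underscores.
    header = [name.lower().replace(" ", "_") for name in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


# Made-up sample text with untidy spacing and one blank row.
sample = "State , Total Spending\n Georgia , 100 \n,\n Ohio,250\n"
for record in clean_rows(sample):
    print(record)
```

Real open data usually needs more than this (date formats, duplicate records, inconsistent spellings), which is where dedicated tools become worthwhile.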

Due to the nature of this post, I will not go into detail about these tools. Please feel free to comment on this post or contact me if you are interested in this topic and would like to have a conversation about it. My contact information can be found on the GSU library website.



