GW’s Expert Finder
On Feb 1, GW’s Expert Finder launched. Expert Finder is an implementation of VIVO, a researcher discovery platform. The project is a collaboration between the Division of Information Technology and GW Libraries. As one of the software developers on the project, I want to take this opportunity to discuss some noteworthy aspects of our implementation. In particular, we made some choices that emphasized deploying VIVO rapidly and with minimal resources.
In any VIVO implementation, data is aggregated from a number of existing sources, including campus information systems, researcher information systems, and human resource systems. At GW, the majority of our data comes from Banner, our campus information system, and Lyterati, a faculty management system.
With this sort of data, various quality issues are to be expected, and GW’s data is no different. Here are some examples:
- My position is listed as “Uv Librarian FT” (a full-time librarian, I think) in “UN LIB TECH/RESEARCH SRVCS” (Gelman Library).
- My alma mater, Amherst College, has entries for: Amherst College; Amherst College, Amherst, Massachusetts; Amherst College, Amherst, MA; and Amherst College, Amherst MA. You can imagine how many variations there are for GW!
- The full citation for a journal article is “The Contraceptive Mandate and Religious Liberty. Pew Forum on Religion in Publick Life. 2013.” Notice the misspelling and lack of co-authors, issue, volume, pages, and a DOI.
VIVO implementers employ a variety of strategies to mitigate data quality issues. These strategies include:
- Fixing the existing data source so that it produces cleaner data.
- Creating a new data source. For example, many VIVO implementers also implement Symplectic Elements to provide a reliable source for faculty publications.
- Having researchers correct their own data, either in the source system or directly in VIVO.
- Cleansing and normalizing the data, whether performed manually by a person or automated by software.
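To make the last strategy concrete, here is a minimal sketch of the kind of automated normalization that could collapse the “Amherst College” variants listed earlier. The function name and the rules are hypothetical, not anything from our codebase; real normalization would need many more rules (and would still miss cases).

```python
import re

def normalize_institution(name):
    """Collapse trivial variants of an institution name (illustrative only).

    Drops trailing city/state qualifiers such as ", Amherst, MA" or
    ", Amherst, Massachusetts", then normalizes whitespace and case.
    """
    # Keep only the text before the first comma, discarding qualifiers.
    base = name.split(",")[0]
    # Collapse runs of whitespace and lowercase for comparison.
    return re.sub(r"\s+", " ", base).strip().lower()

variants = [
    "Amherst College",
    "Amherst College, Amherst, Massachusetts",
    "Amherst College, Amherst, MA",
    "Amherst College, Amherst MA",
]
# All four variants reduce to a single canonical form.
print({normalize_institution(v) for v in variants})
```

Even this toy version shows why we treated normalization as a real cost: every data field would need its own set of rules and exceptions.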
Beyond data cleansing and normalization, many VIVO implementers also perform data enhancement: collecting additional data to improve upon the existing data. For example, some institutions retrieve complete citations for journal articles by looking them up in Crossref, or disambiguate article authorship using Harvard’s Disambiguation Engine.
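As a sketch of what the Crossref route involves (we did not take it at GW), the public Crossref REST API accepts free-text bibliographic queries; the best match in the JSON response typically carries the DOI, full author list, volume, issue, and pages that an incomplete citation lacks. The helper below only builds the query URL; actually issuing the request and scoring the match is where the real work lies.

```python
import urllib.parse

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_query_url(citation, rows=1):
    """Build a Crossref bibliographic-query URL for an incomplete citation.

    Fetching this URL (e.g. with urllib.request) returns JSON search
    results; callers must still decide whether the top hit is a
    trustworthy match before accepting its metadata.
    """
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows})
    return f"{CROSSREF_WORKS}?{params}"

url = crossref_query_url(
    "The Contraceptive Mandate and Religious Liberty 2013")
print(url)
```

The match-confidence problem is part of why enhancement is not free: a wrong match can be worse than a sparse citation.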
For GW’s implementation, we have also employed some of these strategies to mitigate data quality issues. In particular, we have:
- Worked with our partners at Entigence to make changes to Lyterati that encourage cleaner data entry by faculty.
- Placed a heavy emphasis on researchers correcting their own data in the source system.
- Created an interface for non-faculty staff to enter some of their data.
Notably absent from this list is data cleansing, normalization, or enhancement. Other than some minor fixes, like the format of phone numbers, the data is loaded exactly as received from the source systems. This is deliberate: the project charter for Expert Finder specifically excludes “cleanup of data.” Forgoing cleansing, normalization, and enhancement trades some data quality for a significant savings in resources, both for the initial implementation and on an ongoing basis.
To partially compensate for not cleaning or normalizing the data, we suppress some data in the VIVO interface. In particular, we remove links and lists where the less clean data would be revealed or would prevent VIVO from working properly. For example, publications are listed on a researcher’s detail page, but they are not linked to their own detail pages. Also, a user can get to the list of organizations, but the lists of people, research, and events have been removed. This approach does a reasonable job of presenting the data for a researcher, but obviously at the expense of reduced functionality for discovery.
Also worth drawing attention to is our approach for non-faculty staff. GW faculty publication, education, and funding data are collected in Lyterati, but there is no equivalent system for non-faculty staff (like us librarians). Rather than create a new system, we are asking non-faculty staff to create and populate ORCID records. Using orcid2vivo, we then retrieve this information from ORCID’s API and load it into VIVO. The data from ORCID tends to be of very high quality. For an example of a researcher detail page built from ORCID data, see https://expert.gwu.edu/display/justinlittman.
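For readers curious what the retrieval step looks like, the sketch below fetches a public ORCID record over the v3.0 public API (no authentication required) and pulls out work titles. This is illustrative only, not the actual orcid2vivo code, and the exact JSON paths in `work_titles` should be treated as an assumption to verify against the ORCID API documentation.

```python
import json
import urllib.request

def fetch_orcid_record(orcid_id):
    """Fetch a public ORCID record as JSON from the v3.0 public API."""
    req = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid_id}/record",
        headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def work_titles(record):
    """Extract work titles from a record dict.

    The nesting below reflects the shape of the v3.0 response as I
    understand it; treat the exact paths as an assumption.
    """
    groups = record["activities-summary"]["works"]["group"]
    return [g["work-summary"][0]["title"]["title"]["value"] for g in groups]
```

orcid2vivo goes further, of course: it maps the retrieved works, education, and funding into VIVO’s ontology before loading.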
These approaches have allowed GW to roll out Expert Finder in a relatively short timeframe (about a year) with a minimal amount of resources (part of a project manager’s and a business analyst’s time, plus fractions of three software developers and a system administrator). We look forward to feedback from the GW community, as well as the opportunity to assess Expert Finder based on actual usage. This will allow us to make adjustments as necessary so that Expert Finder can showcase the expertise at GW and enable collaboration.