Demonstrating Library Value: A Practical Application of Citation Analysis And Web-Scraping Techniques is a poster in the 2020 PNLA Virtual Poster Session. We encourage you to engage in discussion by leaving a comment on the page. The authors of the poster will respond to comments during the week of August 4-7, 2020.
Presenters: Laura Baird & Lynda Irons
Track: Academic Library
Abstract: To measure the Libraries’ impact on Pacific University researchers, we analyzed citations from their publications. Using the cited references of Pacific-published works from Web of Science, our discovery layer (Primo), browser automation, and scripting, we found that 78.0 percent of Pacific-cited works are available through the Libraries. Free or open-source content accounted for 17.3 percent of availability, and paid library subscriptions provided the remaining 60.7 percent. Our analysis also identified the top journals and databases used, as well as the amount of unique content in each database or aggregator. We will share our analysis techniques, findings, and practical applications.
Poster:
About the Presenters:
Laura Baird is the Systems and Applications Librarian. Her research interests include universal design and accessibility.
Lynda Irons is the Research and Instructional Services Librarian. Her research interests include assessment and demonstrating library impact.
Laura/Lynda – Glad to see this here! I would love to continue our conversation from the Alliance meetings regarding how you might see this research integrating with your institutional repository. We also need to demonstrate the value of our IR.
I have always wanted to do a citation analysis of dissertation/thesis works cited lists. Thoughts on how you might approach that with your research technique?
You bring up a good question. Applying a similar strategy to an IR could potentially have a high impact as well. The main challenge with a similar (automated) strategy for analyzing IR works cited sections may be data quality and/or standards, which are likely to vary more within and between institutional repositories. After the cited works list is created, we could reuse our strategy to check the availability of works.
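To make that reuse idea concrete, a minimal sketch of the availability check with browser automation in Ruby (the watir gem) might look like the following; the Primo base URL, query parameters, and CSS selectors here are placeholders for illustration and would need to match your own discovery layer.

```ruby
# Minimal sketch: check whether a cited work appears as available in Primo.
# The base URL, query format, view id, and CSS selectors are hypothetical
# placeholders, not the configuration of any particular discovery layer.
require 'watir'
require 'cgi'

PRIMO_BASE = 'https://search.library.example.edu/discovery/search' # hypothetical

def available_in_primo?(browser, title)
  query = CGI.escape("title,contains,#{title}")
  browser.goto("#{PRIMO_BASE}?query=#{query}&tab=Everything&vid=EXAMPLE")
  # Wait for the first result, then look for an availability indicator on it.
  browser.element(css: '.result-item-text').wait_until(&:present?)
  browser.element(css: '.result-item-text .availability-status').present?
rescue Watir::Wait::TimeoutError
  false
end

browser = Watir::Browser.new :chrome, headless: true
cited_titles = ['Example cited article title'] # would come from the citation export
available = cited_titles.count { |t| available_in_primo?(browser, t) }
puts "#{available} of #{cited_titles.size} cited works appear available"
browser.close
```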
We did consider such an analysis, and we were able to ingest PDFs, extract text data, and identify the works cited or references section with a fairly high degree of accuracy using Ruby and various Ruby libraries. However, our IR accepts works in various formats and citation styles. When we attempted to parse citations (identify the start and end of each citation; identify the journal title, article title, author, etc.), there was not enough consistency to accurately identify a work.
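For illustration, a minimal sketch of that first stage (not our production code) could use the pdf-reader gem and a simple heading heuristic to locate the references section; the heading patterns below are assumptions and would miss many formats and citation styles.

```ruby
# Minimal sketch: extract text from a PDF and locate the works cited /
# references section with a simple heading heuristic. The heading patterns
# are illustrative assumptions; real documents vary widely.
require 'pdf-reader'

REFERENCE_HEADINGS = /^\s*(references|works cited|bibliography|literature cited)\s*$/i

def reference_section(pdf_path)
  reader = PDF::Reader.new(pdf_path)
  text = reader.pages.map(&:text).join("\n")
  lines = text.lines
  start = lines.index { |line| line.match?(REFERENCE_HEADINGS) }
  return nil unless start
  # Everything after the heading; citation-level parsing would still be needed.
  lines[(start + 1)..].join
end

section = reference_section('example_thesis.pdf') # hypothetical file
puts(section ? section.lines.first(5).join : 'No references section found')
```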
If we decide to pursue this in the IR, we may use one of the following strategies:
We started migrating our IR right after we completed this citation analysis project, so we have not chosen one of these approaches, but I would be interested to hear from others if they have tried any of these or have suggestions for other strategies.
I sort of feel like IRs should be designed backward to support a use like this. Perhaps an IR should be more than just a place to deposit something, but a platform to actually analyze and do some citation data/metadata research.
Very nice! I remember when I was doing institutional research regarding the previous university that I worked for via Web of Science and I discovered that there was quite a bit of variation in how WoS listed the university name in the “Organization-Enhanced” field or author info fields. This made finding all of our institutional references a bit tricky. Did you run into any problems like this? If so, what strategy did you take to alleviate the problem?
Thank you! It was a challenge in that many institutions have both “Pacific” and “University” in their organizational names but are not affiliated with Pacific University in Oregon. That was one of the reasons we decided to add terms related to campus locations, despite the risk that some publications might not appear in the search. After restricting the search, we did an informal comparison of results to determine whether they had an appropriate level of specificity (not missing Pacific publications, but also not including non-Pacific publications).
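To illustrate the kind of restriction we mean (not our exact query), a Web of Science advanced-search string combining the Organization-Enhanced field with campus-location terms could be built as sketched below; OG= and AD= are standard WoS advanced-search field tags, but the location terms shown are examples only.

```ruby
# Hypothetical sketch of a restricted affiliation query of the kind described
# above. OG= (Organization-Enhanced) and AD= (Address) are standard Web of
# Science advanced-search field tags; the campus terms are illustrative.
campus_terms = ['Forest Grove', 'Hillsboro'] # example locations, not exhaustive
query = "OG=(Pacific University) AND AD=(#{campus_terms.join(' OR ')})"
puts query
# => OG=(Pacific University) AND AD=(Forest Grove OR Hillsboro)
```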