NERCOMP: Institutional Repositories: How Are They Evolving?
Sept. 26, 2006
Speakers discussed five digital repository experiences/tools:
* ContentDM at Mt. Holyoke
* Digital Commons at UConn
* DigiTools at Brandeis
* dSpace at Harvard Science Libraries
* Fedora at Tufts
Some overall conclusions:
* Institutional is correct: for the most part the collections grew from small funded projects where the emphasis was on local creation and access. Though the results are on the web, there is little regard for connections to other institutions’ collections, cross-collection access, or tie-ins to other possibilities.
* Repositories is also correct: most collections were built as storage facilities for individual objects. Search within a collection, find an object, look at an object. Very little concern about connections between multiple objects inside or outside the collection. Some OAI implementation, but the model was very much like a bricks-and-mortar library.
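OAI here refers to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which lets an outside service pull a repository’s metadata and is one route to the cross-collection access these collections lack. As a minimal sketch, the Python below builds a ListRecords harvesting request. The base URL and the set name `daumier` are hypothetical, and a real harvester would also fetch and parse the XML responses.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a repository's OAI endpoint (hypothetical in the example below).
    """
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH: it is sent
        # with the verb only, to fetch the next page of a large result set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Optional: restrict the harvest to one collection ("set").
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
print(list_records_url("https://repository.example.edu/oai", set_spec="daumier"))
# prints https://repository.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc&set=daumier
```

Every OAI-PMH repository must support the `oai_dc` (simple Dublin Core) format, which is why it is the default prefix in this sketch.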
Attendees appeared to be mostly library folk, with some CIOs and academic computing directors as well, plus a handful of “techies.”
1) ContentDM, Bryan Goodwin, Mt. Holyoke
The original project was created for a Western Civ. course. It combines ArtStor images with in-house scanned objects. Objects are downloaded from ArtStor so the team can assemble combinations of objects for specific classes.
Course-specific metadata is added to make that happen; for example, they created a field for “week one,” “week two,” etc. They didn’t standardize the format at first, so some items are tagged “week 1,” others “week two.” Oy.
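The “week 1” versus “week two” drift is the kind of inconsistency a small normalization step at data entry can catch. Here is a minimal Python sketch, assuming labels always take the form “week” followed by a numeral or a spelled-out number up to fifteen. The function name and the zero-padded “Week 01” output form are hypothetical choices, not ContentDM features.

```python
import re

# Spelled-out numbers accepted; stopping at fifteen is an assumption about course length.
NUMBER_WORDS = {
    word: n for n, word in enumerate(
        ["one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen"],
        start=1,
    )
}

WEEK_PATTERN = re.compile(r"\s*week\s+(\d+|[a-z]+)\s*", re.IGNORECASE)


def normalize_week(label):
    """Map variants like 'week 1', 'Week One', ' WEEK 01 ' to 'Week 01'."""
    match = WEEK_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized week label: {label!r}")
    token = match.group(1).lower()
    number = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
    if number is None:
        raise ValueError(f"unrecognized week number: {token!r}")
    # Zero-padding keeps "Week 02" sorting before "Week 10" as plain text.
    return f"Week {number:02d}"


for raw in ["week 1", "week two", "Week One", "WEEK 10"]:
    print(f"{raw!r} -> {normalize_week(raw)!r}")
```

Rejecting unrecognized labels with an error, rather than passing them through, is what forces a single format from the start.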
ArtStor had promised stable URLs but then changed them without notification, due to internal political pressure. ArtStor images are JPEGs, but their metadata is stored as HTML, so the project team had to cut and paste each object’s elements into ContentDM.
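Cutting and pasting each object’s elements by hand can sometimes be replaced by a short script. The sketch below assumes, purely hypothetically, that each metadata page lists fields as table rows of the form `<th>label</th><td>value</td>`; these notes don’t describe ArtStor’s actual markup, so the parser would need adapting. It uses only Python’s standard-library `html.parser`, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect (label, value) pairs from <th>label</th><td>value</td> rows.

    The row layout is an assumption, not a documented ArtStor format.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._text = []     # text fragments seen inside the current cell
        self._label = None  # most recent <th> text awaiting its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return  # ignore closing tags of inline markup, rows, tables
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None
        self._cell = None


def extract_pairs(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = ("<table><tr><th>Title</th><td>Sample <b>Object</b></td></tr>"
          "<tr><th>Date</th><td>1834</td></tr></table>")
print(extract_pairs(sample))
# prints [('Title', 'Sample Object'), ('Date', '1834')]
```

The resulting pairs would still have to be mapped onto the collection’s ContentDM fields.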
2) Digital Commons/ProQuest, Jonathan Nabe, Agriculture and Natural Resources Librarian, UConn
Outsourced their collection efforts to ProQuest because they didn’t want to get involved in server maintenance.
Pros: training, support, it works as described
Cons: have to do updates in multiple places, uses a proprietary programming language, poor reporting
Big con: really limited metadata (not even Dublin Core); it seems to be only author, title, random uncontrolled keywords, and abstract
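For comparison, simple Dublin Core defines fifteen elements. The sketch below is a hypothetical helper built on Python’s standard `xml.etree.ElementTree`. It maps the four fields Digital Commons appears to offer (author, title, keywords, abstract) onto `creator`, `title`, `subject`, and `description`, then adds `date`, `type`, and `rights`, which Dublin Core also provides. The record is wrapped in the `oai_dc` element used by OAI-PMH; all sample values are invented.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_record(fields):
    """Serialize (element, value) pairs as an oai_dc XML record.

    Elements may repeat, e.g. one subject per keyword.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values: Digital Commons' four fields, then three DC additions.
record = dc_record([
    ("creator", "Doe, Jane"),                        # author
    ("title", "Sample Article Title"),               # title
    ("subject", "nitrogen"),                         # keyword
    ("subject", "water quality"),                    # keyword
    ("description", "An abstract would go here."),   # abstract
    ("date", "2006"),
    ("type", "Text"),
    ("rights", "Copyright held by the author"),
])
print(record)
```

A record served in an actual OAI-PMH response would usually also carry an `xsi:schemaLocation` attribute, omitted here for brevity.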
Promised enhancements: RSS feeds, LDAP authentication, personal research pages… real soon now…
3) DigiTools (Ex Libris), Susan Pyzynski, Brandeis
Began with a $250,000 IMLS grant to digitize the Daumier Lithograph Collection. Despite the $$, they didn’t want to use an open source product because they did not want to devote additional tech support to it. They already had Ex Libris’s Aleph product. (They had heard that one organization had made so many customizations to dSpace that it couldn’t upgrade to the next version.)
DigiTools is a cataloging program with thumbnails tacked on, so it allows for great metadata and good searching, but display is limited and it doesn’t do much else.
Upcoming version 3 promises to be more like a robust digital repository program.
Pros: metadata expressed/stored in XML, robust rights/security/access, can now support complex objects (e.g., video running in one window, transcription running in the other), LDAP authentication, can set up groups such as students by course (manually, not tied to the SIS)
Cons: no Mac entry client, no usage tracking, only depositor or staff roles, so no robust entry divisions
Version 3 may incorporate SCORM and may have WebCT tie-ins
4) dSpace, Michael Leach, Physics Research Library, Harvard
Goal: “support the transformation of scholarly communication” (Hooray, someone thinking outside the institutional silo!)
* test a new model of participation
* six types of digital objects: articles (pre-, post-, and current-print); theses; video (streaming and static); serials; datasets; learning objects, broadly defined
* study implementation of Dublin Core
* study work flow and work load analysis
* test scaling
* research interface
* user needs analysis
* marketing and publicity
* policy development
* $$ Open Source
* accepted by libraries
* collections-based, not subject-based (they wanted to mirror Harvard’s highly decentralized departmental structure)
* integrated with what they have (Apache, Tomcat)
* metadata support (plus possible METS support in future)
Faculty response: all over the map (“but I already have my stuff on my web site”), but archival persistence became a big selling point, along with search capabilities.
Publisher response: they are trying to integrate dSpace with publishers’ sites so that when faculty submit to a publisher, a simple button click will also deposit the materials into dSpace. Cool.
Policies: they are working on these up front. What faculty rights? What publisher rights? Etc.
Workflow issues: their workflows will be published soon (and they ask that other dSpace users publish theirs); they are also realigning some tech positions to accommodate the work.
Training: working on best practices
Metadata: expanding what is “in the box”
Tech Challenge: unresolved firewall/handles issues
Other: looking for Harvard consolidation and consortial opportunities; may outsource it to BioMed Central after all
5) Fedora, Eliot Wilczek, Tufts
Conceptually, Fedora is clearly the most sophisticated. Its strongest point is also its weakest: it is not a system but an architecture. You can swap in your own front-end, management, storage, dissemination, access, validation, and search functions, always assuming you already have them.
It’s repository “plumbing.”
One surprise: they’ve gone from METS to FOXML (Fedora Object XML), Fedora’s own format.
Policy issues: they use a matrix approach, ranging from highly formatted, top-level, highly accessible objects to simple storage of objects in wacky formats.
Sustainability: won’t happen without institutional commitment
Only Fedora and dSpace guys were familiar with TEI
DigiTools 3 and now dSpace might go with METS
Check out what the Coalition for Networked Information (CNI) is doing (Clifford Lynch)
Should UVM consider a “grey literature” repository?